Software

PostgreSQL Reconsiders Its Process-Based Model 54

Jonathan Corbet, writing at LWN: In the fast-moving open-source world, programs can come and go quickly; a tool that has many users today can easily be eclipsed by something better next week. Even in this environment, though, some programs endure for a long time. As an example, consider the PostgreSQL database system, which traces its history back to 1986. Making fundamental changes to a large code base with that much history is never an easy task. As fundamental changes go, moving PostgreSQL away from its process-oriented model is not a small one, but it is one that the project is considering seriously.

A PostgreSQL instance runs as a large set of cooperating processes, including one for each connected client. These processes communicate through a number of shared-memory regions using an elaborate library that enables the creation of complex data structures in a setting where not all processes have the same memory mapped at the same address. This model has served the project well for many years, but the world has changed a lot over the history of this project. As a result, PostgreSQL developers are increasingly thinking that it may be time to make a change.
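
Here is a minimal C sketch (not PostgreSQL's actual shared-memory code; the names are made up for illustration) of the offset-based "relative pointer" technique such a library can use when processes map the same region at different addresses:

```c
/* Sketch: structures in the shared region store offsets from the
 * region's start instead of raw pointers, so they stay valid no
 * matter where each process happens to map the region. */
#include <stdint.h>
#include <stdio.h>

typedef uint64_t rel_ptr;   /* offset from the region base; 0 means NULL */

typedef struct {
    rel_ptr next;           /* relative "pointer" to the next node */
    int     value;
} node;

/* Translate a relative pointer using this process's own mapping. */
static void *rel_to_abs(void *base, rel_ptr rp)
{
    return rp ? (char *)base + rp : NULL;
}

static rel_ptr abs_to_rel(void *base, void *p)
{
    return p ? (rel_ptr)((char *)p - (char *)base) : 0;
}

/* Walk a linked list whose links are region-relative; this works
 * unchanged in every process that has the region mapped. */
static int sum_list(void *base, rel_ptr head)
{
    int sum = 0;
    for (node *n = rel_to_abs(base, head); n; n = rel_to_abs(base, n->next))
        sum += n->value;
    return sum;
}

int main(void)
{
    /* Stand-in for a shared mapping; offset 0 is reserved so that
     * a zero offset can represent NULL. */
    _Alignas(8) static char region[1024];
    node *a = (node *)(region + 16);
    node *b = (node *)(region + 64);
    a->value = 1; a->next = abs_to_rel(region, b);
    b->value = 2; b->next = 0;
    printf("sum = %d\n", sum_list(region, abs_to_rel(region, a)));
    return 0;
}
```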

Comments Filter:
  • by rabbirta ( 10188987 ) on Monday June 19, 2023 @01:31PM (#63615752) Homepage
    Ugh. From the article, here's what they're doing instead:

    The overhead of cross-process context switches is inherently higher than switching between threads in the same process - and my suspicion is that that overhead will continue to increase.

    • by OneOfMany07 ( 4921667 ) on Monday June 19, 2023 @03:08PM (#63616026)

      Yes, that is a general reason why separate processes might be a bad design choice when you have a lot of data being passed around. But separate processes also keep some things isolated from each other, and make it impossible to cheat in certain ways.

      Threads share an address space, meaning all the data in RAM that one thread can access is also physically accessible to the other threads. Separate processes prevent that, even by mistake.

      I assume data would move from untrusted (where you'd want separate processes and higher security) to usable (reviewed, or created from a trusted source). There should be a good design pattern for that situation. I don't happen to know it, but this isn't a rare situation in software.
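
A minimal C sketch of the isolation property described above (nothing PostgreSQL-specific): after fork(), the child's writes land in its own copy-on-write pages, whereas threads in one process would all see the same variable.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    int secret = 42;
    pid_t pid = fork();
    if (pid < 0) { perror("fork"); return 1; }

    if (pid == 0) {            /* child: has a copy-on-write copy  */
        secret = 1337;         /* modifies only the child's pages  */
        printf("child sees  %d\n", secret);
        exit(0);
    }
    wait(NULL);
    printf("parent sees %d\n", secret);   /* still 42 */
    return 0;
}
```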

    • Right. The original article summary just laments the problem, but does not mention the proposed solution and the /. TFS just copies it.

      • The proposed solution isn't really a solution, though. The problem is that they've got close to four decades of code built around the process-based model, along with huge amounts of third-party addons and modules, that would all need to be extensively rewritten. We have a code base that's about ten years newer than PostgreSQL and started in the Windows world, so it was thread-based rather than process-based - meaning we were lucky and backed the right horse by accident - and I can't even begin to imagine what a conversion on PostgreSQL's scale would involve.
  • by Anonymous Coward

    I like PostgreSQL, but there are a couple of things they're stubborn about because they're so incredibly conservative when it comes to database design. The lack of index hinting is annoyingly paternalistic.

    At least some of those annoyances can be worked around with modules (for example, the single-byte int type that PostgreSQL lacks is provided by pguint).

    Glad to see they're catching up with these newfangled "threads"

    • I can see why they'd be nervous about adding that. Index hinting is a giant footgun: in almost all cases the query optimiser does a much better job than you can at index selection, and if you do provide hints and then anything at all in the schema or indexing changes, you're pessimising the query without even being aware of it. It's like C's 'register' keyword - it made sense with primitive compilers in the 1970s but hasn't really been needed for decades, which is why pretty much all compilers have ignored it.
  • by Waffle Iron ( 339739 ) on Monday June 19, 2023 @01:49PM (#63615796)

    For each individual SQL query, a new Docker container should be launched with a fresh web server to handle the request. Javascript on the server will convert the request to REST calls to a pool of backend VMs. These will then convert all of the data to YAML, which will also be the new on-disk format. This all needs to run exclusively in the cloud.

  • Not a good idea (Score:5, Interesting)

    by rlwinm ( 6158720 ) on Monday June 19, 2023 @01:50PM (#63615800)
    Tom Lane made a few arguments in the past as to why the process model worked for them. While there is some additional overhead in terms of TLB thrashing on a process context switch, it's not nearly as big a pain point in Postgres as you might expect. Plus, the PG code base is not reentrant in many places.

    If you want to fix some warts in Postgres, efforts would be better spent on:

    (1) Using io_uring or direct I/O rather than double buffering everything in the page cache (see the sketch after this list).
    (2) Handling correlated columns better, so the planner is smarter about certain kinds of joins.
    (3) Fixing the terrible plans that come out of queries with OR-clauses and some star-queries.
    (4) Reducing write amplification in indexes where possible, à la HOT updates.
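
For point (1), here is a minimal Linux-only C sketch of reading around the page cache with O_DIRECT (the simpler cousin of an io_uring setup; "datafile" and the sizes are placeholders). O_DIRECT requires buffer, offset, and length alignment, typically to the 4096-byte block size.

```c
/* Sketch: read a file with O_DIRECT so the data is DMA'd straight
 * into our buffer instead of being staged in the kernel page cache. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    const size_t len = 8192;                /* multiple of the block size */
    void *buf;
    if (posix_memalign(&buf, 4096, len) != 0)   /* aligned buffer */
        return 1;

    int fd = open("datafile", O_RDONLY | O_DIRECT);
    if (fd < 0) { perror("open"); return 1; }

    ssize_t n = read(fd, buf, len);         /* bypasses the page cache */
    if (n < 0) perror("read");
    else printf("read %zd bytes without double buffering\n", n);

    close(fd);
    free(buf);
    return 0;
}
```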
    • by nester ( 14407 )

      Doesn't x86 have address-space #s now? (Like other arches have had for decades.) If the working set for the TLB is close to its total size, then it wouldn't help much, but if it's a problem of TLB-invalidate on context switch, address space numbers allow the TLB and other virtually addressed caches to avoid flushing entries that might hit again when context is switched back.

      • by rlwinm ( 6158720 )
        A typical Postgres process shares a large number of page tables with other Postgres processes (for the buffer cache and code pages). While I was once quite a CPU architecture buff, I haven't kept up with the changes to x86 for a long time - so this may very well help once it filters down to production equipment.
    • I know a lot of features in modern CPUs exist to fix issues like this. And ARM-compatible CPUs are not as high-performing.

      Would this situation change on a simpler CPU? Say one with a smaller TLB and smaller caches in general.

      There are companies making hundreds of ARM-compatible cores (per socket? per board?) for cloud and server companies. Having software that works well on a range of devices is useful.

      And maybe there are other issues with the process model, like making debugging harder? Or scaring away new open-source contributors?

      • by rlwinm ( 6158720 )
        TLB churn is something that is also system dependent. A modern server is quite complex and modeling performance is difficult. The reality is that if you want to make HW improvements for workloads like Postgres, then do what you can to streamline I/O.

        Almost no real-world Postgres databases fit in RAM - and because of ACID, even with synchronous_commit off, you will be doing writes unless you have a read-only database. So worrying about RAM (i.e. the TLB) isn't as likely to have a good payoff as optimizing the I/O path.
      • by edwdig ( 47888 )

        I know a lot of features in modern CPUs exist to fix issues like this. And ARM-compatible CPUs are not as high-performing.

        Would this situation change on a simpler CPU? Say one with a smaller TLB and smaller caches in general.

        On any database where the performance really matters, you're going to have far more data than RAM. You'll be churning through tons of data on a big query, so you'll be thrashing the TLB and cache regardless.

        I think the bigger issue is that Postgres will try to share data between processes when possible. Doing that adds a lot of overhead. You can't use direct pointers to the data, as the memory will be mapped to different addresses in each process. All your data accesses will have a layer of indirection added.

    • Ohhh... They read/write through the standard OS filesystem libraries, and thereby - as you write - hit that cache as well as their own? Sounds like something they should fix a.s.a.p., and it shouldn't be that hard to implement. Libraries for direct I/O or async I/O, bypassing the OS filesystem cache, exist almost everywhere nowadays, right?
      I was thinking of switching to PostgreSQL, but I understand I need to investigate this a bit further.

      • by rlwinm ( 6158720 )
        So they have been talking about moving to direct I/O for a while. It's a bit trickier than it sounds once you take concurrency into account. I have been using Postgres for real-world systems and am quite happy with it.

        PostgreSQL gets so many things right. Even updating the system catalog is ACID - that's nice. PostgreSQL also handles dates well. In fact it handles many data types really, really well - far better than MySQL.

        I also have found that when you start piling on the load (real-world OLTP) Postgres holds up well.
        • If your queries are dominated by primary key lookups MySQL will probably win.

          Why would you think that? My tables have hundreds of millions of rows, and I can construct a query that does direct primary key lookups for multiple columns (they usually have nine or ten columns) using subqueries, and pull in targeted results in three to five milliseconds.

          • by rlwinm ( 6158720 )
            I meant "queries" in the more general sense. MySQL doesn't have quite the write overhead that Postgres does for updates. Then again, MySQL doesn't have good compliance with the SQL standard either. At least from the last time I checked its secondary indexes don't use a ctid type structure but rather point to the primary key of the row.

            This means for updates on tables with lots of secondary indexes (sometimes you just need those) you get far more writes in Postgres than MySQL.

            In case my posts here didn't make it clear: overall, I'm still a fan of Postgres.
  • by CAIMLAS ( 41445 ) on Monday June 19, 2023 @02:06PM (#63615866)

    The crux of the issue is that postgres uses a more expensive multi-process model instead of kernel threading, duplicating the memory resources used by each process/connection and also requiring a non-trivial amount of IPC that would not otherwise be needed with lighter threads.

    Some OSes have a concept called "lightweight threads" which sort of helps with this condition. I'm somewhat surprised postgres still uses this model, honestly - it's exceedingly long in the tooth. What I've typically seen is that even applications built on this kind of model will use multiple worker threads within each process, at least, to spread resource use across cores more effectively.

    For reference, the Linux kernel treats both threads and processes roughly the same (as "tasks"), and the fundamental problem is simply in how postgres communicates.

    From kernel docs -

    "Linux uses a different implementation for threads. The basic unit is called a task (hence the struct task_struct) and it is used for both tasks and processes. Instead of embedding resources in the task structure it has pointers to these resources.

    Thus, if two threads are the same process will point to the same resource structure instance. If two threads are in different processes they will point to different resource structure instances."

    https://linux-kernel-labs.github.io/refs/heads/master/lectures/processes.html

    Effectively, this has interesting implications for administration, as each thread is subject to the same kinds of limits a process normally would be (open file limits, for instance). You can get a better accounting of what's actually running on your system by using `ps -eLf` - and of course, /proc contains all the appropriate info as well.
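
A minimal C sketch of the kernel's view described above (illustrative only): clone(2) creates a new task either way, and CLONE_VM is what makes it thread-like by sharing the parent's memory rather than copying it.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

static int counter = 0;

static int worker(void *arg)
{
    (void)arg;
    counter++;     /* visible to the parent only because of CLONE_VM */
    return 0;
}

int main(void)
{
    char *stack = malloc(64 * 1024);
    if (!stack) return 1;

    /* CLONE_VM makes the new task share our address space, like a
     * thread; drop it and the task gets its own copy, like fork(). */
    pid_t tid = clone(worker, stack + 64 * 1024, CLONE_VM | SIGCHLD, NULL);
    if (tid < 0) { perror("clone"); return 1; }

    waitpid(tid, NULL, 0);
    printf("counter = %d\n", counter);    /* 1 here; 0 without CLONE_VM */
    free(stack);
    return 0;
}
```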

    • by rlwinm ( 6158720 )
      Actually the memory overhead is pretty much equal. Postgres processes fork from a common parent which has preloaded lots of data. Everything else is done in a shared buffer pool. As fork() has copy-on-write semantics, once a server process has run for a while the number of COW page faults flattens out.

      And the typical use case of Postgres involves some sort of connection pooling - either external, like PgBouncer, or within your application code (by maintaining a pool of connections at the ready).
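
A minimal sketch of the in-application pooling the comment mentions, using libpq (the pool size, helper names, and error handling are made up; real code would add locking and connection health checks):

```c
/* Sketch: keep a few PostgreSQL connections open and hand them out,
 * instead of paying connection setup (fork, auth) per query. */
#include <stdio.h>
#include <libpq-fe.h>

#define POOL_SIZE 4

static PGconn *pool[POOL_SIZE];
static int in_use[POOL_SIZE];

int pool_init(const char *conninfo)
{
    for (int i = 0; i < POOL_SIZE; i++) {
        pool[i] = PQconnectdb(conninfo);
        if (PQstatus(pool[i]) != CONNECTION_OK) {
            fprintf(stderr, "connect failed: %s", PQerrorMessage(pool[i]));
            return -1;
        }
    }
    return 0;
}

PGconn *pool_acquire(void)        /* single-threaded sketch; a real  */
{                                 /* pool needs a mutex around this  */
    for (int i = 0; i < POOL_SIZE; i++)
        if (!in_use[i]) { in_use[i] = 1; return pool[i]; }
    return NULL;                  /* pool exhausted */
}

void pool_release(PGconn *conn)
{
    for (int i = 0; i < POOL_SIZE; i++)
        if (pool[i] == conn) in_use[i] = 0;
}
```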
      • by CAIMLAS ( 41445 )

        All those things might be true in the specific context of postgres, which would be curious - because it's definitely not the norm for that to be the case with process vs. thread models.

        The IPC overhead, though... that's definitely an added expense, and it's going to get disproportionately more expensive relative to the work being done as load increases, particularly with our ever-expanding many-cored NUMA systems.

        • by rlwinm ( 6158720 )
          So the IPC overhead is pretty low, as most of it involves data in shared memory. The way Postgres works is that when it starts, the postmaster process will set up some core data structures and then allocate (via mmap, typically) the shared buffers. The interesting bit is that the shared buffers are in a shared VMA. When the postmaster forks to handle a connection, those mappings to the shared memory are kept in the child process. The child process can read/write the buffer cache without much IPC. It also has locks, obviously.
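
A minimal C sketch of that mechanism (illustrative, not Postgres code): a MAP_SHARED mapping created before fork() remains shared with the child, so both processes read and write the same physical pages with no explicit message passing.

```c
#define _DEFAULT_SOURCE
#include <stdio.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    /* stand-in for the postmaster's shared buffer pool */
    int *buf = mmap(NULL, sizeof(int), PROT_READ | PROT_WRITE,
                    MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED) { perror("mmap"); return 1; }
    *buf = 0;

    if (fork() == 0) {     /* "backend": inherits the shared VMA    */
        *buf = 99;         /* publish a value with no explicit IPC  */
        _exit(0);
    }
    wait(NULL);
    printf("parent sees %d\n", *buf);   /* 99: the write was shared */
    munmap(buf, sizeof(int));
    return 0;
}
```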
    • I'm somewhat surprised postgres still uses this model, honestly - it's exceedingly long in the tooth.

      After reading the whole article, the issue seems to me fundamentally a project-management and staffing one. Sure, you can posit rewriting the process/memory management using threads in shared memory, like you probably would if writing a new server from scratch, but how do you avoid breaking stuff right and left (2,000 global variables, etc.) while still supporting all old installs and shipping regular releases with new features?

      A fork of the current system, which continues to be maintained while the new version is developed and stabilized, may be the only realistic way to do it.

      • by rlwinm ( 6158720 )
        I am not sure what problem moving to threads will solve. The only case I can think of is parallel query - you would shuffle a little less data between workers. To be honest, I turn parallel query off to get more predictable utilization. Most of my queries are short-lived and don't have enough tuples to make parallel query all that useful.

        The performance gains are likely to be quite modest. I say this as someone pretty familiar with the Postgres code base. I'm certainly a fan of threading in many applications.
  • This is way over my head
  • I would like to see them go the other way instead. Instead of ditching the process model in favor of threads, I would like to see the process model expanded to work over the network: several hosts glued together and behaving as one giant database.
    Maybe this has to happen at the Linux kernel level; I am not quite sure.
  • That's good and all but the important question is:

    Does it have web scale? [youtube.com] =P

  • As it is not about getting rich on some hype before the victims/customers notice what something actually can and cannot do, FOSS typically moves at a rather sedate pace - and that is how it should be for all solid engineering.
