How's that for saving memory?

PostgreSQL on Windows has suffered from running out of "Desktop Heap" since day one. Things got worse with 8.3beta1, but Dave got that fixed for beta2. However, someone on the list actually managed to run into a different memory scalability problem - running out of address space in the postmaster.

So how could this happen? Because there were too many threads running. Hold on, you think, PostgreSQL isn't threaded?! Which is correct - PostgreSQL on Windows uses the same process based model as it does on Unix. The reason for this is to make sure that the database works the same way on all platforms, which cuts down platform-specific bugs and makes maintaining easier. However, a couple of win32 specific functions are implemented as separate threads. In the postmaster, for example, we start one thread for each child process spawned. The only purpose of this thread is to wait until the child process dies, and then fire off a SIGCHLD signal - since Windows doesn't provide this functionality built in like Unix does.

So with a lot of backends, this leaves us with a whole lot of threads running in the postmaster that doesn't really do anything. Since they don't do anything, this method was chosen originally for 8.0 with the motivation that the code is trivial and that the overhead of starting a backend process is so big that the starting of a secondary thread won't be noticed at all. Which is correct in all aspects except for one - the thread will require 4Mb of address space for it's stack. Most of this is never put into real memory (only the required initial 64Kb are), but it counts against the address space and the commit charge. This gives two possible resources that can run out - room in the pagefile (for the commit charge) and the address space in the process (max 2Gb memory can be addressed by a 32-bit windows process by default). And with 4Mb / thread, you get up there fairly quickly - something nobody thought about at the time.

So for 8.3beta2, I have rewritten the handling of SIGCHLD from scratch. Instead of creating a thread for each backend, we now use an operating system thread pool to do the waiting, and then queue up the child process exists on an I/O completion port. The second part of switching to using an I/O completion port actually simplified and reduced the amount of code necessary pretty substantially, which is always nice.

So did this help? Some pgbench figures from Dave shows:

Without patch:
VM: 1,322,792K
Idle threads: 6     
Peak threads: 306   


With patch:
VM: 98,088K     
Idle threads: 3 
Peak threads: 7

So yes, it makes a huge difference. Yay! Apart from fixing the address space issue, this should also decrease the pagefile requirements for any environments with a lot of backends. You're still going to see significantly worse performance than on Unix with lots of backends - shared memory access is still slow, etc. But it does represent a fairly big step forward.

Even the fact that mingw is once again missing the required functions thus causing breakage on the buildfarm makes that difference smaller. And a patch is in the works to fix that...

Thanks to Trevor and Dave for helping me track this down and test the resulting patch.


Add comment

New comments can no longer be posted on this entry.

Conferences

I speak at and organize conferences around Open Source in general and PostgreSQL in particular.

Upcoming

PGDay Chicago 2024
Apr 26, 2024
Chicago, USA
PGConf.DEV 2024
May 28-31, 2024
Vancouver, Canada

Past

SCaLE 2024
Mar 14-17, 2024
Pasadena, USA
Nordic PGDay 2024
Mar 12, 2024
Oslo, Norway
FOSDEM PGDay 2024
Feb 2-4, 2024
Brussels, Belgium
PGConf.EU 2023
Dec 12-15, 2023
Prague, Czechia
PGConf.NYC 2023
Oct 3-5, 2023
New York, USA
More past conferences