I think I (we) may have finally nailed the stats issue. The symptom was that the stats collector stopped processing data under load (this is different from the old stats-collector crashes under load that we had back in early 8.0). Since the latest changes regarding autovacuum pushed the probability of it happening to around 100% when running the parallel regression tests, debugging was suddenly a lot easier.
Turns out there was a big problem in the pgwin32_select() emulation code (src/backend/port/win32/socket.c for the interested): under high load, it simply stopped telling the caller that there was data available on the socket. And it only happens for UDP sockets (the stats collector uses UDP because it's designed to drop packets under really heavy load in order not to slow down the actual database work).
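To illustrate the "drop packets rather than slow down the sender" idea, here is a minimal POSIX sketch. It is not the actual PostgreSQL code: open_nonblocking_udp() and send_best_effort() are hypothetical names, and the use of a non-blocking socket connected to its own loopback address is an assumption made just to keep the example self-contained.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical setup: a non-blocking UDP socket bound to an ephemeral
 * loopback port and connected to its own address, so plain send() works.
 * Returns the socket, or -1 on failure. */
static int open_nonblocking_udp(void)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int flags;
    int sock = socket(AF_INET, SOCK_DGRAM, 0);

    if (sock < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;          /* let the kernel pick a port */

    if (bind(sock, (struct sockaddr *) &addr, sizeof(addr)) < 0 ||
        getsockname(sock, (struct sockaddr *) &addr, &len) < 0 ||
        connect(sock, (struct sockaddr *) &addr, sizeof(addr)) < 0 ||
        (flags = fcntl(sock, F_GETFL, 0)) < 0 ||
        fcntl(sock, F_SETFL, flags | O_NONBLOCK) < 0)
    {
        close(sock);
        return -1;
    }
    return sock;
}

/* Hypothetical sender: never waits for buffer space.
 * Returns 1 if the datagram was handed to the kernel, 0 if it was
 * dropped because buffers were full, -1 on any other error. */
static int send_best_effort(int sock, const void *msg, size_t len)
{
    assert(sock >= 0);

    if (send(sock, msg, len, 0) >= 0)
        return 1;
    if (errno == EAGAIN || errno == EWOULDBLOCK || errno == ENOBUFS)
        return 0;               /* under load: drop and move on */
    return -1;
}
```

The point of the design is the middle return value: when the kernel has no room, the caller loses one statistics message but does not stall.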
pgwin32_select() still isn't fixed, but with the current architecture for the collector it's not needed anymore. Originally, when we had both a collector and a bufferer, the same code had to look at two sockets and needed it. Now it doesn't need to look at more than one socket, so we can use pgwin32_waitforsinglesocket(), which doesn't have this problem (as far as I've been able to tell, at least).
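For readers who want to see what "waiting on just one socket" looks like, here is a rough POSIX analogue using poll(). It says nothing about how pgwin32_waitforsinglesocket() is implemented on Windows; wait_for_single_socket() and open_loopback_udp() are hypothetical names invented for this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: wait until a single socket becomes readable.
 * Returns 1 if data is available, 0 on timeout, -1 on error.
 * Note: after an interrupted call the full timeout is used again,
 * which is good enough for a sketch. */
static int wait_for_single_socket(int sock, int timeout_ms)
{
    struct pollfd pfd;
    int rc;

    assert(sock >= 0);

    pfd.fd = sock;
    pfd.events = POLLIN;
    pfd.revents = 0;

    do
    {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);

    if (rc < 0)
        return -1;
    if (rc == 0)
        return 0;               /* timed out, nothing to read */
    return (pfd.revents & POLLIN) ? 1 : -1;
}

/* Hypothetical helper for trying the above: a UDP socket bound to an
 * ephemeral loopback port. The bound address is returned in *addr. */
static int open_loopback_udp(struct sockaddr_in *addr)
{
    socklen_t len = sizeof(*addr);
    int sock = socket(AF_INET, SOCK_DGRAM, 0);

    if (sock < 0)
        return -1;

    memset(addr, 0, sizeof(*addr));
    addr->sin_family = AF_INET;
    addr->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr->sin_port = 0;         /* let the kernel pick a port */

    if (bind(sock, (struct sockaddr *) addr, sizeof(*addr)) < 0 ||
        getsockname(sock, (struct sockaddr *) addr, &len) < 0)
    {
        close(sock);
        return -1;
    }
    return sock;
}
```

With only one descriptor to watch there is no descriptor set to build or scan, which is the simplification the single-socket collector makes possible.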
It's been applied to 8.2 and HEAD, and I'll hopefully have time to see if it is an easy backport to 8.1 sometime tomorrow. Per discussions on the list, we are not likely to backport it to 8.0 at all.
Hopefully, I'll be able to get rid of the hack that is pgwin32_select() altogether soon - with this change there is only one place left that uses it, and I'm going to look at replacing that with a proper native implementation as well.