Todays insanely funny quote

Got this from Stefan:

(21:49:25) Stefan Kaltenbrunner: my niece (13 year old) just asked me what I'm doing right now

(21:49:56) Stefan Kaltenbrunner: I answered "well working for a project I'm part of" - look at for more information

(21:50:19) Stefan Kaltenbrunner: and she came back with "ok so this is a website about an elephant called fujitsu?"

People everywhere

Third and final day of SolutionsLinux 2007 today. There's interesting people all over to meet yet again - and no, we haven't exhausted the supply yet.

As planned, Greg Heikki from EDB showed up - they actually managed to get here before me and Devrim, even though we weren't that late. Nice to meet Greg again, and I've never med Heikki before. But they're both good pg hackers, and when you meet such people, interesting discussions are always going to happen.

Had a nice chat with a guy from Ingres, and also with some of the guys from fon. Jean-Paul led the track of OSS databases in the "commercial talks" before lunch. I'm told there was a very good talk on migration to PostgreSQL, but I didn't even bother pretending to go there since it was all in French.

And I got to do the good deed of the day fairly early - taught the salesguy from Continuent that there is no such thing as Postgre. There are many options for how to say it, but that's not one of them. And since he's selling a product based on it, he should know. Which he now does...

(Oh, Devrim - when will you fix the date problem on planet?)

More meetups happening tomorrow

So it looks like we'll have even more international PostgreSQL people showing up for SolutionsLinux tomorrow - Greg Stark and Heikki Linnakangas of EnterpriseDB UK should be showing up in the morning and hang out for a day. Not sure how long they'll be around in the evening, but it'll be nice getting a chance to meet them again.

And yes, we're back at the beer-bar with free WiFi - it helps when you don't have any in the hotel. The PostgreSQLFR guys just had their yearly meeting to approve next years plans and things like that. Unfortunately it was all in French, so I can't really comment on the contents.

And yes, this is very blog-intensive times. But I blame it all on Devrim, and maybe a little on the great elephant-cookies served in our booth.

Oh yeah, and Gavin - thanks, man. I'm sure you know what I mean. If not, you will soon enough...


So the talk is over since a couple of hours. I survived. Wasn't all that many people there, and they appeared to deal fairly well with the fact that I didn't speak French. Nowhere near as many as the Mozilla guy had later, but hey, everybody needs a browser and not everybody needs a proper database.

Had some big issues with the microphone/speaker system (thanks Devrim for fixing some of them), and given that I was entirely unprepared (last night I looked at my slides for 5 minutes, before that it was almost a year since I saw it) I'm pretty happy with how it went off.

Jean-Paul had a talk at the Mandriva booth, which was probably a lot harder - he had people walking around in the booth doing other things the whole time. OTOH, he got a proper working microphone, so he shouldn't be complaining /mha/templates/default/img/emoticons/wink.png alt=";-)" style="display: inline; vertical-align: bottom;" class="emoticon" /

Oh, and they ran out of water (!) in the speakers/exhibitors bar. Not very good before you have to do a talk, but it worked out. Being French, I guess they stocked up on wine and champagne instead (yes, they did. Not really used to that...) Oh yeah, and they ran out of glasses too, but they still had espresso-cups...

Speaking in Paris

So I'm at SolutionsLinux in Paris with Devrim and Jean-Paul and the other guys from PostgreSQLFR. Turns out that Jean-Paul had scheduled me for a talk. We had talked about it loosely, but I didn't really realize it was actually going to happen. Seems it is.

So if you're here, there's going to be a pg talk at 11:30 tomorrow (wednesday). AFAIK there is only one room for talks in them, and it's right next to the postgresqlfr booth. I'm going to do a repeat of my "The PostgreSQL Advantage" talk from last years linuxforum, since I really haven't prepared anything. Hopefully it'll work even without preparation, otherwise I'll just blame the language barrier...

And if you're around and not interested in the talk, just pop over to the postgresqlfr booth (we're at E44) and have a chat. And try the little elephant-cakes, they're great!

As for the show itself, there are a lot of people here. Very interesting. As often, much more people in the non-profit area than around the companies. Of the open source databases, mysql and ingres are both represented here (by their respective companies, not by community people from what I can tell), but no sign of firebird.

Network connection is pretty crap (only wired, very slow, and only http+ssh (we negotiated ourselves to get https as well), no vpn, no imap and such things). But then we're not here to surf the net either, we're here to talk to people about PostgreSQL.

Nailed the stats problem?

I think I (we) may have finally nailed the stats issue. The symptoms were the stats collector stopping to process data under load (this is different from the old stats-collector crashes under load that we had back in early 8.0). Since the latest changes regarding autovacuum made the probability for it happening around 100%25 when running the parallel regression tests, debugging was suddenly a lot easier.

Turns out there was a big problem in the pgwin32_select() emulation code (src/backend/port/win32/socket.c for the interested), that simply had it stop telling the caller that there was data available on the socket when it was under high load. And it only happens for UDP sockets (stats collector uses UDP because it's designed to drop packets under really heavy load in order not to slow down the actual database work).

pgwin32_select() still isn't fixed, but with the current architecture for the collector it's not needed anymore. Originally, when we had both a collector and a bufferer, the same code had to look at two sockets and needed it. Now it doesn't need to look at more than one socket, so we can use pgwin32_waitforsinglesocket(), which doesn't have this problem (that I've been able to tell, at least).

It's been applied to 8.2 and HEAD, and I'll hopefully have time to see if it is an easy backport to 8.1 sometime tomorrow. Per discussions on the list, we are not likely to backport it to 8.0 at all.

Hopefully, I'll be able to get rid of the hack that is pgwin32_select() altogether soon - with this change there is only one place left that uses it, and I'm going to look at replacing that with a proper native implementation as well.

vcbuild progressing nicely

Some of you may have the impression that 8.2 already builds with Visual C++. And it does. Just not quite all of it, and with a lot of rough edges. Hopefully, 8.3 will be much nicer in that regard. The current goal is to get it up on the buildfarm as soon as possible to make it a "proper platform". Unfortunately, I don't know the buildfarm code well enough to do that part myself without a lot of learning to do first.

It's not that far off. There's still one fairly large patch to the regression tests themselves to make it work (or rather, to the regression test execution program), but the really big parts of it are in. Then it needs some glue to make it run nicer. Having to remember a three lines long command line is not very nice...

There have been plenty of small fixes to make it work in environments that aren't identical to mine, and I know there is at least one more pending. And I just submitted a patch to clean up most of the compiler warnings created (because VC++ sees other warnings than gcc and friends).

Bottom line - it's now definitely supposed to be in a state where others can try it out. So if you are interested and have the time, more testers are definitly needed. You need VC++ 2005 (Express should work fine) and current CVS-HEAD of PostgreSQL. All the vc stuff is in src/tools/msvc, see the README file.

Time to move to NZ?

Per a news report I noticed today, it seems New Zealand's Electoral Enrollment Centre has "consolidated the country's electoral roll on the open source PostgreSQL database". Sounds like an interesting project, and a very good use for an Open Source database in general and PostgreSQL in particular.

It's not a particularly huge database, but it's distributed. And they're even running it on 7.4 (though they plan to upgrade) which certainly means they're missing out on a lot of things in 8.x, so they'll likely be even more pleased in the future.

And it's a matter of principle when it comes to sensitive data belonging to the public /mha/templates/default/img/emoticons/wink.png alt=";-)" style="display: inline; vertical-align: bottom;" class="emoticon" / Dealing with it using open source software just makes a lot of sense (even more than just the monetary savings and the very stable platform that you get). Now I just wish something similar would happen here... statistics

Now that the new GIN/tsearch2 based search engine has been running on it's time for a bit of statistics of the searches being performed. The following statistics are pulled from the most recent 99,733 searches performed. Note that these statistics are taken pre-lexing, so searches for different variants of the same word count as different words.

A total of 32,810 unique search terms were submitted, meaning that each search term was used on an average only 3 times.

4930 of the searches resulted in stepping pas the first page of results, meaning that 95%25 of all searchers only looked at the first page of results.

17 searches stepped all the way to the 50th page (the maximum number of pages shown). My guess is that all of these were people testing and not actually looking for the hits there.

The 25 most common search terms, and the number of searches for them, were:

foobar 3873
search 1796
alter table 732
create table 588
jdbc 517
select 508
update 507
insert 475
copy 463
pg_dump 458
odbc 386
sequence 381
vacuum 368
date 361
cast 350
case 349
replication 330
grant 320
psql 318
join 296
create user 275
between 266
tsearch2 264
timestamp 260
cluster 250

The first two are certainly very interesting, with foobar representing almost 4%25 of the searches. You may wonder why this is, and it has a very logical explanation - that's the nagios system that monitors the servers functionality...

The second one is explained by that being what you get if you hit "search" on the frontpage without typing anything in the search box first.

The good news is that system load has remained vastly below what it was with the old search engine. Most of the time, you can't see anything on the server at all, and the load avg is 0.02/0.01/0.00 right now (my stats processing being what brought it so far up).

Rebuilding large GIN indexes

takes time. A lot of time. With rebuilding, I mean when you for example have to UPDATE all your tsvector fields due to a change in tsearch2 configuration. (Yes, I had to do that because I had a slightly incorrect tsearch2 configuration for the archives search database). So don't do it. Instead use the fact that PostgreSQL has nice transactional DDL and do something like this: BEGIN TRANSACTION;

CREATE TABLE messages_new AS SELECT id,txt,to_tsvector(txt) AS fti

FROM messages;

CREATE INDEX messages_new_fti ON messages_new USING gin(fti);

ANALYZE messags_new;

ALTER TABLE messages RENAME TO messages_old;

ALTER TABLE messages_new RENAME TO messages;


DROP TABLE messages_old;

ALTER INDEX messages_new_fti RENAME TO messages_fti; ``` (apologies for any typos, I didn't bother to actually type these commands into the database again, and I lost my cut-and-paste of what I ran)

This way, the messages table can still serve up searches without any disruption to the searches at all. And creating the new index is a lot faster than updating the existing one if you have to touch all rows.


I speak at and organize conferences around Open Source in general and PostgreSQL in particular.



Stockholm PUG Oct 2020
Oct 27, 2020
Stockholm/Online, Sweden
Percona Live
Oct 21, 2020
Online, Online
Warsaw User Group
Jun 29, 2020
Virtual, Virtual
Postgres Vision
Jun 23-24, 2020
Online, Virtual
PGCon 2020
May 26-29, 2020
Online, Virtual
More past conferences