So the talk has been over for a couple of hours now. I survived. There weren't all that many people there, and they appeared to deal fairly well with the fact that I didn't speak French. Nowhere near as many as the Mozilla guy had later, but hey, everybody needs a browser and not everybody needs a proper database.

Had some big issues with the microphone/speaker system (thanks Devrim for fixing some of them), and given that I was entirely unprepared (last night I looked at my slides for 5 minutes, and before that I hadn't seen them in almost a year) I'm pretty happy with how it went off.

Jean-Paul had a talk at the Mandriva booth, which was probably a lot harder - he had people walking around in the booth doing other things the whole time. OTOH, he got a proper working microphone, so he shouldn't be complaining ;-)

Oh, and they ran out of water (!) in the speakers/exhibitors bar. Not very good before you have to do a talk, but it worked out. Being French, I guess they stocked up on wine and champagne instead (yes, they did. Not really used to that...) Oh yeah, and they ran out of glasses too, but they still had espresso-cups...

Speaking in Paris

So I'm at SolutionsLinux in Paris with Devrim and Jean-Paul and the other guys from PostgreSQLFR. Turns out that Jean-Paul had scheduled me for a talk. We had talked about it loosely, but I didn't really realize it was actually going to happen. Seems it is.

So if you're here, there's going to be a pg talk at 11:30 tomorrow (Wednesday). AFAIK there is only one room for talks, and it's right next to the postgresqlfr booth. I'm going to do a repeat of my "The PostgreSQL Advantage" talk from last year's LinuxForum, since I really haven't prepared anything. Hopefully it'll work even without preparation, otherwise I'll just blame the language barrier...

And if you're around and not interested in the talk, just pop over to the postgresqlfr booth (we're at E44) and have a chat. And try the little elephant-cakes, they're great!

As for the show itself, there are *a lot* of people here. Very interesting. As often, many more people in the non-profit area than around the companies. Of the open source databases, mysql and ingres are both represented here (by their respective companies, not by community people from what I can tell), but no sign of firebird.

Network connection is pretty crap (only wired, very slow, and only http+ssh (we negotiated ourselves to get https as well), no vpn, no imap and such things). But then we're not here to surf the net either, we're here to talk to people about PostgreSQL.

Nailed the stats problem?

I think I (we) may have finally nailed the stats issue. The symptom was the stats collector ceasing to process data under load (this is different from the old stats-collector crashes under load that we had back in early 8.0). Since the latest changes around autovacuum made the probability of it happening close to 100% when running the parallel regression tests, debugging suddenly got a lot easier.

Turns out there was a big problem in the pgwin32_select() emulation code (src/backend/port/win32/socket.c for the interested), that simply had it stop telling the caller that there was data available on the socket when it was under high load. And it only happens for UDP sockets (stats collector uses UDP because it's designed to drop packets under really heavy load in order not to slow down the actual database work).

pgwin32_select() still isn't fixed, but with the current architecture for the collector it's not needed anymore. Originally, when we had both a collector and a bufferer, the same code had to look at two sockets and needed it. Now it doesn't need to look at more than one socket, so we can use pgwin32_waitforsinglesocket(), which doesn't have this problem (that I've been able to tell, at least).

It's been applied to 8.2 and HEAD, and I'll hopefully have time to see if it is an easy backport to 8.1 sometime tomorrow. Per discussions on the list, we are not likely to backport it to 8.0 at all.

Hopefully, I'll be able to get rid of the hack that is pgwin32_select() altogether soon - with this change there is only one place left that uses it, and I'm going to look at replacing that with a proper native implementation as well.

vcbuild progressing nicely

Some of you may have the impression that 8.2 already builds with Visual C++. And it does. Just not quite all of it, and with a lot of rough edges. Hopefully, 8.3 will be much nicer in that regard. The current goal is to get it up on the buildfarm as soon as possible to make it a "proper platform". Unfortunately, I don't know the buildfarm code well enough to do that part myself without a lot of learning to do first.

It's not that far off. There's still one fairly large patch to the regression tests themselves to make it work (or rather, to the regression test execution program), but the really big parts of it are in. Then it needs some glue to make it run nicer. Having to remember a three lines long command line is not very nice...

There have been plenty of small fixes to make it work in environments that aren't identical to mine, and I know there is at least one more pending. And I just submitted a patch to clean up most of the compiler warnings created (because VC++ sees other warnings than gcc and friends).

Bottom line - it's now definitely supposed to be in a state where others can try it out. So if you are interested and have the time, more testers are definitely needed. You need VC++ 2005 (Express should work fine) and current CVS-HEAD of PostgreSQL. All the vc stuff is in src/tools/msvc, see the README file.

Time to move to NZ?

Per a news report I noticed today, it seems New Zealand's Electoral Enrollment Centre has "consolidated the country's electoral roll on the open source PostgreSQL database". Sounds like an interesting project, and a very good use for an Open Source database in general and PostgreSQL in particular.

It's not a particularly huge database, but it's distributed. And they're even running it on 7.4 (though they plan to upgrade) which certainly means they're missing out on a lot of things in 8.x, so they'll likely be even more pleased in the future.

And it's a matter of principle when it comes to sensitive data belonging to the public ;-) Dealing with it using open source software just makes a lot of sense (even more than just the monetary savings and the very stable platform that you get). Now I just wish something similar would happen here...

Search statistics

Now that the new GIN/tsearch2 based search engine has been running on the site for a while, it's time for a bit of statistics on the searches being performed. The following statistics are pulled from the most recent 99,733 searches performed. Note that these statistics are taken pre-lexing, so searches for different variants of the same word count as different words.

A total of 32,810 unique search terms were submitted, meaning that each search term was used on average only 3 times.

4930 of the searches resulted in stepping past the first page of results, meaning that 95% of all searchers only looked at the first page of results.

17 searches stepped all the way to the 50th page (the maximum number of pages shown). My guess is that all of these were people testing and not actually looking for the hits there.
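For the curious, nothing fancy is needed to produce numbers like these - plain SQL aggregates will do. A sketch, assuming a hypothetical log table searches(term, pages_viewed) with one row per search performed (the real logging schema may well look different):

```
-- Hypothetical log table, one row per search performed:
--   searches(term text, pages_viewed integer)

-- Total searches vs unique terms (the "3 uses per term" average)
SELECT count(*) AS total_searches,
       count(DISTINCT term) AS unique_terms
  FROM searches;

-- How many searches stepped past the first page of results
SELECT sum(CASE WHEN pages_viewed > 1 THEN 1 ELSE 0 END) AS went_past_page_one,
       count(*) AS total_searches
  FROM searches;

-- The top-25 list
SELECT term, count(*) AS searches
  FROM searches
 GROUP BY term
 ORDER BY searches DESC
 LIMIT 25;
```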

The 25 most common search terms, and the number of searches for them, were:

foobar 3873
search 1796
alter table 732
create table 588
jdbc 517
select 508
update 507
insert 475
copy 463
pg_dump 458
odbc 386
sequence 381
vacuum 368
date 361
cast 350
case 349
replication 330
grant 320
psql 318
join 296
create user 275
between 266
tsearch2 264
timestamp 260
cluster 250

The first two are certainly very interesting, with foobar representing almost 4% of the searches. You may wonder why this is, and it has a very logical explanation - that's the nagios system that monitors the server's functionality...

The second one is explained by that being what you get if you hit "search" on the frontpage without typing anything in the search box first.

The good news is that system load has remained vastly below what it was with the old search engine. Most of the time, you can't see anything on the server at all, and the load avg is 0.02/0.01/0.00 right now (my stats processing being what brought it so far up).

Rebuilding large GIN indexes

takes time. A lot of time. With rebuilding, I mean when you for example have to UPDATE all your tsvector fields due to a change in tsearch2 configuration. (Yes, I had to do that because I had a slightly incorrect tsearch2 configuration for the archives search database). So don't do it. Instead use the fact that PostgreSQL has nice transactional DDL and do something like this:

```
BEGIN TRANSACTION;

CREATE TABLE messages_new AS
    SELECT id, txt, to_tsvector(txt) AS fti
      FROM messages;

CREATE INDEX messages_new_fti ON messages_new USING gin(fti);

ANALYZE messages_new;

ALTER TABLE messages RENAME TO messages_old;
ALTER TABLE messages_new RENAME TO messages;

DROP TABLE messages_old;

ALTER INDEX messages_new_fti RENAME TO messages_fti;

COMMIT;
```

(apologies for any typos, I didn't bother to actually type these commands into the database again, and I lost my cut-and-paste of what I ran)

This way, the messages table can still serve up searches without any disruption to the searches at all. And creating the new index is a lot faster than updating the existing one if you have to touch all rows.

New search live on

Today I finally got around to cleaning up the final things, and activating the new tsearch2 based search engine for the site. Other than Stefan finding an escaping bug (hello magic_quotes, you should be off) and one encoding bug (yes, even though everything is utf-8 everywhere, you still have to explicitly tell htmlentities that it should be utf-8), things are looking good. The load on the search machine is way down, mainly because the indexing part of aspseek loaded things up horribly. The search itself wasn't too bad, and it's not too bad now either.
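For those who haven't seen tsearch2 in action, a search on an engine like this boils down to queries along these lines. This is only an illustrative sketch - the table and column names (messages, txt, fti) are assumed for the example, using the tsearch2 contrib functions as of 8.2:

```
-- Assumed schema: messages(id, txt, fti tsvector), with a GIN index on fti
SELECT id,
       rank(fti, q) AS score,
       headline(txt, q) AS snippet
  FROM messages, to_tsquery('default', 'gin & index') AS q
 WHERE fti @@ q
 ORDER BY score DESC
 LIMIT 20;
```

The GIN index makes the `@@` match fast; rank() and headline() then order the hits and produce the highlighted excerpts for the result page.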

So, feel free to pound it a bit, and let me know if there are issues.

Oh, and yes, I'll be doing a complete writeup on it. Eventually :-)

Update: As Devrim so kindly informed me, I forgot to credit him. He of course helped me rsync some changes to the archives box at commandprompt, but most importantly he restored from backup the file that I accidentally removed even though it wasn't in CVS. So - thanks Devrim, couldn't have done it without you!

Custom synonym dictionaries in tsearch2

(this could be considered part 4 of the PostgreSQL website search update)

Custom synonym dictionaries are not actually new functionality for 8.2 - I think they've actually been around since 7.4 or so. But it's definitely been a function I did not know about, and it can be really useful in scenarios where you know some specifics about your data. In general, selecting and tuning your dictionaries in tsearch2 can give a big boost to the accuracy of your searches, so it's definitely worth looking into.

The concept of custom synonym dictionaries is really simple (and so is the implementation) - you get to define synonyms for certain words that all the tsearch2 functions will then apply to both indexing and searching, including the headline() function. For example, how many people incorrectly call PostgreSQL "postgre"? Answer: way too many. But that doesn't mean we shouldn't help them by providing whatever search matches we can. Also, some people refer to it as pgsql, and why should the user need to know that?

So how is this done in tsearch2? With the dict_syn dictionary it's very simple, and this dictionary is built and installed by default as long as you install tsearch2 (from source - I assume it's in all binary packages as well, since there is no config switch to turn it off automatically).

The first step is to create the dictionary file itself. It's a simple textfile (though you could keep it in the database and update it with a trigger in for example pl/perlu if you wanted to) with two whitespace separated fields listing the word and its synonym. For the websearch, I started out with the following, a simple set of synonyms for the word postgres (all words are case-insensitive at this point):

```
postgresql postgres
pgsql postgres
postgre postgres
```
I chose to store this along with other tsearch2 files in *<pgdir>/share/contrib/pg_dict*, but you can of course store this anywhere you like as long as the backend can read the file.

With this done, you need to add the dictionary to the list of available ones in tsearch2, using a command like this (this assumes you have installed tsearch2 in your database already):

```
INSERT INTO pg_ts_dict
    (SELECT 'pg_dict', dict_init,
            '<pgdir>/share/contrib/pg_dict',
            dict_lexize, 'pg-specific dictionary'
       FROM pg_ts_dict WHERE dict_name = 'synonym');
```
At this time, you should be able to test the new dictionary, using the lexize function. Notice how any word defined in the dictionary returns the synonym, in this case postgres.

```
search=# select lexize('pg_dict','postgresql'), lexize('pg_dict','pgsql');
   lexize   |   lexize
------------+------------
 {postgres} | {postgres}
(1 row)
```

Finally, you need to modify the tsearch2 configuration for it. Normally you'll probably be working off a custom configuration, but if you're working off the default one, the command below will activate the *pg_dict* dictionary along with the english stemmer for the default config:
```
UPDATE pg_ts_cfgmap SET dict_name = '{pg_dict,en_stem}'
 WHERE tok_alias IN ('lword', 'lhword', 'lpart_hword')
   AND ts_name = 'default';
```

With this done, you should now be able to verify that the system works properly by converting a string to a tsvector:

```
search=# SELECT to_tsvector('postgresql');
 to_tsvector
--------------
 'postgres':1
(1 row)
```

And that's all there is to it. You can easily modify the dictionary by just editing the file. Note that you will need to re-index your tables (or more specifically, re-create the contents of the tsvector columns) if you change the dictionary and want those changes reflected in the searchable data. Also, the contents of the synonym dictionary are cached the first time it's used in a backend, so if you're using connection pooling you may need to forcibly close your connections in order for the changes to show up.
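To make the re-creation step concrete: refreshing the tsvector contents is a single UPDATE. A sketch, assuming a messages(txt, fti) table like the one used for the archives (exact schema assumed for illustration; for a big table the rebuild-into-a-new-table approach is much faster):

```
-- Recompute the tsvector column so the new synonyms take effect.
-- The GIN index on fti is maintained as the rows are rewritten.
UPDATE messages SET fti = to_tsvector(txt);

-- After touching every row, clean up and refresh the statistics.
VACUUM ANALYZE messages;
```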

Finally, a big thanks to Oleg Bartunov for helping me sort out this configuration stuff in the first place, and the rapid fixing of the couple of issues I came across during the testing of 8.2.

Homepage facelift

A short while ago I committed a bunch of changes to the frontpage of the site, trying to include a couple of things that have been much requested. Most of the work is based on a design proposal by Simon Riggs.

The most obvious change is that we now list more items under news and events, and also a short aggregation of some posts from Planet PostgreSQL. Further incremental changes are sure to come...

