My pgday.eu pictures are up

I've finally gotten around to uploading my pictures from PGDay.EU 2009 to [ my smugmug gallery].

Clearly the conference was tiring, and we all needed a rest... (yes, this was during JD's talk)

And as the picture says, don't forget to submit your feedback - the site is still open for that!

PGDay.EU 2009 - it's a wrap

I'm currently sitting on my flight home from Paris CDG, after a couple of very hectic days. It's going to be a couple of days (which in reality is going to drag out into a couple of weeks due to other work engagements and then travel for the JPUG conference) before it'll be possible to completely evaluate the conference and things around it, but here's what I have so far.

I'm going to leave the evaluation of the talks themselves to somebody else. There were many others of the "regular PostgreSQL bloggers" present at the conference and we've already seen some posts around it. Hopefully there will be more, both in French and English. If you are blogging about this and your blog isn't already up on Planet PostgreSQL, please consider adding it so that the community at large gets notification of your posts.

Continue reading

PGDay.EU day one

align="right" So, day one is almost finished, when it comes to the conference itself - and then it's off to the EnterpriseDB evening party. A quick summary of the day is: awesome.

Going into a little more detail, the day started with us actually getting up painfully early. Got to ParisTech in the morning, right as they opened up our room. I have to say the facilities at ParisTech have been great - the rooms are in great shape, perfect size, and all the A/V equipment is working perfectly. (Yes, there is a slight flicker on one of the projectors, but it's not bad).

I did the intro section with Jean-Paul, and there's really not much to say about that. We (well, I) forgot to add a slide about the feedback - oops. I bet that's one reason we don't have as much feedback entered yet as we'd like.

Simon took over with a keynote, which was very good. Simon is a very good speaker, and he found a good balance between technical and non-technical talks. It was a nice way to kick off the conference. At this time, we had somewhere between 125-150 people had shown up, which is definitely not bad.

I half-followed the English track after that, with talks about PostGIS and Data Warehousing, which were both very good talks. Also spent some time in the organization, which has really worked pretty smoothly. We've had a few minor issues, but they were all solved quickly.

Lunch was fantastic. Many thanks to our great caterers who served us an amazing lunch, with good organization, and more than enough food. Couldn't be better!

align="left" Watched Gavin's talk on scalability after lunch, which is always a good one. After that I had my own talk which went at least Ok, though I did finish a bit early. After that it was off to a pgadmin developer meeting, which is where I am now.

So I'd better go now, so I don't miss out on the activities.

80 pages about PostgreSQL!

I was just told that the latest edition of GNU/Linux Magazine in France is dedicated to PostgreSQL. A full 80 pages about your favorite RDBMS! I'm told all the articles are written by our own Guillaume Lelarge. Well done, Guillaume!

Unfortunately for those of us who don't speak the language, the whole thing is in French. But if you do speak French, it's probably well worth checking out. There's a preview available online, and the magazine should be available in stores. Guillaume has also told me the contents will be available downloaded later on, but not for a few months.

PGDay.EU open for business

Yesterday we announced the schedule for PGDay.EU 2009. The Friday will have one track in English and one in French, and the Saturday will have two tracks in English and one in French. There are a lot of good talks scheduled - I wish I could trust my French enough to go see a couple of those as well...

We are also now open for registration. The cost of the conference is from €60 for a full price two day entry with discounts for single-day and for students. See the registration page for details. While we expect to be able to accommodate all interested people, if we are unable to do so those that register first will obviously be the ones we can take. We also prefer that you register as soon as you can if you know you're coming, since that makes our planning much easier.

Testing PostgreSQL patches on Windows using Amazon EC2

Many people who develop patches for PostgreSQL don't have access to Windows machines to test their patches on. Particularly not with complete build environments for the MSVC build on them. The net result of this is that a fair amount of patches are never tested on Windows until after they are committed. For most patches this doesn't actually matter, since it's changes that don't deal with anything platform specific other than that which is already taken care of by our build system. But Windows is not Posix, so the platform differences are generally larger than between the different Unix platforms PostgreSQL builds on, and in MSVC the build system is completely different. In a non-trivial number of cases it ends up with breaking the buildfarm until somebody with access to a Windows build environment can fix it. Lucky, we have a number of machines running on the buildfarm with Windows on them, so we do catch these things long before release.

There are a couple of reasons why it's not easy for developers to have a Windows machine ready for testing, even a virtual one. For one, it requires a Windows license. In this case the same problem with availability for testing exists for other proprietary platforms such as for example Mac OSX, but it's different from all the free Linux/Unix platforms available. Second, setting up the build environment is quite complex - not at all as easy as on the most common Linux platforms for example. This second point is particularly difficult for those not used to Windows.

A third reason I noticed myself was that running the builds, and regression tests, is very very slow at least on my laptop using VirtualBox. It works, but it takes ages. For this reason, a while back I started investigating using Amazon EC2 to do my Windows builds on, for my own usage. Turns out this was a very good solution to my problem - the time for a complete rebuild on a typical EC2 instance is around 7 minutes, whereas it can easily take over 45 minutes on my laptop.

Now, EC2 provides a pretty nice way to create what's called an AMI (Amazon Machine Image) that can be shared. Using these facilities, I have created an AMI that contains Windows plus a complete PostgreSQL build environment. Since this AMI has been made public, anybody who wants to can boot up an instance of it to run tests. Each of these instances are completely independent of each other - the AMI only provides a common starting point.

I usually run these on a medium size Amazon instance. The cost for such an instance is, currently, $0.30 per hour that the instance is running. The big advantage here is that this includes the Windows license. That makes it a very cost-effective way to do quick builds and tests on Windows.

Read on for a full step-by-step instruction on how to get started with this AMI (screenshot overload warning).

Continue reading

Some planet updates

I found myself unexpected with a day home with nothing but boring chores to do really, so I figured a good way to get out of doing those would be to do some work on the backlog of things that I've been planning to do for planet.postgresql.org. I realize that my blog is turning into a release-notes-for-planet lately since I haven't had much time to blog about other things. So I may as well confess right away that one reason to post is to make sure the updates I deployed actually work...

This round of updates have been around the twitter integration:

  • Since it turned out that a lot of people didn't actually know there was a twitter integration for planet, it is now linked clearly from the planet frontpage.
  • The twitter integration scripts (originally by Selena have been rewritten to work directly with our database of posts instead of pulling back in the RSS feed that the system had just generated, and also to keep the status of posts in the database. With luck, this will fix the very rare case where posts sometimes got dropped, and it made the code a lot simpler.
  • The posts made by the system will refer to the twitter username of the blog owner, if it's registered. For your own blogs, you can see what username is registered by going to the registration site. We've added some of the twitter usernames we know about - if yours is not listed, please let us know at planet@postgresql.org what twitter username to connect with what blog url.
  • The system has been prepared to pull out some usage statistics, but nothing is actually done with that yet.

Help us test a patch for the Win32 shared memory issue

We currently have a patch sitting in the queue from Tsutomu Yamada with modifications from me, all based on an idea from Trevor Talbot some time back. (That should do it for credits) It tries to pre-reserve the shared memory region during DLL initialization, and then releases it just in time to reallocate it as shared memory. (that will do for technical detail for now) This should hopefully fix the infamous "failed to re-attach to shared memory" errors we've been seeing on Windows.

We need your help to test it!

We need help both from people who are experiencing the problem - to see if it solves it, and from people who are not experiencing it - to make sure it doesn't cause any new problems.

Dave has built binaries for 8.3.7 and 8.4.0. To test the patch, stop your server, take a backup copy of your postgres.exe file, and replace it with the file from the appropriate ZIP file before. Restart the server, and see if it works!

Once you have tested, please report your success to the pgsql-hackers list, or directly to me and I'll tally it up.

Update: These patched binaries will only work if you installed from the One-click installer. Specifically, they will not work if you installed from the MSI installer due to a mismatch in the configuration option for integer vs floating point datetime handling.

Planet updates

I've just deployed a new version of the code that runs http://planet.postgresql.org. Most of this code was written by Selena and me during the initial days at PGCon. It just needed some minor polishing, which I didn't get around to until now. So, the new things are:

Support for Team blogs : This is just a grouping of existing blogs, not actually something new we parse. The idea is to give some exposure to a team someone works for - for example, a specific PostgreSQL support company.

Top posters list : The list of all subscriptions has been replaced with a list of top posters. The list was becoming a bit too large to manage, and didn't really fill a purpose. And it was hard to integrate nicely with the Team blogs feature.

There has also been a bunch of internal changes : Details available in the git repo on http://git.postgresql.org.

If you want to make use of the Team blogs feature, this has unfortunately not been implemented in the admin interface. We (well, me, really) were just a bit too lazy for that. So if you want to make use of it, please just send an email to planet@postgresql.org letting us know what name you want for the team, and which blogs to add to it (these blogs should already be subscribed to planet).

Getting a range of entries centered around a point

I had a question yesterday on an internal IRC channel from one of my colleagues in Norway about a SQL query that would "for a given id value, return the 50 rows centered around the row with this id", where the id column can contain gaps (either because they were inserted with gaps, or because there are further WHERE restrictions in the query).

I came up with a reasonably working solution fairly quickly, but I made one mistake. For fun, I asked around a number of my PostgreSQL contacts on IM and IRC for their solutions, and it turns out that almost everybody made the exact same mistake at first. I'm pretty sure all of them, like me, would've found and fixed that issue within seconds if they were in front of a psql console. But I figured that was a good excuse to write a blog post about it.

The solution itself becomes pretty simple if you rephrase the problem as "for a given id value, return the 25 rows preceding and the 25 rows following the row with this id". That pretty much spells a UNION query. Thus, the solution to the problem is:


    SELECT * FROM (
        SELECT id,field1,field2 from mytable where id >= 123456 order by id limit 26
    ) AS a
UNION ALL
    SELECT * FROM (
        SELECT id,field1,field2 from mytable where id < 123456 order by id desc limit 25
    ) AS b
ORDER BY id;

The mistake everybody made? Forgetting that you need a subselect in order to use LIMIT. Without subselects, you can't put ORDER BY or LIMIT inside the two separate parts of the query, only at the outer end of it. But we specifically need to apply the LIMIT individually, and the ORDER BY needs to be different for the two parts.

Another question I got around this was, why use UNION ALL. We know, after all, that there are no overlapping rows so the result should be the same as for UNION. And this is exactly the reason why UNION ALL should be used, rather than a plain UNION. We know it - the database doesn't. A UNION query will generate a plan that requires an extra unique node at the top, to make sure that there are no overlapping rows. So the tip here is - always use UNION ALL rather than UNION whenever you know that the results are not overlapping.

All things considered, this query produces a pretty quick plan even for large datasets, since it allows us to do two independent index scans, one backwards. Since there are LIMIT nodes on the scans, they will stop running as soon as they have produced the required number of rows, which is going to be very small compared to the size of the table. This is the query plan I got on my test data:


 Sort  (cost=54.60..54.73 rows=51 width=86)
   Sort Key: id
   ->  Append  (cost=0.00..53.15 rows=51 width=86)
         ->  Limit  (cost=0.00..35.09 rows=26 width=51)
               ->  Index Scan using mytable_pk on mytable  (cost=0.00..55425.06 rows=41062 width=51)
                     Index Cond: (id >= 100000)
         ->  Limit  (cost=0.00..17.04 rows=25 width=51)
               ->  Index Scan Backward using mytable_pk on mytable  (cost=0.00..56090.47 rows=82306 width=51)
                     Index Cond: (id < 100000)

And yes, the final ORDER BY is still needed if we want the total result to come out in the correct order. With the default query plan, it will come out in the wrong order after the append node. But it's important to remember that by the specification the database is free to return the rows in any order it chooses unless there is an explicit ORDER BY in the query. The rows may otherwise be returned in a completely different order between different runs, depending on the size/width of the table and other parameters.

Conferences

I speak at and organize conferences around Open Source in general and PostgreSQL in particular.

Upcoming

Postgres Open 2017
Sep 6-8, 2017
San Francisco, USA
PGConf.EU 2017
Oct 24-27, 2017
Warsaw, Poland
PGConf.Asia
Dec 4-6, 2017
Tokyo, Japan

Past

PGDay.RU
Jul 5-7, 2017
St Petersburg, Russia
PGDay.UK
Jul 4, 2017
London, UK
Amsterdam PUG
Jun 29, 2017
Amsterdam, Netherlands
PGCon 2017
May 23-26, 2017
Ottawa, Canada
FOSS-North
Apr 26, 2017
Gothenburg, Sweden
More past conferences