PGDay.EU open for business

Yesterday we announced the schedule for PGDay.EU 2009. The Friday will have one track in English and one in French, and the Saturday will have two tracks in English and one in French. There are a lot of good talks scheduled - I wish I could trust my French enough to go see a couple of those as well...

We are also now open for registration. The cost of the conference is from €60 for a full-price two-day entry, with discounts for single-day entry and for students. See the registration page for details. While we expect to be able to accommodate everyone interested, if we cannot, those who register first will obviously be the ones we can take. We also prefer that you register as soon as you can if you know you're coming, since that makes our planning much easier.

Testing PostgreSQL patches on Windows using Amazon EC2

Many people who develop patches for PostgreSQL don't have access to Windows machines to test their patches on, particularly not machines with a complete build environment for the MSVC build. The net result is that a fair number of patches are never tested on Windows until after they are committed. For most patches this doesn't actually matter, since the changes don't deal with anything platform-specific beyond what our build system already takes care of. But Windows is not POSIX, so the platform differences are generally larger than between the different Unix platforms PostgreSQL builds on, and with MSVC the build system is completely different. In a non-trivial number of cases this ends up breaking the buildfarm until somebody with access to a Windows build environment can fix it. Luckily, we have a number of Windows machines running on the buildfarm, so we do catch these things long before release.

There are a couple of reasons why it's not easy for developers to have a Windows machine ready for testing, even a virtual one. First, it requires a Windows license. The same availability problem exists for other proprietary platforms, such as Mac OS X, but not for all the free Linux/Unix platforms. Second, setting up the build environment is quite complex - not nearly as easy as on the most common Linux platforms, for example. This second point is particularly difficult for those not used to Windows.

A third reason I noticed myself was that running the builds, and regression tests, is very very slow at least on my laptop using VirtualBox. It works, but it takes ages. For this reason, a while back I started investigating using Amazon EC2 to do my Windows builds on, for my own usage. Turns out this was a very good solution to my problem - the time for a complete rebuild on a typical EC2 instance is around 7 minutes, whereas it can easily take over 45 minutes on my laptop.

Now, EC2 provides a pretty nice way to create what's called an AMI (Amazon Machine Image) that can be shared. Using these facilities, I have created an AMI that contains Windows plus a complete PostgreSQL build environment. Since this AMI has been made public, anybody who wants to can boot up an instance of it to run tests. Each of these instances is completely independent of the others - the AMI only provides a common starting point.

I usually run these on a medium size Amazon instance. The cost for such an instance is, currently, $0.30 per hour that the instance is running. The big advantage here is that this includes the Windows license. That makes it a very cost-effective way to do quick builds and tests on Windows.

Read on for a full step-by-step instruction on how to get started with this AMI (screenshot overload warning).

Some planet updates

I unexpectedly found myself with a day at home with nothing but boring chores to do, so I figured a good way to get out of doing those would be to do some work on the backlog of things that I've been planning to do for planet.postgresql.org. I realize that my blog is turning into a release-notes-for-planet lately, since I haven't had much time to blog about other things. So I may as well confess right away that one reason to post is to make sure the updates I deployed actually work...

This round of updates has been around the twitter integration:

  • Since it turned out that a lot of people didn't actually know there was a twitter integration for planet, it is now linked clearly from the planet frontpage.
  • The twitter integration scripts (originally by Selena) have been rewritten to work directly with our database of posts instead of pulling back in the RSS feed that the system had just generated, and also to keep the status of posts in the database. With luck, this will fix the very rare case where posts sometimes got dropped, and it made the code a lot simpler.
  • The posts made by the system will refer to the twitter username of the blog owner, if it's registered. For your own blogs, you can see what username is registered by going to the registration site. We've added some of the twitter usernames we know about - if yours is not listed, please let us know at planet@postgresql.org what twitter username to connect with what blog url.
  • The system has been prepared to pull out some usage statistics, but nothing is actually done with that yet.

Help us test a patch for the Win32 shared memory issue

We currently have a patch sitting in the queue from Tsutomu Yamada with modifications from me, all based on an idea from Trevor Talbot some time back. (That should do it for credits) It tries to pre-reserve the shared memory region during DLL initialization, and then releases it just in time to reallocate it as shared memory. (that will do for technical detail for now) This should hopefully fix the infamous "failed to re-attach to shared memory" errors we've been seeing on Windows.

We need your help to test it!

We need help both from people who are experiencing the problem - to see if it solves it, and from people who are not experiencing it - to make sure it doesn't cause any new problems.

Dave has built binaries for 8.3.7 and 8.4.0. To test the patch, stop your server, take a backup copy of your postgres.exe file, and replace it with the one from the appropriate ZIP file. Restart the server, and see if it works!

Once you have tested, please report your success to the pgsql-hackers list, or directly to me and I'll tally it up.

Update: These patched binaries will only work if you installed from the One-click installer. Specifically, they will not work if you installed from the MSI installer due to a mismatch in the configuration option for integer vs floating point datetime handling.

Planet updates

I've just deployed a new version of the code that runs http://planet.postgresql.org. Most of this code was written by Selena and me during the initial days at PGCon. It just needed some minor polishing, which I didn't get around to until now. So, the new things are:

Support for Team blogs : This is just a grouping of existing blogs, not actually something new we parse. The idea is to give some exposure to a team someone works for - for example, a specific PostgreSQL support company.

Top posters list : The list of all subscriptions has been replaced with a list of top posters. The list was becoming a bit too large to manage, and didn't really fill a purpose. And it was hard to integrate nicely with the Team blogs feature.

There has also been a bunch of internal changes : Details available in the git repo on http://git.postgresql.org.

If you want to make use of the Team blogs feature, this has unfortunately not been implemented in the admin interface. We (well, me, really) were just a bit too lazy for that. So if you want to make use of it, please just send an email to planet@postgresql.org letting us know what name you want for the team, and which blogs to add to it (these blogs should already be subscribed to planet).

Getting a range of entries centered around a point

I had a question yesterday on an internal IRC channel from one of my colleagues in Norway about a SQL query that would "for a given id value, return the 50 rows centered around the row with this id", where the id column can contain gaps (either because they were inserted with gaps, or because there are further WHERE restrictions in the query).

I came up with a reasonably working solution fairly quickly, but I made one mistake. For fun, I asked around a number of my PostgreSQL contacts on IM and IRC for their solutions, and it turns out that almost everybody made the exact same mistake at first. I'm pretty sure all of them, like me, would've found and fixed that issue within seconds if they were in front of a psql console. But I figured that was a good excuse to write a blog post about it.

The solution itself becomes pretty simple if you rephrase the problem as "for a given id value, return the 25 rows preceding and the 25 rows following the row with this id". That pretty much spells a UNION query. Thus, the solution to the problem is:


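    -- The center row plus the 25 rows that follow it
    SELECT * FROM (
        SELECT id, field1, field2 FROM mytable WHERE id >= 123456 ORDER BY id LIMIT 26
    ) AS a
    UNION ALL
    -- The 25 rows that precede the center row
    SELECT * FROM (
        SELECT id, field1, field2 FROM mytable WHERE id < 123456 ORDER BY id DESC LIMIT 25
    ) AS b
    -- Put the combined result back into ascending order
    ORDER BY id;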

The mistake everybody made? Forgetting that you need a subselect in order to use LIMIT. Without subselects, you can't put ORDER BY or LIMIT inside the two separate parts of the query, only at the outer end of it. But we specifically need to apply the LIMIT individually, and the ORDER BY needs to be different for the two parts.
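If you want to try the shape of this query without a PostgreSQL server at hand, here is a minimal sketch in Python. It uses the standard library's sqlite3 module as a stand-in database, which is my assumption for illustration only; the table contents, the two-column layout and the helper name `rows_around` are likewise made up. The ids are deliberately gapped (only multiples of 3 exist) to show that the window is counted in rows, not in id values.

```python
import sqlite3

# Same structure as the query above: each half gets its own ORDER BY and
# LIMIT inside a subselect, and the outer ORDER BY sorts the combined rows.
CENTERED_QUERY = """
SELECT * FROM (
    SELECT id, field1 FROM mytable WHERE id >= :center ORDER BY id LIMIT :after
) AS a
UNION ALL
SELECT * FROM (
    SELECT id, field1 FROM mytable WHERE id < :center ORDER BY id DESC LIMIT :before
) AS b
ORDER BY id
"""


def rows_around(conn, center, before=25, after=26):
    """Return up to `before` rows below `center` and `after` rows from `center` upwards."""
    params = {"center": center, "before": before, "after": after}
    return conn.execute(CENTERED_QUERY, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, field1 TEXT)")
# Hypothetical test data: only multiples of 3, so the id column has gaps.
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [(i, f"row {i}") for i in range(3, 600, 3)],
)

window = rows_around(conn, 300)
ids = [row[0] for row in window]
print(len(ids), ids[0], ids[-1])
```

With this test data the window holds 51 rows, running from id 225 to id 375 with id 300 in the middle. Near the start of the table the "before" half simply returns fewer rows, so the result can be shorter than 51.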

Another question I got around this was, why use UNION ALL. We know, after all, that there are no overlapping rows so the result should be the same as for UNION. And this is exactly the reason why UNION ALL should be used, rather than a plain UNION. We know it - the database doesn't. A UNION query will generate a plan that requires an extra unique node at the top, to make sure that there are no overlapping rows. So the tip here is - always use UNION ALL rather than UNION whenever you know that the results are not overlapping.
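The semantic difference behind that extra work is easy to see on a toy table. The sketch below is an illustration under assumptions: it uses Python's sqlite3 module rather than PostgreSQL, the ten-row table and the helper name `combine` are hypothetical, and it shows only the deduplication behaviour, not the plan or its cost.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO mytable VALUES (?)", [(i,) for i in range(1, 11)])


def combine(op, lower_from, upper_below):
    """Combine rows with id >= lower_from and rows with id < upper_below using `op`."""
    # `op` is only ever one of the two fixed strings used below.
    sql = f"""
        SELECT id FROM mytable WHERE id >= ?
        {op}
        SELECT id FROM mytable WHERE id < ?
        ORDER BY id
    """
    return [row[0] for row in conn.execute(sql, (lower_from, upper_below))]


# Non-overlapping halves: both operators return the same ten rows.
disjoint_all = combine("UNION ALL", 5, 5)
disjoint_union = combine("UNION", 5, 5)

# Overlapping halves (ids 5 and 6 are in both): UNION has to remove the
# duplicates, UNION ALL just appends the two results.
overlap_all = combine("UNION ALL", 5, 7)
overlap_union = combine("UNION", 5, 7)

print(len(disjoint_all), len(disjoint_union), len(overlap_all), len(overlap_union))
```

When the halves cannot overlap, the two operators give identical results, so the duplicate check that UNION implies is pure overhead. It only changes the output when the halves do overlap.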

All things considered, this query produces a pretty quick plan even for large datasets, since it allows us to do two independent index scans, one backwards. Since there are LIMIT nodes on the scans, they will stop running as soon as they have produced the required number of rows, which is going to be very small compared to the size of the table. This is the query plan I got on my test data:


 Sort  (cost=54.60..54.73 rows=51 width=86)
   Sort Key: id
   ->  Append  (cost=0.00..53.15 rows=51 width=86)
         ->  Limit  (cost=0.00..35.09 rows=26 width=51)
               ->  Index Scan using mytable_pk on mytable  (cost=0.00..55425.06 rows=41062 width=51)
                     Index Cond: (id >= 100000)
         ->  Limit  (cost=0.00..17.04 rows=25 width=51)
               ->  Index Scan Backward using mytable_pk on mytable  (cost=0.00..56090.47 rows=82306 width=51)
                     Index Cond: (id < 100000)

And yes, the final ORDER BY is still needed if we want the total result to come out in the correct order. With the default query plan, it will come out in the wrong order after the append node. But it's important to remember that by the specification the database is free to return the rows in any order it chooses unless there is an explicit ORDER BY in the query. The rows may otherwise be returned in a completely different order between different runs, depending on the size/width of the table and other parameters.

pgcon photos

Just a quick note to let people know I have uploaded my photos from pgcon. There aren't as many as last year, and they're not really good, but there are at least some for people to look at :-)

I have only started tagging up names. If you know more of them, just drop me an email with photo link and name. Thanks!

pgcon is done

I'm currently sitting in Frankfurt Airport waiting for my connecting flight back home to Stockholm, and I figure this is a good time to sum up the rest of pgcon, which ended a couple of days ago.

The second day of talks, Friday, began with what must almost be called a developer keynote. PGDG "giants" Tom Lane and Bruce Momjian gave a talk on how to get your patch accepted into PostgreSQL. I think they did a good job of showing some of the general thoughts that are behind this process. And it was fun to finally get to see Tom do a talk at one of these conferences...

After this I split a slot between the Wisconsin Courts talk and Selena's VACUUM talk, since I had to take a phone call in the middle of the talk. Why does this always happen? Thus, I didn't see enough of either talk to really make any comments.

After lunch I went to the temporal data talk, but I admit to not following it too closely - it's not really something I'm deeply interested in, but this was the only slot where there wasn't a talk in any of the tracks that really interested me.

In the last of the regular talks, I went to Gavin's talk about Golconde. Sounds like a very interesting piece of technology. I don't actually have any use-case for it at this time, but I'm sure I will come across one eventually - and at least now I know how to pronounce it (which I hear Gavin's colleagues are having some issues with).

The last scheduled slot was the lightning talks. This year they were not scheduled up against any regular talk - good move by the schedulers (I was on the program committee, but didn't help out with the scheduling, so I can take no credit myself). Several very interesting and a couple of fun talks, and some that did both. The award for best lightning talk this year has to go to Josh Tolley and his talk on How to not review a patch (Josh: you get no link since your endpoint blog seems to not support author links?!)

Writing this up reminds me: I have yet to review several of these talks on the pgcon website. If you were there and haven't done so yet - please do it now! Most speakers really appreciate the feedback - I know I certainly do. It's what helps us be better next year! It will also help the program committee pick which talks are most interesting for next year.

I skipped out on the tourism-in-ottawa tour by Dan since I've done that the previous years, and instead took a train up to Montreal with Greg Stark, Dave Page, Selena Deckelmann and Bruce Momjian. Greg gave us a nice tour of that city instead (where he's originally from). And it was certainly thorough - there's this one roundabout that we did at least 3 laps in... Obviously we failed to completely stop talking about PostgreSQL, but at least that wasn't the main focus.

Left Montreal Sunday evening and arrived back in Europe Monday morning, and am now just waiting for the connecting flight to do the last leg back to Stockholm, and back to the regular work.

So, the short version of the pgcon summary:
Talks track : excellent
Hallway track : excellent
Bar track : excellent
Shawarma track : good
$(other) track : excellent

If you didn't go to pgcon this year, this is a good time to start thinking about going next! And don't forget pgday.eu in Paris this November!

pgcon, 1st talk day

We're now up to the third day of pgcon, the first one of the actual conference - the previous ones being dedicated to tutorials. The day started with Selena, me and Dave doing a semi-improvised keynote. Well, it started with Dan saying welcome and going through some details, but he doesn't count... I doubt we actually spread any knowledge with that talk, but at least we got to plug some interesting talks at the conference, and show pictures of elephants.

Missed the start of the Aster talk on Petabyte databases using standard PostgreSQL, but the parts I caught sounded very interesting. I'm especially excited to hear they are planning to contribute a whole set of very interesting features back to core PostgreSQL. This makes a lot of sense since they're building their scaling on standard PostgreSQL and not a heavily modified one like some other players in the area, and it's very nice to see that they are realizing this.

After this talk, it was time for my own talk on PostgreSQL Encryption. I had a hard time deciding the split between pgcrypto and SSL when I made the talk, but I think it came out fairly well. Had a number of very good questions at the end, so clearly some people were interested. Perhaps even Bruce managed to learn something...

After this we had lunch, and I'm now sitting in Greg Smith's talk about benchmarking hardware. This is some very low level stuff compared to what you usually see around database benchmarking, but since this is what sits underneath the database, it's important stuff. And very interesting.

The rest of the day has a lineup of some very nice talks, I think. So there'll be no sitting around in the hallway! And in the evening there is the EnterpriseDB party, of course!

Yesterday was the developer meeting, where a bunch (~20) of the most active developers that are here in Ottawa sat down together for the whole day to discuss topics around the next version of PostgreSQL, and how our development model works. We got some very important discussions started, and actually managed to get agreement on a couple of issues that have previously been going in circles. All in all, a very useful day.

Getting started at pgcon

I arrived in Ottawa on Sunday evening after a pretty long flight over from Stockholm. Completely by chance I met Josh Berkus at Chicago O'Hare, and it turned out we were on the same flight to Ottawa. Had a nice dinner with Josh and Dan Langille, the PgCon organizer, at an Indian place.

Monday morning, met up with Selena, Dan and Josh again for breakfast close to our hotel. And somehow we got suckered into doing the keynote on Thursday. Actually, I think it went down this way: Josh volunteered Selena to do it. Selena volunteered me to do it. And I volunteered Dave. In the end we'll end up doing it together - and of course Dan will do the general conference introduction. We haven't really gotten started on the actual talk itself, so if you have good ideas for it, feel free to let us know...

Not much time got spent on the slides yesterday, as Selena and I had some kind of mini web-hackathon. We spent time working on some features for Planet PostgreSQL. Some cleanup took a bit longer than expected so they're not actually out yet, but they will be soon... I see Selena thinks we're going to deploy it to the production server today, but I'm very doubtful about that. We'll see.

Last night was really when a lot of familiar people started turning up, first for dinner at Works Burger and then for beer at the Royal Oak. The big news from the Oak was that Stephen didn't fall asleep this year. Other than that, things were pretty much as usual.

Today has been spent mostly working on slides for my regular talk. I skipped out on both Stephen Frost's tutorial on access control and Josh's updated version of performance whack-a-mole - my dedicated slide-making time will go to the keynote, so I have to finish the slides now.

Rumor has it Dave has now arrived in Ottawa and should show up soon. So keynote work (at the Oak) will probably start shortly.

Tomorrow is the second tutorial day, but for me and many of the most active backend hackers that are here, it's a day of meeting up with other developers and discussing what's going to happen in PostgreSQL for the next year or so. It was a great success last year, and I'm sure everybody is expecting an equally valuable meeting this year. And the appropriate thanks go out to Dave and EnterpriseDB for arranging the meetup and picking up the tab.
