I was just told that the latest edition of GNU/Linux Magazine in France is dedicated to PostgreSQL. A full 80 pages about your favorite RDBMS! I'm told all the articles are written by our own Guillaume Lelarge. Well done, Guillaume!
Unfortunately for those of us who don't speak the language, the whole thing is in French. But if you do speak French, it's probably well worth checking out. There's a preview available online, and the magazine should be available in stores. Guillaume has also told me the contents will be available for download later on, but not for a few months.
Yesterday we announced the schedule for PGDay.EU 2009. The Friday will have one track in English and one in French, and the Saturday will have two tracks in English and one in French. There are a lot of good talks scheduled - I wish I could trust my French enough to go see a couple of those as well...
We are also now open for registration. The cost of the conference starts at €60 for a full-price two-day entry, with discounts for single-day attendance and for students. See the registration page for details. We expect to be able to accommodate everyone interested, but if we can't, those who register first will obviously be the ones we can take. We'd also prefer that you register as soon as you know you're coming, since that makes our planning much easier.
Many people who develop patches for PostgreSQL don't have access to Windows machines to test their patches on - particularly not ones with complete build environments for the MSVC build. The net result is that a fair number of patches are never tested on Windows until after they are committed. For most patches this doesn't actually matter, since they don't touch anything platform-specific beyond what is already taken care of by our build system. But Windows is not POSIX, so the platform differences are generally larger than those between the different Unix platforms PostgreSQL builds on, and with MSVC the build system is completely different. In a non-trivial number of cases this ends up breaking the buildfarm until somebody with access to a Windows build environment can fix it. Luckily, we have a number of machines running Windows on the buildfarm, so we do catch these things long before release.
There are a couple of reasons why it's not easy for developers to have a Windows machine ready for testing, even a virtual one. For one, it requires a Windows license. The same availability problem exists for other proprietary platforms, such as Mac OS X, but it sets Windows apart from all the freely available Linux/Unix platforms. Second, setting up the build environment is quite complex - not at all as easy as on the most common Linux platforms, for example. This second point is particularly difficult for those not used to Windows.
A third reason I noticed myself is that running the builds and regression tests is very slow, at least on my laptop using VirtualBox. It works, but it takes ages. For this reason, a while back I started investigating using Amazon EC2 for my Windows builds. It turned out to be a very good solution to my problem - a complete rebuild on a typical EC2 instance takes around 7 minutes, whereas it can easily take over 45 minutes on my laptop.
Now, EC2 provides a pretty nice way to create what's called an AMI (Amazon Machine Image) that can be shared. Using these facilities, I have created an AMI that contains Windows plus a complete PostgreSQL build environment. Since this AMI has been made public, anybody who wants to can boot up an instance of it to run tests. Each of these instances is completely independent of the others - the AMI only provides a common starting point.
I usually run these on a medium-sized Amazon instance. The cost for such an instance is currently $0.30 per hour that the instance is running. The big advantage here is that this includes the Windows license, which makes it a very cost-effective way to do quick builds and tests on Windows.
Read on for full step-by-step instructions on how to get started with this AMI (screenshot overload warning).
I unexpectedly found myself with a day at home with nothing but boring chores to do, so I figured a good way to get out of doing those would be to work on the backlog of things I've been planning to do for planet.postgresql.org. I realize that my blog has been turning into release-notes-for-planet lately, since I haven't had much time to blog about other things. So I may as well confess right away that one reason to post is to make sure the updates I deployed actually work...
This round of updates has been around the twitter integration:
We currently have a patch sitting in the queue from Tsutomu Yamada, with modifications from me, all based on an idea from Trevor Talbot some time back (that should do it for credits). It tries to pre-reserve the shared memory region during DLL initialization, and then releases it just in time to reallocate it as shared memory (that will do for technical detail for now). This should hopefully fix the infamous "failed to re-attach to shared memory" errors we've been seeing on Windows.
We need your help to test it!
We need help both from people who are experiencing the problem - to see if it solves it, and from people who are not experiencing it - to make sure it doesn't cause any new problems.
Dave has built binaries for 8.3.7 and 8.4.0. To test the patch, stop your server, take a backup copy of your postgres.exe file, and replace it with the file from the appropriate ZIP file. Restart the server, and see if it works!
Once you have tested, please report your success to the pgsql-hackers list, or directly to me and I'll tally it up.
Update: These patched binaries will only work if you installed from the One-click installer. Specifically, they will not work if you installed from the MSI installer due to a mismatch in the configuration option for integer vs floating point datetime handling.
Support for Team blogs: This is just a grouping of existing blogs, not actually something new we parse. The idea is to give some exposure to the team someone works for - for example, a specific PostgreSQL support company.
Top posters list: The list of all subscriptions has been replaced with a list of top posters. The old list was becoming a bit too large to manage, didn't really serve a purpose, and was hard to integrate nicely with the Team blogs feature.
There have also been a bunch of internal changes: details are available in the git repo at http://git.postgresql.org.
The Team blogs feature has unfortunately not been implemented in the admin interface - we (well, me, really) were just a bit too lazy for that. So if you want to make use of it, please send an email to planet@postgresql.org letting us know what name you want for the team and which blogs to add to it (these blogs should already be subscribed to planet).
I had a question yesterday on an internal IRC channel from one of my colleagues in Norway about a SQL query that would "for a given id value, return the 50 rows centered around the row with this id", where the id column can contain gaps (either because they were inserted with gaps, or because there are further WHERE restrictions in the query).
I came up with a reasonably working solution fairly quickly, but I made one mistake. For fun, I asked around a number of my PostgreSQL contacts on IM and IRC for their solutions, and it turns out that almost everybody made the exact same mistake at first. I'm pretty sure all of them, like me, would've found and fixed that issue within seconds if they were in front of a psql console. But I figured that was a good excuse to write a blog post about it.
The solution itself becomes pretty simple if you rephrase the problem as "for a given id value, return the 25 rows preceding and the 25 rows following the row with this id". That pretty much spells a UNION query. Thus, the solution to the problem is:
SELECT * FROM (
  SELECT id, field1, field2 FROM mytable WHERE id >= 123456 ORDER BY id LIMIT 26
) AS a
UNION ALL
SELECT * FROM (
  SELECT id, field1, field2 FROM mytable WHERE id < 123456 ORDER BY id DESC LIMIT 25
) AS b
ORDER BY id;
The mistake everybody made? Forgetting that you need a subselect in order to use LIMIT. Without subselects, you can't put ORDER BY or LIMIT inside the two separate parts of the query, only at the outer level. But we specifically need to apply the LIMIT to each part individually, and the ORDER BY needs to be different for the two parts.
Another question I got was why to use UNION ALL, when we know there are no overlapping rows, so the result would be the same as with a plain UNION. And that is exactly the reason UNION ALL should be used: we know it - the database doesn't. A UNION query will generate a plan with an extra Unique node at the top, to make sure there are no overlapping rows. So the tip here is: always use UNION ALL rather than UNION whenever you know the results don't overlap.
All things considered, this query produces a pretty quick plan even for large datasets, since it allows us to do two independent index scans, one backwards. Since there are LIMIT nodes on the scans, they will stop running as soon as they have produced the required number of rows, which is going to be very small compared to the size of the table. This is the query plan I got on my test data:
Sort  (cost=54.60..54.73 rows=51 width=86)
  Sort Key: id
  ->  Append  (cost=0.00..53.15 rows=51 width=86)
        ->  Limit  (cost=0.00..35.09 rows=26 width=51)
              ->  Index Scan using mytable_pk on mytable  (cost=0.00..55425.06 rows=41062 width=51)
                    Index Cond: (id >= 100000)
        ->  Limit  (cost=0.00..17.04 rows=25 width=51)
              ->  Index Scan Backward using mytable_pk on mytable  (cost=0.00..56090.47 rows=82306 width=51)
                    Index Cond: (id < 100000)
And yes, the final ORDER BY is still needed if we want the total result to come out in the correct order. With the default query plan, it will come out in the wrong order after the append node. But it's important to remember that by the specification the database is free to return the rows in any order it chooses unless there is an explicit ORDER BY in the query. The rows may otherwise be returned in a completely different order between different runs, depending on the size/width of the table and other parameters.
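Since this is the kind of query that's easy to get subtly wrong, here is a runnable sketch of the approach using Python's built-in sqlite3 module (the original targets PostgreSQL, of course; the table contents and the centre id here are made up purely for illustration):

```python
# Sketch: "return the 50 rows centered around a given id" on a table with
# gaps in the id column, using the UNION ALL technique from the post.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, field1 TEXT)")
# Insert ids with gaps: every third number from 1 upwards.
conn.executemany(
    "INSERT INTO mytable (id, field1) VALUES (?, ?)",
    [(i, "row %d" % i) for i in range(1, 3000, 3)],
)

centre = 1501  # the given id value (1501 = 1 + 500*3, so it exists)
query = """
SELECT * FROM (
  SELECT id, field1 FROM mytable WHERE id >= ? ORDER BY id LIMIT 26
) AS a
UNION ALL
SELECT * FROM (
  SELECT id, field1 FROM mytable WHERE id < ? ORDER BY id DESC LIMIT 25
) AS b
ORDER BY id
"""
rows = conn.execute(query, (centre, centre)).fetchall()
print(len(rows))              # 51: the centre row plus 25 on each side
print(rows[0][0], rows[-1][0])  # first and last id returned

# A plain UNION gives the same result here, because the two halves can't
# overlap - but it forces the database to do duplicate elimination that
# UNION ALL skips.
same = conn.execute(query.replace("UNION ALL", "UNION"), (centre, centre)).fetchall()
assert same == rows
```

Note the LIMIT 26 on the first half: it covers the centre row itself plus the 25 rows following it, while the second half supplies the 25 rows preceding it.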
Just a quick note to let people know I have uploaded my photos from pgcon. There aren't as many as last year, and they're not really good, but there are at least some for people to look at :-)
I have only started tagging up names. If you know more of them, just drop me an email with photo link and name. Thanks!
I'm currently sitting in Frankfurt Airport waiting for my connecting flight back home to Stockholm, and I figure this is a good time to sum up the rest of pgcon, which ended a couple of days ago.
The second day of talks, Friday, began with what must almost be called a developer keynote. PGDG "giants" Tom Lane and Bruce Momjian gave a talk on how to get your patch accepted into PostgreSQL. I think they did a good job of showing some of the general thinking behind the process. And it was fun to finally get to see Tom give a talk at one of these conferences...
After this I split a slot between the Wisconsin Courts talk and Selena's VACUUM talk, since I had to take a phone call in the middle of the slot. Why does this always happen? Thus, I didn't see enough of either talk to really make any comments...
After lunch I went to the temporal data talk, but I admit to not following it too closely - it wasn't something I was deeply interested in, and this was really the only slot where no talk in any of the tracks really interested me.
In the last of the regular talks, I went to Gavin's talk about Golconde. It sounds like a very interesting piece of technology. I don't actually have any use-case for it at this time, but I'm sure I will come across one eventually - and at least now I know how to pronounce it (which I hear Gavin's colleagues are having some issues with).
The last scheduled slot was the lightning talks. This year they were not scheduled up against any regular talk - a good move by the schedulers (I was on the program committee, but didn't help out with the scheduling, so I can take no credit myself). There were several very interesting talks, a couple of fun ones, and some that were both. The award for best lightning talk this year has to go to Josh Tolley and his talk on How to not review a patch (Josh: you get no link, since your endpoint blog doesn't seem to support author links?!)
Writing this up reminds me: I have yet to review several of these talks on the pgcon website. If you were there and haven't done so yet - please do it now! Most speakers really appreciate the feedback - I know I certainly do. It's what helps us do better next year! It will also help the program committee pick the most interesting talks for next year.
I skipped out on the tourism-in-Ottawa tour by Dan, since I've done that in previous years, and instead took a train up to Montreal with Greg Stark, Dave Page, Selena Deckelmann and Bruce Momjian. Greg, who is originally from there, gave us a nice tour of that city instead. And it was certainly thorough - there's this one roundabout where we did at least 3 laps... Obviously we failed to completely stop talking about PostgreSQL, but at least it wasn't the main focus.
I left Montreal Sunday evening and arrived back in Europe Monday morning, and am now just waiting for the connecting flight to do the last leg back to Stockholm - and back to regular work.
If you didn't go to pgcon this year, this is a good time to start thinking about going next year! And don't forget pgday.eu in Paris this November!
We're now on the third day of pgcon, the first of the actual conference - the previous ones were dedicated to tutorials. The day started with Selena, Dave and me doing a semi-improvised keynote. Well, it started with Dan saying welcome and going through some details, but he doesn't count... I doubt we actually spread any knowledge with that talk, but at least we got to plug some interesting talks at the conference, and show pictures of elephants.
I missed the start of the Aster talk on petabyte databases using standard PostgreSQL, but the parts I caught sounded very interesting. I'm especially excited to hear they are planning to contribute a whole set of very interesting features back to core PostgreSQL. This makes a lot of sense, since they're building their scaling on standard PostgreSQL rather than a heavily modified one, like some other players in the area, and it's very nice to see that they realize this.
After this talk, it was time for my own talk on PostgreSQL Encryption. I had a hard time deciding the split between pgcrypto and SSL when I made the talk, but I think it came out fairly well. Had a number of very good questions at the end, so clearly some people were interested. Perhaps even Bruce managed to learn something...
After this we had lunch, and I'm now sitting in Greg Smith's talk about benchmarking hardware. This is some very low-level stuff compared to what you usually see around database benchmarking, but since this is what sits underneath the database, it's important stuff. And very interesting.
The rest of the day has a lineup of some very nice talks, I think. So there'll be no sitting around in the hallway! And in the evening there is the EnterpriseDB party, of course!
Yesterday was the developer meeting, where a bunch (~20) of the most active developers who are here in Ottawa sat down together for the whole day to discuss topics around the next version of PostgreSQL and how our development model works. We got some very important discussions started, and actually managed to get agreement on a couple of issues that had previously been going in circles. All in all, a very useful day.