
Testing PostgreSQL patches on Windows using Amazon EC2

Many people who develop patches for PostgreSQL don't have access to Windows machines to test their patches on, and particularly not ones with a complete build environment for the MSVC build. The net result is that a fair number of patches are never tested on Windows until after they are committed. For most patches this doesn't actually matter, since the changes don't touch anything platform specific beyond what is already taken care of by our build system. But Windows is not POSIX, so the platform differences are generally larger than between the different Unix platforms PostgreSQL builds on, and with MSVC the build system is completely different. In a non-trivial number of cases this ends up breaking the buildfarm until somebody with access to a Windows build environment can fix it. Luckily, we have a number of machines running on the buildfarm with Windows on them, so we do catch these things long before release.

There are a couple of reasons why it's not easy for developers to have a Windows machine ready for testing, even a virtual one. For one, it requires a Windows license. The same availability problem exists for other proprietary platforms such as Mac OS X, but not for all the free Linux/Unix platforms. Second, setting up the build environment is quite complex - not at all as easy as on the most common Linux platforms, for example. This second point is particularly difficult for those not used to Windows.

A third reason I noticed myself was that running the builds, and regression tests, is very very slow at least on my laptop using VirtualBox. It works, but it takes ages. For this reason, a while back I started investigating using Amazon EC2 to do my Windows builds on, for my own usage. Turns out this was a very good solution to my problem - the time for a complete rebuild on a typical EC2 instance is around 7 minutes, whereas it can easily take over 45 minutes on my laptop.

Now, EC2 provides a pretty nice way to create what's called an AMI (Amazon Machine Image) that can be shared. Using these facilities, I have created an AMI that contains Windows plus a complete PostgreSQL build environment. Since this AMI has been made public, anybody who wants to can boot up an instance of it to run tests. Each of these instances is completely independent of the others - the AMI only provides a common starting point.

I usually run these on a medium size Amazon instance. The cost for such an instance is, currently, $0.30 per hour that the instance is running. The big advantage here is that this includes the Windows license. That makes it a very cost-effective way to do quick builds and tests on Windows.

Read on for a full step-by-step instruction on how to get started with this AMI (screenshot overload warning).


Some planet updates

I unexpectedly found myself with a day at home with nothing but boring chores to do, so I figured a good way to get out of doing those would be to do some work on the backlog of things that I've been planning to do for planet.postgresql.org. I realize that my blog is turning into a release-notes-for-planet lately since I haven't had much time to blog about other things. So I may as well confess right away that one reason to post is to make sure the updates I deployed actually work...

This round of updates has been around the twitter integration:

  • Since it turned out that a lot of people didn't actually know there was a twitter integration for planet, it is now linked clearly from the planet frontpage.
  • The twitter integration scripts (originally by Selena) have been rewritten to work directly with our database of posts instead of pulling back in the RSS feed that the system had just generated, and also to keep the status of posts in the database. With luck, this will fix the very rare case where posts sometimes got dropped, and it made the code a lot simpler.
  • The posts made by the system will refer to the twitter username of the blog owner, if it's registered. For your own blogs, you can see what username is registered by going to the registration site. We've added some of the twitter usernames we know about - if yours is not listed, please let us know at planet@postgresql.org what twitter username to connect with what blog url.
  • The system has been prepared to pull out some usage statistics, but nothing is actually done with that yet.

Help us test a patch for the Win32 shared memory issue

We currently have a patch sitting in the queue from Tsutomu Yamada with modifications from me, all based on an idea from Trevor Talbot some time back. (That should do it for credits) It tries to pre-reserve the shared memory region during DLL initialization, and then releases it just in time to reallocate it as shared memory. (that will do for technical detail for now) This should hopefully fix the infamous "failed to re-attach to shared memory" errors we've been seeing on Windows.

We need your help to test it!

We need help both from people who are experiencing the problem - to see if it solves it, and from people who are not experiencing it - to make sure it doesn't cause any new problems.

Dave has built binaries for 8.3.7 and 8.4.0. To test the patch, stop your server, take a backup copy of your postgres.exe file, and replace it with the file from the appropriate ZIP file below. Restart the server, and see if it works!

Once you have tested, please report your success to the pgsql-hackers list, or directly to me and I'll tally it up.

Update: These patched binaries will only work if you installed from the One-click installer. Specifically, they will not work if you installed from the MSI installer due to a mismatch in the configuration option for integer vs floating point datetime handling.

Planet updates

I've just deployed a new version of the code that runs http://planet.postgresql.org. Most of this code was written by Selena and me during the initial days at PGCon. It just needed some minor polishing, which I didn't get around to until now. So, the new things are:

Support for Team blogs: This is just a grouping of existing blogs, not actually something new we parse. The idea is to give some exposure to a team someone works for - for example, a specific PostgreSQL support company.

Top posters list: The list of all subscriptions has been replaced with a list of top posters. The list was becoming a bit too large to manage, and didn't really fill a purpose. And it was hard to integrate nicely with the Team blogs feature.

A bunch of internal changes: Details are available in the git repo on http://git.postgresql.org.

If you want to make use of the Team blogs feature, this has unfortunately not been implemented in the admin interface. We (well, me, really) were just a bit too lazy for that. So if you want to make use of it, please just send an email to planet@postgresql.org letting us know what name you want for the team, and which blogs to add to it (these blogs should already be subscribed to planet).

Getting a range of entries centered around a point

I had a question yesterday on an internal IRC channel from one of my colleagues in Norway about a SQL query that would "for a given id value, return the 50 rows centered around the row with this id", where the id column can contain gaps (either because they were inserted with gaps, or because there are further WHERE restrictions in the query).

I came up with a reasonably working solution fairly quickly, but I made one mistake. For fun, I asked around a number of my PostgreSQL contacts on IM and IRC for their solutions, and it turns out that almost everybody made the exact same mistake at first. I'm pretty sure all of them, like me, would've found and fixed that issue within seconds if they were in front of a psql console. But I figured that was a good excuse to write a blog post about it.

The solution itself becomes pretty simple if you rephrase the problem as "for a given id value, return the 25 rows preceding and the 25 rows following the row with this id". That pretty much spells a UNION query. Thus, the solution to the problem is:


    SELECT * FROM (
        -- the row itself plus the 25 rows following it
        SELECT id, field1, field2 FROM mytable WHERE id >= 123456 ORDER BY id LIMIT 26
    ) AS a
    UNION ALL
    SELECT * FROM (
        -- the 25 rows preceding it, found by scanning backwards
        SELECT id, field1, field2 FROM mytable WHERE id < 123456 ORDER BY id DESC LIMIT 25
    ) AS b
    ORDER BY id;

The mistake everybody made? Forgetting that you need a subselect in order to use LIMIT. Without subselects, you can't put ORDER BY or LIMIT inside the two separate parts of the query, only at the outer end of it. But we specifically need to apply the LIMIT individually, and the ORDER BY needs to be different for the two parts.
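To make it easy to try the query above without a PostgreSQL server at hand, here is a minimal sketch that runs the same shape of query from Python. Note the assumptions: it uses the standard library's sqlite3 module as a stand-in database rather than PostgreSQL, the `rows_around` helper and its parameters are made up for this illustration, and the test table only has ids that are multiples of 3 so that the id column contains gaps.

```python
import sqlite3


def rows_around(conn, center_id, before=25, after=25):
    """Return the center row plus up to `before` rows below and `after` rows above it.

    Hypothetical helper for illustration; table and column names follow the post.
    """
    query = """
        SELECT * FROM (
            SELECT id, field1 FROM mytable WHERE id >= ? ORDER BY id LIMIT ?
        ) AS a
        UNION ALL
        SELECT * FROM (
            SELECT id, field1 FROM mytable WHERE id < ? ORDER BY id DESC LIMIT ?
        ) AS b
        ORDER BY id
    """
    # The first branch fetches after + 1 rows because it includes the center row.
    return conn.execute(query, (center_id, after + 1, center_id, before)).fetchall()


# In-memory SQLite database used as a stand-in (assumption: not PostgreSQL).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, field1 TEXT)")
# Only multiples of 3 exist, so the id column has gaps.
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [(i, f"row {i}") for i in range(0, 300, 3)],
)

window = rows_around(conn, 150)
ids = [row[0] for row in window]
print(len(ids), ids[0], ids[-1])
```

With this test data the window holds 51 rows, running from id 75 to id 225, with id 150 in the middle. The gaps don't matter, because the limits count rows rather than id values.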

Another question I got around this was, why use UNION ALL. We know, after all, that there are no overlapping rows so the result should be the same as for UNION. And this is exactly the reason why UNION ALL should be used, rather than a plain UNION. We know it - the database doesn't. A UNION query will generate a plan that requires an extra unique node at the top, to make sure that there are no overlapping rows. So the tip here is - always use UNION ALL rather than UNION whenever you know that the results are not overlapping.
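The difference in semantics is easy to see with a toy example. This is only a sketch, again assuming Python's sqlite3 module as a stand-in for PostgreSQL: it doesn't show the query plan, just the duplicate removal that a plain UNION has to perform and UNION ALL skips.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# UNION must remove duplicates, so the two identical rows collapse into one.
union_rows = conn.execute("SELECT 1 AS id UNION SELECT 1").fetchall()

# UNION ALL just appends the second result to the first, no comparison needed.
union_all_rows = conn.execute("SELECT 1 AS id UNION ALL SELECT 1").fetchall()

print(union_rows)
print(union_all_rows)
```

The first query returns a single row and the second returns two. When the two branches can never overlap, as in the range query, that duplicate check is pure overhead.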

All things considered, this query produces a pretty quick plan even for large datasets, since it allows us to do two independent index scans, one backwards. Since there are LIMIT nodes on the scans, they will stop running as soon as they have produced the required number of rows, which is going to be very small compared to the size of the table. This is the query plan I got on my test data:


 Sort  (cost=54.60..54.73 rows=51 width=86)
   Sort Key: id
   ->  Append  (cost=0.00..53.15 rows=51 width=86)
         ->  Limit  (cost=0.00..35.09 rows=26 width=51)
               ->  Index Scan using mytable_pk on mytable  (cost=0.00..55425.06 rows=41062 width=51)
                     Index Cond: (id >= 100000)
         ->  Limit  (cost=0.00..17.04 rows=25 width=51)
               ->  Index Scan Backward using mytable_pk on mytable  (cost=0.00..56090.47 rows=82306 width=51)
                     Index Cond: (id < 100000)

And yes, the final ORDER BY is still needed if we want the total result to come out in the correct order. With the default query plan, it will come out in the wrong order after the append node. But it's important to remember that by the specification the database is free to return the rows in any order it chooses unless there is an explicit ORDER BY in the query. The rows may otherwise be returned in a completely different order between different runs, depending on the size/width of the table and other parameters.

pgcon photos

Just a quick note to let people know I have uploaded my photos from pgcon. They're not as many as last year, and not really good, but there are at least some for people to look at :-)

I have only started tagging up names. If you know more of them, just drop me an email with photo link and name. Thanks!

pgcon is done

I'm currently sitting in Frankfurt Airport waiting for my connecting flight back home to Stockholm, and I figure this is a good time to sum up the rest of pgcon, which ended a couple of days ago.

The second day of talks, Friday, began with what must almost be called a developer keynote. PGDG "giants" Tom Lane and Bruce Momjian gave a talk on how to get your patch accepted into PostgreSQL. I think they did a good job of showing some of the general thoughts that are behind this process. And it was fun to finally get to see Tom do a talk at one of these conferences...

After this I split a slot between the Wisconsin Courts talk and Selena's VACUUM talk, since I had to take a phone call in the middle of the talk. Why does this always happen? Thus, I didn't see enough of either talk to really make any comments.

After lunch I went to the temporal data talk, but I admit to not following it too closely - not really something I was deeply interested in, but this was really the only time when there wasn't a talk in any of the tracks that really interested me.

In the last of the regular talks, I went to Gavin's talk about Golconde. Sounds like a very interesting piece of technology. I don't actually have any use-case for it at this time, but I'm sure I will come across them eventually - and at least now I know how to pronounce it (which I hear Gavin's colleagues are having some issues with)

The last scheduled slot was the lightning talks. This year they were not scheduled up against any regular talk - good move by the schedulers (I was on the program committee, but didn't help out with the scheduling, so I can take no credit myself). Several very interesting and a couple of fun talks, and some that did both. The award for best lightning talk this year has to go to Josh Tolley and his talk on How to not review a patch (Josh: you get no link since your endpoint blog seems to not support author links?!)

Writing this up reminds me: I have yet to review several of these talks on the pgcon website. If you were there and haven't done so yet - please do it now! Most speakers really appreciate the feedback - I know I certainly do. It's what helps us be better next year! It will also help the program committee pick which talks are most interesting for next year.

I skipped out on the tourism-in-ottawa tour by Dan since I've done that the previous years, and instead took a train up to Montreal with Greg Stark, Dave Page, Selena Deckelmann and Bruce Momjian. Greg gave us a nice tour of that city instead (where he's originally from). And it was certainly thorough - there's this one roundabout that we did at least 3 laps in... Obviously we failed to completely stop talking about PostgreSQL, but at least that wasn't the main focus.

Left Montreal Sunday evening and arrived back in Europe Monday morning, and am now just waiting for the connecting flight to do the last leg back to Stockholm, and back to the regular work.

So, the short version of the pgcon summary:
Talks track : excellent
Hallway track : excellent
Bar track : excellent
Shawarma track : good
$(other) track : excellent

If you didn't go to pgcon this year, this is a good time to start thinking about going next! And don't forget pgday.eu in Paris this November!

pgcon, 1st talk day

We're now up to the third day of pgcon, the first one of the actual conference - the previous ones being dedicated to tutorials. The day started with Selena, me and Dave doing a semi-improvised keynote. Well, it started with Dan saying welcome and going through some details, but he doesn't count... I doubt we actually spread any knowledge with that talk, but at least we got to plug some interesting talks at the conference, and show pictures of elephants.

Missed the start of the Aster talk on Petabyte databases using standard PostgreSQL, but the parts I caught sounded very interesting. I'm especially excited to hear they are planning to contribute a whole set of very interesting features back to core PostgreSQL. This makes a lot of sense since they're building their scaling on standard PostgreSQL and not a heavily modified one like some other players in the area, and it's very nice to see that they are realizing this.

After this talk, it was time for my own talk on PostgreSQL Encryption. I had a hard time deciding the split between pgcrypto and SSL when I made the talk, but I think it came out fairly well. Had a number of very good questions at the end, so clearly some people were interested. Perhaps even Bruce managed to learn something...

After this we had lunch, and I'm now sitting in Greg Smith's talk about benchmarking hardware. This is some very low level stuff compared to what you usually see around database benchmarking, but since this is what sits underneath the database, it's important stuff. And very interesting.

The rest of the day has a lineup of some very nice talks, I think. So there'll be no sitting around in the hallway! And in the evening there is the EnterpriseDB party, of course!

Yesterday had the developer meeting, where a bunch (~20) of the most active developers that are here in Ottawa sat down together for the whole day to discuss topics around the next version of PostgreSQL, and how our development model works. Got some very important discussions started, and actually managed to get agreement on a couple of issues that have previously been going in circles. All in all, a very useful day.

Getting started at pgcon

I arrived in Ottawa on Sunday evening after a pretty long flight over from Stockholm. Completely at random I met Josh Berkus at Chicago O'Hare, and it turned out we were on the same flight to Ottawa. Had a nice dinner with Josh and Dan Langille, the PgCon organizer, at an Indian place.

Monday morning, met up with Selena, Dan and Josh again for breakfast close to our hotel. And somehow we got suckered into doing the keynote on Thursday. Actually, I think it went down this way: Josh volunteered Selena to do it. Selena volunteered me to do it. And I volunteered Dave. In the end we'll end up doing it together - and of course Dan will do the general conference introduction. We haven't really gotten started on the actual talk itself, so if you have good ideas for it, feel free to let us know...

Not much time got to be spent on the slides yesterday, as Selena and I had some kind of mini web-hackathon. We spent time working on some features for Planet PostgreSQL. Some cleanup took a bit longer than expected so they're not actually out yet, but they will be soon... I see Selena thinks we're going to deploy it to the production server today, but I'm very doubtful about that. We'll see.

Last night was really when a lot of known people started turning up, first for dinner at Works Burger and then for beer at the Royal Oak. The big news from the Oak was that Stephen didn't fall asleep this year. Other than that, things were pretty much as usual.

Today has been spent mostly working on slides for my regular talk. I skipped out on both Stephen Frost's tutorial on access control and Josh's updated version of performance whack-a-mole - my dedicated slide-making time will go to the keynote, so I have to finish the slides now.

Rumor has it Dave has now arrived in Ottawa and should show up soon. So keynote work (at the Oak) will probably start shortly.

Tomorrow's second tutorial day, but for me and many of the most active backend hackers that are here, it's a day of meeting up with other developers and discussing what's going to happen in PostgreSQL for the next year or so. It was a great success last year, and I'm sure everybody is expecting an equally valuable meeting this year. And the appropriate thanks go out to Dave and EnterpriseDB for arranging the meetup and picking up the tab.

Why are you not logging your DDL?

Last week I had yet another customer issue where "someone" had been issuing DDL statements in the database. And nobody knew who. Or why. But (surprise!) it broke things (and they weren't even running Slony!). There are two simple lessons to be learned from this:

In a production environment, arbitrary DDL statements are normally not run. If they are, you really need to look over your application design, because it's broken. Note that this does not include temporary tables. Things like automating the creation of new partitions are also pretty normal. But the important thing there is that it's controlled and scheduled work, not arbitrary statements.

So, you'll want to keep track of your DDL. PostgreSQL provides a very simple and good way to do this. Set the configuration parameter log_statement='ddl'. The default value for this parameter is none, and there are also options for logging all DML and all statements period. But for a production environment, I find the ddl option to be very useful. So useful, in fact, that I'd consider it an installation bug in most environments if it's not set. So if this parameter is not set in your production environment, now is a good time to reconsider that decision.
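If you want to check this across a number of servers, a small script can read the configuration file and report what log_statement ends up being. The following is only a rough sketch under stated assumptions: the `effective_log_statement` helper is hypothetical, it only looks at the text of a single postgresql.conf (no include directives, and no settings applied elsewhere such as per database or per role), and it relies on the last occurrence of a parameter in the file being the one that takes effect.

```python
import re

# Matches "log_statement = 'ddl'" as well as the form without the equals sign.
# Assumption: this simple pattern is enough for a plain keyword value.
_SETTING = re.compile(r"^\s*log_statement\s*(?:=\s*|\s+)'?(\w+)'?", re.IGNORECASE)


def effective_log_statement(conf_text):
    """Return the log_statement value a postgresql.conf text would result in."""
    value = "none"  # the default when the parameter is not set at all
    for line in conf_text.splitlines():
        line = line.split("#", 1)[0]  # drop comments
        match = _SETTING.match(line)
        if match:
            value = match.group(1).lower()  # later entries override earlier ones
    return value


# Made-up sample configuration, for illustration only.
sample_conf = """
#log_statement = 'none'
log_statement = 'mod'
log_statement_stats = on
log_statement = 'ddl'   # keep track of schema changes
"""

result = effective_log_statement(sample_conf)
print(result)
```

For the sample text this reports ddl: the commented-out line and the unrelated log_statement_stats parameter are ignored, and the last log_statement entry wins. On a live server, asking the server itself with SHOW log_statement is the more reliable check.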

The second thing to learn comes from the fact that once we tracked it down, it turned out that the DDL was issued from the application server. Which was running with superuser privileges. Now that's a much larger bug in the deployment, and a failure waiting to happen. There's a very simple lesson to learn from this: the application server should never run with superuser privileges. It should also not run with a user that has permissions to issue any DDL. This is simply the principle of least privilege - or at least principle of not insanely high privileges.

Yes, there are a number of application servers and frameworks that issue their own DDL as part of their ORM. The best way to handle them is, IMHO, to have them generate the SQL output and then manually apply that using a high privilege account. Because DDL should only be issued as part of upgrades and similar things, this should not be an issue. If the application server does not support this, a workaround is to give the application server DDL permissions during the upgrade only, and then take them away as soon as the upgrade is completed.

And yes, you should do this on your developer systems as well, and not just in production. Because if you only do it in production, you won't notice your bugs until you have deployed. It may seem like a lot of extra work to begin with, but it really is only a little extra work once you have got the procedures in place. And it can save you a lot of forensics work once something has happened.
