Planet updates

I've just deployed a new version of the code that runs http://planet.postgresql.org. Most of this code was written by Selena and me during the initial days at PGCon. It just needed some minor polishing, which I didn't get around to until now. So, the new things are:

Support for Team blogs : This is just a grouping of existing blogs, not actually something new we parse. The idea is to give some exposure to a team someone works for - for example, a specific PostgreSQL support company.

Top posters list : The list of all subscriptions has been replaced with a list of top posters. The list was becoming a bit too large to manage, and didn't really fill a purpose. And it was hard to integrate nicely with the Team blogs feature.

There have also been a bunch of internal changes : Details available in the git repo on http://git.postgresql.org.

If you want to make use of the Team blogs feature, this has unfortunately not been implemented in the admin interface. We (well, me, really) were just a bit too lazy for that. So if you want to make use of it, please just send an email to planet@postgresql.org letting us know what name you want for the team, and which blogs to add to it (these blogs should already be subscribed to planet).

Getting a range of entries centered around a point

I had a question yesterday on an internal IRC channel from one of my colleagues in Norway about a SQL query that would "for a given id value, return the 50 rows centered around the row with this id", where the id column can contain gaps (either because they were inserted with gaps, or because there are further WHERE restrictions in the query).

I came up with a reasonably working solution fairly quickly, but I made one mistake. For fun, I asked around a number of my PostgreSQL contacts on IM and IRC for their solutions, and it turns out that almost everybody made the exact same mistake at first. I'm pretty sure all of them, like me, would've found and fixed that issue within seconds if they were in front of a psql console. But I figured that was a good excuse to write a blog post about it.

The solution itself becomes pretty simple if you rephrase the problem as "for a given id value, return the 25 rows preceding and the 25 rows following the row with this id". That pretty much spells a UNION query. Thus, the solution to the problem is:


    SELECT * FROM (
        SELECT id, field1, field2 FROM mytable WHERE id >= 123456 ORDER BY id LIMIT 26
    ) AS a
    UNION ALL
    SELECT * FROM (
        SELECT id, field1, field2 FROM mytable WHERE id < 123456 ORDER BY id DESC LIMIT 25
    ) AS b
    ORDER BY id;

The mistake everybody made? Forgetting that you need a subselect in order to use LIMIT. Without subselects, you can't put ORDER BY or LIMIT inside the two separate parts of the query, only at the outer end of it. But we specifically need to apply the LIMIT individually, and the ORDER BY needs to be different for the two parts.
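As a side note, PostgreSQL also accepts parentheses around each branch as a way to attach ORDER BY and LIMIT to it, which avoids the extra SELECT * FROM (...) wrapper. Here is a minimal sketch on a throwaway table. The table contents (ids in steps of 3, to simulate gaps) and the center value 300 are made up for illustration.

```sql
-- Hypothetical test data: ids 3, 6, 9, ..., 3000, so the id column has gaps.
CREATE TEMP TABLE mytable (
    id     integer PRIMARY KEY,
    field1 text,
    field2 text
);

INSERT INTO mytable (id, field1, field2)
SELECT g * 3, 'a' || g::text, 'b' || g::text
FROM generate_series(1, 1000) AS g;

-- This form is rejected with a syntax error, because ORDER BY and LIMIT
-- cannot follow a bare branch of a UNION:
--   SELECT id FROM mytable WHERE id >= 300 ORDER BY id LIMIT 26
--   UNION ALL
--   SELECT id FROM mytable WHERE id < 300 ORDER BY id DESC LIMIT 25;

-- Parenthesized branches each get their own ORDER BY and LIMIT;
-- the final ORDER BY applies to the combined result.
(SELECT id, field1, field2 FROM mytable WHERE id >= 300 ORDER BY id LIMIT 26)
UNION ALL
(SELECT id, field1, field2 FROM mytable WHERE id < 300 ORDER BY id DESC LIMIT 25)
ORDER BY id;
```

With this test data the query should return 51 rows: the row with id 300, the 25 rows before it (down to id 225) and the 25 rows after it (up to id 375).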

Another question I got around this was, why use UNION ALL. We know, after all, that there are no overlapping rows so the result should be the same as for UNION. And this is exactly the reason why UNION ALL should be used, rather than a plain UNION. We know it - the database doesn't. A UNION query will generate a plan that requires an extra unique node at the top, to make sure that there are no overlapping rows. So the tip here is - always use UNION ALL rather than UNION whenever you know that the results are not overlapping.
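One way to see the difference for yourself is to compare the two plans with EXPLAIN. The sketch below uses generate_series instead of a real table, purely so it can be pasted into psql as-is. The exact node names and costs depend on the PostgreSQL version, so no plan output is shown here.

```sql
-- UNION: expect an extra duplicate-removal step on top of the Append
-- (shown as a Unique or HashAggregate node, depending on version and plan).
EXPLAIN
SELECT g FROM generate_series(1, 25) AS g
UNION
SELECT g FROM generate_series(26, 51) AS g;

-- UNION ALL: expect just the Append, with nothing on top of it.
EXPLAIN
SELECT g FROM generate_series(1, 25) AS g
UNION ALL
SELECT g FROM generate_series(26, 51) AS g;
```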

All things considered, this query produces a pretty quick plan even for large datasets, since it allows us to do two independent index scans, one backwards. Since there are LIMIT nodes on the scans, they will stop running as soon as they have produced the required number of rows, which is going to be very small compared to the size of the table. This is the query plan I got on my test data:


 Sort  (cost=54.60..54.73 rows=51 width=86)
   Sort Key: id
   ->  Append  (cost=0.00..53.15 rows=51 width=86)
         ->  Limit  (cost=0.00..35.09 rows=26 width=51)
               ->  Index Scan using mytable_pk on mytable  (cost=0.00..55425.06 rows=41062 width=51)
                     Index Cond: (id >= 100000)
         ->  Limit  (cost=0.00..17.04 rows=25 width=51)
               ->  Index Scan Backward using mytable_pk on mytable  (cost=0.00..56090.47 rows=82306 width=51)
                     Index Cond: (id < 100000)

And yes, the final ORDER BY is still needed if we want the total result to come out in the correct order. With the default query plan, it will come out in the wrong order after the append node. But it's important to remember that by the specification the database is free to return the rows in any order it chooses unless there is an explicit ORDER BY in the query. The rows may otherwise be returned in a completely different order between different runs, depending on the size/width of the table and other parameters.

pgcon photos

Just a quick note to let people know I have uploaded my photos from pgcon. There aren't as many as last year, and they're not really good, but there are at least some for people to look at :-)

I have only started tagging up names. If you know more of them, just drop me an email with photo link and name. Thanks!

pgcon is done

I'm currently sitting in Frankfurt Airport waiting for my connecting flight back home to Stockholm, and I figure this is a good time to sum up the rest of pgcon, which ended a couple of days ago.

The second day of talks, Friday, began with what must almost be called a developer keynote. PGDG "giants" Tom Lane and Bruce Momjian gave a talk on how to get your patch accepted into PostgreSQL. I think they did a good job of showing some of the general thinking behind this process. And it was fun to finally get to see Tom do a talk at one of these conferences...

After this I split a slot between the Wisconsin Courts talk and Selena's VACUUM talk, since I had to take a phone call in the middle of the talk. Why does this always happen? Thus, I didn't see enough of either talk to really make any comments.

After lunch I went to the temporal data talk, but I admit to not following it too closely - not really something I was deeply interested in, but this was really the only time when there wasn't a talk in any of the tracks that really interested me.

In the last of the regular talks, I went to Gavin's talk about Golconde. Sounds like a very interesting piece of technology. I don't actually have any use-case for it at this time, but I'm sure I will come across one eventually - and at least now I know how to pronounce it (which I hear Gavin's colleagues are having some issues with).

The last scheduled slot was the lightning talks. This year they were not scheduled up against any regular talk - good move by the schedulers (I was on the program committee, but didn't help out with the scheduling, so I can take no credit myself). Several very interesting and a couple of fun talks, and some that did both. The award for best lightning talk this year has to go to Josh Tolley and his talk on How to not review a patch (Josh: you get no link since your endpoint blog seems to not support author links?!)

Writing this up reminds me: I have yet to review several of these talks on the pgcon website. If you were there and haven't done so yet - please do it now! Most speakers really appreciate the feedback - I know I certainly do. It's what helps us be better next year! It will also help the program committee pick which talks are most interesting for next year.

I skipped out on the tourism-in-ottawa tour by Dan since I've done that the previous years, and instead took a train up to Montreal with Greg Stark, Dave Page, Selena Deckelmann and Bruce Momjian. Greg gave us a nice tour of that city instead (where he's originally from). And it was certainly thorough - there's this one roundabout that we did at least 3 laps in... Obviously we failed to completely stop talking about PostgreSQL, but at least that wasn't the main focus.

Left Montreal Sunday evening and arrived back in Europe Monday morning, and am now just waiting for the connecting flight to do the last leg back to Stockholm, and back to the regular work.

So, the short version of the pgcon summary:
Talks track : excellent
Hallway track : excellent
Bar track : excellent
Shawarma track : good
$(other) track : excellent

If you didn't go to pgcon this year, this is a good time to start thinking about going next! And don't forget pgday.eu in Paris this November!

pgcon, 1st talk day

We're now up to the third day of pgcon, the first one of the actual conference - the previous ones being dedicated to tutorials. The day started with Selena, me and Dave doing a semi-improvised keynote. Well, it started with Dan saying welcome and going through some details, but he doesn't count... I doubt we actually spread any knowledge with that talk, but at least we got to plug some interesting talks at the conference, and show pictures of elephants.

Missed the start of the Aster talk on Petabyte databases using standard PostgreSQL, but the parts I caught sounded very interesting. I'm especially excited to hear they are planning to contribute a whole set of very interesting features back to core PostgreSQL. This makes a lot of sense since they're building their scaling on standard PostgreSQL and not a heavily modified one like some other players in the area, and it's very nice to see that they are realizing this.

After this talk, it was time for my own talk on PostgreSQL Encryption. I had a hard time deciding the split between pgcrypto and SSL when I made the talk, but I think it came out fairly well. Had a number of very good questions at the end, so clearly some people were interested. Perhaps even Bruce managed to learn something...

After this we had lunch, and I'm now sitting in Greg Smith's talk about benchmarking hardware. This is some very low level stuff compared to what you usually see around database benchmarking, but since this is what sits underneath the database, it's important stuff. And very interesting.

The rest of the day has a lineup of some very nice talks, I think. So there'll be no sitting around in the hallway! And in the evening there is the EnterpriseDB party, of course!

Yesterday was the developer meeting, where a bunch (~20) of the most active developers that are here in Ottawa sat down together for the whole day to discuss topics around the next version of PostgreSQL, and how our development model works. Got some very important discussions started, and actually managed to get agreement on a couple of issues that have previously been going in circles. All in all, a very useful day.

Getting started at pgcon

I arrived in Ottawa on Sunday evening after a pretty long flight over from Stockholm. Completely by chance I met Josh Berkus at Chicago O'Hare, and it turned out we were on the same flight to Ottawa. Had a nice dinner with Josh and Dan Langille, the PgCon organizer, at an Indian place.

Monday morning, met up with Selena, Dan and Josh again for breakfast close to our hotel. And somehow we got suckered into doing the keynote on Thursday. Actually, I think it went down this way: Josh volunteered Selena to do it. Selena volunteered me to do it. And I volunteered Dave. In the end we'll end up doing it together - and of course Dan will do the general conference introduction. We haven't really gotten started on the actual talk itself, so if you have good ideas for it, feel free to let us know...

Not much time got to be spent on the slides yesterday, as Selena and I had some kind of mini web-hackathon. We spent time working on some features for Planet PostgreSQL. Some cleanup took a bit longer than expected so they're not actually out yet, but they will be soon... I see Selena thinks we're going to deploy it to the production server today, but I'm very doubtful about that. We'll see.

Last night was really when a lot of known people started turning up, first for dinner at Works Burger and then for beer at the Royal Oak. The big news from the Oak was that Stephen didn't fall asleep this year. Other than that, things were pretty much as usual.

Today has been spent mostly working on slides for my regular talk. I skipped out on both Stephen Frost's tutorial on access control and Josh's updated version of performance whack-a-mole - my dedicated slide-making time will go to the keynote, so I have to finish the slides now.

Rumor has it Dave has now arrived in Ottawa and should show up soon. So keynote work (at the Oak) will probably start shortly.

Tomorrow is the second tutorial day, but for me and many of the most active backend hackers that are here, it's a day of meeting up with other developers and discussing what's going to happen in PostgreSQL for the next year or so. It was a great success last year, and I'm sure everybody is expecting an equally valuable day this year. And the appropriate thanks go out to Dave and EnterpriseDB for arranging the meetup and picking up the tab.

Why are you not logging your DDL?

Last week I had yet another customer issue where "someone" had been issuing DDL statements in the database. And nobody knew who. Or why. But (surprise!) it broke things (and they weren't even running Slony!). There are two simple lessons to be learned from this:

In a production environment, arbitrary DDL statements are normally not run. If they are, you really need to look over your application design, because it's broken. Note that this does not include temporary tables. Things like automating the creation of new partitions are also pretty normal. But the important thing there is that it's controlled and scheduled work, not arbitrary statements.

So, you'll want to keep track of your DDL. PostgreSQL provides a very simple and good way to do this. Set the configuration parameter log_statement='ddl'. The default value for this parameter is none, and there are also options for logging all data-modifying statements in addition to DDL (mod) and all statements, period (all). But for a production environment, I find the ddl option to be very useful. So useful, in fact, that I'd consider it an installation bug in most environments if it's not set. So if this parameter is not set in your production environment, now is a good time to reconsider that decision.
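A minimal sketch of turning this on from SQL, assuming a superuser session on PostgreSQL 9.4 or later, where ALTER SYSTEM exists (on older versions, put the same settings in postgresql.conf and reload). The log_line_prefix value is only a suggestion added here so that each logged statement also records which user and database it came from. It is an assumption, not something taken from the incident above.

```sql
-- Log all DDL statements (CREATE, ALTER, DROP, ...).
-- Equivalent postgresql.conf line:  log_statement = 'ddl'
ALTER SYSTEM SET log_statement = 'ddl';

-- Assumption: prefix every log line with timestamp, backend PID,
-- user name and database name, so "who ran this?" can be answered later.
ALTER SYSTEM SET log_line_prefix = '%m [%p] %u@%d ';

-- Both settings are picked up on a configuration reload; no restart needed.
SELECT pg_reload_conf();

-- Check the value (the reload may take a moment to reach the session).
SHOW log_statement;
```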

The second thing to learn comes from the fact that once we tracked it down, it turned out that the DDL was issued from the application server. Which was running with superuser privileges. Now that's a much larger bug in the deployment, and a failure waiting to happen. There's a very simple lesson to learn from this: the application server should never run with superuser privileges. It should also not run with a user that has permissions to issue any DDL. This is simply the principle of least privilege - or at least the principle of not insanely high privileges.
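As a rough sketch of what such a setup can look like, assume a hypothetical database with all application tables in the public schema, a role app_owner that owns them, and a role app_user that the application server logs in as. None of these names come from the incident above, and authentication setup is left out. GRANT ... ON ALL TABLES IN SCHEMA needs PostgreSQL 9.0 or later.

```sql
-- Owns the tables and is used only for upgrades (hypothetical name).
CREATE ROLE app_owner LOGIN NOSUPERUSER;

-- The role the application server connects as (hypothetical name).
CREATE ROLE app_user LOGIN NOSUPERUSER NOCREATEDB NOCREATEROLE;

-- Only the owner role may create new objects in the schema.
REVOKE CREATE ON SCHEMA public FROM PUBLIC;
GRANT CREATE ON SCHEMA public TO app_owner;
GRANT USAGE ON SCHEMA public TO app_user;

-- The application gets DML on existing tables and sequences, nothing more.
GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO app_user;
GRANT USAGE, SELECT ON ALL SEQUENCES IN SCHEMA public TO app_user;
```

Note that the owner of a table can always alter or drop it, so the application role should not own the tables it works on. Temporary tables keep working, since the TEMPORARY privilege on a database is granted to everybody by default.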

Yes, there are a number of application servers and frameworks that issue their own DDL as part of their ORM. The best way to handle them is, IMHO, to have them generate the SQL output and then manually apply that using a high privilege account. Because DDL should only be issued as part of upgrades and similar things, this should not be an issue. If the application server does not support this, a workaround is to give the application server DDL permissions during the upgrade only, and then take them away as soon as the upgrade is completed.
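One possible way to implement that workaround is through role membership. This is a sketch only, run as a superuser, assuming a hypothetical role app_owner that owns the schema objects and a hypothetical role app_user that the application server connects as.

```sql
-- Before the upgrade: let the application role act with the owner's rights.
GRANT app_owner TO app_user;

-- ... run the framework's upgrade / schema migration step here ...

-- Anything the upgrade created is owned by app_user; hand it over to
-- app_owner. REASSIGN OWNED moves everything app_user owns in the
-- current database, so this assumes app_user should own nothing there.
REASSIGN OWNED BY app_user TO app_owner;

-- After the upgrade: take the DDL rights away again.
REVOKE app_owner FROM app_user;
```

Combined with log_statement='ddl', the log then shows exactly which DDL the upgrade issued, and anything logged outside that window stands out.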

And yes, you should do this on your developer systems as well, and not just in production. Because if you only do it in production, you won't notice your bugs until you have deployed. It may seem like a lot of extra work to begin with, but it really is only a little extra work once you have got the procedures in place. And it can save you a lot of forensics work once something has happened.

What's your favorite 8.4 feature?

Over the past month or so, I have been informally polling those of our customers I've been meeting with about which features in the upcoming 8.4 version of PostgreSQL they are most excited about, and most likely to have use for in the short-to-medium term. Part of the results were exactly what I expected, other parts were a bit more surprising.

First of all, the one feature that every single one said they were very much looking forward to is the updated Free Space Map implementation, which will remove two very annoying and difficult-to-get-right configuration parameters (max_fsm_pages and max_fsm_relations). So kudos to Heikki for that. Closely related to this feature is the Visibility Map, and its reduction of VACUUM requirements. This is, however, not something a lot of our customers will have a direct use for - other than one or two, they don't actually have any issues with VACUUM (once they learned to remove VACUUM FULL from the nightly cronjobs).

When looking closer at the SQL level functionality that people have been excited about, I expected to see Common Table Expressions (CTEs), AKA Recursive Queries, at the top of the list, based on requests on our lists in the past. However, it seems that at least to our customers, Window Aggregates are a lot more interesting than CTEs. I'm sure there are a lot of use-cases for recursive queries - there just seem to be even more of them for window aggregates.

Parallel restoring of dumps is also fairly high on people's lists, though most of the people that can really use it are deploying different ways to deal with it already. But they will definitely be using it once it's there. As well as the speedups of the PITR recovery and warm standby.
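For readers who have not tried either feature yet, here is a small sketch of each. The data is made up and supplied inline with VALUES, so the statements can be pasted into psql on 8.4 or later without creating any tables.

```sql
-- Window aggregate: rank each row within its group and show the group
-- average next to it, without collapsing the rows the way GROUP BY would.
SELECT dept, name, salary,
       rank() OVER (PARTITION BY dept ORDER BY salary DESC) AS rank_in_dept,
       avg(salary) OVER (PARTITION BY dept) AS dept_avg
FROM (VALUES ('dev', 'alice', 100),
             ('dev', 'bob',    80),
             ('ops', 'carol',  90)) AS emp(dept, name, salary);

-- Recursive CTE: walk a parent/child hierarchy from the root downwards,
-- tracking how deep each node sits in the tree.
WITH RECURSIVE nodes(id, parent_id) AS (
    VALUES (1, NULL::integer), (2, 1), (3, 2)
),
tree(id, parent_id, depth) AS (
    SELECT id, parent_id, 0 FROM nodes WHERE parent_id IS NULL
  UNION ALL
    SELECT n.id, n.parent_id, t.depth + 1
    FROM nodes n JOIN tree t ON n.parent_id = t.id
)
SELECT id, parent_id, depth FROM tree ORDER BY depth;
```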

There are a lot of other features in the upcoming version as well. Some are small, some are large. There are still some smaller features (for example \ef in psql) that I keep missing a lot every time I end up working on an earlier version. They're all (well, most of them) great, but maybe not so obvious at first.

What's your favorite 8.4 feature?

Updates to the postgresql.org git service

As Peter has already noted, we have been planning for a while to update the service on git.postgresql.org, and as of now we just flipped the switch to the new version. We are still waiting for the DNS zone to update (yes, we could've lowered the TTL. No, we didn't think it was that important). So within 5 hours, anybody accessing git.postgresql.org through http, git or ssh will be accessing the new server.

First a note to all those people who use the git clone of the main cvs repository to do PostgreSQL development: nothing has changed for you. The URL is the same, the repository is the same, just keep pulling.

The only real difference there should be that the new server the service is hosted on has a little bit more bandwidth available, but it should not normally be noticeable.

The changes are all around the repositories that are originally hosted at the site. The changes have been made to make it easier to collaborate on projects, and to require less manual work to set things up. It is our hope that this will lower the bar of entry for those who are interested in developing PostgreSQL-related projects using git. The main changes for these users are:

* Login is now integrated with the postgresql.org community account system. This means unified usernames across PostgreSQL services, such as for example the wiki. You just need to upload your SSH key to enable access.
* Users no longer have shell accounts on the server. Instead, all users use git@git.postgresql.org as their ssh login. This makes the management and securing of the server much easier.
* It is now possible for the owner of a repository to delegate permissions to other users directly from a web interface. As long as the other user has also uploaded his ssh key, this will be completely automatic.
* It is now possible to request a new repository for your code using the same web interface. Requests for new repositories will still have to be approved before the repository is created.

Based on an inventory made before the switch, several inactive repositories were not migrated to the new server. If you are missing a repository that was for some reason not migrated, feel free to contact us on the pgsql-www mailing list (pgsql-www@postgresql.org) for a recovery - we will keep all the old files around for a while just in case.

As a final note, the source code for the git management tool itself is of course available in the git repository (when your DNS has updated).

PostgreSQL security updates are out

PostgreSQL 8.3.7, 8.2.13, 8.1.17, 8.0.21 and 7.4.25 have been released and are available for download from www.postgresql.org. They contain the fix for a low-risk security issue as well as several other minor updates. All users are advised to upgrade when possible.

