PostgreSQL infrastructure updates

At PG-east^H^H^H^H^H^H^HJDCon here in Philly yesterday, Dave announced the infrastructure changes that we decided on late last year. We have been working on the groundwork, on and off, over the past months (mostly off, or we'd be done already). Since I've had a number of questions about it, I'm going to post a summary here.

The bottom line is, we're changing - over time - how we're going to run and manage the infrastructure behind the postgresql.org services. Things like website, source code repositories, etc. The reason is simple - the way we are doing it now has become too hard to maintain, and just requires too much manual work.

There are several parts to these changes. For one, we are building automated methods for a lot of things that previously required a little manual work. But even a little manual work rapidly becomes very time-consuming when you have a lot of machines - and the number of machines we have has been increasing fast over the past couple of years.

The biggest change is that we will be moving our services off FreeBSD, which we use now, onto Debian GNU/Linux. At the same time, we will switch our virtualization solution from FreeBSD Jails to KVM. FreeBSD Jails have served us very well over time, but now that we have access to a well-working KVM product that uses hardware virtualization, the gains of having full virtualization are much easier to get at.

One of the main drivers for the change is that maintaining ports-based installs is just taking way too much time. The new system will be based heavily around using Debian packages and the apt system, to the point that everything will always be installed using "meta-packages" that ensure that all our machines look the same. This also plugs into our monitoring systems very well, making it much easier to apply (security) updates across the many machines. In particular, it should get rid of what's sometimes a multi-day operation to get everything security patched, due to dependencies and the slowness of updating ports.

It is quite possible - in fact, I'd say probable - that there are perfectly fine ways of doing this on FreeBSD. But this outlines another big reason for this change - it's simply a lot easier to find people who can do these things on a Linux-based system today. Our efforts are entirely volunteer-based, and in the end we just don't have people who know enough FreeBSD volunteering to do this work. In fact, we have had several people retire from the sysadmin team over the years specifically because they refused to work with FreeBSD. We don't expect this change to magically create more volunteers, but it will make it easier to recruit in the future.

The system will also automate a lot of the configuration work that needs to be done manually today, as well as automatically configure our monitoring systems (primarily Nagios and Munin) and backups. And finally, the system will automatically provision (and un-provision) users and access permissions across all machines from a central repository.

The code is not entirely done yet, but we do have enough in place to automatically provision and configure basic virtual machines in just minutes, as well as much of the configuration. We have prototype deployments online, but we haven't started moving production services yet. We hope to get started on that shortly. It will obviously take a long time before we have migrated all services - in fact, we may never migrate all services. The focus will be on the most manpower-intensive ones first.

As usual the code that's used to build these services will be published under the PostgreSQL licence once it's finished. Some security sensitive parts may be removed, but the bulk of it is very generic and should be re-usable as a whole or in parts.

Another issue that's going to delay some changes is the fact that a few of our infrastructure servers are currently not VT-capable, and thus cannot run KVM. In the short term these machines will continue to run FreeBSD Jails, but eventually we want to replace them with more modern ones. So if you are sitting on one or more VT-capable machines with enough RAM to run a number of virtual machines, hosted in a datacenter somewhere that can provide us with multiple IP addresses and a decent availability of network and services, that you would like to donate to the PostgreSQL project, please get in touch with our sysadmin team to see if this is something we can work with!

Heading west, going east, better schedule, and more

(public service announcement further down!)

I'm starting to get ready for the most mis-named conference so far this year. I mean, seriously. It's called East 2010, and yet it's located approximately 6000 km (that's almost 4000 miles for you Americans) to the west of the prime meridian. That's not even close. Sure, it's to the east of where the West conference is usually held, but really, this reminds me of POSIX timezones...

This will be a somewhat different conference than previous PostgreSQL conferences. It's the first big commercial conference. This has enabled it to change venue from "local university or college where rooms are cheap or free" to a conference hotel in downtown Philadelphia. Whether this is actually good for the talks is yet to be seen, but it's likely going to make some things around the conference easier and more integrated. There will also be an exhibition area - something we have tried for pgday.eu without much interest, but it will hopefully be more successful in this setting.

The contents of the conference are also somewhat different, since there is now a clear focus also on "decision makers", something that many PostgreSQL conferences have been lacking in. We may all want it to be true that the decision makers are the people who are actually going to use the product, but we know that's not true. This gives a conference schedule that contains a broader range of talks than we're used to - this can only be good.

Myself, I will be giving an updated version of my security talk, focusing on authentication and SSL. The updates are mainly around the new and changed authentication methods in 9.0, and some minor updates on the SSL part. If you've seen it before it may be an interesting refresher, but you might be better off going to see Greg Smith's benchmarking talk if you haven't...

Now for the second part, which is the public service announcement...

If you're like me, you find the official schedule for East very hard to read and basically useless to get an overview. It's also horrible to use from a device like the iPhone, something I'm quite likely to use at the conference. But we solved this problem for the PostgreSQL Europe conferences, and that code is pretty simple. So some copy/paste of a couple of hundred lines of python, some glue code, and voilà, a much more (IMHO) readable schedule. As a bonus, it also generates an aggregate iCalendar feed, that you can plug directly into the calendar application on the iPhone (or I assume most other phones). Google calendar may be very nice to use to work on the schedule, but I find this a lot more user friendly for those reading it - particularly in the ability to get an overview.

The page and the feed will both update once per hour by pulling from the official calendar feeds. They will also adjust for the fact that all the official feeds are 3 hours off due to the calendars being set in the wrong timezone. So should they suddenly jump 3 hours that's because the official ones were fixed - just remind me and I'll take out the adjustment. Anybody is free to use them, but of course, usual disclaimers apply, double check with the official one, etc, etc, etc.
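The timezone adjustment mentioned above can be sketched in a few lines of Python. This is not the aggregator's actual code: the function names are made up for illustration, the direction of the 3-hour shift is a guess, and it assumes the feed uses plain local date-time values (no trailing `Z`).

```python
from datetime import datetime, timedelta

# Assumed correction: the official feeds are described as 3 hours off.
# Whether to add or subtract is a guess made for this sketch.
FEED_OFFSET = timedelta(hours=3)
ICAL_FORMAT = "%Y%m%dT%H%M%S"  # iCalendar local date-time form


def adjust_timestamp(value, offset=FEED_OFFSET):
    """Shift one iCalendar date-time value such as '20100325T090000'."""
    shifted = datetime.strptime(value, ICAL_FORMAT) + offset
    return shifted.strftime(ICAL_FORMAT)


def adjust_line(line, offset=FEED_OFFSET):
    """Shift DTSTART/DTEND lines; return every other line unchanged."""
    name, sep, value = line.partition(":")
    # Ignore parameters such as ';TZID=...' when checking the property name
    if sep and name.split(";")[0] in ("DTSTART", "DTEND"):
        return name + ":" + adjust_timestamp(value.strip(), offset)
    return line
```

Keeping the offset in a single constant is what makes "just remind me and I'll take out the adjustment" a one-line change: set it to `timedelta(0)` once the official feeds are fixed.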

Finally, the mandatory before and after shots.

If you're interested in the code for the aggregator itself, it's up on my github page. And of course, any patches for cool features or just making it look better are always appreciated - it's open source after all.

FOSS STHLM #1

Yesterday was the first meeting for the FOSS STHLM "group" - a (very) loose group of FOSS interested people in the Stockholm region. We met in a lecture hall at the Stockholm University in Kista north of Stockholm, for a couple of hours of short presentations. The lineup was very nice: cool embedded stuff? yup, rockbox. General linux? Yup, upstart. Debian-specific? Of course. OpenSource Sweden? yeah. Curl? What else did you expect with Daniel as one of the organizers? PostgreSQL? Yeah, why else would I be blogging this?

Our allocated time was short, so the setup was many short talks. It actually worked a lot better than I thought it would, but it's still very hard to convey something useful in just 20 minutes.

I had a hard time figuring out what I should focus on, so I did a split into two parts (perfect strategy - if you have too little time to do one thing good, do two things in the same time instead...). I started with a section about "things to think about if you're switching to PostgreSQL or trying it out for the first time" - things like the very most basic config parameters that you always have to touch. And of course the classic - "ident authentication failed" issue that hits everybody on RedHat or Debian platforms at least (which is most of the users - definitely in this crowd). It's hard to do much in 12 minutes, hopefully it got some people interested.

I followed that with a very very very short version of "what's coming up in 9.0". Once again, I focused on one of my personal favorite features, which is Exclusion Constraints. While this is often listed as one of the cool things in 9.0, sometimes I feel that too much focus is on streaming replication and hot standby. Don't get me wrong, these are very good and very much needed features. But Exclusion Constraints is a real eye-opener. All databases (including PostgreSQL, of course) have replication - this is "just another way to do it". Yes, a very important and good way to do it, but it's still not something brand new. Exclusion Constraints is something that's fundamentally new. And it's a brilliant example of how PostgreSQL is moving the goalposts forward. Oh, and it's really useful and cool, of course! (and it'll be even better when we have the period datatype, or something similar, in 9.1!)

In summary, I think it was a great event. Big thanks to all those who helped make it happen! Hopefully we can follow it up with many more in similar ways - perhaps mixing these many-short-talks with some more focused discussions on specific areas or technologies. Time will tell...

Update: For those who asked, there were somewhere around 150 people in the sessions.

Summary of feedback from FOSDEM devroom

I forgot completely to close up the feedback system for our talks at FOSDEM and post the feedback to the speakers. My apologies to all speakers and others who were interested! Thankfully, Marc Balmer reminded me today and the individual ratings for different speakers have been sent off to them.

When it comes to the aggregated statistics, they look fairly similar to what we saw at PGDay. The number of speakers was much lower, and unfortunately so was the number of people giving feedback. But in the end, the results were fairly clear. I am especially happy with the level of knowledge in the speakers we attract. Let's start with the graphs:

Breaking down the details, I think this shows a great quality of our speakers:

Name | Speaker Quality | Votes | Deviation
Simon Riggs | 4.8 | 6 | 0.4
Magnus Hagander | 4.6 | 5 | 0.5
David Fetter | 4.5 | 4 | 1
Heikki Linnakangas | 4.3 | 9 | 0.9
Dave Page | 4.2 | 10 | 1.0
Marc Balmer | 3.4 | 5 | 1.7
Peter Eisentraut | 3.3 | 7 | 1.0

And the level of their knowledge is even higher:

Name | Speaker Knowledge | Votes | Deviation
Simon Riggs | 5 | 6 | 0
Magnus Hagander | 5 | 5 | 0
Heikki Linnakangas | 5 | 9 | 0
David Fetter | 4.8 | 4 | 0.5
Dave Page | 4.3 | 10 | 1.1
Peter Eisentraut | 4.3 | 7 | 1.1
Marc Balmer | 3.6 | 5 | 1.7

Even at the bottom of this list, the numbers are very good! Unfortunately, the fairly low number of votes gives a high deviation at the bottom of the chart.

For next time, we'd really like to get more people to leave their feedback. Does anybody have a good idea on how to make that happen?

Important updates for PostgreSQL Europe

Those of you who were at FOSDEM heard my bold announcement that we will finally get off the ground and hold proper elections for the board of PostgreSQL Europe, per a decision made the day before. Now, when we got home and read up on the statutes that govern the organization, it turned out we actually can't do that. The way forward that we've found around this is to change the statutes to make these things possible.

To change the statutes, we need a majority of the General Assembly, made up of the active members, to agree on a change. It turns out that given how the statutes are formed, there are currently only four members in PostgreSQL Europe, and these four members are the original board. Because of this, it was pretty easy to change the statutes (we only needed 3 of these people to agree, which we all did). And this we have done. The reason for changing them is to make sure we can get a working system going where we have more people involved. But in order to make that work, we first had to make this small-group decision.

The changes have just been published to www.postgresql.eu, and should show up any minute there. The changes that have been made are:

  • Removed the definition of "Benefactor members", to keep things simple. We are keeping Active and Honorary members.
  • Created a membership fee, to control who is a member.
  • Changed so that any resignations are handled by the Board, not the President, to make things more flexible.
  • Added that a membership is terminated if the fee isn't paid.
  • Added mailing list as an approved way to do meetings. Since this is how we do most of our communicating, it made no sense not to have it on there.
  • Moved the budget out of the GA; it's just a tool for the executive board. Keeps down the paperwork.
  • Added a defined way for the GA to vote on issues, using a web-based system (to be developed).
  • Since we will be voting over the internet, removed the ability to vote by proxy.

As you can see, the main change is the creation of a membership fee. This is the most controversial change. The idea here is that it will be a nominal fee (€10 for two years of membership - though that exact amount isn't part of the statutes, to keep flexibility). This will make it possible for us to have an actual membership list, which will be the grounds for who can vote.

The current method for handling membership requires the signing and sending around of forms on paper (yes! Actual dead trees!). Given how our organization works, this simply does not work. That is the main driver behind us changing this.

We did evaluate several other ways to deal with membership, ranging from things like requiring physical meetings (not really a good idea for a pan-European organization) to fancy PGP signing schemes (which really are way too complex for a lot of people). Our conclusion was that this was the least bad way to do it.

We are going to build on this shortly with more information about exactly how the elections are going to be held. The plan is still to have completed this year's election before the summer, and we are well on track to make that goal.

The changes to the English statutes are of course available in our git repository. The binding French changes will be made shortly.

If you have any changes or comments, please direct them at the pgeu-general@postgresql.org mailinglist, or directly contact the board.

Time management by (somebody else's) press-releases

A while back I submitted a couple of talks for PG-East 2010 in Philly, and over the past couple of weeks I've been nagging the organizers semi-frequently to get some pre-info on whether I've been accepted or not, since flight prices started to climb fairly rapidly. The site clearly says that the information will be available on Feb 15th, so I can't really complain that the answer kept being "don't know yet".

A couple of days ago, Dave pinged me with a message asking if I was approved. Turns out this press-release had been posted (by his company, no less), and it explicitly names me as a speaker at the conference.

Took me two more days of chasing down JD, but I now have confirmation I'll be there. I don't actually know what I'll be speaking about, but it's a pretty safe bet it will be PostgreSQL related.

I call this Time management by press releases. If I could only get it to apply to all meetings, I would no longer need to keep my own calendar up to date.

So, I'll see you in Philly!

Integrating django authentication with PostgreSQL

If you've been to any of my pgcrypto talks, you know I'm a fan of exposing a login interface in the database. It's a great way to allow multiple systems to use the same authentication system, without being limited to a single framework. However, it can often be nice and easy to be able to use the account management system in said framework, because it's already there.

When it comes to frameworks, django is my current favorite, and I do like the ability to use its integrated admin system to edit users. For this reason, I wanted to add the ability to use a database function to authenticate other systems using the django user tables. Thanks to the fact that django chose a nicely readable text format for their hashes, and that PostgreSQL has the awesome pgcrypto library, this is pretty simple. Here's a function that will simply return true if authentication was successful, and false if not.


CREATE OR REPLACE FUNCTION django_login (_username text, _password text)
RETURNS boolean
AS $$
BEGIN
   -- django stores passwords as algorithm$salt$hash; only sha1 hashes are handled here
   IF EXISTS (SELECT * FROM auth_user WHERE username=_username) THEN
      -- Hash salt || password and compare the result with the stored hex digest
      RETURN encode(pgcrypto.digest(split_part(password, '$', 2) || _password, 'sha1'), 'hex') =
             split_part(password, '$', 3) FROM auth_user WHERE username=_username;
   ELSE
      -- Unknown user
      RETURN false;
   END IF;
END;
$$
LANGUAGE plpgsql;

This assumes you have installed pgcrypto in its own schema, something I always recommend. If you haven't, just remove the schema specifier in the query.
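For comparison, the same check can be sketched outside the database in Python. This is only an illustration of the `algorithm$salt$hash` layout that the function relies on: the function name is made up, it only understands `sha1` entries, and unlike the SQL version it also checks the algorithm prefix.

```python
import hashlib
import hmac


def django_sha1_matches(stored, password):
    """Check a password against a stored 'sha1$salt$hexdigest' value."""
    parts = stored.split("$")
    if len(parts) != 3 or parts[0] != "sha1":
        # Unknown or malformed format: treat as a failed login
        return False
    _algo, salt, expected = parts
    # Same computation as the SQL function: sha1(salt || password), hex encoded
    computed = hashlib.sha1((salt + password).encode("utf-8")).hexdigest()
    # Constant-time comparison of the two hex digests
    return hmac.compare_digest(computed, expected)
```

Any system that can compute a SHA-1 digest can therefore verify these entries, which is what makes the text format convenient to share between frameworks.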

Finally, if you use this type of authentication, remember to use SSL. And don't enable query logging...

64-bit Windows in PostgreSQL

Let me first say that I still stand by my previous comments that PostgreSQL is in much less need of a native 64-bit version on Windows than other databases - and less than PostgreSQL on other platforms. (Both this and the previous post talk about server-side support, having support client-side in libpq and ODBC is obviously important)

That said, I am now happy to say that with the effort of several people, it looks like we now have basic 64-bit support for PostgreSQL on Windows. During the past couple of weeks I have cleaned up and extended the patch from Tsutomu Yamada, which in itself builds on the work of several others, and applied it to cvs. As of today, you can build and run PostgreSQL on 64-bit Windows from the official repository.

This is obviously just the beginning. Unfortunately, Windows in general seems to be lagging behind quite severely on the 64-bit front, and this appears to be particularly true of the open source libraries. I've so far been unable to find 64-bit builds of many of the libraries we rely on for full functionality (gettext, zlib, libxml, libxslt, libiconv, tcl, uuid). While in theory support for these things should show up as soon as they are available, that's obviously not tested. If you know where I can download complete 64-bit binaries (including .lib and .h files for development) of any of these projects, please let me know.

Also, this is not very well tested yet. It passes all the PostgreSQL regression tests. That's pretty much all the tests I've been running on it. This is obviously a point where a lot of people can help - so please do. I have made a set of pre-alpha binaries available here. This is a bare-bones build of cvs head earlier today - things like OpenSSL that do have 64-bit binaries available are also disabled. It comes with no installer - just unzip into a directory, and use the initdb and pg_ctl commands directly. So if you have a 64-bit environment, please download and test this against your application!

PostgreSQL security releases out

As you can see, PostgreSQL has just released new updated versions, which include security fixes. They also contain other critical bug fixes, so even if you are not directly affected by the security issues, plan an upgrade as soon as possible.

One of the security issues that has been patched deals with NULL prefixes in SSL certificate names, a vulnerability that is basically the same one that has surfaced in a lot of different products this autumn, for example in the Mozilla suite of products. There is not really space enough to properly discuss the implications this has in a PostgreSQL environment in the release notes, so I'll try to elaborate some here - given that I wrote the fix for it.

First of all, a quick explanation of what the problem is. PostgreSQL uses OpenSSL to deal with certificates. Prior to the fixed version, we just asked OpenSSL for the name of the certificate, got back a string, and used this one. Now, if you know C coding, you know that a string is terminated by a NULL character. The bug in PostgreSQL is that we did not check the return value from this function, and make sure it returned the same value as the length of the returned string. This means that somebody could embed a NULL value in the certificate, and we would incorrectly parse and validate only the part that was before the NULL value. For example, if someone managed to get a certificate with the common name set to "postgresql.bank.com\0attacker.com", PostgreSQL would match this certificate against "postgresql.bank.com" (or "*.bank.com"), which is not correct. With the fix, the certificate will be rejected completely.
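The length check described above can be illustrated with a small Python sketch. It is not the actual fix (which lives in PostgreSQL's C code); the function names are made up, and the matching is simplified to exact hostnames without wildcard support.

```python
def c_string_view(raw):
    """Return what C string functions see: the bytes before the first NUL."""
    return raw.split(b"\x00", 1)[0]


def common_name_is_safe(raw):
    """True only if the reported length equals the C-string length."""
    return len(c_string_view(raw)) == len(raw)


def match_hostname(raw_common_name, hostname):
    """Match an exact hostname, rejecting names with an embedded NUL."""
    if not common_name_is_safe(raw_common_name):
        # The name is longer than its C-string view: reject it completely
        return False
    return raw_common_name == hostname.encode("ascii")
```

With the example name from above, `c_string_view` returns only `postgresql.bank.com`, which is why code that trusted the C string would have accepted the certificate, while the length comparison rejects it.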

It is important to know that in order to make use of this vulnerability, the attacker needs to convince a trusted CA to sign such a certificate - which is quite obviously malicious. If the attacker cannot get the CA to hand this out, PostgreSQL will reject the certificate before we even get this far. It is arguably also a bug in the CA handling (technical or procedural) to even hand out such a certificate, and that bug needs to be exploited before the one in PostgreSQL can be.

In the vast majority of cases, if not all, where PostgreSQL is deployed and actually using certificate validation, the certificates will be handed out by a trusted local CA. In that case, exploiting this vulnerability becomes much harder. This scenario is significantly different from the original scenario this bug was discovered in, which is the web browser. In the web browser case, the browser already trusts a large number of external CAs by default. PostgreSQL will trust no CAs by default (unless you are doing a debian install, in which case they put some default CAs in there - this is another reason why this is a really bad idea from a security perspective). PostgreSQL also does not prompt the user with a potentially incorrect name field on the certificate asking if this is ok or not - it will just reject the certificate if it doesn't match (correctly or incorrectly), closing another attack avenue. So the bug is really only significant if you can't trust your CA - but the whole point of the CA is that it is a trusted entity...

PostgreSQL 8.4 is the first version to properly support certificate name validation, and also the first version to support client certificate authentication, both of which are vulnerable to this bug, neither of which is enabled by default. However, previous versions are also indirectly vulnerable, because they exposed the CN field of the certificate to the application for further validation. So you could have a stored procedure checking the client certificate, or just the libpq application checking the server certificate, even in earlier versions. And given the API structure, there was no way for these outside processes to know if they were being fooled or not. So if you are using an application that makes use of this on previous versions of PostgreSQL, you still need the patch - there is no way to fix the bug from the application.

The summary of this post is that this vulnerability is a lot less serious in PostgreSQL than in many other systems that had the issue. That doesn't mean it's not there, or that it shouldn't be fixed (it has been). But it means that this vulnerability alone is likely not reason enough to rush an upgrade on your production systems - most likely you're not affected by it. On the PostgreSQL security page it is tagged with classification A, which is the highest. This is more an indication that the system we're using for classification really doesn't take these things into consideration - something we will look into for the future.

Transferring slonified databases without slony

If you back up your Slony database with pg_dump, and try to reload it on a different machine (say transfer from a production system to a testing or benchmarking system), you've probably come across this problem more than once.

The dump will include all the Slony objects, with functions and triggers, that you simply cannot reload on a different machine - unless that machine also has Slony installed, and in the same way. A common way to do this is to just do a restore and ignore the errors - but if your database has lots of objects in it, that makes it very hard to spot actual errors - I always prefer to run with -1 and/or -e.

The first step to fix the problem is to exclude the Slony schema when dumping or restoring. That gets rid of most of the problem, but not all. There are still triggers in the main schemas that reference functions in the Slony schema, and now they will fail. Luckily, pg_restore has functionality to generate a table of contents from a dump, and then you can edit this table of contents file to exclude the triggers specifically. If your database isn't too complicated, you can easily script this.

Which brings me to the point of this post. It's actually very simple to script this, as long as the name of your slony schema doesn't conflict with other objects in your database (including the underscore). This is something that I know a lot of people keep doing manually (given the number of questions I hear about it when I say you should always use -e when restoring, for example). So here is a small semi-generic script that will do this for you - sed to the rescue. It simply comments out all the references to your slony schema.
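As a rough illustration of the idea, here is the same filtering written in Python instead of sed. This is not the script from the post: the function name and the schema name `_mycluster` are placeholders, and it relies on the same assumption as above, namely that the schema name does not appear in any other object name.

```python
def comment_out_schema(toc_lines, schema):
    """Comment out every table-of-contents entry that mentions `schema`.

    In a pg_restore list file, lines starting with ';' are comments, so
    prefixing an entry with ';' excludes it from the restore.
    """
    result = []
    for line in toc_lines:
        if schema in line and not line.startswith(";"):
            result.append(";" + line)
        else:
            result.append(line)
    return result


def filter_toc_file(source_path, target_path, schema="_mycluster"):
    """Read a TOC file, comment out the schema references, write the result."""
    with open(source_path) as source:
        lines = source.readlines()
    with open(target_path, "w") as target:
        target.writelines(comment_out_schema(lines, schema))
```

The table of contents comes from `pg_restore -l` on the dump, and the filtered file is passed back with `pg_restore -L` when doing the actual restore.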

