PostgreSQL - now on git!

So it finally happened. The official PostgreSQL master source tree is now managed in git instead of cvs. This means, amongst other things, that the world's most advanced open source database now has a version control system with... eh... atomic commits!

Like the first attempt, this one had some issues, but they were smaller and were resolved in time, so we did not have to roll back. This time, it turned out that the cvs version that ships in Debian GNU/Linux comes with patches that change the default date format to the ISO standard. Since one of our main requirements for the conversion was to faithfully represent the old versions of the code, and we used CVS keyword expansion in the old tree, this broke every single file. Once we found this, it was a simple case of adding the DateFormat=old parameter to the CVS config file and re-running the whole conversion, which took several hours.
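
To illustrate why a changed date format touches every file: with keyword expansion, each checked-out file embeds its revision date, so a different date layout changes the bytes of every file that uses the keyword. The sketch below builds the same $Id$ line both ways. The file name, revision and user are made up, and the exact ISO layout produced by the patched Debian cvs is an assumption.

```python
from datetime import datetime

# Classic CVS keyword date style, and an ASSUMED ISO-style variant.
OLD_FORMAT = "%Y/%m/%d %H:%M:%S"
ISO_FORMAT = "%Y-%m-%d %H:%M:%S"


def expand_id(filename: str, revision: str, when: datetime,
              author: str, date_format: str) -> str:
    """Build an expanded $Id$ keyword for one (hypothetical) file revision."""
    stamp = when.strftime(date_format)
    return f"$Id: {filename},v {revision} {stamp} {author} Exp $"


when = datetime(2010, 9, 20, 17, 31, 13)
old = expand_id("example.c", "1.300", when, "committer", OLD_FORMAT)
iso = expand_id("example.c", "1.300", when, "committer", ISO_FORMAT)
print(old)  # $Id: example.c,v 1.300 2010/09/20 17:31:13 committer Exp $
print(iso)  # $Id: example.c,v 1.300 2010-09-20 17:31:13 committer Exp $
```

Same revision, same metadata, different file contents: which is exactly what a faithful conversion cannot tolerate.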

A lot of work went into making the repository conversion correct. Some of this was due to issues in the toolchain used - many thanks to Michael Haggerty and Max Bowsher for getting those fixed and explaining some of the behaviors of the software to us. In the end, a number of things needed to be changed in our existing CVS repository to make it migrate properly. Tom Lane provided a big patch, applied to the CVS repository itself prior to the conversion, that cleaned most of those up - you can find a copy of it on my github page if you're interested.

With this patch applied, we managed a conversion that was very close to the original repository. I personally think this was only possible because the PostgreSQL project has been very careful about how it deals with its CVS repository, using it in a fairly simple way. And even so, we had a number of issues, such as tags moved "after the fact" and branches created off partial checkouts. A fair number of the issues were simply because CVS has no reasonable way to represent some things, such as a file being deleted, re-added and deleted again, with this mixed across different branches.

Git obviously deals with this better, and hopefully we'll have no such issues creeping into the new repository. However, the PostgreSQL project will be sticking with our "conservative approach" to source control, at least for the time being. For this reason, we are restricting what committers can use within git. Developers (and committers) are still free to use whatever parts of git they want while they develop, but for commits going into the main tree, we are imposing a number of restrictions:

  • We will not allow merge commits. The PostgreSQL project doesn't follow the "git workflow": we generally develop our patches on the master branch, and then back-patch important bug fixes to the released stable branches. We will continue doing this as separate commits rather than merges, thus keeping history linear.
  • We will not use the author field in git to record the patch's original author (even in the few cases when the patch actually has a single author). Instead, we will require that author and committer are always set to the same thing, and we will credit the author(s) (along with the reviewer(s)) in the commit message, just like we've done before.
  • As a follow-on to that requirement, we will require that all commits are made by committers registered with the project, using the same name and email address on all commits. So even if a patch is developed on a topic branch on, say, github, it will get collapsed into a single commit (or maybe a couple, depending on size) tagged with the committer's name.
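
The three rules above boil down to simple checks on each incoming commit. Here is a minimal sketch of that logic in Python; the Commit type, the policy_violations function and the registered-committer set are hypothetical names for illustration, not the actual hook's code.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Commit:
    sha: str
    parents: list[str]
    author: str      # "Name <email>" as recorded by git
    committer: str   # "Name <email>" as recorded by git


def policy_violations(commit: Commit, registered: set[str]) -> list[str]:
    """Return the reasons this commit breaks the rules (empty list = OK)."""
    problems = []
    if len(commit.parents) > 1:                 # rule 1: no merge commits
        problems.append("merge commits are not allowed")
    if commit.author != commit.committer:       # rule 2: author == committer
        problems.append("author and committer must be identical")
    if commit.committer not in registered:      # rule 3: registered identity
        problems.append(f"unregistered committer: {commit.committer}")
    return problems


registered = {"Jane Committer <jane@example.org>"}
merge = Commit("b2", ["p0", "p1"], "Someone Else <se@example.com>",
               "Jane Committer <jane@example.org>")
print(policy_violations(merge, registered))
```

Returning every violation, rather than stopping at the first, makes it easier for a committer to fix a rejected push in one go.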

There has been a lot of discussion around this, and this is how the PostgreSQL project has worked and wants to continue working. We may change this sometime in the future, but not now - we are only changing the tool, and not the workflow.

To enforce these requirements, I've developed a policy hook for our git server that makes sure we don't make any of these mistakes. It's up on my github page, along with the script we use to generate commit mails to the pgsql-committers list so that they look just the way we want them to.
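
For readers curious how such a hook can be wired up, here is a minimal sketch of a git update hook in Python. It is not the actual hook from github; it only assumes the standard update hook interface (git passes refname, old revision and new revision as arguments, and a non-zero exit rejects the push) and reads commits with git rev-list and git cat-file.

```python
import subprocess
import sys

ZERO = "0" * 40  # git's placeholder for a non-existent revision


def parse_commit(raw: str) -> dict:
    """Parse `git cat-file commit` output into parents/author/committer."""
    header = raw.split("\n\n", 1)[0]
    info = {"parents": [], "author": None, "committer": None}
    for line in header.splitlines():
        if line.startswith(" "):  # continuation of a multi-line header
            continue
        key, _, value = line.partition(" ")
        if key == "parent":
            info["parents"].append(value)
        elif key in ("author", "committer"):
            # Drop the trailing "<timestamp> <tz>" and keep "Name <email>".
            info[key] = value.rsplit(" ", 2)[0]
    return info


def git(*args: str) -> str:
    return subprocess.run(["git", *args], check=True,
                          capture_output=True, text=True).stdout


def main(refname: str, oldrev: str, newrev: str) -> int:
    # refname is accepted but unused in this sketch.
    if newrev == ZERO:
        return 0  # ref deletion: no new commits to inspect
    span = [newrev, "--not", "--all"] if oldrev == ZERO else [f"{oldrev}..{newrev}"]
    for sha in git("rev-list", *span).split():
        info = parse_commit(git("cat-file", "commit", sha))
        if len(info["parents"]) > 1:
            print(f"{sha}: merge commits are not allowed", file=sys.stderr)
            return 1
        if info["author"] != info["committer"]:
            print(f"{sha}: author and committer differ", file=sys.stderr)
            return 1
    return 0


if __name__ == "__main__" and len(sys.argv) == 4:
    sys.exit(main(*sys.argv[1:4]))
```

A real deployment would also check the committer against the list of registered committers.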

What does this mean for you as a PostgreSQL user? Really, nothing at all.

What does this mean for you as a PostgreSQL patch developer? Not much. If you did your work off the cvs-to-git mirror, you need to do a new clone. This repository was converted from scratch, so the old one is no longer valid. You are still welcome to use, for example, github for your development, but the patch submission process remains the same: send a context-style diff to the pgsql-hackers mailing list.
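
If you have only ever seen git's default unified diffs, the context format looks a bit different: "***" marks the old side, "---" the new side, and changed lines are flagged with "!". Python's standard difflib can produce it, which makes for a quick illustration. This is only to show the format; it is not an official tool for preparing PostgreSQL patches.

```python
import difflib

old = ["int x = 1;\n", "int y = 2;\n", "return x + y;\n"]
new = ["int x = 1;\n", "int y = 3;\n", "return x + y;\n"]

# context_diff emits the "*** / ---" context format rather than "@@" hunks.
patch = "".join(difflib.context_diff(old, new,
                                     fromfile="a/example.c",
                                     tofile="b/example.c"))
print(patch)
```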

What does this mean for you as a buildfarm-animal maintainer? You need to reconfigure it to use git. I expect Andrew to post instructions on exactly what to do, and keep track of who hasn't done it ;)

Thanks and well done to all the people involved in making this happen!

A busy week for PostgreSQL - hello 9.0!

In case you missed it (I'm certainly not the first on Planet PostgreSQL to blog about this, but there are supposedly other aggregators), PostgreSQL 9.0 has been released. It comes with spiffy things like Streaming Replication, Hot Standby and Exclusion Constraints! If you aren't up to speed on all the news already, go check out the release notes.

Second, yesterday I committed Thom Brown's changes to the documentation stylesheets. So not only do you get the documentation for the new version, it also looks a lot better than before, with nicer formatting for tables and better highlighting of examples and code. The new style has also been applied to the documentation for older versions.

Finally, tonight we start the second attempt to move the authoritative PostgreSQL source tree to git. It didn't end well last time and we reverted to cvs, but given the large amount of work many people have put into it, I have much higher hopes this time. Stay tuned...

PGDay.EU announced and call for papers

PGDay.EU 2010 has finally been announced. It will be held in Stuttgart, Germany, on December 6th to 8th. More details are available on the conference website.

We have also sent out our call for papers. If you have done something interesting with PostgreSQL, please go ahead and submit a talk! We are currently looking for talks in both English and German!

Robotic moderation duties

Every now and then, the discussion about why it takes so long for messages posted to some of the PostgreSQL mailing lists to get approved comes up, and it goes around a couple of laps. Maybe we get a new moderator. But eventually it comes back to there not being enough people. Personally, I've usually found that I don't have the time to keep up with moderation. So when I needed a project to learn some Android development on, I figured this could be an interesting (probably) and useful (maybe) one.

So, meet Mailinglist Moderator: a tiny Android application that helps with the daily moderation chores for anybody moderating Mailman or Majordomo2 mailing lists (it should be easy enough to add other list managers, if there are any that people actually use).

The application simply enumerates all items awaiting moderation and lets you accept or reject them, either one by one or in batch. Personally, I've found it makes it a lot more likely that I will do moderation - it's literally down to 30 seconds while waiting for a bus or train. And once that hurdle is gone, I'm a lot more likely to actually end up moderating. I found it useful - hopefully others will as well.
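
The accept/reject flow can be modelled very simply: a single decision applied to a set of selected items, where "one by one" is just a selection of size one. The sketch below is a hypothetical Python model of that idea; the app itself is an Android application, and none of these names come from its code or from the Mailman or Majordomo2 interfaces.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ACCEPT = "accept"
    REJECT = "reject"


@dataclass
class PendingItem:
    item_id: str
    sender: str
    subject: str


def decide(pending: list[PendingItem], selected_ids: set[str],
           decision: Decision) -> tuple[dict[str, Decision], list[PendingItem]]:
    """Apply one decision to every selected item; return (decided, still pending)."""
    decided = {i.item_id: decision for i in pending if i.item_id in selected_ids}
    remaining = [i for i in pending if i.item_id not in selected_ids]
    return decided, remaining


pending = [PendingItem("1", "a@example.org", "Question about VACUUM"),
           PendingItem("2", "spam@example.com", "Buy now"),
           PendingItem("3", "b@example.org", "Patch review")]
decided, remaining = decide(pending, {"1", "3"}, Decision.ACCEPT)
```

The decided mapping would then be sent to the list manager in one round trip, which is what makes batch moderation quick on a phone.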

Here are a couple of screenshots showing what the application looks like.

The app is available as an APK for download on my github page. It hasn't been published to the Android Market yet, but I'll do that if enough people find it useful and ask me for it...

And of course, this is all BSD licensed open source, so any contributions are welcome!

PostgreSQL Europe Merchandise Store

We've finally opened the merchandise store for PostgreSQL Europe. It's a chance for everybody who hasn't been able to attend one of the many PostgreSQL events where we've been selling mugs and shirts for a long time, as well as a chance to get some stuff that we haven't previously had available.

PostgreSQL Europe makes close to nothing off these purchases - we're trying to make it as cheap as we can for everybody. You are of course most welcome to donate some extra to the project, should you wish.

Planet integration update

This post is one of those seriously annoying ones that's just here to verify that the updates I've made to the Twitter integration of Planet PostgreSQL work. Since Twitter is terminating the type of authentication we were using, I had to change it. It's a change to a better method, but still somewhat annoying.
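
The post doesn't name the new method; assuming it is OAuth 1.0a (which Twitter required in place of HTTP Basic authentication at the time), the core of it is signing each request with HMAC-SHA1 over a normalized "signature base string". Here is a minimal sketch of that signing step, following RFC 5849; the URL and secrets are placeholders, and this is not the planet code itself.

```python
from __future__ import annotations

import base64
import hashlib
import hmac
from urllib.parse import quote


def pct(value: str) -> str:
    """RFC 3986 percent-encoding as OAuth 1.0a requires (unreserved chars kept)."""
    return quote(value, safe="~")


def oauth_signature(method: str, url: str, params: dict[str, str],
                    consumer_secret: str, token_secret: str = "") -> str:
    """HMAC-SHA1 signature over the OAuth 1.0a signature base string.

    url must be the base URL without query string; params holds both the
    oauth_* protocol parameters and the request parameters.
    """
    # Encode names and values, sort by name (then value), join with "&".
    encoded = sorted((pct(k), pct(v)) for k, v in params.items())
    param_string = "&".join(f"{k}={v}" for k, v in encoded)
    base_string = "&".join([method.upper(), pct(url), pct(param_string)])
    key = f"{pct(consumer_secret)}&{pct(token_secret)}"
    digest = hmac.new(key.encode(), base_string.encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()
```

Unlike Basic authentication, the secrets themselves never travel over the wire; only the signature does, which is a large part of why it is the better method.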

If you're interested in looking at the code, it's up on github and on git.postgresql.org.

PostgreSQL Europe election results

The elections for PostgreSQL Europe are now closed, and the full results are published on http://www.postgresql.eu/elections/2/.

The PostgreSQL Europe Board would like to welcome our new board members Dave Page and Guillaume Lelarge, as well as welcome back Andreas Scherbaum who was re-elected.

I would also like to personally thank Gabriele Bartolini, member of our original board and one of the initiators for PostgreSQL Europe, who will now be leaving the board. Gabriele has been instrumental in getting PostgreSQL Europe off the ground, and I'm sure we will see much of him within our community in the future as well, even if he is not serving on the board.

A total of 20 people voted in the elections, out of the 39 who were eligible.

Thanks to all who participated in the elections!

If you have any further questions around this, please feel free to contact me or any of the other board members (old or new) directly.

PostgreSQL Europe - get your nominations in

We have now opened the nominations period for the upcoming elections to the board of PostgreSQL Europe. It's simple: anybody who is a member of PostgreSQL Europe can be a candidate (and if you're in Europe and doing PostgreSQL stuff, as a developer, consultant or just a user, you really should be a member. It's easy!). You just need to be nominated by one member (who can be yourself, just to let people know you are interested in being a candidate) and seconded by one other member - that's all.

So if you're interested in this, or know somebody who should be, post your nominations to the pgeu-general mailing list. For full details about the procedure, see this email.

PostgreSQL infrastructure updates

At PG-east^H^H^H^H^H^H^HJDCon here in Philly yesterday, Dave announced the infrastructure changes that we decided on late last year, and that we have been working on the groundwork for, on and off, over the past months (mostly off, or we'd be done already). Since I've had a number of questions about it, I'm going to post a summary here.

The bottom line is that we're changing, over time, how we run and manage the infrastructure behind the postgresql.org services: things like the website, source code repositories, etc. The reason is simple: the way we are doing it now has become too hard to maintain, and requires too much manual work.

There are several parts to these changes. For one, we are building automated methods for a lot of things that previously required a little manual work. Even a little manual work rapidly becomes very time-consuming when you have a lot of machines, and the number of machines we have has been increasing fast over the past couple of years.

The biggest change is that we will be moving our services off FreeBSD, which we use now, onto Debian GNU/Linux. At the same time, we will switch our virtualization solution from FreeBSD Jails to KVM. FreeBSD jails have served us very well over time, but now that we have access to a well-working KVM product that uses hardware virtualization, the gains of full virtualization are much easier to get at.

One of the main drivers for the change is that maintaining ports-based installs is just taking way too much time. The new system will be based heavily on Debian packages and the apt system, to the point that everything will always be installed using "meta-packages" that ensure all our machines look the same way. This also plugs into our monitoring systems very well, making it much easier to apply (security) updates across the many machines. In particular, it should get rid of what is sometimes a multi-day operation to get everything security patched, due to dependencies and the slowness of updating ports.
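
A meta-package is just an (almost) empty Debian package whose Depends line pulls in everything a given kind of machine should have. As a rough illustration, here is a small Python function that renders the DEBIAN/control stanza for one; the package name and dependency list are made up for the example and are not the actual postgresql.org meta-packages.

```python
from __future__ import annotations


def control_file(name: str, version: str, depends: list[str],
                 maintainer: str, description: str) -> str:
    """Render a minimal DEBIAN/control stanza for a meta-package."""
    fields = [
        ("Package", name),
        ("Version", version),
        ("Architecture", "all"),          # no binaries, so architecture-independent
        ("Maintainer", maintainer),
        ("Depends", ", ".join(depends)),  # the whole point of a meta-package
        ("Description", description),     # single-line synopsis only
    ]
    return "".join(f"{key}: {value}\n" for key, value in fields)


# Hypothetical base meta-package for every managed machine.
text = control_file("example-base", "1.0",
                    ["openssh-server", "munin-node", "nagios-nrpe-server"],
                    "Sysadmins <sysadmin@example.org>",
                    "base packages for every managed machine")
print(text)
```

Placed in DIR/DEBIAN/control, this can be turned into an installable package with dpkg-deb --build DIR. Bumping the version and adding a dependency then rolls a change out to every machine on its next apt upgrade.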

It is quite possible (in fact, I'd say probable) that there are perfectly fine ways of doing this on FreeBSD. But this points to another big reason for the change: it's simply a lot easier to find people who can do these things on a Linux-based system today. Our efforts are entirely volunteer based, and in the end we just don't have people who know enough FreeBSD volunteering to do this work. In fact, we have had several people retire from the sysadmin team over the years specifically because they refused to work with FreeBSD. We don't expect this change to magically create more volunteers, but it will make it easier to recruit in the future.

The system will also automate a lot of the configuration work that needs to be done manually today, as well as automatically configure our monitoring systems (primarily Nagios and Munin) and backups. And finally, the system will automatically provision (and un-provision) users and access permissions across all machines from a central repository.
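
Central provisioning of users mostly comes down to computing a difference: which accounts should exist on a given host according to the central repository, and which ones actually do. Here is a minimal Python sketch of that comparison, assuming (hypothetically) that the central repository maps each user to the set of hosts they may access, and that the host's list contains only managed accounts, not system users.

```python
from __future__ import annotations


def plan_user_changes(central: dict[str, set[str]], host: str,
                      existing: set[str]) -> tuple[set[str], set[str]]:
    """Return (accounts to create, accounts to remove) for one host.

    central maps user name -> hosts that user may log in to.
    existing must contain only centrally managed accounts on the host.
    """
    wanted = {user for user, hosts in central.items() if host in hosts}
    return wanted - existing, existing - wanted


central = {"alice": {"web1", "db1"}, "bob": {"db1"}, "carol": {"web1"}}
to_add, to_remove = plan_user_changes(central, "web1", {"alice", "bob"})
```

Computing the plan separately from applying it keeps the dangerous part (removing access) easy to review and log.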

The code is not entirely done yet, but we do have enough in place to automatically provision and configure basic virtual machines in just minutes, including much of their configuration. We have prototype deployments online, but we haven't started moving production services yet. We hope to get started on that shortly. It will take a long time before we have migrated all services - in fact, we may never migrate all of them. The focus will obviously be on the most manpower-intensive ones first.
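
The post doesn't say how the VMs are created. Assuming libvirt is used to manage KVM (a common choice, but an assumption here), provisioning a basic machine largely means generating a domain definition. Here is a minimal Python sketch that builds one; the name, sizes and disk path are illustrative, and a real definition would also need at least a network interface.

```python
import xml.etree.ElementTree as ET


def domain_xml(name: str, memory_mib: int, vcpus: int, disk_path: str) -> str:
    """Build a minimal libvirt domain definition for a KVM guest."""
    dom = ET.Element("domain", type="kvm")
    ET.SubElement(dom, "name").text = name
    ET.SubElement(dom, "memory", unit="MiB").text = str(memory_mib)
    ET.SubElement(dom, "vcpu").text = str(vcpus)
    os_el = ET.SubElement(dom, "os")
    ET.SubElement(os_el, "type", arch="x86_64").text = "hvm"  # full (hardware) virtualization
    devices = ET.SubElement(dom, "devices")
    disk = ET.SubElement(devices, "disk", type="file", device="disk")
    ET.SubElement(disk, "source", file=disk_path)
    ET.SubElement(disk, "target", dev="vda", bus="virtio")
    return ET.tostring(dom, encoding="unicode")


xml_text = domain_xml("vm1", 1024, 2, "/var/lib/libvirt/images/vm1.img")
```

Written to a file, such a definition can be registered with virsh define and started with virsh start.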

As usual, the code that's used to build these services will be published under the PostgreSQL licence once it's finished. Some security-sensitive parts may be removed, but the bulk of it is very generic and should be re-usable as a whole or in parts.

Another issue that's going to delay some changes is that a few of our infrastructure servers are currently not VT-capable, and thus cannot run KVM. In the short term these machines will continue to run FreeBSD Jails, but eventually we want to replace them with more modern ones. So if you are sitting on one or more VT-capable machines with enough RAM to run a number of virtual machines, hosted in a datacenter that can provide us with multiple IP addresses and decent network and service availability, and you would like to donate them to the PostgreSQL project, please get in touch with our sysadmin team to see if this is something we can work with!
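
If you're wondering whether a machine you could donate is VT-capable, on Linux the CPU advertises hardware virtualization through the vmx (Intel VT-x) or svm (AMD-V) flag in /proc/cpuinfo. Here is a small Python check. Note that the flag only shows CPU support; the feature can still be disabled in the BIOS.

```python
def vt_capable(cpuinfo: str) -> bool:
    """True if any CPU in /proc/cpuinfo text advertises vmx or svm."""
    for line in cpuinfo.splitlines():
        if line.startswith("flags"):
            flags = line.split(":", 1)[1].split()
            if "vmx" in flags or "svm" in flags:
                return True
    return False


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print("VT-capable" if vt_capable(f.read()) else "no vmx/svm flag found")
    except OSError:
        print("/proc/cpuinfo not available (not a Linux host?)")
```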

Heading west, going east, better schedule, and more

(public service announcement further down!)

I'm starting to get ready for the most misnamed conference so far this year. I mean, seriously. It's called East 2010, and yet it's located approximately 6000 km (that's about 3700 miles for you Americans) to the west of the prime meridian. That's not even close. Sure, it's to the east of where the West conference is usually held, but really, this reminds me of POSIX timezones...

This will be a somewhat different conference than previous PostgreSQL conferences. It's the first big commercial one. This has enabled it to change venue from "local university or college where rooms are cheap or free" to a conference hotel in downtown Philadelphia. Whether this is actually good for the talks remains to be seen, but it's likely going to make some things around the conference easier and more integrated. There will also be an exhibition area - something we tried for pgday.eu without much interest, but it will hopefully be more successful in this setting.

The contents of the conference are also somewhat different, since there is now a clear focus on "decision makers" as well, something that many PostgreSQL conferences have been lacking. We may all wish that the decision makers were the people who are actually going to use the product, but we know that's not the case. This gives a conference schedule with a broader range of talks than we're used to, which can only be good.

Myself, I will be giving an updated version of my security talk, focusing on authentication and SSL. The updates are mainly around the new and changed authentication methods in 9.0, and some minor updates on the SSL part. If you've seen it before it may be an interesting refresher, but you might be better off going to see Greg Smith's benchmarking talk if you haven't...

Now for the second part, which is the public service announcement...

If you're like me, you find the official schedule for East very hard to read and basically useless for getting an overview. It's also horrible to use from a device like the iPhone, which I'm quite likely to use at the conference. But we solved this problem for the PostgreSQL Europe conferences, and that code is pretty simple. So some copy/paste of a couple of hundred lines of python, some glue code, and voilà: a much more (IMHO) readable schedule. As a bonus, it also generates an aggregate iCalendar feed that you can plug directly into the calendar application on the iPhone (or, I assume, most other phones). Google Calendar may be very nice to use when working on the schedule, but I find this a lot more user friendly for those reading it, particularly in the ability to get an overview.
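
Aggregating several iCalendar feeds into one is mostly a matter of lifting the VEVENT blocks out of each feed and wrapping them in a single VCALENDAR. Here is a minimal Python sketch of that idea, following RFC 5545 line folding and CRLF line endings. It is not the aggregator's actual code, and it deliberately ignores VTIMEZONE components and does not re-fold long lines on output.

```python
from __future__ import annotations

import re


def unfold(text: str) -> str:
    """Undo RFC 5545 line folding (line break followed by a space or tab)."""
    return re.sub(r"\r?\n[ \t]", "", text)


def extract_events(feed: str) -> list[str]:
    """Return each VEVENT block of a feed as one CRLF-joined string."""
    events, current = [], None
    for line in unfold(feed).splitlines():
        if line == "BEGIN:VEVENT":
            current = [line]
        elif current is not None:
            current.append(line)
            if line == "END:VEVENT":
                events.append("\r\n".join(current))
                current = None
    return events


def aggregate(feeds: list[str], prodid: str) -> str:
    """Merge the events from several feeds into a single calendar."""
    lines = ["BEGIN:VCALENDAR", "VERSION:2.0", f"PRODID:{prodid}"]
    for feed in feeds:
        lines.extend(extract_events(feed))
    lines.append("END:VCALENDAR")
    return "\r\n".join(lines) + "\r\n"
```

Fetching the official feeds once an hour and serving the result of aggregate() gives you the single subscribable feed.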

The page and the feed will both update once per hour by pulling from the official calendar feeds. They will also adjust for the fact that all the official feeds are 3 hours off, due to the calendars being set in the wrong timezone. So should they suddenly jump 3 hours, that's because the official ones were fixed - just remind me and I'll take out the adjustment. Anybody is free to use them, but of course the usual disclaimers apply: double check with the official one, etc, etc, etc.
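
A fixed-offset correction like this can be applied as a plain text transformation on the DTSTART and DTEND properties. Here is a minimal Python sketch. The direction and size of the shift are a parameter, since the post only says the feeds are 3 hours off, and all-day (date-only) values are deliberately left alone.

```python
import re
from datetime import datetime, timedelta

# DTSTART/DTEND, optional parameters (e.g. ;TZID=...), date-time, optional UTC "Z".
STAMP = re.compile(r"^(DTSTART|DTEND)([^:]*):(\d{8}T\d{6})(Z?)$")


def shift_times(ical: str, hours: int) -> str:
    """Move every DTSTART/DTEND date-time in an iCalendar text by `hours`."""
    out = []
    for line in ical.splitlines():
        m = STAMP.match(line)
        if m:
            name, params, stamp, utc = m.groups()
            moved = datetime.strptime(stamp, "%Y%m%dT%H%M%S") + timedelta(hours=hours)
            line = f"{name}{params}:{moved:%Y%m%dT%H%M%S}{utc}"
        out.append(line)
    return "\r\n".join(out) + "\r\n"
```

Using datetime arithmetic rather than editing the hour digits means shifts across midnight roll the date over correctly, and taking the adjustment out later is just a matter of passing 0.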

Finally, the mandatory before and after shots.

If you're interested in the code for the aggregator itself, it's up on my github page. And of course, any patches for cool features or just making it look better are always appreciated - it's open source after all.

Conferences

I speak at and organize conferences around Open Source in general and PostgreSQL in particular.

Upcoming

PGCon 2017
May 23-26, 2017
Ottawa, Canada

Past

pgDay.paris 2017
Mar 23, 2017
Paris, France
Nordic PGDay 2017
Mar 21, 2017
Stockholm, Sweden
Confoo Montreal 2017
Mar 8-10, 2017
Montreal, Canada
SCALE+PGDays
Mar 2-5, 2017
Pasadena, California, USA
Open Source Infrastructure @ SCALE
Mar 2, 2017
Pasadena, California, USA