
PostgreSQL - now on git!

So it finally happened. The official PostgreSQL master source tree is now managed in git instead of CVS. This means, amongst other things, that the world's most advanced open source database now has a version control system with... eh... atomic commits!

Like the first run, this one had some issues, but they were smaller and were resolved in time that we didn't have to roll back. This time, it turned out that the CVS version that ships in Debian GNU/Linux carries patches that change the default date format to the ISO standard. Since one of our main requirements for the conversion was to faithfully represent the old versions of the code, this broke every single file, because we used CVS keyword expansion in the old tree. Once we found this, it was a simple matter of adding the DateFormat=old parameter to the CVS config file and re-running the whole conversion, which took several hours.
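To see why a date-format change touches every file, it helps to look at what keyword expansion does on checkout. The sketch below is a simplified, hypothetical model of a `$Id$`-style keyword (real CVS output has more variants); it assumes the traditional `YYYY/MM/DD` date style versus the ISO-style `YYYY-MM-DD` of the patched build. The file name, revision and user are made up.

```python
from datetime import datetime


def expand_id(filename: str, revision: str, when: datetime,
              author: str, iso_dates: bool) -> str:
    """Simplified model of CVS $Id$ keyword expansion (illustrative only)."""
    # Traditional CVS keyword dates use slashes; the ISO variant uses dashes.
    fmt = "%Y-%m-%d %H:%M:%S" if iso_dates else "%Y/%m/%d %H:%M:%S"
    return f"$Id: {filename},v {revision} {when.strftime(fmt)} {author} Exp $"


when = datetime(2010, 9, 20, 12, 34, 56)  # hypothetical commit time
old = expand_id("main.c", "1.42", when, "jdoe", iso_dates=False)
new = expand_id("main.c", "1.42", when, "jdoe", iso_dates=True)

# Same revision, same metadata, different bytes: every file containing a
# keyword would no longer match its historical content.
print(old)
print(new)
print(old == new)
```

Because the expanded text is part of the file contents, a single formatting difference is enough to make every historical version of every such file differ from what was originally checked out, which is why the conversion had to be re-run with `DateFormat=old`.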

A lot of work went into making the repository conversion correct. Some of this was due to issues in the toolchain used - many thanks to Michael Haggerty and Max Bowsher for getting those fixed and for explaining some of the software's behavior to us. In the end, a number of things needed to be changed in our existing CVS repository to make it migrate properly. Tom Lane provided a big patch, applied to the CVS repository itself prior to the conversion, that cleaned most of those up - you can find a copy of it on my GitHub page if you're interested.

With this patch applied, we managed a conversion that was very close to the original repository. I personally think this is only because the PostgreSQL project has been very careful about how it deals with its CVS repository, using it in a fairly simple way. And even so, we had a number of issues, such as tags moved "after the fact" and branches created from partial checkouts. A fair number of the issues were simply due to CVS having no reasonable way to represent some situations, such as a file being deleted, re-added and deleted again, with all of this mixed across different branches.

Git obviously deals with this better, and hopefully we'll have no such issues creeping into the new repository. However, the PostgreSQL project will be sticking with our "conservative approach" to source control, at least for the time being. For this reason, we are restricting what committers can use within git. We still allow all developers (and committers) to use whatever parts of git they want while they develop, but for commits going into the main tree, we are imposing a number of restrictions:

  • We will not allow merge commits. The PostgreSQL project doesn't follow the "git workflow" - we generally develop our patches on the master branch, and then back-patch to released stable branches for important bugs. We will continue doing this as separate commits and not using merges, thus keeping history linear.
  • We will not use the author field in git to credit the patch's original author (even in the few cases where the patch is actually authored by a single person). Instead, we will require that author and committer are always set to the same thing, and we will credit the author(s) (along with the reviewer(s)) in the commit message, just like we've done before.
  • As a follow-on to that requirement, we will require that every commit carries the identity of a committer registered with the project, using the same name and address on all commits. So even if a patch is developed on a topic branch on, say, GitHub, it will be collapsed into a single commit (or maybe a couple, depending on size) tagged with the committer's name.
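The three rules above boil down to simple checks on a commit's metadata. The following is a minimal sketch of such checks, not the project's actual policy code; the `Commit` record and the `REGISTERED` set of committer identities are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Commit:
    sha: str
    parents: list[str]
    author: tuple[str, str]     # (name, email)
    committer: tuple[str, str]  # (name, email)


# Hypothetical registry of committer identities, as (name, email) pairs.
REGISTERED = {("Jane Committer", "jane@example.org")}


def policy_violations(c: Commit) -> list[str]:
    """Return the reasons this commit would be rejected (empty if none)."""
    problems = []
    # Rule 1: no merge commits, i.e. at most one parent, keeping history linear.
    if len(c.parents) > 1:
        problems.append(f"{c.sha}: merge commits are not allowed")
    # Rule 2: author and committer must be identical; credit goes in the message.
    if c.author != c.committer:
        problems.append(f"{c.sha}: author must be the same as committer")
    # Rule 3: the committer must use a registered name and address.
    if c.committer not in REGISTERED:
        problems.append(f"{c.sha}: committer {c.committer} is not registered")
    return problems


jane = ("Jane Committer", "jane@example.org")
ok = Commit("a1b2c3", ["f00"], jane, jane)
bad = Commit("d4e5f6", ["f00", "ba5"], ("Patch Author", "pa@example.com"), jane)
print(policy_violations(ok))
print(policy_violations(bad))
```

Comparing the (name, email) pairs rather than the full identity lines deliberately ignores timestamps, since author and commit dates can legitimately differ on a rebased or squashed commit.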

There has been a lot of discussion around this, and this is how the PostgreSQL project has worked and wants to continue working. We may change this sometime in the future, but not now - we are only changing the tool, and not the workflow.

To enforce these requirements, I've developed a policy hook for our git server that makes sure we don't make mistakes. It's up on my GitHub page, along with the script we use to generate the commit mails to the pgsql-committers list, so they look exactly the way we want them to.
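The hook mentioned above is its own piece of code; what follows is only an independent, minimal sketch of how a git `pre-receive` hook can gather the data such a policy needs. Git feeds a `pre-receive` hook one `<old> <new> <refname>` line per updated ref on stdin; the sketch lists the new commits with `git rev-list` and parses each raw object printed by `git cat-file commit`. The function names and the simplified rejection rule are assumptions for illustration, and the all-zero id assumes a SHA-1 repository. Installed as `hooks/pre-receive`, the script would end with `sys.exit(main(sys.stdin))`.

```python
import re
import subprocess
import sys
from typing import TextIO

ZERO = "0" * 40  # all-zero id git uses for a created or deleted ref (SHA-1)
IDENT = re.compile(r"^(?P<name>.*) <(?P<email>[^>]*)> (?P<ts>\d+) (?P<tz>[+-]\d{4})$")


def parse_commit(raw: str) -> dict:
    """Parse the header of a raw commit object from `git cat-file commit`."""
    info = {"parents": [], "author": None, "committer": None}
    for line in raw.split("\n"):
        if line == "":
            break  # a blank line separates the headers from the message
        if line.startswith(" "):
            continue  # continuation of a multi-line header such as gpgsig
        key, _, value = line.partition(" ")
        if key == "parent":
            info["parents"].append(value)
        elif key in ("author", "committer"):
            m = IDENT.match(value)
            info[key] = (m["name"], m["email"]) if m else value
    return info


def new_commits(old: str, new: str) -> list[str]:
    """Commits introduced by a ref update. Simplified: for a brand-new ref,
    every commit reachable from `new` is listed, even ones already present."""
    rev_range = [new] if old == ZERO else [f"{old}..{new}"]
    out = subprocess.run(["git", "rev-list", *rev_range], check=True,
                         capture_output=True, text=True).stdout
    return out.split()


def main(stdin: TextIO) -> int:
    failed = False
    for line in stdin:  # each line: "<old> <new> <refname>"
        if not line.strip():
            continue
        old, new, ref = line.split()
        if new == ZERO:
            continue  # ref deletion: no new commits to inspect
        for sha in new_commits(old, new):
            raw = subprocess.run(["git", "cat-file", "commit", sha], check=True,
                                 capture_output=True, text=True).stdout
            info = parse_commit(raw)
            # Simplified policy: no merges, author identical to committer.
            if len(info["parents"]) > 1 or info["author"] != info["committer"]:
                print(f"rejecting {sha} on {ref}", file=sys.stderr)
                failed = True
    return 1 if failed else 0  # non-zero makes git refuse the whole push
```

Keeping the parsing in a pure function (`parse_commit`) means the policy logic can be tested on sample commit text without a repository; only `new_commits` and `main` actually call git.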

What does this mean for you as a PostgreSQL user? Really, nothing at all.

What does this mean for you as a PostgreSQL patch developer? Not much. If you did your work off the cvs-to-git mirror, you need to do a new clone. This repository was converted from scratch, so the old one is no longer valid. We still encourage you to use, for example, GitHub if you want to do your development there, but the patch submission process remains the same: send a context-style diff to the pgsql-hackers mailing list.
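For anyone unsure what a "context-style" diff looks like compared with the unified format git prints by default, Python's standard `difflib.context_diff` produces the same classic layout. This is only a demonstration of the format with made-up file contents, not a recommended submission tool.

```python
import difflib

before = ["int answer(void)\n", "{\n", "    return 41;\n", "}\n"]
after = ["int answer(void)\n", "{\n", "    return 42;\n", "}\n"]

# context_diff yields the classic "context" format: a "***" header for the
# old file, a "---" header for the new file, then hunks that show the old and
# new line ranges separately, with changed lines marked by "! ".
patch = list(difflib.context_diff(before, after,
                                  fromfile="a/answer.c", tofile="b/answer.c"))
print("".join(patch), end="")
```

Unlike a unified diff, each hunk lists the old lines and the new lines in two separate blocks, which some reviewers find easier to read for larger changes.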

What does this mean for you as a buildfarm-animal maintainer? You need to reconfigure it to use git. I expect Andrew to post instructions on exactly what to do, and keep track of who hasn't done it ;)

Thanks and well done to all the people involved in making this happen!
