I have received a lot of questions since the announcement that we are temporarily shutting down the anonymous git mirror and commit messages. And we're also seeing quite a lot of media coverage.
Let me start by clarifying exactly what we're doing:
There has been some speculation in that we are going to shut down all list traffic for a few days - that is completely wrong. All other channels in the project will operate just as usual. This of course also includes all developers working on separate git repositories (such as a personal fork on github).
We are also not shutting down the repositories themselves. They will remain open, with the same content as today (including patches applied between now and Monday), they will just be frozen in time for a few days.
So why are we doing this? It's pretty simple - it takes a few days to prepare packages for all our supported platforms, to do testing on these, and get them ready for release. If we just committed the security fixes and then proceeded with the packaging, that would mean that anybody who was following our repository would be able to see those fixes a few days before the fixes were available to the majority of the users. That also means that anybody looking for the flaw would get a few days of time when the full details of the bug was in the open (since the fix was applied in public), but yet all the installations around the world would be unpatched and left wide open for exploit.
By restricting access to view the patches until release time, we close this window. Yes, the vulnerability is still in the code that is out there today. But it has been in there for a few years, and nobody (that we know of) found it in that time. Hopefully, nobody will between now and release time. But by not explicitly showing the bug, we're at least keeping that risk as low as possible while still being able to warn our users that they will need to apply the patch as soon as it's out.
We do realize that this will make some people look harder at the PostgreSQL code over the next couple of days trying to find this bug, and write an exploit for it.
I've seen a couple of comments along the line of "isn't this where you should be using a DVCS like git you're using, letting the people building the security fixes do that in a separate repository and merge it once ready, not needing to shut down the central one".
Turns out that is actually exactly what we are doing. The security fixes are mostly already developed, and as such are sitting somewhere else from the main repository. But we need at some point to merge these into the main repository, in order to let people build the packages. We only close down the repository mirroring right before this merge is done, and until the packages are ready to be released. It's not the work to develop the patch that requires the shutdown of the mirroring, it's the work to build and release packages.
The other advantage of the fact that we are using a DVCS, is that development does not stop during this time. Anybody working on a patch can keep working on it in their local copy of the repository. It's only the merge ("apply") of the patch to the upstream master branch that's going to be delayed. And that affects a much smaller group of people. Of course, it is a bit of an extra annoyance since we are currently trying to close out the open patches for the next release, but it's not a huge difference for most developers.
Yes, absolutely! We are not going to permanently hide any information, or try to obfuscate the contents of security patches (coughunlike some other players in the field).
Once the new versions are released, the git mirroring will resume. This will immediately mirror all the individual commits, including detailed commit messages showing what the bugs were (and of course including the fix itself). And we are assigning public CVE numbers to all security related bugs. At this point, the commit messages held in the queue will also be released, and appear on the pgsql-committers list for anybody who wants to read up on them. And of course, complete tarballs with the full release will be made available alongside the binary packages.
It's a difficult balance between keeping things open so that everybody can verify what's going on, and keep exploit information out of the hands of the bad guys. Our goal with what we did this time is to minimize exposure to our users for a potentially very bad exploit (depends on the scenario for each individual install, of course), while we work with downstream distributions to make sure our fixes can reach the users as quickly as possible.
Is it the right way? We don't know. It's the first time we do this, and it's not something we plan to do as a general process. We'll of course have to evaluate whether it was successful once it's all done.
Finally, for those of you who are our users, a short repeat. A new release is planned next week, current schedule is release on April 4th. We advise all users to review the security announcement and apply the fix as quickly as possible if the vulnerability is targetable in your environment. The patch will require installation of new binaries and a restart of the database, but no further migration work than that.
We take the security of our users seriously, and try our best to protect them as much as possible. It's out belief that the tradoffs we've done here are in their best interest. The future will tell, of course, if that belief is correct.
So it finally happened. The official PostgreSQL master source tree is now managed in git, instead of cvs. This means, amongst other things, that the worlds most advanced open source database now has a version control system with.. eh. atomic commits!
Like the first run, this one had some issues with it, but it was smaller and resolved in time not to have to roll back. This time, it turned out that the cvs version that ships in Debian GNU/Linux comes with patches that change the default date format to the ISO standard. But since one of our main requirements on the conversion was to be able to faithfully represent the old versions of the code, this broke every single file - since we used CVS keyword expansion in the old tree. Once we found this, it was a simple case of adding the DateFormat=old parameter to the CVS config file and re-run the whole conversion - which took several hours.
A lot of work went into making the repository conversion correct. Some of this was due to issues in the toolchain used - many thanks to Michael Haggerty and Max Bowsher for getting those fixed and explaining some of the behaviors of the software for us. In the end, a number of things needed to be changed in our existing CVS repository to make it migrate properly. Tom Lane provided a big patch to apply to the CVS repository itself prior to the conversion that cleaned most of those up - you can find a copy of it my github page if you're interested.
With this patch applied, we managed a conversion that was very close to the original repository. I personally think this is only because the PostgreSQL project has been very careful about how it deals with it's CVS repository - using it in a fairly simple way. And even with that, we had a number of issues - such as tags moved "after the fact", and branches created off partial checkouts. A fair number of the issues were simply because CVS doesn't have ways to represent everything in a reasonable way, such as issues when a file was deleted, re-added, deleted again, and mix this over different branches.
Git obviously deals with this better, and hopefully we'll have no such issues creeping into the new repository. However, the PostgreSQL project will be sticking with our "conservative approach" to source control - at least for the time being. For this reason, we are restricting what committers can use within git. We still allow any developers (and committers) to use whatever parts of git they want as they develop, but for commits going into the main tree, we are making a number of restrictions:
There has been a lot of discussion around this, and this is how the PostgreSQL project has worked and wants to continue working. We may change this sometime in the future, but not now - we are only changing the tool, and not the workflow.
To enforce these requirements, I've developed a policy hook for our git server that makes sure we don't make the mistake. It's up on my github page, along with the script we use to generate commit mails to the pgsql-committers list that look just the way we want them to.
What does this mean for you as a PostgreSQL user? Really, nothing at all.
What does this mean for you as a PostgreSQL patch developer? Not much. If you did your work off the cvs-to-git mirror, you need to do a new clone. This repository is converted from scratch, so the old one is not valid anymore. We still encourage you to use for example github if you want to do your development there, but the patch submission process remains the same - send a context style diff to the pgsql-hackers mailinglist.
What does this mean for you as a buildfarm-animal maintainer? You need to reconfigure it to use git. I expect Andrew to post instructions on exactly what to do, and keep track of who hasn't done it ;)
Thanks and Well done to all the people involved in making this happen!
Many people who develop patches for PostgreSQL don't have access to Windows machines to test their patches on. Particularly not with complete build environments for the MSVC build on them. The net result of this is that a fair amount of patches are never tested on Windows until after they are committed. For most patches this doesn't actually matter, since it's changes that don't deal with anything platform specific other than that which is already taken care of by our build system. But Windows is not Posix, so the platform differences are generally larger than between the different Unix platforms PostgreSQL builds on, and in MSVC the build system is completely different. In a non-trivial number of cases it ends up with breaking the buildfarm until somebody with access to a Windows build environment can fix it. Lucky, we have a number of machines running on the buildfarm with Windows on them, so we do catch these things long before release.
There are a couple of reasons why it's not easy for developers to have a Windows machine ready for testing, even a virtual one. For one, it requires a Windows license. In this case the same problem with availability for testing exists for other proprietary platforms such as for example Mac OSX, but it's different from all the free Linux/Unix platforms available. Second, setting up the build environment is quite complex - not at all as easy as on the most common Linux platforms for example. This second point is particularly difficult for those not used to Windows.
A third reason I noticed myself was that running the builds, and regression tests, is very very slow at least on my laptop using VirtualBox. It works, but it takes ages. For this reason, a while back I started investigating using Amazon EC2 to do my Windows builds on, for my own usage. Turns out this was a very good solution to my problem - the time for a complete rebuild on a typical EC2 instance is around 7 minutes, whereas it can easily take over 45 minutes on my laptop.
Now, EC2 provides a pretty nice way to create what's called an AMI (Amazon Machine Image) that can be shared. Using these facilities, I have created an AMI that contains Windows plus a complete PostgreSQL build environment. Since this AMI has been made public, anybody who wants to can boot up an instance of it to run tests. Each of these instances are completely independent of each other - the AMI only provides a common starting point.
I usually run these on a medium size Amazon instance. The cost for such an instance is, currently, $0.30 per hour that the instance is running. The big advantage here is that this includes the Windows license. That makes it a very cost-effective way to do quick builds and tests on Windows.
Read on for a full step-by-step instruction on how to get started with this AMI (screenshot overload warning).
As Peter has already noted, we have been planning for a while to update the service on git.postgresql.org, and as of now we just flipped the switch to the new version. We are still waiting for the DNS zone to update (yes, we could've lowered the TTL. No, we didn't think it was that important). So within 5 hours, anybody accessing git.postgresql.org through http, git or ssh will be accessing the new server.
First a note to all those people who use the git clone of the main cvs repository to do PostgreSQL development: nothing has changed for you. The URL is the same, the repository is the same, just keep pulling.
The only real difference there should be that the new server the service is hosted on has a little bit more bandwidth available, but it should not normally be noticeable.
The changes are all around the repositories that are originally hosted at the site. The changes have been made to make it easier to collaborate on projects, and to require less manual work to set things up. It is our hope that this will lower the bar of entry for those who are interested in developing PostgreSQL-related projects using git. The main changes for these users are: * Login is now integrated with the postgresql.org community account system. This means unified usernames across PostgreSQL services, such as for example the wiki. You just need to upload your SSH key to enable access. * Users no longer have shell accounts on the server. Instead, all users use git@git.postgresql.org as their ssh login. This makes the management and securing of the server much easier. * It is now possible for the owner of a repository to delegate permissions to other users directly from a web interface. As long as the other user has also uploaded his ssh key, this will be completely automatic. * It is now possible to request a new repository for your code using the same web interface. Requests for new repositories will still have to be approved before the repository is created.
Per an inventory that was made before the switch was made, several inactive repositories were not migrated to the new server. If you are missing a repository that was for some reason not migrated, feel free to contact us at [mailto:pgsql-www@postgresql.org the pgsql-www mailinglist] for a recovery - we will keep all the old files around for a while just in case.
As a final note, the source code for the git management tool itself is of course available in the git repository (when your DNS has updated).
If you're like me, you are using the git.postgresql.org repository to do your PostgreSQL development, because it's much nicer to work with than CVS.
If you're also like me, you like git diff with it's nice coloring and trailing-whitespace-warnings and such features. But you're a little bit annoyed that your tabs come out as 8-characters, when the PostgreSQL source uses 4-character tabs, making diffs a bit hard to read. But if you just pipe the output to less or something, the coloring goes away.
I finally got around to looking for a way to fix that today. And it took me all of 2 minutes to find it - I really should've done this before. Put the following in your .git/config file:
[core]
pager = less -x4
and it'll show you the diffs with 4-space tabs.
Trivial, yes. But it took me this long to even look at fixing it, so hopefully this can help someone...