This post was inspired by Montys post about MySQL release 5.1, and the many discussions it has created both around the web and offline. The question mainly being - why has nobody made such a post about PostgreSQL yet? Is it because it hasn't happened, or just because nobody has posted about it.
We all know that there is no such thing as bug-free software. This obviously includes both PostgreSQL and MySQL, as well as all the commercial competitors - claiming anything else is clearly untrue. But what I find remarkable from Montys post are mainly:
Now, there are several posts I found that are questioning Montys post, saying that the quality is just fine - and backing this up with actual experiences in deploying 5.1. I do think both sides are right here - it's perfectly possible to deploy 5.1 without hitting these bugs, as they are "corner-case" issues. But that does not decrease the importance of having releases without known bugs in them. And if there are known bugs, they should at least be listed very clearly in the release notes/announcement. Not doing this is, IMHO, simply irresponsible. Especially least for a database server which is supposed to safeguard all your work...
So how does PostgreSQL measure up in all this? Well, we aim to never release with known critical bugs in the software. But that all depends on the definition of "known". For example, PostgreSQL does not have a bug-tracker. For those following our lists, you will know that there is an endless debate started whenever somebody tries to discuss that, so I'm not going to discuss the topic in general. I am personally on the side that thinks we should have one. But I also feel that the MySQL bug-tracker is a clear example of the kind that we do not want to have. I don't have any exceedingly high thoughts about the bug-freeness of MySQL, but I have a hard time thinking that all the 41,000+ entries in their tracker are actually bugs. So it represents exactly the part that I really dislike about a lot of the bug-trackers around for open source projects - it's full of junk that aren't bugs, which makes it very hard to find what actually are bugs. (so thanks to Monty for outlining some for us :-P)
That means we can't use "bugs in the bug-tracker" as a metric. PostgreSQL does keep an "open items list" before each release, which does track things that came up during the development. It used to just be a static file on Bruce Momjians machine - now it's in the wiki. It's a lot less structured, but it does give us a lot of help in the let's-not-forget-this-bug-before-release department. But it doesn't track bugs past release, and the history is not preserved across versions (that I'm aware of at least).
Another metric would be to look at what actually changes in the minor versions. If a full release is not reasonably bug-free, there will be a lot of turnaround after the release, once people put it in production. We are, as many open source projects, lucky in that we have a lot of people running our betas and release candidates in near-production environments, which means we can iron out a lot of things before release. But it's obvious that a lot more people will deploy a product once it's released (in fact, that has been quoted as one of the reasons why MySQL 5.1 hit RC so early - "to get more people testing it"). So I decided to pull some statistics from our CVS repository about the latest four major PostgreSQL releases (8.0 to 8.3), and the turnaround on their branches after commit.
http://www.smugmug.com/photos/428148379_hX2qs-O.jpg Update: Since several people asked: the X axis is "minor version number"
This first graph shows the number of commits on the back-branches. This has been somewhat filtered - I have removed for example commits that are just updates of translations, and the commits that stamp the release itself. Or rather - I have tried to, with some simple regexp matching. So don't take these numbers as absolute, but they should be good enough to show trends. Also note that due to some problems with cvs not wanting to be nice to me, there are no commit statistics for 8.0.1, that's why the blue bar is missing in the first one.
A couple of things I think we can read from this graph. First of all, the total number of commits in the back-branches after a release are very low. Only very few of the branches have more than 50 commits going in. And this includes things that are just whitespace fixes or comment fixes, I didn't manually look through things to determine what were actual code fixes. It also "kind of" shows that 8.1 had more patches going in early on - which aligns with the general consensus that 8.1 was not one of our best releases. 8.3 had quite a lot of patches going into the first two releases, which can probably be attributed to several of the new features being very complex and thus obviously exposed to corner-case bugs. And there is (whew, lucky!) a steady decline in the number of fixes as the versions gets older - both because there are less people using them and finding bugs, and obviously because there are less bugs left in them.
So, moving on:
http://www.smugmug.com/photos/428148260_667SB-O.jpg Update: Since several people asked: the X axis is "minor version number"
This second graph shows the number of lines added and removed. I used a simple diffstat output and probably ran diff in the wrong mode first, so it all came out as inserts + deletes. Thus, any changed row actually counts as two in the graph. The true number of modified rows is therefore much lower than the ones indicated in the graph - but not necessarily half, since there are still pure additions and removals happened.
The first thing that stands out from this graph, is 8.1.1 having almost 10,000 lines changed. The reason for this, after trolling the cvs logs for a while, is that pgindent was re-run after release. This is the tool we use to make sure indenting and code formatting is consistent across the source code. Obviously, this generates a lot of changes, all of which are whitespace only. We normally don't run this after release, because it tends to make it harder to track history of the files. But for some reason that I can't remember (I'm sure it can be found in the list archives if somebody cares enough to go digging), we did it for 8.1.
In the 8.0 series, 8.0.2 stands out as having a lot of changes. This was the release where we pulled out the ARC cache method (for several reasons) and replaced it with a completely new one.
8.3 overall shows quite a lot of changes in the early releases, with a large drop for 8.3.5. But it's too early to tell if that's a permanent drop yet. But even "quite a lot of changes" is only just over 2,000 lines of code, which is less than 0.3%25 of the code base (code base size according to Ohloh, and not counting the effect of the add/remove for change as mentioned above).
I don't know the MySQL code base enough to make similar graphs for them - but would be very interested in seeing them if somebody else did!
Finally, another metric:
http://www.smugmug.com/photos/428159720_kpdox-X3.jpg
This is the number of lines modified between the latest release candidate and the release. This is essentially the number of lines that go into the release without being tested outside the developers. Before you get all excited about how large this number is - looking through the actual changes, very much of it is actually changes to error messages, comments or whitespace.
Yeah, that turned out to be a very long post. If there are more statistics around the code base that you think would be interested, let me know! I have some of this stuff loaded in a database now, so I can do some more analysis without too much work as long as it fits within that model.
A very interesting post.
"As far as I know, this has never been done with PostgreSQL (at least if we're talking "modern times", since after the product became reasonably stable)."
There is one issue that has been a bug in PostgreSQL for a long time. If you do something like "UPDATE foo SET i = i + 1" and "i" has a unique index on it, the statement will probably fail.
I still agree with your point, of course. I think this is a case of "the exception proves the rule". The fact that it takes effort to even think of one such example in PostgreSQL shows the stark contrast between the projects.
While that's a good point, I don't think that can be called a critical bug (crash/wrong result). It's certainly an annoyance factor though. So I think even that one is a bit of a stretch.
I consider that to be a "wrong result" because it can tell you something is a constraint violation where it should not -- and it's non-deterministic, meaning that you might miss it in testing and hit it in production.
However, it absolutely is a stretch. I have never hit this in a production application personally. It also doesn't cause data corruption, and has reasonable workarounds.
yeah while annoying this is documented and mentioned on the todo(http://wiki.postgresql.org/wiki/Todo#Integrity_Constraints) as well. the main problem is that nobody yet found a way to implement deferred unique constraints in a way that would not kill performance.
One example of a long known bug (found during 8.0 development) which crashed the server that was only fixed in 8.3 was:
http://archives.postgresql.org/pgsql-bugs/2004-06/msg00154.php
Another thing that is worth noting is that a pretty large percentage of back-branch commits involve fixing the same bug across more than one branch. It would be really interesting to filter out the "duplicate" commits and blame only the oldest branch (where such a bug presumably originated) for such a group of commits. I think that that would give a better picture of the branch-by-branch stability. Whether it's worth the trouble is another question though.
Yeah. I think the X-axis here is not really well chosen. Could we see the same data as four separate time series (8.3.x above 8.2.x above 8.1.x etc) with a shared X-axis of actual calendar date? That would make it more obvious when there are correlated changes in several branches, such as the big encoding fixes that swelled the numbers for both 8.0.8 and 8.1.4. In a display like that you could furthermore break down the commits by actual date rather than aggregating them into the point releases.
A comment on the "hidden bug" business: I don't know the exact story behind mysql's bug 37936 of course, but I do know that a lot of bugs in Red Hat's bugzilla are marked private because they contain some amount of customer-confidential information. In many cases there would be no harm in making such reports public, but no one has the time to run around and get them cleared for publicizing. 37936 very possibly has a similar problem. There are enough crashing bugs in their bugzilla that aren't hidden to make me doubt that they hid this just because it was a crasher.
Can you calculate the defect containment effectiveness of a MySQL v. Postgrest release?
Example: Bugs before RC1 = 100 Bugs after RC1 = 5
DCE = before / (before+after) * 100%25 = 100 / (100+5) * 100%25 = 95.24%25
Interesting post, Magnus. I would be curious what the overall changes are between major releases. One hypothesis is that PostgreSQL stability is due to conservatism verging paranoia about new features. Paranoia is good if you want things that work.
Incidentally, I don't think the MySQL "problem" is exactly quality per se, as there's no evidence that MySQL is buggier than many commercial DBMS products earlier in their product cycles. It's rather that MySQL took so long to get 5.1 out the door at the quality level they eventually achieved. Similarly, the MySQL engineering group is often very slow to address obvious bugs that the PostgreSQL community either avoided up front or would have addressed quickly once brought to light. This slowness is one of the key reasons for the fragmentation of the MySQL codeline that has occurred over the last year. I imagine the root cause lies in the way MySQL releases are managed, as code can generally be fixed if you have the will to do so.
It's really strange that the PostgreSQL devs do not believe in maintaining a bug database! Relying on someone to keep track of a bug by essentially wading through an archive of forum posts or e-mails seems to be pretty much useless.
Ha, there had to be at least one 'eh. Thanks for pointing it out.
Good point. I missed the perspective of mixing the paid customers with the community stuff in the same tracker, but I can see how that could be a good explanation for this happening.
Interesting, but I'd like to point out that bug 37937 includes the following:
[7 Nov 11:12] Georgi Kodinov
Bug #37936 is probably a duplicate of this bug
[7 Nov 11:46] Georgi Kodinov
Duplicate of bug #37936.
Maybe not as much dirt here as is being made out to be? This took me a whole 1 minute to find. I should probably mention that I've had more luck getting MySQL installations to do replication & 5 nines than I have with Postgres -- but I'm still a supporter of both. I feel both projects are useful, it all just goes back to using the right tool for the job.
I think Magnus has illustrated that a bug tracking system (like MySQL has) doesn't make everything magically better.
I am the packager of postgresql in Mandriva, and from my own experience, the bug tracker is just a big TODO list with a lot of opened case you're not always able to solve.
I don't know if MySQL has tests suite, but PostgreSQL has. And again, being also a developers, a test suite increase more the quality than a bugs tracker everyone can ignore, especially since the tests are added for each new bugs discovered, denying to it to back into the software.
Hi! I have access to BUG#37936 and it doesn't seem to have anything more than what is already there in BUG#37937. I'll try to make it public.
Yes, MySQL has a testsuite, it's quite big and is a part of the server source. It is also automatically re-run every time somebody has pushed some change into any (5.0/5.1/6.0) tree.
Excellent Analysis. Another thing I've noticed is that each 'major' release of PostgreSQL seems to have fewer and fewer bugfix releases, even accounting for time. This suggests the development process is improving...
New comments can no longer be posted on this entry.
the reason for the sharp drop in 8.3.5 can be explained by the rather short time between the 8.3.4 and the 8.3.5 release due to the GIST-Index related bug found after 8.3.4 was released. Maybe it would also be interesting to graph the number of commits to all branches over time as in commits/month or such.