www.postgresql.org - brand new, yet old and familiar

Posted on Dec 21, 2011 at 13:33. Tags: django, intrastructure, pgweb, postgresql, python, varnish.

Most of the visitors to www.postgresql.org probably never noticed that a couple of weeks back, the entire site was replaced with a new one. In fact, we didn't just change the website - just days before, we made large changes to our ftp network as well (more about that in another post, from me or others). So in fact, we hope that most people didn't notice. The changes were mainly a technical refresh, and there hasn't been much change to the contents at all yet. We did sneak in a few content changes as well, that have been requested for a while, so I'm going to start with listing those:

The developer version of the documentation (updated several times per day from the tip of the HEAD branch that will eventually become the next version of PostgreSQL) now lives on the main website, and will use the same stylesheets to look a lot nicer than before.
Anybody who submits content to our site (news, events, professional services, products, etc) will notice there is now a new concept of an Organisation. This means that it will finally be possible to have more than one person manage the submissions from a single company or group.
Again for those that submit content, it is now possible to view which of your submissions are still in the moderation queue, and it's also possible to edit something after it's been submitted. In fact, you can edit your items even after they've been approved. Any such editing will be post-moderated, and if this is abused that organisation will be banned from post-moderation - but we don't expect that to ever be necessary.
And finally, for those that submit content again, we've switched to markdown to format your submissions, instead of a very random subset of allowed HTML tags.

The rest of the changes are under the hood, and it's mostly done for two reasons:

* The technology powering the site was simply very old
* The frameworks used were quite obscure, which severely limited the number of people who could (or wanted to) work with them

Hopefully these two changes will make it easier to contribute to the website, so if you're potentially interested in doing that, please read on!

So what did we actually change, and why

Let me start with a few points about how the old site worked:

* The main site was implemented in PHP, using HTML_Template_Sigma. This is, unfortunately, not a framework that is widely used these days, and it also has some tight dependencies between the PHP and the template side, very often making it necessary to change both the template and the code when you change something.
* We used to have issues with the entire site crashing or just not responding on "release days". This was partially due to the fact that the hosts our website ran on back in the days (and this is many years ago now) simply couldn't cope with the load, and partially because the above technology solution was simply slow and scaled very badly.
* We also had the hosting center the website was in drop off the network far too often, and this took our whole web presence down at once.
* To work around these problems, we used static mirrors. This was implemented using a script that spidered our entire site at regular intervals generating static pages, which were in turn served up by multiple frontend servers. Serving static pages at very high rates has never been a problem, so we could easily deal with release days using this technology. Again, this was something put in place many years ago, and it has served us well over time.
* For things that needed to be truly dynamic, such as submitting news/events, or getting an up-to-date list of mirrors, a special hostname wwwmaster.postgresql.org was used. This was on the same machine that the spider script used to render the pages.

Unfortunately, this had a number of rather serious drawbacks:

* If you needed to make a change to the site, it took a long time for it to hit the frontends. By default, we generated our static frontends every 2 hours.
* Regeneration took a long time. Most of the site went reasonably fast, in 5-10 minutes (thus the 2 hour interval). However, regenerating the documentation or ftp site structure could take much longer - more than 6 hours was not uncommon. This obviously wasn't good for performance or usability.
* Writing any kind of dynamic or semi-dynamic code had to jump through some interesting hoops to make it work properly with this mirroring scheme. Unfortunately, this made a "simple fix or feature for the website" turn into quite a lot of work, and probably contributed to the lack of updates happening.

So, we made a couple of changes to address these problems:

A new framework - Django

We got rid of our old template framework, and our own home-written meta-framework for processing the data, and went with one of the popular web frameworks that exist today (and didn't back when the technology choice for the old solution was made). Our drug of choice is Django, which of course runs on top of Python. Our website isn't particularly advanced, and in particular the existence of the built-in administration interface was a tipping point for us choosing Django over other similar modern frameworks.

Our static pages are still basically templates stored in the filesystem, making editing of them a simple operation of a text editor + version control (in the new site, git, just like the main PostgreSQL repositories).

For dynamic views, we're still trying to keep things simple. Even though we are database people, and stereotypically are supposed to hate everything that is ORM-related, the majority of the database access is through the Django ORM framework. There are a few very specific places where we use direct database access, but we try to keep things as simple as possible for the rest.

We have tried to keep the URL space much the same as on the old site - and in the case where URLs have changed it's mostly syntax-wise (e.g. new way of specifying the location of a news article). When they have changed, we have generally put redirects in place from the old URL. There are still thousands of links out there that point to three generations old versions of the website though, and we haven't attempted to cover any of that.

A new frontend - Varnish

Most likely, the Django framework running on top of the newer hosting platforms that we have in production now could cope with our load without anything in front of it. Certainly on ordinary days, though it might struggle on release days. But we would still have the problem of our entire web presence going away if it goes down - and we don't have the manpower to have people who can start working on fixing such a problem right away if it happens. Thus, we wanted to keep the property of the old deployment that had multiple frontends running on different servers hosted with different providers - but we wanted to do this while getting rid of the problems of multi-hour updates.

The solution we chose to deploy this on is Varnish. Varnish is what some people call a web accelerator. What that means is that it's a caching reverse proxy - all our HTTP traffic hits Varnish, which will then either serve the request from its local cache, or pass the request on to the backend (Django) and then potentially cache that result for the next user. There are two main reasons we chose Varnish over the other options in the field (it has many other advantages, but such things as super-high-performance aren't things we really need):

A very flexible (regular expression based) and fast way to do forced expiry from the cache. This lets us do things like cache a news article for hours or days by default, but make a callback from our Django code that removes the cached copy as soon as it's updated in the application. This way, we can serve the content with a very long cache time, yet not have to wait hours for an update to come through.
Something called grace mode. This means that if our backend goes down (the scenario we're trying to protect against), our Varnish frontends will keep serving requests from the cache - even if the object has expired in the cache.

This means that we keep the property of surviving backend loss for those parts of the site that are commonly navigated. If they've never had a chance to enter the cache on the frontend, we will obviously not be able to serve it from the frontend if the backend is down. But most of the popular parts of the site (front page, news, events, documentation, downloads, etc) are likely always going to be in the cache.
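
To make those two properties a bit more concrete, here is a minimal sketch of the behaviour in plain Python. This is a toy in-memory model and not how Varnish is implemented or configured: the class name `GraceCache`, its methods and the example URLs are all made up for illustration, and unlike a real deployment the grace period here is unbounded.

```python
import re
import time


class GraceCache:
    """Toy model of a caching reverse proxy with regex purge and grace mode.

    Hypothetical illustration only; the names are not Varnish's API.
    """

    def __init__(self, fetch_from_backend, clock=time.monotonic):
        self._fetch = fetch_from_backend   # callable(url) -> body, may raise
        self._clock = clock                # injectable so expiry can be tested
        self._store = {}                   # url -> (body, expires_at)

    def get(self, url, ttl=3600):
        now = self._clock()
        entry = self._store.get(url)
        if entry is not None and entry[1] > now:
            return entry[0]                # fresh hit, backend not contacted
        try:
            body = self._fetch(url)
        except Exception:
            if entry is not None:
                return entry[0]            # grace: serve the expired copy
            raise                          # never cached, nothing to serve
        self._store[url] = (body, now + ttl)
        return body

    def purge(self, pattern):
        """Drop every cached URL matching the regular expression."""
        regex = re.compile(pattern)
        doomed = [url for url in self._store if regex.search(url)]
        for url in doomed:
            del self._store[url]
        return len(doomed)
```

The application can then use a long default cache time and call something like `cache.purge(r"^/about/news/")` when a news item is edited, while a backend outage only affects pages that were never cached in the first place.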

Some glue

To glue this together, there are a couple of specific things in the environment:

* All requests for logged in content, submission, etc, are done directly on the backend server. This is all handled over SSL, and we simply tunnel SSL from the frontends to the backend. This means the old wwwmaster.postgresql.org namespace no longer exists, and that all logged in actions are SSL protected.
* Each page (or view in Django terms) can specify its own cache time, through a simple decorator in the code. This will get passed by our framework to Varnish, which will then set the cache time accordingly.
* All cache purge requests generated by the Django applications are submitted to a pgq queue (part of the Skytools addon package for PostgreSQL). This queue has one consumer for each frontend, which makes certain that the purges are processed on all frontends even if they are not necessarily available at the very moment that the purge request is generated.
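
The last two points can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual pgweb code: the `Response` class stands in for a framework response object, the `cache` decorator and the `news_item` view are hypothetical names, passing the cache time as a `Cache-Control: s-maxage` header is an assumption about the mechanism (the post only says the time gets passed to Varnish), and `PurgeQueue` is an in-memory toy that imitates the "one consumer per frontend" idea rather than using pgq.

```python
import functools
from collections import deque
from datetime import timedelta


class Response:
    """Hypothetical stand-in for a framework response object."""

    def __init__(self, body, status_code=200):
        self.body = body
        self.status_code = status_code
        self.headers = {}


def cache(**duration):
    """Mark a view's successful responses as cacheable for `duration`.

    Assumption: the cache time travels to the proxy in an s-maxage header.
    """
    seconds = int(timedelta(**duration).total_seconds())

    def decorator(view):
        @functools.wraps(view)
        def wrapper(request, *args, **kwargs):
            response = view(request, *args, **kwargs)
            if response.status_code == 200:   # do not mark errors cacheable
                response.headers["Cache-Control"] = f"s-maxage={seconds}"
            return response
        return wrapper
    return decorator


@cache(hours=4)
def news_item(request, item_id):
    return Response(f"news item {item_id}")


class PurgeQueue:
    """Toy queue that keeps a separate backlog for every frontend."""

    def __init__(self, frontends):
        self._pending = {name: deque() for name in frontends}

    def publish(self, pattern):
        # Every frontend gets its own copy of the purge request.
        for backlog in self._pending.values():
            backlog.append(pattern)

    def drain(self, frontend, send_purge):
        """Deliver pending purges in order; keep the rest if delivery fails."""
        backlog = self._pending[frontend]
        delivered = 0
        while backlog:
            try:
                send_purge(backlog[0])
            except ConnectionError:
                break                         # frontend unreachable, retry later
            backlog.popleft()
            delivered += 1
        return delivered
```

Because each frontend has its own backlog, a frontend that is unreachable when a purge is published simply receives it the next time its consumer manages to connect, without holding up the others.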

In summary, and what you can do to contribute

There is obviously a lot more technical detail to be had about this change. You can find the canonical source code repository on git.postgresql.org, with a mirror on github. There is a subdirectory doc/ that contains a lot more technical details. As with all such documentation it's of course not complete, but it's a good start - and patches are always welcome to make it even better!

Speaking of patches - given that we are now in full production, we'd definitely appreciate more contributors. And at any level - from just updating the content of the site, to writing code for new features (or bugfixes, of course). While some people (you know who you are) have claimed that hordes of Django developers will now drop everything and start working on postgresql.org, we have no such illusions. But we do hope that this more modern framework will make it possible to get some more outside contributions.

So if you're interested, join us on the pgsql-www mailing list, or in the #postgresql IRC channel on FreeNode.

Comments

Thanks for all your hard work! (And that goes for the rest of the team, too.)

Posted on Dec 21, 2011 at 14:13 by gabrielle.

you know your upgrade is a success when nobody notices anything :)

kudos to the webmaster team !

Posted on Dec 21, 2011 at 22:20 by daamien.

The page with the mail archive is no longer working:

http://www.postgresql.org/community/

Shows me "An internal server error occured."

Posted on Dec 22, 2011 at 11:14 by Hans.

I didn't see that when I checked, but there was a bug that caused it to cache the internal errors for several hours, instead of 10 seconds. This is probably what happened to you - and the error was temporarily returned when we restarted the backend database due to an upgrade.

Posted on Dec 22, 2011 at 14:08 by Magnus Hagander.

The weekly news page (http://www.postgresql.org/community/weeklynews/) seems to be broken. I see only a couple of articles from 2005.

Posted on Dec 22, 2011 at 16:55 by Dave.

Yeah, unfortunately we are still waiting on David to migrate the content over there.

Posted on Dec 22, 2011 at 17:41 by Magnus Hagander.
