Another couple of steps on my backup crusade

Posted on Jan 16, 2017 at 14:18. Tags: postgresql, backup, replication.

For a while now, I've been annoyed with how difficult it is to set up good backups in PostgreSQL. The difficulty of doing this "right" has pushed people to use things like pg_dump for backups, which is not really a great option once your database reaches any non-toy size. And when visiting customers over the years I've seen a large number of home-written scripts to do PITR backups, most of them broken, and most of that breakage because the APIs provided were too difficult to use.

Over some time, I've worked on a number of ways to improve this situation, alone or with others. The bigger steps are:

9.1 introduced pg_basebackup, making it easier to take base backups using the replication protocol
9.2 introduced transaction log streaming to pg_basebackup
9.6 introduced a new version of the pg_start_backup/pg_stop_backup APIs that are needed to do more advanced base backups, in particular using third party backup tools.

For 10.0, there are a couple of new things that have been done in the past couple of weeks:

Allow streaming transaction log in pg_basebackup tar format

This makes it possible to use tar format (which is required for supporting compressed data, as well as makes it a lot easier to deal with backups across multiple tablespaces) while still using streaming transaction log mode (which is much more likely to work on larger databases than the fetch mode that was the only one supported together with tar previously).
Make wal streaming the default mode in pg_basebackup

This means that the by default, pg_basebackup will create a backup that is independently useful, without the need for pg_receivexlog or archive_command to be configured. I generally recommend doing this even if you have archiving setup already, because having base backups being independently useful is still a valuable thing in this case. And by making this mode the default, it is a lot less likely to be forgotten.

If you don't like this and want your backups without the included wal (for example to save space), you can get the old behavior back by using -X none.
Change default values for backup and replication parameters

This changes the default value for wal_level (now replica), max_wal_senders (now 10) and max_replication_slots (now 10).

This means that backups using pg_basebackup and with third party tools using the pg_start_backup/pg_stop_backup APIs will now work out of the box without having to reconfigure the server.

It also means that the data loading optimizations that are available in wal_level=minimal are no longer enabled by default. If you have a workload where data loading performance (and that otherwise hits the requirements for these optimizations) is more important than being able to take a full backup or enable replication, you may want to change these parameters back.
Make pg_basebackup use temporary replication slots

Replication slots are used to prevent the wal on the server from being removed while a downstream replica or backup still needs it. pg_basebackup could use permanent replication slots for handling backups and replication setup, but this required a fair amount of external tooling to make it work for simple backups, as the feature was mainly designed for replication. For backups, there was a lot of scenarios that could end up with leftover replication slots preventing all wal rotation and eventually run the server out of disk space.

Thanks to Petr, we have temporary replication slots in 10.0. These are replication slots that are automatically dropped when the connection ends, both on normal exit and on error. Using this, pg_basebackup can now create such a slot when the backup starts and automatically have it dropped at the end, thus guaranteeing that the wal required to complete the backup will not be recycled while not risking leftover slots if something goes wrong.

In summary

There are some other changes as well around the backup areas of course, but these are the major ones so far - at least that I have been involved with. Hopefully this will make it significantly easier to get up to speed with proper backups in PostgreSQL 10.0. Particularly using the built-in tools, but also with a good foundation for the third party tools if necessary.

Hopefully, there will no longer be any excuses for not having proper backups! (I wish..)

Comments

Due to both the difficulty in setting up PITR, as well as the fact that PITR backups are practically guaranteed to be tainted by any kind of data corruption, I always recommend making pg_dump part of disaster recovery if at all possible. Checksums would solve #2; it would be nice for #1 to be solved as well. (pg_basebackp certainly goes a long way in that area)

Another significant hole is support for using SAN or filesystem snapshots to take a base backup. Since pg_basebackup doesn't support that, you're back to relying on getting a bunch of individual steps correct. It'd be nice to eventually add support for a custom backup command to pg_basebackup.

Posted on Jan 21, 2017 at 02:44 by Jim Nasby.

Personally I find pg_dump mostly useless for backups, because it can never deliver on reasonable restore times (which is really the important measure). And of course, also doesn't deliver PITR. But - if you can afford to do both, I always encourage it of course.

More people should really be running with checksums on to cover that part. Maybe I should add that to my crusade :)

Finally - I think that once you use the new APIs (that came in 9.6) for doing the base backups, integrating with things like SAN snapshots is a lot safer. In earlier versions, I fully agree with your comment on that.

Posted on Jan 21, 2017 at 11:37 by Magnus Hagander.

Add comment

New comments can no longer be posted on this entry.