
Another step towards easier backups

Today I committed the first version of a new PostgreSQL tool, pg_basebackup. The backend support was committed a couple of weeks back, but this is the first actual frontend.

The goal of this tool is to make base backups easier to create, because today the process is unnecessarily complex in a lot of cases. Base backups are also the foundation for setting up streaming replication slaves in PostgreSQL, so the tool will be quite useful there as well. The most common way of taking a base backup today looks something like this (don't run it as-is; it's untested and likely contains typos):

psql -U postgres -c "SELECT pg_start_backup('base backup')"
if [ "$?" != "0" ]; then
   echo Broken
   exit 1
fi
tar czf /some/where/base.tar.gz --exclude="*pg_xlog*" /var/lib/pgsql/data
if [ "$?" != "0" ]; then
   echo Broken
   psql -U postgres -c "SELECT pg_stop_backup()"
   exit 1
fi
psql -U postgres -c "SELECT pg_stop_backup()"
if [ "$?" != "0" ]; then
   echo Broken
   exit 1
fi

And when you're setting up a replication slave, it might look something like this:

psql -U postgres -h masterserver -c "SELECT pg_start_backup('replication base', 't')"
if [ "$?" != "0" ]; then
   echo Broken
   exit 1
fi
rsync -avz --delete --progress postgres@masterserver:/var/lib/pgsql/data /var/lib/pgsql
if [ "$?" != "0" ]; then
   echo Broken
   psql -U postgres -h masterserver -c "SELECT pg_stop_backup()"
   exit 1
fi
psql -U postgres -h masterserver -c "SELECT pg_stop_backup()"
if [ "$?" != "0" ]; then
   echo Broken
   exit 1
fi

There are obvious variations. For example, I come across a lot of cases where people don't bother checking exit codes at all. Particularly for backups, this is really dangerous: a failed step goes unnoticed, and you may only find out when you try to restore.
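One way to make the exit-code handling harder to forget is to wrap the start and stop calls around whatever copy command you use, so that pg_stop_backup() always runs on the same server as pg_start_backup() once backup mode has been entered. This is an untested sketch in the same spirit as the scripts above; the function name with_backup_mode and its argument order are hypothetical, not part of any PostgreSQL tool:

```shell
#!/bin/sh
# Hypothetical helper: with_backup_mode HOST LABEL COMMAND [ARGS...]
# Puts the server on HOST into backup mode, runs COMMAND (tar, rsync, ...),
# and always calls pg_stop_backup() on that same server afterwards.
# LABEL is interpolated into SQL as-is, so keep it free of quotes.
# Returns non-zero if any of the three steps failed.
with_backup_mode() {
    host=$1
    label=$2
    shift 2

    if ! psql -U postgres -h "$host" -c "SELECT pg_start_backup('$label')"; then
        echo "Broken: pg_start_backup() failed on $host" >&2
        return 1
    fi

    # Run the copy step and remember its exit status.
    "$@"
    copy_status=$?

    # Leave backup mode no matter how the copy step went.
    psql -U postgres -h "$host" -c "SELECT pg_stop_backup()"
    stop_status=$?

    if [ "$copy_status" -ne 0 ]; then
        echo "Broken: copy step exited with status $copy_status" >&2
        return 1
    fi
    if [ "$stop_status" -ne 0 ]; then
        echo "Broken: pg_stop_backup() failed on $host" >&2
        return 1
    fi
    return 0
}
```

The replication case would then be something like `with_backup_mode masterserver 'replication base' rsync -avz --delete --progress postgres@masterserver:/var/lib/pgsql/data /var/lib/pgsql || exit 1`, with the tar variant following the same pattern.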

Now, with the new tool, both these cases become a lot simpler:

pg_basebackup -U postgres -D /some/where -Ft -Z9

It's that simple. -Ft makes the system write the output as a tarfile (actually, multiple tar files if you have multiple tablespaces, something the "old style" examples up top don't take into account). -Z9 enables gzip compression, at level 9. The rest should be obvious...
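Even with a single command, the exit code still deserves a check before the backup is trusted. A minimal sketch, assuming the same /some/where target directory as above; the wrapper name run_basebackup is hypothetical, and it passes only the flags already shown:

```shell
#!/bin/sh
# Hypothetical wrapper around the single pg_basebackup call above.
# -Ft writes tar output (one file per tablespace), -Z9 gzips it.
run_basebackup() {
    target=${1:-/some/where}

    if ! pg_basebackup -U postgres -D "$target" -Ft -Z9; then
        echo "Broken: pg_basebackup failed" >&2
        return 1
    fi

    # List every tar file written, so extra tablespace archives are visible.
    for f in "$target"/*.tar*; do
        # If the glob matched nothing, it stays unexpanded and -e fails.
        [ -e "$f" ] || { echo "Broken: no tar files in $target" >&2; return 1; }
        echo "wrote $f"
    done
    return 0
}
```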

In the second example, replication, you don't want a tarfile, and you don't want the output on the master itself; you want it pulled over to the slave. Again, both are easily handled:

pg_basebackup -U postgres -h masterserver -D /var/lib/pgsql/data

That's it. You can also add -P to get a progress report (which you normally can't get out of tar or rsync, except on a per-file basis), and there is a host of other options.
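In the replication case, pg_basebackup writes straight into the -D directory, which needs to be empty or not exist yet. A sketch of a small pre-flight check before pulling from the master with a progress report; the function name clone_slave is hypothetical, and the host and path defaults are simply the ones from the example above:

```shell
#!/bin/sh
# Hypothetical wrapper: clone the master into an empty local data directory.
clone_slave() {
    master=${1:-masterserver}
    datadir=${2:-/var/lib/pgsql/data}

    # Refuse to run if the target directory already contains anything.
    if [ -d "$datadir" ] && [ -n "$(ls -A "$datadir")" ]; then
        echo "Broken: $datadir is not empty" >&2
        return 1
    fi

    # -P prints a progress report while the base backup is transferred.
    if ! pg_basebackup -U postgres -h "$master" -D "$datadir" -P; then
        echo "Broken: pg_basebackup from $master failed" >&2
        return 1
    fi
    return 0
}
```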

This is not going to be a tool that suits everybody. The current method is complex, but it is also fantastically flexible, letting you set things up in very environment-specific ways. That is why we are absolutely not removing any of the old ways; this is just an additional way to do it.

If you grab a current snapshot, you will have the tool available in the bin directory, and it will of course also be included in the next alpha version of 9.1. Testing and feedback are much appreciated!

There are obviously things left to do to make this even better. A few of the things being worked on are:

* Ability to run multiple parallel base backups. Currently, only one is allowed, but this is mainly a restriction inherited from the old method. Heikki Linnakangas has already written a patch that does this; it's just pending some more review.
* Ability to include all the required xlog files in the dump, in order to create a complete "full backup". Currently, you still need to set up log archiving as you would for full Point In Time Recovery, even if you don't really need PITR. We hope to get rid of this requirement before 9.1.
* Another option is to stream the required transaction logs during the backup, so they don't need to be included in the archive at all. This is unlikely to land before 9.2.
* The ability to switch WAL level as necessary. For PITR or replication to work, wal_level must be set to archive or hot_standby, and changing it requires a server restart. The hope is to eventually be able to raise it from the default (minimal) at the start of the backup, and lower it again when the backup is done. This is definitely not on the radar until 9.2, though.
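Until WAL level can be switched on the fly, a backup or replication script can at least fail early when wal_level is still at the default. A sketch, assuming a psql connection like the ones in the examples above; the function name check_wal_level is hypothetical, and it relies only on `SHOW wal_level` plus psql's -A (unaligned) and -t (tuples only) output options:

```shell
#!/bin/sh
# Hypothetical pre-flight check: succeed only if wal_level allows
# PITR or replication (archive or hot_standby in this release).
check_wal_level() {
    host=${1:-localhost}

    # -At prints just the bare value, e.g. "minimal".
    level=$(psql -U postgres -h "$host" -At -c "SHOW wal_level") || {
        echo "Broken: could not query wal_level on $host" >&2
        return 1
    }

    case "$level" in
        archive|hot_standby)
            return 0 ;;
        *)
            echo "wal_level is '$level' on $host; set it to archive or hot_standby and restart" >&2
            return 1 ;;
    esac
}
```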
