In a previous post I discussed a way to find out which transaction dropped a table by examining the transaction log, in order to set a restore point to right before the table was dropped.
But what if we have the luxury of planning ahead (right? Well, let's call it the second time it happens?). Shouldn't we be able to log which transaction dropped a table, and use that? Of course we should.
The first thing one tries is then of course something like this in postgresql.conf:
log_statement='ddl'
log_line_prefix = '%t [%u@%d] <%x> '
to include the transaction id in each log line. Unfortunately:
2017-02-12 12:16:39 CET [mha@postgres] <0> LOG: statement: drop table testtable;
The 0 as a transaction id indicates that this command was run in a virtual transaction, and did not have a real transaction id. The reason for this is that the statement logging happens before the statement has actually been assigned a real transaction id. For example, if I instead drop two tables, and do so in a transaction:
postgres=# BEGIN;
BEGIN
postgres=# DROP TABLE test1;
DROP TABLE
postgres=# DROP TABLE test2;
DROP TABLE
postgres=# COMMIT;
COMMIT
I get this interesting output:
2017-02-12 12:17:43 CET [mha@postgres] <0> LOG: statement: DROP TABLE test1;
2017-02-12 12:17:45 CET [mha@postgres] <156960> LOG: statement: DROP TABLE test2;
Which shows two different transaction ids (one real and one not) for statements in the same transaction. That's obviously not the case - both tables were dropped by transaction 156960; the transaction id just wasn't assigned yet when the first statement was logged.
So what can we do about that? Event triggers to the rescue!
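A minimal sketch of what that can look like (the function and trigger names here are made up for illustration): an event trigger on sql_drop fires inside the transaction that performs the drop, so at that point txid_current() returns the real transaction id and we can write it to the log together with the dropped objects.

CREATE FUNCTION log_dropped_objects() RETURNS event_trigger AS $$
DECLARE
    obj record;
BEGIN
    -- List everything dropped by the current command, along with the real xid
    FOR obj IN SELECT * FROM pg_event_trigger_dropped_objects() LOOP
        RAISE LOG 'drop of % % in transaction %',
            obj.object_type, obj.object_identity, txid_current();
    END LOOP;
END;
$$ LANGUAGE plpgsql;

CREATE EVENT TRIGGER log_drops ON sql_drop
    EXECUTE PROCEDURE log_dropped_objects();

With something like that in place, the server log contains the real transaction id for every drop, independent of what log_line_prefix happened to capture.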
For a while now, I've been annoyed with how difficult it is to set up good backups in PostgreSQL. The difficulty of doing this "right" has pushed people to use things like pg_dump for backups, which is not really a great option once your database reaches any non-toy size. And when visiting customers over the years I've seen a large number of home-written scripts to do PITR backups, most of them broken, and most of that breakage caused by APIs that were too difficult to use.
Over some time, I've worked on a number of ways to improve this situation, alone or with others. The bigger steps are:
For 10.0, there are a couple of new things that have been done in the past couple of weeks:
A common example when talking about why it's a good thing to be able to do PITR (Point In Time Recovery) is the scenario where somebody or something (an operator or a buggy application) dropped a table, and we want to recover to right before the table was dropped, to keep as much valid data as possible.
PostgreSQL comes with nice functionality for deciding exactly what point to recover to, which can be specified at millisecond granularity, or as an individual transaction.
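That target goes into recovery.conf on the restored base backup. A minimal sketch, with made-up paths and values:

restore_command = 'cp /mnt/wal_archive/%f %p'
recovery_target_time = '2017-02-12 12:15:00 CET'
# or, when the exact transaction is known:
# recovery_target_xid = '156960'
# recovery_target_inclusive = false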
But what if we don't know exactly when the table was dropped? ("Exactly" meaning a specific transaction, or at least a specific millisecond.) One way to handle that is to "step forward" through the log one transaction at a time until the table is gone. This is obviously very time-consuming.
Assuming that DROP TABLE is not something we do very frequently in our system, we can also use the pg_xlogdump tool to help us find the exact spot to perform the recovery to, in much less time. Unfortunately, the dropping of temporary tables (implicit or explicit) is included in this, so if your application uses a lot of temporary tables this approach will not work out of the box. But for applications without them, it can save a lot of time.
Let's show an example. This assumes you have already set up the system for log archiving, you have a base backup that you have restored, and you have a log archive.
The first thing we do is try to determine the point where a DROP TABLE happened. We can do this by scanning for entries where rows have been deleted from the pg_class table, as this will always happen as part of the drop.
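A rough sketch of such a scan (the archive path and WAL segment names are made up, and 1259 is the default relfilenode of pg_class, which will differ if that catalog has ever been rewritten):

# Look for heap DELETE records against pg_class in the archived WAL
pg_xlogdump -p /mnt/wal_archive -r Heap \
    000000010000000000000010 00000001000000000000001F \
    | grep DELETE | grep '/1259 '

Each matching record includes the transaction id (tx:) and its LSN, and the COMMIT record of that transaction carries the commit timestamp, so from there either recovery_target_xid or recovery_target_time can be set.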
Today I committed the first version of a new PostgreSQL tool, pg_basebackup. The backend support was committed a couple of weeks back, but this is the first actual frontend.
The goal of this tool is to make base backups easier to create, because they are unnecessarily complex in a lot of cases. Base backups are also used as the foundation for setting up streaming replication slaves in PostgreSQL, so the tool will be quite useful there as well. The most common way of taking a base backup today is something like (don't run this straight off, it's not tested, there are likely typos):
# Tell the server a base backup is starting
psql -U postgres -c "SELECT pg_start_backup('base backup')"
if [ "$?" != "0" ]; then
    echo Broken
    exit 1
fi
# Copy the data directory, excluding the transaction log
tar cfz /some/where/base.tar.gz /var/lib/pgsql/data --exclude "*pg_xlog*"
if [ "$?" != "0" ]; then
    echo Broken
    psql -U postgres -c "SELECT pg_stop_backup()"
    exit 1
fi
# Tell the server the backup is finished
psql -U postgres -c "SELECT pg_stop_backup()"
if [ "$?" != "0" ]; then
    echo Broken
    exit 1
fi
And when you're setting up a replication slave, it might look something like this:
# Tell the master a base backup is starting (the second argument requests a fast checkpoint)
psql -U postgres -h masterserver -c "SELECT pg_start_backup('replication base', 't')"
if [ "$?" != "0" ]; then
    echo Broken
    exit 1
fi
# Pull the data directory from the master
rsync -avz --delete --progress postgres@masterserver:/var/lib/pgsql/data /var/lib/pgsql
if [ "$?" != "0" ]; then
    echo Broken
    psql -U postgres -h masterserver -c "SELECT pg_stop_backup()"
    exit 1
fi
# Tell the master the backup is finished
psql -U postgres -h masterserver -c "SELECT pg_stop_backup()"
if [ "$?" != "0" ]; then
    echo Broken
    exit 1
fi
There are obvious variations - for example, I come across a lot of cases where people don't bother checking exit codes. Particularly for the backups, this is really dangerous.
Now, with the new tool, both these cases become a lot simpler:
pg_basebackup -U postgres -D /some/where -Ft -Z9
That simple. -Ft makes the system write the output as a tarfile (actually, multiple tar files if you have multiple tablespaces, something the "old style" examples up top don't take into account). -Z9 enables gzip compression at level 9. The rest should be obvious...
In the second example - replication - you don't want a tarfile, and you don't want it on the same machine. Again, both are easily handled:
pg_basebackup -U postgres -h masterserver -D /var/lib/pgsql/data
That's it. You can also add -P to get a progress report (which you can normally not get out of tar or rsync, except on an individual file basis), and a host of other options.
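For example, adding a progress report to the replication-style command above is just a matter of appending the flag:

pg_basebackup -U postgres -h masterserver -D /var/lib/pgsql/data -P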
This is not going to be a tool that suits everybody. The current method is complex, but it is also fantastically flexible, letting you set things up in very environment-specific ways. That is why we are absolutely not removing any of the old ways; this is just an additional way to do it.
If you grab a current snapshot, you will have the tool available in the bin directory, and it will of course also be included in the next alpha version of 9.1. Testing and feedback are much appreciated!
There are obviously things left to do to make this even better. A few of the things being worked on are:
* Ability to run multiple parallel base backups. Currently, only one is allowed, but this is mainly a restriction based on the old method. Heikki Linnakangas has already written a patch that does this, that's just pending some more review.
* Ability to include all the required xlog files in the dump, in order to create a complete "full backup". Currently, you still need to set up log archiving for full Point In Time Recovery, even if you don't really need it. We hope to get rid of this requirement before 9.1.
* Another option is to stream the required transaction logs during the backup, not needing to include them in the archive at all. This is less likely to hit until 9.2.
* The ability to switch WAL level as necessary. For PITR or replication to work, wal_level must be set to archive or hot_standby, and changing this requires a restart of the server. The hope is to eventually be able to bump this from the default (minimal) at the start of the backup, and turn it back down when the backup is done. This is definitely not on the radar until 9.2 though.