A common example, when discussing why it's a good thing to be able to do PITR (Point In Time Recovery), is the scenario where somebody or something (an operator or a buggy application) dropped a table, and we want to recover to just before the table was dropped, keeping as much valid data as possible.
PostgreSQL comes with nice functionality for deciding exactly what point to recover to: the target can be specified at millisecond granularity, or as an individual transaction. But what if we don't know exactly when the table was dropped? ("Exactly" here meaning down to the specific transaction, or at least the millisecond.)
One way to handle that is to "step forward" through the log one transaction at a time until the table is gone. This is obviously very time-consuming.
If DROP TABLE is not something we do very frequently in our system, we can instead use the
pg_xlogdump tool to help us find the exact spot to recover to, in much less time. Unfortunately, the dropping of temporary tables (implicit or explicit) also shows up in this scan, so if your application uses a lot of temporary tables this approach will not work out of the box. But for applications without them, it can save a lot of time.
Let's show an example. This assumes you have already set up the system for log archiving, you have a base backup that you have restored, and you have a log archive.
The first thing we do is try to determine the point where a
DROP TABLE happened. We can do this by scanning for entries where rows have been deleted from the
pg_class table, as this will always happen as part of the drop.
In this example I will scan the entire log archive. In a system where the log archive is large, which is probably true of most production systems, this can take some time. In this case it may be a good idea to limit the amount of logs scanned by looking at the timestamps of the archive files that are close to the time of the drop (presumably you know approximately when the table was dropped).
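To sketch what that narrowing could look like, here is one way to select only the archive segments last modified around the suspected drop time, assuming GNU find and a flat archive directory. The directory, segment names, and timestamps below are invented for the demonstration; in real use you would point ARCHIVE at your actual log archive and skip the setup lines.

```shell
# Fake a small archive directory for the demonstration (made-up segments).
ARCHIVE=$(mktemp -d)
touch -d '2016-07-11 13:00' "$ARCHIVE/000000010000000000000060"
touch -d '2016-07-11 14:10' "$ARCHIVE/000000010000000000000061"
touch -d '2016-07-11 14:20' "$ARCHIVE/000000010000000000000062"
touch -d '2016-07-11 16:00' "$ARCHIVE/000000010000000000000063"

# Keep only segments last modified in the hour around the suspected drop.
CANDIDATES=$(find "$ARCHIVE" -maxdepth 1 -type f \
               -newermt '2016-07-11 14:00' ! -newermt '2016-07-11 15:00' \
             | xargs -rn1 basename | sort)
START=$(echo "$CANDIDATES" | head -1)
END=$(echo "$CANDIDATES" | tail -1)
echo "scan from $START to $END"
```

The resulting START and END can then be passed to pg_xlogdump in place of the first and last file of the whole archive.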
The simple shell script below will scan the log archives for the transaction ids of all tables dropped from the database with oid
12379 (in my example this is the postgres database):
DBOID=12379
ARCHIVE=/tmp/archive
START=$(ls -1 $ARCHIVE | grep -v partial | head -1)
END=$(ls -1 $ARCHIVE | grep -v partial | tail -1)
pg_xlogdump -r heap -p $ARCHIVE $START $END \
  | sed -n -r -e "s/.*tx:\s+([0-9]+), lsn.*desc: DELETE.*rel [0-9]+\/$DBOID\/1259 .*/\1/p"
The expression in the
sed command looks for any
DELETE from the heap (table) of a relation with
oid=1259, which is
pg_class, that occurred in the correct database, and just prints this to standard output. If multiple tables have been dropped within the period, it cannot identify which is which, and the recovery process will just have to try them all.
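To see what the sed expression actually matches, you can feed it a hand-written line shaped like a pg_xlogdump Heap record. The line below is synthetic and abbreviated, not real pg_xlogdump output, but it has the fields the regular expression keys on (the transaction id, the DELETE description, and a relation path ending in 1259):

```shell
DBOID=12379
# Synthetic input line, shaped like a pg_xlogdump Heap record (abbreviated).
LINE="rmgr: Heap  len (rec/tot): 54/54, tx:      50851, lsn: 0/06000098, prev 0/06000060, desc: DELETE off 3, blkref #0: rel 1663/12379/1259 blk 0"
XID=$(echo "$LINE" | sed -n -r -e "s/.*tx:\s+([0-9]+), lsn.*desc: DELETE.*rel [0-9]+\/$DBOID\/1259 .*/\1/p")
echo "$XID"
```

Only the captured transaction id is printed; any record that is not a DELETE against pg_class in the chosen database produces no output at all.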
In my example, it returned transaction id 50851.
Armed with this transaction id, I restored my base backup, and then created a
recovery.conf file that looked like this:
restore_command='cp /tmp/archive/%f %p'
recovery_target_xid=50851
recovery_target_inclusive=false
This tells the system to restore up to but not including transaction
50851. It is very important that we do not include this transaction as this would then drop the table again for us. Once this is done, the recovery ran for less than a second (this is a test system locally on SSD after all), and at the end of the log I got:
2016-07-11 14:17:51 CEST LOG:  starting point-in-time recovery to XID 50851
2016-07-11 14:17:51 CEST LOG:  restored log file "000000010000000000000060" from archive
2016-07-11 14:17:51 CEST LOG:  redo starts at 0/60000028
2016-07-11 14:17:51 CEST LOG:  consistent recovery state reached at 0/600000F8
2016-07-11 14:17:51 CEST LOG:  database system is ready to accept read only connections
2016-07-11 14:17:51 CEST LOG:  restored log file "000000010000000000000061" from archive
2016-07-11 14:17:51 CEST LOG:  restored log file "000000010000000000000062" from archive
2016-07-11 14:17:51 CEST LOG:  recovery stopping before commit of transaction 50851, time 2000-01-01 01:00:00+01
2016-07-11 14:17:51 CEST LOG:  recovery has paused
2016-07-11 14:17:51 CEST HINT:  Execute pg_xlog_replay_resume() to continue.
At this point I could connect to the system and see that my table exists:
postgres=# \dt mytesttable
          List of relations
 Schema |    Name     | Type  | Owner
--------+-------------+-------+-------
 public | mytesttable | table | mha
(1 row)
At this point, attempting to read from the table will hang, even though it exists:
postgres=# table mytesttable;
^CCancel request sent
ERROR:  canceling statement due to user request
This is because the recovery process was paused just before the transaction commits. The system has already replayed the actual
DROP TABLE command, meaning that the recovery process is holding an exclusive lock on the table, so we can't access it. The fact that it's blocking reads of the specific table we are trying to restore is an indication that we are looking at the correct transaction. We can view the state of these locks in the pg_locks view:
postgres=# SELECT locktype, pid, granted FROM pg_locks
postgres-#  WHERE relation=(SELECT oid FROM pg_class WHERE relname='mytesttable');
 locktype | pid  | granted
----------+------+---------
 relation | 1795 | t
(1 row)
The pid 1795 is not present in the
pg_stat_activity view, but we can see it in the operating system:
$ ps 1795
  PID TTY      STAT   TIME COMMAND
 1795 ?        Ss     0:00 postgres: startup process   recovering 000000010000000000000062
At this point, we can execute the
pg_xlog_replay_resume function to open up the system for access, and once that is done we can read our table:
postgres=# select pg_xlog_replay_resume();
 pg_xlog_replay_resume
-----------------------

(1 row)

postgres=# select count(*) from mytesttable;
 count
-------
  1400
(1 row)
Once the system is open, we can no longer change our mind and go back: the system has started up for normal operation, and has switched to a new timeline.
If we go back to a point right before we opened the system, we can try another approach. If, when the
TABLE query hangs, we instead edit the
recovery.conf file to say that we want to recover up to and including the specific transaction like this:
restore_command='cp /tmp/archive/%f %p'
recovery_target_xid=50851
recovery_target_inclusive=true
and then restart the recovery with
pg_ctl restart, we will recover including the drop of the table (
recovery_target_inclusive=true is the default, but I include it to be extra clear). At this point, we can connect to the system and see that our table is gone:
postgres=# \d mytesttable
Did not find any relation named "mytesttable".
Of course, that means that we have "gone too far", and need to go back one transaction. Unfortunately, there is no way to do that other than to restart the recovery from the beginning. When you do that, don't forget to
rm -rf the data directory you are restoring into (or use a tool such as rsync which will clear out extra files from it for you), or you may run into conflicting files later on.
If the table still exists after this change then it means that it was some other table being dropped, and we have to try again with another transaction id (the
pg_xlogdump step should have returned multiple transactions in this case).
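Each retry is the same dance: re-restore the base backup, write a recovery.conf targeting the next candidate transaction, and start recovery again. A sketch of the config-writing step (the data directory here is a throwaway placeholder, and 50851 stands in for whichever candidate xid you are trying next; restoring the backup and starting the server remain manual steps):

```shell
PGDATA=$(mktemp -d)    # placeholder for the freshly restored data directory
CANDIDATE_XID=50851    # next transaction id from the pg_xlogdump scan

cat > "$PGDATA/recovery.conf" <<EOF
restore_command='cp /tmp/archive/%f %p'
recovery_target_xid=$CANDIDATE_XID
recovery_target_inclusive=false
EOF

cat "$PGDATA/recovery.conf"
```

Keeping recovery_target_inclusive=false for each attempt means a successful guess leaves the table in place, exactly as in the first walkthrough above.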
Some PostgreSQL documentation references: