Locating the recovery point just before a dropped table

A common example used when discussing why PITR (Point In Time Recovery) is a good thing is the scenario where somebody, or something (an operator or a buggy application), dropped a table, and we want to recover to just before the table was dropped, to keep as much valid data as possible.

PostgreSQL comes with nice functionality for deciding exactly what point to recover to: the target can be specified at millisecond granularity, or as an individual transaction. But what if we don't know exactly when the table was dropped ("exactly" meaning a specific transaction, or at least a specific millisecond)?
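
For reference, the recovery target goes in recovery.conf; a minimal sketch of one (the restore path, xid and timestamp here are all made up):

restore_command = 'cp /path/to/archive/%f %p'
# stop just before this transaction commits...
recovery_target_xid = '1029'
recovery_target_inclusive = false
# ...or alternatively just before a specific point in time
#recovery_target_time = '2016-01-15 14:21:03.123'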

One way to handle that is to "step forward" through the log one transaction at a time until the table is gone. This is obviously very time-consuming.

Assuming that DROP TABLE is not something we do very frequently in our system, we can also use the pg_xlogdump tool to find the exact spot to recover to, in much less time. Unfortunately, drops of temporary tables (implicit or explicit) show up the same way, so if your application uses a lot of temporary tables this approach will not work out of the box. But for applications without them, it can save a lot of time.

Let's show an example. This assumes you have already set up the system for log archiving, you have a base backup that you have restored, and you have a log archive.

The first thing we do is try to determine the point where a DROP TABLE happened. We can do this by scanning for entries where rows have been deleted from the pg_class table, as this will always happen as part of the drop.
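
pg_class has OID 1259 and (unless the catalog has been rewritten) the same relfilenode, so we can filter heap DELETE records touching it. A sketch of the idea, with made-up WAL segment names and the output abbreviated:

$ pg_xlogdump --rmgr=Heap 000000010000000000000003 000000010000000000000008 \
    | grep DELETE | grep 1259
rmgr: Heap  len (rec/tot): 54/54, tx: 1029, lsn: 0/03000640, desc: DELETE ..., blkref #0: rel 1663/16384/1259 blk 0

The tx value of the offending record is then a candidate for recovery_target_xid above.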


www.postgresql.org is now https only

We've just flipped the switch on www.postgresql.org to be served on https only. This has been done for a number of reasons:

  • In response to popular request
  • Google, and possibly other search engines, have started to give higher scores to sites on https, and we were previously redirecting https accesses to cleartext
  • Simplification of the site code, which no longer has to keep track of which pages need to be secure and which do not
  • Prevention of evil things like WiFi hotspot providers injecting ads or JavaScript into the pages

We have not yet enabled HTTP Strict Transport Security, but will do so in a couple of days once we have verified all functionality. We have also not enabled HTTP/2 yet; that will probably come at a future date.

Please help us out with testing this, and let us know if you find something that's not working, by emailing the pgsql-www mailing list.
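
A quick way to check from the command line is to request the old http URL and verify the redirect; something like this (output abbreviated, exact headers may differ):

$ curl -sI http://www.postgresql.org/
HTTP/1.1 301 Moved Permanently
Location: https://www.postgresql.org/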

There are still some other postgresql.org websites that are not available over https, and we will be working on those as well over the coming weeks or months.

JSON field constraints

After giving my presentation at ConFoo this year, I had some discussions with a few people about the ability to put constraints on JSON data, and whether any of the advanced PostgreSQL constraints work for that. Or in short, can we get the benefits from both SQL and NoSQL at the same time?

My general response to questions like this when it comes to PostgreSQL is "if you think there's a chance it works, it probably does", and it turns out that applies in this case as well.

For things like UNIQUE keys and CHECK constraints it's fairly trivial, but there are also things like EXCLUSION constraints where there are some special constructs that need to be considered.

Other than the technical side of things, it's of course also a question of "should we do this". The more constraints that are added to the JSON data, the less "schemaless" it is. On the other hand, many other databases that have schemaless/dynamic schemas as their main selling point still require per-key indexes and constraints (unlike PostgreSQL, where JSONB is actually schemaless even when indexed).

Anyway, back on topic. Keys and constraints on JSON data.

In PostgreSQL, keys and constraints can be defined both on regular columns and directly on any expression, as long as that expression is immutable (meaning that the output depends only on the input, and not on any outside state). This functionality works just as well with JSONB.

So let's start with a standard JSONB table:

postgres=# CREATE TABLE jsontable (j jsonb NOT NULL);
CREATE TABLE

postgres=# CREATE INDEX j_idx ON jsontable USING gin(j jsonb_path_ops);
CREATE INDEX

Of course, declaring a table like this - a single table with just a JSONB field - is very seldom a good idea in reality. You probably know more about your data than that, so there will be other fields in the table besides the JSONB one. But this table will suffice for our example.

A standard gin index using jsonb_path_ops is how we get fully schemaless indexing in JSONB with maximum performance. We're not actually going to use this index in the examples below, but in real deployments it's likely one of the main reasons to use JSONB in the first place.
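
As a quick aside, a containment query is the kind of thing jsonb_path_ops is built for; with the index in place, something like this can be answered without scanning the whole table (the table is still empty at this point):

postgres=# SELECT * FROM jsontable WHERE j @> '{"uuid": "4e9cf085-09a5-4b4f-bc99-bde2d2d51f41"}';
 j
---
(0 rows)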

To illustrate the constraints, let's add some data representing some sort of bookings. Yes, this would be much better represented as relational, but for the sake of example we'll use JSON with a semi-fixed schema. We'll also use a uuid in the JSON data as some sort of key, as this is fairly common in these scenarios.

postgres=# INSERT INTO jsontable (j) VALUES ($${
  "uuid": "4e9cf085-09a5-4b4f-bc99-bde2d2d51f41",
  "start": "2015-03-08 10:00",
  "end": "2015-03-08 11:00",
  "title": "test"
}$$);
INSERT 0 1
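
With data in place, a UNIQUE constraint on the uuid key is simply a unique index over an immutable expression; a sketch of the idea (the full post goes into more detail):

postgres=# CREATE UNIQUE INDEX j_uuid_idx ON jsontable (((j->>'uuid')::uuid));
CREATE INDEX

postgres=# INSERT INTO jsontable (j) VALUES ($${"uuid": "4e9cf085-09a5-4b4f-bc99-bde2d2d51f41", "title": "test2"}$$);
ERROR:  duplicate key value violates unique constraint "j_uuid_idx"

The same pattern works for CHECK constraints; EXCLUSION constraints need a bit more care, since for example the text-to-timestamp cast is not immutable.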


A new version of Planet PostgreSQL

I have just pushed code for a new version of the codebase for Planet PostgreSQL.

For those of you who are just reading Planet, hopefully nothing at all should change. There will probably be some bugs early on, but there are no general changes in functionality. If you notice something that is wrong (allow at least a couple of hours from this post), please send an email to planet(at)postgresql.org and we'll look into it!

For those who have your blogs aggregated at Planet PostgreSQL, there are some larger changes. In particular, you will notice the whole registration interface has been re-written. Hopefully that will make it easier to register blogs, and also to manage the ones you have (such as removing a post that needs to be hidden). The other major change is that Planet PostgreSQL will now email you whenever something has been fetched from your blog, to help you catch configuration mistakes earlier.

By far the largest changes are in the moderation and administration backend. This will hopefully lead to faster processing of blog submissions, and less work for the moderators.

New committer for PostgreSQL

The PostgreSQL core team are happy to welcome Dean Rasheed as a new committer to the project.

Dean has been involved with patch review and authorship for a while, and will now get the extra workload of also pushing other patches through the final hurdles!

Nordic PGDay 2016 open for registration

Nordic PGDay is a yearly one-day conference in the Nordic region, and this year it will be held in Helsinki, Finland, on March 17th. We have just opened up for registration! The event sold out in the past two years, so get your tickets early!

The call for papers is still open, which means that we do not yet have a schedule. We hope to post this as soon as possible after the CfP closes, but you can as usual count on a full day of interesting talks about PostgreSQL!

And for those of you who are interested in speaking, yes, the CfP is open until the end of this week. Get your submissions in!

PGConf.EU 2014 - feedback is in

It's that time of the year again - we've wrapped PGConf.EU 2014, and I've just closed the feedback system, so it's time to take a look at what's been said.

We're keeping fairly consistent numbers with previous years, which is something we are definitely happy with. We did have a slight drop in "overall view": this year 8% ranked us a 3, a worse score than last year, and a couple fewer people voted 5. There was also a slight shift from 5 to 4 on the programme. The numbers are still good of course, but since we had a tiny drop last year as well, we need to step our game back up for next year!

This year we had a slightly bigger spread in how attendees identify themselves, with most categories chipping away a little at DBAs and Developers, but those are still definitely the dominating categories. We also have a lot of returning attendees - it's cool to see so many people who have been to every one of our events so far, combined with a full 25% being first-time attendees!


A few short notes about PostgreSQL and POODLE

The POODLE attack on https (the attack is on https, but the vulnerability is in SSL - an important distinction) has received a lot of media attention lately, so I figured a (very) short writeup was necessary.

The TL;DR version is: you don't have to worry about POODLE for your PostgreSQL connections when using SSL.

The slightly longer version can be summarized by:

  • The PostgreSQL libpq client in all supported versions will only connect with TLSv1, which is not vulnerable.
  • The PostgreSQL server prior to the upcoming 9.4 version will however respond in SSLv3 (which is the vulnerable version) if the client insists on it (which a third party client can do).
  • To exploit POODLE, you need a client that explicitly does out-of-protocol downgrading. Something that web browsers do all the time, but very few other clients do. No known PostgreSQL client library does.
  • To exploit POODLE, the attacker needs to be able to modify the contents of the encrypted stream - it cannot be passively broken into. This can of course happen if the attacker can control parameters to a SQL query for example, but the control over the data tends to be low, and the attacker needs to already control the client. In the https attack, this is typically done through injecting javascript.
  • To exploit POODLE, there needs to be some persistent secret data at a fixed offset in each connection. This is extremely unlikely in PostgreSQL, as the protocol itself has no such data. There is a "cancel key" at the same location in each stream, but it is not reused and a new one is created for each connection. This is where the https attack typically uses the session cookie which is both secret and fixed location in the request header.

For a really good writeup on the problem, see this post from PolarSSL, or this one from GnuTLS.

The end of PostgreSQL 8.4 - a small tribute

Today's release marks the end of life for PostgreSQL 8.4, after 5 years of service.

I've found it interesting that PostgreSQL 8.4 was probably the first release to actually make it possible to run fairly busy PostgreSQL installations without any hands-on at all. There were many important parts to 8.4 that made that happen, such as much better standby database support (though not hot standby yet - that came in 9.0) and better statistics and monitoring systems. 8.4 also came with Recursive CTEs (WITH-queries) and Window Aggregates, both of which made a whole new set of applications possible.
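
For anyone who never used them, here is a trivial, made-up example combining the two - a recursive CTE generating rows, and a window aggregate computing a running sum over them - neither of which works before 8.4:

postgres=# WITH RECURSIVE t(n) AS (
    SELECT 1
  UNION ALL
    SELECT n+1 FROM t WHERE n < 5
)
SELECT n, sum(n) OVER (ORDER BY n) AS running_sum FROM t;
 n | running_sum
---+-------------
 1 |           1
 2 |           3
 3 |           6
 4 |          10
 5 |          15
(5 rows)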

But I think nothing actually brought about this "run without hands-on" as much as the new code for the Visibility Map (VM) and in particular the automatic sizing of the Free Space Map (FSM). Anybody who deployed 8.3 or earlier in any kind of busy environment knows the pain of trying to configure max_fsm_pages correctly. It was almost impossible to get it right, and the value of "right" kept changing with your load and data. And the worst part was that if you got it wrong you were in trouble - there wasn't really any remedy other than taking your database offline (or at least read-only) for manual full database vacuums (there were tricks to get around the vacuum specifics, but the core problem remained). So what happened was that a lot of those people who knew what to do just increased that number to something that was "big enough", which usually meant "way too big", thus wasting resources. And even then, they sometimes found it wasn't big enough after all, because the workload had changed.

In fact, I think more or less every single system I reviewed for customers on those versions had a substantially wrong value for max_fsm_pages - usually the default, because they had no idea what to do with it - and was paying the price for it.

What's extra funny is that I've been talking to Heikki (who wrote the dynamic FSM and VM code) a few times recently, and he's still surprised that these changes had such a big impact on real life deployments. Probably because it's one of those things that is usually not a big problem at all in small installations, developer installations and testing systems, but can quickly snowball in real life production. I think many of us didn't realize how useful it would be before the release actually made it out there.

This feature, together with things like Recursive CTEs and Window Functions really makes 8.4 one of the landmark PostgreSQL releases. It's easier to point to releases like 9.0 which had Streaming Replication and Hot Standby because they are sexier and more marketable features, but infrastructure like this is more important than we often think.

8.4 got rid of some of the hardest things to tune. Let's stay on the path of trying to keep the system simple enough to use that it doesn't need handholding, and look forward to the upcoming 9.4 release!

Getting information about PostgreSQL SSL connections

PostgreSQL can, like many other products, use SSL to secure client/server communications. It can be configured to be optional or required, to require a client certificate, or even to use the SSL client certificate to log in. However, the DBA tools are currently a bit lacking in this regard. In particular, there is no way for a DBA to see what SSL parameters are in force (or indeed whether SSL is enabled at all) for an already existing connection.

There are multiple ways to see the status of a specific connection (such as the libpq PQgetssl() function, the psql startup message, or the sslinfo module). Unfortunately all these methods have one thing in common - they are only available to the process making the connection to the database, not to a DBA inspecting the system from the outside.

9.4 will make it a little bit better, because log_connections now includes SSL information when the user connects, similar to:

LOG:  connection authorized: user=mha database=postgres SSL enabled (protocol=TLSv1.1, cipher=ECDHE-RSA-AES256-SHA)

But this can still be a bit of a pain to figure out for existing connections, of course.

To deal with this problem, I've written a small PostgreSQL extension called pg_sslstatus. When loaded using shared_preload_libraries it will keep track of the SSL status for each connection, and expose it through a system view named pg_sslstatus like this:

postgres=# SELECT * FROM pg_sslstatus;
  pid  | ssl | bits | compression | version |        cipher        |                         clientdn                         
-------+-----+------+-------------+---------+----------------------+----------------------------------------------------------
 27286 | t   |  256 | f           | TLSv1   | ECDHE-RSA-AES256-SHA | 
 26682 | t   |  256 | t           | TLSv1   | ECDHE-RSA-AES256-SHA | /C=AU/ST=Some-State/O=Internet Widgits Pty Ltd/CN=magnus
 26693 | f   |      |             |         |                      | 
(3 rows)

It will include one row for each connection, and can then be joined with either pg_stat_activity or pg_stat_replication to view the SSL data in a bigger context:

postgres=# SELECT datname, usename, client_addr, ssl, cipher FROM pg_stat_activity INNER JOIN pg_sslstatus ON pg_stat_activity.pid=pg_sslstatus.pid;
 datname  | usename | client_addr | ssl |        cipher        
----------+---------+-------------+-----+----------------------
 postgres | mha     | 127.0.0.1   | t   | ECDHE-RSA-AES256-SHA
 postgres | mha     |             | f   | 
 pgweb    | pgweb   |             | f   | 
(3 rows)

The extension is available from my github account today for existing versions of PostgreSQL, under the standard PostgreSQL license. My plan is to work on having a similar view included in PostgreSQL 9.5 by default, but it's unfortunately too late to include it in 9.4 at this point.
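
For reference, getting it running presumably follows the usual pattern for a preloaded extension (a sketch - see the README for the authoritative steps):

# in postgresql.conf, followed by a server restart
shared_preload_libraries = 'pg_sslstatus'

postgres=# CREATE EXTENSION pg_sslstatus;
CREATE EXTENSION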

Conferences

I speak at and organize conferences around Open Source in general and PostgreSQL in particular.

Upcoming

Postgres Vision 2016
Oct 11-13, 2016
San Francisco, USA
PGConf.EU 2016
Nov 1-4, 2016
Tallinn, Estonia
PGConf.Asia 2016
Dec 2-3, 2016
Tokyo, Japan

Past

Postgres Open
Sep 13-16, 2016
Dallas, USA
Stockholm PUG 2016/4
Aug 31, 2016
Stockholm, Sweden
PG Day'16 Russia
Jul 6-8, 2016
St Petersburg, Russia
PGDay UK 2016
Jul 05, 2016
London, UK
Stockholm PUG 2016/3
Jun 16, 2016
Stockholm, Sweden