Viewing entries tagged with postgresql.

Logging transactions that dropped tables

In a previous post I discussed a way to find out which transaction dropped a table by examining the transaction log, in order to set a restore point to just before the table was dropped.

But what if we have the luxury of planning ahead (right? Well, let's call it the second time it happens). Shouldn't we be able to log which transaction dropped a table, and use that? Of course we should.

The first thing one tries is then of course something like this in postgresql.conf:

log_statement='ddl'
log_line_prefix = '%t [%u@%d] <%x> '

to include the transaction id in each log line. Unfortunately:

2017-02-12 12:16:39 CET [mha@postgres] <0> LOG:  statement: drop table testtable;

The 0 as a transaction id indicates that this command was run in a virtual transaction, and did not have a real transaction id. The reason for this is that the statement logging happens before the statement has actually been assigned a transaction id. For example, if I instead drop two tables, and do so in a transaction:

postgres=# BEGIN;
BEGIN
postgres=# DROP TABLE test1;
DROP TABLE
postgres=# DROP TABLE test2;
DROP TABLE
postgres=# COMMIT;
COMMIT

I get this interesting output:

2017-02-12 12:17:43 CET [mha@postgres] <0> LOG:  statement: DROP TABLE test1;
2017-02-12 12:17:45 CET [mha@postgres] <156960> LOG:  statement: DROP TABLE test2;

This shows two different transaction ids (one real and one not) for statements in the same transaction. That's obviously misleading - both tables were dropped by transaction 156960. The transaction id just hadn't been assigned yet when the first statement was logged.

So what can we do about that? Event triggers to the rescue!
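
To give an idea of what that looks like, here is a minimal sketch of such an event trigger (the function and trigger names are made up for illustration):

CREATE FUNCTION log_dropped_tables() RETURNS event_trigger
LANGUAGE plpgsql AS $$
DECLARE
    r RECORD;
BEGIN
    -- pg_event_trigger_dropped_objects() lists everything removed by the
    -- DROP that fired this trigger, including cascaded objects
    FOR r IN SELECT * FROM pg_event_trigger_dropped_objects() LOOP
        IF r.object_type = 'table' THEN
            -- txid_current() forces a real transaction id to be assigned,
            -- so the log line carries the xid we can later recover to just before
            RAISE LOG 'table % dropped by transaction %',
                r.object_identity, txid_current();
        END IF;
    END LOOP;
END;
$$;

CREATE EVENT TRIGGER log_drops ON sql_drop
    EXECUTE PROCEDURE log_dropped_tables();

With something like this in place, the drop shows up in the log with a real transaction id, even in the cases where log_statement alone reports <0>.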

Continue reading

Another couple of steps on my backup crusade

For a while now, I've been annoyed with how difficult it is to set up good backups in PostgreSQL. The difficulty of doing this "right" has pushed people to use things like pg_dump for backups, which is not really a great option once your database reaches any non-toy size. And when visiting customers over the years I've seen a large number of home-written scripts to do PITR backups, most of them broken, and most of that breakage caused by APIs that were too difficult to use.

Over some time, I've worked on a number of ways to improve this situation, alone or with others. The bigger steps are:

  • 9.1 introduced pg_basebackup, making it easier to take base backups using the replication protocol
  • 9.2 introduced transaction log streaming to pg_basebackup
  • 9.6 introduced a new version of the pg_start_backup/pg_stop_backup APIs that are needed to do more advanced base backups, in particular when using third-party backup tools (see the sketch after this list).
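
To put those pieces together, taking a base backup with transaction log streaming can be as simple as this sketch (the backup directory is made up for illustration):

$ pg_basebackup -D /backup/base -X stream -c fast -P

and for a third-party tool driving the SQL-level API, the 9.6-style non-exclusive variant looks something like this (the label is made up for illustration):

postgres=# SELECT pg_start_backup('nightly', false, false);
-- ... copy the data directory with the external tool, keeping this session open ...
postgres=# SELECT * FROM pg_stop_backup(false);
-- pg_stop_backup() returns the backup_label contents that must be stored with the backup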

For 10.0, there are a couple of new things that have been done in the past couple of weeks:

Continue reading

Financial updates in PostgreSQL Europe

As we welcome a new year, we have a couple of updates to the finances and payment handling in PostgreSQL Europe that will affect our members and the attendees of our events.

First of all, PostgreSQL Europe has unfortunately been forced to register for VAT. This means that most of our invoices (details below) will now include VAT.

Second, we have enabled a new payment provider for those of you who can't, or prefer not to, use credit cards, while still allowing for fast payments.

Continue reading

Mail agents in the PostgreSQL community

A few weeks back, I noticed the following tweet from Michael Paquier:

[embedded tweet]

And my first thought was "that can't be right" (spoiler: Turns out it wasn't. But almost.)

The second thought was "hmm, I wonder how that has actually changed over time". And of course, with today being a day off and generally "slow pace" (ahem), what better way to spend it than to analyze the data that we have. The PostgreSQL mailing list archives are all stored in a PostgreSQL database of course, so running the analytics is a quick job.
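
Just to give a flavour of the kind of query involved, something along these lines does the counting (the table and column names here are made up for illustration; the real archives schema looks different):

-- posts per mail client per year, assuming a hypothetical "messages" table
-- with a jsonb "headers" column and a timestamptz "date" column
SELECT date_trunc('year', date) AS year,
       headers->>'User-Agent' AS agent,
       count(*) AS posts
FROM messages
GROUP BY 1, 2
ORDER BY 1, 3 DESC;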

Continue reading

A more secure Planet PostgreSQL

Today, Planet PostgreSQL was switched over from http to https. Previously, https was only used for the logged-in portions for blog owners, but now the whole site uses it. If you access the page with the http protocol, you will automatically be redirected to https.

As part of this, the RSS feeds have also changed address from http to https (the path part of the URLs remain unchanged). If your feed reader does not automatically follow redirects, this will unfortunately make it stop updating until you have changed the URL.

In a couple of days we will enable HTTP Strict Transport Security on the site as well.

We apologize for the inconvenience for those of you who have to reconfigure your feeds, but we hope you agree the change is for the better.

PGConf.EU 2016 attendee statistics

It is now about a week since PGConf.EU 2016, and things are slowly returning to normal :) You'll have to wait a while longer for the traditional feedback summary post that I make every year, but there's another piece of statistics I'd like to share.

As always, Dave put the attendees-per-country statistics into the closing session slides, and we shared some of the top countries. Unsurprisingly, countries like Estonia (the host country), Germany (one of Europe's largest countries), Sweden and Russia (countries nearby) were at the top.

For those looking for more details, here are the actual statistics for all countries and not just the top ones (click for a bigger version).

Continue reading

Locating the recovery point just before a dropped table

A common example when talking about why it's a good thing to be able to do PITR (Point In Time Recovery) is the scenario where somebody or something (an operator or a buggy application) dropped a table, and we want to recover to right before the table was dropped, to keep as much valid data as possible.

PostgreSQL comes with nice functionality to decide exactly what point to perform a recovery to, which can be specified at millisecond granularity or at the level of an individual transaction. But what if we don't know exactly when the table was dropped? (Exactly meaning at the level of a specific transaction, or at least a specific millisecond.)
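
For reference, those recovery targets end up in recovery.conf looking something like this (the values and archive path are made up for illustration):

restore_command = 'cp /mnt/archive/%f %p'
# stop just before the transaction that dropped the table
recovery_target_xid = '156960'
recovery_target_inclusive = false
# or, alternatively, stop just before a known point in time
# recovery_target_time = '2017-02-12 12:16:00'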

One way to handle that is to "step forward" through the log one transaction at a time until the table is gone. This is obviously very time-consuming.

Assuming that DROP TABLE is not something we do very frequently in our system, we can also use the pg_xlogdump tool to help us find the exact spot to perform the recovery to, in much less time. Unfortunately, the dropping of temporary tables (implicit or explicit) is included in this, so if your application uses a lot of temporary tables this approach will not work out of the box. But for applications without them, it can save a lot of time.

Let's show an example. This assumes you have already set up the system for log archiving, you have a base backup that you have restored, and you have a log archive.

The first thing we do is try to determine the point where a DROP TABLE happened. We can do this by scanning for entries where rows have been deleted from the pg_class table, as this will always happen as part of the drop.
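
As a sketch of what that scan can look like (the archive path, segment names and database oid are made up for illustration; 1259 is the oid, and normally also the relfilenode, of pg_class):

$ pg_xlogdump -p /mnt/archive 000000010000000000000003 000000010000000000000007 \
      | grep DELETE | grep '/1259'

The tx: field of the matching record is the transaction that dropped the table, which is exactly the value we want to recover to just before.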

Continue reading

www.postgresql.org is now https only

We've just flipped the switch on www.postgresql.org to be served on https only. This has been done for a number of reasons:

  • In response to popular request
  • Google, and possibly other search engines, have started to give higher scores to sites on https, and we were previously redirecting accesses to cleartext
  • Simplification of the site code, which now doesn't have to keep track of which pages need to be secure and which do not
  • Prevention of evil things like WiFi hotspot providers injecting ads or javascript into the pages

We have not yet enabled HTTP Strict Transport Security, but will do so in a couple of days once we have verified all functionality. We have also not enabled HTTP/2 yet, this will probably come at a future date.

Please help us out with testing this, and let us know if you find something that's not working, by emailing the pgsql-www mailing list.

There are still some other postgresql.org websites that are not available over https, and we will be working on those as well over the coming weeks or months.

JSON field constraints

After giving my presentation at ConFoo this year, I had some discussions with a few people about the ability to put constraints on JSON data, and whether any of the advanced PostgreSQL constraints work for that. Or in short, can we get the benefits from both SQL and NoSQL at the same time?

My general response to questions like this when it comes to PostgreSQL is "if you think there's a chance it works, it probably does", and it turns out that applies in this case as well.

For things like UNIQUE keys and CHECK constraints it's fairly trivial, but there are also things like EXCLUSION constraints where some special constructs need to be considered.

Other than the technical side of things, it's of course also a question of "should we do this". The more constraints that are added to the JSON data, the less "schemaless" it is. On the other hand, other databases that have schemaless/dynamic schemas as their main selling point still require per-key indexes and constraints (unlike PostgreSQL, where JSONB is actually schemaless even when indexed).

Anyway, back on topic. Keys and constraints on JSON data.

In PostgreSQL, keys and constraints can be defined both on regular columns and directly on any expression, as long as that expression is immutable (meaning that the output depends only on the input, and not on any outside state). This functionality works very well with JSONB too.

So let's start with a standard JSONB table:

postgres=# CREATE TABLE jsontable (j jsonb NOT NULL);
CREATE TABLE

postgres=# CREATE INDEX j_idx ON jsontable USING gin(j jsonb_path_ops);
CREATE INDEX

Of course, declaring a table like this - a single table with just a JSONB field - is very seldom a good idea in reality. You probably know more about your data than that, so there will be other fields in the table besides the JSONB field. But this table will suffice for our example.

A standard gin index using jsonb_path_ops is how we get fully schemaless indexing in jsonb with maximum performance. We're not actually going to use this index in the examples below this time, but in real deployments it's likely one of the main reasons to use JSONB in the first place.

To illustrate the constraints, let's add some data representing some sort of bookings. Yes, this would be much better represented as relational, but for the sake of example we'll use JSON with a semi-fixed schema. We'll also use a uuid in the JSON data as some sort of key, as this is fairly common in these scenarios.

postgres=# INSERT INTO jsontable (j) VALUES ($${
  "uuid": "4e9cf085-09a5-4b4f-bc99-bde2d2d51f41",
  "start": "2015-03-08 10:00",
  "end": "2015-03-08 11:00",
  "title": "test"
}$$);
INSERT 0 1
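
To hint at where this is going, a uniqueness requirement on that uuid key can be declared directly on an expression, something like this sketch (the index name is made up for illustration):

postgres=# CREATE UNIQUE INDEX j_uuid_idx ON jsontable (((j->>'uuid')::uuid));
CREATE INDEX

The cast from text to uuid is immutable, so it's allowed in the index expression, and a second booking with the same uuid will now be rejected.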

Continue reading

A new version of Planet PostgreSQL

I have just pushed code for a new version of the codebase for Planet PostgreSQL.

For those of you who are just reading Planet, hopefully nothing at all should change. There will probably be some bugs early on, but there are no general changes in functionality. If you notice something that is wrong (give it a couple of hours from this post at least), please send an email to planet(at)postgresql.org and we'll look into it!

For those who have your blogs aggregated at Planet PostgreSQL, there are some larger changes. In particular, you will notice that the whole registration interface has been re-written. Hopefully that will make it easier to register blogs, and also to manage the ones you have (such as removing a post that needs to be hidden). The other major change is that Planet PostgreSQL will now email you whenever something has been fetched from your blog - to help you catch configuration mistakes quicker.

The by far largest changes are in the moderation and administration backend. This will hopefully lead to faster processing of blog submissions, and less work for the moderators.

Conferences

I speak at and organize conferences around Open Source in general and PostgreSQL in particular.

Upcoming

  • SCALE+PGDays - Mar 2-5, 2017 - Pasadena, California, USA
  • Open Source Infrastructure @ SCALE - Mar 2, 2017 - Pasadena, California, USA
  • Confoo Montreal 2017 - Mar 8-10, 2017 - Montreal, Canada
  • Nordic PGDay 2017 - Mar 21, 2017 - Stockholm, Sweden
  • pgDay.paris 2017 - Mar 23, 2017 - Paris, France
  • PGCon 2017 - May 23-26, 2017 - Ottawa, Canada

Past

  • FOSDEM + PGDay 2017 - Feb 2-4, 2017 - Brussels, Belgium
  • PGConf.Asia 2016 - Dec 2-3, 2016 - Tokyo, Japan
  • Berlin PUG - Nov 17, 2016 - Berlin, Germany
  • PGConf.EU 2016 - Nov 1-4, 2016 - Tallinn, Estonia
  • Stockholm PUG 2016/5 - Oct 25, 2016 - Stockholm, Sweden
More past conferences