After giving my presentation at ConFoo this year, I had some discussions with a few people about the ability to put constraints on JSON data, and whether any of the advanced PostgreSQL constraints work for that. Or in short, can we get the benefits from both SQL and NoSQL at the same time?
My general response to questions like this when it comes to PostgreSQL is "if you think there's a chance it works, it probably does", and it turns out that applies in this case as well.
For things like UNIQUE keys and CHECK constraints it's fairly trivial, but there are also things like EXCLUSION constraints where there are some special constructs that need to be considered.
Other than the technical side of things, it's of course also a question of "should we do this". The more constraints that are added to the JSON data, the less "schemaless" it is. On the other hand, other databases that have schemaless/dynamic schema as their main selling points, but still require per-key indexes and constraints (unlike PostgreSQL where JSONB is actually schemaless even when indexed).
Anyway, back on topic. Keys and constraints on JSON data.
In PostgreSQL, keys and constraints can be defined on both regular columns and directly on any expression, as long as that expression is immutable (meaning that the output is only ever dependent on the input, and not on any outside state). And this functionality works very well with JSONB as well.
So let's start with a standard JSONB table:
postgres=# CREATE TABLE jsontable (j jsonb NOT NULL);
CREATE TABLE
postgres=# CREATE INDEX j_idx ON jsontable USING gin(j jsonb_path_ops);
CREATE INDEX
Of course, declaring a table like this is very seldom a good idea in reality - a single table with just a JSONB field. You probably know more about your data than that, so there will be other fields in the table than just the JSONB field. But this table will suffice for our example.
A standard gin index using jsonb_path_ops is how we get fully schemaless indexing in jsonb with maximum performance. We're not actually going to use this index in the examples below this time, but in real deployments it's likely one of the main reasons to use JSONB in the first place.
To illustrate the constraints, let's add some data representing some sort of bookings. Yes, this would be much better represented as relational, but for the sake of example we'll use JSON with a semi-fixed schema. We'll also use a uuid in the JSON data as some sort of key, as this is fairly common in these scenarios.
postgres=# INSERT INTO jsontable (j) VALUES ($${
"uuid": "4e9cf085-09a5-4b4f-bc99-bde2d2d51f41",
"start": "2015-03-08 10:00",
"end": "2015-03-08 11:00",
"title": "test"
}$$);
INSERT 0 1
I have just pushed code for a new version of the codebase for Planet PostgreSQL.
For those of you who are just reading Planet, hopefully nothing at all should change. There will probably be some bugs early on, but there are no general changes in functionality. If you notice something that is wrong (given a couple of hours from this post at least), please send an email to planet(at)postgresql.org and we'll look into it!
For those who have your blogs aggregated at Planet PostgreSQL, there are some larger changes. In particular, you will notice the whole registration interface has been re-written. Hopefully that will make it easier to register blogs, and also to manage the ones you have (such as removing a post that needs to be hidden). The other major change is that Planet PostgreSQL will now email you whenever something has been fetched from your blog - to help you catch configuration mistakes bigger.
The by far largest changes are in the moderation and administration backend. This will hopefully lead to faster processing of blog submissions, and less work for the moderators.
The PostgreSQL core team are happy to welcome Dean Rasheed as a new committer to the project.
Dean has been involved wih patch review and authorship for a while, and will now get the extra workload of also pushing other patches through the final hurdles!
Nordic PGDay is a yearly one-day conference in the Nordic region, and this year it will be held in Helsinki, Finland, on March 17th. We have just opened up for registration! The event sold out in the past two years, so get your tickets early!
The call for papers is still open, which means that we do not yet have a schedule. We hope to post this as soon as possible after the CfP closes, but you can as usual count on a full day of interesting talks about PostgreSQL!
And for those of you who are interested in speaking, yes, the CfP is open until the end of this week. Get your submissions in!
It's that time of the year again - we've wrapped PGConf.EU 2014, and I've just closed the feedback system, so it's time to take a look at what's been said.
We're keeping fairly consistent numbers with previous years, which is something we are definitely happy with. We did have a slight drop in "overall view", since this year we had 8% ranking us as 3, a worse score than we saw last year, and we had a couple of fewer people voting 5. And a slight shift from 5 to 4 on the programme. The numbers are still good of course, but since we had a tiny drop last year as well, we need to step our game back up for next year!
This year we had a slightly bigger spread of how users identify themselves, seeing most categories chip away a little on DBAs and Developers, but they are still definitely the dominating categories. We also have a lot of returning developers - it's cool to see so many people who have been to every one of our events so far, combined with a full 25% being first-time attendees!
The POODLE attack on https (the attack is about https, the vulnerability in SSL, an important distinction) has received a lot of media attention lately, so I figured a (very) short writeup was necessary.
The TL;DR; version is, you don't have to worry about POODLE for your PostgreSQL connections when using SSL.
The slightly longer version can be summarized by:
For a really good writeup on the problem, see this post from PolarSSL, or this one from GnuTLS.
Todays release marks the end of life for PostgreSQL 8.4, after 5 years of service.
I've found it interesting that PostgreSQL 8.4 was probably the first release to actually make it possible to run fairly busy PostgreSQL installations without any hands-on at all. There were many important parts to 8.4 that made that happen, such as much better standby database support (though not hot standby yet - that came in 9.0) and better statistics and monitoring systems. 8.4 also came with Recursive CTEs (WITH-queries) and Window Aggregates, both of which made a whole new set of applications possible.
But I think nothing actually brought about this "run without hands-on" as much as the new code for Visibility Map (VM) and in particular the automatic sizing of the Free Space Map (FSM). Anybody who deployed 8.3 or earlier in any kind of busy environment knows the pain of trying to configure max_fsm_pages correctly. It was almost impossible to get it right, and the value of right kept changing with your load and data. And the worst part of it all was that if you got it wrong you were in trouble - there wasn't really any remedy other than taking your database offline (or at least read-only) for manual full database vacuums (there were tricks to get around the vacuum specifics, but the core problem was there). So what happened was that a lot of those people who knew what to do just increased that number to something that was "big enough", which usually meant "way too big" and thus wasting resources. And even with that, sometimes ran into it not being big enough because the workload changed.
In fact, I think more or less every single system I did reviews for customers for on those versions had a substantially wrong number in max_fsm_pages - usually the default value because they had no idea what to do with it - and were paying the price for it.
Extra funny is that I've been talking to Heikki (who wrote the dynamic FSM and VM code) a few times recently, and he's still surprised that these changes had such a big impact on real life deployments. Probably because it's one of those things that is usually not a big problem at all in small installations, developer installations, and testing systems, but can quickly snowball in real life production. I think many of us didn't realize before the release actually made it out there how useful it would be.
This feature, together with things like Recursive CTEs and Window Functions really makes 8.4 one of the landmark PostgreSQL releases. It's easier to point to releases like 9.0 which had Streaming Replication and Hot Standby because they are sexier and more marketable features, but infrastructure like this is more important than we often think.
8.4 got rid of some of the hardest things to tune. Let's stay on the path of trying to keep the system simple enough to use that it doesn't need handholding, and look forward to the upcoming 9.4 release!
PostgreSQL can, as many other products, use SSL to secure client/server communications. It can be configured to be optional or required, to require a client certificate, or even to use the SSL client certificate to log in. However, the DBA tools are currently a bit lacking in this regard. In particular, there is no way for a DBA to see what SSL parameters are in force (or indeed if SSL is enabled at all) for an already existing connection.
There are multiple ways to see the status of a specific connection (such as the libpq PQgetssl() function, the psql startup message or the sslinfo module. Unfortunately all these methods have one thing in common - they are only available to the process making the connection to the database, not to a DBA inspecting the system from the outside.
9.4 will make it a little bit better, because log_connections now include SSL information when the user connects, similar to:
LOG: connection authorized: user=mha database=postgres SSL enabled (protocol=TLSv1.1, cipher=ECDHE-RSA-AES256-SHA)
But this can still be a bit of a pain to figure out for existing connectioons of course.
To deal with this problem, I've written a small PostgreSQL extension called pg_sslstatus. When loaded using shared_preload_libraries it will keep track of the SSL status for each connection, and expose it through a system view named pg_sslstatus like this:
postgres=# SELECT * FROM pg_sslstatus;
pid | ssl | bits | compression | version | cipher | clientdn
-------+-----+------+-------------+---------+----------------------+----------------------------------------------------------
27286 | t | 256 | f | TLSv1 | ECDHE-RSA-AES256-SHA |
26682 | t | 256 | t | TLSv1 | ECDHE-RSA-AES256-SHA | /C=AU/ST=Some-State/O=Internet Widgits Pty Ltd/CN=magnus
26693 | f | | | | |
(3 rows)
It will include one row for each connection, and can then be joined with either pg_stat_activity or pg_stat_replication to view the SSL data in a bigger context:
postgres=# SELECT datname, usename, client_addr, ssl, cipher FROM pg_stat_activity INNER JOIN pg_sslstatus ON pg_stat_activity.pid=pg_sslstatus.pid;
datname | usename | client_addr | ssl | cipher
----------+---------+-------------+-----+----------------------
postgres | mha | 127.0.0.1 | t | ECDHE-RSA-AES256-SHA
postgres | mha | | f |
pgweb | pgweb | | f |
(2 rows)
The extension is available from my github account today for existing version of PostgreSQL, under the standard PostgreSQL license. My plan is to work on having a similar view included in PostgreSQL 9.5 by default, but it's unfortunately too late to include it in 9.4 at this point.
Is your PostgreSQL installation vulnerable to the Heartbleed bug in OpenSSL? The TL;DR; version is "maybe, it depends, you should read this whole thing to find out". If you are vulnerable, it is a high risk vulnerability!
The slightly longer version is that it will be vulnerable if you are using SSL, and not vulnerable if you are not. But the situation is not quite that easy, as you may be using SSL even without planning to. PostgreSQL also not provide any extra protection against the bug - if you are using SSL, you are vulnerable to the bug just as with any other service.
As the bug is in OpenSSL, however, what you need to get patched is your OpenSSL installation and not PostgreSQL itself. And of course, remember to restart your services (this includes both PostgreSQL and any other services using SSL on your system). You will then have to consider in your scenario if you have to replace your SSL keys or not - the same rules apply as to any other service.
PostgreSQL by default ships with SSL turned off on most platforms. The most notable exception is Debian and derivatives (such as Ubuntu), which enable SSL by default.
If SSL is disabled globally, your installation is not vulnerable.
The easiest way to check this is to just use a simple SQL query:
postgres=# show ssl;
ssl
-----
off
(1 row)
If this parameter returns off, you are not vulnerable. If it returns on, you are.
If you do not need SSL, the easiest fix is to turn this off and restart PostgreSQL. This also brings additional benefits of not paying the overhead of encryption if you don't need it. If you actually use SSL, this is of course not an option.
If you have installed PostgreSQL using a package based system, such as yum (from redhat/fedora or from yum.postgresql.org), apt (from debian/ubuntu/etc or from apt.postgresql.org), FreeBSD ports etc, it is up to your operating system to provide a patch. Most major distributions have already done this - you just need to to install it (and restart your services!). If your distribution has not yet updated, you need to convince them to do so ASAP.
If you are using a PostgreSQL installation package that bundles OpenSSL, you need an updated version of this package. The most common example of this is the EnterpriseDB Graphical Installers primarily used on Windows and Mac. We expect a new version of these installers to be released within a day or a few.
Postgres.app is also vulnerable and needs an update, but is normally not used for servers.
The OpenSCG separate download packages are also vulnerable.
For each of these you will have to wait for an updated package to show up in the next couple of days. All package maintainers have been notified, so it's only a matter of time.
Per the www.postgresql.org download pages we do recommend that you always use the "package manager" system for any platform where this is supported, which means most modern Linux or BSD distributions. If you are currently using one of the above installers on these platforms, a quick fix before the packages are out would be to switch to one of the "package manager" platforms that rely on the operating system update process. This may or may not be an option of course, depending on the complexity of the installation.
If you are using a platform where this is not available (such as Windows), your only option is to wait.
In PostgreSQL, the SSL negotiation happens before pg_hba.conf is matched. And the vulnerability in OpenSSL is in the negotiation phase. For this reason, even if you have restricted access to your server using pg_hba.conf IP filter rules, or your pg_hba.conf specifies only hostnossl records, this does not protect you.
Obviously, if you have an IP level firewall, either at the host or on the network, that will protect you. But pg_hba does not.
The pgcrypto module in PostgreSQL uses OpenSSL to provide encryption functions when available. Since the vulnerability is specifically in the protocol negotiation, use in pgcrypto is not vulnerable to this issue.
Nordic PGDay 2014 was just over a week ago in Stockholm, Sweden, and was definitely a success. A lot more people than we initially expected had the event sell out completely a couple of weeks before it happened. A week later we've now collected the feedback forms that people filled out (a big thanks to the 25-30 people who did fill them out - a 30% "hit rate" on feedback forms gives us a pretty good input).
I'll lead with a few graphs that fairly well represents the feedback:
In particular, that every person who filled out our feedback would recommend the event to friends or colleagues is definitely a flattering grade!