Viewing entries tagged with window aggregates. Return to full view.

The end of PostgreSQL 8.4 - a small tribute

Todays release marks the end of life for PostgreSQL 8.4, after 5 years of service.

I've found it interesting that PostgreSQL 8.4 was probably the first release to actually make it possible to run fairly busy PostgreSQL installations without any hands-on at all. There were many important parts to 8.4 that made that happen, such as much better standby database support (though not hot standby yet - that came in 9.0) and better statistics and monitoring systems. 8.4 also came with Recursive CTEs (WITH-queries) and Window Aggregates, both of which made a whole new set of applications possible.

But I think nothing actually brought about this "run without hands-on" as much as the new code for Visibility Map (VM) and in particular the automatic sizing of the Free Space Map (FSM). Anybody who deployed 8.3 or earlier in any kind of busy environment knows the pain of trying to configure max_fsm_pages correctly. It was almost impossible to get it right, and the value of right kept changing with your load and data. And the worst part of it all was that if you got it wrong you were in trouble - there wasn't really any remedy other than taking your database offline (or at least read-only) for manual full database vacuums (there were tricks to get around the vacuum specifics, but the core problem was there). So what happened was that a lot of those people who knew what to do just increased that number to something that was "big enough", which usually meant "way too big" and thus wasting resources. And even with that, sometimes ran into it not being big enough because the workload changed.

In fact, I think more or less every single system I did reviews for customers for on those versions had a substantially wrong number in max_fsm_pages - usually the default value because they had no idea what to do with it - and were paying the price for it.

Extra funny is that I've been talking to Heikki (who wrote the dynamic FSM and VM code) a few times recently, and he's still surprised that these changes had such a big impact on real life deployments. Probably because it's one of those things that is usually not a big problem at all in small installations, developer installations, and testing systems, but can quickly snowball in real life production. I think many of us didn't realize before the release actually made it out there how useful it would be.

This feature, together with things like Recursive CTEs and Window Functions really makes 8.4 one of the landmark PostgreSQL releases. It's easier to point to releases like 9.0 which had Streaming Replication and Hot Standby because they are sexier and more marketable features, but infrastructure like this is more important than we often think.

8.4 got rid of some of the hardest things to tune. Let's stay on the path of trying to keep the system simple enough to use that it doesn't need handholding, and look forward to the upcoming 9.4 release!

Finding gaps in partitioned sequences

There are an almost unlimited number of articles on the web about how to find gaps in sequences in SQL. And it doesn't have to be very hard. Doing it in a "partitioned sequence" makes it a bit harder, but still not very hard. But when I turned to a window aggregate to do that, I was immediately told "hey, that's a good example of a window aggregate to solve your daily chores, you should blog about that". So here we go - yet another example of finding a gap in a sequence using SQL.

I have a database that is very simply structured - it's got a primary key made out of (groupid, year, month, seq), all integers. On top of that it has a couple of largish text fields and an fti field for full text search. (Initiated people will know right away which database this is). The sequence in the seq column resets to zero for each combination of (groupid, year, month). And I wanted to find out where there were gaps in it, and how big they were, to debug the tool that wrote the data into the database. This is really easy with a window aggregate:


SELECT * FROM (
   SELECT
      groupid,
      year,
      month,
      seq, 
      seq-lag(seq,1) OVER (PARTITION BY groupid, year, month ORDER BY seq) AS gap FROM mytable
) AS t
WHERE NOT (t.gap=1)
ORDER BY groupid, year, month, seq

One advantage to using a window aggregate for this is that we actually get the whole row back, and not just the primary key - so it's easy enough to include all the data you need to figure something out.

What about performance? I don't really have a big database to test this on, so I can't say for sure. It's going to be a sequential scan, since I look at the whole table,and not just parts of it. It takes about 4 seconds to run over a table of about a million rows, 2.7Gb, on a modest VM with no actual I/O capacity to speak of and a very limited amount of memory, returning about 100 rows. It's certainly by far fast enough for me in this case.

And as a bonus, it found me two bugs in the loading script and at least one bug in somebody elses code that I'm now waiting on to get fixed...

What's your favorite 8.4 feature?

Over the past month or so, I have been informally polling those of our customers I've been meeting with about which features in the upcoming 8.4 version of PostgreSQL they are most excited about, and most likely to have use for in the short-to-medium term. Part of the results were exactly what I expected, other parts were a bit more surprising.

First of all, the one feature that every single one said they were very much looking forward to is the updated Free Space Map implementation, which will remove two very annoying and difficult-to-get-right configuration parameters (max_fsm_pages and max_fsm_relations). So kudos to Heikki for that. Closely related to this feature is the Visibility Map, and it's reduction of VACUUM requirements. This is, however, not something a lot of our customers will have a direct use for - other than one or two, they don't actually have any issues with VACUUM (once they learned to remove VACUUM FULL from the nightly cronjobs).

When looking closer at the SQL level functionality that people have been excited above, I expected to see Common Table Expressions (CTEs), AKA Recursive Queries at the top of the list, based on requests on our lists in the future. However, it seems that at least to our customers, Window Aggregates are a lot more interesting than CTEs. I'm sure there are a lot of use-cases for recursive queries - there just seem to be even more of them for window aggregates.

Parallel restoring of dumps is also fairly high on peoples list, though most of the people that can really use it are deploying different ways to deal with it already. But they will definitely be using it once it's there. As well as the speedups of the PITR recovery and warm standby.

There are a lot of other features in the upcoming version as well. Some are small, some are large. There are still some smaller features (for example \ef in psql) that I keep missing a lot every time I end up working on an earlier version. They're all (well, most of them) great, but maybe not so obvious at first.

What's your favorite 8.4 feature?

Conferences

I speak at and organize conferences around Open Source in general and PostgreSQL in particular.

Upcoming

PGConf.EU 2017
Oct 24-27, 2017
Warsaw, Poland
PGConf.Asia
Dec 4-6, 2017
Tokyo, Japan

Past

Postgres Open 2017
Sep 6-8, 2017
San Francisco, USA
PGDay.RU
Jul 5-7, 2017
St Petersburg, Russia
PGDay.UK
Jul 4, 2017
London, UK
Amsterdam PUG
Jun 29, 2017
Amsterdam, Netherlands
PGCon 2017
May 23-26, 2017
Ottawa, Canada
More past conferences