PostgreSQL 9.1 beta1 is now available - now is a great time to start testing it and trying out all the great new features.
There have always been a number of ways to read your PostgreSQL logs remotely over a libpq connection. For example, you can use the pg_read_file() function - which is what pgAdmin does. PostgreSQL 9.1 adds a new, and in some ways more convenient, way to do this: SQL/MED.
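For reference, here's a minimal sketch of the pg_read_file() approach - the path is relative to the data directory and assumed here to be pg_log/postgresql.log, and the function requires superuser privileges:
postgres=# SELECT pg_read_file('pg_log/postgresql.log', 0, 1000);
This returns the first 1000 bytes of the file as a single text value, which you then have to parse yourself - one reason the SQL/MED approach is nicer.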
PostgreSQL 9.1 comes with SQL-standard SQL/MED functionality. The MED is short for "Management of External Data", and as the name suggests, it's about accessing data that is external to the PostgreSQL server. The SQL/MED functionality is not (yet) complete, but it's already very useful in its current state.
In SQL/MED, there is something called a Foreign Data Wrapper (FDW), which can be compared to a driver. Using this FDW, we can create one or more Foreign Servers, each a definition of how to connect to a specific instance of the service - if there is one to connect to. Finally, we can create one or more Foreign Tables on each of the Foreign Servers, giving us direct access to the remote data using SQL.
To access our log file, we are going to use the Foreign Data Wrapper called file_fdw. This wrapper provides access to CSV files (and files in other formats recognized by COPY), which is exactly what we need for reading the logfiles. file_fdw ships as a contrib module, which gives us a good chance to showcase another new feature in PostgreSQL 9.1 - extensions. It's not that it was hard to install contrib modules in 9.0 and earlier, but 9.1 makes it even easier. Installing file_fdw is now a single command:
postgres=# CREATE EXTENSION file_fdw;
CREATE EXTENSION
And with just this, the FDW is available for use.
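If you want to verify that the wrapper is there, it shows up in the pg_foreign_data_wrapper system catalog (psql's \dew command will show it too):
postgres=# SELECT fdwname FROM pg_foreign_data_wrapper;
 fdwname
----------
 file_fdw
(1 row)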
Before we can read the logfile, logging needs to be configured to write CSV format to a file - if this isn't already done (which it often is). To do this, we just need to set a couple of parameters in postgresql.conf:
log_destination='csvlog'
logging_collector='on'
log_filename='postgresql.log'
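Note that log_destination and log_filename only require a reload to change, but turning on logging_collector requires a full server restart. Also note that when csvlog is used, the logging collector replaces the .log suffix of log_filename with .csv - so the file we'll actually be reading is postgresql.csv. Assuming a standard setup with the data directory in $PGDATA, restarting is just:
$ pg_ctl -D $PGDATA restart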
Having done this, we can start setting up the SQL/MED objects. First, we create a server. For file_fdw, we don't need to configure anything special - it simply provides access to the local filesystem. So:
postgres=# CREATE SERVER logserver FOREIGN DATA WRAPPER file_fdw;
CREATE SERVER
Finally, we need to create a foreign table to access our data. The definition of a table matching the format of the CSV log file is available in the PostgreSQL documentation, and we just need to convert it to a foreign table:
postgres=# CREATE FOREIGN TABLE postgres_log
(
log_time timestamp(3) with time zone,
user_name text,
database_name text,
process_id integer,
connection_from text,
session_id text,
session_line_num bigint,
command_tag text,
session_start_time timestamp with time zone,
virtual_transaction_id text,
transaction_id bigint,
error_severity text,
sql_state_code text,
message text,
detail text,
hint text,
internal_query text,
internal_query_pos integer,
context text,
query text,
query_pos integer,
location text,
application_name text)
SERVER logserver OPTIONS (filename 'pg_log/postgresql.csv', format 'csv');
CREATE FOREIGN TABLE
All I did was add FOREIGN to the beginning of the statement from the documentation, and the SERVER and OPTIONS clauses at the end. And once we've done this, our logfile is available like any other table. We can now run any kind of (read-only) query against it, including joins and analytics:
postgres=# SELECT count(*) FROM postgres_log;
count
-------
8
(1 row)
postgres=# SELECT database_name,count(*) FROM postgres_log GROUP BY database_name ORDER BY 2 desc;
database_name | count
---------------+-------
postgres | 5
| 3
(2 rows)
postgres=# SELECT log_time, log_time-lag(log_time,1) OVER () FROM postgres_log;
log_time | ?column?
----------------------------+--------------
2011-05-02 19:33:33.343+02 |
2011-05-02 19:33:33.364+02 | 00:00:00.021
2011-05-02 19:33:33.366+02 | 00:00:00.002
2011-05-02 19:35:04.828+02 | 00:01:31.462
2011-05-02 19:36:07.369+02 | 00:01:02.541
2011-05-02 19:36:13.948+02 | 00:00:06.579
2011-05-02 19:36:19.487+02 | 00:00:05.539
2011-05-02 20:44:30.073+02 | 01:08:10.586
(8 rows)
Yes, that last example is quite forced, but it goes to show that you can apply such things as window aggregates to your foreign tables just fine. You could even write a recursive CTE - but I couldn't come up with a reasonable example...
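If you want to see one anyway, here's a deliberately contrived sketch that at least runs - it walks each session's log lines in order by recursing on session_line_num:
postgres=# WITH RECURSIVE session_lines AS (
    SELECT session_id, session_line_num, message
      FROM postgres_log
     WHERE session_line_num = 1
  UNION ALL
    SELECT l.session_id, l.session_line_num, l.message
      FROM postgres_log l
      JOIN session_lines s ON l.session_id = s.session_id
                          AND l.session_line_num = s.session_line_num + 1
)
SELECT * FROM session_lines;
Keep in mind that every step of the recursion scans the foreign table again - which re-reads the entire file, and leads nicely into the next point.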
A word of warning: PostgreSQL does not yet support pushing qualifiers down to the foreign data wrapper, and file_fdw would probably have a hard time making use of them anyway. That means that if your logfile is really big, the whole file will still be read and parsed for every query, regardless of any WHERE clause. So if you want to do any kind of repeated analytics on it, it might be a good idea to copy it over to a (temporary?) table and do the analytics there, where indexes can speed up the processing. Creating such a snapshot is trivial:
postgres=# CREATE TABLE postgres_log_snapshot AS SELECT * FROM postgres_log;
SELECT 9
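Since the snapshot is a plain table, we can also index it - something that isn't possible on the foreign table itself. For example, to speed up time-based queries (the index name is just an arbitrary pick):
postgres=# CREATE INDEX postgres_log_snapshot_time_idx ON postgres_log_snapshot (log_time);
CREATE INDEX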
And with that, we can read the logfiles. Since I only set this up on this instance for a demo, here's an example of another part of extensions that's very useful - proper dependency tracking for all objects when you want to remove them:
postgres=# DROP EXTENSION file_fdw;
ERROR: cannot drop extension file_fdw because other objects depend on it
DETAIL: server logserver depends on foreign-data wrapper file_fdw
foreign table postgres_log depends on server logserver
HINT: Use DROP ... CASCADE to drop the dependent objects too.
postgres=# DROP EXTENSION file_fdw CASCADE;
NOTICE: drop cascades to 2 other objects
DETAIL: drop cascades to server logserver
drop cascades to foreign table postgres_log
DROP EXTENSION
Neat. Maybe it would be useful to have a template table around, so you could do this for each log file:
create foreign table log_nn( like log_tmpl ) ...