I'm sure everybody is aware you can have PostgreSQL fetch data live across the internet (or locally of course) as part of your queries and use the data directly. In fact there are a large number of drivers available to access different kinds of data for download. But in the simplest case, we can also just use the file_fdw
wrapper that's included in the standard PostgreSQL packages, together with everybody's http-swiss-army-knife, curl
.
In attempting to adapt this post for the public, what more time-relevant dataset to work off in these pandemic-affected times than the open data provided by the ECDC, being both open data and very current. In particular for this example, they provide public datasets with COVID numbers from across the world (actual public data, requiring no registration to read, and actual data, not just a limited API).
So, let's see how we can access this data from PostgreSQL:
CREATE EXTENSION file_fdw;
CREATE SERVER curly FOREIGN DATA WRAPPER file_fdw;
CREATE FOREIGN TABLE _rawdata (
daterep text not null,
day int not null,
month int not null,
year int not null,
cases int not null,
deaths int not null,
countries text not null,
geoid text not null,
countrycode text null,
popdata int null,
continent text not null,
cumulative14days float null
)
SERVER curly
OPTIONS (
PROGRAM 'curl -s https://opendata.ecdc.europa.eu/covid19/casedistribution/csv/',
FORMAT 'csv',
HEADER 'on'
);
And to use this we can simply query the foreign table, which will then trigger a http GET of the data:
covid=# SELECT count(*) FROM _rawdata;
count
-------
41837
(1 row)
However, there is an unfortunate interaction with LIMIT
. So if we for example try to get just the first 10 rows:
covid=# SELECT * FROM _rawdata LIMIT 10;
ERROR: program "curl -s https://opendata.ecdc.europa.eu/covid19/casedistribution/csv/" failed
DETAIL: child process exited with exit code 23
PostgreSQL 9.1 beta1 now available - now is a great time to start testing it, and trying out all the great new features.
There have always been a number of ways to read your PostgreSQL logs remotely, over a libpq connection. For example, you can use the pg_read_file() function - which is what pgadmin does. PostgreSQL 9.1 adds a new and more convenient way (in some ways) to do this - using SQL/MED.
PostgreSQL 9.1 comes with SQL standard SQL/MED functionality. The MED in is short for "Managemend of External Data", and as the name sounds, it's about accessing data that's external to the PostgreSQL server. The SQL/MED functionality is not (yet) complete, but it's already very useful in it's current state.
In SQL/MED, there is something called a Foreign Data Wrapper, that can be compared to a driver. Using this FDW, we can create one or more Foreign Servers, which is a definition of how to connect to a specific instance of the service - if any. Finally, we can create one or more Foreign Tables on each of the Foreign Servers, giving us direct access to the remote data using SQL.