Viewing entries tagged with fdw. Return to full view.

FDWs, curl and LIMIT

I'm sure everybody is aware you can have PostgreSQL fetch data live across the internet (or locally of course) as part of your queries and use the data directly. In fact there are a large number of drivers available to access different kinds of data for download. But in the simplest case, we can also just use the file_fdw wrapper that's included in the standard PostgreSQL packages, together with everybody's http-swiss-army-knife, curl.

In attempting to adapt this post for the public, what more time-relevant dataset to work off in these pandemic-affected times than the open data provided by the ECDC, being both open data and very current. In particular for this example, they provide public datasets with COVID numbers from across the world (actual public data, requiring no registration to read, and actual data, not just a limited API).

So, let's see how we can access this data from PostgreSQL:

CREATE EXTENSION file_fdw;

CREATE SERVER curly FOREIGN DATA WRAPPER file_fdw;

CREATE FOREIGN TABLE _rawdata (
 daterep text not null,
 day int not null,
 month int not null,
 year int not null,
 cases int not null,
 deaths int not null,
 countries text not null,
 geoid text not null,
 countrycode text null,
 popdata int null,
 continent text not null,
 cumulative14days float null
)
SERVER curly
OPTIONS (
 PROGRAM 'curl -s https://opendata.ecdc.europa.eu/covid19/casedistribution/csv/',
 FORMAT 'csv',
 HEADER 'on'
);

And to use this we can simply query the foreign table, which will then trigger a http GET of the data:

covid=# SELECT count(*) FROM _rawdata;
 count 
-------
 41837
(1 row)

However, there is an unfortunate interaction with LIMIT. So if we for example try to get just the first 10 rows:

covid=# SELECT * FROM _rawdata LIMIT 10;
ERROR:  program "curl -s https://opendata.ecdc.europa.eu/covid19/casedistribution/csv/" failed
DETAIL:  child process exited with exit code 23

Continue reading

Remote log reading in PostgreSQL 9.1

PostgreSQL 9.1 beta1 now available - now is a great time to start testing it, and trying out all the great new features.

There have always been a number of ways to read your PostgreSQL logs remotely, over a libpq connection. For example, you can use the pg_read_file() function - which is what pgadmin does. PostgreSQL 9.1 adds a new and more convenient way (in some ways) to do this - using SQL/MED.

PostgreSQL 9.1 comes with SQL standard SQL/MED functionality. The MED in is short for "Managemend of External Data", and as the name sounds, it's about accessing data that's external to the PostgreSQL server. The SQL/MED functionality is not (yet) complete, but it's already very useful in it's current state.

In SQL/MED, there is something called a Foreign Data Wrapper, that can be compared to a driver. Using this FDW, we can create one or more Foreign Servers, which is a definition of how to connect to a specific instance of the service - if any. Finally, we can create one or more Foreign Tables on each of the Foreign Servers, giving us direct access to the remote data using SQL.

Continue reading

Conferences

I speak at and organize conferences around Open Source in general and PostgreSQL in particular.

Upcoming

PGDay Chicago 2024
Apr 26, 2024
Chicago, USA
PGConf.DEV 2024
May 28-31, 2024
Vancouver, Canada

Past

SCaLE 2024
Mar 14-17, 2024
Pasadena, USA
Nordic PGDay 2024
Mar 12, 2024
Oslo, Norway
FOSDEM PGDay 2024
Feb 2-4, 2024
Brussels, Belgium
PGConf.EU 2023
Dec 12-15, 2023
Prague, Czechia
PGConf.NYC 2023
Oct 3-5, 2023
New York, USA
More past conferences