Postgres FDWs: Unlocking external data sources in Postgres
A blog on how Postgres Foreign Data Wrappers can be leveraged to load all kinds of data into Postgres
Modern data pipelines consist of moving data from one format to another, and from one source to another. ETL tools exist to facilitate this kind of data movement.
FDW Use Cases
Zero ETL overhead
The most popular ETL tools include Airbyte, Apache Airflow, and Apache NiFi. These tools are versatile and allow integration between many combinations of data sources and sinks. But ETL has some issues, such as:
It is asynchronous, so data will always lag behind the source by the ETL job schedule, which is typically nightly or weekly.
It is a separate system, which means maintenance overhead. There is also a cost component that needs to be weighed.
That said, ETL is sometimes required for complex joins, filtering, and materialized views; for those use cases it is still the go-to tool.
In the Postgres FDW world, big-data file formats such as Parquet and ORC are also supported as Postgres FDWs.
List of FDW
| Foreign Data Wrapper | Feature |
| --- | --- |
| File FDW | Simplest FDW; reads flat files |
| Mongo FDW | Read/write from MongoDB |
| Kafka FDW | Read/write from Kafka |
| MySQL FDW | Read/write from MySQL |
| Oracle FDW | Read/write from Oracle |
| JDBC FDW | Read/write via the JDBC API |
| TDS FDW | Read from Sybase or Microsoft SQL Server |
| Postgres FDW | Read/write from a remote Postgres database |
| Prometheus FDW | Read Prometheus metrics |
| Redis FDW | Read/write from Redis |
| Orb FDW | Read from Orb |
| Parquet_S3 FDW | Read/write Parquet files on S3 |
| DuckDB FDW | DuckDB is a promising in-process OLAP database |
| ClickHouse FDW | Read/write from ClickHouse |
| Clerk FDW | Clerk is an AuthN/AuthZ user-management platform |
| OGR FDW | Read geospatial data via the OGR library |
| Postgres Log FDW | Read from Postgres log files |
| PgBouncer FDW | Read PgBouncer statistics |
Importing from different data sources
In the age of large enterprises, with thousands of microservices and thousands of teams whose apps use different databases, it is sometimes necessary to move data around.
In Postgres, there are FDWs for many database systems:
Oracle FDW
TDS database (Sybase and Microsoft SQL server)
Mongo FDW
Postgres FDW to import from remote Postgres server
MySQL FDW
JDBC FDW
CSV FDW using file_fdw
Building blocks of FDW
Step 0: Installing the FDW extension
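How the extension package gets onto the server depends on your distribution; once it is available, the wrapper is enabled per database with CREATE EXTENSION (file_fdw is used here as an example, since it ships with Postgres contrib):

```sql
-- Enable the wrapper in the current database
CREATE EXTENSION file_fdw;
```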
Step 1: Server
--- 1. Create Server
CREATE SERVER <custom_server_name> FOREIGN DATA WRAPPER <fdw_name>;
-- fdw_name is the name of the extension that you have installed
Step 2: User Mapping (Optional)
This is optional yet important when you are using FDWs. As with any database, you need to create a user mapping that binds a local Postgres user to the credentials of a user on the remote server.
For File FDW it is not required, as the import source is either a local file or a file hosted on a public server.
For other use cases, however, user mappings are very important and should be set up securely.
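As a minimal sketch (the server name and credentials below are placeholders), a user mapping looks like this:

```sql
-- Map the local "postgres" user to credentials on the remote server
CREATE USER MAPPING FOR postgres
SERVER <custom_server_name>
OPTIONS (user 'remote_user', password 'remote_password');
```

Note that option names vary by wrapper; for example, mysql_fdw expects `username` rather than `user`, as shown later in this post.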
Step 3: Foreign Table
CREATE FOREIGN TABLE <table_name>
(
--- Table Schema
id TEXT,
userId TEXT,
userName TEXT,
--- ...
)
SERVER <custom_server_name>
OPTIONS (
--- Different for each foreign data wrapper
);
File FDW
File FDW works by using the COPY FROM machinery that is part of Postgres.
- Creating Server
--- 1. Create Server
CREATE SERVER fdw_files FOREIGN DATA WRAPPER file_fdw;
- Creating a Foreign Table
CREATE FOREIGN TABLE public.covid_staging
(
fips TEXT,admin2 TEXT,
province_state TEXT,country_region TEXT,
last_update TIMESTAMPTZ,
lat FLOAT,lon FLOAT,
confirmed INT,deaths INT,recovered INT,
active INT,combined_key TEXT
)
SERVER fdw_files
OPTIONS (
program 'wget -q -O - "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports/03-26-2020.csv"',
-- filename for local filesystem. program for external file system
format 'csv',
header 'true'
-- Other options.
--- 1. delimiter
--- 2. quote
--- 3. escape
--- 4. null
--- 5. encoding
);
Here `program` is used if you want to import from another system. If the data lives on the local Postgres server's filesystem, you can use `filename '<path-to-file>/mydata.csv'` instead. You can also set `delimiter`, `quote`, `escape`, `null`, and `encoding` to control how the text input is parsed.
- Read the data that is imported
SELECT * FROM public.covid_staging;
The imported data can be seen by running the above SELECT statement.
Postgres FDW
This is trivial, as most Postgres distributions ship with `postgres_fdw` available. Just follow the building blocks of FDW:
Creating a Foreign Server
Creating a User Mapping
Creating a Foreign Table
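Put together, a minimal postgres_fdw setup might look like the following (the host, database, role, and table names are hypothetical):

```sql
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

-- 1. Create the foreign server pointing at the remote Postgres instance
CREATE SERVER remote_pg
FOREIGN DATA WRAPPER postgres_fdw
OPTIONS (host 'remote-host.example.com', port '5432', dbname 'appdb');

-- 2. Map the local user to a remote role
CREATE USER MAPPING FOR postgres
SERVER remote_pg
OPTIONS (user 'remote_user', password 'remote_password');

-- 3. Create a foreign table matching the remote schema
CREATE FOREIGN TABLE orders
(
    order_id   INT,
    customer   TEXT,
    created_at TIMESTAMPTZ
)
SERVER remote_pg
OPTIONS (schema_name 'public', table_name 'orders');
```

Instead of declaring each table by hand, postgres_fdw also supports `IMPORT FOREIGN SCHEMA` to pull table definitions from the remote server automatically.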
Non-Postgres FDW
MySQL FDW example
I am using a Tembo database for installing extensions. It is a very intuitive way of installing Postgres extensions: they have a GUI which internally uses Trunk (a Rust-based Postgres extension manager).
Similar to the building blocks of FDWs, we are:
Creating a Foreign Server
Creating a User Mapping
Creating a Foreign Table
-- 1. create Foreign Server
CREATE SERVER mysql_server
FOREIGN DATA WRAPPER mysql_fdw
OPTIONS (host '<my-host>', port '<port-number>');
-- 2. create User Mapping
CREATE USER MAPPING FOR postgres
SERVER mysql_server OPTIONS (username '<username>', password '<password>');
-- 3. create Foreign Table
CREATE FOREIGN TABLE warehouse
(
warehouse_id INT,
warehouse_name TEXT,
warehouse_created TIMESTAMP
)
SERVER mysql_server
OPTIONS (dbname 'defaultdb', table_name 'warehouse');
INSERT INTO warehouse VALUES (6, 'UPS', current_date);
SELECT * FROM warehouse ORDER BY 1;
The SELECT returns the rows from the remote MySQL table, including the newly inserted row.
Note: Tembo does not support `mysql_fdw`'s pushdown feature, which was a recent addition (remote MySQL -> Postgres reading); hence a Postgres screenshot cannot be shown.
Parquet FDW
A lot of data pipelines use Parquet files for efficient data storage. These can also be imported via the `parquet_s3_fdw` extension.
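A rough sketch of how this could look (the bucket, credentials, and columns are hypothetical, and the exact option names should be checked against the wrapper's documentation):

```sql
CREATE SERVER parquet_s3_srv
FOREIGN DATA WRAPPER parquet_s3_fdw;

-- The user mapping carries the S3 access keys (placeholders here)
CREATE USER MAPPING FOR postgres
SERVER parquet_s3_srv
OPTIONS (user 'AWS_ACCESS_KEY_ID', password 'AWS_SECRET_ACCESS_KEY');

-- Map a Parquet file in a bucket to a foreign table
CREATE FOREIGN TABLE events
(
    event_id   INT,
    event_name TEXT,
    created_at TIMESTAMP
)
SERVER parquet_s3_srv
OPTIONS (filename 's3://my-bucket/events.parquet');
```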
Closing Thoughts
Depending on your scale, expertise and requirement, you can leverage Postgres as the source of truth from various data sources.
Keep in mind that data impedance will still be present, and large-scale ETL might still be the way to go.
But it is always good to know what you get in the Postgres world.