🔂 CDC in PostgreSQL: how does pglogical works?

This article will go through an experimental repo to simulate Change Data Capture (CDC) from PostgreSQL to PostgreSQL using pglogical extension.

Nov 10, 2024

There is another version of this post for native Logical Replication in PostgreSQL that I would suggest check beforehand:

🔂 CDC in PostgreSQL: how does built-in Logical Replication work?

Alejandro Aboy

October 27, 2024

Read full story

This repo has been adapted with some improvements to handle some edge cases and enhance the replication process!

👨🏻‍💻 The environment

I’ve created a repo to help me speed up my testing.

👉 Github Repo: 🔃 Change Data Capture (CDC) using pglogical in PostgreSQL

In this context, I’ve enabled 2 PostgreSQL services with Docker to simulate CDC with a fake data generator I’ve adapted for this scenario.

With some simple commands, you can simulate a bunch of scenarios such as INSERT, DELETE, UPDATE, DROP, TRUNCATE.

Using Dbeaver or psql in your terminal can help you validate the changes afterwards.

By default, the repo will create the 2 PostgreSQL instances with everything you need.
For this to work, we need:
pglogical extension.
A Publisher node (Source)
A Subscriber node (Target)
A Replication set for the tables to replication.
A Subscription to the Publisher.
Tables with primary keys.
PostgreSQL config with wal_level = ‘logical’

Also, this repo has some adjustments as backup, since these actions are not executed by default and would break the replication process:

Scripts will add/remove columns to tables on source and target databases.
Scripts will add new tables to the replication and to both databses.
Scripts will remove tables from the replication set and drop tables on both databases.

🔗 Starting the replication

To get started, you can run:

make build
NUM_RECORDS=10000 make run

This will spin up both PostgreSQL services, and create source tables with 10K rows of fake data. Then, you run:

make cdc-pglogical

Source (a.k.a provider/publisher):

Create pglogical extension.
Create a publisher node.
Add tables to the replication set.
By default, it will create a replication slot.

Target (a.k.a subscriber):

Create pglogical extension.
Create a subscription to the publisher.

🔂 Sending changes

You can start sending changes, these commands will apply INSERT, DELETE, TRUNCATE or UPDATE on a single random table. You can decide on how many rows you apply the changes:

NUM_RECORDS=5 make insert-data 
NUM_RECORDS=5 make delete-data
NUM_RECORDS=5 make update-data
make truncate

This is the same as doing (e.g. products table):

👀 Try breaking the replication!

If it wasn’t for the improvements to the scripts, all these upcoming actions would break the pglogical replication.

Good thing about to pglogical compared to native PostgreSQL replication is that replication sets can cover entire schemas.

If you add a new table for an already included schema, it will add it automatically, but you will still need to create it on the target database.

If you need to drop a table, you need to remove it from the replication and also drop on the target database.

This edge cases are covered by the scripts:

You can test this scenario by running these commands in the repo. You will notice tables are created and drop in both databases:

make create-table 
make drop-table

Translates as (e.g. products table):

Schema changes is another scenario, you can also test it:

make add-column
make drop-column

Translates as (e.g. products table):

The enhanced scripts are applying changes in both databases, thing that wouldn’t happen by default.

As always, you can run these commands to validate all changes:

📝 Conclusions & Notes

Be default, pglogical will copy an initial snapshot when enabled.
You need PRIMARY KEYs on the TABLES you want to replicate. Not VIEWS.
This feature doesn’t replicate sequences, DDLs or schemas.
Schema changes are ignored. Adding or dropping columns will do nothing.
If you have set pglogical.replication_set_add_all_tables('default', ARRAY['public']); and run CREATE TABLE, you need to also add it on the target database, otherwise it will break the replication.
The previous point works similarly to the DROP TABLE, you need to remove it from the replication set and then drop it on both sides.

If you enjoyed the content, hit the like ❤️ button, share, comment, repost, and all those nice things people do when like stuff these days. Glad to know you made it to this part!

Hi, I am Alejandro Aboy. I am currently working as a Data Engineer. I started in digital marketing at 19. I gained experience in website tracking, advertising, and analytics. I also founded my agency. In 2021, I found my passion for data engineering. So, I shifted my career focus, despite lacking a CS degree. I'm now pursuing this path, leveraging my diverse experience and willingness to learn.

Follow me on Linkedin

The Pipe & The Line

🔂 CDC in PostgreSQL: how does built-in Logical Replication work?

Discussion about this post