One of the unexpected gifts of taking a sabbatical is the headspace to chase problems that never quite make it to the top of your todo list. One of the more interesting challenges that popped up last year was data replication for our point-of-sale solution.
Our point-of-sale solution currently relies on MS SQL Express as its database engine and Microsoft Message Queuing (MSMQ) for data transport, with built-in SQL Server replication features and a wrapper application for monitoring.
While this approach is battle-tested and has been robust, it comes with challenges around vendor lock-in and cost as the database grows: before SQL Server 2025, SQL Express caps databases at 10GB, and anything larger requires a full SQL Server license.
Additionally, MSMQ has not seen meaningful development in over a decade, even as the message broker space has boomed. While MSMQ is not going away, I’m cautious about building new integrations on it, and ultimately I’d want to move away from a Windows-heavy stack to something that’s Unix-based.
The sabbatical gave me the space to actually prototype an alternative. The brief was straightforward:
- Build a replication layer where both data and schema changes replicate from a main node to N worker nodes.
- Workers should also publish data back to main (transactions, sales, etc.), and those changes should propagate to other workers in the replication lifecycle.
- Bringing up a new worker should be simple. Restore from a snapshot of main not older than X days, then catch up on changes via the message broker.
- Workers should be able to report their DB integrity status against main.
- The replication layer must be entirely separate from the application layer. If the application needs to issue commands, it should do so via an SDK talking to the replication layer.
- Prefer open-source database engines with strong WAL support - Postgres, MySQL, MariaDB or even SQLite.
- For data transport, prefer something reliable but lightweight. We use Kafka internally, but managing thousands of consumers and producers in a highly distributed context felt like overkill. NATS JetStream seemed like a better fit.
An Aside: Agentic Coding
For this project I made use of OpenCode with Claude Opus 4.5 as my main model. I generally have a plan-then-build approach, where every change step is recorded in a changeset directory.
My general structure is:
```
docs/
├── PROJECT.md
└── changesets/
    ├── 000-foundation-setup.md
    └── 001-bidirectional-replication.md
cmd/main.go
...
AGENTS.md
```

Where the `AGENTS.md` is pretty straightforward:
```markdown
# Data Replication

- Refer to `docs/PROJECT.md` for the project summary of what we’re trying to achieve and technology choices.
- When planning, make sure that you write up your plan in the `docs/changesets/` folder as `000-<plan-name>.md` where the number in front increments, and once you’ve executed it (before committing), append to your changeset doc what you actually executed on.
- Always make sure you’re checking that we’re implementing idiomatic Go.
- If you need external resources (Redis, Postgres, NATS etc.), use Docker Compose.
```

I’ve found this approach pretty useful for storing context on why a change was made, and I’ve seen OpenCode reference older changesets while thinking.
I’ll continue to refine this approach and add more files to both my docs directory and just general skills directory for reference. While this works fine for a single-person project, I’m interested to see if it solves the context challenges that some AI-enabled teams have.
Technology Choices
I settled on Go, Postgres 16 and NATS JetStream for the PoC. Go felt like the natural choice for this kind of systems-level work, it’s a language I’m comfortable in, and the ecosystem for Postgres and NATS clients is mature. Postgres was chosen for its logical replication support via pgoutput, which is built into the engine. NATS JetStream gives us durable consumers, explicit acknowledgement and configurable message retention without the overhead of a solution like Kafka (I’ve also been wanting to use NATS for quite a while).
The whole thing runs as a single binary on each node. Config determines the node’s role: which tables it publishes via WAL and which it subscribes to via NATS. Locally, Docker Compose spins up three Postgres instances and a NATS server to simulate a main node and two workers. For the PoC a config.yaml file is fine, but a production deployment would need a better approach for defining table publishing and subscription at scale.
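A hedged sketch of what such a config might look like - the field names here are invented for illustration and don’t reflect the PoC’s actual schema:

```yaml
# Hypothetical node config; actual field names in the PoC may differ.
node:
  name: worker-1
  role: worker            # main | worker

postgres:
  dsn: postgres://repl:secret@localhost:5433/app

nats:
  url: nats://localhost:4222
  stream: REPL

publish:                  # tables this node owns and emits changes for
  - comments

subscribe:                # tables owned elsewhere, applied locally
  - posts
```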
One-Way Replication
The first milestone was relatively straightforward: replicate a posts table from main to workers. The replicator on the main node connects to Postgres via the replication protocol using pglogrepl, decodes WAL messages into change events (insert, update, delete), serialises them to JSON and publishes to a NATS JetStream stream. In a production environment I’d probably choose Protobuf over JSON, but JSON works fine for a PoC.
The worker’s replicator subscribes to the stream and applies each change to its local Postgres using an upsert (`INSERT ... ON CONFLICT DO UPDATE`).
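The change-event shape and the generated upsert can be sketched roughly like this - a simplification where the struct fields and single-column key are illustrative, not the PoC’s exact wire format:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// ChangeEvent is a sketch of the JSON payload published to JetStream;
// the PoC's actual field names may differ.
type ChangeEvent struct {
	Table   string            `json:"table"`
	Op      string            `json:"op"`  // insert | update | delete
	Key     string            `json:"key"` // primary-key column (single-column for simplicity)
	Columns map[string]string `json:"columns"`
}

// upsertSQL renders an idempotent INSERT ... ON CONFLICT DO UPDATE for an
// insert/update event. Values come back as $n args for the caller to execute.
// Assumes at least one non-key column.
func upsertSQL(ev ChangeEvent) (string, []any) {
	cols := make([]string, 0, len(ev.Columns))
	for c := range ev.Columns {
		cols = append(cols, c)
	}
	sort.Strings(cols) // deterministic column order

	placeholders := make([]string, len(cols))
	sets := make([]string, 0, len(cols))
	args := make([]any, len(cols))
	for i, c := range cols {
		placeholders[i] = fmt.Sprintf("$%d", i+1)
		args[i] = ev.Columns[c]
		if c != ev.Key {
			sets = append(sets, fmt.Sprintf("%s = EXCLUDED.%s", c, c))
		}
	}

	return fmt.Sprintf(
		"INSERT INTO %s (%s) VALUES (%s) ON CONFLICT (%s) DO UPDATE SET %s",
		ev.Table,
		strings.Join(cols, ", "),
		strings.Join(placeholders, ", "),
		ev.Key,
		strings.Join(sets, ", "),
	), args
}
```

Because the statement is idempotent, redelivering the same event is harmless - a property that becomes important later.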
I did play around with the idea of using something off-the-shelf for change events (like Debezium), but I knew I would need a custom binary for distributing the change events and migrations anyway, so I opted to use the WAL messages directly.
This phase of the project was completed in an afternoon and worked just as expected. A post created on main appeared on both workers within seconds. The application on each node just writes to Postgres as normal, and the replicator handles everything else as a sidecar process.
The Bidirectional Loop
Once I moved onto workers publishing changes, things got trickier. Workers produce data (e.g. comments) that needs to replicate back to main and to other workers. The initial approach was to run WAL listeners on both sides - main publishes posts, workers publish comments, everyone subscribes to what they don’t own.
Some readers can likely see where this fails: when worker-1 creates a comment, its WAL listener publishes to NATS, main receives the event and applies it locally, main’s WAL listener publishes that write back to NATS, and you end up in an infinite loop.
After a bit of back-and-forth with Claude, I realised that I was over-complicating the worker implementation and that running a WAL listener at the ‘edge’ likely wasn’t a good idea. I ended up moving to an outbox pattern producing to NATS, which seems like a more reasonable approach.
Outbox pattern
In this implementation, the main-to-worker leg stays WAL-based, while workers write to an `_outbox` table alongside their data writes, ideally in the same transaction. The replicator polls this table on a short interval, publishes each row to NATS and deletes it after successful delivery.
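A minimal sketch of that poll-publish-delete loop, with the `_outbox` table and JetStream hidden behind interfaces so the delivery logic is visible. All names here are illustrative, and the in-memory fakes exist only to exercise the loop:

```go
package main

import "fmt"

// OutboxRow mirrors a row in the _outbox table (names illustrative).
type OutboxRow struct {
	ID      int64
	Subject string
	Payload []byte
}

// Store abstracts the _outbox table: fetch a batch, delete after delivery.
type Store interface {
	FetchBatch(limit int) ([]OutboxRow, error)
	Delete(id int64) error
}

// Publisher abstracts the JetStream publish call.
type Publisher interface {
	Publish(subject string, payload []byte) error
}

// drainOutbox publishes each pending row and deletes it only after a
// successful publish. A crash between the two steps at worst re-delivers,
// and the upsert applier on the receiving side makes duplicates harmless.
func drainOutbox(s Store, p Publisher, batch int) (int, error) {
	rows, err := s.FetchBatch(batch)
	if err != nil {
		return 0, err
	}
	sent := 0
	for _, r := range rows {
		if err := p.Publish(r.Subject, r.Payload); err != nil {
			return sent, fmt.Errorf("publish row %d: %w", r.ID, err)
		}
		if err := s.Delete(r.ID); err != nil {
			return sent, fmt.Errorf("delete row %d: %w", r.ID, err)
		}
		sent++
	}
	return sent, nil
}

// In-memory fakes for demonstration only.
type memStore struct{ rows []OutboxRow }

func (m *memStore) FetchBatch(limit int) ([]OutboxRow, error) {
	if limit > len(m.rows) {
		limit = len(m.rows)
	}
	out := make([]OutboxRow, limit)
	copy(out, m.rows[:limit])
	return out, nil
}

func (m *memStore) Delete(id int64) error {
	for i, r := range m.rows {
		if r.ID == id {
			m.rows = append(m.rows[:i], m.rows[i+1:]...)
			return nil
		}
	}
	return fmt.Errorf("row %d not found", id)
}

type memPub struct{ got []string }

func (p *memPub) Publish(subject string, payload []byte) error {
	p.got = append(p.got, subject)
	return nil
}
```

The key ordering choice is publish-then-delete: it trades exactly-once for at-least-once, which the idempotent applier absorbs.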
In a later iteration I might consider making outbox table writes part of the SDK allowing the application to interact with the replicator binary around the outbox and message delivery.
Load Testing Bugs
The load test pushed around 10,000 inserts per second into the main database for 20 seconds - 195,672 posts created on main. When I checked the workers, only 184,963 had arrived.
NATS confirmed that every message had been delivered with no lag and at the correct offset. The issue at hand was my deduplication logic.
I’d implemented LSN-based deduplication - track the highest Log Sequence Number you’ve applied per table, skip anything with a lower LSN on the assumption it’s a duplicate. This works if messages arrive in order. Under high concurrency, NATS JetStream doesn’t guarantee ordering within a subject. So if a message with LSN 300 arrives before one with LSN 100, the applier processes 300, sets it as the high watermark, and then quietly drops 100 when it arrives a moment later.
To solve this I dropped the LSN approach entirely and opted for the upsert pattern instead (ON CONFLICT DO UPDATE) which already provides idempotency - duplicates just overwrite the same data.
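The failure mode is easy to reproduce in miniature. Here both strategies apply the same out-of-order pair of events - LSN 300 arriving before LSN 100 - with plain maps standing in for the worker’s tables:

```go
package main

// event is a toy stand-in for a decoded WAL change.
type event struct {
	lsn   uint64
	key   string
	value string
}

// applyWithWatermark skips any event at or below the highest LSN seen.
// That silently drops late arrivals when delivery is out of order.
func applyWithWatermark(events []event) map[string]string {
	db := map[string]string{}
	var watermark uint64
	for _, e := range events {
		if e.lsn <= watermark {
			continue // assumed duplicate; wrong under out-of-order delivery
		}
		watermark = e.lsn
		db[e.key] = e.value
	}
	return db
}

// applyWithUpsert applies every event idempotently: a duplicate just
// overwrites the row with the same data (INSERT ... ON CONFLICT, in effect).
func applyWithUpsert(events []event) map[string]string {
	db := map[string]string{}
	for _, e := range events {
		db[e.key] = e.value
	}
	return db
}
```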
Re-running the load test confirmed that this resolved the issue, with an integration test thrown in to make sure.
Schema replication
One thing that caught me off guard was that Postgres logical replication only captures DML - inserts, updates, deletes. It does not capture DDL. When main runs a migration to add a column or create an index, workers have no idea.
The solution was to build a migration system on top of the existing NATS infrastructure. Running replicate migrate up --publish applies the migration locally and publishes it as a special event to the repl.schema.changes subject. Workers pick these up via their existing NATS subscription, verify the checksum, check ordering (version N requires version N-1 to be applied first), and apply the DDL. If the predecessor migration hasn’t been applied yet, the message gets NAK’d and retried with backoff until it has.
Workers also catch up on all pending migration events before they start any data subscribers. If catch-up fails within a configurable timeout, the worker refuses to start entirely. This prevents data from being applied to an out-of-date schema.
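The worker-side decision can be sketched as a small pure function. The `Decision` type and `gate` signature are illustrative, not the PoC’s actual API:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
)

// Decision mirrors how a worker might react to an event on
// repl.schema.changes.
type Decision int

const (
	Apply Decision = iota // run the DDL and ack
	Skip                  // already applied; ack without re-running
	Nak                   // predecessor missing or checksum mismatch; retry with backoff
)

// checksum hashes the DDL so workers can verify the payload they received
// matches what main published.
func checksum(ddl string) string {
	sum := sha256.Sum256([]byte(ddl))
	return hex.EncodeToString(sum[:])
}

// gate decides what to do with migration `version` given the worker's
// current schema version and the publisher-side checksum.
func gate(current, version int, ddl, wantSum string) Decision {
	switch {
	case version <= current:
		return Skip // redelivery of something already applied
	case version != current+1:
		return Nak // version N needs N-1 first
	case checksum(ddl) != wantSum:
		return Nak // payload corrupt; don't apply
	default:
		return Apply
	}
}
```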
This schema replication strategy, combined with a strictly controlled deployment playbook for the main and worker applications, should ensure workers receive schema migrations well before any messages that rely on the new schema.
Snapshots and Operational Tooling
Bringing up a new worker needed to be a straightforward process. The CLI supports replicate snapshot which runs pg_dump on the main node and captures the current NATS stream sequence positions alongside the data. replicate restore loads the snapshot onto a fresh worker and writes those sequences to a state file. When the replicator starts, it creates NATS consumers that begin from those saved sequences - so the worker catches up on changes since the snapshot rather than replaying the entire stream history.
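A sketch of what that state file might carry, assuming one saved stream sequence per subject - the field names are illustrative. On restore, each consumer would be created to deliver from the saved sequence plus one (JetStream’s deliver-by-start-sequence policy), replaying only changes made after the dump:

```go
package main

import "encoding/json"

// SnapshotState is a sketch of what `replicate snapshot` could record
// alongside the pg_dump output. Field names are illustrative.
type SnapshotState struct {
	TakenAt   string            `json:"taken_at"`
	Sequences map[string]uint64 `json:"sequences"` // subject -> last stream seq at dump time
}

// nextStartSeq is where a restored worker's consumer should begin for a
// subject: the first sequence it has not yet seen.
func nextStartSeq(st SnapshotState, subject string) uint64 {
	return st.Sequences[subject] + 1
}

// roundTrip shows the state file is plain JSON, written at snapshot time
// and read back on replicator start.
func roundTrip(st SnapshotState) (SnapshotState, error) {
	b, err := json.Marshal(st)
	if err != nil {
		return SnapshotState{}, err
	}
	var got SnapshotState
	err = json.Unmarshal(b, &got)
	return got, err
}
```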
There’s also a replicate status command that shows database and NATS connectivity, consumer lag per table and the last applied timestamp - useful for quickly checking whether a node is healthy and caught up.
A small but satisfying addition was state reconciliation. When config changes - say you stop publishing a table - the replicator diffs its previous state file against the current config on startup and cleans up orphaned Postgres publications, replication slots and NATS consumers. This might be a bit over-engineered for a PoC, but would definitely be needed in a production environment.
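The reconciliation diff itself is a small set difference, sketched here over plain resource names - the real thing would run this per resource type (publications, slots, consumers):

```go
package main

import "sort"

// orphans returns resources present in the previous state file but absent
// from the current config - the things to clean up on startup.
func orphans(previous, current []string) []string {
	keep := make(map[string]bool, len(current))
	for _, name := range current {
		keep[name] = true
	}
	var gone []string
	for _, name := range previous {
		if !keep[name] {
			gone = append(gone, name)
		}
	}
	sort.Strings(gone) // deterministic cleanup order
	return gone
}
```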
Where to go from here
The PoC went further than I initially expected. Over 12 changesets it grew from basic one-way replication to something that handles bidirectional sync, schema migrations, snapshot restore, state management and structured logging. There’s the start of a Go SDK for publishing migrations programmatically, and the integration tests use testcontainers-go so they’re fully self-contained.
Under load it sustains around 9,800 writes per second on the publishing side, with the apply side (single-threaded upserts) being the bottleneck at around 4,900 ops/sec. There’s room for improvement there with batched transactions and parallel appliers, but for a PoC it validates the approach.
The bigger items still on the table are stream-per-table isolation (currently all tables share a single NATS stream), subject partitioning for high-volume tables, consumer grouping for distributed deployments and implementing world-class observability. These are the kinds of problems you’d solve when moving from PoC to production, and they’re well-documented trade-offs rather than unknowns.
The project scratched an itch I had during my time off, and I hope I get to explore more of this once I’m back working on our next-gen Point-of-Sale solution at Bash/TFG.