Data Layer — 404inc

The data layer, at rest and in motion.

A snapshot from one of our production data planes. OLTP, OLAP, and event spine — instrumented end to end.

/ POSTGRES P95

12ms

▼ 8% week-over-week

/ CLICKHOUSE QPS

14.2k

▲ Sub-second on 4B rows

/ EVENT THROUGHPUT

47M/day

▲ Kafka · zero loss

/ VECTOR INDEX

89ms

k=10 · 8M embeddings

When we reach for which database.

Postgres is the default. Everything else needs to earn its way in. Below: how we score each candidate against the contract it's actually being asked to serve.

Scored on: WRITE PERF · READ PERF · OPS COST · HALF-LIFE

SYSTEM · VERDICT

PostgreSQL · DEFAULT

ClickHouse · USE

pgvector · USE

Kafka · USE

NATS JetStream · USE

Redis · SHARP TOOL

MongoDB · RESCUE ONLY

Thirty years of persistence eras.

Every five to seven years, the industry decides relational databases are obsolete. Every five to seven years, they aren't. Here's the path we've walked through it all.

1995 — 2008

The relational era

SQL was the entire conversation. Schemas were sacred, joins were expensive, and a corrupted table was a Tuesday. We learned normalization the hard way — and the lessons still hold.

MySQL · PostgreSQL · Oracle · SQL Server

2008 — 2014

The NoSQL reckoning

"Schema is dead." It wasn't. We watched teams ship Mongo as their primary store, then quietly migrate to Postgres two years later when consistency mattered. The ones who skipped this era are richer for it.

MongoDB · CouchDB · Cassandra · DynamoDB

2014 — 2020

The NewSQL rebound

Postgres came back, with extensions. JSONB gave us schema-on-read without losing ACID. Citus, Aurora, and CockroachDB taught us that "horizontally scalable SQL" was no longer a contradiction.

PostgreSQL+JSONB · Aurora · CockroachDB · Citus

2020 — 2024

The HTAP & streaming era

OLTP and OLAP stopped pretending they lived in different buildings. Kafka became the spine. ClickHouse made sub-second analytics on billions of rows feel routine. Event sourcing finally earned its keep.

ClickHouse · Kafka · Materialize · Debezium

2024 — NOW

The vector-native era

Embeddings became a first-class storage primitive. We don't run a separate vector database — we run pgvector inside Postgres, alongside the relational data it semantically describes. One transaction, one source of truth.

pgvector · DuckDB · Iceberg · Lance

How we ship the data layer.

Four principles that have held since 2002, through every persistence trend that came and went.

/ 01

Schemas are contracts.

Every column has a type, a constraint, and a reason. Migrations are versioned, reviewed, and reversible. The "we'll just store JSON for now" decision becomes a year-long cleanup. We know because we've cleaned up other people's.
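As a rough illustration of "versioned and reversible", here is a minimal sketch that uses Python's built-in sqlite3 as a stand-in for Postgres so it runs anywhere. The `tenant` table, its columns, and the `MIGRATIONS` list are hypothetical names invented for the example, not taken from any production system.

```python
import sqlite3

# Hypothetical migrations: (version, up SQL, down SQL). Names are illustrative.
MIGRATIONS = [
    (1,
     "CREATE TABLE tenant ("
     " id INTEGER PRIMARY KEY,"
     " name TEXT NOT NULL UNIQUE,"
     " plan TEXT NOT NULL CHECK (plan IN ('free', 'pro')))",
     "DROP TABLE tenant"),
    (2,
     "CREATE INDEX tenant_plan_idx ON tenant (plan)",
     "DROP INDEX tenant_plan_idx"),
]


def current_version(conn):
    # SQLite's user_version pragma stands in for a schema_migrations table.
    return conn.execute("PRAGMA user_version").fetchone()[0]


def migrate(conn, target):
    """Move the schema up or down to `target`, one version at a time."""
    version = current_version(conn)
    while version != target:
        if version < target:
            number, up_sql, _ = MIGRATIONS[version]
            conn.execute(up_sql)
            version = number
        else:
            number, _, down_sql = MIGRATIONS[version - 1]
            conn.execute(down_sql)
            version = number - 1
        # Pragmas cannot be parameterized; `version` is always an int here.
        conn.execute(f"PRAGMA user_version = {version}")
    conn.commit()
```

Calling `migrate(conn, 2)` applies both steps and `migrate(conn, 0)` unwinds them in reverse order. The `NOT NULL`, `UNIQUE`, and `CHECK` clauses are where the "type, constraint, and reason" live: the database rejects a bad row instead of trusting every caller to behave.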

/ 02

One source of truth.

No data exists in two places without an explicit replication contract. Caches are derivations, never authority. When Postgres and Redis disagree, Postgres wins — and we audit why they diverged.
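The "Postgres wins, and we audit why" rule can be sketched in a few lines. This is an assumption-laden toy: plain dicts stand in for Postgres (the authority) and Redis (the cache), and `reconcile` is a hypothetical helper name, not part of any library.

```python
def reconcile(authority, cache):
    """Overwrite cache entries that disagree with the authority.

    Returns an audit trail of (key, cached_value, authoritative_value)
    so every divergence can be investigated afterwards.
    """
    audit = []
    for key in list(cache):
        if key not in authority:
            # The cache holds something the authority never had: evict it.
            audit.append((key, cache.pop(key), None))
        elif cache[key] != authority[key]:
            # Disagreement: the authority wins, and the old value is recorded.
            audit.append((key, cache[key], authority[key]))
            cache[key] = authority[key]
    return audit


postgres = {"tenant:1": "pro", "tenant:2": "free"}
redis = {"tenant:1": "free", "tenant:3": "pro"}
divergences = reconcile(postgres, redis)
# redis is now {"tenant:1": "pro"}
# divergences is [("tenant:1", "free", "pro"), ("tenant:3", "pro", None)]
```

A key missing from the cache is not treated as a divergence: a cache miss is normal, a wrong answer is not.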

/ 03

Events before tables.

For domains where history matters, we model the event stream first and derive the table state from it. CQRS isn't dogma — it's how you build systems that can answer questions you didn't anticipate.
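A minimal sketch of deriving table state from an event stream, under invented assumptions: the `Event` shape, the two event kinds, and `project_balances` are hypothetical and exist only to show the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    seq: int      # position in the stream
    kind: str     # "deposited" or "withdrawn" (hypothetical event kinds)
    account: str
    amount: int


def project_balances(events, up_to_seq=None):
    """Fold the event stream into a balances 'table' (account -> balance).

    Passing `up_to_seq` answers "what did the table look like back then?",
    a question the stored table alone could not answer.
    """
    balances = {}
    for event in sorted(events, key=lambda e: e.seq):
        if up_to_seq is not None and event.seq > up_to_seq:
            break
        if event.kind == "deposited":
            delta = event.amount
        elif event.kind == "withdrawn":
            delta = -event.amount
        else:
            raise ValueError(f"unknown event kind: {event.kind}")
        balances[event.account] = balances.get(event.account, 0) + delta
    return balances
```

Because the table is a pure function of the stream, a new question usually means writing a new projection, not migrating historical data.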

/ 04

Restore before backup.

Untested backups are folklore. We rehearse restores monthly, on real production snapshots, under realistic disaster scenarios. The first time you need a restore isn't the time to find out the WAL stream had a hole.
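One check a restore rehearsal might include is scanning the WAL archive for missing segments. The sketch below is illustrative only: it assumes PostgreSQL's 24-hex-digit segment file names and the default 16 MB segment size (256 segments per log file), it ignores timeline switches, and the function names are hypothetical.

```python
SEGMENTS_PER_LOG = 256  # assumption: default 16 MB WAL segment size


def segment_number(name):
    """Split a 24-hex-digit WAL file name into (timeline, linear segment index)."""
    timeline = int(name[0:8], 16)
    log = int(name[8:16], 16)
    seg = int(name[16:24], 16)
    return timeline, log * SEGMENTS_PER_LOG + seg


def find_gaps(names):
    """Return (previous, next) pairs that are not adjacent on the same timeline."""
    ordered = sorted(names, key=segment_number)
    gaps = []
    for prev, nxt in zip(ordered, ordered[1:]):
        (tl_a, n_a), (tl_b, n_b) = segment_number(prev), segment_number(nxt)
        if tl_a == tl_b and n_b != n_a + 1:
            gaps.append((prev, nxt))
    return gaps
```

A check like this only proves the archive is contiguous. It does not replace actually restoring the snapshot and replaying the WAL, which is the part the rehearsal exists for.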

In the field: Optimus.

A multi-tenant operations platform with an event spine that didn't blink.

Optimus needed to ingest 47 million events per day across 180 tenants, surface sub-second analytics on any slice of that data, and never lose a single event — not during a deploy, not during a regional failover, not when a customer's webhook started replaying three months of backlog at 3am. The architecture had to be built to fail safely, not to never fail.

We separated concerns ruthlessly. Postgres held the tenant configuration and operational state — small tables, sharp indexes, every query under 50ms. ClickHouse handled the analytical surface — columnar storage, denormalized projections, sub-second queries on four billion rows. NATS JetStream sat between them as the event spine, with at-least-once delivery and consumer-driven offsets.

The platform has not lost a single event in three years of production. Not one.

When a tenant's webhook replayed three months of events overnight, the spine absorbed it without back-pressuring upstream services. The analytical surface caught up in 14 minutes. Nobody noticed.
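At-least-once delivery means the same event can arrive more than once, and a replay makes that a certainty, so downstream consumers have to be idempotent. The following is a minimal sketch of that idea, not the Optimus implementation: `IdempotentProjector` is a hypothetical name, and the in-memory set stands in for whatever durable deduplication store a real consumer would use.

```python
class IdempotentProjector:
    """Counts events per tenant, applying each event ID at most once."""

    def __init__(self):
        self.seen = set()   # stand-in for a durable dedup store
        self.counts = {}    # tenant -> number of distinct events applied

    def handle(self, event_id, tenant):
        """Apply an event once; return False if it was a redelivery."""
        if event_id in self.seen:
            return False
        self.seen.add(event_id)
        self.counts[tenant] = self.counts.get(tenant, 0) + 1
        return True


projector = IdempotentProjector()
results = [projector.handle(eid, "tenant-a") for eid in ["e1", "e2", "e1"]]
# results is [True, True, False]; projector.counts is {"tenant-a": 2}
```

With consumers written this way, a duplicate delivery costs a lookup rather than a double-counted row, which is what makes replaying a backlog safe.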

/ DATA METRICS · OPTIMUS

EVENTS / DAY · 47M

TENANTS · 180

CLICKHOUSE ROWS · 4.2B

QUERY P99 · 340ms

EVENT LOSS · 0

RESTORE TIME · 11min

Frameworks change. The schema is forever.