404inc Stack Data Layer

Frameworks change. The schema is forever.

The compute layer is replaceable. The interface is rented. The data model is the only thing that's truly yours — and the one decision you can't undo with a refactor sprint. We treat persistence as the most important conversation on day one of every engagement.

postgres p95 12ms · clickhouse qps 14.2k · kafka lag 0.3s · vector dims 1536 · replication lag 11ms · events/day 47M · tables tracked 312 · backup verified 14m ago

The data layer, at rest and in motion.

A snapshot from one of our production data planes. OLTP, OLAP, and event spine — instrumented end to end.

/ POSTGRES P95
12ms
▼ 8% week-over-week
/ CLICKHOUSE QPS
14.2k
▲ Sub-second on 4B rows
/ EVENT THROUGHPUT
47M/day
▲ Kafka · zero loss
/ VECTOR INDEX
89ms
k=10 · 8M embeddings

When we reach for which database.

Postgres is the default. Everything else needs to earn its way in. Below: how we score each candidate against the contract it's actually being asked to serve.

Scored on: WRITE PERF · READ PERF · OPS COST · HALF-LIFE

SYSTEM · VERDICT
PostgreSQL · DEFAULT
ClickHouse · USE
pgvector · USE
Kafka · USE
NATS JetStream · USE
Redis · SHARP TOOL
MongoDB · RESCUE ONLY

Thirty years of persistence eras.

Every five to seven years, the industry decides relational databases are obsolete. Every five to seven years, they aren't. Here's the path we've walked through it all.

1995 — 2008
The relational era
SQL was the entire conversation. Schemas were sacred, joins were expensive, and a corrupted table was a Tuesday. We learned normalization the hard way — and the lessons still hold.
MySQL · PostgreSQL · Oracle · SQL Server
2008 — 2014
The NoSQL reckoning
"Schema is dead." It wasn't. We watched teams ship Mongo as their primary store, then quietly migrate to Postgres two years later when consistency mattered. The ones who skipped this era are richer for it.
MongoDB · CouchDB · Cassandra · DynamoDB
2014 — 2020
The NewSQL rebound
Postgres came back, with extensions. JSONB gave us schema-on-read without losing ACID. Citus, Aurora, and CockroachDB taught us that "horizontally scalable SQL" was no longer a contradiction.
PostgreSQL + JSONB · Aurora · CockroachDB · Citus
2020 — 2024
The HTAP & streaming era
OLTP and OLAP stopped pretending they lived in different buildings. Kafka became the spine. ClickHouse made sub-second analytics on billions of rows feel routine. Event sourcing finally earned its keep.
ClickHouse · Kafka · Materialize · Debezium
2024 — NOW
The vector-native era
Embeddings became a first-class storage primitive. We don't run a separate vector database — we run pgvector inside Postgres, alongside the relational data it semantically describes. One transaction, one source of truth.
pgvector · DuckDB · Iceberg · Lance

Anatomy of a well-modeled query.

A 24-hour heatmap of query latency across one of our production Postgres clusters. Every column is an hour. Every row is a percentile. Cool means fast.

[Heatmap: query latency by percentile (rows: p99, p95, p90, p75, p50, p25) and hour of day (columns: 00 to 24). Cooler is faster, warmer is slower. Peak load: 14:00 to 18:00 UTC.]
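A grid like the heatmap described above can be derived from raw latency samples. The following is a minimal Python sketch, not our production tooling: the function names, the nearest-rank percentile method, and the sample data are all assumptions made for illustration.

```python
from collections import defaultdict

# Percentile rows of the grid, matching the heatmap's rows.
PERCENTILES = (25, 50, 75, 90, 95, 99)


def percentile(sorted_values, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    n = len(sorted_values)
    rank = max(1, (p * n + 99) // 100)  # integer ceiling of p * n / 100
    return sorted_values[rank - 1]


def latency_grid(samples):
    """Build {hour: {percentile: latency_ms}} from (hour, latency_ms) pairs."""
    by_hour = defaultdict(list)
    for hour, latency_ms in samples:
        by_hour[hour].append(latency_ms)

    grid = {}
    for hour in sorted(by_hour):
        values = sorted(by_hour[hour])
        grid[hour] = {p: percentile(values, p) for p in PERCENTILES}
    return grid


# Made-up samples: 100 queries in the 14:00 hour, 4 queries in the 03:00 hour.
samples = [(14, ms) for ms in range(1, 101)] + [(3, ms) for ms in (2, 4, 6, 8)]
grid = latency_grid(samples)
```

Each cell of the heatmap corresponds to one `grid[hour][percentile]` value. With very few samples in an hour, the upper percentiles collapse onto the maximum, which is worth remembering when reading quiet overnight columns.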

How we ship the data layer.

Four principles that have held since 2002, through every persistence trend that came and went.

/ 01
Schemas are contracts.
Every column has a type, a constraint, and a reason. Migrations are versioned, reviewed, and reversible. The "we'll just store JSON for now" decision becomes a year-long cleanup. We know because we've cleaned up other people's.
/ 02
One source of truth.
No data exists in two places without an explicit replication contract. Caches are derivations, never authority. When Postgres and Redis disagree, Postgres wins — and we audit why they diverged.
/ 03
Events before tables.
For domains where history matters, we model the event stream first and derive the table state from it. CQRS isn't dogma — it's how you build systems that can answer questions you didn't anticipate.
/ 04
Restore before backup.
Untested backups are folklore. We rehearse restores monthly, on real production snapshots, in realistic disaster scenarios. The first time you need it isn't the time to find out the WAL stream had a hole.
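Principle 03, events before tables, can be sketched as a fold over an event log. This is an illustrative Python example under stated assumptions: the `Event` shape, the event kinds ("created", "renamed", "deactivated"), and the `project` function are hypothetical names, not a description of any specific system we have shipped.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    seq: int         # position in the stream (hypothetical field)
    kind: str        # hypothetical kinds: "created", "renamed", "deactivated"
    tenant_id: str
    payload: dict


def apply(state, event):
    """Return a new table state with one event applied; the input state is not mutated."""
    rows = dict(state)
    if event.kind == "created":
        rows[event.tenant_id] = {"name": event.payload["name"], "active": True}
    elif event.kind == "renamed":
        rows[event.tenant_id] = {**rows[event.tenant_id], "name": event.payload["name"]}
    elif event.kind == "deactivated":
        rows[event.tenant_id] = {**rows[event.tenant_id], "active": False}
    return rows


def project(events, up_to_seq=None):
    """Derive table state by replaying events in order, optionally stopping at a sequence number."""
    state = {}
    for event in sorted(events, key=lambda e: e.seq):
        if up_to_seq is not None and event.seq > up_to_seq:
            break
        state = apply(state, event)
    return state


log = [
    Event(1, "created", "t1", {"name": "Acme"}),
    Event(2, "renamed", "t1", {"name": "Acme Corp"}),
    Event(3, "deactivated", "t1", {}),
]
current = project(log)
as_of_first_event = project(log, up_to_seq=1)
```

Because the table is derived rather than authoritative, a question nobody anticipated ("what did this tenant look like after the first event?") is answered by replaying a prefix of the log instead of reconstructing history from a mutated row.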

In the field: Optimus.

A multi-tenant operations platform with an event spine that didn't blink.

Optimus needed to ingest 47 million events per day across 180 tenants, surface sub-second analytics on any slice of that data, and never lose a single event — not during a deploy, not during a regional failover, not when a customer's webhook started replaying three months of backlog at 3am. The architecture had to be built to fail safely, not to never fail.

We separated concerns ruthlessly. Postgres held the tenant configuration and operational state — small tables, sharp indexes, every query under 50ms. ClickHouse handled the analytical surface — columnar storage, denormalized projections, sub-second queries on four billion rows. NATS JetStream sat between them as the event spine, with at-least-once delivery and consumer-driven offsets.
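At-least-once delivery means a consumer may see the same event more than once, so "never lose an event" usually has to be paired with idempotent handling. The following Python sketch shows the general idea only. It does not use the NATS client API, and the `IdempotentConsumer` class, the event IDs, and the in-memory set are assumptions for illustration; a real consumer would need a durable record of processed IDs.

```python
class IdempotentConsumer:
    """Wraps a handler so that redelivered events are processed only once."""

    def __init__(self, handler):
        self._handler = handler
        # Assumption: an in-memory set stands in for a durable store of processed IDs.
        self._seen = set()

    def deliver(self, event_id, payload):
        """Handle an event; return True if it was processed, False if it was a duplicate."""
        if event_id in self._seen:
            return False
        self._handler(payload)
        # Record the ID only after the handler succeeds, so a failed
        # attempt is retried when the event is redelivered.
        self._seen.add(event_id)
        return True


processed = []
consumer = IdempotentConsumer(processed.append)

results = [
    consumer.deliver("evt-1", 10),
    consumer.deliver("evt-2", 5),
    consumer.deliver("evt-1", 10),  # redelivery of the first event
]
```

In this sketch the handler and the "seen" record are not updated atomically. In a real system that gap is typically closed by writing both in the same database transaction, or by making the handler itself safe to repeat.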

The platform has not lost a single event in three years of production. Not one.

When a tenant's webhook replayed three months of events overnight, the spine absorbed it without back-pressuring upstream services. The analytical surface caught up in 14 minutes. Nobody noticed.

/ DATA METRICS · OPTIMUS

EVENTS / DAY: 47M
TENANTS: 180
CLICKHOUSE ROWS: 4.2B
QUERY P99: 340ms
EVENT LOSS: 0
RESTORE TIME: 11min

Architect a schema that lasts.

If you're modeling new data — or trying to dig out from a model that's gone wrong — this is where we start.
