7 Databases in 7 Weeks for 2025

2024-11-30

I’ve been running databases-as-a-service for a long time, and there are always new things to keep abreast of - new technologies, different ways of solving problems, not to mention all the research coming out of universities. In 2025, consider spending a week with each of these database technologies.

A line drawing of a bookshelf, with the books labelled for each database covered - PostgreSQL, SQLite, DuckDB, ClickHouse, FoundationDB, TigerBeetle and CockroachDB

Preamble

These aren’t the “7 Best Databases” or something similar to power a Buzzfeed listicle - these are just 7 databases that I think are worth your time to really look into for a week or so. You might ask something like “why not Neo4j or MongoDB or MySQL/Vitess or <insert other db here>” - the answer is mostly that I don’t find them interesting. I’m also not covering Kafka or other similar streaming data services - definitely worth your time, but not covered.

Table of Contents

  1. PostgreSQL
  2. SQLite
  3. DuckDB
  4. ClickHouse
  5. FoundationDB
  6. TigerBeetle
  7. CockroachDB

1. PostgreSQL

The Default Database

“Just use Postgres” is basically a meme at this point, and for good reason. PostgreSQL is the pinnacle of boring technology, and should be the database you reach for when you need a client-server model. It’s ACID compliant, has plenty of interesting tricks for replication - both physical and logical - and is incredibly well supported across all the major vendors.

My favourite feature of Postgres, however, is extensions. This is where I feel Postgres really comes alive in a way that few other databases can match. There are extensions for almost everything you could want - AGE enables graph data structures and the use of the Cypher query language, TimescaleDB enables time-series workloads, Hydra Columnar provides an alternate columnar storage engine, and so on. I’ve written about writing an extension relatively recently if you’d like to give it a go yourself.

Postgres shines as a great “default” database for that reason, and we’re seeing ever more non-Postgres services rely on the Postgres wire protocol as a general-purpose Layer 7 protocol to provide client compatibility. A rich ecosystem, sensible default behaviour, and the fact that it can even fit into a Wasm build make it a database worth understanding.

Spend a week learning about what’s possible with Postgres, but also some of its limitations - MVCC can be fickle. Implement a simple CRUD app in your favourite language. Maybe even build a Postgres extension.
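The CRUD exercise is mostly about the DB-API shape that Postgres drivers like psycopg share. As a self-contained sketch, stdlib sqlite3 stands in for the server here - the table and queries are hypothetical, and against real Postgres you’d swap the `connect()` call for your driver’s and use `%s` placeholders instead of `?`:

```python
# Minimal CRUD via the Python DB-API. sqlite3 (stdlib) stands in for a
# Postgres connection so the sketch runs anywhere; the schema is made up.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE todos (id INTEGER PRIMARY KEY, title TEXT, done INTEGER DEFAULT 0)"
)

# Create
conn.execute("INSERT INTO todos (title) VALUES (?)", ("learn postgres",))
# Read
rows = conn.execute("SELECT id, title, done FROM todos").fetchall()
# Update
conn.execute("UPDATE todos SET done = 1 WHERE title = ?", ("learn postgres",))
# Delete
conn.execute("DELETE FROM todos WHERE done = 1")
conn.commit()
```

The same four statements, plus a migration tool and a connection pool, are the skeleton of most CRUD apps regardless of language.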

2. SQLite

The Local-First Database

Moving on from a client-server model, we take a detour into “embedded” databases, starting with SQLite. I’ve termed this the “local-first” database, as the SQLite database is directly co-located with the application. One of the more famous examples of this usage is WhatsApp, which stores chats as local SQLite databases on the device. Signal does the same thing.

Beyond that, we’re starting to see more creative uses of SQLite rather than “just” a local ACID-compliant database. With the advent of tools like Litestream enabling streaming backups and LiteFS to provide distributed access, we can devise more interesting topologies. Extensions like CR-SQLite allow the use of CRDTs to avoid needing conflict resolution when merging changesets, as used in Corrosion.

SQLite has also had a small resurgence thanks to Ruby on Rails 8.0 - 37signals has gone all in on SQLite, building a bunch of Rails modules like Solid Queue and configuring Rails to manipulate multiple SQLite databases via database.yml for this purpose. Bluesky uses SQLite for its Personal Data Servers - every user has their own SQLite database.

Spend a week experimenting with local-first architectures using SQLite, or even seeing if you can migrate a client-server model using Postgres to something that “just” needs SQLite instead.
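The co-located pattern is small enough to show whole: the application opens a database file sitting next to it on disk, with no server and no network hop. The chat schema below is hypothetical, loosely inspired by how the messaging apps above store history locally:

```python
# "Local-first": the app talks to a SQLite file directly via the stdlib.
import sqlite3

db = sqlite3.connect(":memory:")  # in a real app: a file path on the device
db.execute("PRAGMA journal_mode=WAL")  # better write concurrency for a live app
db.execute("""
    CREATE TABLE messages (
        id INTEGER PRIMARY KEY,
        chat_id TEXT NOT NULL,
        sent_at TEXT NOT NULL,
        body TEXT NOT NULL
    )
""")
db.execute(
    "INSERT INTO messages (chat_id, sent_at, body) VALUES (?, ?, ?)",
    ("alice", "2024-11-30T12:00:00Z", "hello"),
)
db.commit()
history = db.execute(
    "SELECT body FROM messages WHERE chat_id = ? ORDER BY sent_at", ("alice",)
).fetchall()
```

Tools like Litestream then sit outside the process, replicating that single file for backup or read replicas.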

3. DuckDB

The Query-Anything Database

On to the next embedded database: DuckDB. Much like SQLite, DuckDB is intended to be an in-process database system, but is focused on online analytical processing (OLAP) rather than online transaction processing (OLTP).

Where DuckDB shines is as a “query-anything” database, using SQL as its dialect of choice. It can natively pull data into its engine from CSV, TSV and JSON, but also formats like Parquet - just check out the list of data sources. This gives it extreme flexibility - check out this example of querying the Bluesky firehose.

Much like Postgres, DuckDB also has extensions, though not quite as rich an ecosystem - DuckDB is much younger, after all. Many community contributions can be found on the list of community extensions; a particular favourite of mine is gsheets.

Spend a week doing some data analysis and processing with DuckDB - be it via a Python notebook or something like Evidence. Maybe even see how it fits in with your “local-first” approach by offloading analytics queries from your SQLite database to DuckDB, which can read SQLite directly.

4. ClickHouse

The Columnar Database

Leaving the embedded database sphere, but sticking with the analytics theme, we come to ClickHouse. If I had to only pick two databases to deal with, I’d be quite happy with just Postgres and ClickHouse - the former for OLTP, the latter for OLAP.

ClickHouse specialises in analytics workloads, and can support very high ingest rates through horizontal scaling and sharded storage. It also supports tiered storage, allowing you to split “hot” and “cold” data - GitLab have a pretty thorough doc on this.

Where ClickHouse comes into its own is when you have analytics queries to run on a dataset too big for something like DuckDB, or you need “real-time” analytics. There is a lot of “benchmarketing” around these datasets, so I’m not going to repeat them here.
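The property underneath those benchmarks is columnar layout. A toy illustration of why it suits analytics - this is the concept only, nothing like ClickHouse’s actual engine:

```python
# Row-oriented: each record stored together, as an OLTP store might.
rows = [
    {"url": "/home", "region": "eu", "latency_ms": 120},
    {"url": "/home", "region": "us", "latency_ms": 95},
    {"url": "/about", "region": "eu", "latency_ms": 40},
]

# Column-oriented: one contiguous array per column, as ClickHouse lays data out.
columns = {
    "url": ["/home", "/home", "/about"],
    "region": ["eu", "us", "eu"],
    "latency_ms": [120, 95, 40],
}

# An aggregate over one column must walk every field of every row in the
# row layout, but only scans a single list in the columnar one - which
# also compresses far better on disk.
total_row = sum(r["latency_ms"] for r in rows)
total_col = sum(columns["latency_ms"])
```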

Another reason I suggest checking out ClickHouse is that it is a joy to operate - deployment, scaling, backups and so on are well documented, even down to setting the right CPU governor.

Spend a week exploring some larger analytics datasets, or converting some of the DuckDB analytics from above into a ClickHouse deployment. ClickHouse also has an embedded version - chDB - that can offer a more direct comparison.

5. FoundationDB

The Layered Database

We now enter the “mind expanding” section of this list, with FoundationDB. Arguably, FoundationDB is not a database, but quite literally the foundation for a database. Used in production by Apple, Snowflake and Tigris Data, FoundationDB is worth your time because it is unique in the world of key-value storage.

Yes, it’s an ordered key-value store, but that isn’t what is interesting about it. At first glance, it has some curious limitations - transactions cannot exceed 10MB of affected data and they cannot take longer than five seconds after the first read in a transaction. But, as they say, limits set us free. By having these limits, it can achieve full ACID transactions at very large scale - 100+ TiB clusters are known to be in operation.

FoundationDB is architected for specific workloads and extensively tested using simulation testing, which has been picked up by other technologies, including another database on this list and Antithesis, founded by some ex-FoundationDB folks. For more notes on this, check out Tyler Neely’s and Phil Eaton’s notes on the topic.

As mentioned, FoundationDB has some very specific semantics that take some getting used to - their Anti-Features and Features docs are worth familiarising yourself with to understand the problems they are looking to solve.

But why is it the “layered” database? This is because of the Layers concept. Instead of tying the storage engine to the data model, the storage is flexible enough to be remapped across different layers. Tigris Data have a great post about building such a layer, and there are some examples such as a Record layer and a Document layer from the FoundationDB org.
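The shape of a layer is easy to sketch: the store only understands ordered keys with range reads, and the layer maps a richer model onto key prefixes. This toy mimics FoundationDB’s tuple/subspace pattern in pure Python - the class names and key encoding are invented, and it is nothing like the real fdb API:

```python
import bisect


class OrderedKV:
    """Stand-in for an ordered key-value store with range reads."""

    def __init__(self):
        self._keys, self._vals = [], []

    def set(self, key, value):
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._vals[i] = value
        else:
            self._keys.insert(i, key)
            self._vals.insert(i, value)

    def get_range(self, prefix):
        # Ordered keys make "everything under this prefix" a cheap scan.
        lo = bisect.bisect_left(self._keys, prefix)
        hi = bisect.bisect_left(self._keys, prefix + b"\xff")
        return list(zip(self._keys[lo:hi], self._vals[lo:hi]))


class DocumentLayer:
    """Maps (collection, doc_id, field) -> value onto ordered byte keys."""

    def __init__(self, kv):
        self.kv = kv

    def put(self, collection, doc_id, doc):
        for field, value in doc.items():
            self.kv.set(b"/".join([collection, doc_id, field.encode()]), value)

    def get(self, collection, doc_id):
        prefix = b"/".join([collection, doc_id]) + b"/"
        return {
            k.rsplit(b"/", 1)[1].decode(): v
            for k, v in self.kv.get_range(prefix)
        }


kv = OrderedKV()
docs = DocumentLayer(kv)
docs.put(b"users", b"42", {"name": b"ada", "role": b"admin"})
user = docs.get(b"users", b"42")
```

A graph layer, a queue layer and a relational layer can all share the same store underneath - that is the remapping the Layers concept is after.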

Spend a week going through the tutorials and think about how you could use FoundationDB in place of something like RocksDB. Maybe check out some of the Design Recipes and go read the paper.

6. TigerBeetle

The Obsessively Correct Database

Flowing on from deterministic simulation testing, TigerBeetle breaks the mould from our previous databases in that it is decidedly not a general-purpose database - it is entirely dedicated to financial transactions.

Why is this worth a look? Single-purpose databases are unusual, and one as obsessively correct as TigerBeetle is a true rarity, especially considering it is open source. It incorporates everything from NASA’s Power of Ten Rules and Protocol-Aware Recovery, through to strict serialisability and Direct I/O to avoid issues with the kernel page cache. It is seriously impressive - just go read their Safety doc and their approach to programming, which they call Tiger Style.
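That single purpose is double-entry accounting: every transfer debits one account and credits another, so the ledger always balances. A pure-Python sketch of the invariant - nothing like the real client or its guarantees, just the shape of the data model:

```python
from dataclasses import dataclass


@dataclass
class Account:
    id: int
    debits: int = 0
    credits: int = 0


def transfer(ledger, debit_id, credit_id, amount):
    """Record a transfer; amounts are integers (e.g. pence), never floats."""
    if amount <= 0:
        raise ValueError("amount must be positive")
    ledger[debit_id].debits += amount
    ledger[credit_id].credits += amount


ledger = {1: Account(1), 2: Account(2)}
transfer(ledger, debit_id=1, credit_id=2, amount=10_00)  # £10.00 as pence

# The double-entry invariant: total debits always equal total credits.
assert sum(a.debits for a in ledger.values()) == sum(
    a.credits for a in ledger.values()
)
```

TigerBeetle bakes this model into the database itself, rather than leaving it to application code - which is exactly what makes a single-purpose database interesting.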

Another interesting point about TigerBeetle is that it’s written in Zig - a relative newcomer among systems programming languages, but one that has clearly fit well with what the TigerBeetle folks are trying to accomplish.

Spend a week modelling your financial accounts in a local deployment of TigerBeetle - follow the Quick Start and take a look at the System Architecture docs on how you might use it in conjunction with one of the more general-purpose databases above.

7. CockroachDB

The Global Database

Finally, we come full circle. I struggled a little on what to put here in the last slot. Thoughts originally went to Valkey, but FoundationDB scratched the key-value itch. I thought about graph databases, or something like ScyllaDB or Cassandra. I thought about DynamoDB, but not being able to run it locally/for free put me off.

In the end, I decided to close on a globally distributed database - CockroachDB. It’s Postgres wire-protocol compatible, and inherits some of the more interesting features discussed above - large horizontal scaling, strong consistency - and has some interesting features of its own.

CockroachDB enables scaling a database across multiple geographies, with a design based on Google’s Spanner system. Spanner relies on atomic and GPS clocks for extremely accurate time synchronisation - a luxury commodity hardware doesn’t have - so CockroachDB has some clever workarounds: reads are retried or delayed to account for clock skew under NTP, and nodes compare clock drift amongst themselves, terminating members that exceed the maximum offset.
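That last guardrail can be sketched in a few lines. The majority rule below is an approximation for illustration, not CockroachDB’s actual algorithm, though the 500ms figure is its default `--max-offset`:

```python
MAX_OFFSET_MS = 500  # CockroachDB's default --max-offset


def should_terminate(own_clock_ms, peer_clocks_ms, max_offset_ms=MAX_OFFSET_MS):
    """Self-terminate if we drift beyond max_offset_ms from a majority of
    peers - a toy approximation of the behaviour described above."""
    too_far = sum(
        1 for peer in peer_clocks_ms if abs(own_clock_ms - peer) > max_offset_ms
    )
    return too_far > len(peer_clocks_ms) / 2


# In sync with the cluster: stay up.
ok = should_terminate(1_000_000, [1_000_100, 999_900, 1_000_050])
# Drifted ~2s ahead of everyone: shut down rather than risk serving stale reads.
bad = should_terminate(1_002_000, [1_000_100, 999_900, 1_000_050])
```

Terminating a drifted node is the conservative choice: better to lose a member than to violate the consistency guarantees the cluster promises.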

Another interesting feature of CockroachDB is how multi-region configurations are used, including table localities, where there are different options depending on the read/write tradeoffs you want to make.

Spend a week re-implementing the movr example in a language and framework of your choice.

Wrap Up

We’ve explored a bunch of different databases, all used in production by some of the largest companies on the planet, and hopefully this will have exposed you to some technologies you weren’t familiar with before. Take this knowledge with you as you look to solve interesting problems.