Databases (12 blogmarks)


Free, Open-Source Online database diagram editor and SQL generator

https://www.drawdb.app/

When you search for a SQL / database diagramming tool, most results are products with a shallow free tier that funnels you toward a steep monthly subscription. I knew I had seen a free and open-source database diagramming tool at some point, and after a bit more searching I came back across DrawDB.

(All) Databases Are Just Files. Postgres Too

https://tselai.com/all-databases-are-just-files

This is an intriguing simplification of how to think about what Postgres is and what it does:

At its core, postgres is simply a program that turns SQL queries into filesystem operations. A CREATE TABLE becomes a mkdir. An UPDATE eventually becomes: open a file, write to it, close it. It’s a complex and powerful system—but fundamentally, it’s just an executable that manipulates files.

Making the case for lifting the “veil of ignorance” and building up a mental model for how Postgres works:

Understanding how PostgreSQL actually works—how it’s started, where the files live, what the key binaries do—puts real power in your hands. It removes the mystery. It means you’re no longer just reacting when something breaks; you’re diagnosing, tweaking, even optimizing.
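You can poke at that mapping yourself: Postgres will tell you where the cluster lives and which file (relative to the data directory) backs a given table. A small illustration (the table and the path shown are examples, not fixed values):

```sql
SHOW data_directory;                  -- where the cluster's files live
CREATE TABLE demo (id int);           -- illustrative table
SELECT pg_relation_filepath('demo');  -- something like base/16384/16389
```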

Hermitage (GitHub) — Test Suite for DB Isolation Levels

https://github.com/ept/hermitage

Isolation works a little differently in every database system. This is a test suite that runs against many popular databases to demonstrate what each of them means by its isolation levels.

Isolation is the I in ACID, and it describes how a database protects an application from concurrency problems (race conditions). If you read a traditional database theory textbook, it will tell you that isolation is supposed to mean serializability, i.e. you can pretend that transactions are executed one after another, and concurrency problems do not happen. However, if you look at the implementations of isolation in practice, you see that serializability is rarely used, and some popular databases (such as Oracle) don't even implement it.
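Hermitage's tests are hand-interleaved statements run from multiple concurrent sessions. A minimal example in that spirit (the schema is illustrative), showing PostgreSQL's REPEATABLE READ rejecting a lost update:

```sql
CREATE TABLE test (id int PRIMARY KEY, value int);
INSERT INTO test (id, value) VALUES (1, 10);

-- session A:
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT value FROM test WHERE id = 1;      -- reads 10

-- session B (autocommits):
UPDATE test SET value = 15 WHERE id = 1;

-- session A:
UPDATE test SET value = value + 1 WHERE id = 1;
-- ERROR:  could not serialize access due to concurrent update
```

Run the same interleaving at READ COMMITTED and the second update silently clobbers the first, which is exactly the kind of difference the suite is built to surface.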

Build Your Own Database From Scratch in Go

https://build-your-own.org/database/

Someone shared their Go-based database implementation on Reddit and mentioned that they used this book as a guide.

Understand databases from the bottom up by building your own, in small steps, and with simple Golang code.
- Start with a B+tree, the data structure for querying and manipulating the data.
- Make it durable, that’s what makes a DB different from a file.
- Relational DB with concurrent transactions on top of the copy-on-write B+tree KV.
- A SQL-like query language, the finishing touch.

Comparison of the transaction systems of Oracle and PostgreSQL

https://www.cybertec-postgresql.com/en/comparison-of-the-transaction-systems-of-oracle-and-postgresql/

Nice example of when deferrable constraints matter in PostgreSQL.

CREATE TABLE tab (id numeric PRIMARY KEY);

INSERT INTO tab (id) VALUES (1);
INSERT INTO tab (id) VALUES (2);

UPDATE tab SET id = id + 1;
ERROR:  duplicate key value violates unique constraint "tab_pkey"
DETAIL:  Key (id)=(2) already exists.

The reason is that PostgreSQL (in violation of the SQL standard) checks the constraint after each row, while Oracle checks it at the end of the statement. To make PostgreSQL behave the same as Oracle, create the constraint as DEFERRABLE. Then PostgreSQL will check it at the end of the statement.
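To see it, here is the same example with a deferrable primary key, per the article's suggestion:

```sql
CREATE TABLE tab (id numeric PRIMARY KEY DEFERRABLE);

INSERT INTO tab (id) VALUES (1);
INSERT INTO tab (id) VALUES (2);

UPDATE tab SET id = id + 1;  -- succeeds: the constraint is now checked at end of statement
```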

From web developer to database developer in 10 years

http://notes.eatonphil.com/2025-02-15-from-web-developer-to-database-developer-in-10-years.html

A one-year retrospective of Phil Eaton’s time at EnterpriseDB, and the path he carved into database development to get there.

End of the road for PostgreSQL streaming replication?

https://www.cybertec-postgresql.com/en/end-of-the-road-for-postgresql-streaming-replication/

These kinds of performance / stress tests of PostgreSQL are always fascinating to me. There are many factors to consider: the machine’s specs, setting up multiple databases (maybe multiple clusters), and then, after running a tool like pgbench, making sure you’ve observed and captured useful data you can draw conclusions from.

To try to quantify the crossover point, I ran a small test. I initialized a 8GB pgbench database that fits into shared buffers, then I set up WAL archiving and took a backup. Next, I ran 30min of pgbench with synchronous_commit=off to generate some WAL. On a 24 thread workstation, this generated 70GB of WAL containing 66M pgbench transactions, with an average speed of 36.6k tps. Finally, I configured Postgres to run recovery on the backup. This recovery was able to complete in 372 seconds, or 177k tps.
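A rough sketch of that procedure with pgbench (the scale factor is my guess at "~8GB", and the WAL-archiving and restore configuration is omitted):

```shell
pgbench -i -s 550 bench                # init: roughly 8GB of pgbench tables
psql -d bench -c "ALTER SYSTEM SET synchronous_commit = off"
pg_ctl reload
pg_basebackup -D /backups/base         # base backup before the run
pgbench -c 24 -j 24 -T 1800 bench      # 30 minutes of load
# then restore the backup, replay the archived WAL, and time the recovery
```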

Here are some notes from AWS on benchmarking Postgres with pgbench: https://docs.aws.amazon.com/whitepapers/latest/optimizing-postgresql-on-ec2-using-ebs/postgresql-benchmark-observations-and-considerations.html

Learn Postgres at the Playground

https://www.crunchydata.com/blog/learn-postgres-at-the-playground

The folks at CrunchyData created a playground for playing around with and learning PostgreSQL... right in the browser.

SQL Tricks for More Effective CRUD is one of dozens of tutorials they have since published.

This is made possible via WASM. Crazy Idea to Postgres in the Browser goes into more detail about how they pulled it off.

It seems like there is a lot of energy moving in this direction. When I search "postgres wasm", several of the results are about PGlite from ElectricSQL (which is building a sync engine, a play in the local-first space).

Supabase released what they call database.build, which connects you to a Postgres database in the browser and gives you AI tools for interacting with it.

the in-browser Postgres sandbox with AI assistance. With database.build, you can instantly spin up an unlimited number of Postgres databases that run directly in your browser (and soon, deploy them to S3). Each database is paired with a large language model (LLM)...

The secret to perfectly calculate Rails database connection pool size

https://island94.org/2024/09/secret-to-rails-database-connection-pool-size

tl;dr: don't try to compute the pool size; just set it to a big number. The pool size is only a maximum that Rails enforces; what actually matters is the number of connections available at the database (which is a separate issue).

If, rather, you're running out of connections at the database, then try things like:
- reduce the number of Puma threads
- reduce background job threads (e.g. via GoodJob, Solid Queue, etc.)
- "Configure anything else using a background thread making database queries"
- among others

Or serve more clients with the connections you do have by putting a connection pooler like PgBouncer in front of the database.
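In config/database.yml terms, the advice boils down to something like this (the 100 is just an arbitrary "big number", not a computed value):

```yaml
production:
  adapter: postgresql
  # A generous maximum; Rails only opens connections as threads actually need them.
  pool: 100
```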

This post was written by the person who created GoodJob.

Data migrations with the `maintenance_tasks` gem

https://railsatscale.com/2023-01-04-how-we-scaled-maintenance-tasks-to-shopify-s-core-monolith/article.html

The maintenance_tasks gem from Shopify is a mountable Rails engine for running one-off data migrations from a UI that is separate from the schema migration lifecycle.

In the past, I've used the after_party gem for this use case. That gem typically runs data migration tasks as part of a post-deploy process, with the same up/down distinction as schema migrations.

It seems the big differences with maintenance_tasks are that tasks are managed from a UI and that there are many more features, such as batching, pausing, and rerunning. You can observe the progress of these tasks from the UI as well.
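For flavor, here is a task written against the gem's documented API (a Task subclass defining collection and process); the model and the backfill logic are made up for illustration:

```ruby
# app/tasks/maintenance/backfill_display_name_task.rb
module Maintenance
  class BackfillDisplayNameTask < MaintenanceTasks::Task
    # The records to iterate over; the gem batches them and tracks progress in the UI.
    def collection
      User.where(display_name: nil)
    end

    # Called once per record; the run can be paused and resumed between elements.
    def process(user)
      user.update!(display_name: user.email.split("@").first)
    end
  end
end
```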

There is a GoRails episode about how to use the maintenance_tasks gem.

The Slotted Counter Pattern — PlanetScale

https://planetscale.com/blog/the-slotted-counter-pattern

To avoid contention on a single high-write counter column, you create a generic slotted counter table with polymorphic associations to anything that needs counting. To update the count, you increment one of, say, one hundred slots chosen at random. To read it, you either sum the slots for a given record and type on the fly, or periodically roll the slots up and store the total nearer the original record.
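A sketch of the pattern in Postgres-flavored SQL (the article targets MySQL; the table and column names here are illustrative):

```sql
CREATE TABLE slotted_counters (
  record_type varchar(100) NOT NULL,   -- polymorphic association
  record_id   bigint       NOT NULL,
  slot        int          NOT NULL,   -- 0..99
  count       bigint       NOT NULL DEFAULT 0,
  PRIMARY KEY (record_type, record_id, slot)
);

-- Write: bump one random slot, inserting it on first touch.
INSERT INTO slotted_counters (record_type, record_id, slot, count)
VALUES ('Post', 42, floor(random() * 100)::int, 1)
ON CONFLICT (record_type, record_id, slot)
DO UPDATE SET count = slotted_counters.count + 1;

-- Read: sum the slots.
SELECT sum(count) FROM slotted_counters
WHERE record_type = 'Post' AND record_id = 42;
```

Concurrent writers now mostly land on different rows, so they no longer queue up behind a single row lock.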

I wonder what heuristics you could use to scale the number of slots you use for a given entity. That way for a relatively low-update entity, you spread the counts over, say, 3 counter slots. And with a very high-update entity, you spread it across, say, 50 or 100 slots.

Rails Database Migrations Best Practices

https://www.fastruby.io/blog/db-migrations-best-practices.html

Meant to be deleted

I love this idea for a custom rake task (rails db:migrate:archive) to occasionally archive past migration files.

# lib/tasks/migration_archive.rake
namespace :db do
  namespace :migrate do
    desc 'Archives old DB migration files'
    task :archive do
      sh 'mkdir -p db/migrate/archive'
      sh 'mv db/migrate/*.rb db/migrate/archive'
    end
  end
end

That way you still have access to them as development artifacts. Meanwhile you remove the migration clutter and communicate a reliance on the schema file for standing up fresh database instances (in dev/test/staging).

Data migrations

They don't go into much detail about data migrations. It's hard to prescribe a one-size-fits-all approach: sometimes the easiest thing is to embed a bit of data manipulation in a standard schema migration; sometimes you want to run a SQL file manually against each database; or you may want a dedicated process for these changes with a tool like the after_party gem.

Reversible migrations

For standard migrations, it's great to rely on the change method to ensure migrations are reversible. It's also important to recognize which kinds of migrations are and aren't reversible. Sometimes we need to write raw SQL, and for that we'll want explicit up and down methods.
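For example, a raw-SQL migration with explicit up and down methods (the table and index names are made up):

```ruby
class AddLowerEmailIndexToUsers < ActiveRecord::Migration[7.1]
  def up
    execute "CREATE INDEX index_users_on_lower_email ON users (lower(email))"
  end

  def down
    execute "DROP INDEX index_users_on_lower_email"
  end
end
```

ActiveRecord can't reverse an arbitrary execute call from a change method, so spelling out both directions keeps rails db:rollback working.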