PostgreSQL (29 blogmarks)

← Blogmarks

Life Altering Postgresql Patterns

https://mccue.dev/pages/3-11-25-life-altering-postgresql-patterns

Use UUID Primary Keys

I don’t have strong opinions on this one. Most of the systems I work on would do just as well with bigints as UUIDs. Now that v7 UUIDs are generally available, there is an even stronger argument to go that direction since they are sortable.
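As a sketch, a v7-UUID-keyed table might look like the following. Note that the built-in `uuidv7()` function only ships with PostgreSQL 18; on earlier versions you'd generate v7 UUIDs in application code or via an extension, and the table/column names here are my own invention:

```sql
-- PostgreSQL 18+ provides uuidv7() natively; earlier versions need an
-- extension or app-side generation. gen_random_uuid() yields v4 UUIDs.
CREATE TABLE orders (
    id            uuid PRIMARY KEY DEFAULT uuidv7(),
    customer_name text NOT NULL
);

-- v7 UUIDs begin with a timestamp, so ordering by id roughly matches
-- insertion order -- something random v4 UUIDs can't offer.
SELECT id FROM orders ORDER BY id;
```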

Give everything created_at and updated_at

I absolutely agree with this. These columns shouldn’t drive application logic (except perhaps sorting); they’re there for auditing and investigation. I’ve always needed them at one point or another to look into some production data issue or support request.
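A minimal sketch of the pattern, including a trigger so no code path can forget to bump `updated_at` (table and trigger names are my own):

```sql
CREATE TABLE accounts (
    id         bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    email      text NOT NULL,
    created_at timestamptz NOT NULL DEFAULT now(),
    updated_at timestamptz NOT NULL DEFAULT now()
);

-- Maintain updated_at at the database layer rather than in app code.
CREATE FUNCTION set_updated_at() RETURNS trigger AS $$
BEGIN
    NEW.updated_at := now();
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER accounts_set_updated_at
    BEFORE UPDATE ON accounts
    FOR EACH ROW EXECUTE FUNCTION set_updated_at();
```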

on update restrict on delete restrict

It’s good to have default practices for this kind of thing, but absolutes aren’t that useful. Sometimes you want a cascade and as long as you’ve thought through the details, that may be the right choice.
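To illustrate both the default and a deliberate exception (the `customers`/`orders`/`order_items` tables are hypothetical):

```sql
-- Default to RESTRICT: deleting a customer who still has orders
-- fails loudly instead of silently removing data.
CREATE TABLE orders (
    id          bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    customer_id bigint NOT NULL REFERENCES customers (id)
        ON UPDATE RESTRICT ON DELETE RESTRICT
);

-- A thought-through exception: line items are wholly owned by their
-- order, so cascading the delete can be the right choice.
CREATE TABLE order_items (
    id       bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    order_id bigint NOT NULL REFERENCES orders (id) ON DELETE CASCADE
);
```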

Use schemas

The default public schema is great. Using alternative schemas usually means having to fight conventions in web frameworks like Rails. It might be worth it, but know that you’ll generally be cutting against the grain if you go this direction.

Enum Tables

Enum tables are a great pattern, but I would caution against using a text value as the primary key. Use an appropriately sized integer value that won’t change even if the text value does. That way you don’t have to make a bunch of updates in referencing tables.
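A sketch of an enum table with a stable surrogate key (names are my own):

```sql
CREATE TABLE order_statuses (
    id    smallint PRIMARY KEY,
    value text NOT NULL UNIQUE
);

INSERT INTO order_statuses (id, value)
VALUES (1, 'pending'), (2, 'shipped'), (3, 'delivered');

-- Referencing tables store the stable id, so renaming 'shipped' to
-- 'in_transit' is a one-row update here, not a mass update elsewhere.
```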

Name your tables singularly

I see no benefit to singular naming. I prefer plural naming because of my time using Rails which establishes that as a convention. Whichever direction you go, make sure to be consistent.

Mechanically name join tables

I think this is a good practice to follow. Migration tools like Rails’ ActiveRecord often do this in a deterministic way for you. Occasionally it is nice to have a perfectly named join table based on domain, but you inevitably falter when trying to later remember what tables it joins. It’s a small thing, but consistency generally wins.
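The Rails convention, for example, is both table names pluralized and joined in alphabetical order. A sketch with hypothetical `authors` and `books` tables:

```sql
-- Mechanically named: alphabetical order of the two joined tables.
CREATE TABLE authors_books (
    author_id bigint NOT NULL REFERENCES authors (id),
    book_id   bigint NOT NULL REFERENCES books (id),
    PRIMARY KEY (author_id, book_id)
);
```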

Almost always soft delete

This used to be my stance for any core application tables. However since listening to the Soft Deletes episode of Postgres.fm, I’ve realized it’s way more nuanced than I thought and you often don’t want soft deletes. It’s worth a listen.

Catbench Vector Search App has Postgres Query Throughput and Latency Monitoring Now

https://tanelpoder.com/posts/catbench-vector-search-query-throughput-latency-monitoring/

The whole app is meant to be a learning tool, an easy way to run some vector similarity search workloads (together with other application data) and be able to see the various query texts, execution plans and plan execution metrics when you navigate around.

Repo: https://github.com/tanelpoder/catbench

Understanding logical replication in Postgres

https://www.springtail.io/blog/postgres-logical-replication

What is replication in a Postgres database?

PostgreSQL database replication is a process of creating and maintaining multiple copies of a database across different servers. This technique is crucial for ensuring data availability, improving performance, and enhancing disaster recovery capabilities.

This page from the Postgres docs describes the various approaches to achieving high availability and replication with Postgres, including the tradeoffs involved.

Physical Replication

Physical replication is the replication of the physical blocks of the Write-Ahead Log after changes are applied. It is beneficial for large-scale replication because it copies data at the block level, making it faster; the data does not have to be decoded into logical changes as in logical replication. Additionally, physical replication supports streaming the WAL, providing near real-time updates.

However, it replicates the entire database cluster, which includes all databases and system tables, making it less flexible for partial replication or replication of only a subset of databases or tables. Changes to the physical layout of data blocks, that may change across Postgres versions, can break compatibility.

Logical Replication

Logical replication exposes a logical representation of changes made within transactions, e.g., an update X was applied to row Y, rather than replicating the physical data blocks of the WAL. This allows for the replication of specific tables or databases, and supports cross-version replication, which makes it more flexible for complex replication scenarios.

However, logical replication can be slower for large-scale replication, requires more configuration and management, and does not support certain database modifications, such as schema changes like ALTER TABLE, and other metadata or catalog changes.
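The basic setup is a publication on the source and a subscription on the destination. A sketch, with hypothetical table names and connection details (the publisher must run with `wal_level = logical`):

```sql
-- On the publisher: choose specific tables to replicate.
CREATE PUBLICATION app_pub FOR TABLE users, orders;

-- On the subscriber: point at the publisher and its publication.
CREATE SUBSCRIPTION app_sub
    CONNECTION 'host=primary dbname=app user=replicator'
    PUBLICATION app_pub;
```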

How Instacart Built a Modern Search Infrastructure on Postgres

https://tech.instacart.com/how-instacart-built-a-modern-search-infrastructure-on-postgres-c528fa601d54

Unconventionally, they transitioned from Elasticsearch to Postgres FTS and Vector Search.

A key insight was to bring compute closer to storage. This runs counter to more recent database patterns, where storage and compute layers are separated by networked I/O. The Postgres-based search ended up being twice as fast by pushing logic and computation down to the data layer instead of pulling data up to the application layer for computation. This approach, combined with running Postgres on NVMe storage, further improved data fetching performance and reduced latency.

Big Problems From Big IN lists with Ruby on Rails and PostgreSQL

https://andyatkinson.com/big-problems-big-in-clauses-postgresql-ruby-on-rails

I always enjoy these detailed PostgreSQL query explorations by Andrew.

I’ve seen this exact “big IN clause” issue arise in many production Rails systems. These are tricky because the queries are often generated by ActiveRecord and they work fine for small data. As your data grows over time or specific power users do their thing, these sneaky, slow IN queries start to crop up. Since they only happen sometimes, they often go unnoticed or appear hard to reproduce.

Andrew presents a lot of good options for rewriting these queries in a way that allows the database to better plan and optimize the query.
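For instance (my sketch, not necessarily Andrew’s exact rewrites, with a hypothetical `posts` table):

```sql
-- The problematic shape: a generated query with a huge literal IN list.
SELECT * FROM posts WHERE id IN (1, 2, 3 /* ...thousands more... */);

-- One common rewrite: pass a single array parameter instead.
SELECT * FROM posts WHERE id = ANY ('{1,2,3}'::bigint[]);

-- Another: join against a VALUES list so the planner sees a relation
-- it can estimate and join like any other.
SELECT p.*
FROM posts p
JOIN (VALUES (1), (2), (3)) AS ids (id) ON p.id = ids.id;
```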

Short alphanumeric pseudo random identifiers in Postgres

https://andyatkinson.com/generating-short-alphanumeric-public-id-postgres

I’ve had to add this style of identifier to many apps that I’ve worked on, e.g. for something like coupon codes. I end up with a bespoke solution each time, usually in app code. It’s cool to see a robust solution like this at the database layer by adding a couple functions.

Concurrent locks and MultiXacts in Postgres

https://blog.danslimmon.com/2023/12/11/concurrent-locks-and-multixacts-in-postgres/

When multiple transactions hold non-exclusive locks on the same row, the database needs a way of tracking that set of transactions beyond the single xmax value that initially locks the row.

Postgres creates a MultiXact. A MultiXact essentially bundles together some set of transactions so that those transactions can all lock the same row at the same time. Instead of a transaction ID, a new MultiXact ID is written to the row’s xmax.
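A quick way to see this in action, assuming some table `tab` with an `id` column (the session split is indicated in comments):

```sql
-- Session A:
BEGIN;
SELECT * FROM tab WHERE id = 1 FOR SHARE;

-- Session B, concurrently:
BEGIN;
SELECT * FROM tab WHERE id = 1 FOR SHARE;

-- Either session: the row's xmax now holds a MultiXact ID rather than
-- a plain transaction ID, since MultiXact IDs come from their own
-- counter, separate from regular XIDs.
SELECT xmax, * FROM tab WHERE id = 1;
```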

Speaking in Tongues: PostgreSQL and Character Encodings

https://thebuild.com/blog/2024/10/27/speaking-in-tongues-postgresql-and-character-encodings/

The character encoding decision is an easy one: Use UTF-8. Although one can argue endlessly over whether or not Unicode is the “perfect” way of encoding characters, we almost certainly will not get a better one in our lifetime, and it has become the de facto (and in many cases, such as JSON, the de jure) standard for text on modern systems.

In Defense of PostgreSQL MVCC and Vacuuming

https://www.softwareandbooz.com/in-defense-of-postgresql-mvcc-and-vacuuming/

For all the FUD that gets spread about PostgreSQL’s MVCC architecture, its downsides don’t impact most users and the upsides are great.

Most of the complaints about the PostgreSQL implementation of MVCC revolve around vacuuming issues which can cause table bloat, transaction ID wraparound, and wasted overhead introduced by the visibility map.

From the Postgres docs:

The main advantage of using the MVCC model of concurrency control rather than locking is that in MVCC locks acquired for querying (reading) data do not conflict with locks acquired for writing data, and so reading never blocks writing and writing never blocks reading. PostgreSQL maintains this guarantee even when providing the strictest level of transaction isolation…

…proper use of MVCC will generally provide better performance than locks.

There are always tradeoffs. Someone being over-critical about the downsides of MVCC may not be representing what you’d lose by switching away from Postgres.

The solution isn’t necessarily to jump ship, either. In other databases, a large select statement can block your writes. You also won’t have a rich extension ecosystem or a fully vendor-agnostic database.

Here are the kinds of things to learn about if you want to better understand how to tune your database for impacted workloads:

Talk about vacuum settings. Help people understand how to tune individual table thresholds. Discuss maintenance_work_mem, effective partitioning, and how to validate your settings with a given workload.

PostgreSQL Operations that can cause Table Rewrites

https://www.linkedin.com/posts/gwenshapira_table-rewrites-can-be-slow-and-disruptive-activity-7322327616135319552-ipFR

For example, VACUUM FULL rewrites the entire table. This is why tools like pg_repack exist, which help reduce the amount of locking.

See the post for the full table of operations.

Reduce locking caused by `set not null` with a check constraint

https://github.com/sbdchd/squawk/issues/300

During Gülçin Yıldırım Jelínek's talk "Anatomy of Table-Level Locks in PostgreSQL", she mentioned that the common "Expand and Contract" schema migration approach to applying not null to an existing column will still involve an Access Exclusive table lock. This is because PostgreSQL needs to do a full table scan to ensure the column contains no nulls.

She goes on to mention that we can do better than that. We can further reduce locking with an additional step. If we first add an invalid not null check constraint for the column and eventually validate it after backfilling the entire table, then the alter table statement for adding the not null constraint will only need a Share Update Exclusive table lock. This means we can generally still perform SQL operations that read and update data.
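The full sequence looks like this (using a hypothetical `users.email` column):

```sql
-- 1. Add the constraint without scanning the table; NOT VALID means
--    only new writes are checked at this point.
ALTER TABLE users
    ADD CONSTRAINT users_email_not_null
    CHECK (email IS NOT NULL) NOT VALID;

-- 2. Backfill any nulls, then validate. VALIDATE CONSTRAINT scans the
--    table but only takes a SHARE UPDATE EXCLUSIVE lock, so normal
--    reads and writes continue.
ALTER TABLE users VALIDATE CONSTRAINT users_email_not_null;

-- 3. SET NOT NULL skips its own full-table scan because the valid
--    CHECK constraint already proves no NULLs exist.
ALTER TABLE users ALTER COLUMN email SET NOT NULL;

-- 4. The now-redundant check constraint can be dropped.
ALTER TABLE users DROP CONSTRAINT users_email_not_null;
```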

According to the alter table docs:

SET NOT NULL may only be applied to a column provided none of the records in the table contain a NULL value for the column. Ordinarily this is checked during the ALTER TABLE by scanning the entire table; however, if a valid CHECK constraint is found which proves no NULL can exist, then the table scan is skipped.

Here is the relevant commit and commit message:

If existing CHECK or NOT NULL constraints preclude the presence
of nulls, we need not look to see whether any are present.

Which was added in the PostgreSQL 12 release.

This approach is also recommended by the strong_migrations gem for Rails.

What happens when you add and remove a column from a table in Postgres

https://www.linkedin.com/posts/gwenshapira_what-happens-when-you-add-and-remove-a-column-activity-7320517353597718529-u_Fi

Here’s the question that Gwen poses:

What happens when you add and remove a column from a table in Postgres 2000 times?

I couldn’t come up with any good guesses.

The surprising answer is that PostgreSQL 1) has a hard limit of 1600 columns on any given table and 2) removed columns are effectively soft-deleted. This means that each column you add to a table, regardless of whether it is later dropped, counts against that hard limit.

If you somehow run into this limit in a real-world database deployment (unlikely), you’ll have to swap everything into a new table to get around it.
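You can see the soft-deletion in the catalog yourself (with a hypothetical `widgets` table):

```sql
ALTER TABLE widgets ADD COLUMN temp int;
ALTER TABLE widgets DROP COLUMN temp;

-- The dropped column still occupies an attribute slot, renamed to
-- something like "........pg.dropped.2........" with attisdropped set.
SELECT attname, attisdropped
FROM pg_attribute
WHERE attrelid = 'widgets'::regclass AND attnum > 0;
```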

What I Wish Someone Told Me About Postgres

https://challahscript.com/what_i_wish_someone_told_me_about_postgres

The Postgres docs are massive, so “Read The Manual” is tough advice when getting started with it.

Except…Postgres. It’s not because the official docs aren’t stellar (they are!)–they’re just massive. For the current version (17 at the time of writing), if printed as a standard PDF on US letter-sized paper, it’s 3,200 pages long. It’s not something any junior engineer can just sit down and read start to finish.

This article breaks down a bunch of features and advice for getting into and using Postgres. It ranges from things like unintuitive NULL behavior, the Don’t Do This Postgres wiki, psql enhancements, and advice on indexes.

Trigger recursion in PostgreSQL and how to deal with it

https://www.cybertec-postgresql.com/en/dealing-with-trigger-recursion-in-postgresql/

This article walks through a couple examples of where infinite trigger recursion can happen if we’re not careful about when we’re updating and under what conditions.

I learned a few new things including the function for getting the current trigger depth and that a trigger definition can include a when clause so that it conditionally runs.

Great article, very approachable.

(All) Databases Are Just Files. Postgres Too

https://tselai.com/all-databases-are-just-files

This is an intriguing simplification of how to think about what Postgres is and what it does:

At its core, postgres is simply a program that turns SQL queries into filesystem operations. A CREATE TABLE becomes a mkdir. An UPDATE eventually becomes: open a file, write to it, close it. It’s a complex and powerful system—but fundamentally, it’s just an executable that manipulates files.

Making the case for lifting the “veil of ignorance” and building up a mental model for how Postgres works:

Understanding how PostgreSQL actually works—how it’s started, where the files live, what the key binaries do—puts real power in your hands. It removes the mystery. It means you’re no longer just reacting when something breaks; you’re diagnosing, tweaking, even optimizing.

Easy, alternative soft deletion: `deleted_record_insert`

https://brandur.org/fragments/deleted-record-insert

This is a simple (as in ‘not complex’) pattern for saving any deleted records that you want to persist in a shared, schemaless table. Triggers fire after delete for any table you want to save and that deleted data is inserted as a jsonb entry.

I suggest an alternative to deleted_at called deleted_record, a separate schemaless table that gets a full dump of deleted data, but which doesn’t interfere with mainline code (no need to include a deleted_at IS NULL predicate in every live query, no foreign key problems), and without the expectation that it’ll be used to undelete data.
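A minimal sketch of the pattern (the column set here is my own simplification; see the article for Brandur’s full version):

```sql
CREATE TABLE deleted_record (
    id         uuid PRIMARY KEY DEFAULT gen_random_uuid(),
    deleted_at timestamptz NOT NULL DEFAULT now(),
    table_name text NOT NULL,
    data       jsonb NOT NULL
);

-- One generic trigger function: dump the deleted row as jsonb.
CREATE FUNCTION deleted_record_insert() RETURNS trigger AS $$
BEGIN
    INSERT INTO deleted_record (table_name, data)
    VALUES (TG_TABLE_NAME, to_jsonb(OLD));
    RETURN OLD;
END;
$$ LANGUAGE plpgsql;

-- Attach it to each table whose deletions you want to keep.
CREATE TRIGGER deleted_record_insert
    AFTER DELETE ON users
    FOR EACH ROW EXECUTE FUNCTION deleted_record_insert();
```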

This article is great, but it leaves me unsatisfied in that I don’t feel it touches on the “why” at all. Why and for what purpose are we saving the deleted data? Is it just for debugging purposes? Do we ever restore it or otherwise reference it?

Brandur also suggests that a simple deleted_at column won’t really work for undeleting data. Why not? This seems like the best method for that use case assuming you don’t mind all the implications for app code, queries, and indexes.

Brandur covers the reasons that undoing soft deletes isn’t as easy as it seems in Soft Deletion Probably Isn’t Worth It:

The biggest reason for this is that almost always, data deletion also has non-data side effects. Calls may have been made to foreign systems to archive records there, objects may have been removed in blob stores, or servers spun down. The process can’t simply be reversed by setting NULL on deleted_at – equivalent undos need to exist for all those other operations too, and they rarely do.

I came across the primary article (in the show notes) after listening to the Soft Deletes episode of Postgres.fm.

5 Hard Problems Companies May Need Consulting Help On for their PostgreSQL Databases

https://www.linkedin.com/posts/samokhvalov_postgresql-activity-7316278118468489216-h_qr

Nikolay points out some of the things that are uniquely hard and common challenges for many PostgreSQL databases. This means it is also a good list of things to demonstrate expertise in because companies will likely need consulting help with each of these.

Five hard problems in #PostgreSQL:

  1. LWLock contention of various kinds when various buffers overflow / thresholds exceeded
  2. no simple path to sharding for OLTP
  3. single-threaded processes (eg replication bottlenecks)
  4. partitioning requires a lot of effort
  5. upgrades too

I've done a number of PostgreSQL major version upgrades for companies. They are usually pretty uneventful because of how careful Postgres is with backward-compatibility. That said, they need to be carefully planned and executed to avoid or minimize downtime.

I explored partitioning with one client who had a massive events table. It is awkward and challenging to set up partitioning after the fact. Ideally, you've designed the schema with partitioning from the start. In lieu of partitioning, we decided to do a swap-and-drop migration to remove hundreds of millions of old, unneeded rows.

Wait a minute! — PostgreSQL extension pg_wait_sampling

https://andyatkinson.com/blog/2024/07/23/postgresql-extension-pg_wait_sampling

The pg_wait_sampling extension is a handy companion to pg_stat_statements and pg_locks, providing historical samples of wait events. This helps with tracking down queries that block one another, causing contention in your database.

https://github.com/postgrespro/pg_wait_sampling
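Once the extension is installed, a query like this (against its `pg_wait_sampling_profile` view, as I understand the extension’s interface) surfaces the most common waits:

```sql
CREATE EXTENSION pg_wait_sampling;

-- Aggregate sampled wait events across all backends to see where
-- time is being spent.
SELECT event_type, event, sum(count) AS samples
FROM pg_wait_sampling_profile
GROUP BY event_type, event
ORDER BY samples DESC
LIMIT 10;
```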

did u ever read so hard u accidentally wrote?

https://blog.danslimmon.com/2025/03/14/did-u-ever-read-so-hard-u-accidentally-wrote/

Reads (selects) won’t modify rows, but the internal bookkeeping that Postgres does (such as setting hint bits) can dirty pages that later have to be written out to disk, and in some configurations even generate WAL. So in that sense, reads can result in writes.

This is such a relatable conclusion that echoes many of my experiences chasing down odd production bugs:

Ops is like this a lot of the time. Once you get a working fix, you move on to whatever’s the next biggest source of anxiety. Sometimes you never get a fully satisfying “why.” But you can still love the chase.

Also, yes, Cybertec is always on top of it. One of the best in-depth Postgres blogs out there.

So I resort to Googling around for a while. I eventually land on this Cybertec blog post (there’s always a Cybertec post. God bless ’em), which demystifies shared buffers for me.

Release and EoL calendars for Amazon RDS for PostgreSQL

https://docs.aws.amazon.com/AmazonRDS/latest/PostgreSQLReleaseNotes/postgresql-release-calendar.html

AWS maintains a nice table of both the Community End of Life and RDS End of Standard Support dates for major versions of PostgreSQL. I had trouble finding this info all in a single place elsewhere.

PostgreSQL has an official page documenting its versioning policy and the first and last release dates of each major version. This doesn't include community end-of-life dates though.

Squeeze the hell out of the system you have, by Dan Slimmon

https://blog.danslimmon.com/2023/08/11/squeeze-the-hell-out-of-the-system-you-have/

This article is great because it gets at the higher-level thinking that engineering leads and CTOs need to bring to the table when your team is making high-impact technical decisions.

Anyone who has been in the industry a bit can throw around the pithy phrases we use to sway approval toward the decision we're pitching, e.g. "micro-services allow us to use the right tool for the job".

That can be a compelling argument alone if the stakes are low or we're not paying attention.

The higher-level thinking that needs to come in looks beyond the lists of Pros that we can make for any reasonable item that is put forward.

We have to have an understanding of tradeoffs and a more holistic sense of the costs.

But don’t just consider the implementation cost. The real cost of increased complexity – often the much larger cost – is attention.

The attention cost is an ongoing cost.

[Clever solution that adds complexity] complicates every subsequent technical decision.

Squeeze what you can out of the system, buying time, until you have to make a concession to complexity.

When complexity leaps are on the table, there’s usually also an opportunity to squeeze some extra juice out of the system you have.

because we squeezed first, we get to keep working with the most boring system possible.

Validating Data Types from Semi-Structured Data Loads in Postgres with pg_input_is_valid

https://www.crunchydata.com/blog/validating-data-types-from-semi-structured-data-loads-in-postgres-with-pg_input_is_valid

The pg_input_is_valid function is a handy way to check for whether rows of data conform to (are castable to) a specific data type. Let's say you have a ton of records with a payload text column. Those fields mostly represent valid JSON, but some are non-JSON error messages. You could write an update SQL statement like this:

UPDATE my_table
SET json_payload = CASE 
    WHEN pg_input_is_valid(payload, 'jsonb') 
    THEN payload::jsonb 
    ELSE '{}'::jsonb 
    END;

The post also shows off a nice technique for loading a ton of CSV data in where you may not be sure that every single row conforms to the various data types. It would be a shame to run a copy statement that loads 95% of the data and then suddenly fails and rolls back because of an errant field.

Instead, load everything in as text:

CREATE TEMP TABLE staging_customers (
    customer_id TEXT,
    name TEXT,
    email TEXT,
    age TEXT,
    signup_date TEXT
);

-- copy in the data to the temp table
COPY staging_customers FROM '/path/to/customers.csv' CSV HEADER;

Then use an approach similar to the first code block above to migrate the valid values into properly typed columns, whether in the same table or a different one.

In Elizabeth's example, the invalid records are ignored while the rest are moved to the new table:

INSERT INTO customers (name, email, age, signup_date)
SELECT name, email, age::integer, signup_date::date
FROM staging_customers
WHERE pg_input_is_valid(age, 'integer')
  AND pg_input_is_valid(signup_date, 'date');

PostgreSQL Mistakes and How to Avoid Them

https://www.manning.com/books/postgresql-mistakes-and-how-to-avoid-them

PostgreSQL Mistakes and How To Avoid Them reveals dozens of configuration and operational mistakes you’re likely to make with PostgreSQL. The book covers common problems across all key PostgreSQL areas, from data types, to features, security, and high availability. For each mistake you’ll find a real-world narrative that lays out context and recommendations for improvement.

I might have expected a book like this to be all about PostgreSQL-specific SQL and data modeling concepts. Even better, it also covers things like Performance, Administration features, Security, and High-Availability concepts.

Via LinkedIn

Comparison of the transaction systems of Oracle and PostgreSQL

https://www.cybertec-postgresql.com/en/comparison-of-the-transaction-systems-of-oracle-and-postgresql/

Nice example of when deferrable constraints matter in PostgreSQL.

CREATE TABLE tab (id numeric PRIMARY KEY);

INSERT INTO tab (id) VALUES (1);
INSERT INTO tab (id) VALUES (2);

UPDATE tab SET id = id + 1;
ERROR:  duplicate key value violates unique constraint "tab_pkey"
DETAIL:  Key (id)=(2) already exists.

The reason is that PostgreSQL (in violation of the SQL standard) checks the constraint after each row, while Oracle checks it at the end of the statement. To make PostgreSQL behave the same as Oracle, create the constraint as DEFERRABLE. Then PostgreSQL will check it at the end of the statement.
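Concretely, recreating the example table with a deferrable primary key makes the update succeed, because even an INITIALLY IMMEDIATE deferrable unique constraint is checked at the end of the statement:

```sql
CREATE TABLE tab (
    id numeric PRIMARY KEY DEFERRABLE INITIALLY IMMEDIATE
);

INSERT INTO tab (id) VALUES (1), (2);

-- No duplicate key error now: the uniqueness check happens after the
-- whole statement, by which point ids 2 and 3 are distinct again.
UPDATE tab SET id = id + 1;
```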

End of the road for PostgreSQL streaming replication?

https://www.cybertec-postgresql.com/en/end-of-the-road-for-postgresql-streaming-replication/

These kinds of performance / stress tests of PostgreSQL are always fascinating to me. There are many factors you need to consider about the machine’s stats, you have to set up multiple databases, maybe multiple clusters, and then after running a tool like pgbench, you have to make sure you’ve observed and captured useful data that you can draw conclusions from.

To try to quantify the crossover point, I ran a small test. I initialized a 8GB pgbench database that fits into shared buffers, then I set up WAL archiving and took a backup. Next, I ran 30min of pgbench with synchronous_commit=off to generate some WAL. On a 24 thread workstation, this generated 70GB of WAL containing 66M pgbench transactions, with an average speed of 36.6k tps. Finally, I configured Postgres to run recovery on the backup. This recovery was able to complete in 372 seconds, or 177k tps.

Here are some notes from AWS on benchmarking Postgres with pgbench: https://docs.aws.amazon.com/whitepapers/latest/optimizing-postgresql-on-ec2-using-ebs/postgresql-benchmark-observations-and-considerations.html

Learn Postgres at the Playground

https://www.crunchydata.com/blog/learn-postgres-at-the-playground

The folks at CrunchyData created a playground for playing around with and learning PostgreSQL... right in the browser.

SQL Tricks for More Effective CRUD is one of dozens of the tutorials they have since published.

This is made possible via WASM. Crazy Idea to Postgres in the Browser goes into more details about how they pulled it off.

It seems like there is a lot of energy moving in this direction. When I search "postgres wasm", several of the results are about PGlite from ElectricSQL (which is building a sync engine, a play in the local-first space).

Supabase released what they are calling database.build, which connects you to a Postgres database in the browser and gives you AI tools for interacting with that database.

the in-browser Postgres sandbox with AI assistance. With database.build, you can instantly spin up an unlimited number of Postgres databases that run directly in your browser (and soon, deploy them to S3). Each database is paired with a large language model (LLM)...

Parameter Type Inference - could not determine data type of parameter $1

https://github.com/adelsz/pgtyped/issues/354

Odd PostgreSQL thing related to Prepared Statement / Parameter Type Inference I'm still trying to unravel.

I had the following bit of ActiveRecord query:

@tags =
  Tag.where("? is null or normalized_value ilike ?", normalized_query, "%#{normalized_query}%")
     .order(:normalized_value)
     .limit(10)

which short-circuits the filter (where) if normalized_query is nil. This worked in development whether or not the normalized_query value was present.

However, as soon as I shipped this to production, it was failing. I found the following error in the logs:

Caused by: PG::IndeterminateDatatype (ERROR:  could not determine data type of parameter $1)
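The same error can be reproduced directly in psql with a prepared statement (my minimal repro, independent of Rails):

```sql
-- $1 appears only in an "IS NULL" test, so Postgres has nothing to
-- infer its type from and rejects the PREPARE.
PREPARE q1 AS SELECT 1 WHERE $1 IS NULL;
-- ERROR:  could not determine data type of parameter $1

-- An explicit cast gives the planner a concrete type, so this works:
PREPARE q2 AS SELECT 1 WHERE cast($1 AS text) IS NULL;
```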

I fixed it by rewriting the query to cast the parameter to text, which removed Postgres's uncertainty in production about what the type of the parameter would be:

@tags =
  Tag.where("cast(? as text) is null or normalized_value ilike ?", normalized_query, "%#{normalized_query}%")
     .order(:normalized_value)
     .limit(10)

Yay, fixed. Buuut, I don't get why this worked in dev, but not production. My best guesses are either that there is some different level of type inference that production is configured for (seems unlikely) or that the prepared statement in production gets prepared with different type info. Perhaps different connections are getting different prepared statement versions which might lead to it being flaky?

This is weird. Any idea what could be going on here?

Interestingly, I found a TypeScript project reporting the EXACT same issue for the EXACT same type of query -- https://github.com/adelsz/pgtyped/issues/354