
SERIALIZABLE in PostgreSQL 11... and beyond

Thanks to the tireless work of Google Summer of Code student Shubham Barai with the support of reviewers, a mentor and a committer, PostgreSQL 11 will ship with predicate lock support for hash indexes, gin indexes and gist indexes.  These will make SERIALIZABLE transaction isolation much more efficient with those indexes, filling in some of the feature combination gaps and quirks that exist in my favourite RDBMS.

It seems like a good time to write a bit about SERIALIZABLE and how it interacts with other selected PostgreSQL features, including indexes.

A bit of background

If you want to read something a little less dry than the usual papers about transaction isolation, I recommend ACIDRain: Concurrency-Related Attacks on Database-Backed Web Applications which, among other things, discusses a transaction isolation-based attack that bankrupted a bitcoin exchange.  It also makes some interesting observations about some of PostgreSQL's rivals.  Even excluding malicious attacks, I've been working with databases long enough to have heard plenty of transaction isolation screw-up stories that I can't repeat here, involving trading systems, credit limit snafus, unexpected arrest warrants and even ... a double-booked violin teacher.

True SERIALIZABLE using the SSI algorithm is one of my favourite PostgreSQL features and was developed by Dan Ports and Kevin Grittner for release 9.1.  From what I've seen and heard, SERIALIZABLE has a reputation among application developers as a complicated expert-level feature for solving obscure problems with terrible performance, but that isn't at all justified... at least on PostgreSQL.  As the ACIDRain paper conveys much better than I can, weaker isolation levels are in fact more complicated to use correctly with concurrency.  For non-trivial applications, error-prone ad-hoc serialisation schemes based on explicit locking are often required for correctness.  While the theory behind SSI may sound complicated, that's the database's problem!  The end user experience is the exact opposite: it's a switch you can turn on that lets you write applications that assume that each transaction runs in complete isolation, dramatically cutting down the number of scenarios you have to consider (or fail to consider).  In short, it seems that it's the weaker isolation levels that are for experts.

As an example, suppose you are writing a scheduling system for a busy school.  One transaction might consist of a series of queries to check if a teacher is available at a certain time, check if a room is available, check if any of the enrolled students has another class at the same time, check if the room's capacity would be exceeded by the currently enrolled students, and then finally schedule a class.  If you do all of this in a SERIALIZABLE transaction then you don't even have to think about concurrent modifications to any of those things.  If you use a weaker level, then you have to come up with an ad-hoc locking strategy to make sure that a concurrent transaction doesn't create a scheduling clash.
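For concreteness, here's a minimal sketch of what such a transaction might look like.  The schema is invented for illustration (the classes, rooms and enrolments tables and their columns are hypothetical):

    BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;

    -- Is the teacher already booked at this time?
    SELECT count(*) FROM classes
     WHERE teacher_id = 17 AND starts_at = '2018-05-07 09:00';

    -- Is the room free, and is its capacity sufficient?
    SELECT capacity FROM rooms WHERE room_id = 42;
    SELECT count(*) FROM classes
     WHERE room_id = 42 AND starts_at = '2018-05-07 09:00';

    -- Do any of the students we want to enrol have a clash?
    SELECT count(*)
      FROM enrolments e
      JOIN classes c USING (class_id)
     WHERE e.student_id IN (1, 2, 3)
       AND c.starts_at = '2018-05-07 09:00';

    -- If every check passed, book the class.
    INSERT INTO classes (teacher_id, room_id, starts_at)
    VALUES (17, 42, '2018-05-07 09:00');

    COMMIT;  -- may raise a serialisation failure; if so, retry from BEGIN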

Most other databases use a pessimistic locking strategy for SERIALIZABLE, which amounts to literally serialising transactions whose read/write sets conflict.  In contrast, PostgreSQL uses a recently discovered optimistic strategy which allows more concurrency and avoids deadlocks, but in exchange for the increased concurrency it sometimes needs to abort transactions if it determines that they are incompatible with all serial orderings of the transactions.  When that happens, the application must handle a special error code by retrying the whole transaction.  Many workloads perform better under SSI than under the traditional locking strategy, though some workloads (notably queue-like workloads where sessions compete to access a 'hot' row) may be unsuitable because they generate too many retries.  In other words, optimistic locking strategies pay off as long as the optimism is warranted.  Pessimistic strategies may still be better if every transaction truly does conflict with every other transaction.
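The error in question is SQLSTATE 40001 (serialization_failure).  A rough sketch of the pattern, with the retry expressed as a comment because it has to live in application code rather than in SQL:

    BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
    -- ... reads and writes ...
    COMMIT;
    -- If any statement (including the COMMIT) fails with SQLSTATE 40001,
    -- for example "could not serialize access due to read/write dependencies
    -- among transactions", roll back and re-run the whole transaction.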

The locks used by SSI are known as "SIREAD" locks or "predicate" locks, and they are distinct from the regular PostgreSQL "heavyweight locks" in that you never wait for them and they can't deadlock.  The SSI algorithm permits spurious serialisation failures, which could be due to lack of memory for SIREAD locks leading to lock escalation (from row to page to relation level), lack of support in index types leading to lock escalation (see below), or the fundamental algorithm itself, which is based on a fast conservative approximation of a graph cycle detector.  We want to minimise those.  A newer algorithm called Precise SSI might be interesting for that last problem, but there is certainly much lower hanging fruit.
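SIREAD locks are visible in pg_locks, which makes it easy to watch that escalation happen.  Something like this, run while a serializable transaction has done some reads:

    -- Show current predicate locks and their granularity: locktype will be
    -- 'relation', 'page' or 'tuple' depending on how far they have escalated.
    SELECT locktype, relation::regclass, page, tuple, pid
      FROM pg_locks
     WHERE mode = 'SIReadLock';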

Interaction with indexes

Unlike the regular locking that happens when you update rows, SERIALIZABLE needs to lock not only rows but also "predicates", representing gaps or hypothetical rows that would match some query.  If you run a query to check if there is already a class booked in a given classroom at a given time and find none, and then a concurrent transaction creates a row that would have matched your query, we need a way to determine that these transactions have an overlapping read/write set.  If you don't use SERIALIZABLE, you'd probably need to serialise such pairs of transactions by making sure that there is an advisory lock or an explicit FOR UPDATE row lock on something else -- in the school example that might be the row representing the classroom (which is in some sense a kind of "advisory" lock by another name, since all transactions involved have to opt into this scheme).  Predicate locks handle the case automatically without the user having to do that analysis, and make sure that every transaction in the system gets it right.  In PostgreSQL, performing this magic efficiently requires special support from indexes.  PostgreSQL 11 adds more of that.
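To make the contrast concrete, the ad-hoc version under a weaker isolation level might look something like this (same hypothetical schema as above; every transaction that could create a clash has to remember to take the same lock):

    BEGIN;  -- READ COMMITTED or REPEATABLE READ
    -- Serialise all bookings for this classroom by hand, by locking the
    -- room row (an advisory lock on an agreed key would also work).
    SELECT * FROM rooms WHERE room_id = 42 FOR UPDATE;
    -- ... check for clashes, then INSERT the class ...
    COMMIT;

Under SERIALIZABLE, the predicate locks taken by the clash-checking queries make that explicit lock unnecessary.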

When SSI was first introduced, only btrees had support for predicate locks.  Conceptually, a predicate lock represents a logical predicate such as "X = 42" against which any concurrent writes must be compared.  Indexes that support predicate locks approximate that predicate by creating SIREAD locks for index pages that would have to be modified by any conflicting insert.  In PostgreSQL 11 that behaviour now extends to gin, gist and hash.

If your query happens to use an index that doesn't support predicate locks, then PostgreSQL falls back to predicate locking the whole relation.  This means that the SSI algorithm will report a lot more false positive serialisation failures.  In other words, in earlier releases if you were using SERIALIZABLE you'd have a good reason to avoid gin, gist and hash indexes, and vice versa, because concurrent transactions would produce a ton of serialisation anomalies and thus retries in your application.  In PostgreSQL 11 you can use all of those features together without undue retries!
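If you want to see the difference for yourself, one way is to run a query through one of the newly supported index types under SERIALIZABLE and then look at the predicate locks it took (hypothetical table; before PostgreSQL 11 you'd expect coarse relation-level locking, from 11 onwards finer-grained locks):

    CREATE TABLE documents (id serial PRIMARY KEY, words tsvector);
    CREATE INDEX ON documents USING gin (words);

    SET enable_seqscan = off;  -- make sure the gin index is used, even on a tiny test table

    BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
    SELECT id FROM documents WHERE words @@ to_tsquery('cat & hat');
    -- From another session, the pg_locks query shown earlier reveals the
    -- granularity of the SIREAD locks this scan took.
    COMMIT;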

One quite subtle change to the SERIALIZABLE/index interaction landed in PostgreSQL 9.6.  It made sure that unique constraint violations wouldn't hide serialisation failures.  This is important, because if serialisation failures are hidden behind other errors, application frameworks can't automatically retry the transaction for you.  For example, if your Java application is using Spring Retry you might configure it to retry any incoming service request on ConcurrencyFailureException; for Ruby applications you might use transaction_retry; similar things exist for other programming environments that provide managed transactions.  That one-line change to PostgreSQL was later determined to be a bug-fix and back-patched to all supported versions.  If future index types add support for unique constraints, they will also need to consider this case.
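The scenario looks something like this (schema invented for illustration); with the 9.6 change, the losing session reports SQLSTATE 40001 rather than 23505 (unique_violation), so retry-on-serialisation-failure logic still kicks in:

    CREATE TABLE invoices (invoice_number int PRIMARY KEY, total numeric);

    -- Session 1:
    BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
    SELECT coalesce(max(invoice_number), 0) + 1 FROM invoices;  -- say it returns 43
    -- Session 2:
    BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
    SELECT coalesce(max(invoice_number), 0) + 1 FROM invoices;  -- also returns 43
    -- Session 1:
    INSERT INTO invoices VALUES (43, 100.00); COMMIT;
    -- Session 2:
    INSERT INTO invoices VALUES (43, 99.00);
    -- Reported as a serialisation failure rather than a unique violation,
    -- so the application's retry logic gets a chance to run.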

Here ends the part of this blog article that concerns solid contributions to PostgreSQL 11.  The next sections are about progressively more vaporous contributions aiming to fill in the gaps where SERIALIZABLE interacts poorly with other features.

Parallelism

The parallel query facilities in PostgreSQL 9.6, 10 and the upcoming 11 release are disabled by SERIALIZABLE.  That is, if you enable SERIALIZABLE, your queries won't be able to use more than one CPU core.  I worked on a patch to fix that problem.  Unfortunately I didn't quite manage to get that into the right shape in time to land it in PostgreSQL 11 so the target is now PostgreSQL 12.  It's good that parallel query was released when it was and not held back by lack of SERIALIZABLE support, but we need to make sure that we plug gaps like these: you shouldn't have to choose between SERIALIZABLE and parallel query.
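You can see the restriction with EXPLAIN, assuming some table (big_table here stands in for anything large enough) for which the planner would otherwise pick a parallel plan:

    -- Outside a SERIALIZABLE transaction: expect a Gather node with workers.
    EXPLAIN SELECT count(*) FROM big_table;

    -- Inside one (before PostgreSQL 12): the same query plans without
    -- parallel workers.
    BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
    EXPLAIN SELECT count(*) FROM big_table;
    COMMIT;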

Update: this shipped in PostgreSQL 12!


Replication

PostgreSQL allows read-only queries to be run on streaming replica servers.  It doesn't allow SERIALIZABLE to be used on those sessions though, because even read-only transactions can create serialisation anomalies.  A solution to this problem was described by Kevin Grittner.  I have written some early prototype code to test the idea (or my interpretation of it), but I ran into a few problems that are going to require some more study.

Stepping back a bit, the general idea is to extend what SERIALIZABLE READ ONLY DEFERRABLE does on a single-node database server.  Before I explain that, I'll need to explain the concept of a "safe transaction".  One of the optimisations that Kevin and Dan made in their SSI implementation is to identify points in time when READ ONLY transactions become safe, meaning that there is no way that they can either suffer a serialisation failure or cause anyone else to suffer one.  When that point is reached, PostgreSQL effectively silently drops the current transaction from SERIALIZABLE to REPEATABLE READ, or in other words from SSI (serialisable snapshot isolation) to SI (snapshot isolation) because it has proven that the result will be the same.  That allows it to forget all about SIREAD locks and the transaction dependency graph, so that it can go faster.  SERIALIZABLE READ ONLY DEFERRABLE is a way to say that you would like to begin a READ ONLY transaction and then wait until it is safe before continuing.  In other words, it effectively runs in REPEATABLE READ isolation, but waits until a moment when that'll be indistinguishable from SERIALIZABLE.  It might return immediately if no writable SERIALIZABLE transactions are running, but otherwise it'll make you wait until all concurrent writable SERIALIZABLE transactions have ended.  As far as I know, PostgreSQL's safe read only transaction concept is an original contribution not described in the earlier papers.
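On a single node, that looks like this today:

    -- Wait (if necessary) for a safe snapshot before starting; after that the
    -- transaction runs without SIREAD locks and cannot cause or suffer a
    -- serialisation failure, which is ideal for big reporting queries.
    BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE READ ONLY DEFERRABLE;
    SELECT sum(total) FROM invoices;
    COMMIT;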

The leading idea for how to make SERIALIZABLE work on standby servers is to cause it to silently behave like SERIALIZABLE READ ONLY DEFERRABLE.  That's complicated though, because the standby server doesn't know anything about transactions running on the primary server.  The proposed solution is to put a small amount of extra information into the WAL that would allow standby servers to know that read only transactions (or technically snapshots) begun at certain points must be safe, or are of unknown status and must wait for a later WAL record that contains the determination.

I really hope that we can get that to work, because as with the other features listed above, it's a shame to have to choose between load balancing and SERIALIZABLE.


SKIP LOCKED

SKIP LOCKED was the first patch that I wrote for PostgreSQL, released in 9.5.  I wrote it to scratch an itch: we used another RDBMS at my job at the time, and SKIP LOCKED was one of the features that came up as missing from PostgreSQL and potentially standing in the way of a migration.

It's designed to support distributing explicit row locks to multiple sessions when you don't care which rows each session gets, but you want to maximise concurrency.  The main use case is consuming jobs from a job queue, but other use cases include reservation systems (booking free seats, rooms, etc.) and lazily maintained roll-up aggregation tables (finding 'dirty' rows that need to be recomputed, etc.).

This is called a kind of "exotic isolation" by Jim Gray in Transaction Processing: Concepts and Techniques (under the name "read-past").  As far as I can see, it's philosophically opposed to SERIALIZABLE because it implies that you are using explicit row locks in the first place.  That shouldn't be necessary under SERIALIZABLE, or we have failed.  Philosophy aside, there is a more practical problem: the rows that you skip are still predicate-locked, so they create conflicts among all the competing transactions.  The sessions lock different rows, but only one concurrent transaction ever manages to complete, and you waste a lot of energy retrying.

This forces a choice between SERIALIZABLE and SKIP LOCKED, not because the features exclude each other but because the resulting performance is terrible.

The idea I have to deal with this is to do a kind of implicit SIREAD lock skipping under certain conditions.  First, let's look at a typical job queue processing query using SKIP LOCKED:

    SELECT id, foo, bar
      FROM job_queue
     WHERE state = 'NEW'
     LIMIT 1
FOR UPDATE SKIP LOCKED;

The idea is that under SERIALIZABLE you should be able to remove the FOR UPDATE SKIP LOCKED clause and rely on SSI's normal protections.  Since you didn't specify an ORDER BY clause and you did specify a LIMIT N clause, you told us that you don't care which N rows you get back as long as state = 'NEW'.   This means we can change the scan order.  Seeing the LIMIT and the isolation level, the executor could skip (but not forget) any tuples that are already SIREAD-locked, and then only go back to the ones it skipped if it doesn't manage to find enough non-SIREAD-locked tuples to satisfy the query.  Instead of an explicit SKIP LOCKED mode, it's a kind of implicit REORDER LOCKED (meaning SIREAD locks) that minimises conflicts.
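Under that scheme, the same job-consuming transaction might be written with no locking clause at all.  A sketch (remember, this is only an idea at this point, not a committed feature):

    BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;

    SELECT id, foo, bar
      FROM job_queue
     WHERE state = 'NEW'
     LIMIT 1;

    -- ... process the job ...
    UPDATE job_queue SET state = 'DONE' WHERE id = 1234;  -- id from the SELECT above

    COMMIT;  -- retry the whole transaction on serialisation failure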

If you add an ORDER BY clause it wouldn't work, because you thereby remove the leeway granted by nondeterminism in the ordering.  But without it, this approach should fix a well known worst case workload for SERIALIZABLE.  Just an idea; no patch yet.
