SERIALIZABLE in PostgreSQL 11... and beyond

Thanks to the tireless work of Google Summer of Code student Shubham Barai with the support of reviewers, a mentor and a committer, PostgreSQL 11 will ship with predicate lock support for hash indexes, GIN indexes and GiST indexes.  These will make SERIALIZABLE transaction isolation much more efficient with those indexes, filling in some of the feature combination gaps and quirks that exist in my favourite RDBMS.

It seems like a good time to write a bit about SERIALIZABLE and how it interacts with other selected PostgreSQL features, including indexes.

A bit of background

If you want to read something a little less dry than the usual papers about transaction isolation, I recommend ACIDRain: Concurrency-Related Attacks on Database-Backed Web Applications which, among other things, discusses a transaction isolation-based attack that bankrupted a bitcoin exchange.  It also makes some interesting observations about some of PostgreSQL's rivals.  Even excluding malicious attacks, I've…

The PostgreSQL Machine

PostgreSQL is a portable RDBMS targeting POSIX systems (and Windows).  It also makes some assumptions about the operating system and hardware it's running on that are not covered by POSIX, but hold on all typical systems.  For example:
- we assume that 32 bit aligned integers can be read and written atomically; that is, without any kind of synchronisation, you might read an arbitrarily stale value but you won't see a "torn" value with a mixture of bits from the before and after values of a concurrent write
- we assume that system calls (or at least IPC-related syscalls) synchronise memory; that is, if you write to shared memory and then signal another process, the other process will then be able to read the value
- we assume that disk blocks of 512 bytes (or some multiple) are written atomically when preallocated; that is, if you lose power and then come back up, you'll either see the old or the new version of a 512-byte block, and not a mixture of bits from the two or o…
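The disk-block assumption can be illustrated with a toy simulation. This is only a sketch, not anything PostgreSQL does: the helper names `interrupted_write` and `sector_states`, the four-sector buffer and the in-order write pattern are all assumptions made for illustration. It shows that when each 512-byte block is written atomically, a larger buffer interrupted part-way through can be a mixture of old and new blocks, but no individual block is torn.

```python
SECTOR_SIZE = 512  # bytes; the atomic write unit assumed in the text


def interrupted_write(old: bytes, new: bytes, sectors_completed: int) -> bytes:
    """Return what is on disk if power is lost after `sectors_completed` sectors.

    Simplification (an assumption of this sketch): sectors are written in
    order, and each one is either fully written or not written at all.
    """
    if len(old) != len(new) or len(old) % SECTOR_SIZE != 0:
        raise ValueError("buffers must be equal-sized multiples of SECTOR_SIZE")
    cut = sectors_completed * SECTOR_SIZE
    return new[:cut] + old[cut:]


def sector_states(on_disk: bytes, old: bytes, new: bytes) -> list:
    """Classify every sector as 'old', 'new', or 'torn' (a mix of both)."""
    states = []
    for off in range(0, len(on_disk), SECTOR_SIZE):
        chunk = on_disk[off:off + SECTOR_SIZE]
        if chunk == new[off:off + SECTOR_SIZE]:
            states.append("new")
        elif chunk == old[off:off + SECTOR_SIZE]:
            states.append("old")
        else:
            states.append("torn")
    return states


# Hypothetical four-sector buffer, fully overwritten with new contents.
old_buf = bytes([0xAA]) * (4 * SECTOR_SIZE)
new_buf = bytes([0xBB]) * (4 * SECTOR_SIZE)

# Power is lost after two of the four sectors reached the disk.
on_disk = interrupted_write(old_buf, new_buf, sectors_completed=2)
print(sector_states(on_disk, old_buf, new_buf))  # ['new', 'new', 'old', 'old']
```

Under this assumption, recovery code only ever has to reason about whole blocks being old or new, never about a block that is half of each.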

Parallel Hash for PostgreSQL

PostgreSQL 9.6 and 10 can use all three join strategies in parallel query plans, but they can only use a partial plan on the outer side of the join.  As of commit 18042840, assuming nothing irreparably busted is discovered in the next few months, PostgreSQL 11 will ship with Parallel Hash.  Partial plans will be possible on both sides of a join for the first time.
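To make "partial plans on both sides of a join" concrete, here is a toy sketch of the idea in Python. It is not PostgreSQL's implementation, which uses worker processes and shared memory. Threads, a lock-protected dictionary, the function name `parallel_hash_join` and the sample rows are all stand-ins assumed for illustration. Each worker loads a slice of the inner side into one shared hash table, and once every worker has finished building, each worker probes that table with a slice of the outer side.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from threading import Lock


def chunks(rows, n):
    """Split rows into n slices, one per worker (a stand-in for a partial scan)."""
    return [rows[i::n] for i in range(n)]


def parallel_hash_join(inner, outer, key_inner, key_outer, workers=3):
    """Join two row lists; both the build and probe sides are divided among workers."""
    table = defaultdict(list)  # shared hash table: join key -> inner rows
    lock = Lock()              # serialises inserts into the shared table

    def build(part):
        for row in part:
            with lock:
                table[key_inner(row)].append(row)

    def probe(part):
        # The table is read-only by now, so probing needs no lock.
        return [(o, i) for o in part for i in table.get(key_outer(o), [])]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Build phase: list() waits until every worker has finished its slice.
        list(pool.map(build, chunks(inner, workers)))
        # Probe phase: starts only after the whole table has been built.
        results = pool.map(probe, chunks(outer, workers))
        return [pair for part in results for pair in part]


# Hypothetical sample rows, loosely named after the TPC-H tables.
orders = [(1, "alice"), (2, "bob")]
lineitem = [(1, "widget"), (1, "gadget"), (2, "gizmo"), (3, "orphan")]

joined = parallel_hash_join(
    inner=orders,
    outer=lineitem,
    key_inner=lambda order: order[0],
    key_outer=lambda item: item[0],
)
print(len(joined))  # 3 matching (lineitem, order) pairs
```

The important property is the wait between the two phases: no worker may start probing until all workers have finished building, otherwise matches could be missed.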

There will certainly be some adjustments before it's released, but it seems like a good time to write a blog article to present Parallel Hash.  This is the biggest feature I've worked on in PostgreSQL so far, and I'm grateful to the reviewers, testers, committers and mentors of the PostgreSQL hacker community and EnterpriseDB for making this work possible.

So what does this feature really do?
A simple example
Using the "orders" and "lineitem" tables from TPC-H scale 30GB, here is a very simple join query answering the (somewhat contrived) question "how many lineitems have there ever been, con…