Stop Thinking, Just Do!

Sungsoo Kim's Blog

ACID Isolation Level


23 February 2016


Article Source


When is “ACID” ACID? Rarely.

ACID and NewSQL databases rarely provide true ACID guarantees by default, if they are supported at all. See the table.

Many databases today differentiate themselves from their NoSQL counterparts by claiming to support “100% ACID” transactions or by “guaranteeing strong consistency (ACID).” In reality, few of these databases—including traditional “big iron” systems like Oracle—provide formal ACID guarantees, even when they claim to do so.

The textbook definition of ACID Isolation is serializability (e.g., Architecture of a Database System, Section 6.2), which states that the outcome of executing a set of transactions should be equivalent to some serial execution of those transactions. This means that each transaction gets to operate on the database as if it were running by itself, which ensures database correctness, or consistency. A database with serializability (“I” in ACID) provides arbitrary read/write transactions and guarantees consistency (“C” in ACID), or correctness, of the database. Without serializability, ACID, particularly consistency, is generally not guaranteed.
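To make the definition concrete, here is a minimal Python sketch. It is an illustration, not how a real database checks anything: the helper names (`deposit`, `serial_outcomes`, `is_serializable_outcome`) are hypothetical, and it compares only final states, which is a simplification of the formal notion. It enumerates every serial order of two deposits and tests whether an interleaved result matches any of them.

```python
from itertools import permutations


def deposit(amount):
    """Return a transaction that adds `amount` to the 'balance' key."""
    def txn(db):
        db["balance"] = db["balance"] + amount
    return txn


def serial_outcomes(initial, txns):
    """Final states reachable by running the transactions one at a time, in every order."""
    outcomes = []
    for order in permutations(txns):
        db = dict(initial)
        for txn in order:
            txn(db)
        if db not in outcomes:
            outcomes.append(db)
    return outcomes


def is_serializable_outcome(initial, txns, observed):
    """True if `observed` equals the result of some serial execution."""
    return observed in serial_outcomes(initial, txns)


initial = {"balance": 100}
txns = [deposit(10), deposit(20)]

# An interleaving in which both transactions read 100 before either writes,
# so the second write overwrites the first (a lost update).
read_a = initial["balance"]
read_b = initial["balance"]
interleaved = {"balance": read_a + 10}
interleaved["balance"] = read_b + 20

print(serial_outcomes(initial, txns))                        # [{'balance': 130}]
print(is_serializable_outcome(initial, txns, interleaved))   # False
```

Both serial orders end at 130, while the interleaving ends at 120, so no serial execution explains it.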

Nevertheless, most publicly available databases (often claiming to provide “ACID” transactions) do not provide serializability. I’ve compiled the isolation guarantees provided by 18 popular databases below (sources hyperlinked). Only three of 18 databases provide serializability by default, and only nine provide serializability as an option at all (shaded):

| Database | Default Isolation | Maximum Isolation |
|---|---|---|
| Actian Ingres 10.0/10S | S | S |
| Aerospike | RC | RC |
| Akiban Persistit | SI | SI |
| Clustrix CLX 4100 | RR | ? |
| Greenplum 4.1 | RC | S |
| IBM DB2 10 for z/OS | CS | S |
| IBM Informix 11.50 | Depends | RR |
| MySQL 5.6 | RR | S |
| MemSQL 1b | RC | RC |
| MS SQL Server 2012 | RC | S |
| NuoDB | CR | CR |
| Oracle 11g | RC | SI |
| Oracle Berkeley DB | S | S |
| Oracle Berkeley DB JE | RR | S |
| Postgres 9.2.2 | RC | S |
| SAP HANA | RC | SI |
| ScaleDB 1.02 | RC | RC |
| VoltDB | S | S |

Legend: RC: read committed, RR: repeatable read, S: serializability, SI: snapshot isolation, CS: cursor stability, CR: consistent read
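The counts quoted above (three of 18 by default, nine of 18 at most) can be checked against the table with a short tally. The sketch below simply transcribes the rows; the variable names are mine.

```python
# (database, default isolation, maximum isolation), transcribed from the table.
TABLE = [
    ("Actian Ingres 10.0/10S", "S", "S"),
    ("Aerospike", "RC", "RC"),
    ("Akiban Persistit", "SI", "SI"),
    ("Clustrix CLX 4100", "RR", "?"),
    ("Greenplum 4.1", "RC", "S"),
    ("IBM DB2 10 for z/OS", "CS", "S"),
    ("IBM Informix 11.50", "Depends", "RR"),
    ("MySQL 5.6", "RR", "S"),
    ("MemSQL 1b", "RC", "RC"),
    ("MS SQL Server 2012", "RC", "S"),
    ("NuoDB", "CR", "CR"),
    ("Oracle 11g", "RC", "SI"),
    ("Oracle Berkeley DB", "S", "S"),
    ("Oracle Berkeley DB JE", "RR", "S"),
    ("Postgres 9.2.2", "RC", "S"),
    ("SAP HANA", "RC", "SI"),
    ("ScaleDB 1.02", "RC", "RC"),
    ("VoltDB", "S", "S"),
]

# Count rows whose default or maximum level is serializability.
serializable_by_default = [name for name, default, maximum in TABLE if default == "S"]
serializable_at_all = [name for name, default, maximum in TABLE if maximum == "S"]

print(len(TABLE))                     # 18
print(len(serializable_by_default))   # 3
print(len(serializable_at_all))       # 9
```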

Instead of providing serializability, many of these databases provide one of several weaker variants, often when marketing material and documentation claim otherwise. There is no fundamental reason why a database shouldn’t support serializability: we have the algorithms, and we’ve made great strides in improving ACID scalability. So why not provide serializability by default, or, at the least, provide serializability as an option at all? One key factor is performance: serializable isolation can limit concurrency, and traditional techniques such as two-phase locking are expensive compared to, say, taking short read locks on data items. Additionally, it is impossible to simultaneously achieve high availability and serializability (though most of these database implementations are not highly available anyway, even when providing weaker models). A third reason is that transactions may be less likely to deadlock or abort due to conflicts under weaker isolation. However, these benefits aren’t free: the consistency anomalies that arise from the weak levels shown above are well-understood and quantifiable.
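One well-known anomaly of this kind is write skew under snapshot isolation. The sketch below is a toy model, not any vendor's implementation: `SnapshotStore`, `Txn`, and the on-call doctors scenario are hypothetical, and the store is assumed to abort only on write-write conflicts (first committer wins). Two concurrent transactions each check the invariant "at least one doctor stays on call" against the same snapshot, write to different keys, and both commit.

```python
class SnapshotStore:
    """Toy snapshot-isolation store (hypothetical, for illustration only)."""

    def __init__(self, data):
        self.data = dict(data)
        self.version = {key: 0 for key in data}  # commit time of last write per key
        self.clock = 0                           # increases with each commit

    def begin(self):
        return Txn(self)


class Txn:
    def __init__(self, store):
        self.store = store
        self.snapshot = dict(store.data)  # reads see the state as of begin()
        self.start = store.clock
        self.writes = {}                  # buffered until commit

    def read(self, key):
        return self.writes.get(key, self.snapshot[key])

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        # First committer wins: abort only if a key we wrote was also
        # written by a transaction that committed after we began.
        for key in self.writes:
            if self.store.version[key] > self.start:
                return False
        self.store.clock += 1
        for key, value in self.writes.items():
            self.store.data[key] = value
            self.store.version[key] = self.store.clock
        return True


def go_off_call(txn, doctor):
    """Take `doctor` off call only if at least two doctors are on call."""
    on_call = [d for d in ("alice", "bob") if txn.read(d)]
    if len(on_call) >= 2:
        txn.write(doctor, False)


store = SnapshotStore({"alice": True, "bob": True})
t1, t2 = store.begin(), store.begin()  # concurrent: both see the same snapshot
go_off_call(t1, "alice")
go_off_call(t2, "bob")
committed = (t1.commit(), t2.commit())

print(committed)    # (True, True)
print(store.data)   # {'alice': False, 'bob': False}
```

Run one after the other, the second transaction would see only one doctor on call and leave the data alone. Here the write sets do not overlap, so this model lets both commit and nobody is left on call.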

Where’s the silver lining? We can get real ACID in some of our databases (if not by default). And, despite the fact that many other “ACID” databases don’t provide ACID properties—at least according to decades of research and development and formally proven guarantees regarding database correctness (although perhaps marketing has rewritten the books)—we can still reserve travel tickets, use our bank accounts, and fight crime. How? One possibility is that anomalies are rare and the performance benefits of weak isolation outweigh the cost of inconsistencies. Another possibility is that applications are performing their own concurrency control external to the database; database programmers can use commands like SELECT FOR UPDATE, manual LOCK TABLE, and UNIQUE constraints to manually perform their own synchronization. The answer is likely a mix of each, but, stepping back, these strategies should remind you of what’s often done today in NoSQL-style data infrastructure: “good enough” consistency and some hand-rolled, application-specific concurrency control. Perhaps there’s a better question: when is “ACID” NoSQL?
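As a rough sketch of the UNIQUE-constraint approach to hand-rolled synchronization, the example below uses Python's built-in `sqlite3` module, chosen only because it runs without a server. The `reservations` table and the `reserve` helper are hypothetical. Instead of checking whether a seat is free and then inserting, the application inserts and lets the constraint reject a double booking.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reservations ("
    " seat TEXT NOT NULL UNIQUE,"
    " passenger TEXT NOT NULL)"
)


def reserve(conn, seat, passenger):
    """Try to book `seat`; return True on success, False if it is already taken."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "INSERT INTO reservations (seat, passenger) VALUES (?, ?)",
                (seat, passenger),
            )
        return True
    except sqlite3.IntegrityError:
        # The UNIQUE constraint rejected a second booking of the same seat.
        return False


print(reserve(conn, "12A", "alice"))  # True
print(reserve(conn, "12A", "bob"))    # False
```

The constraint protects only the invariant it encodes (one row per seat). Any other invariant would need its own constraint, explicit locking, or a stronger isolation level.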

