Quick Notes for Serializability Theory aka Transaction Scheduling



Note: All of the below content I have summarized from reading Phil Bernstein's book: Phil Bernstein at Microsoft Research Concurrency Control and Recovery in Database Systems


Systems ranging from web-services to databases have adopted the concept of transactions. Transactions are sequence of reads and writes changing the state (usually a database) via read and write operations. These transactions need to be handled by some notion of a scheduler. Serializability theory is a mathematical tool to prove whether the scheduler works or not.



  • Transactions: A representation of an execution of read and write operations that identifies the operations and the order in which they execute. Last operation is a commit or an abort. Commit indicates successful termination, abort indicates unsuccessful termination.


  • History: Represents concurrent execution of a set of transactions.


  • Serializable History: A history that represents a serializable execution of it's transactions. In easier terms it is a sequence of transactions that, when executed, produce the same result as if the transactions were executed serially, one after the other, without any interleaving. This ensures that the concurrent execution of transactions does not lead to inconsistencies in the database.


  • Partial Order: An abstract model that describes the order of executions in a set of transactions running in a system. Some executions describe dependencies whereas some don't, this allows for concurrent executions. When operations across transactions are interacting over shared data such that one is a read and the other is a write partial order needs to define strict happens before relationship since the outcome depends on which operations happens first. Histories are defined as partial orders.


  • Equivalent Histories: 2 histories are equivalent if they are defined over the same transactions and they order conflicting operations of non-aborted transactions in the same way. Mental model of this is that outcome of concurrent execution of transactions depends only on relative ordering of conflicting operations (2 ops on same data where atleast 1 is a write).


  • Serializability Theorem: Given a Serialization Graph (transactions are verticies and conflicts are edges between them), you can say a history is serializable if it's Serialization graph is acylic.


  • Recoverable Histories: To ensure correctness in failures Scheduler should produce executions that are not only Serializable but also recoverable. History is called recoverable if each transaction commits after the commitment of all transaction (other than itself) that it reads from.


  • Cascading Aborts: In recoverable histories let's say a transaction that N other transactions read from and are blocked on committing themselves aborts. Then all N transactions will have to abort too since the transaction they had initially read from did not commit so they can no longer commit.


  • Histories Avoiding Cascading Aborts: A transaction in such a history only read data that has been written by an ALREADY committed transaction or itself.


  • Strict History: No data item may be read or overwritten until the transaction that previously wrote into it terminates, either by aborting or by committing.


  • Strict vs ACA: strict histories provide a higher level of isolation by restricting both reads and writes until previous transactions commit, while ACA histories allow more concurrency by only restricting reads until previous transactions commit. Both approaches aim to prevent cascading aborts, but they balance isolation and concurrency differently.


Comments

Commenter: Hamdaan Khalid

ACA can have issues of dirty writes that strict can't so strict is common impl. Also recoverable, ACA, Strict is the order from least to most reduced concurrency but stronger consistency guarantees.

Add a comment: