Post on 20-Dec-2015
Signature Based Concurrency Control
Thomas Schwarz, S.J., JoAnne Holliday
Santa Clara University, Santa Clara, CA 95053
tjschwarz,jholliday@scu.edu
Overview
Transactional concurrency control in a distributed system:
Signatures are a better version of version numbers.
Signatures are calculated from the records.
Basic Idea
A signature is a short string of f bits calculated from a record.
We assume here an LH* file scenario.
The file is a dictionary data structure associating keys with a non-key field:
[Figure: a record consisting of key c and a non-key field, together with its signature]
Basic Idea
When a transaction reads a record, it records the signature of the record.
When the transaction is ready to commit, it checks whether any signatures of records it read have changed.
If this is the case, the transaction restarts. Otherwise, it commits.
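The read / check / commit cycle described above can be sketched in a few lines. This is a minimal single-process illustration, not the authors' implementation: the names `Store`, `Transaction`, and `signature` are hypothetical, SHA-1 stands in for the signature function, and writes are simply buffered until commit.

```python
import hashlib

def signature(record: bytes) -> bytes:
    """Signature calculated from the record contents (SHA-1 is an assumption)."""
    return hashlib.sha1(record).digest()

class Store:
    """Toy in-memory dictionary file: key -> non-key field."""
    def __init__(self):
        self.records = {}

    def read(self, key):
        return self.records.get(key, b"")

class Transaction:
    def __init__(self, store):
        self.store = store
        self.seen = {}     # key -> signature recorded at read time
        self.writes = {}   # key -> new value, applied only at commit

    def read(self, key):
        value = self.store.read(key)
        self.seen[key] = signature(value)
        return value

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        """Return True on commit, False if the transaction must restart."""
        for key, sig in self.seen.items():
            if signature(self.store.read(key)) != sig:
                return False   # a record we read has changed
        self.store.records.update(self.writes)
        return True
```

If two transactions read the same record and the first one commits a change to it, the second one sees a different signature at commit time and restarts.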
Basic Idea
Danger of false negatives: Two different records can have the same signature.
Control the probability of false negatives by the length of the signature.
MD5 (16B) and SHA1 (20B) are accepted in computer forensics.
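As a rough guide to how signature length controls this probability: if we assume (this assumption is ours, not stated on the slide) that the signatures of two different records behave like independent, uniformly distributed f-bit strings, a changed record goes undetected with probability

```latex
P(\text{false negative}) \approx 2^{-f},
\qquad
f = 128 \ (\text{MD5}):\ 2^{-128} \approx 2.9 \times 10^{-39},
\qquad
f = 160 \ (\text{SHA1}):\ 2^{-160} \approx 6.8 \times 10^{-49}.
```

This estimate covers accidental changes only; it says nothing about deliberately constructed collisions.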
Simple Signature Scheme
Each transaction i contains atomic operations:
Ri(x) – Read record x
Wi(x) – Write record x
Vi(x) – Verify the signature of record x
Ai – Abort
Ci – Commit
Simple Signature Scheme
Rules for transaction i:
All reads precede all verifies.
All verifies precede all writes.
If another transaction j writes to x between a read and a verify of x, then transaction i aborts.
If all verifies are successful, then the transaction does all its writes and commits.
Simple Signature Scheme
Dirty Reads: Ri(x) Wj(x) Aj Ci
or Ri(x) Wj(x) Ci Aj
Impossible, because a transaction that writes also commits: it writes only after all its verifies have succeeded.
Simple Signature Scheme
Fuzzy Reads: Ri(x) Wj(x) Cj Ri(x)
Possible only if we were to allow multiple reads to the same item x:
R1(x) W2(x) C2 R1(x) V1(x) C1.
Simple Signature Scheme
If we do all the reads in a single block:
The scheme arguably has the ANSI REPEATABLE READ property.
It even has ANSI ANOMALY SERIALIZABLE.
But it is certainly not serializable:
R1(x) R2(x) R1(y) R2(y) V1(x) V2(x) V1(y) V2(y) W1(x) W2(x) W2(y) W1(y) C1 C2
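One way to check the claim for this history is the standard precedence-graph test for conflict serializability. The sketch below is ours: the helper names are hypothetical, verify operations are treated as reads, and every transaction in the history is assumed to commit.

```python
import re
from collections import defaultdict

def parse(history: str):
    """Turn 'R1(x) W2(x) C1' into [('R', 1, 'x'), ('W', 2, 'x'), ('C', 1, None)]."""
    ops = []
    for kind, txn, item in re.findall(r"([RWVAC])(\d+)(?:\((\w+)\))?", history):
        ops.append((kind, int(txn), item or None))
    return ops

def precedence_edges(ops):
    """Edge i -> j when an operation of i conflicts with a later operation of j."""
    edges = defaultdict(set)
    for a, (kind_a, txn_a, item_a) in enumerate(ops):
        if kind_a not in "RVW":
            continue
        for kind_b, txn_b, item_b in ops[a + 1:]:
            if kind_b not in "RVW" or txn_a == txn_b or item_a != item_b:
                continue
            # Two operations conflict if at least one of them is a write.
            if kind_a == "W" or kind_b == "W":
                edges[txn_a].add(txn_b)
    return edges

def has_cycle(edges):
    """Depth-first search for a cycle in the precedence graph."""
    state = {}  # node -> 1 while on the DFS stack, 2 when finished

    def visit(node):
        state[node] = 1
        for nxt in edges.get(node, ()):
            if state.get(nxt) == 1 or (nxt not in state and visit(nxt)):
                return True
        state[node] = 2
        return False

    return any(node not in state and visit(node) for node in list(edges))

def is_conflict_serializable(history: str) -> bool:
    return not has_cycle(precedence_edges(parse(history)))

slide_history = ("R1(x) R2(x) R1(y) R2(y) V1(x) V2(x) V1(y) V2(y) "
                 "W1(x) W2(x) W2(y) W1(y) C1 C2")
print(is_conflict_serializable(slide_history))
```

In the history above, W1(x) precedes W2(x) while W2(y) precedes W1(y), so the graph contains both the edge 1 -> 2 and the edge 2 -> 1.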
Extended Signature Scheme
Add: The Verify-Write phase is atomic.
Then: The scheme is (conflict) serializable.
Proof (Idea):
Consider all reads to be “pre-reads”.
Only the verify operations are reads in the sense of concurrency control.
Then the result follows by definition.
Implementation
Lock based implementation:
Read-Calculate Phase
No locking at all.
However, a transaction that reads an exclusively locked record might want to reread that record because that record might change.
Verify-Write Phase
Read lock on all the signatures of records read.
Write lock on all the signatures of records to be modified.
Verify signatures and decide on commit / abort.
Release all locks.
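The two phases can be sketched as follows. This is a simplified single-process illustration under our own assumptions: `LockingStore` is a hypothetical name, SHA-1 stands in for the signature, plain exclusive locks replace the separate read and write locks (Python's standard library has no reader-writer lock), and locks are acquired in sorted key order to avoid deadlock.

```python
import hashlib
import threading

def signature(record: bytes) -> bytes:
    return hashlib.sha1(record).digest()

class LockingStore:
    def __init__(self):
        self.records = {}
        self.locks = {}
        self._guard = threading.Lock()   # protects the lock table itself

    def lock_for(self, key):
        with self._guard:
            return self.locks.setdefault(key, threading.Lock())

    def read(self, key):
        # Read-calculate phase: no locking at all.
        value = self.records.get(key, b"")
        return value, signature(value)

    def verify_and_write(self, seen, writes):
        """seen: key -> signature from the read phase; writes: key -> new value."""
        keys = sorted(set(seen) | set(writes))
        held = [self.lock_for(k) for k in keys]
        for lock in held:                 # acquire every lock up front
            lock.acquire()
        try:
            for key, sig in seen.items():
                if signature(self.records.get(key, b"")) != sig:
                    return False          # abort
            self.records.update(writes)
            return True                   # commit
        finally:
            for lock in reversed(held):   # release all locks
                lock.release()
```

Because all locks are taken before any signature is checked and released only after the decision, the verify-write phase runs as one unit with respect to other transactions on the same keys.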
Implementation
Lock based implementation:
Conservative Strict Two-Phase Locking
Locks are short-lived:
One round of messages to acquire locks and signatures.
One round of messages for commit / abort and release messages.
Implementation
No-locking scheme
Transactions appear to servers to be very short.
Chance for conflict limited.
Signature Implementation
We do not use the record signature directly, but a region signature.
A region is a contiguous set of keys that all hash to the same bucket.
Typically, a region should have between 0.5 and 5 records on average.
Signature Implementation
Let c_i, i ∈ I, be the keys in a region.
Then set the region signature to be
$$\sum_{i \in I} g(c_i) \cdot \mathrm{sig}(\mathrm{record}(c_i))$$
Arithmetic is done in a GF.
g hashes keys into GF.
The record signature of a non-existing record is zero.
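A small sketch of a region signature formed as a sum, over the keys c_i of a region, of g(c_i) times the record signature of c_i. Everything concrete here is our assumption for illustration: the field is GF(2^8) with the polynomial x^8 + x^4 + x^3 + x + 1, `g` and `record_sig` are toy one-byte hashes, and a real signature would use a much larger field.

```python
import hashlib

def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result

def g(key: str) -> int:
    """Hypothetical key hash into the nonzero elements of GF(2^8)."""
    return hashlib.sha1(key.encode()).digest()[0] % 255 + 1

def record_sig(record) -> int:
    """Toy one-byte record signature; a non-existing record (None) has signature zero."""
    if record is None:
        return 0
    return hashlib.sha1(record).digest()[0]

def region_sig(region: dict) -> int:
    """Sum over the keys of g(c_i) * sig(record(c_i)); addition in GF(2^8) is XOR."""
    total = 0
    for key, record in region.items():
        total ^= gf_mul(g(key), record_sig(record))
    return total

def update_region_sig(old_total, key, old_record, new_record):
    """Incremental update: add g(c) * (sig(old) + sig(new)) to the stored sum."""
    delta = record_sig(old_record) ^ record_sig(new_record)
    return old_total ^ gf_mul(g(key), delta)
```

Because the sum is linear, a stored region signature can be updated from the old and new record alone, without rereading the other records in the region; and since a non-existing record contributes zero, inserting a record is just the update from `None`.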
Signature Implementation
The verify operations read region signatures.
Regions are addressed by the key-space they cover.
Locking is done on regions.
Store region signatures.
Large regions have little storage overhead, small ones have large storage overhead.
Implementation
No-Locking Scheme
Assumes loosely synchronized clocks: clocks that are accurate to within a small multiple of the average message delay.
The transaction acquires a time-stamp at the lowest numbered SDDS bucket it visits.
The transaction sends verify / write / vote requests to all servers it visited.
Each server votes in the usual way on whether the transaction should commit.
If every server returns a yes vote to the transaction manager, then the transaction commits.
The transaction manager sends out the result of the vote.
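The voting step can be sketched as below. This is a deliberately reduced illustration under our own assumptions: `Server` and `run_vote` are hypothetical names, SHA-1 stands in for the signature, and time-stamps, message passing, and concurrent transactions are all left out, so only the "commit if every server votes yes" rule is shown.

```python
import hashlib

def signature(record: bytes) -> bytes:
    return hashlib.sha1(record).digest()

class Server:
    def __init__(self, records):
        self.records = dict(records)

    def vote(self, seen):
        """Vote yes only if every signature the transaction read here is unchanged."""
        return all(signature(self.records.get(k, b"")) == s
                   for k, s in seen.items())

    def apply(self, writes):
        self.records.update(writes)

def run_vote(requests):
    """requests: list of (server, seen, writes), one entry per server visited."""
    votes = [server.vote(seen) for server, seen, _ in requests]
    commit = all(votes)               # transaction manager's decision
    if commit:
        for server, _, writes in requests:
            server.apply(writes)      # result of the vote is sent out
    return commit
```

A single no vote from any visited server is enough to abort, in which case no server applies any of the writes.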
Discussion
The signature scheme is interesting if transactions have large calculation times and updates are rare.
The signature scheme should be extendible to replicated databases.
The size of a region can be fit to the scale of the file, so that a region always has about the same number of records.
E.g. whenever the LH* split pointer returns to zero, split regions in half.