Ch 3 - Storage and Retrieval
DBs, Comparing B-Trees and LSM-Trees
Even though B-tree implementations are generally more mature than LSM-tree implementations, LSM-trees are also interesting due to their performance characteristics. As a rule of thumb, LSM-trees are typically faster for writes, whereas B-trees are thought to be faster for reads [23]. Reads are typically slower on LSM-trees because they have to check several different data structures and SSTables at different stages of compaction.
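A minimal Python sketch of the read path this describes (class names and structure are invented for illustration, not any particular engine's API): a lookup checks the in-memory memtable first, then falls through each SSTable from newest to oldest, which is why reads touch several structures while writes touch only memory.

```python
# Illustrative sketch, not from the book: why reads can be slower on an
# LSM-tree. A lookup may consult the memtable and then each on-disk
# SSTable from newest to oldest until the key is found.

class SSTable:
    """A sorted, immutable key-value file; modeled here as a dict."""
    def __init__(self, entries):
        self.entries = dict(entries)

    def get(self, key):
        return self.entries.get(key)

class LSMTree:
    def __init__(self):
        self.memtable = {}   # recent writes, held in memory
        self.sstables = []   # newest first; produced by flushing/compaction

    def put(self, key, value):
        self.memtable[key] = value  # writes only touch memory (fast path)

    def get(self, key):
        # Reads may touch several structures: memtable first, then every
        # SSTable in order -- the source of read amplification.
        if key in self.memtable:
            return self.memtable[key]
        for table in self.sstables:
            value = table.get(key)
            if value is not None:
                return value
        return None
```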
Index
A compromise between a clustered index (storing all row data within the index) and a nonclustered index (storing only references to the data within the index) is known as a covering index or index with included columns, which stores some of a table's columns within the index [33]. This allows some queries to be answered by using the index alone (in which case, the index is said to cover the query) [32].
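As a hypothetical illustration (schema and names are invented), a covering index can be modeled as a mapping that carries a few included columns alongside the key, so queries touching only those columns never visit the main table:

```python
# Sketch of a covering index: the index stores the key plus "included"
# columns, so a query over only those columns is answered from the index
# alone. All data and names here are made up for illustration.

rows = {  # primary storage: user_id -> full row
    1: {"name": "Ada", "email": "ada@example.com", "bio": "long text..."},
    2: {"name": "Bob", "email": "bob@example.com", "bio": "long text..."},
}

# Index on user_id that also includes the email column.
index_with_included = {
    1: {"email": "ada@example.com"},
    2: {"email": "bob@example.com"},
}

def get_email(user_id):
    # The index "covers" this query: no lookup into `rows` is needed.
    return index_with_included[user_id]["email"]

def get_bio(user_id):
    # bio is not included in the index, so this query must hit the table.
    return rows[user_id]["bio"]
```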
In-memory vs traditional
Counterintuitively, the performance advantage of in-memory databases is not due to the fact that they don't need to read from disk. Even a disk-based storage engine may never need to read from disk if you have enough memory, because the operating system caches recently used disk blocks in memory anyway. Rather, they can be faster because they can avoid the overheads of encoding in-memory data structures in a form that can be written to disk [44].
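A rough sketch of the overhead in question, using JSON purely as a stand-in for any on-disk encoding: the disk-oriented path converts between in-memory structures and bytes on every write and read, while an in-memory database works on the structure directly.

```python
# The disk-oriented engine pays encode/decode costs even when the bytes
# are already cached in memory; an in-memory engine skips both steps.
import json

record = {"user_id": 42, "name": "Ada", "logins": 1007}

# Disk-oriented path: structure -> bytes (write), bytes -> structure (read).
encoded = json.dumps(record).encode("utf-8")   # encoding overhead on write
decoded = json.loads(encoded.decode("utf-8"))  # decoding overhead on read

# In-memory path: `record` is already a usable data structure, so an
# in-memory database can operate on it with no conversion at all.
assert decoded == record
```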
OLTP vs OLAP engines
In most OLTP databases, storage is laid out in a row-oriented fashion: all the values from one row of a table are stored next to each other. Document databases are similar: an entire document is typically stored as one contiguous sequence of bytes.
…
The idea behind column-oriented storage is simple: don't store all the values from one row together, but store all the values from each column together instead. If each column is stored in a separate file, a query only needs to read and parse those columns that are used in that query, which can save a lot of work.
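A small sketch (data invented) contrasting the two layouts: in the columnar form, a query like SUM(quantity) reads a single column sequence instead of parsing every row.

```python
# Row-oriented: values of one row are adjacent, as in most OLTP stores.
rows = [
    {"date": "2024-01-01", "product": "apple",  "quantity": 3},
    {"date": "2024-01-01", "product": "banana", "quantity": 5},
    {"date": "2024-01-02", "product": "apple",  "quantity": 2},
]

# Column-oriented: values of one column are adjacent, conceptually one
# file per column.
columns = {
    "date":     ["2024-01-01", "2024-01-01", "2024-01-02"],
    "product":  ["apple", "banana", "apple"],
    "quantity": [3, 5, 2],
}

# SELECT SUM(quantity): the columnar layout touches a single "file",
# whereas the row layout must read and parse every full row.
total = sum(columns["quantity"])
assert total == sum(r["quantity"] for r in rows)
```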
Ch 4 - review
Ch 5 - Replication
Single vs multi-leader replication across datacenters
How single-leader and multi-leader configurations fare in a multi-datacenter deployment:
Performance
In a single-leader configuration, every write must go over the internet to the datacenter with the leader. This can add significant latency to writes and might contravene the purpose of having multiple datacenters in the first place. In a multi-leader configuration, every write can be processed in the local datacenter and is replicated asynchronously to the other datacenters. Thus, the inter-datacenter network delay is hidden from users, which means the perceived performance may be better.
Tolerance of datacenter outages
In a single-leader configuration, if the datacenter with the leader fails, failover can promote a follower in another datacenter to be leader. In a multi-leader configuration, each datacenter can continue operating independently of the others, and replication catches up when the failed datacenter comes back online.
Tolerance of network problems
Traffic between datacenters usually goes over the public internet, which may be less reliable than the local network within a datacenter. A single-leader configuration is very sensitive to problems in this inter-datacenter link, because writes are made synchronously over this link. A multi-leader configuration with asynchronous replication can usually tolerate network problems better: a temporary network interruption does not prevent writes being processed.
Conflict resolution for multi-leader
There are various ways of achieving convergent conflict resolution:

• Give each write a unique ID (e.g., a timestamp, a long random number, a UUID, or a hash of the key and value), pick the write with the highest ID as the winner, and throw away the other writes. If a timestamp is used, this technique is known as last write wins (LWW). Although this approach is popular, it is dangerously prone to data loss [35]. We will discuss LWW in more detail at the end of this chapter ("Detecting Concurrent Writes" on page 184). A toy LWW sketch follows this list.
• Give each replica a unique ID, and let writes that originated at a higher-numbered replica always take precedence over writes that originated at a lower-numbered replica. This approach also implies data loss.
• Somehow merge the values together, e.g., order them alphabetically and then concatenate them (in Figure 5-7, the merged title might be something like "B/C").
• Record the conflict in an explicit data structure that preserves all information, and write application code that resolves the conflict at some later time (perhaps by prompting the user).
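A toy sketch of the first option, last write wins (structures invented for illustration): every replica deterministically keeps the write with the highest (timestamp, ID) pair and discards the rest, which converges but silently drops data.

```python
# Given the same set of conflicting writes, every replica picks the same
# winner, so replicas converge -- and the losing writes are simply lost,
# which is the data-loss hazard the text warns about.

def resolve_lww(conflicting_writes):
    """conflicting_writes: list of (timestamp, unique_id, value) tuples."""
    # The unique_id breaks timestamp ties so all replicas agree.
    winner = max(conflicting_writes, key=lambda w: (w[0], w[1]))
    return winner[2]

writes = [
    (1700000001, "replica-a", "B"),
    (1700000002, "replica-c", "C"),   # highest timestamp: this one wins
    (1700000002, "replica-b", "B'"),  # same timestamp, lower id: discarded
]
assert resolve_lww(writes) == "C"     # the other writes are thrown away
```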
Ch 6 - Transactions
Typical race conditions
Dirty reads
One client reads another client's writes before they have been committed. The read committed isolation level and stronger levels prevent dirty reads.
Dirty writes
One client overwrites data that another client has written, but not yet committed. Almost all transaction implementations prevent dirty writes.
Read skew (nonrepeatable reads)
A client sees different parts of the database at different points in time. This issue is most commonly prevented with snapshot isolation, which allows a transaction to read from a consistent snapshot at one point in time. It is usually implemented with multi-version concurrency control (MVCC).
Lost updates
Two clients concurrently perform a read-modify-write cycle. One overwrites the other's write without incorporating its changes, so data is lost. Some implementations of snapshot isolation prevent this anomaly automatically, while others require a manual lock (SELECT FOR UPDATE). A toy compare-and-set sketch follows this list.
Write skew
A transaction reads something, makes a decision based on the value it saw, and writes the decision to the database. However, by the time the write is made, the premise of the decision is no longer true. Only serializable isolation prevents this anomaly.
Phantom reads
A transaction reads objects that match some search condition. Another client makes a write that affects the results of that search. Snapshot isolation prevents straightforward phantom reads, but phantoms in the context of write skew require special treatment, such as index-range locks.
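A toy compare-and-set sketch for the lost-update case above (all names invented; real databases expose this as atomic operations, compare-and-set, or SELECT FOR UPDATE): the write succeeds only if the value is unchanged since it was read, forcing the losing client to retry its read-modify-write cycle.

```python
class Counter:
    def __init__(self):
        self.value = 0

    def compare_and_set(self, expected, new):
        """Pretend-atomic: apply the write only if nothing changed since
        the read; the caller must retry on failure."""
        if self.value != expected:
            return False
        self.value = new
        return True

c = Counter()

# Two clients both read 0 and both try to write 1: without the check,
# one increment would silently vanish (a lost update).
seen_by_a = c.value
seen_by_b = c.value
assert c.compare_and_set(seen_by_a, seen_by_a + 1) is True
assert c.compare_and_set(seen_by_b, seen_by_b + 1) is False  # stale read

# Client B retries its read-modify-write cycle and now succeeds.
seen_by_b = c.value
assert c.compare_and_set(seen_by_b, seen_by_b + 1) is True
assert c.value == 2   # both updates preserved
```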
Concurrency control
Several widely used isolation levels, in particular: read committed, snapshot isolation (sometimes called repeatable read), and serializable.
Approaches to implementing serializable transactions
Literally executing transactions in a serial order
If you can make each transaction very fast to execute, and the transaction throughput is low enough to process on a single CPU core, this is a simple and effective option.
Two-phase locking
For decades this has been the standard way of implementing serializability, but many applications avoid using it because of its performance characteristics.
Serializable snapshot isolation (SSI)
A fairly new algorithm that avoids most of the downsides of the previous approaches. It uses an optimistic approach, allowing transactions to proceed without blocking. When a transaction wants to commit, it is checked, and it is aborted if the execution was not serializable.
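A toy sketch in the optimistic spirit of SSI (heavily simplified; real SSI tracks rw-dependencies between transactions rather than just re-checking versions): transactions run without blocking, record the versions they read, and at commit time abort if anything they read has since been overwritten by a committed transaction.

```python
class Store:
    def __init__(self):
        self.data = {}      # key -> value
        self.version = {}   # key -> commit counter of last writer
        self.commits = 0

class Transaction:
    def __init__(self, store):
        self.store = store
        self.read_versions = {}  # key -> version observed at read time
        self.writes = {}

    def read(self, key):
        self.read_versions[key] = self.store.version.get(key, 0)
        return self.writes.get(key, self.store.data.get(key))

    def write(self, key, value):
        self.writes[key] = value  # buffered; nothing blocks

    def commit(self):
        # Validation: abort if any key we read has since been rewritten,
        # i.e., the premise of our decisions is no longer true.
        for key, seen in self.read_versions.items():
            if self.store.version.get(key, 0) != seen:
                return False  # execution was not serializable -> abort
        self.store.commits += 1
        for key, value in self.writes.items():
            self.store.data[key] = value
            self.store.version[key] = self.store.commits
        return True

s = Store()
t1, t2 = Transaction(s), Transaction(s)
t1.read("x"); t1.write("x", "t1")
t2.read("x"); t2.write("x", "t2")
assert t1.commit() is True
assert t2.commit() is False  # t2's premise changed after t1 committed
```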