
huachaohuang/awesome-dbdev


Awesome Database Development

Database development is interesting and challenging. There is always something interesting to learn and some challenging problem to solve. Building a reliable, high-performance database requires getting a lot of things right, and it takes time, a lot of time, to think and practice. I have spent ten years working with databases. However, as the saying goes, the more I know, the more I realize I don't know. So I collect awesome database development materials here to review from time to time. I hope they will be helpful to those who share my interests.

Storage Device

Media

Interface

Operating System

Kernel

File system

  • ext4 Data Structures and Algorithms

  • The Design and Implementation of a Log-Structured File System (1991)

    This paper presents a new technique for disk storage management called a log-structured file system. A log-structured file system writes all modifications to disk sequentially in a log-like structure, thereby speeding up both file writing and crash recovery.

  • SFS: Random Write Considered Harmful in Solid State Drives (FAST, 2012)

    In this paper, we propose a new file system for SSDs, SFS. First, SFS exploits the maximum write bandwidth of SSDs by taking a log-structured approach: it transforms all random writes at the file system level into sequential ones at the SSD level. Second, SFS applies a new data grouping strategy at write time, instead of the existing data separation strategy during segment cleaning: it puts data blocks with similar update likelihood into the same segment. This minimizes the segment cleaning overhead that is inevitable in any log-structured file system, because segment utilization then forms a sharp bimodal distribution.
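The log-structured file system idea can be illustrated with a toy key-value store. This is a minimal in-memory sketch, not the Sprite LFS design: the class and method names are hypothetical, a Python list stands in for the on-disk log, and recovery rescans the whole log instead of using the checkpoints and roll-forward that LFS relies on.

```python
class LogStructuredStore:
    """Toy log-structured store (hypothetical). Every update is appended to a
    single sequential log; an in-memory index, standing in for LFS's inode map,
    points each key at its newest record."""

    def __init__(self):
        self.log = []    # append-only list of (key, value) records; models the on-disk log
        self.index = {}  # key -> position of the latest record for that key

    def write(self, key, value):
        # Never overwrite in place: append a new record and repoint the index.
        self.log.append((key, value))
        self.index[key] = len(self.log) - 1

    def read(self, key):
        return self.log[self.index[key]][1]

    def recover(self):
        # Simplified crash recovery: rebuild the index by scanning the log in
        # order, so later records for the same key win.
        self.index = {}
        for pos, (key, _) in enumerate(self.log):
            self.index[key] = pos

    def live_ratio(self):
        # Fraction of log records that are still current; the rest is garbage
        # that a segment cleaner would reclaim.
        return len(self.index) / len(self.log) if self.log else 1.0
```

Because records are only appended, overwritten versions stay in the log as garbage; `live_ratio` shows how much of the log a segment cleaner would still have to copy.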
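SFS groups blocks by how likely they are to be updated. The sketch below assumes a per-block hotness of write count divided by age and replaces the paper's iterative segment quantization with a simple sort-and-chunk, so treat it as an illustration of the grouping idea only; `Block`, `hotness`, and `group_into_segments` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Block:
    block_id: int
    write_count: int
    age: float  # time since the block was created (hypothetical units)


def hotness(b: Block) -> float:
    # Assumed hotness: blocks written often and recently are "hot".
    return b.write_count / b.age


def group_into_segments(blocks, blocks_per_segment):
    """Sort dirty blocks by hotness and cut the sorted order into segments,
    so each segment holds blocks with similar update likelihood.
    Simplification: SFS itself uses iterative segment quantization."""
    ordered = sorted(blocks, key=hotness, reverse=True)
    return [
        [b.block_id for b in ordered[i:i + blocks_per_segment]]
        for i in range(0, len(ordered), blocks_per_segment)
    ]
```

Segments full of hot blocks tend to become almost entirely invalid before cleaning, and segments of cold blocks stay almost entirely valid, which is the bimodal utilization the paper aims for.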

Modern hardware

  • What Every Programmer Should Know About Memory (2007)

    This paper explains the structure of memory subsystems in use on modern commodity hardware, illustrating why CPU caches were developed, how they work, and what programs should do to achieve optimal performance by utilizing them.

  • What Every Systems Programmer Should Know About Concurrency (2018)

    Seasoned programmers are familiar with tools like mutexes, semaphores, and condition variables. But what makes them work? How do we write concurrent code when we can’t use them, like when we’re working below the operating system in an embedded environment, or when we can’t block due to hard time constraints? And since your system transforms your code into things you didn’t write, running in orders you never asked for, how do multithreaded programs work at all? Concurrency — especially on modern hardware — is a complicated and unintuitive topic, but let’s try to cover some fundamentals.
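What Every Programmer Should Know About Memory explains how a set-associative CPU cache locates a line. As a worked example under assumed parameters (a 32 KiB, 8-way cache with 64-byte lines, hence 32768 / (64 * 8) = 64 sets; real CPUs differ), an address splits into a tag, a set index, and a byte offset:

```python
def split_address(addr: int, line_size: int = 64, num_sets: int = 64):
    """Split an address into (tag, set index, byte offset) for a
    set-associative cache. Defaults model a hypothetical 32 KiB, 8-way
    cache with 64-byte lines."""
    offset = addr % line_size                   # byte within the cache line
    set_index = (addr // line_size) % num_sets  # which set the line maps to
    tag = addr // (line_size * num_sets)        # distinguishes lines sharing a set
    return tag, set_index, offset
```

Addresses that differ by a multiple of `line_size * num_sets` (4096 bytes here) land in the same set, which is why strided access patterns can thrash a cache even when most of it is empty.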

Storage virtualization

Storage Engine

Database Optimizer

Papers

Links

Database Transaction

Papers

  • Granularity of Locks and Degrees of Consistency in a Shared Data Base (IBM, 1975)

    The first part of this paper introduces a locking protocol that allows simultaneous locking at various granularities in a database with a hierarchical structure. The second part of this paper introduces four degrees of consistency and the relationships of the four degrees to the locking protocol.

  • The Notion of Consistency and Predicate Locks in a Database System (IBM, 1976)

    This paper proves that two-phase locking (2PL) guarantees serializability and introduces predicate locks to prevent phantom reads.

  • A Critique of ANSI SQL Isolation Levels (SIGMOD, 1995)

    This paper analyzes the ambiguities of the ANSI isolation levels and gives clearer definitions of the phenomena they prohibit. It also presents a new MVCC isolation level called snapshot isolation. A transaction in snapshot isolation reads from a snapshot of the data committed as of the transaction's start time, and it can commit only if no concurrent transaction has already committed a write to the same data (first-committer-wins).

  • Generalized Isolation Level Definitions (ICDE, 2000)

    This paper proposes implementation-independent definitions of the existing ANSI isolation levels, based on which cycles may appear in a graph of dependencies between transactions.

  • Serializable Isolation for Snapshot Databases (SIGMOD, 2008)

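    This paper presents an algorithm that makes snapshot isolation serializable by detecting rw-antidependencies between concurrent transactions at runtime and aborting a transaction that has both an incoming and an outgoing one.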

  • A Critique of Snapshot Isolation (EuroSys, 2012)

    This paper presents a new MVCC isolation level called write-snapshot isolation. A transaction in write-snapshot isolation checks for read-write conflicts instead of the write-write conflicts checked in snapshot isolation, which makes write-snapshot isolation serializable.
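The multiple-granularity protocol from Granularity of Locks and Degrees of Consistency can be written down as two tables: which lock modes are compatible on the same node, and which parent mode a transaction must hold before locking a child. This is a sketch with hypothetical helper names; it encodes the five modes (IS, IX, S, SIX, X) but not lock queues or the root-to-leaf acquisition order.

```python
IS, IX, S, SIX, X = "IS", "IX", "S", "SIX", "X"

# COMPATIBLE[held][requested]: may `requested` be granted on a node while
# another transaction holds `held` on the same node?
COMPATIBLE = {
    IS:  {IS: True,  IX: True,  S: True,  SIX: True,  X: False},
    IX:  {IS: True,  IX: True,  S: False, SIX: False, X: False},
    S:   {IS: True,  IX: False, S: True,  SIX: False, X: False},
    SIX: {IS: True,  IX: False, S: False, SIX: False, X: False},
    X:   {IS: False, IX: False, S: False, SIX: False, X: False},
}

# Modes the requester must already hold on the parent before requesting a mode on a child.
REQUIRED_PARENT = {
    IS: {IS, IX},    # reading below a node needs an intention lock on it
    S: {IS, IX},
    IX: {IX, SIX},   # writing below a node needs an intention-to-write lock on it
    SIX: {IX, SIX},
    X: {IX, SIX},
}


def can_grant(requested, held_by_others):
    """True if `requested` is compatible with every mode other transactions hold."""
    return all(COMPATIBLE[h][requested] for h in held_by_others)


def parent_ok(parent_mode, requested):
    """True if holding `parent_mode` on the parent allows requesting `requested` on a child."""
    return parent_mode in REQUIRED_PARENT[requested]
```

A transaction that updates one record locks the table in IX and the record in X, so readers of other records (IS on the table) proceed while a full-table S lock must wait.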
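Predicate locks from The Notion of Consistency and Predicate Locks in a Database System can be approximated with range locks over an integer key space. In this sketch (all names hypothetical), a scan locks its range in shared mode, an insert locks the single key it adds in exclusive mode, and each transaction obeys two-phase locking by refusing new locks once it has released any. General predicates are arbitrary conditions; ranges are a simplifying assumption.

```python
class TwoPhaseTxn:
    def __init__(self, name):
        self.name = name
        self.shrinking = False  # set once the transaction releases any lock


class PredicateLockManager:
    """Range predicate locks: two locks conflict when they belong to different
    transactions, at least one is exclusive, and their ranges overlap."""

    def __init__(self):
        self.held = []  # list of (txn, lo, hi, exclusive)

    def acquire(self, txn, lo, hi, exclusive):
        if txn.shrinking:
            raise RuntimeError("2PL violation: cannot lock after the first unlock")
        for owner, l, h, ex in self.held:
            if owner is not txn and (ex or exclusive) and l <= hi and lo <= h:
                return False  # conflict: the caller must wait or abort
        self.held.append((txn, lo, hi, exclusive))
        return True

    def release_all(self, txn):
        txn.shrinking = True  # enter the shrinking phase
        self.held = [entry for entry in self.held if entry[0] is not txn]
```

A reader holding a shared lock on keys 10 to 20 blocks another transaction from inserting key 15 until it releases, so re-running its scan cannot return a phantom row.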
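Snapshot isolation as described in A Critique of ANSI SQL Isolation Levels reads from the snapshot taken at transaction start and resolves write-write conflicts with a first-committer-wins rule. A minimal in-memory sketch, assuming a single logical clock for start and commit timestamps and buffered writes (class and method names are hypothetical):

```python
import itertools


class SnapshotStore:
    """Multi-version key-value store with snapshot isolation (sketch)."""

    def __init__(self):
        self.versions = {}               # key -> list of (commit_ts, value), ascending
        self.clock = itertools.count(1)  # shared source of start and commit timestamps

    def begin(self):
        return {"start": next(self.clock), "writes": {}}

    def read(self, txn, key):
        if key in txn["writes"]:
            return txn["writes"][key]  # read your own writes
        visible = [v for ts, v in self.versions.get(key, []) if ts <= txn["start"]]
        return visible[-1] if visible else None

    def write(self, txn, key, value):
        txn["writes"][key] = value  # buffered until commit

    def commit(self, txn):
        # First-committer-wins: abort if any written key has a version
        # committed after this transaction's snapshot was taken.
        for key in txn["writes"]:
            if any(ts > txn["start"] for ts, _ in self.versions.get(key, [])):
                return False
        commit_ts = next(self.clock)
        for key, value in txn["writes"].items():
            self.versions.setdefault(key, []).append((commit_ts, value))
        return True
```

Only overlapping writes are checked, so two transactions that each read the other's key and write disjoint keys both commit; this is the write skew anomaly that makes snapshot isolation weaker than serializability.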
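Generalized Isolation Level Definitions builds a graph whose edges are write (ww), read (wr), and anti- (rw) dependencies between transactions, and classifies isolation levels by which cycles the graph may contain. The sketch below is a simplification: it uses the classic conflict-graph construction over a single-version history in execution order, whereas Adya works over multi-version histories with explicit version orders. Function names are hypothetical.

```python
from collections import defaultdict


def serialization_graph(history):
    """history: list of (txn, op, key) in execution order, op in {"r", "w"}.
    Returns a set of edges (ti, tj, kind) with kind in {"ww", "wr", "rw"}."""
    edges = set()
    for i, (ti, op_i, key_i) in enumerate(history):
        for tj, op_j, key_j in history[i + 1:]:
            if ti != tj and key_i == key_j and "w" in (op_i, op_j):
                edges.add((ti, tj, op_i + op_j))  # "ww", "wr" or "rw"
    return edges


def has_cycle(edges):
    """Depth-first search for a cycle in the dependency graph."""
    graph = defaultdict(set)
    for ti, tj, _ in edges:
        graph[ti].add(tj)
    WHITE, GRAY, BLACK = 0, 1, 2
    color = defaultdict(int)  # every node starts WHITE

    def visit(node):
        color[node] = GRAY
        for nxt in graph[node]:
            if color[nxt] == GRAY or (color[nxt] == WHITE and visit(nxt)):
                return True
        color[node] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in list(graph))
```

For this single-version model, an acyclic graph means the history is conflict-serializable; the write skew history below produces a cycle made of two rw edges.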
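Serializable Isolation for Snapshot Databases detects a dangerous structure: a transaction with both an incoming and an outgoing rw-antidependency to concurrent transactions. The sketch keeps two per-transaction flags in the spirit of the paper's approach and aborts the transaction that becomes the pivot. It is a simplified illustration with hypothetical names, and like the original it can abort transactions that would have been serializable.

```python
class Txn:
    def __init__(self, name):
        self.name = name
        self.in_conflict = False   # a concurrent txn read data this txn overwrote
        self.out_conflict = False  # this txn read data a concurrent txn overwrote


def record_rw_antidependency(reader, writer):
    """Call when `reader` read a version that the concurrent `writer`
    overwrote (reader --rw--> writer). Returns the transaction to abort,
    or None if no pivot exists yet."""
    reader.out_conflict = True
    writer.in_conflict = True
    for t in (reader, writer):
        if t.in_conflict and t.out_conflict:
            return t  # pivot of T_in --rw--> pivot --rw--> T_out
    return None
```

In the write skew pattern each transaction reads what the other writes, so the second recorded antidependency turns one of them into a pivot and it is aborted.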
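Write-snapshot isolation from A Critique of Snapshot Isolation moves the commit-time check from write-write to read-write conflicts. A minimal sketch of that check, assuming a single logical clock and tracking only the read and write sets of keys (snapshot reads would work as in SI and are not modeled; names are hypothetical):

```python
class WriteSnapshotChecker:
    """Conflict detection for write-snapshot isolation (sketch)."""

    def __init__(self):
        self.clock = 0         # logical clock for start and commit timestamps
        self.last_commit = {}  # key -> commit timestamp of its latest writer

    def begin(self):
        self.clock += 1
        return {"start": self.clock, "reads": set(), "writes": set()}

    def read(self, txn, key):
        txn["reads"].add(key)

    def write(self, txn, key):
        txn["writes"].add(key)

    def commit(self, txn):
        if not txn["writes"]:
            return True  # read-only transactions are not checked
        # Read-write conflict: abort if a key this transaction read was
        # overwritten by a transaction that committed after it started.
        for key in txn["reads"]:
            if self.last_commit.get(key, 0) > txn["start"]:
                return False
        self.clock += 1
        for key in txn["writes"]:
            self.last_commit[key] = self.clock
        return True
```

Unlike snapshot isolation, the write skew pattern aborts here, while two blind writes to the same key both commit.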

Distributed Transaction

Papers

Books

Links

Distributed Algorithm

Theorem

Papers

Links

Consensus

Papers

  • Paxos Made Simple (Lamport, 2001)

    The Paxos algorithm, when presented in plain English, is very simple.

  • Paxos Made Live - An Engineering Perspective (PODC, 2007)

    This paper presents the experience of building Chubby, a fault-tolerant storage system using the Paxos consensus algorithm.

  • There Is More Consensus in Egalitarian Parliaments (SOSP, 2013)

    This paper presents the design and implementation of Egalitarian Paxos (EPaxos), a new distributed consensus algorithm based on Paxos that achieves uniform load balancing across all replicas.

  • Paxos Quorum Leases: Fast Reads Without Sacrificing Writes (SOCC, 2014)

    This paper presents quorum leases, a technique that allows Paxos-based systems to perform consistent local reads on multiple replicas.

  • In Search of an Understandable Consensus Algorithm (USENIX ATC, 2014)

    This paper presents Raft, a consensus algorithm for managing a replicated log. Raft produces a result equivalent to Paxos and is as efficient as Paxos, but its structure is different. This makes Raft more understandable than Paxos and also provides a better foundation for building practical systems.
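Single-decree Paxos as described in Paxos Made Simple fits in a few lines when messages are replaced by direct method calls. This sketch assumes in-memory acceptors, no failures or message loss, and proposal numbers chosen by the caller; names are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Acceptor:
    promised: int = -1          # highest proposal number promised so far
    accepted_n: int = -1        # number of the accepted proposal, -1 if none
    accepted_v: Optional[Any] = None

    def prepare(self, n):
        # Phase 1b: promise to ignore proposals numbered below n and
        # report the highest-numbered proposal already accepted.
        if n > self.promised:
            self.promised = n
            return True, self.accepted_n, self.accepted_v
        return False, None, None

    def accept(self, n, v):
        # Phase 2b: accept unless a higher-numbered prepare was promised.
        if n >= self.promised:
            self.promised = n
            self.accepted_n, self.accepted_v = n, v
            return True
        return False


def propose(acceptors, n, value):
    """One round of single-decree Paxos with proposal number n.
    Returns the chosen value, or None if this round failed."""
    majority = len(acceptors) // 2 + 1
    replies = [a.prepare(n) for a in acceptors]
    granted = [(an, av) for ok, an, av in replies if ok]
    if len(granted) < majority:
        return None
    # Must propose the value of the highest-numbered accepted proposal, if any.
    highest_n, highest_v = max(granted, key=lambda p: p[0])
    if highest_n >= 0:
        value = highest_v
    acks = sum(a.accept(n, value) for a in acceptors)
    return value if acks >= majority else None
```

The second call in a run like `propose(acc, 1, "A")` followed by `propose(acc, 2, "B")` shows the key safety property: once a value is chosen, a later proposer with a higher number is forced to re-propose it.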
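In EPaxos each command gets a sequence number and a dependency set of interfering instances, and it commits on the fast path when the replies from a fast quorum agree with the leader's attributes. The sketch computes those attributes under an assumed interference relation (same key, at least one write); the log layout and names are hypothetical, and the slow path and execution ordering are not modeled.

```python
def interferes(a, b):
    # Assumed interference relation: same key and at least one write.
    return a["key"] == b["key"] and "w" in (a["op"], b["op"])


def pre_accept_attributes(cmd, log, seq=1, deps=frozenset()):
    """Compute (seq, deps) for `cmd` against one replica's log, where `log`
    maps instance id -> {"cmd": ..., "seq": ...}. Starting from the proposed
    attributes (seq=1 and no deps at the command leader), raise seq above every
    interfering instance in the log and add those instances to deps."""
    deps = set(deps)
    for inst, entry in log.items():
        if interferes(cmd, entry["cmd"]):
            deps.add(inst)
            seq = max(seq, entry["seq"] + 1)
    return seq, frozenset(deps)


def fast_path_ok(leader_attrs, replies):
    # Fast path: commit directly if every fast-quorum reply matches the leader's attributes.
    return all(r == leader_attrs for r in replies)
```

A replica that knows about an extra interfering instance returns larger attributes, which forces the leader onto the slow path.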
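Paxos Quorum Leases combines two checks: a replica may read locally while it holds unexpired lease grants from a majority, and a write to a leased object commits only after every current lease holder has acknowledged it. This is a sketch of those two checks under stated assumptions: expiries are compared against a caller-supplied clock with no drift margin (a real implementation needs one), and all names are hypothetical.

```python
class LeaseHolder:
    """Lease state kept by a replica that wants to serve local reads (sketch)."""

    def __init__(self, n_replicas):
        self.n = n_replicas
        self.grants = {}  # granting replica id -> lease expiry time

    def record_grant(self, granter, expiry):
        self.grants[granter] = expiry

    def can_read_locally(self, now):
        # Local reads are allowed only while a majority of grants are unexpired.
        live = sum(1 for exp in self.grants.values() if exp > now)
        return live >= self.n // 2 + 1


def write_may_commit(acks, majority, lease_holders):
    """acks: set of replicas that acknowledged the write. Commit needs a
    normal majority plus an acknowledgment from every current lease holder."""
    return len(acks) >= majority and lease_holders <= acks
```

The trade-off is visible in the second function: cheaper reads at lease holders are paid for by writes that must also wait for those holders.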
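One Raft rule that is easy to get wrong is how the leader advances its commit index: it only counts replicas for entries from its current term. A sketch of that rule, assuming 1-based log indices with a placeholder at index 0 and a `match_index` list that includes the leader itself (names are hypothetical):

```python
def advance_commit_index(log_terms, match_index, current_term, commit_index):
    """Return the highest N > commit_index such that a majority of servers
    have match_index >= N and log_terms[N] == current_term, else commit_index.
    log_terms[i] is the term of log entry i (index 0 is an unused placeholder)."""
    majority = len(match_index) // 2 + 1
    for n in range(len(log_terms) - 1, commit_index, -1):
        replicated = sum(1 for m in match_index if m >= n)
        if replicated >= majority and log_terms[n] == current_term:
            return n
    return commit_index
```

With `match_index = [5, 5, 3, 2, 1]` in term 3, entry 3 is on a majority but belongs to term 2, so the commit index does not move; older-term entries are committed indirectly once a current-term entry is.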

Links

Consistency

Papers

Links

Replication

Papers

Distributed System

Papers

OLTP Database

Papers

Links

OLAP Database

Papers

Books

Miscellaneous

Papers

Books

Links

About

Awesome materials about database development.
