Avery’s Law of Distributed Systems Reliability: “Distributed systems are more reliable when you can get a service from one node OR another. They get less reliable when a service depends on one node AND another. And the numbers combine multiplicatively, so the more nodes you have, the faster it drops off.”
Lamport’s Observation: “A distributed system is one in which the failure of a computer you didn’t even know existed can render your own computer unusable.”
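Both sound simple, even obvious, for systems that serve HTTP(S) traffic or act as caches.
Apply them to clustered and distributed databases, though, and they become a practical tool for analysing expected availability: for each configuration, ask whether a read or a write can be served by one node OR another, or whether it depends on one node AND another.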
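To make the "combine multiplicatively" part of Avery's Law concrete, here is a minimal sketch of the arithmetic. It assumes node failures are independent and that each node's availability is a simple probability between 0 and 1; the function names `or_availability` and `and_availability` and the 99% figure are illustrative choices, not something taken from either quote.

```python
from math import prod


def or_availability(availabilities):
    """Service works if at least one node is up (one node OR another).

    Assumes independent failures: the service is down only when every
    node is down at the same time.
    """
    return 1 - prod(1 - a for a in availabilities)


def and_availability(availabilities):
    """Service works only if every node is up (one node AND another).

    Assumes independent failures: the individual availabilities multiply.
    """
    return prod(availabilities)


if __name__ == "__main__":
    # Hypothetical figure: every node is up 99% of the time.
    for count in (1, 2, 3, 5):
        nodes = [0.99] * count
        print(
            f"{count} node(s): "
            f"OR = {or_availability(nodes):.6f}, "
            f"AND = {and_availability(nodes):.6f}"
        )
```

Under these assumptions, two 99% nodes give roughly 0.9999 when either one can serve the request, but only about 0.9801 when both are required, and the AND figure keeps dropping as nodes are added.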
Topology | Nodes | Database | Configuration | Reads during single-node failure | Writes without a failure | Writes during single-node failure |
---|---|---|---|---|---|---|
WAN | 2 | MySQL | Master-Slave Async Replication | OR | Single Master | NA |
WAN | 2 | MySQL | Master-Master (standby) Async Replication | OR | OR (Single Active) | OR (Single Active) |
LAN | 2 | MySQL | Master-Slave Semi-sync Replication | OR | Single Master | NA |
LAN | 2/4 | MySQL CGE NDB Cluster | Sync Replication | OR | OR | NA |
LAN | 3 (Min.) | MySQL Galera | Sync Replication | OR | OR | Requires quorum for OR |
WAN | 3 (Min.) | Cassandra | Appropriate RF and CL | OR | OR | OR |
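The "requires quorum" entry sits between pure OR and pure AND: writes continue as long as a majority of nodes is reachable. The sketch below estimates that case, assuming independent failures, identical per-node availability, and a simple majority quorum; `quorum_availability` is a hypothetical helper for illustration and does not model any specific product's membership or weighting rules.

```python
from math import comb


def quorum_availability(node_count, node_availability):
    """Probability that a strict majority of nodes is up.

    Assumes independent failures and the same availability for every
    node, so the number of live nodes follows a binomial distribution.
    """
    quorum = node_count // 2 + 1  # smallest strict majority
    p = node_availability
    return sum(
        comb(node_count, up) * p**up * (1 - p) ** (node_count - up)
        for up in range(quorum, node_count + 1)
    )


if __name__ == "__main__":
    # Hypothetical figure: every node is up 99% of the time.
    for count in (2, 3, 5):
        print(f"{count} nodes: {quorum_availability(count, 0.99):.6f}")
```

With two nodes a majority means both, which is the AND case again (about 0.9801 at 99% per node). With three nodes the cluster tolerates one failure and the figure rises to about 0.9997, which is one way to read the three-node minimum in the table.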
acm.org: Turing Award to Leslie Lamport