Why Majority Rules — And Why That’s a Problem
Raft keeps a replicated log consistent across a cluster by requiring a quorum — a majority of nodes — to agree before any entry is committed. In a five-node cluster, three nodes must respond. In a three-node cluster, two must. Drop below that threshold and the system stops accepting writes entirely. That is not a bug; it is the guarantee. Without a majority, Raft cannot be certain that two different leaders won’t commit conflicting entries, so it refuses to commit anything at all.
The problem is that real infrastructure does not fail one node at a time. Cloud providers schedule maintenance windows that take down entire availability zones. Rolling upgrades pull nodes offline in sequence, and if the upgrade stalls mid-way, the cluster sits in a degraded state waiting for humans to intervene. Network partitions split clusters along unpredictable boundaries. In any of these scenarios, a five-node cluster can lose two nodes simultaneously — perfectly survivable on paper — but if the timing is bad and a third node is already rebooting, writes stop. Production systems running etcd, CockroachDB, or TiKV all face this ceiling.
The elegance of Raft is what made it famous. Diego Ongaro designed it explicitly to be understandable, publishing it as a direct response to the opacity of Paxos. That clarity made Raft the consensus algorithm of choice for a generation of distributed databases and coordination services. But most explanations of Raft celebrate the clean decomposition into leader election, log replication, and safety without dwelling on what happens at the edges of that safety guarantee.
The majority requirement is not a tunable parameter. It is load-bearing. Shrink the quorum and you risk split-brain — two leaders, two diverging logs, corrupted state. The engineering community largely accepted this trade-off and moved on, treating quorum loss as an operational problem to be solved with better hardware redundancy or faster node recovery. That assumption held until someone asked whether the math behind a card game offered a different way to think about overlap guarantees entirely.
The Wacky Modification: Running Raft on a Minority
Raft has one non-negotiable rule: a cluster of n nodes requires at least ⌊n/2⌋ + 1 nodes to agree before committing anything. That majority quorum is the entire basis for Raft’s safety guarantee. Break it, and you risk two leaders writing conflicting entries to what is supposed to be a single consistent log. Systems like etcd, CockroachDB, and TiKV all rely on this assumption baked deep into the protocol.
The modification described here flips that assumption on its head. A cluster can make progress even when the active nodes fall below majority — no waiting for downed machines to recover, no halting the system until quorum is restored. This is the core claim, and it sounds immediately dangerous.
The safety net is a precise mathematical constraint on which nodes are active. Not every minority subset qualifies. The specific nodes that are online must satisfy an intersection property borrowed from combinatorial design theory — the same math that makes the card game Spot It! (also sold as Dobble) work. In Spot It!, every pair of cards shares exactly one symbol, guaranteed by a finite projective plane construction. That guaranteed overlap is what the modified Raft protocol demands from its active node subsets: any two valid minority quorums must share at least one node, preserving the overlap that prevents conflicting commits.
The authors are direct about the scope. They call the modification “wacky” — their word — and frame it as an experimental extension rather than a replacement for standard Raft. The core protocol machinery for leader election, log replication, and safety remains intact. What changes is the quorum definition, scoped tightly enough that the safety argument still holds when the mathematical constraint is satisfied.
This is not a production specification. It is a proof-of-concept showing that the majority rule is not the only path to consensus safety, provided the system can verify that the active minority meets the required structural conditions. Whether operators can reliably enforce those conditions in a real deployment is a separate question the experiment deliberately leaves open.
Enter Spot It!: The Card Game Behind the Math
Spot It! — sold as Dobble across Europe — is a card game built on a single, elegant constraint: any two cards in the deck share exactly one symbol, no more, no less. The deck ships with 55 cards, each printed with 8 symbols drawn from a pool of 57, and the geometry holding it together is a projective plane of order 7. In a projective plane, every pair of lines intersects at exactly one point. Map symbols to points and cards to lines, and the game’s entire structure falls out of that one mathematical fact.
That same structure is what makes the modified Raft protocol tick. In standard Raft, a quorum requires a strict majority of nodes — in a five-node cluster, three nodes must agree before any decision is final. The whole point of that majority rule is to guarantee overlap: any two majorities share at least one node, so the new leader always inherits the previous term’s committed entries. The modification asks what happens if you allow minority groups to act as quorums, provided those groups are chosen carefully. The answer is that they can, as long as any two valid minority sets are guaranteed to share at least one node. That is the Spot It! constraint, translated directly into distributed systems.
The insight didn’t originate in consensus protocol research. It came from recreational mathematics — specifically from the well-documented combinatorics behind a children’s card game. An engineer working on quorum design recognized that the “exactly one intersection” property solving a party game also solves the safety requirement at the heart of log replication. Cross-domain thinking, not a deeper reading of the Raft paper, opened the door.
Most technical coverage of distributed systems reaches for Byzantine fault tolerance or Paxos variants when explaining protocol innovation. The Spot It! connection cuts in from a completely different direction, and that is precisely why it works. Projective planes were not invented for computer science. They were studied for their own geometric beauty, and a card game made them famous at kitchen tables. The fact that the same combinatorics can keep a distributed log consistent with fewer active nodes than the rulebook demands is the kind of result that only appears when someone is paying attention to mathematics outside their own field.
How the Constraint Actually Works in Practice
The safety guarantee hinges on a single geometric rule: every two valid quorum subsets must share at least one node. This overlap requirement prevents split-brain. If two groups of nodes could each independently elect a leader without sharing a member, those leaders could diverge and write conflicting log entries with no node aware of both. The intersection property makes that scenario structurally impossible — any two quorums that could each reach a leadership decision must contain at least one common node that would resolve the conflict.
The projective plane construction translates that abstract rule into a concrete build list. Take a projective plane of order n, where n must be a prime power (2, 3, 4, 5, 7, and so on). Each point in the plane becomes a node; each line becomes a legal quorum subset. A projective plane of order n produces n² + n + 1 total nodes and guarantees that any two lines share exactly one point — which is precisely the overlap property you need. Order 2 gives you a 7-node cluster where valid quorums contain 3 nodes. Order 3 gives you a 13-node cluster with 4-node quorums. The math scales predictably, but only at those specific cluster sizes. A 10-node or 11-node cluster has no valid projective-plane construction, so the technique simply does not apply.
The cost of operating at a minority is pre-planned structure. Standard Raft lets any majority of nodes form a quorum dynamically — operators add or remove nodes and the protocol adapts. Minority quorums require operators to define upfront which specific subsets are permitted to elect a leader. An unapproved combination of nodes, even a healthy majority, cannot make progress if it does not appear on the pre-authorized list. That rigidity demands careful capacity planning and complicates failure recovery. If the wrong set of nodes goes down and no approved quorum remains reachable, the cluster stalls regardless of how many individual nodes are still running. Engineers adopting this approach trade operational simplicity for the ability to survive on fewer active participants — a worthwhile exchange in some environments, and a maintenance burden in others.
What This Really Means for Distributed Systems Builders
The Spot It! modification to Raft forces a concrete engineering choice: swap a simple, universal rule — majority quorum — for a structured minority arrangement that demands careful upfront planning about which nodes exist and how they map to projective plane geometry. That tradeoff pays off in high-availability environments where node loss is frequent and predictable, such as rolling deployments or hardware fleets with known failure rates. The operational complexity is real, but so is the availability gain.
The idea also surfaces a question the distributed systems field has largely left sitting: should consensus protocols like Raft, Paxos, or etcd ship with pluggable quorum strategies rather than hard-coded majority rules? Right now, the majority assumption is baked deep into these protocols. Engineers who want different behavior rewrite the protocol or work around it. A pluggable quorum interface — where majority is the default but structured minority configurations are available — would let teams match quorum logic to their actual failure models without forking core infrastructure.
The most compelling near-term applications are geo-distributed clusters and edge computing deployments. In a cluster spread across three continents or running on IoT hardware at the network edge, getting a majority of nodes to respond within a useful latency window can be physically impossible — the speed of light is not a configuration option. A minority quorum strategy that guarantees overlap through structure, rather than through raw node count, directly addresses that constraint.
The approach is not production-ready as described. The constraints on which minority subsets are valid are strict, the operational tooling does not yet exist, and failure modes outside the valid configurations need deeper formal analysis. But the direction is clear enough to point future work.
The broader takeaway cuts across the entire field. The mathematical foundation for this Raft modification came from a children’s card game sold in toy stores. That is not a curiosity — it is a signal. The next meaningful advance in fault-tolerance design may be waiting in a puzzle, a combinatorics textbook, or a board game mechanic that no one has thought to apply yet. Engineers who only read engineering papers will miss it.