High Availability Distributed Services

Quorum and Gerrymandering: Thoughts on managing failover in a distributed HA system


Defining Quorum

Quorum is the minimum number of voters needed to carry an election or vote. We want all services to keep functioning under a range of failure modes. To do that, our high availability systems need access to valid, authoritative data at all times.
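As a rough illustration, and assuming the simple-majority rule that Raft uses (the definition above does not commit to a specific rule), the quorum size and the number of voter failures a cluster can absorb might be computed like this. The function names are hypothetical and chosen only for this sketch.

```python
def quorum_size(voters: int) -> int:
    """Smallest number of voters that forms a strict majority (assumed rule)."""
    if voters < 1:
        raise ValueError("a cluster needs at least one voter")
    return voters // 2 + 1


def tolerated_failures(voters: int) -> int:
    """How many voters can be lost while a majority can still be formed."""
    return voters - quorum_size(voters)


if __name__ == "__main__":
    # Print cluster size, quorum size, and tolerated failures side by side.
    for n in (3, 4, 5):
        print(n, quorum_size(n), tolerated_failures(n))
```

Under this assumed rule, a four-voter cluster tolerates no more failures than a three-voter one, which is why odd cluster sizes are the usual choice.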

The Problem

We have found some failure cases where everything still works except the Raft-based HA services. In those cases, too few members of the electorate are available to reach consensus.
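A minimal sketch of that situation, assuming a strict-majority rule and a three-voter cluster: the node names and the `has_quorum` helper are hypothetical, not taken from any particular Raft implementation.

```python
def has_quorum(voters: set[str], reachable: set[str]) -> bool:
    """True if the reachable voters form a strict majority of all voters."""
    alive = voters & reachable  # ignore reachable nodes that are not voters
    return len(alive) > len(voters) // 2


# Hypothetical three-voter electorate.
voters = {"node-a", "node-b", "node-c"}

# One voter lost: the remaining two can still elect a leader.
two_left = has_quorum(voters, {"node-a", "node-b"})

# Two voters lost: node-a may still run every other service,
# but on its own it cannot form a majority.
one_left = has_quorum(voters, {"node-a"})

if __name__ == "__main__":
    print(two_left, one_left)
```

The second case is the one described above: the surviving node is healthy, yet the consensus-backed services stall because it cannot win a vote alone.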