Est. Reading time:
[WIP]: This post is a work in progress as of now. [TODO]: update date
FLP - Impossibility of Distributed Consensus with One Faulty Process
In this post I go over the FLP paper in the most simplified way possible, this post is in the making for weeks. I've tried my best to not let any proof be just "assumed" albeit I have failed in some places. This post is the result of me going through the free version of the paper multiple times and other online resources which I link down in the last section of this post. In the same section I've also linked my annotated version of the free version of the paper.
DISCLAIMER: I'm no expert wrt Distributed systems or theory of the same. FLP paper proved to quite difficult for me to understand, where proof of
lemmas
in the paper felt like smoke and mirrors at most times. This post serves its purpose to only help you gain an intuition into the paper.
Preface:
Let's first begin by defining what this paper aims to prove and on what existing solution does this prove this fault
.
- The problem we consider here is for
reliable
processes to accept on a binary value. We. - This problem arises if the problem on accepting on a binary value is done entirely asynchronously. [This is a very critical and must condition for this proof]. These nodes do not have access to a synchronised clocks, hence algorithms like paxos are out of the picture. To quote AI :
The Paxos algorithm uses clocks to ensure liveliness and to avoid the impossibility of reaching consensus with a single faulty process.
- We do not consider byzantine faults in the nodes. All nodes behave as they should.
With the above restriction, the faults are now only possible in a state where our algorithm cannot decide one on value consistently.
Network
The nodes are connected by a network, we assume this network to be reliable but not timely
. By that we mean that a message will be delivered as long as there are infinite attempts to receive the message. The message will be delivered and exactly once, but we have no promises on when the message will be delivered.
Defining Consensus
In this section let's define nodes and different components that are part of our consensus algorithm.
Nodes :
- Every node starts with a initial value of {0,1} (Read as zero or one)
- Every node is designed like an automata. In one step, a process (nodes are often called process throughout this paper) can :
- attempt to receive message
- perform local compute based on the value it received (If received that is)
- send a finite set of messages to other processes.
System :
- Each node starts with an internal state.
- The start state of a systems is each node with its internal state and an empty message buffer.
- Each state can transition from one to another where only one process (node) can make some progress
- These states are called
Configurations
. - There are two steps, first one being receive(p) where m belongs M union {phi}
- Depending on p's internal state and m from the message buffer
- These states are called