Here's an explanation of the Paxos algorithm and how it fits our scenario.
Basic Paxos is a protocol and we use an instance of this protocol to come to some sort of
consensus about which node a group id should be bound to. This means that we would create
an instance of the protocol for each group id we need a consensus on.
The protocol will have proposers and acceptors and is broken up into 2 phases,
'prepare and promise' and 'accept and accepted. In our case each node will be
both proposers and acceptors.
1st Phase
When a message arrives with a group id set the node receiving the message becomes a
proposer.
The proposer will send a 'Prepare(N)' message to all the acceptors carrying the
queue it proposes to use for the groupid. The N value is a sequence number for that
proposer that is unique across the cluster, i.e. each proposer must choose a sequence
number from its own disjointed set. This is used when deciding which proposer should take
preference, If an acceptor receives a proposal with a sequence number lower than one it
has already receives then it declines that proposal, if it is the highest then the
acceptor will return a 'promise' that it will not receive any proposals with a
lower sequence number, the 'promise' also contains any previously chosen proposals
(queues). The proposer with the highest sequence number we refer to as the leader. A
promise will contain the sequence number promised, plus all other sequence numbers
promised until that point.
At this point we move into phase 2 where the value is committed, i.e. chosen. But before i
do you may have spotted an issue. If the leader suddenly fails then we are stuck waiting
for a value to be chosen. Because of this it is possible for a proposer after failing to
re propose with a higher sequence number (it knows the current highest sequence number as
it was returned via the previous declined proposal). This however can lead to a situation
where a consensus may have duelling proposers, i.e. p1 proposes with seq1, p2 with seq2,
p1 with seq3 and so on. Typically for this you make 1 proposer back off before retrying to
give another proposer the time to commit.
2nd phase
At this point any proposers that have received a majority of promises back from the
acceptors can choose a value and commit it by sending an accept message. Note that because
proposals may be received by different acceptors in different orders this could happen.
The accept contains the proposers sequence number and the value chosen. The acceptor will
accept if the value chosen is the same as received in the initial proposal and the
sequence number is the highest agrred to and send an 'accepted' message to the
proposer.
If a proposer receives enough accepts to a commit then it deems the value chosen. If it
doesn't then the proposer can start again.
There are also 'learners' that basically are just notified of any values chosen. A
client would normally create a proposer but receive the value as a learner. we may not
need to do this, alternatively we could just receive chosen values back when a proposal
fails.
Multi Paxos
This is a variant of Paxos. basically the same instance is used for all consensus
decisions, an extra param is sent round containing the instance of the protocol (the
groupid we are deciding on). This means that you only need to send a prepare message on
the first round. Once a leader is chosen the prepare stage is not needed and you cut down
on messages sent. For this to work the leader has to remain stable which typically in our
scenarion wouldnt happen, i.e. we want the local node receiving the message always to
propose its local queue. however, if we merge the roles of acceptor and proposer the
leader will be able to commit on behalf of the initial proposer. i.e. a proposer proposes
a new instance of the protocol for a new groupid and sends to all acceptors. The leader
will receive this and call accept on its behalf.
Using a single node
Obviously implementing the above is quite complex. A simple solution that Tim suggested is
to assign a specific node as a leader. Basically any other node will always ask the leader
to make the decision as to which value to choose, the leader holds all the values in a map
for future reference. we could do this via some sort of configuration on the server.
thoughts, obviously implementing the latter will be much quicker but less elegant.
PS if it hurts your head, try reading the spec :)
View the original post :
http://www.jboss.org/index.html?module=bb&op=viewtopic&p=4241458#...
Reply to the post :
http://www.jboss.org/index.html?module=bb&op=posting&mode=reply&a...