[jboss-dev-forums] [Design of Messaging on JBoss (Messaging/JBoss)] - Re: Good article on consistent hashing
ataylor
do-not-reply at jboss.com
Thu Jul 2 07:55:29 EDT 2009
Here's an explanation of the Paxos algorithm and how it fits our scenario.
Basic Paxos is a protocol and we use an instance of this protocol to come to some sort of consensus about which node a group id should be bound to. This means that we would create an instance of the protocol for each group id we need a consensus on.
The protocol will have proposers and acceptors and is broken up into 2 phases, 'prepare and promise' and 'accept and accepted. In our case each node will be both proposers and acceptors.
1st Phase
When a message arrives with a group id set the node receiving the message becomes a proposer.
The proposer will send a 'Prepare(N)' message to all the acceptors carrying the queue it proposes to use for the groupid. The N value is a sequence number for that proposer that is unique across the cluster, i.e. each proposer must choose a sequence number from its own disjointed set. This is used when deciding which proposer should take preference, If an acceptor receives a proposal with a sequence number lower than one it has already receives then it declines that proposal, if it is the highest then the acceptor will return a 'promise' that it will not receive any proposals with a lower sequence number, the 'promise' also contains any previously chosen proposals (queues). The proposer with the highest sequence number we refer to as the leader. A promise will contain the sequence number promised, plus all other sequence numbers promised until that point.
At this point we move into phase 2 where the value is committed, i.e. chosen. But before i do you may have spotted an issue. If the leader suddenly fails then we are stuck waiting for a value to be chosen. Because of this it is possible for a proposer after failing to re propose with a higher sequence number (it knows the current highest sequence number as it was returned via the previous declined proposal). This however can lead to a situation where a consensus may have duelling proposers, i.e. p1 proposes with seq1, p2 with seq2, p1 with seq3 and so on. Typically for this you make 1 proposer back off before retrying to give another proposer the time to commit.
2nd phase
At this point any proposers that have received a majority of promises back from the acceptors can choose a value and commit it by sending an accept message. Note that because proposals may be received by different acceptors in different orders this could happen.
The accept contains the proposers sequence number and the value chosen. The acceptor will accept if the value chosen is the same as received in the initial proposal and the sequence number is the highest agrred to and send an 'accepted' message to the proposer.
If a proposer receives enough accepts to a commit then it deems the value chosen. If it doesn't then the proposer can start again.
There are also 'learners' that basically are just notified of any values chosen. A client would normally create a proposer but receive the value as a learner. we may not need to do this, alternatively we could just receive chosen values back when a proposal fails.
Multi Paxos
This is a variant of Paxos. basically the same instance is used for all consensus decisions, an extra param is sent round containing the instance of the protocol (the groupid we are deciding on). This means that you only need to send a prepare message on the first round. Once a leader is chosen the prepare stage is not needed and you cut down on messages sent. For this to work the leader has to remain stable which typically in our scenarion wouldnt happen, i.e. we want the local node receiving the message always to propose its local queue. however, if we merge the roles of acceptor and proposer the leader will be able to commit on behalf of the initial proposer. i.e. a proposer proposes a new instance of the protocol for a new groupid and sends to all acceptors. The leader will receive this and call accept on its behalf.
Using a single node
Obviously implementing the above is quite complex. A simple solution that Tim suggested is to assign a specific node as a leader. Basically any other node will always ask the leader to make the decision as to which value to choose, the leader holds all the values in a map for future reference. we could do this via some sort of configuration on the server.
thoughts, obviously implementing the latter will be much quicker but less elegant.
PS if it hurts your head, try reading the spec :)
View the original post : http://www.jboss.org/index.html?module=bb&op=viewtopic&p=4241458#4241458
Reply to the post : http://www.jboss.org/index.html?module=bb&op=posting&mode=reply&p=4241458
More information about the jboss-dev-forums
mailing list