[jboss-user] [JBoss Cache] Document updated/added: "JBoss Cache Non-Blocking State Transfer"

Manik Surtani do-not-reply at jboss.com
Mon Feb 22 10:30:25 EST 2010


User development,

The document "JBoss Cache Non-Blocking State Transfer", was updated Feb 22, 2010
by Manik Surtani.

To view the document, visit:
http://community.jboss.org/docs/DOC-10275#cf

Document:
--------------------------------------------------------------
h2. Non-Blocking State Transfer
* Please use http://www.jboss.com/index.html?module=bb&op=viewtopic&p=4112547#4112547 to discuss ideas/enhancements and provide comments and feedback.
* JIRA: http://jira.jboss.org/jira/browse/JBCACHE-1236
 
h3.  What we have now
 
Right now, JBoss Cache uses the JGroups FLUSH protocol (http://www.jgroups.org/javagroupsnew/docs/manual/html/protlist.html#d0e3916) to make sure any in-flight messages are received and to prevent any further messages from being broadcast, so that a stable state can be transferred.
 
While this provides a high level of data consistency, it is a brute force method that will not scale when there is a large amount of state to be transferred, since it effectively +freezes+ the cluster during the state transfer period.
 
h3.  An alternative - *Non-Blocking State Transfer*
 
The alternative discussed here attempts to:
 
* provide state to a new node joining a cluster or a partition quickly and effectively
* provide consistency guarantees
* not hold up the cluster at all so the rest of the cluster can proceed as usual even while state is being transferred
 
This new approach would need MVCC locking to be implemented first, since non-blocking reads are necessary.  It also assumes that cache updates are idempotent, provided they are applied in the correct order.
 
h4. Assumptions
1. Non-blocking reads are available (MVCC)
2. Modifications are idempotent
3. Streaming state transfer is present in the JGroups stack (to provide an open stream between the two instances)
h4.  Approach
Assume a 3-instance cluster, containing instances +A+, +B+, and +C+.  Instance +D+ joins the cluster.
 
+All nodes track pending prepares from startup.  This additional overhead means that whenever a transaction enters its prepare phase it is recorded in a concurrent collection, and when the transaction commits or rolls back it is removed from that collection.+
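
A minimal sketch of that bookkeeping is below, assuming a hypothetical PendingPrepareTracker invoked from the prepare and commit/rollback handling; the GlobalTransaction and WriteCommand types are the ones used in the LogEntry structure further down.

   // Illustrative only: PendingPrepareTracker and its call sites are hypothetical,
   // not part of the actual JBoss Cache API.
   import java.util.List;
   import java.util.concurrent.ConcurrentHashMap;
   import java.util.concurrent.ConcurrentMap;

   public class PendingPrepareTracker
   {
      // gtx -> modifications of transactions that have prepared but not yet completed
      private final ConcurrentMap<GlobalTransaction, List<WriteCommand>> pendingPrepares =
            new ConcurrentHashMap<GlobalTransaction, List<WriteCommand>>();

      // called whenever a transaction enters its prepare phase
      public void onPrepare(GlobalTransaction gtx, List<WriteCommand> modifications)
      {
         pendingPrepares.put(gtx, modifications);
      }

      // called when the transaction commits or rolls back
      public void onComplete(GlobalTransaction gtx)
      {
         pendingPrepares.remove(gtx);
      }
   }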
 
1. D asks A for state and starts responding positively to all one-phase and two-phase prepares/commits, but doesn't log any transactions.
2. A starts logging all transactions and non-transactional writes
3. A starts sending transient and persistent state to D.  This does not block on anything.
4. D applies state.
5. A starts sending the transaction log to D
6. A continues to stream the transaction log to D until the log is either empty or progress is no longer being made.
7. Lack of progress occurs when the log size is repeatedly not reduced after successive writes.
8. A waits for pending incoming and outgoing requests to complete and suspends new ones
9. A sends a marker indicating the need to stop all modifications on A and D.
10. D receives the marker and unicasts a StateTransferControlCommand to A.
10.1. On receipt of this command, A closes a latch that prevents its RPC dispatcher from sending or receiving any further commands (a sketch of this latch follows the list below).
10.2. D closes a similar latch on its own RPC dispatcher.
10.3. Note that these latches do NOT block StateTransferControlCommands.
11. These latches guarantee that transactions originating at B or C will block in their communications to A or D until the latches are released.
12. D retrieves and applies the final transaction log, which should no longer be growing.
13. D retrieves and applies all non-committed prepares.
14. A sends a marker indicating transmission is complete.
15. A resumes processing of incoming / outgoing requests.
16. D unicasts another StateTransferControlCommand to A.
16.1. This releases the latches on A.
16.2. D also releases its own latches.
17. D sets its cache status to *STARTED*.
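
Below is a minimal sketch of the latch behaviour described in steps 10.1 to 10.3 and 16.1 to 16.2.  The gate class and its beforeDispatch/afterDispatch hooks are hypothetical, not the actual RPC dispatcher API; only StateTransferControlCommands (proposed above) would bypass it.

   // Illustrative only: the gate and its hook methods are hypothetical.
   public class RpcDispatcherGate
   {
      private boolean closed = false;
      private int inFlight = 0;

      // ordinary commands pass through here; StateTransferControlCommands skip the gate
      public synchronized void beforeDispatch() throws InterruptedException
      {
         while (closed) wait();
         inFlight++;
      }

      public synchronized void afterDispatch()
      {
         inFlight--;
         notifyAll();
      }

      // close the gate: suspend new commands and wait for in-flight ones to drain
      public synchronized void close() throws InterruptedException
      {
         closed = true;
         while (inFlight > 0) wait();
      }

      // re-open the gate once the final state transfer marker has been processed
      public synchronized void reopen()
      {
         closed = false;
         notifyAll();
      }
   }
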
h4.  Transaction Log
 
This is a data structure that will contain an ordered list of entries of the form:
 
   public static class LogEntry
   {
      private final GlobalTransaction transaction;      // gtx under which the writes were made
      private final List<WriteCommand> modifications;   // ordered writes belonging to that gtx
      public LogEntry(GlobalTransaction transaction, List<WriteCommand> modifications)
      {
         this.transaction = transaction;
         this.modifications = modifications;
      }
   }

 
The receiving node will apply this log by starting a transaction using the given gtx, applying the modifications, and committing the transaction.
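
A minimal sketch of that replay loop on D is below, under the assumption that it lives alongside LogEntry (so the private fields are directly accessible) and that startTransactionFor, applyInContextOf and commit are hypothetical stand-ins for the cache's transaction and command machinery.

   // Illustrative only: startTransactionFor, applyInContextOf and commit are
   // hypothetical helpers, not the actual JBoss Cache API.
   void applyTransactionLog(List<LogEntry> log) throws Exception
   {
      for (LogEntry entry : log)                        // strictly in the order A recorded it
      {
         startTransactionFor(entry.transaction);        // re-use the originating gtx
         for (WriteCommand mod : entry.modifications)
            applyInContextOf(entry.transaction, mod);   // re-apply each write in order
         commit(entry.transaction);                     // complete the transaction locally
      }
   }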
 
h4.  Capturing the transaction log
 
It is imperative that the transaction log is captured in the order in which locks are acquired and transactions complete.  As such, in the Synchronization on the state sender (A), the transaction is added to the log in afterCompletion().  In addition, all non-committed prepares must be kept in a table indexed by gtx; once the gtx has completed, it is removed from the table.
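
A minimal sketch of such a Synchronization on the state sender (A) follows.  The txLog and pendingPrepares collections are hypothetical fields; only the javax.transaction types are the real API.

   // Illustrative only: the txLog and pendingPrepares collections are hypothetical.
   import java.util.List;
   import java.util.Map;
   import java.util.Queue;
   import javax.transaction.Status;
   import javax.transaction.Synchronization;

   public class StateTransferSynchronization implements Synchronization
   {
      private final GlobalTransaction gtx;
      private final List<WriteCommand> modifications;
      private final Queue<LogEntry> txLog;               // ordered log streamed to D
      private final Map<GlobalTransaction, List<WriteCommand>> pendingPrepares;

      public StateTransferSynchronization(GlobalTransaction gtx, List<WriteCommand> modifications,
                                          Queue<LogEntry> txLog,
                                          Map<GlobalTransaction, List<WriteCommand>> pendingPrepares)
      {
         this.gtx = gtx;
         this.modifications = modifications;
         this.txLog = txLog;
         this.pendingPrepares = pendingPrepares;
      }

      public void beforeCompletion()
      {
         // nothing to do here; ordering is established in afterCompletion()
      }

      public void afterCompletion(int status)
      {
         // append to the log only once the outcome is known, preserving completion order
         if (status == Status.STATUS_COMMITTED)
            txLog.add(new LogEntry(gtx, modifications));
         pendingPrepares.remove(gtx);                    // the gtx has completed either way
      }
   }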
 
h3.  Idempotency
 
Idempotency is a requirement since the state read by D may or may not already include a given update.  As such, all transactions recorded during the state generation process have to be re-applied.  Still, this isn't a problem - even with node deletions, creations or moves - provided the transaction log is replayed in +exactly+ the same order as it was applied on the node generating the state.
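
As a toy illustration of why the ordering matters (a plain HashMap standing in for the cache, with made-up keys and values):

   // Toy illustration only: a plain HashMap stands in for the cache.
   // Reordering the same two logged writes changes the outcome.
   import java.util.HashMap;
   import java.util.Map;

   public class ReplayOrderExample
   {
      public static void main(String[] args)
      {
         Map<String, String> inOrder = new HashMap<String, String>();
         inOrder.put("/a/b", "v1");       // logged first on A
         inOrder.remove("/a/b");          // logged second on A
         // replayed in the original order, /a/b ends up absent, matching A

         Map<String, String> reordered = new HashMap<String, String>();
         reordered.remove("/a/b");        // the same writes replayed out of order
         reordered.put("/a/b", "v1");
         // /a/b ends up present, so D's state would diverge from A's

         System.out.println(inOrder);     // {}
         System.out.println(reordered);   // {/a/b=v1}
      }
   }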
 
h4.  Benefits
 
* Cluster continues operations, and is not held up
 
h4.  Drawbacks
 
* D may take longer to join as it would need to replay a transaction log after acquiring state
 
h4.  Assumptions/Requirements
 
* MVCC is in place to provide efficient non-blocking READ on the state provider
* Cache updates are idempotent

--------------------------------------------------------------




More information about the jboss-user mailing list