On 3/15/12 11:29 AM, Dan Berindei wrote:
That was basically what we did in the blocking design: the ST commands
could execute during ST, but regular commands would block until the
end of the ST. With async caches, that meant we would use JGroups' one
queue per sender (so not a global queue, but close).
The problem was not with the regular commands that arrived after the
start of the ST, but with the commands that had already started
executing when ST started. This is the classic example:
1. A prepare command for Tx1 locks k1 on node A
2. A prepare command for Tx2 tries to acquire lock k1 on node A
3. State transfer starts up and blocks all write commands
4. The Tx1 commit command, which will unlock k1, arrives but can't run
until state transfer has ended
5. The Tx2 prepare command times out on the lock acquisition after 10
seconds (by default)
6. State transfer can now proceed and push or receive data.
7. The Tx1 commit can now run and unlock k1. It's too late for Tx2, however.
The solution I had in mind for the old design was to add some kind of
deadlock detection to the LockManager and throw a
StateTransferInProgress when a deadlock with the state transfer is
detected.
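
A minimal sketch of that fail-fast idea, not the actual Infinispan
LockManager. The class name, the exception class and the simple
"any contended lock during ST counts as a deadlock" rule are all
assumptions made for illustration. A waiter that finds the key held by
another transaction while state transfer is in progress gives up
immediately instead of sitting out the full lock timeout:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; not Infinispan's real LockManager API.
class StateTransferAwareLocks {

    /** Hypothetical exception, named after the one proposed above. */
    static class StateTransferInProgressException extends RuntimeException {
        StateTransferInProgressException(String message) {
            super(message);
        }
    }

    private final Map<String, String> owners = new HashMap<>(); // key -> owning tx
    private boolean stateTransferInProgress;

    synchronized void setStateTransferInProgress(boolean inProgress) {
        stateTransferInProgress = inProgress;
        notifyAll(); // wake waiters so they can re-check for a deadlock
    }

    /** Returns true if the lock was acquired, false on timeout or interruption. */
    synchronized boolean lock(String key, String tx, long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (true) {
            String owner = owners.get(key);
            if (owner == null || owner.equals(tx)) {
                owners.put(key, tx);
                return true;
            }
            // Assumption: while ST is running, the owner's commit is blocked,
            // so waiting for the key cannot succeed before ST ends.
            if (stateTransferInProgress) {
                throw new StateTransferInProgressException(
                        tx + " cannot lock " + key + " held by " + owner);
            }
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) {
                return false;
            }
            try {
                wait(remaining);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    synchronized void unlock(String key, String tx) {
        if (tx.equals(owners.get(key))) {
            owners.remove(key);
            notifyAll();
        }
    }
}
```

In the example above, Tx2 would get the exception at step 3 rather
than timing out at step 5, so ST would not have to wait for the lock
timeout, and the caller could retry Tx2 once ST has finished.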
OK. I don't like the old design, as ST has to wait until all pending TXs
(those with locks held) have committed before it can make progress. If
the lock acquisition timeout is high, we'll have to wait for a long time.
With the new design I thought it would be simpler to not acquire a big
lock for the entire duration of the write command that would prevent
state transfer. Instead I would acquire different locks for much
shorter amounts of time, and at the beginning of each lock acquisition
we would just check that the command's view id is still the correct
one.
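
A rough sketch of what such a check could look like. All names here
are hypothetical, and having the view installation take the same short
lock is an assumption of this sketch, not something taken from the
design document. Each step of a command holds the lock only briefly
and is rejected if the view changed since the command was created:

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

// Hypothetical sketch of "check the view id at each lock acquisition".
class ViewCheckedLocking {

    /** Hypothetical exception for commands created in an older view. */
    static class StaleViewException extends RuntimeException {
        StaleViewException(int commandViewId, int currentViewId) {
            super("command view " + commandViewId
                    + " != current view " + currentViewId);
        }
    }

    private final ReentrantLock lock = new ReentrantLock();
    private int currentViewId = 1; // guarded by lock

    /** Called when state transfer installs a new view. */
    void installView(int newViewId) {
        lock.lock();
        try {
            currentViewId = newViewId;
        } finally {
            lock.unlock();
        }
    }

    /** Runs one short step of a command, but only if its view is still current. */
    <T> T withShortLock(int commandViewId, Supplier<T> step) {
        lock.lock();
        try {
            if (commandViewId != currentViewId) {
                throw new StaleViewException(commandViewId, currentViewId);
            }
            return step.get();
        } finally {
            lock.unlock();
        }
    }
}
```

Because no lock is held across the whole write command, a view change
only ever waits for one short step, and a command that started in the
old view fails at its next step instead of blocking state transfer.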
OK. Perhaps an overview of the new design in the document is warranted.
There's a section on transfer of CacheEntries and one on locks, but I
didn't see a combined discussion. Perhaps an example like the one above
would be good?
I now realize how much simpler the use of total order is here: since all
updates in a cluster happen in total order, we don't need to acquire
locks in 1 phase and release them in another phase. ST is then just
another update, inserted at a certain place in the stream of updates.
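
To illustrate the point, a toy sketch (hypothetical classes, not
Cloud-TM or Infinispan code) in which every node applies the same
totally ordered stream, and a state transfer message simply captures
the state at its position in that stream:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: ST as one more message in a totally ordered stream.
class TotalOrderStream {

    interface Message {}

    /** A regular update. */
    static final class Put implements Message {
        final String key;
        final String value;

        Put(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    /** Marker for a state transfer, ordered like any other update. */
    static final class StateTransfer implements Message {}

    private final Map<String, String> data = new LinkedHashMap<>();
    private final List<Map<String, String>> snapshots = new ArrayList<>();

    /** Messages are delivered one at a time, in the same order on every node. */
    void deliver(Message m) {
        if (m instanceof Put) {
            Put p = (Put) m;
            data.put(p.key, p.value);
        } else if (m instanceof StateTransfer) {
            // The snapshot contains exactly the updates ordered before it.
            snapshots.add(new LinkedHashMap<>(data));
        }
    }

    Map<String, String> data() {
        return data;
    }

    List<Map<String, String>> snapshots() {
        return snapshots;
    }
}
```

Since delivery is sequential, no update can be half-applied when the
snapshot is taken, which is why no two-phase locking is needed in this
sketch.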
I assume the Cloud-TM guys don't do state transfer in their prototype,
or do they? Pedro? If not, then there needs to be an implementation of
ST for TO.
Cheers,
--
Bela Ban, JGroups lead (http://www.jgroups.org)