RH Bugzilla Integration commented on ISPN-4484:
-----------------------------------------------
William Burns <wburns(a)redhat.com> changed the Status of [bug
Outbound transfers can be cancelled by old CANCEL_STATE_TRANSFER
command
------------------------------------------------------------------------
Key: ISPN-4484
URL: https://issues.jboss.org/browse/ISPN-4484
Project: Infinispan
Issue Type: Bug
Security Level: Public(Everyone can see)
Components: Core, State Transfer
Affects Versions: 6.0.2.Final
Reporter: Dan Berindei
Assignee: Dan Berindei
Priority: Critical
Fix For: 7.0.0.Alpha5
This appeared during the 32-node elasticity test in the Hyperion environment.

Just as apex947 left, it started a rebalance, which apex948 dutifully cancelled as it
became the new coordinator. apex949 had already requested segments from apex959, so it
sent a StateRequestCommand(CANCEL_STATE_TRANSFER) asynchronously to apex959. Then apex948
started a new rebalance, and apex949 asked apex959 for the same segments. When apex959
finally received the cancel request, it did not check the topology id, so it
incorrectly cancelled the outbound transfer to apex949 that belonged to the new rebalance.

The solution would be to verify the topology id in the CANCEL_STATE_TRANSFER command
before cancelling the transfer. I also think we can avoid sending the cancel command
completely in this case, and only send it when we are about to stop.
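
The topology id check proposed above could look roughly like the sketch below. It is not Infinispan's actual code: the class OutboundTransferRegistry and its methods are hypothetical names made up for illustration, and it assumes that a cancel request carries the topology id it was issued for and that a simple equality match is enough.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch; these are not Infinispan's real classes or method names. */
final class OutboundTransferRegistry {
    // Destination node -> topology id of the outbound transfer currently in progress.
    private final Map<String, Integer> activeTransfers = new HashMap<>();

    /** Records a new outbound transfer, replacing any older one to the same node. */
    synchronized void startTransfer(String destination, int topologyId) {
        activeTransfers.put(destination, topologyId);
    }

    /**
     * Cancels the transfer to the destination only if the cancel request was
     * issued for the same topology id as the active transfer.
     * Returns true if a transfer was actually cancelled.
     */
    synchronized boolean cancelTransfer(String destination, int cancelTopologyId) {
        Integer activeTopologyId = activeTransfers.get(destination);
        if (activeTopologyId == null || activeTopologyId != cancelTopologyId) {
            // No transfer, or the cancel belongs to another topology: ignore it.
            return false;
        }
        activeTransfers.remove(destination);
        return true;
    }

    /** Reports whether an outbound transfer to the destination is still active. */
    synchronized boolean hasActiveTransfer(String destination) {
        return activeTransfers.containsKey(destination);
    }
}
```

With a check like this, a cancel sent for the first rebalance that arrives after the transfer for the second rebalance has started would be dropped, because its topology id no longer matches the active transfer.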