Vladimir Blagojevic commented on ISPN-4575:
-------------------------------------------
[~dan.berindei] Can you share a code snippet on how to wait for all the members to finish
joining in a particular command?
Map/Reduce incorrect results with a non-shared, non-tx intermediate cache
------------------------------------------------------------------------
Key: ISPN-4575
URL: https://issues.jboss.org/browse/ISPN-4575
Project: Infinispan
Issue Type: Bug
Security Level: Public (Everyone can see)
Components: Core, Distributed Execution and Map/Reduce
Affects Versions: 7.0.0.Alpha5
Reporter: Dan Berindei
Assignee: Vladimir Blagojevic
Priority: Blocker
Labels: testsuite_stability
Fix For: 7.0.0.Beta1
In a non-tx cache, if a command is started with topology id {{T}} and the distribution
interceptor on another node sees topology {{T+1}} when the command is replicated there,
it throws an {{OutdatedTopologyException}}. The originator of the command then retries
it with topology id {{T+1}}.
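To make the check-and-retry rule concrete, here is a minimal, self-contained sketch of
the behaviour described above. The class and method names are simplified stand-ins, not
Infinispan's actual interceptor code:
{code:java}
// Illustration only: simplified stand-ins for the real interceptor classes.
public class TopologyCheckDemo {

    static class OutdatedTopologyException extends RuntimeException {
        OutdatedTopologyException(int commandTopology, int localTopology) {
            super("command topology " + commandTopology
                    + " < local topology " + localTopology);
        }
    }

    // The topology id this (remote) node has already installed: T+1.
    static int localTopologyId = 5;

    // A remote owner rejects writes tagged with an older topology id.
    static void performOnRemoteOwner(int commandTopologyId) {
        if (commandTopologyId < localTopologyId)
            throw new OutdatedTopologyException(commandTopologyId, localTopologyId);
        System.out.println("applied write in topology " + commandTopologyId);
    }

    public static void main(String[] args) {
        int commandTopologyId = 4;  // the originator started the command in T
        try {
            performOnRemoteOwner(commandTopologyId);
        } catch (OutdatedTopologyException e) {
            // The originator catches the exception and retries the whole
            // command with the newer topology id.
            commandTopologyId = localTopologyId;
            performOnRemoteOwner(commandTopologyId);  // retry succeeds
        }
    }
}
{code}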
When this happens with a {{PutKeyValueCommand(k, MapReduceManagerImpl.DeltaAwareList)}},
it can lead to duplicate intermediate values.
Say _A_ is the primary owner of {{k}} in {{T}}, _B_ is a backup owner both in {{T}} and
{{T+1}}, and _C_ is the backup owner in {{T}} and the primary owner in {{T+1}} (i.e. _C_
just joined and a rebalance is in progress during {{T}} - see
{{NonTxBackupOwnerBecomingPrimaryOwnerTest}}).
_A_ starts the {{PutKeyValueCommand}} and replicates it to _B_ and _C_. _C_ applies the
command, but _B_ already has topology {{T+1}} and throws an {{OutdatedTopologyException}}.
_A_ installs topology {{T+1}}, sends the command to _C_ (as the new primary owner), which
replicates it to _B_ and then applies it locally a second time.
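The damage comes from the put being a *merge* rather than an overwrite. A minimal model
(the {{deltaPut}} helper below is hypothetical, standing in for a {{PutKeyValueCommand}}
carrying a {{DeltaAwareList}}) shows how applying the same delta twice on _C_ duplicates
the intermediate values:
{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DuplicateDeltaDemo {
    // C's local data container, reduced to a map of lists.
    static final Map<String, List<Integer>> storeOnC = new HashMap<>();

    // A delta-aware put appends the new values instead of overwriting.
    static void deltaPut(String key, List<Integer> delta) {
        storeOnC.computeIfAbsent(key, k -> new ArrayList<>()).addAll(delta);
    }

    public static void main(String[] args) {
        List<Integer> delta = Arrays.asList(10, 20);
        deltaPut("k", delta);  // first attempt in T: C applies the command
        deltaPut("k", delta);  // retry in T+1: C applies the same delta again
        System.out.println(storeOnC.get("k"));  // [10, 20, 10, 20] -- duplicated
    }
}
{code}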
This scenario can happen during an M/R task even without nodes joining or leaving.
That's because {{CreateCacheCommand}} only calls {{getCache()}} on each member; it
doesn't wait for the cache to reach a given number of members or for state transfer
to complete on all of them. The last member to join the intermediate cache is
guaranteed to have topology {{T+1}}, but the others may still have topology {{T}} by
the time the combine phase starts inserting values into the intermediate cache.
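For the short-term fix proposed below, {{CreateCacheCommand}} would have to block until
every member has joined and rebalancing has finished. A hypothetical sketch of such a
wait loop, where {{TopologyView}}, {{memberCount()}} and {{rebalanceInProgress()}} are
placeholders rather than real Infinispan APIs:
{code:java}
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public final class JoinWait {

    // Stand-in for whatever exposes the intermediate cache's topology.
    interface TopologyView {
        int memberCount();
        boolean rebalanceInProgress();
    }

    // Poll until every expected member has joined and rebalancing is done.
    static void awaitStableTopology(TopologyView view, int expectedMembers,
                                    long timeoutMs) throws Exception {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (view.memberCount() < expectedMembers || view.rebalanceInProgress()) {
            if (System.nanoTime() > deadline)
                throw new TimeoutException("intermediate cache did not stabilize");
            Thread.sleep(50);
        }
    }

    public static void main(String[] args) throws Exception {
        // Toy view that is already stable, just to exercise the loop.
        TopologyView stable = new TopologyView() {
            public int memberCount() { return 3; }
            public boolean rebalanceInProgress() { return false; }
        };
        awaitStableTopology(stable, 3, 1000);
        System.out.println("topology stable");
    }
}
{code}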
I have seen the {{OutdatedTopologyException}} happen pretty often during the test suite,
especially after I removed the duplicate {{invokeRemotely}} call in
{{MapReduceTask.executeTaskInit()}}. Most of them were harmless, but there was one failure
in CI:
http://ci.infinispan.org/viewLog.html?buildId=9811&tab=buildResultsDi...
A short-term fix would be to wait for all the members to finish joining in
{{CreateCacheCommand}}. Long-term, M/R tasks should be resilient to topology changes, so
we should investigate making {{PutKeyValueCommand(k, DeltaAwareList)}} handle
{{OutdatedTopologyException}}s.
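One way the long-term fix could work is to make delta application idempotent, so a
retried put of the same chunk of intermediate values is a no-op. This is only a sketch
of the idea; {{IdempotentDeltaList}} and the chunk-id scheme are made up for
illustration, not existing Infinispan classes:
{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class IdempotentDeltaList<V> {
    private final List<V> values = new ArrayList<>();
    private final Set<String> appliedChunks = new HashSet<>();

    // Merge a chunk of intermediate values exactly once, keyed by chunk id.
    public synchronized void merge(String chunkId, Collection<V> chunk) {
        if (appliedChunks.add(chunkId)) {  // false => already applied, skip
            values.addAll(chunk);
        }
    }

    public synchronized List<V> snapshot() {
        return new ArrayList<>(values);
    }

    public static void main(String[] args) {
        IdempotentDeltaList<Integer> list = new IdempotentDeltaList<>();
        list.merge("mapper-1/chunk-0", Arrays.asList(10, 20));
        // The retry after an OutdatedTopologyException re-sends the same chunk:
        list.merge("mapper-1/chunk-0", Arrays.asList(10, 20));
        System.out.println(list.snapshot());  // [10, 20] -- no duplicates
    }
}
{code}
With a unique id per emitted chunk, a retry after an {{OutdatedTopologyException}} would
be harmless regardless of which owners already applied the first attempt.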