Gustavo Fernandes updated ISPN-6350:
------------------------------------
Status: Open (was: New)
Data race in the ShardIndexManager under topology changes
---------------------------------------------------------
Key: ISPN-6350
URL: https://issues.jboss.org/browse/ISPN-6350
Project: Infinispan
Issue Type: Bug
Components: Embedded Querying
Affects Versions: 8.2.0.Final
Reporter: Gustavo Fernandes
Assignee: Gustavo Fernandes
Labels: affinity
The following sequence shows a data race that can cause unrecoverable errors during indexing:
\[node1\] cache.put(key) // key maps to segment 48, owned by node1
\[node1\] starts shard 48
\[node1\] acquires lock on shard 48
\[node1\] starts writing to the index
\[node1\] topology change notification received, lock released on shard 48
\[node1\] lock reacquired (still writing to the index)
\[node1\] commit on shard 48
\[node1\] shard still locked
\[node2\] cache.put(key) // Node2 now owns segment 48
\[node2\] starts shard 48
\[node2\] tries to acquire the lock on shard 48
\[node2\] fails (lock still owned by node1)
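The timeline above can be replayed deterministically with a minimal Java sketch. It assumes a single shared lock table stands in for the per-shard index lock that both nodes contend on; the class and method names (ShardLockRace, tryLock, unlock, replay) are hypothetical and are not Infinispan API.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical model of the ISPN-6350 timeline. Assumption: one shared lock
 * table stands in for the per-shard index lock that all nodes contend on.
 */
public class ShardLockRace {

    // shard id -> name of the node currently holding that shard's index lock
    static final Map<Integer, String> lockTable = new HashMap<>();

    /** Acquires the shard lock for node; succeeds if free or already held by node. */
    static boolean tryLock(int shard, String node) {
        String holder = lockTable.putIfAbsent(shard, node);
        return holder == null || holder.equals(node);
    }

    /** Releases the shard lock, but only if node is the current holder. */
    static void unlock(int shard, String node) {
        lockTable.remove(shard, node);
    }

    /** Replays the steps from the issue; returns whether node2 obtains the lock. */
    static boolean replay() {
        lockTable.clear();
        int shard = 48;
        tryLock(shard, "node1");          // node1 starts writing to shard 48
        unlock(shard, "node1");           // topology change: writer closed, lock released
        tryLock(shard, "node1");          // in-flight write reopens the writer, lock reacquired
        // node1 commits, but nothing releases the lock again
        return tryLock(shard, "node2");   // node2 now owns segment 48 and tries to lock it
    }

    public static void main(String[] args) {
        System.out.println("node2 acquired lock: " + replay());
    }
}
```

Running main prints `node2 acquired lock: false`: the release triggered by the topology change is undone by node1's own in-flight write before node2 ever tries to lock the shard.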
The current mechanism employed by the {{ShardIndexManager}} during topology changes uses a
listener that closes the IndexWriter on all nodes upon ownership changes, so that the lock
is released and can be reacquired by the new owner (one segment maps to one shard).
Since writing to a shard can take some time, the listener can be triggered in the middle
of an index operation. In that case the IndexWriter stays closed only very briefly: the
ongoing operation immediately reopens it and reacquires the lock, which is then never
released again, so the new owner cannot acquire it.
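The writer lifecycle described above can be sketched from the old owner's side. This is a hypothetical model, assuming (as the description implies) that the shard's index lock is held exactly while the writer is open and that a write lazily reopens a closed writer; LazyShardWriter and its methods are illustrative names, not Infinispan or Lucene API.

```java
/**
 * Hypothetical model of one node's writer for a single shard. Assumption: the
 * shard's index lock is held exactly while the writer is open.
 */
public class LazyShardWriter {

    private boolean open = false;   // whether the writer (and so the lock) is open

    /** Indexes a document, lazily reopening the writer if a listener closed it. */
    synchronized void write(String doc) {
        if (!open) {
            open = true;            // reopening the writer reacquires the shard lock
        }
        // placeholder: a real implementation would add doc to the index here
    }

    /** Invoked by the topology listener when segment ownership changes. */
    synchronized void onTopologyChanged() {
        open = false;               // closing the writer releases the shard lock
    }

    /** Commits pending changes; the writer stays open, so the lock is kept. */
    synchronized void commit() {
        // placeholder: flush and commit without closing the writer
    }

    synchronized boolean isLocked() {
        return open;
    }

    /** Replays node1's side of the race: one operation spanning a topology change. */
    static boolean lockedAfterOwnershipChange() {
        LazyShardWriter writer = new LazyShardWriter();
        writer.write("doc-1");          // writer opened, lock acquired
        writer.onTopologyChanged();     // listener fires mid-operation: lock released
        writer.write("doc-2");          // same operation continues: lock reacquired
        writer.commit();                // commit does not release the lock
        return writer.isLocked();       // old owner still holds the lock
    }

    public static void main(String[] args) {
        System.out.println("lock still held by old owner: " + lockedAfterOwnershipChange());
    }
}
```

The close only helps if it happens after the last write of the operation; when the listener fires earlier, the next write silently undoes it and nothing schedules another release.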