Gustavo Fernandes updated ISPN-4777:
------------------------------------
Affects Version/s: 7.2.4.Final
(was: 5.2.6.Final)
Replace command not atomic in REPL_SYNC cache mode
--------------------------------------------------
Key: ISPN-4777
URL: https://issues.jboss.org/browse/ISPN-4777
Project: Infinispan
Issue Type: Bug
Affects Versions: 7.2.4.Final
Reporter: Anuj Shah
Assignee: Gustavo Fernandes
Attachments: ReaderLockerTest.java
This problem was discovered using the Lucene InfinispanDirectory with
DistributedSegmentReadLocker. After some time in production, we found that some Lucene
files had been removed from the caches at random but were still listed in the file listing
entry, which left the index unusable.
We managed to reproduce the problem in a test that acquires and releases read locks
concurrently and checks for file deletion. The test fails quickly in REPL_SYNC mode, but
runs for a while with DIST_SYNC.
Some extra logging indicated that when the replace command used to increment the lock
counter is called concurrently from multiple cluster members, both calls report success
but the counter is incremented only once. Because the number of readers is now
miscounted, the file is eventually deleted. We also observed the opposite effect when
releasing: the counter was decremented only once.
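
To make the expected behaviour concrete, here is a minimal sketch of a counter driven by
a conditional replace, with a small concurrent check. It is not the
DistributedSegmentReadLocker code and not the attached ReaderLockerTest.java: a plain
java.util.concurrent.ConcurrentHashMap stands in for the clustered cache, and the class
name, method names, key and initial value of 1 are all hypothetical. With an atomic
replace, only one of two callers that read the same value may succeed, so the final count
must equal the initial value plus the number of increments.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ReplaceCounterSketch {

    // Adds delta to the counter stored under key, retrying the conditional
    // replace until it succeeds. Returns the value that was written.
    static int addAndGet(ConcurrentMap<String, Integer> cache, String key, int delta) {
        while (true) {
            Integer current = cache.get(key);
            if (current == null) {
                throw new IllegalStateException("no counter for " + key);
            }
            int updated = current + delta;
            // An atomic replace(key, expected, updated) succeeds for at most one
            // of several callers that observed the same 'current' value.
            if (cache.replace(key, current, updated)) {
                return updated;
            }
        }
    }

    // Runs 'threads' workers that each increment the counter 'iterations'
    // times, then returns the final counter value.
    static int runConcurrentIncrements(int threads, int iterations) {
        // Assumption: a local map stands in for the clustered cache.
        ConcurrentMap<String, Integer> cache = new ConcurrentHashMap<>();
        String key = "hypothetical-file|readers"; // hypothetical key name
        cache.put(key, 1);                         // hypothetical initial count

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < iterations; i++) {
                    addAndGet(cache, key, 1);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return cache.get(key);
    }

    public static void main(String[] args) {
        int threads = 4;
        int iterations = 10_000;
        int actual = runConcurrentIncrements(threads, iterations);
        int expected = 1 + threads * iterations;
        System.out.println("expected=" + expected + " actual=" + actual);
    }
}
```

The behaviour described in this report corresponds to two concurrent replace calls with
the same expected value both returning true, which would leave the final count lower than
expected.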
Our conclusion is that the replace command is not atomic in REPL_SYNC mode but works in
the other modes we tried: DIST_SYNC, DIST_ASYNC and REPL_ASYNC.