[infinispan-issues] [JBoss JIRA] (ISPN-4777) Replace command not atomic in REPL_SYNC cache mode
Gustavo Fernandes (JIRA)
issues at jboss.org
Thu Aug 27 08:59:44 EDT 2015
[ https://issues.jboss.org/browse/ISPN-4777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102535#comment-13102535 ]
Gustavo Fernandes edited comment on ISPN-4777 at 8/27/15 8:58 AM:
------------------------------------------------------------------
Hi Anuj, thanks for providing the testcase!
Unfortunately I found an issue with the supplied test: it uses a {{DistributedSegmentReadLocker}} to acquire a read lock on a file that has size 0 and bufferSize 10, and then uses the same {{DistributedSegmentReadLocker}} to release that lock. The trouble is that this path is never followed during normal Lucene Directory execution: a file of size 0 is smaller than the buffer size of 10, so it is not eligible to be broken into chunks, and no read lock will ever be acquired or released for it.
But since the test artificially acquires the lock, when the call to {{deleteOrReleaseReadLock()}} happens the Lucene directory will ALWAYS delete the file because it is single chunked, so the test always fails regardless of REPL/DIST mode or single/multiple threads.
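To make the argument concrete, here is a toy model of the eligibility check. It is not Infinispan code: the class and method names are hypothetical, and the rule "a file is chunked only when its size exceeds the buffer size" is an assumption taken from the description above.

```java
public class SingleChunkSketch {

    // Assumption (from the comment above, not from Infinispan source):
    // a file is split into chunks only when it is larger than the buffer.
    static boolean isMultiChunked(long fileSize, int bufferSize) {
        return fileSize > bufferSize;
    }

    // In this toy model, only multi-chunk files take part in read locking
    // during normal directory execution.
    static boolean needsReadLock(long fileSize, int bufferSize) {
        return isMultiChunked(fileSize, bufferSize);
    }

    public static void main(String[] args) {
        // The supplied test: size 0, bufferSize 10, so no read lock is expected.
        System.out.println("size 0, buffer 10 needs lock: " + needsReadLock(0, 10));
        // A file larger than the buffer would be chunked and read-locked.
        System.out.println("size 25, buffer 10 needs lock: " + needsReadLock(25, 10));
    }
}
```

Under that assumption, a test that wants to exercise the read lock path would need a file larger than the buffer size.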
Anyway, I pushed an updated test at https://github.com/gustavonalle/infinispan/commit/e6a2ccd93fc60250d3a51495ad8017c8c454a929 on top of the 7.2.x branch and have been trying to reproduce the issue with it.
> Replace command not atomic in REPL_SYNC cache mode
> --------------------------------------------------
>
> Key: ISPN-4777
> URL: https://issues.jboss.org/browse/ISPN-4777
> Project: Infinispan
> Issue Type: Bug
> Affects Versions: 7.2.4.Final
> Reporter: Anuj Shah
> Assignee: Gustavo Fernandes
> Attachments: ReaderLockerTest.java
>
>
> This problem was discovered using the Lucene InfinispanDirectory with DistributedSegmentReadLocker. After a period of production use, we found that some Lucene files were randomly removed from the caches but remained in the file listing entry, which resulted in an unusable index.
> We managed to replicate the problem in a test that acquires and releases read locks concurrently and checks for file deletion. We found this fails quickly when using REPL_SYNC mode, but runs for a while with DIST_SYNC.
> Some extra logging indicated that the replace command used to increment the lock counter, when called concurrently across multiple cluster members, results in a single increment, with both calls reporting success. This eventually causes the file deletion, as the number of readers has been miscounted. We also observed the opposite effect, with the counter decrementing only by one when releasing.
> Our conclusion is that the replace command fails atomicity in REPL_SYNC mode but works in the other modes we tried: DIST_SYNC, DIST_ASYNC and REPL_ASYNC.
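
For reference, the atomicity being relied on here is the compare-and-swap contract of {{ConcurrentMap.replace(key, oldValue, newValue)}}, which the Infinispan {{Cache}} interface inherits. The sketch below shows that contract with a plain {{java.util.concurrent.ConcurrentHashMap}} standing in for the clustered lock cache. The class name, key name and thread counts are made up for illustration, and it does not reproduce the reported bug; it only shows what a correct replace guarantees.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class ReplaceCounterSketch {

    // Hypothetical key; the real locker uses its own key type per file.
    static final String KEY = "segment-file";

    // Local stand-in for the lock cache (assumption: not a clustered cache).
    static final ConcurrentMap<String, Integer> locks = new ConcurrentHashMap<>();

    // Increment with a compare-and-swap loop: re-read and retry until
    // replace(key, expected, updated) reports success.
    static int increment(String key) {
        while (true) {
            Integer current = locks.get(key);
            Integer next = current + 1;
            if (locks.replace(key, current, next)) {
                return next;
            }
        }
    }

    // Runs the given number of threads, each incrementing perThread times,
    // and returns the final counter value.
    static int runConcurrentIncrements(int threads, int perThread) {
        locks.put(KEY, 1);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    increment(KEY);
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return locks.get(KEY);
    }

    public static void main(String[] args) {
        int result = runConcurrentIncrements(4, 1000);
        System.out.println("counter = " + result + ", expected = " + (1 + 4 * 1000));
    }
}
```

If replace is atomic, every successful call corresponds to exactly one increment, so the final value equals the initial value plus the number of successful calls. The behaviour described in the report, where two concurrent calls both report success but the counter moves by one, would break that invariant.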
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)