[
https://issues.jboss.org/browse/ISPN-5307?page=com.atlassian.jira.plugin....
]
Carsten Lohmann updated ISPN-5307:
----------------------------------
Description:
In our application we had the problem that, under high load, a {{Cache#remove(key)}}
invocation returned the old value _although_ the same key had already been removed on
another node a few milliseconds before.
This occurred while another node was being shut down - that node happened to be both the
coordinator and the primary owner of the key.
The cache is a distributed, DB-persisted, transactional (locking mode Pessimistic) cache
with numOwners=1.
I could reproduce such an issue with the attached unit test (tested on ISPN 7.1).
Scenario:
- MultipleCacheManagersTest with 3 nodes
- Cache is distributed, persisted, transactional (locking mode Pessimistic) with numOwners
1 or 2 (see below; {{getTestCacheNumOwners()}} in the test)
- a number of test keys get inserted; only keys with the coordinator as primary owner get
used for the test
- while the keys get removed one-by-one and concurrently by 2 threads...
- ... the coordinator node gets stopped
At the end, the return values of the {{remove()}} invocations get checked.
{{remove()}} invocations that threw an error (SuspectException) are ignored.
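For comparison, the behaviour the test expects can be sketched on a single JVM. This is only an illustration of the expected {{remove()}} contract, not the attached test: a plain {{java.util.concurrent.ConcurrentHashMap}} stands in for the clustered cache, and the class and method names ({{ConcurrentRemoveContract}}, {{removeConcurrently}}) are hypothetical. No node is stopped here, so none of the reported anomalies can show up.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in: a local ConcurrentHashMap replaces the distributed cache.
class ConcurrentRemoveContract {

    /** Runs two concurrent remove(key) calls and returns both return values. */
    static String[] removeConcurrently(ConcurrentMap<String, String> cache, String key) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        CountDownLatch start = new CountDownLatch(1);
        Callable<String> task = () -> {
            start.await();            // release both threads at the same time
            return cache.remove(key); // old value, or null if the key was absent
        };
        try {
            Future<String> first = pool.submit(task);
            Future<String> second = pool.submit(task);
            start.countDown();
            return new String[] { first.get(), second.get() };
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        ConcurrentMap<String, String> cache = new ConcurrentHashMap<>();
        cache.put("k4", "v");
        String[] results = removeConcurrently(cache, "k4");
        int nonNull = (results[0] != null ? 1 : 0) + (results[1] != null ? 1 : 0);
        // Exactly one caller sees the old value, and the entry is gone afterwards.
        System.out.println("non-null results: " + nonNull
                + ", still present: " + cache.containsKey("k4"));
        // prints "non-null results: 1, still present: false"
    }
}
```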
*The following problems occurred on some test runs*:
# the {{remove(key)}} invocations on both nodes returned {{null}} and the entry was still
there in the end
#* => I would have expected an exception here as {{remove(key)}} essentially failed
# the {{remove(key)}} invocations on both nodes returned {{null}} but the entry was
actually removed
#* => wrong return value of first invocation
# the {{remove(key)}} invocations on both nodes both returned the old value
#* => wrong return value of second invocation
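The three cases above can be told apart from the two return values and the final cache state. The following is a rough sketch of such a check, assuming the key was present before both calls and neither call threw an exception. The names ({{RemoveOutcomeCheck}}, {{Outcome}}, {{classify}}) are hypothetical and are not taken from the attached test.

```java
// Hypothetical helper; maps one pair of remove() results to the cases listed above.
class RemoveOutcomeCheck {

    enum Outcome {
        OK,               // exactly one call returned the old value, entry is gone
        FAILED_SILENTLY,  // issue 1: both returned null, entry still present
        REMOVED_BUT_NULL, // issue 2: both returned null, entry was removed
        OLD_VALUE_TWICE,  // issue 3: both calls returned the old value
        OTHER             // any combination not described in this report
    }

    static Outcome classify(Object first, Object second, boolean entryStillPresent) {
        if (first != null && second != null) {
            return Outcome.OLD_VALUE_TWICE;
        }
        if (first == null && second == null) {
            return entryStillPresent ? Outcome.FAILED_SILENTLY : Outcome.REMOVED_BUT_NULL;
        }
        // Exactly one non-null result: correct only if the entry is really gone.
        return entryStillPresent ? Outcome.OTHER : Outcome.OK;
    }

    public static void main(String[] args) {
        System.out.println(classify("v", null, false)); // prints "OK"
        System.out.println(classify(null, null, true)); // prints "FAILED_SILENTLY"
        System.out.println(classify("v", "v", false));  // prints "OLD_VALUE_TWICE"
    }
}
```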
These issues do not occur on every test run, but they occur quite often.
When testing with {{numOwners=2}}, I only got the last of the above issues - in the test
with {{numOwners=1}} that occurred only rarely.
Attached are log files from tests with {{numOwners=1}}
("org.infinispan.transaction" log level set to TRACE):
- ~40% of test runs produced:
issue 3 ("2 remove() invocations returned the old value for key 'k4'")
- occurred rarely:
issue 2 ("remove() did actually remove the key 'k4', but the return value was
null")
- didn't occur with "org.infinispan.transaction" log level set to TRACE; but
otherwise quite often:
issue 1 ("remove() of key 'k4' did not succeed, but we got no exception
(entry still there; value: 'v')")
was:
In our application we had the problem that under high load a {{Cache#remove(key)}}
invocation returned the old value _although_ the same key was already removed on another
node a few milliseconds before.
This occurred while another node was shutdown - that node happened to be the coordinator
and also the primary owner of the key here.
The cache is a distributed, DB-persisted, transactional (locking mode Pessimistic) cache
with numOwners=1.
I could reproduce such an issue with the attached unit test (tested on ISPN 7.1).
Scenario:
- MultipleCacheManagersTest with 3 nodes
- Cache is distributed, persisted, transactional (locking mode Pessimistic) with numOwners
1 or 2 (see below; {{getTestCacheNumOwners()}} in the test)
- a number of test keys get inserted; only keys with the coordinator as primary owner get
used for the test
- while the keys get removed one-by-one and concurrently by 2 threads...
- ... the coordinator node gets stopped
At the end, the return values of the {{remove()}} invocations get checked.
{{remove()}} invocations that threw an error (SuspectException) are ignored.
*The following problems occurred on some test runs*:
# the {{remove(key)}} invocations on both nodes returned {{null}} and the entry was still
there in the end
#* => I would have expected an exception here as {{remove(key)}} essentially failed
# the {{remove(key)}} invocations on both nodes returned {{null}} but the entry was
actually removed
#* => wrong return value of first invocation
# the {{remove(key)}} invocations on both nodes returned both the old value
#* => wrong return value of second invocation
These issues occur not always but quite often with the test.
When testing with {{numOwners=2}}, I only got the last of the above issues - in the test
with {{numOwners=1}} that occurred only rarely.
Attached are log files from tests with {{numOwners=1}}
("org.infinispan.transaction" log level set to TRACE):
- ~40% of test runs produced issue 3 ("2 remove() invocations returned the old value
for key 'k4'")
- occurred rarely: issue 2 ("remove() did actually remove the key 'k4', but
the return value was null)
- didn't occur with "org.infinispan.transaction" log level set to TRACE; but
otherwise quite often: issue 1 ("remove() of key 'k4' did not succeed, but we
got no exception (entry still there; value: 'v')")
Cache#remove(key) not behaving correctly under high load - while
primary owner node is stopped
----------------------------------------------------------------------------------------------
Key: ISPN-5307
URL:
https://issues.jboss.org/browse/ISPN-5307
Project: Infinispan
Issue Type: Bug
Components: Core
Affects Versions: 7.1.1.Final
Reporter: Carsten Lohmann
Attachments: RemoveEntryWhilePrimaryOwnerStopsTest.java,
RemoveEntryWhileSingleOwnerStopsTest_2RemoveOpsReturnedNonNullValue.log,
RemoveEntryWhileSingleOwnerStopsTest_RemoveFailButNoException.log,
RemoveEntryWhileSingleOwnerStopsTest_RemoveOkButReturnedNull.log
--
This message was sent by Atlassian JIRA
(v6.3.11#6341)