[infinispan-issues] [JBoss JIRA] (ISPN-5307) Cache#remove(key) not behaving correctly under high load - while primary owner node is stopped

Carsten Lohmann (JIRA) issues at jboss.org
Tue Mar 17 09:40:20 EDT 2015


     [ https://issues.jboss.org/browse/ISPN-5307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Carsten Lohmann updated ISPN-5307:
----------------------------------
    Description: 
In our application we had the problem that, under high load, a {{Cache#remove(key)}} invocation returned the old value _although_ the same key had already been removed on another node a few milliseconds earlier.
This occurred while another node was being shut down - that node happened to be the coordinator and also the primary owner of the key.
The cache is a distributed, DB-persisted, transactional (locking mode Pessimistic) cache with numOwners=1.
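
For context, here is a minimal sketch of such a cache configuration using the {{ConfigurationBuilder}} API. The class and method names are illustrative, and the single-file store only stands in for the DB-backed store we actually use:
{code:java}
import org.infinispan.configuration.cache.CacheMode;
import org.infinispan.configuration.cache.Configuration;
import org.infinispan.configuration.cache.ConfigurationBuilder;
import org.infinispan.transaction.LockingMode;
import org.infinispan.transaction.TransactionMode;

public class TestCacheConfig {

   // Distributed, persisted, transactional cache with pessimistic locking.
   // The real cache uses a DB-backed store; the single-file store below is
   // only a stand-in for "persisted".
   static Configuration build(int numOwners) {
      return new ConfigurationBuilder()
            .clustering().cacheMode(CacheMode.DIST_SYNC)
               .hash().numOwners(numOwners)
            .transaction().transactionMode(TransactionMode.TRANSACTIONAL)
               .lockingMode(LockingMode.PESSIMISTIC)
            .persistence().addSingleFileStore().location("target/store")
            .build();
   }
}
{code}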
 
I could reproduce such an issue with the attached unit test (tested on ISPN 7.1).
Scenario:
- MultipleCacheManagersTest with 3 nodes
- the cache is distributed, persisted, transactional (locking mode Pessimistic) with numOwners 1 or 2 (see below; {{getTestCacheNumOwners()}} in the test)
- a number of test keys are inserted; only keys with the coordinator as primary owner are used for the test
- while the keys are removed one by one and concurrently by 2 threads...
- ... the coordinator node is stopped
 
At the end, the return values of the {{remove()}} invocations are checked.
{{remove()}} invocations that threw a {{SuspectException}} are ignored.
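
A rough sketch of that race and the result collection (this is not the attached test; the helper name {{raceRemove}} and the "IGNORED" marker are made up for illustration, and the real test additionally stops the coordinator while the removals are in flight):
{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

import org.infinispan.Cache;
import org.infinispan.remoting.transport.jgroups.SuspectException;

public class ConcurrentRemoveSketch {

   // Removes the same key concurrently on two nodes and collects the return
   // values; the coordinator (primary owner of the key) is stopped elsewhere
   // while these removals run.
   static List<Object> raceRemove(Cache<String, String> node1,
                                  Cache<String, String> node2,
                                  String key) throws Exception {
      ExecutorService pool = Executors.newFixedThreadPool(2);
      List<Future<Object>> futures = new ArrayList<>();
      for (final Cache<String, String> cache : Arrays.asList(node1, node2)) {
         futures.add(pool.submit(() -> {
            try {
               return cache.remove(key);  // return value under test
            } catch (SuspectException e) {
               return "IGNORED";          // this invocation is not checked
            }
         }));
      }
      List<Object> results = new ArrayList<>();
      for (Future<Object> f : futures) {
         results.add(f.get());
      }
      pool.shutdown();
      return results;
   }
}
{code}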
 
*The following problems occurred on some test runs*:
# the {{remove(key)}} invocations on both nodes returned {{null}} and the entry was still there in the end
#* => I would have expected an exception here as {{remove(key)}} essentially failed
# the {{remove(key)}} invocations on both nodes returned {{null}} but the entry was actually removed
#* => wrong return value of first invocation
# the {{remove(key)}} invocations on both nodes each returned the old value
#* => wrong return value of second invocation
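
Put differently: when neither invocation threw, I would expect the entry to be gone afterwards and exactly one of the two calls to have returned the old value. A minimal, illustrative check of that expectation (not part of the attached test):
{code:java}
public class RemoveContractCheck {

   // r1/r2 are the return values of the two concurrent remove(key) calls,
   // assuming neither threw a SuspectException.
   static void assertRemoveContract(Object r1, Object r2, String oldValue,
                                    boolean entryStillPresent) {
      if (entryStillPresent) {
         throw new AssertionError("entry was not removed");        // issue 1
      }
      int oldValueReturns = 0;
      if (oldValue.equals(r1)) oldValueReturns++;
      if (oldValue.equals(r2)) oldValueReturns++;
      if (oldValueReturns == 0) {
         throw new AssertionError("both calls returned null");     // issue 2
      }
      if (oldValueReturns == 2) {
         throw new AssertionError("both calls got the old value"); // issue 3
      }
   }
}
{code}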
 
These issues do not occur on every run, but quite often with the test.
When testing with {{numOwners=2}}, I only got the last of the above issues - with {{numOwners=1}}, that one occurred only rarely.

Attached are log files from tests with {{numOwners=1}}
("org.infinispan.transaction" log level set to TRACE):
- ~40% of test runs produced issue 3 ("2 remove() invocations returned the old value for key 'k4'")
- occurred rarely: issue 2 ("remove() did actually remove the key 'k4', but the return value was null")
- didn't occur with "org.infinispan.transaction" log level set to TRACE, but otherwise quite often: issue 1 ("remove() of key 'k4' did not succeed, but we got no exception (entry still there; value: 'v')")


> Cache#remove(key) not behaving correctly under high load - while primary owner node is stopped
> ----------------------------------------------------------------------------------------------
>
>                 Key: ISPN-5307
>                 URL: https://issues.jboss.org/browse/ISPN-5307
>             Project: Infinispan
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 7.1.1.Final
>            Reporter: Carsten Lohmann
>         Attachments: RemoveEntryWhilePrimaryOwnerStopsTest.java, RemoveEntryWhileSingleOwnerStopsTest_2RemoveOpsReturnedNonNullValue.log, RemoveEntryWhileSingleOwnerStopsTest_RemoveFailButNoException.log, RemoveEntryWhileSingleOwnerStopsTest_RemoveOkButReturnedNull.log
>



--
This message was sent by Atlassian JIRA
(v6.3.11#6341)

