Dan Berindei created ISPN-12679:
-----------------------------------
Summary: AsyncBackupExpirationTest.testExpiredAccess random failures
Key: ISPN-12679
URL:
https://issues.redhat.com/browse/ISPN-12679
Project: Infinispan
Issue Type: Bug
Components: Core, Cross-Site Replication, Expiration, Test Suite
Affects Versions: 12.0.0.Final
Reporter: Dan Berindei
Fix For: 12.1.0.Final
Sometimes the backup {{IracPutKeyValueCommand}} is delayed on NodeD and it stores the
value only after the test advanced the time service and invokes the get operation
that's supposed to expire the entry, causing the entry expiration to fail in the
{{NYC}} site.
First {{IracPutKeyValueCommand}} arrives at NodeD and locks the key:
{noformat}
15:59:07,907 TRACE (jgroups-7,Test-NodeB:[]) [InvocationContextInterceptor] Invoked with
command PrepareCommand
{modifications=[PutKeyValueCommand{key=MagicKey{7FD/E1260FE1/147@Test-NodeB}, value=v,
flags=[], commandInvocationId=CommandInvocation:local:0, ...
15:59:07,910 TRACE (jgroups-5,bridge-org.infinispan.xsite.Test,_Test-NodeC:NYC-2:[])
[InvocationContextInterceptor] Invoked with command
IracPutKeyValueCommand{key=MagicKey{7FD/E1260FE1/147@Test-NodeB}, value=v,
metadata=EmbeddedLifespanExpirableMetadata{lifespan=1000, version=null},
privateMetadata=PrivateMetadata{iracMetadata=IracMetadata{site='LON-1',
version=(LON-1=(6:1))}, entryVersion=null}, successful=true,
commandInvocationId=CommandInvocation:Test-NodeC:45679} and InvocationContext
[SingleKeyNonTxInvocationContext{isLocked=false, key=null, cacheEntry=null, origin=null,
lockOwner=CommandInvocation:Test-NodeC:45679}]
15:59:07,912 TRACE (jgroups-8,Test-NodeD:[]) [InvocationContextInterceptor] Invoked with
command IracPutKeyValueCommand{key=MagicKey{7FD/E1260FE1/147@Test-NodeB}, value=v,
metadata=EmbeddedLifespanExpirableMetadata{lifespan=1000, version=null},
privateMetadata=PrivateMetadata{iracMetadata=IracMetadata{site='LON-1',
version=(LON-1=(6:1))}, entryVersion=null}, successful=true,
commandInvocationId=CommandInvocation:Test-NodeC:45679} and InvocationContext
[SingleKeyNonTxInvocationContext{isLocked=false, key=null, cacheEntry=null,
origin=Test-NodeC, lockOwner=CommandInvocation:Test-NodeC:45679}]
15:59:07,912 TRACE (jgroups-8,Test-NodeD:[]) [DefaultLockManager] Lock
key=MagicKey{7FD/E1260FE1/147@Test-NodeB} for owner=CommandInvocation:Test-NodeC:45679.
timeout=10000 (MILLISECONDS)
15:59:07,912 TRACE (jgroups-8,Test-NodeD:[]) [InfinispanLock]
LockPlaceHolder{lockState=ACQUIRED, owner=CommandInvocation:Test-NodeC:45679} successfully
acquired the lock.
{noformat}
Then {{RemoteExpiredCommand}} arrives at NodeD, tries to lock the key, and fails:
{noformat}
15:59:07,917 TRACE (testng-Test:[]) [InvocationContextInterceptor] Invoked with command
RemoveExpiredCommand{key=MagicKey{7FD/E1260FE1/147@Test-NodeB}, value=v, lifespan=1000,
maxIde=false, internalMetadata=null} and InvocationContext
[SingleKeyNonTxInvocationContext{isLocked=false, key=null, cacheEntry=null, origin=null,
lockOwner=CommandInvocation:local:0}]
15:59:07,918 TRACE (jgroups-7,Test-NodeD:[]) [InvocationContextInterceptor] Invoked with
command RemoveExpiredCommand{key=MagicKey{7FD/E1260FE1/147@Test-NodeB}, value=v,
lifespan=1000, maxIde=false, internalMetadata=null} and InvocationContext
[SingleKeyNonTxInvocationContext{isLocked=false, key=null, cacheEntry=null,
origin=Test-NodeC, lockOwner=CommandInvocation:local:0}]
15:59:07,918 TRACE (jgroups-7,Test-NodeD:[]) [DefaultLockManager] Lock
key=MagicKey{7FD/E1260FE1/147@Test-NodeB} for owner=CommandInvocation:local:0. timeout=0
(MILLISECONDS)
15:59:07,918 TRACE (jgroups-7,Test-NodeD:[]) [InfinispanLock] Unable to acquire. Lock is
held.
{noformat}
Next {{IracPutKeyValueCommand}} stores the value and releases the lock:
{noformat}
15:59:07,919 TRACE (jgroups-8,Test-NodeD:[]) [AbstractInternalDataContainer] Store
MagicKey{7FD/E1260FE1/147@Test-NodeB}=MortalCacheEntry{key=MagicKey{7FD/E1260FE1/147@Test-NodeB},
value=v, internalMetadata=PrivateMetadata{iracMetadata=IracMetadata{site='LON-1',
version=(LON-1=(6:1))}, entryVersion=null}, created=1002000, lifespan=1000} in container
15:59:07,919 TRACE (jgroups-8,Test-NodeD:[]) [DefaultLockManager] Release locks for
keys=[MagicKey{7FD/E1260FE1/147@Test-NodeB}]. owner=CommandInvocation:Test-NodeC:45679
{noformat}
But it's too late for the {{RemoveExpiredCommand}}, which gives up:
{noformat}
15:59:07,920 ERROR (jgroups-7,Test-NodeD:[]) [InvocationContextInterceptor] ISPN000136:
Error executing command RemoveExpiredCommand on Cache 'defaultcache', writing keys
[MagicKey{7FD/E1260FE1/147@Test-NodeB}]
org.infinispan.util.concurrent.TimeoutException: ISPN000299: Unable to acquire lock after
0 milliseconds for key MagicKey{7FD/E1260FE1/147@Test-NodeB} and requestor
CommandInvocation:local:0. Lock is held by null
at
org.infinispan.util.concurrent.locks.impl.DefaultLockManager$KeyAwareExtendedLockPromise.get(DefaultLockManager.java:291)
~[classes/:?]
at
org.infinispan.util.concurrent.locks.impl.DefaultLockManager$KeyAwareExtendedLockPromise.get(DefaultLockManager.java:221)
~[classes/:?]
at
org.infinispan.util.concurrent.locks.impl.InfinispanLock$LockPlaceHolder.checkState(InfinispanLock.java:439)
~[classes/:?]
at
org.infinispan.util.concurrent.locks.impl.InfinispanLock$LockPlaceHolder.toInvocationStage(InfinispanLock.java:411)
~[classes/:?]
at
org.infinispan.util.concurrent.locks.impl.DefaultLockManager$KeyAwareExtendedLockPromise.toInvocationStage(DefaultLockManager.java:251)
~[classes/:?]
at
org.infinispan.util.concurrent.locks.impl.DefaultLockManager$KeyAwareExtendedLockPromise.toInvocationStage(DefaultLockManager.java:275)
~[classes/:?]
at
org.infinispan.interceptors.locking.AbstractLockingInterceptor.lockAndRecord(AbstractLockingInterceptor.java:287)
~[classes/:?]
at
org.infinispan.interceptors.locking.AbstractLockingInterceptor.visitNonTxDataWriteCommand(AbstractLockingInterceptor.java:137)
~[classes/:?]
at
org.infinispan.interceptors.locking.AbstractLockingInterceptor.visitRemoveExpiredCommand(AbstractLockingInterceptor.java:107)
~[classes/:?]
at
org.infinispan.commands.write.RemoveExpiredCommand.acceptVisitor(RemoveExpiredCommand.java:46)
~[classes/:?]
at
org.infinispan.interceptors.BaseAsyncInterceptor.invokeNext(BaseAsyncInterceptor.java:59)
~[classes/:?]
at
org.infinispan.interceptors.impl.TxInterceptor.visitRemoveExpiredCommand(TxInterceptor.java:238)
~[classes/:?]
15:59:07,926 TRACE (jgroups-7,Test-NodeD:[]) [CallInterceptor] Cannot remove entry due to
it not being expired - this can be caused by different clocks on nodes or a concurrent
write
15:59:07,926 TRACE (jgroups-7,Test-NodeD:[]) [BaseDistributionInterceptor] Skipping the
replication of the conditional command as it did not succeed on primary owner
(RemoveExpiredCommand{key=MagicKey{7FD/E1260FE1/147@Test-NodeB}, value=v, lifespan=1000,
maxIde=false, internalMetadata=null}).
{noformat}
And because the entry was not expired, the test fails:
{noformat}
15:59:07,929 ERROR (testng-Test:[]) [TestSuiteProgress] Test failed:
org.infinispan.xsite.AsyncBackupExpirationTest.testExpiredAccess[LON=PESSIMISTIC_TX,
NYC=OPTIMISTIC_TX_RC]([true, false])
java.lang.AssertionError:
at org.testng.AssertJUnit.fail(AssertJUnit.java:59) ~[testng-6.14.3.jar:?]
at org.testng.AssertJUnit.assertTrue(AssertJUnit.java:24) ~[testng-6.14.3.jar:?]
at org.testng.AssertJUnit.assertNull(AssertJUnit.java:282) ~[testng-6.14.3.jar:?]
at org.testng.AssertJUnit.assertNull(AssertJUnit.java:274) ~[testng-6.14.3.jar:?]
at
org.infinispan.xsite.AsyncBackupExpirationTest.testExpiredAccess(AsyncBackupExpirationTest.java:182)
~[test-classes/:?]
{noformat}
I believe this can happen in production code just as in the test, if the entry lifespan
was shorter.
So we shouldn't just add a delay in the test, we should make the
{{RemoveExpiredCommand}} wait for the {{IracPutKeyValueCommand}} to finish writing the
value and release the lock.
--
This message was sent by Atlassian Jira
(v8.13.1#813001)