[infinispan-issues] [JBoss JIRA] (ISPN-3742) StateTransferReplicationQueueTest random failures

Dan Berindei (JIRA) jira-events at lists.jboss.org
Thu Nov 28 04:05:05 EST 2013


    [ https://issues.jboss.org/browse/ISPN-3742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12927275#comment-12927275 ] 

Dan Berindei commented on ISPN-3742:
------------------------------------

Ok, there seem to be 2 different ways in which the test can fail, because the test uses implicit transactions and every put() and remove() is in a separate transaction.

1. Because the test thread starts checking the values before the background thread finished replicating the commands, it's possible that the main thread will check a value right after the put transaction was executed on the 2nd node but before the remove.
2. Because the test uses async marshalling, it's actually possible for the put and the remove to be executed on the 2nd node in the wrong order (ISPN-2939). That will cause an inconsistency and cause the test to fail.

{noformat}
23:58:10,237 TRACE (WriterThread:nbst-replqueue) [RpcManagerImpl] NodeA-20050 invoking PrepareCommand {modifications=[PutKeyValueCommand{key=test7076, value=org.infinispan.statetransfer.StateTransferReplicationQueueTest$PojoValue at 1bc3, flags=null, putIfAbsent=false, metadata=EmbeddedMetadata{version=null}, successful=true, ignorePreviousValue=true}], onePhaseCommit=true, gtx=GlobalTransaction:<NodeA-20050>:66103:local, cacheName='nbst-replqueue', topologyId=4} to recipient list [NodeA-20050, NodeC-33979] with options RpcOptions{timeout=15000, unit=MILLISECONDS, fifoOrder=true, totalOrder=false, responseFilter=null, responseMode=ASYNCHRONOUS, skipReplicationQueue=false}
23:58:10,237 TRACE (WriterThread:nbst-replqueue) [ReadCommittedEntry] Updating entry (key=test7076 removed=false valid=true changed=true created=true loaded=false value=org.infinispan.statetransfer.StateTransferReplicationQueueTest$PojoValue at 1bc3 metadata=EmbeddedMetadata{version=null}, providedMetadata=null)
23:58:10,257 TRACE (Scheduled-replicationQueue-thread-2759:nbst-replqueue) [RpcManagerImpl] NodeA-20050 invoking MultipleRpcCommand{commands=[32 x PrepareCommand, PrepareCommand {modifications=[PutKeyValueCommand{key=test7076, value=org.infinispan.statetransfer.StateTransferReplicationQueueTest$PojoValue at 1bc3, flags=null, putIfAbsent=false, metadata=EmbeddedMetadata{version=null}, successful=true, ignorePreviousValue=true}], onePhaseCommit=true, gtx=GlobalTransaction:<NodeA-20050>:66103:local, cacheName='nbst-replqueue', topologyId=4}], cacheName='nbst-replqueue'} to single recipient NodeC-33979 with response mode GET_NONE
23:58:10,259 TRACE (asyncTransportThread-5,NodeA:) [CommandAwareRpcDispatcher] Replication task sending MultipleRpcCommand{commands=[PrepareCommand {modifications=[32 x PrepareCommand, PrepareCommand {modifications=[PutKeyValueCommand{key=test7076, value=org.infinispan.statetransfer.StateTransferReplicationQueueTest$PojoValue at 1bc3, flags=null, putIfAbsent=false, metadata=EmbeddedMetadata{version=null}, successful=true, ignorePreviousValue=true}], onePhaseCommit=true, gtx=GlobalTransaction:<NodeA-20050>:66103:local, cacheName='nbst-replqueue', topologyId=4}], cacheName='nbst-replqueue'} to single recipient NodeC-33979 with response mode GET_NONE
23:58:10,263 TRACE (WriterThread:nbst-replqueue) [RpcManagerImpl] NodeA-20050 invoking PrepareCommand {modifications=[RemoveCommand{key=test7076, value=null, flags=null, ignorePreviousValue=true}], onePhaseCommit=true, gtx=GlobalTransaction:<NodeA-20050>:66104:local, cacheName='nbst-replqueue', topologyId=4} to recipient list [NodeA-20050, NodeC-33979] with options RpcOptions{timeout=15000, unit=MILLISECONDS, fifoOrder=true, totalOrder=false, responseFilter=null, responseMode=ASYNCHRONOUS, skipReplicationQueue=false}
23:58:10,263 TRACE (WriterThread:nbst-replqueue) [ReadCommittedEntry] Updating entry (key=test7076 removed=true valid=false changed=true created=false loaded=false value=org.infinispan.statetransfer.StateTransferReplicationQueueTest$PojoValue at 1bc3 metadata=EmbeddedMetadata{version=null}, providedMetadata=null)
23:58:10,359 TRACE (Scheduled-replicationQueue-thread-2759:nbst-replqueue) [RpcManagerImpl] NodeA-20050 invoking MultipleRpcCommand{commands=[PrepareCommand {modifications=[RemoveCommand{key=test7076, value=null, flags=null, ignorePreviousValue=true}], onePhaseCommit=true, gtx=GlobalTransaction:<NodeA-20050>:66104:local, cacheName='nbst-replqueue', topologyId=4}, 27 x PrepareCommand], cacheName='nbst-replqueue'} to recipient list null with options RpcOptions{timeout=15000, unit=MILLISECONDS, fifoOrder=true, totalOrder=false, responseFilter=null, responseMode=ASYNCHRONOUS, skipReplicationQueue=true}
23:58:10,360 TRACE (asyncTransportThread-1,NodeA:) [CommandAwareRpcDispatcher] Replication task sending MultipleRpcCommand{commands=[PrepareCommand {modifications=[RemoveCommand{key=test7076, value=null, flags=null, ignorePreviousValue=true}], onePhaseCommit=true, gtx=GlobalTransaction:<NodeA-20050>:66104:local, cacheName='nbst-replqueue', topologyId=4}, 27 x PrepareCommand], cacheName='nbst-replqueue'} to single recipient NodeC-33979 with response mode GET_NONE
23:58:10,361 TRACE (asyncTransportThread-1,NodeA:) [CommandAwareRpcDispatcher] Response: null
23:58:15,261 TRACE (asyncTransportThread-5,NodeA:) [CommandAwareRpcDispatcher] Response: null
23:59:17,428 TRACE (Incoming-1,NodeC-33979:) [PrepareCommand] Invoking remotely originated prepare: PrepareCommand {modifications=[RemoveCommand{key=test7076, value=null, flags=null, ignorePreviousValue=true}], onePhaseCommit=true, gtx=GlobalTransaction:<NodeA-20050>:66104:local, cacheName='nbst-replqueue', topologyId=4} with invocation context: org.infinispan.context.impl.RemoteTxInvocationContext at 3a27fd82
23:59:17,428 TRACE (Incoming-1,NodeC-33979:nbst-replqueue) [ReadCommittedEntry] Updating entry (key=test7076 removed=true valid=false changed=true created=true loaded=false value=null metadata=null, providedMetadata=null)
23:59:17,715 TRACE (Incoming-1,NodeC-33979:) [PrepareCommand] Invoking remotely originated prepare: PrepareCommand {modifications=[PutKeyValueCommand{key=test7076, value=org.infinispan.statetransfer.StateTransferReplicationQueueTest$PojoValue at 1bc3, flags=null, putIfAbsent=false, metadata=EmbeddedMetadata{version=null}, successful=true, ignorePreviousValue=true}], onePhaseCommit=true, gtx=GlobalTransaction:<NodeA-20050>:66103:local, cacheName='nbst-replqueue', topologyId=4} with invocation context: org.infinispan.context.impl.RemoteTxInvocationContext at 3a27fd81
23:59:17,716 TRACE (Incoming-1,NodeC-33979:nbst-replqueue) [ReadCommittedEntry] Updating entry (key=test7076 removed=false valid=true changed=true created=true loaded=false value=org.infinispan.statetransfer.StateTransferReplicationQueueTest$PojoValue at 1bc3 metadata=EmbeddedMetadata{version=null}, providedMetadata=null)
00:00:00,914 TRACE (testng-StateTransferReplicationQueueTest:nbst-replqueue) [CallInterceptor] Executing command: GetKeyValueCommand {key=test7076, flags=null}.
00:00:00,914 TRACE (testng-StateTransferReplicationQueueTest:nbst-replqueue) [GetKeyValueCommand] Found value org.infinispan.statetransfer.StateTransferReplicationQueueTest$PojoValue at 1bc3
{noformat}
                
> StateTransferReplicationQueueTest random failures
> -------------------------------------------------
>
>                 Key: ISPN-3742
>                 URL: https://issues.jboss.org/browse/ISPN-3742
>             Project: Infinispan
>          Issue Type: Bug
>          Components: Test Suite
>    Affects Versions: 6.0.0.Final
>            Reporter: Dan Berindei
>            Assignee: Mircea Markus
>              Labels: testsuite_stability
>
> The writer thread in StateTransferReplicationQueueTest is too fast, and sometimes it generates so many writes that it takes more than 5 seconds for the second node to apply all of them. 
> We should use a much smaller limit for the replication queue, and the test should wait until all the commands have been replicated instead of using an arbitrary timeout.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


More information about the infinispan-issues mailing list