[JBoss JIRA] (ISPN-5454) XSite: RetryMechanismTest random failures
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-5454?page=com.atlassian.jira.plugin.... ]
Dan Berindei updated ISPN-5454:
-------------------------------
Status: Open (was: New)
> XSite: RetryMechanismTest random failures
> -----------------------------------------
>
> Key: ISPN-5454
> URL: https://issues.jboss.org/browse/ISPN-5454
> Project: Infinispan
> Issue Type: Bug
> Components: Cross-Site Replication
> Affects Versions: 7.2.1.Final
> Reporter: Dan Berindei
> Priority: Blocker
> Labels: testsuite_stability
> Fix For: 8.0.0.Alpha1
>
>
> {{ClusteredCacheBackupReceiver.awaitRemoteTask()}} doesn't respect the state push command's timeout, at least when it's smaller than the sync replication timeout in the target cache. When that happens, the state provider will resend the state, and there will be 2 state push commands executing at the same time.
> RetryMechanismTest changes the state push timeout to 2 seconds, but the sync replication timeout stays at 15 seconds. This causes failures in {{testRetryLocally}} and {{testFailRetryLocally}}, if it takes more than 2 seconds to suspect the killed node.
> {noformat}
> 10:02:13,007 TRACE (asyncTransportThread-8,NodeN:) [RetryOnFailureXSiteCommand] Sending XSiteStatePushCommand{cacheName=___defaultcache, timeout=2000 (1 keys)} to [NYC (sync, timeout=2000)]
> 10:02:16,008 TRACE (asyncTransportThread-8,NodeN:) [RetryOnFailureXSiteCommand] Sending XSiteStatePushCommand{cacheName=___defaultcache, timeout=2000 (1 keys)} to [NYC (sync, timeout=2000)]
> 10:02:16,040 TRACE (asyncTransportThread-4,NodeP:) [RpcManagerImpl] replication exception:
> org.infinispan.remoting.transport.jgroups.SuspectException: Node NodeQ-56809 was suspected
> 10:02:16,040 TRACE (asyncTransportThread-0,NodeP:) [RpcManagerImpl] replication exception:
> org.infinispan.remoting.transport.jgroups.SuspectException: Node NodeQ-56809 was suspected
> 10:02:19,147 ERROR (testng-RetryMechanismTest:) [UnitTestTestNGListener] Test testFailRetryLocally(org.infinispan.xsite.statetransfer.failures.RetryMechanismTest) failed.
> java.lang.AssertionError: expected:<2> but was:<3>
> at org.testng.AssertJUnit.fail(AssertJUnit.java:59)
> at org.testng.AssertJUnit.failNotEquals(AssertJUnit.java:364)
> at org.testng.AssertJUnit.assertEquals(AssertJUnit.java:80)
> at org.testng.AssertJUnit.assertEquals(AssertJUnit.java:245)
> at org.testng.AssertJUnit.assertEquals(AssertJUnit.java:252)
> at org.infinispan.xsite.statetransfer.failures.RetryMechanismTest.testFailRetryLocally(RetryMechanismTest.java:227)
> {noformat}
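For illustration only, a minimal sketch of the behaviour the description asks for, using plain java.util.concurrent rather than the actual {{ClusteredCacheBackupReceiver}} code: the receiver-side wait is bounded by the push command's own timeout (2 seconds in the test) instead of the cache's 15-second sync replication timeout, so a retried push can never overlap a still-running apply.
{code}
// Hedged sketch, not Infinispan code: bound the wait on the remote apply task
// by the push command's timeout and cancel it on expiry, so the provider's
// retry never runs concurrently with the first attempt.
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedStatePushWait {

   static <T> T awaitRemoteTask(Future<T> remoteTask, long pushTimeoutMillis)
         throws Exception {
      try {
         // Wait no longer than the timeout carried by the push command itself.
         return remoteTask.get(pushTimeoutMillis, TimeUnit.MILLISECONDS);
      } catch (TimeoutException e) {
         // Cancel the in-flight apply so a resent push never overlaps it.
         remoteTask.cancel(true);
         throw e;
      }
   }

   public static void main(String[] args) throws Exception {
      ExecutorService executor = Executors.newSingleThreadExecutor();
      Future<String> slowApply = executor.submit(() -> {
         Thread.sleep(5_000); // simulated slow state apply on the backup site
         return "state applied";
      });
      try {
         awaitRemoteTask(slowApply, 2_000); // push command timeout = 2s
      } catch (TimeoutException expected) {
         System.out.println("push timed out after 2s, apply task cancelled");
      } finally {
         executor.shutdownNow();
      }
   }
}
{code}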
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[JBoss JIRA] (ISPN-5274) Inconsistent data after transaction rollback (with success on originator)
by RH Bugzilla Integration (JIRA)
[ https://issues.jboss.org/browse/ISPN-5274?page=com.atlassian.jira.plugin.... ]
RH Bugzilla Integration commented on ISPN-5274:
-----------------------------------------------
Dan Berindei <dberinde(a)redhat.com> changed the Status of [bug 1199490|https://bugzilla.redhat.com/show_bug.cgi?id=1199490] from ASSIGNED to POST
> Inconsistent data after transaction rollback (with success on originator)
> -------------------------------------------------------------------------
>
> Key: ISPN-5274
> URL: https://issues.jboss.org/browse/ISPN-5274
> Project: Infinispan
> Issue Type: Bug
> Components: Core
> Affects Versions: 7.1.1.Final
> Reporter: Matej Čimbora
> Assignee: Dan Berindei
> Fix For: 7.2.0.Final
>
>
> Scenario
> Nodes edg-perf[10-13], partition handling on
> 1.Transaction is started on edg-perf10
> {code}
> 10:24:48,228 TRACE [org.infinispan.transaction.xa.TransactionXaAdapter] (DefaultStressor-6) start called on tx GlobalTransaction:<edg-perf10-20667>:38910:local
> {code}
> 2. Value of key_000000000000065B is updated within the transaction
> {code}
> 10:24:48,405 TRACE [org.radargun.service.InfinispanOperations$Cache] (DefaultStressor-6) PUT cache=testCache key=key_000000000000065B value=[6 #15: 296, 501, 1109, 1119, 1459, 1544, 1999, 2083, 2130, 2257, 2298, 2784, 2941, 3009, 3147, ]
> {code}
> 3. Transaction is successfully prepared on edg-perf11 & edg-perf12
> {code}
> 10:24:48,559 TRACE [org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher] (DefaultStressor-6) Responses: [sender=edg-perf11-61837, received=true, suspected=false]
> [sender=edg-perf12-7305, received=true, suspected=false]
> {code}
> 4. Transaction commit is issued
> {code}
> 10:24:48,562 TRACE [org.infinispan.transaction.TransactionCoordinator] (DefaultStressor-6) Committing transaction GlobalTransaction:<edg-perf10-20667>:38910:local
> {code}
> 5. Other participating nodes (edg-perf11 & edg-perf12) are suspected...
> {code}
> 10:24:52,705 TRACE [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (DefaultStressor-6) Target node edg-perf11-61837 left during remote call, ignoring
> 10:24:52,716 TRACE [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (DefaultStressor-6) Target node edg-perf12-7305 left during remote call, ignoring
> {code}
> ... as they had received a new view (without edg-perf10) in the meantime.
> {code}
> 10:24:48,547 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (Incoming-1,edg-perf12-7305) ISPN000093: Received new, MERGED cluster view: MergeView::[edg-perf12-7305|20] (3) [edg-perf12-7305, edg-perf11-61837, edg-perf13-25187], 2 subgroups: [edg-perf13-25187|8] (1) [edg-perf13-25187], [edg-perf13-25187|19] (1) [edg-perf13-25187]
> {code}
> Still, the transaction is committed on edg-perf10 & the updated entry is stored locally
> {code}
> 10:24:52,894 TRACE [org.infinispan.statetransfer.CommitManager] (DefaultStressor-6) Trying to commit. Key=key_000000000000065B. Operation Flag=null, L1 invalidation=false
> 10:24:52,896 TRACE [org.infinispan.statetransfer.CommitManager] (DefaultStressor-6) Committing key=key_000000000000065B. It is a L1 invalidation or a normal put and no tracking is enabled!
> 10:24:52,908 TRACE [org.infinispan.container.DefaultDataContainer] (DefaultStressor-6) Creating new ICE for writing. Existing=ImmortalCacheEntry{key=key_000000000000065B, value=[6 #14: 296, 501, 1109, 1119, 1459, 1544, 1999, 2083, 2130, 2257, 2298, 2784, 2941, 3009, ]}, metadata=EmbeddedMetadata{version=null}, new value=[6 #15: 296, 501, 1109, 1119, 1459, 1544, 1999, 2083, 2130, 2257, 2298, 2784, 2941, 3009, 3147, ]
> {code}
> 6. Other nodes rollback the transaction
> {code}
> 10:24:50,376 DEBUG [org.infinispan.transaction.TransactionTable] (transport-thread-10) Rolling back transaction GlobalTransaction:<edg-perf10-20667>:38910:remote because originator edg-perf10-20667 left the cluster
> {code}
> 7. edg-perf10 receives a new view, containing nodes edg-perf[10,11,13]. Incoming state transfer overwrites the updated value
> {code}
> 10:25:09,614 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (Incoming-2,edg-perf10-20667) ISPN000093: Received new, MERGED cluster view: MergeView::[edg-perf10-20667|22] (3) [edg-perf10-20667, edg-perf11-61837, edg-perf13-25187], 4 subgroups: [edg-perf10-20667|15] (2) [edg-perf10-20667, edg-perf11-61837], [edg-perf10-20667|19] (3) [edg-perf10-20667, edg-perf12-7305, edg-perf11-61837], [edg-perf10-20667|18] (1) [edg-perf10-20667], [edg-perf10-20667|21] (2) [edg-perf10-20667, edg-perf13-25187]
> {code}
> 8. get operation returns outdated value
> {code}
> 10:26:21,020 TRACE [org.infinispan.remoting.rpc.RpcManagerImpl] (DefaultStressor-6) Response(s) to ClusteredGetCommand{key=key_000000000000065B, flags=null} is {edg-perf12-7305=SuccessfulResponse{responseValue=ImmortalCacheValue {value=[6 #14: 296, 501, 1109, 1119, 1459, 1544, 1999, 2083, 2130, 2257, 2298, 2784, 2941, 3009, ]}} }
> {code}
> From the client's perspective, this behavior is not transparent: since the transaction completed successfully, the client can reasonably assume the updated entry is present.
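To make the violated contract explicit, a minimal sketch with a hypothetical {{Cache}} interface (not the RadarGun or Infinispan API): once a transactional put has committed successfully on the originator, a later get of the same key must not return the pre-transaction value, yet step 8 above observes version #14 instead of #15.
{code}
// Hedged illustration of the client-side invariant only; the Cache interface
// and the in-memory implementation are stand-ins, not Infinispan classes.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class CommittedReadInvariant {

   interface Cache<K, V> {
      void put(K key, V value);
      V get(K key);
   }

   public static void main(String[] args) {
      Map<String, String> store = new ConcurrentHashMap<>();
      Cache<String, String> cache = new Cache<String, String>() {
         public void put(String k, String v) { store.put(k, v); }
         public String get(String k) { return store.get(k); }
      };

      // Steps 2-4: the client updates the key and the commit reports success.
      cache.put("key_000000000000065B", "version #15");

      // Step 8: a later read must still see the committed value; in the
      // reported scenario it returned the older "version #14" instead.
      String observed = cache.get("key_000000000000065B");
      if (!"version #15".equals(observed)) {
         throw new AssertionError("lost update: observed " + observed);
      }
      System.out.println("invariant holds: " + observed);
   }
}
{code}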
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[JBoss JIRA] (ISPN-4546) Possible stale lock when the primary owner leaves during rebalance
by RH Bugzilla Integration (JIRA)
[ https://issues.jboss.org/browse/ISPN-4546?page=com.atlassian.jira.plugin.... ]
RH Bugzilla Integration commented on ISPN-4546:
-----------------------------------------------
Dan Berindei <dberinde(a)redhat.com> changed the Status of [bug 1163727|https://bugzilla.redhat.com/show_bug.cgi?id=1163727] from ASSIGNED to POST
> Possible stale lock when the primary owner leaves during rebalance
> ------------------------------------------------------------------
>
> Key: ISPN-4546
> URL: https://issues.jboss.org/browse/ISPN-4546
> Project: Infinispan
> Issue Type: Bug
> Components: Core, State Transfer
> Affects Versions: 7.0.0.Alpha5, 7.1.1.Final
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Priority: Critical
> Fix For: 7.2.0.Final
>
>
> Topology T: coordinator = A, owners(k) = [C, D], pending_owners(k) = null
> B sends prepareCommand(tx1, put(k, v)) to C, D
> D adds backup locks and replies
> C acquires lock, ready to send reply to B
> A starts installing topology T+1: owners(k) = [C, D], pending_owners(k) = [C, E]
> A, C and E install topology T+1, B and D do not
> E requests and receives tx data from C, including tx1
> C leaves
> B sees a SuspectException, sends rollbackCommand(tx1) to C, D
> D removes tx1
> C has left, but is ignored
> B reports to the user that the tx has been rolled back
> B and D install topology T+1 (optional)
> A starts installing topology T+2: owners(k) = [D], pending_owners(k) = [E]
> A, B, D, E all install topology T+2
> E requests and receives state from D, but it does not remove tx1
> A starts installing topology T+3: owners(k) = [E], pending_owners(k) = null
> E now has a stale backup lock on k
> It seems very hard to reproduce in production: C would have to leave soon enough so that B and D haven't received the T+1 topology yet, but late enough for it to send its transaction data to E.
> A possible solution would be to catch any SuspectException during prepare/commit/rollback (without ignoring leavers), wait for a new topology, and replicate the command again on the new owners. Obviously, this wouldn't work with asynchronous prepare/commit/rollback.
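As a rough illustration of that idea, a minimal sketch with hypothetical {{Transport}} and {{Topology}} types (not the Infinispan RpcManager API): a SuspectException during a synchronous prepare/commit/rollback is not swallowed for the leaver; instead the caller waits for a newer topology and re-replicates the command to the owners computed from it.
{code}
// Hedged sketch of the retry-on-suspect idea; all types are hypothetical.
import java.util.List;

public class RetryOnSuspectSketch {

   static class SuspectException extends RuntimeException {}

   interface Topology {
      int id();
      List<String> owners(String key);
   }

   interface Transport {
      void replicate(Object command, List<String> targets);
      Topology waitForTopologyNewerThan(int topologyId) throws InterruptedException;
   }

   static void replicateWithRetry(Transport transport, Object command,
                                  String key, Topology current)
         throws InterruptedException {
      Topology topology = current;
      while (true) {
         try {
            transport.replicate(command, topology.owners(key));
            return;
         } catch (SuspectException e) {
            // A target left: do not ignore the leaver; wait until the view
            // change shows up in a newer topology, then resend to its owners.
            topology = transport.waitForTopologyNewerThan(topology.id());
         }
      }
   }

   public static void main(String[] args) throws InterruptedException {
      // Toy transport: the first replicate() suspects a leaver, the retry
      // against the next topology's owners succeeds.
      Transport transport = new Transport() {
         boolean failedOnce;
         public void replicate(Object command, List<String> targets) {
            if (!failedOnce) { failedOnce = true; throw new SuspectException(); }
            System.out.println("replicated " + command + " to " + targets);
         }
         public Topology waitForTopologyNewerThan(int topologyId) {
            return topology(topologyId + 1, List.of("D", "E"));
         }
      };
      replicateWithRetry(transport, "rollback(tx1)", "k",
                         topology(7, List.of("C", "D")));
   }

   static Topology topology(int topologyId, List<String> owners) {
      return new Topology() {
         public int id() { return topologyId; }
         public List<String> owners(String key) { return owners; }
      };
   }
}
{code}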
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[JBoss JIRA] (ISPN-4503) Commands with topology id 0 are not properly ignored on joiners
by RH Bugzilla Integration (JIRA)
[ https://issues.jboss.org/browse/ISPN-4503?page=com.atlassian.jira.plugin.... ]
RH Bugzilla Integration commented on ISPN-4503:
-----------------------------------------------
Dan Berindei <dberinde(a)redhat.com> changed the Status of [bug 1220328|https://bugzilla.redhat.com/show_bug.cgi?id=1220328] from NEW to ASSIGNED
> Commands with topology id 0 are not properly ignored on joiners
> ---------------------------------------------------------------
>
> Key: ISPN-4503
> URL: https://issues.jboss.org/browse/ISPN-4503
> Project: Infinispan
> Issue Type: Bug
> Components: Core, State Transfer
> Affects Versions: 7.0.0.Alpha4
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Labels: testsuite_stability
> Fix For: 7.0.0.Beta1
>
>
> InboundInvocationHandlerImpl is supposed to ignore commands sent with a topology id smaller than the first topology id in which the local node was a member. But there is a loophole when the command is sent with topology id 0.
> This is visible in StateTransferFunctionalTest, where the writing thread keeps the CPU busy and can delay the 2nd node's join for a long time (especially when run on a single core with {{taskset -c 0}}). For some reason, the PrepareCommands are sent only to the local node, while the TxCompletionNotificationCommands are sent to the entire cluster ({{null}}). When the 2nd node manages to join, it receives a lot of TxCompletionNotificationCommands, and processing them delays the processing of the rebalance commands. Since the writes eventually block waiting for the new topology to be installed on the joiner, the delayed rebalance commands cause the write to time out.
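For illustration only, a minimal sketch of the intended check with the topology-id-0 loophole closed (hypothetical names, not the actual {{InboundInvocationHandlerImpl}} code):
{code}
// Hedged sketch: a joiner processes a command only if the command's topology
// id is at least the first topology in which the local node was a member.
// No special case for 0, which is the loophole described above.
public class JoinerCommandFilter {

   /** First topology id in which the local node was a member; set at join. */
   private final int firstLocalTopologyId;

   JoinerCommandFilter(int firstLocalTopologyId) {
      this.firstLocalTopologyId = firstLocalTopologyId;
   }

   boolean shouldProcess(int commandTopologyId) {
      // A command stamped with a topology older than the one this node joined
      // in is stale for this node and must be ignored, even if the stamp is 0.
      return commandTopologyId >= firstLocalTopologyId;
   }

   public static void main(String[] args) {
      JoinerCommandFilter joiner = new JoinerCommandFilter(5);
      System.out.println(joiner.shouldProcess(0)); // false: stale, ignored
      System.out.println(joiner.shouldProcess(5)); // true
      System.out.println(joiner.shouldProcess(7)); // true
   }
}
{code}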
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[JBoss JIRA] (ISPN-4503) Commands with topology id 0 are not properly ignored on joiners
by RH Bugzilla Integration (JIRA)
[ https://issues.jboss.org/browse/ISPN-4503?page=com.atlassian.jira.plugin.... ]
RH Bugzilla Integration updated ISPN-4503:
------------------------------------------
Bugzilla Update: Perform
Bugzilla References: https://bugzilla.redhat.com/show_bug.cgi?id=1220328
> Commands with topology id 0 are not properly ignored on joiners
> ---------------------------------------------------------------
>
> Key: ISPN-4503
> URL: https://issues.jboss.org/browse/ISPN-4503
> Project: Infinispan
> Issue Type: Bug
> Components: Core, State Transfer
> Affects Versions: 7.0.0.Alpha4
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Labels: testsuite_stability
> Fix For: 7.0.0.Beta1
>
>
> InboundInvocationHandlerImpl is supposed to ignore commands sent with a topology id smaller than the first topology id in which the local node was a member. But there is a loophole when the command is sent with topology id 0.
> This is visible in StateTransferFunctionalTest, where the writing thread keeps the CPU busy and can delay the 2nd node's join for a long time (especially when run on a single core with {{taskset -c 0}}). For some reason, the PrepareCommands are sent only to the local node, while the TxCompletionNotificationCommands are sent to the entire cluster ({{null}}). When the 2nd node manages to join, it receives a lot of TxCompletionNotificationCommands, and processing them delays the processing of the rebalance commands. Since the writes eventually block waiting for the new topology to be installed on the joiner, the delayed rebalance commands cause the write to time out.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[JBoss JIRA] (ISPN-4504) Topology id is not properly set on ClusteredGetCommands
by RH Bugzilla Integration (JIRA)
[ https://issues.jboss.org/browse/ISPN-4504?page=com.atlassian.jira.plugin.... ]
RH Bugzilla Integration updated ISPN-4504:
------------------------------------------
Bugzilla Update: Perform
Bugzilla References: https://bugzilla.redhat.com/show_bug.cgi?id=1220328
> Topology id is not properly set on ClusteredGetCommands
> -------------------------------------------------------
>
> Key: ISPN-4504
> URL: https://issues.jboss.org/browse/ISPN-4504
> Project: Infinispan
> Issue Type: Bug
> Components: Core
> Affects Versions: 7.0.0.Alpha4
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Fix For: 7.0.0.Beta1
>
>
> Because the topology id is not properly set on ClusteredGetCommands, they don't wait for the sender's topology to be installed on the target.
> I tried to fix this while implementing a fix for ISPN-4503. My solution caused 2 failures in RemoteGetDuringStateTransferTest, scenarios 5 and 7:
> | sc | currentTopologyId | currentTopologyId + 1 (rebalance) | currentTopologyId + 2 (finish) |
> | 5 | 0:remoteGet | 2:sendReply | 0:receiveReply, 1:sendReply |
> | 7 | | 0:remoteGet, 2:sendReply | 0:receiveReply, 1:sendReply |
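As an illustration of the behaviour the issue asks for, a minimal sketch with hypothetical types (not the actual {{ClusteredGetCommand}} API): the sender stamps the remote get with its current topology id, and the target delays execution until it has installed at least that topology.
{code}
// Hedged sketch only; RemoteGetCommand and Receiver are stand-in types.
public class StampedRemoteGetSketch {

   static final class RemoteGetCommand {
      final String key;
      final int topologyId; // stamped by the sender, never left at the default 0
      RemoteGetCommand(String key, int topologyId) {
         this.key = key;
         this.topologyId = topologyId;
      }
   }

   static final class Receiver {
      private int installedTopologyId;

      synchronized void installTopology(int id) {
         installedTopologyId = id;
         notifyAll();
      }

      synchronized String handle(RemoteGetCommand cmd, long timeoutMillis)
            throws InterruptedException {
         long deadline = System.currentTimeMillis() + timeoutMillis;
         // Wait until the sender's topology is installed locally before reading.
         while (installedTopologyId < cmd.topologyId) {
            long left = deadline - System.currentTimeMillis();
            if (left <= 0) {
               throw new IllegalStateException(
                     "timed out waiting for topology " + cmd.topologyId);
            }
            wait(left);
         }
         return "value-of-" + cmd.key; // placeholder read once topologies line up
      }
   }

   public static void main(String[] args) throws Exception {
      Receiver receiver = new Receiver();
      receiver.installTopology(1);
      // The sender is already on topology 2, so the command is stamped with 2.
      RemoteGetCommand cmd = new RemoteGetCommand("key_000000000000065B", 2);
      new Thread(() -> receiver.installTopology(2)).start(); // target catches up
      System.out.println(receiver.handle(cmd, 1_000));
   }
}
{code}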
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[JBoss JIRA] (ISPN-5454) XSite: RetryMechanismTest random failures
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-5454?page=com.atlassian.jira.plugin.... ]
Dan Berindei updated ISPN-5454:
-------------------------------
Description:
{{ClusteredCacheBackupReceiver.awaitRemoteTask()}} doesn't respect the state push command's timeout, at least when it's smaller than the sync replication timeout in the target cache. When that happens, the state provider will resend the state, and there will be 2 state push commands executing at the same time.
RetryMechanismTest changes the state push timeout to 2 seconds, but the sync replication timeout stays at 15 seconds. This causes failures in {{testRetryLocally}} and {{testFailRetryLocally}}, if it takes more than 2 seconds to suspect the killed node.
{noformat}
10:02:13,007 TRACE (asyncTransportThread-8,NodeN:) [RetryOnFailureXSiteCommand] Sending XSiteStatePushCommand{cacheName=___defaultcache, timeout=2000 (1 keys)} to [NYC (sync, timeout=2000)]
10:02:16,008 TRACE (asyncTransportThread-8,NodeN:) [RetryOnFailureXSiteCommand] Sending XSiteStatePushCommand{cacheName=___defaultcache, timeout=2000 (1 keys)} to [NYC (sync, timeout=2000)]
10:02:16,040 TRACE (asyncTransportThread-4,NodeP:) [RpcManagerImpl] replication exception:
org.infinispan.remoting.transport.jgroups.SuspectException: Node NodeQ-56809 was suspected
10:02:16,040 TRACE (asyncTransportThread-0,NodeP:) [RpcManagerImpl] replication exception:
org.infinispan.remoting.transport.jgroups.SuspectException: Node NodeQ-56809 was suspected
10:02:19,147 ERROR (testng-RetryMechanismTest:) [UnitTestTestNGListener] Test testFailRetryLocally(org.infinispan.xsite.statetransfer.failures.RetryMechanismTest) failed.
java.lang.AssertionError: expected:<2> but was:<3>
at org.testng.AssertJUnit.fail(AssertJUnit.java:59)
at org.testng.AssertJUnit.failNotEquals(AssertJUnit.java:364)
at org.testng.AssertJUnit.assertEquals(AssertJUnit.java:80)
at org.testng.AssertJUnit.assertEquals(AssertJUnit.java:245)
at org.testng.AssertJUnit.assertEquals(AssertJUnit.java:252)
at org.infinispan.xsite.statetransfer.failures.RetryMechanismTest.testFailRetryLocally(RetryMechanismTest.java:227)
{noformat}
was:
{{ClusteredCacheBackupReceiver.awaitRemoteTask()}} doesn't respect the state push command's timeout, at least when it's smaller than the sync replication timeout in the target cache. When that happens, the state provider will resend the state, and there will be 2 state push commands executing at the same time.
RetryMechanismTest changes the state push timeout to 2 seconds, but the sync replication timeout stays at 15 seconds. This causes failures in {{testRetryLocally}} and {{testFailRetryLocally}}, if it takes more than 2 seconds to suspect the killed node.
{noformat}
10:02:13,007 TRACE (asyncTransportThread-8,NodeN:) [RetryOnFailureXSiteCommand] Sending XSiteStatePushCommand{cacheName=___defaultcache, timeout=2000 (1 keys)} to [NYC (sync, timeout=2000)]
10:02:16,008 TRACE (asyncTransportThread-8,NodeN:) [RetryOnFailureXSiteCommand] Sending XSiteStatePushCommand{cacheName=___defaultcache, timeout=2000 (1 keys)} to [NYC (sync, timeout=2000)]
10:02:16,040 TRACE (asyncTransportThread-4,NodeP:) [RpcManagerImpl] replication exception:
org.infinispan.remoting.transport.jgroups.SuspectException: Node NodeQ-56809 was suspected
10:02:16,040 TRACE (asyncTransportThread-0,NodeP:) [RpcManagerImpl] replication exception:
org.infinispan.remoting.transport.jgroups.SuspectException: Node NodeQ-56809 was suspected
10:02:19,147 ERROR (testng-RetryMechanismTest:) [UnitTestTestNGListener] Test testFailRetryLocally(org.infinispan.xsite.statetransfer.failures.RetryMechanismTest) failed.
java.lang.AssertionError: expected:<2> but was:<3>
at org.testng.AssertJUnit.fail(AssertJUnit.java:59)
at org.testng.AssertJUnit.failNotEquals(AssertJUnit.java:364)
at org.testng.AssertJUnit.assertEquals(AssertJUnit.java:80)
at org.testng.AssertJUnit.assertEquals(AssertJUnit.java:245)
at org.testng.AssertJUnit.assertEquals(AssertJUnit.java:252)
at org.infinispan.xsite.statetransfer.failures.RetryMechanismTest.testFailRetryLocally(RetryMechanismTest.java:227)
{noformat}
> XSite: RetryMechanismTest random failures
> -----------------------------------------
>
> Key: ISPN-5454
> URL: https://issues.jboss.org/browse/ISPN-5454
> Project: Infinispan
> Issue Type: Bug
> Components: Cross-Site Replication
> Affects Versions: 7.2.1.Final
> Reporter: Dan Berindei
> Priority: Blocker
> Labels: testsuite_stability
> Fix For: 8.0.0.Alpha1
>
>
> {{ClusteredCacheBackupReceiver.awaitRemoteTask()}} doesn't respect the state push command's timeout, at least when it's smaller than the sync replication timeout in the target cache. When that happens, the state provider will resend the state, and there will be 2 state push commands executing at the same time.
> RetryMechanismTest changes the state push timeout to 2 seconds, but the sync replication timeout stays at 15 seconds. This causes failures in {{testRetryLocally}} and {{testFailRetryLocally}}, if it takes more than 2 seconds to suspect the killed node.
> {noformat}
> 10:02:13,007 TRACE (asyncTransportThread-8,NodeN:) [RetryOnFailureXSiteCommand] Sending XSiteStatePushCommand{cacheName=___defaultcache, timeout=2000 (1 keys)} to [NYC (sync, timeout=2000)]
> 10:02:16,008 TRACE (asyncTransportThread-8,NodeN:) [RetryOnFailureXSiteCommand] Sending XSiteStatePushCommand{cacheName=___defaultcache, timeout=2000 (1 keys)} to [NYC (sync, timeout=2000)]
> 10:02:16,040 TRACE (asyncTransportThread-4,NodeP:) [RpcManagerImpl] replication exception:
> org.infinispan.remoting.transport.jgroups.SuspectException: Node NodeQ-56809 was suspected
> 10:02:16,040 TRACE (asyncTransportThread-0,NodeP:) [RpcManagerImpl] replication exception:
> org.infinispan.remoting.transport.jgroups.SuspectException: Node NodeQ-56809 was suspected
> 10:02:19,147 ERROR (testng-RetryMechanismTest:) [UnitTestTestNGListener] Test testFailRetryLocally(org.infinispan.xsite.statetransfer.failures.RetryMechanismTest) failed.
> java.lang.AssertionError: expected:<2> but was:<3>
> at org.testng.AssertJUnit.fail(AssertJUnit.java:59)
> at org.testng.AssertJUnit.failNotEquals(AssertJUnit.java:364)
> at org.testng.AssertJUnit.assertEquals(AssertJUnit.java:80)
> at org.testng.AssertJUnit.assertEquals(AssertJUnit.java:245)
> at org.testng.AssertJUnit.assertEquals(AssertJUnit.java:252)
> at org.infinispan.xsite.statetransfer.failures.RetryMechanismTest.testFailRetryLocally(RetryMechanismTest.java:227)
> {noformat}
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[JBoss JIRA] (ISPN-5454) XSite: RetryMechanismTest random failures
by Dan Berindei (JIRA)
Dan Berindei created ISPN-5454:
----------------------------------
Summary: XSite: RetryMechanismTest random failures
Key: ISPN-5454
URL: https://issues.jboss.org/browse/ISPN-5454
Project: Infinispan
Issue Type: Bug
Components: Cross-Site Replication
Affects Versions: 7.2.1.Final
Reporter: Dan Berindei
Priority: Blocker
Fix For: 8.0.0.Alpha1
{{ClusteredCacheBackupReceiver.awaitRemoteTask()}} doesn't respect the state push command's timeout, at least when it's smaller than the sync replication timeout in the target cache. When that happens, the state provider will resend the state, and there will be 2 state push commands executing at the same time.
RetryMechanismTest changes the state push timeout to 2 seconds, but the sync replication timeout stays at 15 seconds. This causes failures in {{testRetryLocally}} and {{testFailRetryLocally}}, if it takes more than 2 seconds to suspect the killed node.
{noformat}
10:02:13,007 TRACE (asyncTransportThread-8,NodeN:) [RetryOnFailureXSiteCommand] Sending XSiteStatePushCommand{cacheName=___defaultcache, timeout=2000 (1 keys)} to [NYC (sync, timeout=2000)]
10:02:16,008 TRACE (asyncTransportThread-8,NodeN:) [RetryOnFailureXSiteCommand] Sending XSiteStatePushCommand{cacheName=___defaultcache, timeout=2000 (1 keys)} to [NYC (sync, timeout=2000)]
10:02:16,040 TRACE (asyncTransportThread-4,NodeP:) [RpcManagerImpl] replication exception:
org.infinispan.remoting.transport.jgroups.SuspectException: Node NodeQ-56809 was suspected
10:02:16,040 TRACE (asyncTransportThread-0,NodeP:) [RpcManagerImpl] replication exception:
org.infinispan.remoting.transport.jgroups.SuspectException: Node NodeQ-56809 was suspected
10:02:19,147 ERROR (testng-RetryMechanismTest:) [UnitTestTestNGListener] Test testFailRetryLocally(org.infinispan.xsite.statetransfer.failures.RetryMechanismTest) failed.
java.lang.AssertionError: expected:<2> but was:<3>
at org.testng.AssertJUnit.fail(AssertJUnit.java:59)
at org.testng.AssertJUnit.failNotEquals(AssertJUnit.java:364)
at org.testng.AssertJUnit.assertEquals(AssertJUnit.java:80)
at org.testng.AssertJUnit.assertEquals(AssertJUnit.java:245)
at org.testng.AssertJUnit.assertEquals(AssertJUnit.java:252)
at org.infinispan.xsite.statetransfer.failures.RetryMechanismTest.testFailRetryLocally(RetryMechanismTest.java:227)
{noformat}
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)