October 2018 - infinispan-issues - Jboss List Archives

[JBoss JIRA] (ISPN-9601) Hot Rod client cluster switching over JMX

by Tristan Tarrant (Jira)

[ https://issues.jboss.org/browse/ISPN-9601?page=com.atlassian.jira.plugin.... ] Tristan Tarrant updated ISPN-9601: ---------------------------------- Status: Pull Request Sent (was: Open) Git Pull Request: https://github.com/infinispan/infinispan/pull/6330 > Hot Rod client cluster switching over JMX > ----------------------------------------- > > Key: ISPN-9601 > URL: https://issues.jboss.org/browse/ISPN-9601 > Project: Infinispan > Issue Type: Task > Components: Hot Rod, JMX, reporting and management > Affects Versions: 9.4.0.Final > Reporter: Tristan Tarrant > Assignee: Tristan Tarrant > Priority: Major > Fix For: 9.4.1.Final > > > The RemoteCacheManager MBean should expose the switchToCluster methods. -- This message was sent by Atlassian Jira (v7.12.1#712002)

7 years, 6 months

1
0
0 / 0

[JBoss JIRA] (ISPN-9602) Use Util.EMPTY_BYTE_ARRAY everywhere instead of allocating new ones

by Tristan Tarrant (Jira)

[ https://issues.jboss.org/browse/ISPN-9602?page=com.atlassian.jira.plugin.... ] Tristan Tarrant updated ISPN-9602: ---------------------------------- Status: Open (was: New) > Use Util.EMPTY_BYTE_ARRAY everywhere instead of allocating new ones > ------------------------------------------------------------------- > > Key: ISPN-9602 > URL: https://issues.jboss.org/browse/ISPN-9602 > Project: Infinispan > Issue Type: Enhancement > Components: Core > Affects Versions: 9.4.0.Final > Reporter: Tristan Tarrant > Assignee: Tristan Tarrant > Priority: Trivial > Fix For: 9.4.1.Final > > -- This message was sent by Atlassian Jira (v7.12.1#712002)

7 years, 6 months

1
0
0 / 0

[JBoss JIRA] (ISPN-9602) Use Util.EMPTY_BYTE_ARRAY everywhere instead of allocating new ones

by Tristan Tarrant (Jira)

Tristan Tarrant created ISPN-9602: ------------------------------------- Summary: Use Util.EMPTY_BYTE_ARRAY everywhere instead of allocating new ones Key: ISPN-9602 URL: https://issues.jboss.org/browse/ISPN-9602 Project: Infinispan Issue Type: Enhancement Components: Core Affects Versions: 9.4.0.Final Reporter: Tristan Tarrant Assignee: Tristan Tarrant Fix For: 9.4.1.Final -- This message was sent by Atlassian Jira (v7.12.1#712002)

7 years, 6 months

1
0
0 / 0

[JBoss JIRA] (ISPN-9601) Hot Rod client cluster switching over JMX

by Tristan Tarrant (Jira)

Tristan Tarrant created ISPN-9601: ------------------------------------- Summary: Hot Rod client cluster switching over JMX Key: ISPN-9601 URL: https://issues.jboss.org/browse/ISPN-9601 Project: Infinispan Issue Type: Task Components: Hot Rod, JMX, reporting and management Affects Versions: 9.4.0.Final Reporter: Tristan Tarrant Assignee: Tristan Tarrant Fix For: 9.4.1.Final The RemoteCacheManager MBean should expose the switchToCluster methods. -- This message was sent by Atlassian Jira (v7.12.1#712002)

7 years, 6 months

1
0
0 / 0

[JBoss JIRA] (ISPN-9323) RemoteCache.getWithMetadata returns expired metadata when server is transactional cache

by Tristan Tarrant (Jira)

[ https://issues.jboss.org/browse/ISPN-9323?page=com.atlassian.jira.plugin.... ] Tristan Tarrant reassigned ISPN-9323: ------------------------------------- Assignee: William Burns > RemoteCache.getWithMetadata returns expired metadata when server is transactional cache > --------------------------------------------------------------------------------------- > > Key: ISPN-9323 > URL: https://issues.jboss.org/browse/ISPN-9323 > Project: Infinispan > Issue Type: Bug > Components: Hot Rod, Server > Affects Versions: 9.2.4.Final, 9.4.0.Final, 9.3.1.Final > Reporter: Marek Posolda > Assignee: William Burns > Priority: Major > Attachments: RemoteCacheExpirationTest.java > > > I have transactional cache defined on server side (infinispan-server). I have HotRod client application (RemoteCache), which is connected to server above and which calls this: > {code} > remoteCache.put("key", "value", 1000, TimeUnit.SECONDS, 100, TimeUnit.SECONDS); > MetadataValue loaded = remoteCache.getWithMetadata("key"); > System.out.println("created: " + loaded.getCreated() + ", lastUsed: " + loaded.getLastUsed()); > {code} > The output is: > {code} > created: 0, lastUsed: 0 > {code} > This is not correct and causes some issues as entries are treated as expired. Especially when used together with remote-store, which is not able to load such entries, as PersistenceUtil#loadAndCheckExpiration method always treat the item loaded from RemoteCache as expired. See the attached application for more details. > STEPS TO REPRODUCE: > 1) Use infinispan-server-9.2.4.Final on server side and HotRod client (RemoteCache) version 9.2.4.Final on client side (maven dependency org.infinispan:infinispan-client-hotrod:9.2.4.Final for client app) > 2) Unzip server 9.2.4.Final to some directory > 3) Change ISPN_SERVER_HOME/standalone/configuration/clustered.xml and just add this transactional cache to the cache-container (NOTE: configuration "transactional" is already defined in the clustered.xml file): > {code} > <distributed-cache name="trans" configuration="transactional" /> > {code} > 4) Run server with: > {code} > ./standalone.sh -c clustered.xml > {code} > 5) Run the client application from attachement. You can see the output as described above and also that entry is considered as expired (Example app uses same algorithm like RemoteStore together with PersistenceUtil.loadAndCheckExpired ). Original sources here: https://github.com/mposolda/misc/blob/master/ispn-client-listener/src/mai... -- This message was sent by Atlassian Jira (v7.12.1#712002)

7 years, 6 months

1
0
0 / 0

[JBoss JIRA] (ISPN-9323) RemoteCache.getWithMetadata returns expired metadata when server is transactional cache

by Marek Posolda (Jira)

[ https://issues.jboss.org/browse/ISPN-9323?page=com.atlassian.jira.plugin.... ] Marek Posolda updated ISPN-9323: -------------------------------- Affects Version/s: 9.4.0.Final [~NadirX] Unfortunately can be still reproduced with 9.4.0.Final as well > RemoteCache.getWithMetadata returns expired metadata when server is transactional cache > --------------------------------------------------------------------------------------- > > Key: ISPN-9323 > URL: https://issues.jboss.org/browse/ISPN-9323 > Project: Infinispan > Issue Type: Bug > Components: Hot Rod, Server > Affects Versions: 9.2.4.Final, 9.4.0.Final, 9.3.1.Final > Reporter: Marek Posolda > Priority: Major > Attachments: RemoteCacheExpirationTest.java > > > I have transactional cache defined on server side (infinispan-server). I have HotRod client application (RemoteCache), which is connected to server above and which calls this: > {code} > remoteCache.put("key", "value", 1000, TimeUnit.SECONDS, 100, TimeUnit.SECONDS); > MetadataValue loaded = remoteCache.getWithMetadata("key"); > System.out.println("created: " + loaded.getCreated() + ", lastUsed: " + loaded.getLastUsed()); > {code} > The output is: > {code} > created: 0, lastUsed: 0 > {code} > This is not correct and causes some issues as entries are treated as expired. Especially when used together with remote-store, which is not able to load such entries, as PersistenceUtil#loadAndCheckExpiration method always treat the item loaded from RemoteCache as expired. See the attached application for more details. > STEPS TO REPRODUCE: > 1) Use infinispan-server-9.2.4.Final on server side and HotRod client (RemoteCache) version 9.2.4.Final on client side (maven dependency org.infinispan:infinispan-client-hotrod:9.2.4.Final for client app) > 2) Unzip server 9.2.4.Final to some directory > 3) Change ISPN_SERVER_HOME/standalone/configuration/clustered.xml and just add this transactional cache to the cache-container (NOTE: configuration "transactional" is already defined in the clustered.xml file): > {code} > <distributed-cache name="trans" configuration="transactional" /> > {code} > 4) Run server with: > {code} > ./standalone.sh -c clustered.xml > {code} > 5) Run the client application from attachement. You can see the output as described above and also that entry is considered as expired (Example app uses same algorithm like RemoteStore together with PersistenceUtil.loadAndCheckExpired ). Original sources here: https://github.com/mposolda/misc/blob/master/ispn-client-listener/src/mai... -- This message was sent by Atlassian Jira (v7.12.1#712002)

7 years, 6 months

1
0
0 / 0

[JBoss JIRA] (ISPN-9064) SiteManualSwitchTest.testManualClusterSwitch randomly fails

by Tristan Tarrant (Jira)

[ https://issues.jboss.org/browse/ISPN-9064?page=com.atlassian.jira.plugin.... ] Tristan Tarrant reopened ISPN-9064: ----------------------------------- > SiteManualSwitchTest.testManualClusterSwitch randomly fails > ----------------------------------------------------------- > > Key: ISPN-9064 > URL: https://issues.jboss.org/browse/ISPN-9064 > Project: Infinispan > Issue Type: Bug > Components: Remote Protocols > Reporter: Galder Zamarreño > Priority: Major > Labels: testsuite_stability > > {code} > java.lang.AssertionError: expected:<1> but was:<0> > at org.infinispan.client.hotrod.xsite.AbstractHotRodSiteFailoverTest.assertSiteHit(AbstractHotRodSiteFailoverTest.java:146) > at org.infinispan.client.hotrod.xsite.SiteManualSwitchTest.assertSingleSiteHit(SiteManualSwitchTest.java:47) > at org.infinispan.client.hotrod.xsite.SiteManualSwitchTest.testManualClusterSwitch(SiteManualSwitchTest.java:37) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > {code} -- This message was sent by Atlassian Jira (v7.12.1#712002)

7 years, 6 months

1
0
0 / 0

[JBoss JIRA] (ISPN-9064) SiteManualSwitchTest.testManualClusterSwitch randomly fails

by Tristan Tarrant (Jira)

[ https://issues.jboss.org/browse/ISPN-9064?page=com.atlassian.jira.plugin.... ] Tristan Tarrant updated ISPN-9064: ---------------------------------- Fix Version/s: (was: 9.4.0.Final) > SiteManualSwitchTest.testManualClusterSwitch randomly fails > ----------------------------------------------------------- > > Key: ISPN-9064 > URL: https://issues.jboss.org/browse/ISPN-9064 > Project: Infinispan > Issue Type: Bug > Components: Remote Protocols > Reporter: Galder Zamarreño > Priority: Major > Labels: testsuite_stability > > {code} > java.lang.AssertionError: expected:<1> but was:<0> > at org.infinispan.client.hotrod.xsite.AbstractHotRodSiteFailoverTest.assertSiteHit(AbstractHotRodSiteFailoverTest.java:146) > at org.infinispan.client.hotrod.xsite.SiteManualSwitchTest.assertSingleSiteHit(SiteManualSwitchTest.java:47) > at org.infinispan.client.hotrod.xsite.SiteManualSwitchTest.testManualClusterSwitch(SiteManualSwitchTest.java:37) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > {code} -- This message was sent by Atlassian Jira (v7.12.1#712002)

7 years, 6 months

1
0
0 / 0

[JBoss JIRA] (ISPN-9588) JGroups fails to install new cluster view after coordinator leave

by Dan Berindei (Jira)

[ https://issues.jboss.org/browse/ISPN-9588?page=com.atlassian.jira.plugin.... ] Dan Berindei commented on ISPN-9588: ------------------------------------ I found out why some tests also hang with {{FD_SOCK}}: when one node's {{FD_SOCK}} closes the socket gracefully, the other node doesn't consider it a suspect: {noformat} 10:10:21,248 DEBUG (FD_SOCK pinger-12,C:[]) [FD_SOCK] C: A closed socket gracefully 10:10:21,248 DEBUG (FD_SOCK pinger-12,C:[]) [FD_SOCK] C: pingable_mbrs=[B, C], ping_dest=B 10:10:21,248 TRACE (FD_SOCK pinger-12,C:[]) [FD_SOCK] C: ping_dest=B, ping_sock=Socket[addr=/127.0.0.1,port=33279,localport=33423], cache=A: 127.0.0.1:33159 (2 secs old) 10:10:21,248 TRACE (FD_SOCK acceptor-2,B:[]) [FD_SOCK] B: accepted connection from /127.0.0.1:33423 {noformat} https://github.com/belaban/JGroups/blob/master/src/org/jgroups/protocols/... > JGroups fails to install new cluster view after coordinator leave > ----------------------------------------------------------------- > > Key: ISPN-9588 > URL: https://issues.jboss.org/browse/ISPN-9588 > Project: Infinispan > Issue Type: Bug > Components: Core, Test Suite - Core > Affects Versions: 9.4.0.CR3 > Reporter: Dan Berindei > Assignee: Dan Berindei > Priority: Major > Fix For: 10.0.0.Alpha1 > > Attachments: ISPN-9127_ScatteredCrashInSequenceTest-infinispan-core.log.gz, ISPN-9127_ThreeWaySplitAndMergeTest-infinispan-core.log.gz > > > The core test suite normally shuts down cache managers in the reverse order of their start, so that the coordinator stops last. This should be just an optimization, to reduce the number of coordinator changes, and with it the test suite duration and log size. > Unfortunately, the few tests that stop the coordinator first are routinely failing to stop the cluster properly, usually when there are 2 nodes left in the cluster the 2nd node doesn't receive the view: > {noformat} > 10:08:01,656 DEBUG (jgroups-26,Test-NodeB-20661:[]) [GMS] Test-NodeB-20661: installing view [Test-NodeC-25266|33] (3) [Test-NodeC-25266, Test-NodeA-8936, Test-NodeB-20661] > 10:08:01,862 TRACE (testng-Test:[]) [GMS] Test-NodeC-25266: sending LEAVE request to Test-NodeA-8936 > 10:08:01,863 DEBUG (jgroups-6,Test-NodeA-8936:[]) [GMS] Test-NodeA-8936: members are (3) Test-NodeC-25266,Test-NodeA-8936,Test-NodeB-20661, coord=Test-NodeA-8936: I'm the new coordinator > 10:08:01,896 TRACE (jgroups-6,Test-NodeA-8936:[]) [GMS] Test-NodeA-8936: joiners=[], suspected=[], leaving=[Test-NodeC-25266], new view: [Test-NodeA-8936|34] (2) [Test-NodeA-8936, Test-NodeB-20661] > 10:08:01,896 TRACE (jgroups-6,Test-NodeA-8936:[]) [GMS] Test-NodeA-8936: mcasting view [Test-NodeA-8936|34] (2) [Test-NodeA-8936, Test-NodeB-20661] > 10:08:01,900 DEBUG (jgroups-6,Test-NodeA-8936:[]) [GMS] Test-NodeA-8936: installing view [Test-NodeA-8936|34] (2) [Test-NodeA-8936, Test-NodeB-20661] > 10:08:02,245 TRACE (testng-Test:[]) [JGroupsTransport] Test-NodeB-20661 sending request 140 to Test-NodeC-25266: CacheTopologyControlCommand{cache=org.infinispan.CONFIG, type=LEAVE, sender=Test-NodeB-20661, joinInfo=null, topologyId=0, rebalanceId=0, currentCH=null, pendingCH=null, availabilityMode=null, phase=null, actualMembers=null, throwable=null, viewId=33} > 10:12:02,247 DEBUG (testng-Test:[]) [LocalTopologyManagerImpl] Error sending the leave request for cache org.infinispan.CONFIG to coordinator > org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 140 from 7d5d965d-eacc-fd47-2ab8-5cec95f4c10b > {noformat} > Sometimes the new coordinator gets stuck in a merge loop, keeps trying to re-include the stopped node and fails: > {noformat} > 10:06:21,484 DEBUG (jgroups-24,Test-NodeC-57941:[]) [GMS] Test-NodeC-57941: installing view [Test-NodeD-37696|8] (3) [Test-NodeD-37696, Test-NodeB-39137, Test-NodeC-57941] > 10:06:21,486 DEBUG (jgroups-25,Test-NodeB-39137:[]) [GMS] Test-NodeB-39137: installing view [Test-NodeD-37696|8] (3) [Test-NodeD-37696, Test-NodeB-39137, Test-NodeC-57941] > 10:06:22,070 TRACE (testng-Test:[]) [GMS] Test-NodeD-37696: sending LEAVE request to Test-NodeB-39137 > 10:06:22,078 TRACE (jgroups-10,Test-NodeB-39137:[]) [GMS] Test-NodeB-39137: mcasting view [Test-NodeB-39137|9] (2) [Test-NodeB-39137, Test-NodeC-57941] > 10:06:22,080 DEBUG (jgroups-10,Test-NodeB-39137:[]) [GMS] Test-NodeB-39137: installing view [Test-NodeB-39137|9] (2) [Test-NodeB-39137, Test-NodeC-57941] > 10:06:30,887 DEBUG (jgroups-24,Test-NodeB-39137:[]) [GMS] Test-NodeB-39137: I will be the merge leader. Starting the merge task. Views: {Test-NodeB-39137=[Test-NodeB-39137|9] (2) [Test-NodeB-39137, Test-NodeC-57941], Test-NodeC-57941=[Test-NodeD-37696|8] (3) [Test-NodeD-37696, Test-NodeB-39137, Test-NodeC-57941]} > 10:06:30,888 DEBUG (MergeTask-108,Test-NodeB-39137:[]) [STABLE] suspending message garbage collection > 10:06:30,888 DEBUG (MergeTask-108,Test-NodeB-39137:[]) [STABLE] Test-NodeB-39137: resume task started, max_suspend_time=220000 > 10:06:30,888 DEBUG (MergeTask-108,Test-NodeB-39137:[]) [GMS] Test-NodeB-39137: merge task Test-NodeB-39137::7 started with 2 participants > 10:06:30,888 TRACE (MergeTask-108,Test-NodeB-39137:[]) [GMS] Test-NodeB-39137: sending MERGE_REQ to [Test-NodeD-37696, Test-NodeB-39137] > 10:06:36,041 TRACE (MergeTask-108,Test-NodeB-39137:[]) [GMS] Test-NodeB-39137: collected 1 merge response(s) in 5153 ms > 10:06:36,041 DEBUG (MergeTask-108,Test-NodeB-39137:[]) [GMS] Test-NodeB-39137: merge leader Test-NodeB-39137 did not get responses from all 2 partition coordinators; missing responses from 1 members, removing them from the merge > 10:06:36,043 TRACE (MergeTask-108,Test-NodeB-39137:[]) [GMS] Test-NodeB-39137: consolidated view=MergeView::[Test-NodeB-39137|10] (2) [Test-NodeB-39137, Test-NodeC-57941], 1 subgroups: [Test-NodeB-39137|9] (2) [Test-NodeB-39137, Test-NodeC-57941] > consolidated digest=Test-NodeB-39137: [15 (16)], Test-NodeC-57941: [5 (5)] > 10:06:36,043 DEBUG (MergeTask-108,Test-NodeB-39137:[]) [GMS] Test-NodeB-39137: installing merge view [Test-NodeB-39137|10] (2 members) in 1 coords > 10:06:36,043 DEBUG (MergeTask-108,Test-NodeB-39137:[]) [GMS] Test-NodeB-39137: merge Test-NodeB-39137::7 took 5155 ms > 10:06:36,043 TRACE (jgroups-24,Test-NodeB-39137:[]) [GMS] Test-NodeB-39137: mcasting view MergeView::[Test-NodeB-39137|10] (2) [Test-NodeB-39137, Test-NodeC-57941], 1 subgroups: [Test-NodeB-39137|9] (2) [Test-NodeB-39137, Test-NodeC-57941] > 10:06:49,877 DEBUG (MergeTask-229,Test-NodeB-39137:[]) [GMS] Test-NodeB-39137: merge leader Test-NodeB-39137 did not get responses from all 2 partition coordinators; missing responses from 1 members, removing them from the merge > 10:07:03,755 DEBUG (MergeTask-342,Test-NodeB-39137:[]) [GMS] Test-NodeB-39137: merge leader Test-NodeB-39137 did not get responses from all 2 partition coordinators; missing responses from 1 members, removing them from the merge > 10:07:17,703 DEBUG (MergeTask-447,Test-NodeB-39137:[]) [GMS] Test-NodeB-39137: merge leader Test-NodeB-39137 did not get responses from all 2 partition coordinators; missing responses from 1 members, removing them from the merge > 10:07:31,547 DEBUG (MergeTask-568,Test-NodeB-39137:[]) [GMS] Test-NodeB-39137: merge leader Test-NodeB-39137 did not get responses from all 2 partition coordinators; missing responses from 1 members, removing them from the merge > 10:07:45,381 DEBUG (MergeTask-688,Test-NodeB-39137:[]) [GMS] Test-NodeB-39137: merge leader Test-NodeB-39137 did not get responses from all 2 partition coordinators; missing responses from 1 members, removing them from the merge > 10:07:59,196 DEBUG (MergeTask-806,Test-NodeB-39137:[]) [GMS] Test-NodeB-39137: merge leader Test-NodeB-39137 did not get responses from all 2 partition coordinators; missing responses from 1 members, removing them from the merge > 10:08:13,031 DEBUG (MergeTask-932,Test-NodeB-39137:[]) [GMS] Test-NodeB-39137: merge leader Test-NodeB-39137 did not get responses from all 2 partition coordinators; missing responses from 1 members, removing them from the merge > 10:08:26,885 DEBUG (MergeTask-1061,Test-NodeB-39137:[]) [GMS] Test-NodeB-39137: merge leader Test-NodeB-39137 did not get responses from all 2 partition coordinators; missing responses from 1 members, removing them from the merge > 10:08:40,697 DEBUG (MergeTask-1197,Test-NodeB-39137:[]) [GMS] Test-NodeB-39137: merge leader Test-NodeB-39137 did not get responses from all 2 partition coordinators; missing responses from 1 members, removing them from the merge > 10:08:54,510 DEBUG (MergeTask-1330,Test-NodeB-39137:[]) [GMS] Test-NodeB-39137: merge leader Test-NodeB-39137 did not get responses from all 2 partition coordinators; missing responses from 1 members, removing them from the merge > 10:09:08,324 DEBUG (MergeTask-1463,Test-NodeB-39137:[]) [GMS] Test-NodeB-39137: merge leader Test-NodeB-39137 did not get responses from all 2 partition coordinators; missing responses from 1 members, removing them from the merge > 10:09:22,135 DEBUG (MergeTask-1596,Test-NodeB-39137:[]) [GMS] Test-NodeB-39137: merge leader Test-NodeB-39137 did not get responses from all 2 partition coordinators; missing responses from 1 members, removing them from the merge > 10:09:35,953 DEBUG (MergeTask-1729,Test-NodeB-39137:[]) [GMS] Test-NodeB-39137: merge leader Test-NodeB-39137 did not get responses from all 2 partition coordinators; missing responses from 1 members, removing them from the merge > 10:09:49,770 DEBUG (MergeTask-1857,Test-NodeB-39137:[]) [GMS] Test-NodeB-39137: merge leader Test-NodeB-39137 did not get responses from all 2 partition coordinators; missing responses from 1 members, removing them from the merge > 10:10:03,609 DEBUG (MergeTask-1986,Test-NodeB-39137:[]) [GMS] Test-NodeB-39137: merge leader Test-NodeB-39137 did not get responses from all 2 partition coordinators; missing responses from 1 members, removing them from the merge > 10:10:17,420 DEBUG (MergeTask-2117,Test-NodeB-39137:[]) [GMS] Test-NodeB-39137: merge leader Test-NodeB-39137 did not get responses from all 2 partition coordinators; missing responses from 1 members, removing them from the merge > 10:10:31,252 DEBUG (MergeTask-2248,Test-NodeB-39137:[]) [GMS] Test-NodeB-39137: merge leader Test-NodeB-39137 did not get responses from all 2 partition coordinators; missing responses from 1 members, removing them from the merge > 10:10:45,099 DEBUG (MergeTask-2383,Test-NodeB-39137:[]) [GMS] Test-NodeB-39137: merge leader Test-NodeB-39137 did not get responses from all 2 partition coordinators; missing responses from 1 members, removing them from the merge > 10:10:58,916 DEBUG (MergeTask-2516,Test-NodeB-39137:[]) [GMS] Test-NodeB-39137: merge leader Test-NodeB-39137 did not get responses from all 2 partition coordinators; missing responses from 1 members, removing them from the merge > 10:11:12,743 DEBUG (MergeTask-2648,Test-NodeB-39137:[]) [GMS] Test-NodeB-39137: merge leader Test-NodeB-39137 did not get responses from all 2 partition coordinators; missing responses from 1 members, removing them from the merge > {noformat} -- This message was sent by Atlassian Jira (v7.12.1#712002)

7 years, 6 months

1
0
0 / 0

[JBoss JIRA] (ISPN-9588) JGroups fails to install new cluster view after coordinator leave

by Dan Berindei (Jira)

[ https://issues.jboss.org/browse/ISPN-9588?page=com.atlassian.jira.plugin.... ] Dan Berindei updated ISPN-9588: ------------------------------- Git Pull Request: https://github.com/infinispan/infinispan/pull/6329 > JGroups fails to install new cluster view after coordinator leave > ----------------------------------------------------------------- > > Key: ISPN-9588 > URL: https://issues.jboss.org/browse/ISPN-9588 > Project: Infinispan > Issue Type: Bug > Components: Core, Test Suite - Core > Affects Versions: 9.4.0.CR3 > Reporter: Dan Berindei > Assignee: Dan Berindei > Priority: Major > Fix For: 10.0.0.Alpha1 > > Attachments: ISPN-9127_ScatteredCrashInSequenceTest-infinispan-core.log.gz, ISPN-9127_ThreeWaySplitAndMergeTest-infinispan-core.log.gz > > > The core test suite normally shuts down cache managers in the reverse order of their start, so that the coordinator stops last. This should be just an optimization, to reduce the number of coordinator changes, and with it the test suite duration and log size. > Unfortunately, the few tests that stop the coordinator first are routinely failing to stop the cluster properly, usually when there are 2 nodes left in the cluster the 2nd node doesn't receive the view: > {noformat} > 10:08:01,656 DEBUG (jgroups-26,Test-NodeB-20661:[]) [GMS] Test-NodeB-20661: installing view [Test-NodeC-25266|33] (3) [Test-NodeC-25266, Test-NodeA-8936, Test-NodeB-20661] > 10:08:01,862 TRACE (testng-Test:[]) [GMS] Test-NodeC-25266: sending LEAVE request to Test-NodeA-8936 > 10:08:01,863 DEBUG (jgroups-6,Test-NodeA-8936:[]) [GMS] Test-NodeA-8936: members are (3) Test-NodeC-25266,Test-NodeA-8936,Test-NodeB-20661, coord=Test-NodeA-8936: I'm the new coordinator > 10:08:01,896 TRACE (jgroups-6,Test-NodeA-8936:[]) [GMS] Test-NodeA-8936: joiners=[], suspected=[], leaving=[Test-NodeC-25266], new view: [Test-NodeA-8936|34] (2) [Test-NodeA-8936, Test-NodeB-20661] > 10:08:01,896 TRACE (jgroups-6,Test-NodeA-8936:[]) [GMS] Test-NodeA-8936: mcasting view [Test-NodeA-8936|34] (2) [Test-NodeA-8936, Test-NodeB-20661] > 10:08:01,900 DEBUG (jgroups-6,Test-NodeA-8936:[]) [GMS] Test-NodeA-8936: installing view [Test-NodeA-8936|34] (2) [Test-NodeA-8936, Test-NodeB-20661] > 10:08:02,245 TRACE (testng-Test:[]) [JGroupsTransport] Test-NodeB-20661 sending request 140 to Test-NodeC-25266: CacheTopologyControlCommand{cache=org.infinispan.CONFIG, type=LEAVE, sender=Test-NodeB-20661, joinInfo=null, topologyId=0, rebalanceId=0, currentCH=null, pendingCH=null, availabilityMode=null, phase=null, actualMembers=null, throwable=null, viewId=33} > 10:12:02,247 DEBUG (testng-Test:[]) [LocalTopologyManagerImpl] Error sending the leave request for cache org.infinispan.CONFIG to coordinator > org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 140 from 7d5d965d-eacc-fd47-2ab8-5cec95f4c10b > {noformat} > Sometimes the new coordinator gets stuck in a merge loop, keeps trying to re-include the stopped node and fails: > {noformat} > 10:06:21,484 DEBUG (jgroups-24,Test-NodeC-57941:[]) [GMS] Test-NodeC-57941: installing view [Test-NodeD-37696|8] (3) [Test-NodeD-37696, Test-NodeB-39137, Test-NodeC-57941] > 10:06:21,486 DEBUG (jgroups-25,Test-NodeB-39137:[]) [GMS] Test-NodeB-39137: installing view [Test-NodeD-37696|8] (3) [Test-NodeD-37696, Test-NodeB-39137, Test-NodeC-57941] > 10:06:22,070 TRACE (testng-Test:[]) [GMS] Test-NodeD-37696: sending LEAVE request to Test-NodeB-39137 > 10:06:22,078 TRACE (jgroups-10,Test-NodeB-39137:[]) [GMS] Test-NodeB-39137: mcasting view [Test-NodeB-39137|9] (2) [Test-NodeB-39137, Test-NodeC-57941] > 10:06:22,080 DEBUG (jgroups-10,Test-NodeB-39137:[]) [GMS] Test-NodeB-39137: installing view [Test-NodeB-39137|9] (2) [Test-NodeB-39137, Test-NodeC-57941] > 10:06:30,887 DEBUG (jgroups-24,Test-NodeB-39137:[]) [GMS] Test-NodeB-39137: I will be the merge leader. Starting the merge task. Views: {Test-NodeB-39137=[Test-NodeB-39137|9] (2) [Test-NodeB-39137, Test-NodeC-57941], Test-NodeC-57941=[Test-NodeD-37696|8] (3) [Test-NodeD-37696, Test-NodeB-39137, Test-NodeC-57941]} > 10:06:30,888 DEBUG (MergeTask-108,Test-NodeB-39137:[]) [STABLE] suspending message garbage collection > 10:06:30,888 DEBUG (MergeTask-108,Test-NodeB-39137:[]) [STABLE] Test-NodeB-39137: resume task started, max_suspend_time=220000 > 10:06:30,888 DEBUG (MergeTask-108,Test-NodeB-39137:[]) [GMS] Test-NodeB-39137: merge task Test-NodeB-39137::7 started with 2 participants > 10:06:30,888 TRACE (MergeTask-108,Test-NodeB-39137:[]) [GMS] Test-NodeB-39137: sending MERGE_REQ to [Test-NodeD-37696, Test-NodeB-39137] > 10:06:36,041 TRACE (MergeTask-108,Test-NodeB-39137:[]) [GMS] Test-NodeB-39137: collected 1 merge response(s) in 5153 ms > 10:06:36,041 DEBUG (MergeTask-108,Test-NodeB-39137:[]) [GMS] Test-NodeB-39137: merge leader Test-NodeB-39137 did not get responses from all 2 partition coordinators; missing responses from 1 members, removing them from the merge > 10:06:36,043 TRACE (MergeTask-108,Test-NodeB-39137:[]) [GMS] Test-NodeB-39137: consolidated view=MergeView::[Test-NodeB-39137|10] (2) [Test-NodeB-39137, Test-NodeC-57941], 1 subgroups: [Test-NodeB-39137|9] (2) [Test-NodeB-39137, Test-NodeC-57941] > consolidated digest=Test-NodeB-39137: [15 (16)], Test-NodeC-57941: [5 (5)] > 10:06:36,043 DEBUG (MergeTask-108,Test-NodeB-39137:[]) [GMS] Test-NodeB-39137: installing merge view [Test-NodeB-39137|10] (2 members) in 1 coords > 10:06:36,043 DEBUG (MergeTask-108,Test-NodeB-39137:[]) [GMS] Test-NodeB-39137: merge Test-NodeB-39137::7 took 5155 ms > 10:06:36,043 TRACE (jgroups-24,Test-NodeB-39137:[]) [GMS] Test-NodeB-39137: mcasting view MergeView::[Test-NodeB-39137|10] (2) [Test-NodeB-39137, Test-NodeC-57941], 1 subgroups: [Test-NodeB-39137|9] (2) [Test-NodeB-39137, Test-NodeC-57941] > 10:06:49,877 DEBUG (MergeTask-229,Test-NodeB-39137:[]) [GMS] Test-NodeB-39137: merge leader Test-NodeB-39137 did not get responses from all 2 partition coordinators; missing responses from 1 members, removing them from the merge > 10:07:03,755 DEBUG (MergeTask-342,Test-NodeB-39137:[]) [GMS] Test-NodeB-39137: merge leader Test-NodeB-39137 did not get responses from all 2 partition coordinators; missing responses from 1 members, removing them from the merge > 10:07:17,703 DEBUG (MergeTask-447,Test-NodeB-39137:[]) [GMS] Test-NodeB-39137: merge leader Test-NodeB-39137 did not get responses from all 2 partition coordinators; missing responses from 1 members, removing them from the merge > 10:07:31,547 DEBUG (MergeTask-568,Test-NodeB-39137:[]) [GMS] Test-NodeB-39137: merge leader Test-NodeB-39137 did not get responses from all 2 partition coordinators; missing responses from 1 members, removing them from the merge > 10:07:45,381 DEBUG (MergeTask-688,Test-NodeB-39137:[]) [GMS] Test-NodeB-39137: merge leader Test-NodeB-39137 did not get responses from all 2 partition coordinators; missing responses from 1 members, removing them from the merge > 10:07:59,196 DEBUG (MergeTask-806,Test-NodeB-39137:[]) [GMS] Test-NodeB-39137: merge leader Test-NodeB-39137 did not get responses from all 2 partition coordinators; missing responses from 1 members, removing them from the merge > 10:08:13,031 DEBUG (MergeTask-932,Test-NodeB-39137:[]) [GMS] Test-NodeB-39137: merge leader Test-NodeB-39137 did not get responses from all 2 partition coordinators; missing responses from 1 members, removing them from the merge > 10:08:26,885 DEBUG (MergeTask-1061,Test-NodeB-39137:[]) [GMS] Test-NodeB-39137: merge leader Test-NodeB-39137 did not get responses from all 2 partition coordinators; missing responses from 1 members, removing them from the merge > 10:08:40,697 DEBUG (MergeTask-1197,Test-NodeB-39137:[]) [GMS] Test-NodeB-39137: merge leader Test-NodeB-39137 did not get responses from all 2 partition coordinators; missing responses from 1 members, removing them from the merge > 10:08:54,510 DEBUG (MergeTask-1330,Test-NodeB-39137:[]) [GMS] Test-NodeB-39137: merge leader Test-NodeB-39137 did not get responses from all 2 partition coordinators; missing responses from 1 members, removing them from the merge > 10:09:08,324 DEBUG (MergeTask-1463,Test-NodeB-39137:[]) [GMS] Test-NodeB-39137: merge leader Test-NodeB-39137 did not get responses from all 2 partition coordinators; missing responses from 1 members, removing them from the merge > 10:09:22,135 DEBUG (MergeTask-1596,Test-NodeB-39137:[]) [GMS] Test-NodeB-39137: merge leader Test-NodeB-39137 did not get responses from all 2 partition coordinators; missing responses from 1 members, removing them from the merge > 10:09:35,953 DEBUG (MergeTask-1729,Test-NodeB-39137:[]) [GMS] Test-NodeB-39137: merge leader Test-NodeB-39137 did not get responses from all 2 partition coordinators; missing responses from 1 members, removing them from the merge > 10:09:49,770 DEBUG (MergeTask-1857,Test-NodeB-39137:[]) [GMS] Test-NodeB-39137: merge leader Test-NodeB-39137 did not get responses from all 2 partition coordinators; missing responses from 1 members, removing them from the merge > 10:10:03,609 DEBUG (MergeTask-1986,Test-NodeB-39137:[]) [GMS] Test-NodeB-39137: merge leader Test-NodeB-39137 did not get responses from all 2 partition coordinators; missing responses from 1 members, removing them from the merge > 10:10:17,420 DEBUG (MergeTask-2117,Test-NodeB-39137:[]) [GMS] Test-NodeB-39137: merge leader Test-NodeB-39137 did not get responses from all 2 partition coordinators; missing responses from 1 members, removing them from the merge > 10:10:31,252 DEBUG (MergeTask-2248,Test-NodeB-39137:[]) [GMS] Test-NodeB-39137: merge leader Test-NodeB-39137 did not get responses from all 2 partition coordinators; missing responses from 1 members, removing them from the merge > 10:10:45,099 DEBUG (MergeTask-2383,Test-NodeB-39137:[]) [GMS] Test-NodeB-39137: merge leader Test-NodeB-39137 did not get responses from all 2 partition coordinators; missing responses from 1 members, removing them from the merge > 10:10:58,916 DEBUG (MergeTask-2516,Test-NodeB-39137:[]) [GMS] Test-NodeB-39137: merge leader Test-NodeB-39137 did not get responses from all 2 partition coordinators; missing responses from 1 members, removing them from the merge > 10:11:12,743 DEBUG (MergeTask-2648,Test-NodeB-39137:[]) [GMS] Test-NodeB-39137: merge leader Test-NodeB-39137 did not get responses from all 2 partition coordinators; missing responses from 1 members, removing them from the merge > {noformat} -- This message was sent by Atlassian Jira (v7.12.1#712002)

7 years, 6 months

1
0
0 / 0

2026

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

infinispan-issues October 2018