[infinispan-issues] [JBoss JIRA] (ISPN-8976) 2 subclusters failed to merge to 1 cluster - IllegalLifecycleStateException

Dan Berindei (JIRA) issues at jboss.org
Fri Apr 27 09:54:00 EDT 2018


    [ https://issues.jboss.org/browse/ISPN-8976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13568210#comment-13568210 ] 

Dan Berindei commented on ISPN-8976:
------------------------------------

[~robertcernak] From the logs, it looks like the cache manager on node 1 really is stopping when the {{IllegalLifecycleStateException}} is logged:

{noformat}
2018-03-09 11:10:39,151 DEBUG  [Camel (camel-1) thread #0 - seda://systemInitializer] (DefaultCacheManager.java:717) - Stopping cache manager cloud_103 on 744a974a-2811-4f79-ac63-f32daf005d7f-60108
{noformat}

[~ryanemerson] Unfortunately, the coordinator remains the coordinator while it is stopping, so it will try (and fail) to perform conflict resolution if it sees a merge:

{noformat}
2018-03-09 11:10:41,653 DEBUG  [transport-thread-744a974a-2811-4f79-ac63-f32daf005d7f-p4-t6] (DelegatingBasicLogger.java:409) - Recovered 2 partition(s) for cache BacnetAdapterCache: [CacheTopology{id=21, rebalanceId=6, currentCH=DefaultConsistentHash{ns=256, owners = (6)[85d60140-6da8-4618-9540-44d0bdec2b68-42234: 40+45, ed4eb9fb-1c0a-4822-9acb-6af063df4ba6-56678: 45+37, fb4843d6-4073-4890-9afb-c6f8220cc2bd-60553: 45+42, 398c8c9b-5e02-4699-956c-52a55802d546-42840: 40+40, 41fb370e-10b9-4e8e-b139-95643c2450ce-40703: 41+51, b2fd8ffe-5322-4bdc-909d-a418906cebae-7469: 45+41]}, pendingCH=null, unionCH=null, phase=NO_REBALANCE, actualMembers=[85d60140-6da8-4618-9540-44d0bdec2b68-42234, ed4eb9fb-1c0a-4822-9acb-6af063df4ba6-56678, fb4843d6-4073-4890-9afb-c6f8220cc2bd-60553, 398c8c9b-5e02-4699-956c-52a55802d546-42840, 41fb370e-10b9-4e8e-b139-95643c2450ce-40703, b2fd8ffe-5322-4bdc-909d-a418906cebae-7469], persistentUUIDs=[24211d0a-66bc-45c5-8ec7-4ec93d4ab8bb, 4ad1da72-5f44-4ec9-8c8c-d132501b4e20, 262918d4-fa1d-4b80-9ed3-2deb76736036, 860e5eb9-964e-4461-936f-810a8fa3bfd6, ae77f87c-722a-43cd-8041-fc5d660ccf44, f617d3d9-e370-4efb-b000-ce0d35d04b73]}, CacheTopology{id=5, rebalanceId=2, currentCH=DefaultConsistentHash{ns=256, owners = (2)[744a974a-2811-4f79-ac63-f32daf005d7f-60108: 127+129, 41099d73-b9a6-4daf-857b-7436feb5f0a1-48336: 129+127]}, pendingCH=null, unionCH=null, phase=NO_REBALANCE, actualMembers=[744a974a-2811-4f79-ac63-f32daf005d7f-60108, 41099d73-b9a6-4daf-857b-7436feb5f0a1-48336], persistentUUIDs=[e4ecc1db-b374-46e6-b5a0-a0d52eaf538a, a1534115-8f12-4d8f-ad5a-f7f1c28f582d]}]
2018-03-09 11:10:41,905 DEBUG  [transport-thread-744a974a-2811-4f79-ac63-f32daf005d7f-p4-t6] (ClusterCacheStatus.java:180) - Updating topologies after merge for cache BacnetAdapterCache, current topology = CacheTopology{id=22, rebalanceId=7, currentCH=DefaultConsistentHash{ns=256, owners = (6)[85d60140-6da8-4618-9540-44d0bdec2b68-42234: 40+45, ed4eb9fb-1c0a-4822-9acb-6af063df4ba6-56678: 45+37, fb4843d6-4073-4890-9afb-c6f8220cc2bd-60553: 45+42, 398c8c9b-5e02-4699-956c-52a55802d546-42840: 40+40, 41fb370e-10b9-4e8e-b139-95643c2450ce-40703: 41+51, b2fd8ffe-5322-4bdc-909d-a418906cebae-7469: 45+41]}, pendingCH=DefaultConsistentHash{ns=256, owners = (8)[85d60140-6da8-4618-9540-44d0bdec2b68-42234: 29+31, ed4eb9fb-1c0a-4822-9acb-6af063df4ba6-56678: 33+30, fb4843d6-4073-4890-9afb-c6f8220cc2bd-60553: 33+33, 398c8c9b-5e02-4699-956c-52a55802d546-42840: 29+30, 41fb370e-10b9-4e8e-b139-95643c2450ce-40703: 30+31, b2fd8ffe-5322-4bdc-909d-a418906cebae-7469: 34+32, 744a974a-2811-4f79-ac63-f32daf005d7f-60108: 34+34, 41099d73-b9a6-4daf-857b-7436feb5f0a1-48336: 34+35]}, unionCH=DefaultConsistentHash{ns=256, owners = (8)[85d60140-6da8-4618-9540-44d0bdec2b68-42234: 29+31, ed4eb9fb-1c0a-4822-9acb-6af063df4ba6-56678: 33+30, fb4843d6-4073-4890-9afb-c6f8220cc2bd-60553: 33+33, 398c8c9b-5e02-4699-956c-52a55802d546-42840: 29+30, 41fb370e-10b9-4e8e-b139-95643c2450ce-40703: 30+31, b2fd8ffe-5322-4bdc-909d-a418906cebae-7469: 34+32, 744a974a-2811-4f79-ac63-f32daf005d7f-60108: 34+34, 41099d73-b9a6-4daf-857b-7436feb5f0a1-48336: 34+35]}, phase=CONFLICT_RESOLUTION, actualMembers=[85d60140-6da8-4618-9540-44d0bdec2b68-42234, ed4eb9fb-1c0a-4822-9acb-6af063df4ba6-56678, fb4843d6-4073-4890-9afb-c6f8220cc2bd-60553, 398c8c9b-5e02-4699-956c-52a55802d546-42840, 41fb370e-10b9-4e8e-b139-95643c2450ce-40703, b2fd8ffe-5322-4bdc-909d-a418906cebae-7469, 744a974a-2811-4f79-ac63-f32daf005d7f-60108, 41099d73-b9a6-4daf-857b-7436feb5f0a1-48336], persistentUUIDs=[24211d0a-66bc-45c5-8ec7-4ec93d4ab8bb, 
4ad1da72-5f44-4ec9-8c8c-d132501b4e20, 262918d4-fa1d-4b80-9ed3-2deb76736036, 860e5eb9-964e-4461-936f-810a8fa3bfd6, ae77f87c-722a-43cd-8041-fc5d660ccf44, f617d3d9-e370-4efb-b000-ce0d35d04b73, e4ecc1db-b374-46e6-b5a0-a0d52eaf538a, a1534115-8f12-4d8f-ad5a-f7f1c28f582d]}, stable topology = CacheTopology{id=21, rebalanceId=6, currentCH=DefaultConsistentHash{ns=256, owners = (6)[85d60140-6da8-4618-9540-44d0bdec2b68-42234: 40+45, ed4eb9fb-1c0a-4822-9acb-6af063df4ba6-56678: 45+37, fb4843d6-4073-4890-9afb-c6f8220cc2bd-60553: 45+42, 398c8c9b-5e02-4699-956c-52a55802d546-42840: 40+40, 41fb370e-10b9-4e8e-b139-95643c2450ce-40703: 41+51, b2fd8ffe-5322-4bdc-909d-a418906cebae-7469: 45+41]}, pendingCH=null, unionCH=null, phase=NO_REBALANCE, actualMembers=[85d60140-6da8-4618-9540-44d0bdec2b68-42234, ed4eb9fb-1c0a-4822-9acb-6af063df4ba6-56678, fb4843d6-4073-4890-9afb-c6f8220cc2bd-60553, 398c8c9b-5e02-4699-956c-52a55802d546-42840, 41fb370e-10b9-4e8e-b139-95643c2450ce-40703, b2fd8ffe-5322-4bdc-909d-a418906cebae-7469], persistentUUIDs=[24211d0a-66bc-45c5-8ec7-4ec93d4ab8bb, 4ad1da72-5f44-4ec9-8c8c-d132501b4e20, 262918d4-fa1d-4b80-9ed3-2deb76736036, 860e5eb9-964e-4461-936f-810a8fa3bfd6, ae77f87c-722a-43cd-8041-fc5d660ccf44, f617d3d9-e370-4efb-b000-ce0d35d04b73]}, availability mode = null, resolveConflicts = true
2018-03-09 11:10:41,960 ERROR  [transport-thread-744a974a-2811-4f79-ac63-f32daf005d7f-p4-t6] (ClusterCacheStatus.java:599) - ISPN000228: Failed to recover cache BacnetAdapterCache state after the current node became the coordinator
org.infinispan.IllegalLifecycleStateException: Cache container has been stopped and cannot be reused. Recreate the cache container.
	at org.infinispan.manager.DefaultCacheManager.assertIsNotTerminated(DefaultCacheManager.java:956) ~[infinispan-embedded-9.1.4.Final.jar:9.1.4.Final]
	at org.infinispan.manager.DefaultCacheManager.internalGetCache(DefaultCacheManager.java:452) ~[infinispan-embedded-9.1.4.Final.jar:9.1.4.Final]
	at org.infinispan.manager.DefaultCacheManager.getCache(DefaultCacheManager.java:448) ~[infinispan-embedded-9.1.4.Final.jar:9.1.4.Final]
	at org.infinispan.manager.DefaultCacheManager.getCache(DefaultCacheManager.java:434) ~[infinispan-embedded-9.1.4.Final.jar:9.1.4.Final]
	at org.infinispan.topology.ClusterCacheStatus.updateTopologiesAfterMerge(ClusterCacheStatus.java:191) ~[infinispan-embedded-9.1.4.Final.jar:9.1.4.Final]
	at org.infinispan.partitionhandling.impl.PreferAvailabilityStrategy.onPartitionMerge(PreferAvailabilityStrategy.java:197) ~[infinispan-embedded-9.1.4.Final.jar:9.1.4.Final]
	at org.infinispan.topology.ClusterCacheStatus.doMergePartitions(ClusterCacheStatus.java:597) ~[infinispan-embedded-9.1.4.Final.jar:9.1.4.Final]
	at org.infinispan.topology.ClusterTopologyManagerImpl.lambda$recoverClusterStatus$6(ClusterTopologyManagerImpl.java:528) ~[infinispan-embedded-9.1.4.Final.jar:9.1.4.Final]
{noformat}
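
To illustrate the failure mode above, here is a minimal sketch of a coordinator that checks its container's lifecycle state before starting conflict resolution. This is a hypothetical model, not Infinispan code: {{Status}}, {{Container}}, {{Coordinator}} and {{onPartitionMerge}} are names assumed for illustration, and a plain {{IllegalStateException}} stands in for {{IllegalLifecycleStateException}}.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model only: none of these types are Infinispan classes.
final class MergeGuardSketch {

    enum Status { RUNNING, STOPPING, TERMINATED }

    /** Stand-in for a cache container that rejects lookups once it is no longer running. */
    static final class Container {
        volatile Status status = Status.RUNNING;

        String getCache(String name) {
            if (status != Status.RUNNING) {
                throw new IllegalStateException(
                        "Cache container has been stopped and cannot be reused.");
            }
            return "cache:" + name;
        }
    }

    /** Stand-in for the coordinator-side merge handling. */
    static final class Coordinator {
        private final Container container;
        final List<String> resolved = new ArrayList<>();

        Coordinator(Container container) {
            this.container = container;
        }

        /** Returns true if conflict resolution was started, false if it was skipped. */
        boolean onPartitionMerge(String cacheName) {
            if (container.status != Status.RUNNING) {
                return false; // assumption: a stopping coordinator skips conflict resolution
            }
            try {
                resolved.add(container.getCache(cacheName));
                return true;
            } catch (IllegalStateException e) {
                // The container may start stopping between the check and the lookup.
                return false;
            }
        }
    }

    public static void main(String[] args) {
        Container container = new Container();
        Coordinator coordinator = new Coordinator(container);
        System.out.println(coordinator.onPartitionMerge("BacnetAdapterCache")); // true
        container.status = Status.STOPPING;
        System.out.println(coordinator.onPartitionMerge("BacnetAdapterCache")); // false
    }
}
```

Whether skipping is the right behaviour, rather than handing the coordinator role to another node first, is a design question that this sketch does not settle.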

The other nodes also log a {{NullPointerException}}. I think it happens when they receive two conflict resolution topology updates in a row:

{noformat}
2018-03-09 11:10:42,360 DEBUG  [transport-thread-41099d73-b9a6-4daf-857b-7436feb5f0a1-p4-t16] (LocalTopologyManagerImpl.java:394) - Updating local topology for cache BacnetAdapterCache: CacheTopology{id=22, rebalanceId=7, currentCH=DefaultConsistentHash{ns=256, owners = (6)[85d60140-6da8-4618-9540-44d0bdec2b68-42234: 40+45, ed4eb9fb-1c0a-4822-9acb-6af063df4ba6-56678: 45+37, fb4843d6-4073-4890-9afb-c6f8220cc2bd-60553: 45+42, 398c8c9b-5e02-4699-956c-52a55802d546-42840: 40+40, 41fb370e-10b9-4e8e-b139-95643c2450ce-40703: 41+51, b2fd8ffe-5322-4bdc-909d-a418906cebae-7469: 45+41]}, pendingCH=DefaultConsistentHash{ns=256, owners = (8)[85d60140-6da8-4618-9540-44d0bdec2b68-42234: 29+31, ed4eb9fb-1c0a-4822-9acb-6af063df4ba6-56678: 33+30, fb4843d6-4073-4890-9afb-c6f8220cc2bd-60553: 33+33, 398c8c9b-5e02-4699-956c-52a55802d546-42840: 29+30, 41fb370e-10b9-4e8e-b139-95643c2450ce-40703: 30+31, b2fd8ffe-5322-4bdc-909d-a418906cebae-7469: 34+32, 744a974a-2811-4f79-ac63-f32daf005d7f-60108: 34+34, 41099d73-b9a6-4daf-857b-7436feb5f0a1-48336: 34+35]}, unionCH=null, phase=CONFLICT_RESOLUTION, actualMembers=[85d60140-6da8-4618-9540-44d0bdec2b68-42234, ed4eb9fb-1c0a-4822-9acb-6af063df4ba6-56678, fb4843d6-4073-4890-9afb-c6f8220cc2bd-60553, 398c8c9b-5e02-4699-956c-52a55802d546-42840, 41fb370e-10b9-4e8e-b139-95643c2450ce-40703, b2fd8ffe-5322-4bdc-909d-a418906cebae-7469, 744a974a-2811-4f79-ac63-f32daf005d7f-60108, 41099d73-b9a6-4daf-857b-7436feb5f0a1-48336], persistentUUIDs=[24211d0a-66bc-45c5-8ec7-4ec93d4ab8bb, 4ad1da72-5f44-4ec9-8c8c-d132501b4e20, 262918d4-fa1d-4b80-9ed3-2deb76736036, 860e5eb9-964e-4461-936f-810a8fa3bfd6, ae77f87c-722a-43cd-8041-fc5d660ccf44, f617d3d9-e370-4efb-b000-ce0d35d04b73, e4ecc1db-b374-46e6-b5a0-a0d52eaf538a, a1534115-8f12-4d8f-ad5a-f7f1c28f582d]}
2018-03-09 11:10:42,881 DEBUG  [transport-thread-41099d73-b9a6-4daf-857b-7436feb5f0a1-p4-t23] (LocalTopologyManagerImpl.java:394) - Updating local topology for cache BacnetAdapterCache: CacheTopology{id=23, rebalanceId=7, currentCH=DefaultConsistentHash{ns=256, owners = (6)[85d60140-6da8-4618-9540-44d0bdec2b68-42234: 40+45, ed4eb9fb-1c0a-4822-9acb-6af063df4ba6-56678: 45+37, fb4843d6-4073-4890-9afb-c6f8220cc2bd-60553: 45+42, 398c8c9b-5e02-4699-956c-52a55802d546-42840: 40+40, 41fb370e-10b9-4e8e-b139-95643c2450ce-40703: 41+51, b2fd8ffe-5322-4bdc-909d-a418906cebae-7469: 45+41]}, pendingCH=DefaultConsistentHash{ns=256, owners = (6)[85d60140-6da8-4618-9540-44d0bdec2b68-42234: 33+31, ed4eb9fb-1c0a-4822-9acb-6af063df4ba6-56678: 43+20, fb4843d6-4073-4890-9afb-c6f8220cc2bd-60553: 44+23, 398c8c9b-5e02-4699-956c-52a55802d546-42840: 43+17, 41fb370e-10b9-4e8e-b139-95643c2450ce-40703: 44+21, b2fd8ffe-5322-4bdc-909d-a418906cebae-7469: 49+19]}, unionCH=null, phase=CONFLICT_RESOLUTION, actualMembers=[85d60140-6da8-4618-9540-44d0bdec2b68-42234, ed4eb9fb-1c0a-4822-9acb-6af063df4ba6-56678, fb4843d6-4073-4890-9afb-c6f8220cc2bd-60553, 398c8c9b-5e02-4699-956c-52a55802d546-42840, 41fb370e-10b9-4e8e-b139-95643c2450ce-40703, b2fd8ffe-5322-4bdc-909d-a418906cebae-7469, 41099d73-b9a6-4daf-857b-7436feb5f0a1-48336], persistentUUIDs=[24211d0a-66bc-45c5-8ec7-4ec93d4ab8bb, 4ad1da72-5f44-4ec9-8c8c-d132501b4e20, 262918d4-fa1d-4b80-9ed3-2deb76736036, 860e5eb9-964e-4461-936f-810a8fa3bfd6, ae77f87c-722a-43cd-8041-fc5d660ccf44, f617d3d9-e370-4efb-b000-ce0d35d04b73, a1534115-8f12-4d8f-ad5a-f7f1c28f582d]}
2018-03-09 11:10:42,895 ERROR  [transport-thread-41099d73-b9a6-4daf-857b-7436feb5f0a1-p4-t23] (LocalTopologyManagerImpl.java:272) - ISPN000452: Failed to update topology for cache BacnetAdapterCache
java.lang.NullPointerException
	at org.infinispan.distribution.ch.impl.SyncConsistentHashFactory.union(SyncConsistentHashFactory.java:152) ~[infinispan-embedded-9.1.4.Final.jar:9.1.4.Final]
	at org.infinispan.distribution.ch.impl.SyncConsistentHashFactory.union(SyncConsistentHashFactory.java:53) ~[infinispan-embedded-9.1.4.Final.jar:9.1.4.Final]
	at org.infinispan.topology.LocalTopologyManagerImpl.doHandleTopologyUpdate(LocalTopologyManagerImpl.java:342) ~[infinispan-embedded-9.1.4.Final.jar:9.1.4.Final]
	at org.infinispan.topology.LocalTopologyManagerImpl.lambda$handleTopologyUpdate$1(LocalTopologyManagerImpl.java:270) ~[infinispan-embedded-9.1.4.Final.jar:9.1.4.Final]
{noformat}
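
One way to check the "two conflict resolution updates in a row" hypothesis against a full node log is to scan the {{Updating local topology}} lines and report consecutive updates for the same cache that are both in phase {{CONFLICT_RESOLUTION}}. The sketch below assumes the DEBUG line format shown above; the class and method names are made up for illustration, and the sample lines in {{main}} are abbreviated versions of the two log lines quoted in this comment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for reading a single node's log; not part of Infinispan.
final class TopologyLogScan {

    // Captures the cache name, the topology id, and the first phase= value on the line.
    private static final Pattern UPDATE = Pattern.compile(
            "Updating local topology for cache (\\S+): CacheTopology\\{id=(\\d+),.*?phase=(\\w+)");

    /** Returns "cache: previousId -> id" for each back-to-back CONFLICT_RESOLUTION update. */
    static List<String> backToBackConflictResolution(List<String> lines) {
        Map<String, Integer> lastConflictId = new HashMap<>();
        List<String> hits = new ArrayList<>();
        for (String line : lines) {
            Matcher m = UPDATE.matcher(line);
            if (!m.find()) {
                continue; // lines that are not local topology updates are ignored
            }
            String cache = m.group(1);
            int id = Integer.parseInt(m.group(2));
            boolean conflict = "CONFLICT_RESOLUTION".equals(m.group(3));
            Integer previous = lastConflictId.get(cache);
            if (conflict && previous != null) {
                hits.add(cache + ": " + previous + " -> " + id);
            }
            if (conflict) {
                lastConflictId.put(cache, id);
            } else {
                lastConflictId.remove(cache); // another phase in between breaks the run
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Abbreviated sample lines; the consistent hashes and member lists are elided.
        List<String> lines = Arrays.asList(
                "(LocalTopologyManagerImpl.java:394) - Updating local topology for cache "
                        + "BacnetAdapterCache: CacheTopology{id=22, rebalanceId=7, "
                        + "unionCH=null, phase=CONFLICT_RESOLUTION}",
                "(LocalTopologyManagerImpl.java:394) - Updating local topology for cache "
                        + "BacnetAdapterCache: CacheTopology{id=23, rebalanceId=7, "
                        + "unionCH=null, phase=CONFLICT_RESOLUTION}");
        System.out.println(backToBackConflictResolution(lines));
    }
}
```

A hit immediately followed by the ISPN000452 error in the same log would be consistent with the hypothesis, but would not prove it.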


> 2 subclusters failed to merge to 1 cluster - IllegalLifecycleStateException
> ---------------------------------------------------------------------------
>
>                 Key: ISPN-8976
>                 URL: https://issues.jboss.org/browse/ISPN-8976
>             Project: Infinispan
>          Issue Type: Bug
>    Affects Versions: 9.1.4.Final
>            Reporter: Robert Cernak
>         Attachments: logs.zip
>
>
> At the beginning I had a main cluster consisting of 8 nodes.
> Then I disconnected the main switch to which these nodes were connected.
> This split the main cluster into 2 subclusters - the first with 2 nodes and the second with 6 nodes. This was expected.
> After that I rebooted the nodes. After the reboot, the nodes again correctly formed 2 subclusters with 2 and 6 members.
> After a long time, when all nodes were stable with low CPU load, I reconnected the main switch, which should have led to the main cluster re-forming with 8 controllers.
> However, the main cluster did not recover:
> subcluster2 did not change - it still had 6 nodes connected and no new members
> subcluster1 - its nodes did not connect to subcluster2 and after about 30 minutes they left the cluster.
> When I checked the Infinispan logs of node1 from the 1st subcluster, I found an IllegalLifecycleStateException for every created cache (see the attached logs.zip):
> [transport-thread-744a974a-2811-4f79-ac63-f32daf005d7f-p4-t6] (ClusterCacheStatus.java:599) - ISPN000228: Failed to recover cache XXX state after the current node became the coordinator
> org.infinispan.IllegalLifecycleStateException: Cache container has been stopped and cannot be reused. Recreate the cache container.





