[
https://issues.jboss.org/browse/ISPN-8976?page=com.atlassian.jira.plugin....
]
Dan Berindei commented on ISPN-8976:
------------------------------------
[~robertcernak] from the logs it looks like the cache manager on node 1 really is stopping
when the {{IllegalLifecycleStateException}} is logged:
{noformat}
2018-03-09 11:10:39,151 DEBUG [Camel (camel-1) thread #0 - seda://systemInitializer]
(DefaultCacheManager.java:717) - Stopping cache manager cloud_103 on
744a974a-2811-4f79-ac63-f32daf005d7f-60108
{noformat}
[~ryanemerson] unfortunately a node remains the coordinator while it is stopping, so if it
sees a merge it will try (and fail) to perform conflict resolution:
{noformat}
2018-03-09 11:10:41,653 DEBUG
[transport-thread-744a974a-2811-4f79-ac63-f32daf005d7f-p4-t6]
(DelegatingBasicLogger.java:409) - Recovered 2 partition(s) for cache BacnetAdapterCache:
[CacheTopology{id=21, rebalanceId=6, currentCH=DefaultConsistentHash{ns=256, owners =
(6)[85d60140-6da8-4618-9540-44d0bdec2b68-42234: 40+45,
ed4eb9fb-1c0a-4822-9acb-6af063df4ba6-56678: 45+37,
fb4843d6-4073-4890-9afb-c6f8220cc2bd-60553: 45+42,
398c8c9b-5e02-4699-956c-52a55802d546-42840: 40+40,
41fb370e-10b9-4e8e-b139-95643c2450ce-40703: 41+51,
b2fd8ffe-5322-4bdc-909d-a418906cebae-7469: 45+41]}, pendingCH=null, unionCH=null,
phase=NO_REBALANCE, actualMembers=[85d60140-6da8-4618-9540-44d0bdec2b68-42234,
ed4eb9fb-1c0a-4822-9acb-6af063df4ba6-56678, fb4843d6-4073-4890-9afb-c6f8220cc2bd-60553,
398c8c9b-5e02-4699-956c-52a55802d546-42840, 41fb370e-10b9-4e8e-b139-95643c2450ce-40703,
b2fd8ffe-5322-4bdc-909d-a418906cebae-7469],
persistentUUIDs=[24211d0a-66bc-45c5-8ec7-4ec93d4ab8bb,
4ad1da72-5f44-4ec9-8c8c-d132501b4e20, 262918d4-fa1d-4b80-9ed3-2deb76736036,
860e5eb9-964e-4461-936f-810a8fa3bfd6, ae77f87c-722a-43cd-8041-fc5d660ccf44,
f617d3d9-e370-4efb-b000-ce0d35d04b73]}, CacheTopology{id=5, rebalanceId=2,
currentCH=DefaultConsistentHash{ns=256, owners =
(2)[744a974a-2811-4f79-ac63-f32daf005d7f-60108: 127+129,
41099d73-b9a6-4daf-857b-7436feb5f0a1-48336: 129+127]}, pendingCH=null, unionCH=null,
phase=NO_REBALANCE, actualMembers=[744a974a-2811-4f79-ac63-f32daf005d7f-60108,
41099d73-b9a6-4daf-857b-7436feb5f0a1-48336],
persistentUUIDs=[e4ecc1db-b374-46e6-b5a0-a0d52eaf538a,
a1534115-8f12-4d8f-ad5a-f7f1c28f582d]}]
2018-03-09 11:10:41,905 DEBUG
[transport-thread-744a974a-2811-4f79-ac63-f32daf005d7f-p4-t6]
(ClusterCacheStatus.java:180) - Updating topologies after merge for cache
BacnetAdapterCache, current topology = CacheTopology{id=22, rebalanceId=7,
currentCH=DefaultConsistentHash{ns=256, owners =
(6)[85d60140-6da8-4618-9540-44d0bdec2b68-42234: 40+45,
ed4eb9fb-1c0a-4822-9acb-6af063df4ba6-56678: 45+37,
fb4843d6-4073-4890-9afb-c6f8220cc2bd-60553: 45+42,
398c8c9b-5e02-4699-956c-52a55802d546-42840: 40+40,
41fb370e-10b9-4e8e-b139-95643c2450ce-40703: 41+51,
b2fd8ffe-5322-4bdc-909d-a418906cebae-7469: 45+41]},
pendingCH=DefaultConsistentHash{ns=256, owners =
(8)[85d60140-6da8-4618-9540-44d0bdec2b68-42234: 29+31,
ed4eb9fb-1c0a-4822-9acb-6af063df4ba6-56678: 33+30,
fb4843d6-4073-4890-9afb-c6f8220cc2bd-60553: 33+33,
398c8c9b-5e02-4699-956c-52a55802d546-42840: 29+30,
41fb370e-10b9-4e8e-b139-95643c2450ce-40703: 30+31,
b2fd8ffe-5322-4bdc-909d-a418906cebae-7469: 34+32,
744a974a-2811-4f79-ac63-f32daf005d7f-60108: 34+34,
41099d73-b9a6-4daf-857b-7436feb5f0a1-48336: 34+35]}, unionCH=DefaultConsistentHash{ns=256,
owners = (8)[85d60140-6da8-4618-9540-44d0bdec2b68-42234: 29+31,
ed4eb9fb-1c0a-4822-9acb-6af063df4ba6-56678: 33+30,
fb4843d6-4073-4890-9afb-c6f8220cc2bd-60553: 33+33,
398c8c9b-5e02-4699-956c-52a55802d546-42840: 29+30,
41fb370e-10b9-4e8e-b139-95643c2450ce-40703: 30+31,
b2fd8ffe-5322-4bdc-909d-a418906cebae-7469: 34+32,
744a974a-2811-4f79-ac63-f32daf005d7f-60108: 34+34,
41099d73-b9a6-4daf-857b-7436feb5f0a1-48336: 34+35]}, phase=CONFLICT_RESOLUTION,
actualMembers=[85d60140-6da8-4618-9540-44d0bdec2b68-42234,
ed4eb9fb-1c0a-4822-9acb-6af063df4ba6-56678, fb4843d6-4073-4890-9afb-c6f8220cc2bd-60553,
398c8c9b-5e02-4699-956c-52a55802d546-42840, 41fb370e-10b9-4e8e-b139-95643c2450ce-40703,
b2fd8ffe-5322-4bdc-909d-a418906cebae-7469, 744a974a-2811-4f79-ac63-f32daf005d7f-60108,
41099d73-b9a6-4daf-857b-7436feb5f0a1-48336],
persistentUUIDs=[24211d0a-66bc-45c5-8ec7-4ec93d4ab8bb,
4ad1da72-5f44-4ec9-8c8c-d132501b4e20, 262918d4-fa1d-4b80-9ed3-2deb76736036,
860e5eb9-964e-4461-936f-810a8fa3bfd6, ae77f87c-722a-43cd-8041-fc5d660ccf44,
f617d3d9-e370-4efb-b000-ce0d35d04b73, e4ecc1db-b374-46e6-b5a0-a0d52eaf538a,
a1534115-8f12-4d8f-ad5a-f7f1c28f582d]}, stable topology = CacheTopology{id=21,
rebalanceId=6, currentCH=DefaultConsistentHash{ns=256, owners =
(6)[85d60140-6da8-4618-9540-44d0bdec2b68-42234: 40+45,
ed4eb9fb-1c0a-4822-9acb-6af063df4ba6-56678: 45+37,
fb4843d6-4073-4890-9afb-c6f8220cc2bd-60553: 45+42,
398c8c9b-5e02-4699-956c-52a55802d546-42840: 40+40,
41fb370e-10b9-4e8e-b139-95643c2450ce-40703: 41+51,
b2fd8ffe-5322-4bdc-909d-a418906cebae-7469: 45+41]}, pendingCH=null, unionCH=null,
phase=NO_REBALANCE, actualMembers=[85d60140-6da8-4618-9540-44d0bdec2b68-42234,
ed4eb9fb-1c0a-4822-9acb-6af063df4ba6-56678, fb4843d6-4073-4890-9afb-c6f8220cc2bd-60553,
398c8c9b-5e02-4699-956c-52a55802d546-42840, 41fb370e-10b9-4e8e-b139-95643c2450ce-40703,
b2fd8ffe-5322-4bdc-909d-a418906cebae-7469],
persistentUUIDs=[24211d0a-66bc-45c5-8ec7-4ec93d4ab8bb,
4ad1da72-5f44-4ec9-8c8c-d132501b4e20, 262918d4-fa1d-4b80-9ed3-2deb76736036,
860e5eb9-964e-4461-936f-810a8fa3bfd6, ae77f87c-722a-43cd-8041-fc5d660ccf44,
f617d3d9-e370-4efb-b000-ce0d35d04b73]}, availability mode = null, resolveConflicts = true
2018-03-09 11:10:41,960 ERROR
[transport-thread-744a974a-2811-4f79-ac63-f32daf005d7f-p4-t6]
(ClusterCacheStatus.java:599) - ISPN000228: Failed to recover cache BacnetAdapterCache
state after the current node became the coordinator
org.infinispan.IllegalLifecycleStateException: Cache container has been stopped and cannot
be reused. Recreate the cache container.
at
org.infinispan.manager.DefaultCacheManager.assertIsNotTerminated(DefaultCacheManager.java:956)
~[infinispan-embedded-9.1.4.Final.jar:9.1.4.Final]
at
org.infinispan.manager.DefaultCacheManager.internalGetCache(DefaultCacheManager.java:452)
~[infinispan-embedded-9.1.4.Final.jar:9.1.4.Final]
at org.infinispan.manager.DefaultCacheManager.getCache(DefaultCacheManager.java:448)
~[infinispan-embedded-9.1.4.Final.jar:9.1.4.Final]
at org.infinispan.manager.DefaultCacheManager.getCache(DefaultCacheManager.java:434)
~[infinispan-embedded-9.1.4.Final.jar:9.1.4.Final]
at
org.infinispan.topology.ClusterCacheStatus.updateTopologiesAfterMerge(ClusterCacheStatus.java:191)
~[infinispan-embedded-9.1.4.Final.jar:9.1.4.Final]
at
org.infinispan.partitionhandling.impl.PreferAvailabilityStrategy.onPartitionMerge(PreferAvailabilityStrategy.java:197)
~[infinispan-embedded-9.1.4.Final.jar:9.1.4.Final]
at
org.infinispan.topology.ClusterCacheStatus.doMergePartitions(ClusterCacheStatus.java:597)
~[infinispan-embedded-9.1.4.Final.jar:9.1.4.Final]
at
org.infinispan.topology.ClusterTopologyManagerImpl.lambda$recoverClusterStatus$6(ClusterTopologyManagerImpl.java:528)
~[infinispan-embedded-9.1.4.Final.jar:9.1.4.Final]
{noformat}
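To illustrate the failure mode above, here is a minimal, self-contained sketch of a
coordinator that checks its own lifecycle before handling a merge. All names
({{Lifecycle}}, {{MergeCoordinatorSketch}}, {{beginStop}}, {{onPartitionMerge}}) are
hypothetical and are not Infinispan classes or methods. This is not the actual fix, only
one possible shape for such a guard:

```java
/** Hypothetical lifecycle states; a stand-in, not Infinispan's own status type. */
enum Lifecycle { RUNNING, STOPPING, TERMINATED }

/** Hypothetical model of a coordinator that may see a merge while it is stopping. */
class MergeCoordinatorSketch {
    private volatile Lifecycle state = Lifecycle.RUNNING;
    private int resolvedMerges = 0;

    /** Marks the node as stopping; it still believes it is the coordinator. */
    void beginStop() {
        state = Lifecycle.STOPPING;
    }

    /**
     * Handles a partition merge for one cache.
     * Returns true if conflict resolution ran, false if it was skipped
     * because the node is no longer running.
     */
    boolean onPartitionMerge(String cacheName) {
        if (state != Lifecycle.RUNNING) {
            // A stopping node can no longer look up its caches,
            // so decline here instead of failing later with an exception.
            return false;
        }
        // Placeholder for the real conflict resolution work on cacheName.
        resolvedMerges++;
        return true;
    }

    /** Number of merges for which conflict resolution actually ran. */
    int resolvedMerges() {
        return resolvedMerges;
    }
}
```

With a guard like this, a merge seen after {{beginStop()}} is skipped rather than failing
with an exception. Whether the real coordinator should skip, defer, or hand over
coordination is a separate design question.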
The other nodes also log a {{NullPointerException}}. I think it happens when they receive
two conflict resolution topology updates in a row:
{noformat}
2018-03-09 11:10:42,360 DEBUG
[transport-thread-41099d73-b9a6-4daf-857b-7436feb5f0a1-p4-t16]
(LocalTopologyManagerImpl.java:394) - Updating local topology for cache
BacnetAdapterCache: CacheTopology{id=22, rebalanceId=7,
currentCH=DefaultConsistentHash{ns=256, owners =
(6)[85d60140-6da8-4618-9540-44d0bdec2b68-42234: 40+45,
ed4eb9fb-1c0a-4822-9acb-6af063df4ba6-56678: 45+37,
fb4843d6-4073-4890-9afb-c6f8220cc2bd-60553: 45+42,
398c8c9b-5e02-4699-956c-52a55802d546-42840: 40+40,
41fb370e-10b9-4e8e-b139-95643c2450ce-40703: 41+51,
b2fd8ffe-5322-4bdc-909d-a418906cebae-7469: 45+41]},
pendingCH=DefaultConsistentHash{ns=256, owners =
(8)[85d60140-6da8-4618-9540-44d0bdec2b68-42234: 29+31,
ed4eb9fb-1c0a-4822-9acb-6af063df4ba6-56678: 33+30,
fb4843d6-4073-4890-9afb-c6f8220cc2bd-60553: 33+33,
398c8c9b-5e02-4699-956c-52a55802d546-42840: 29+30,
41fb370e-10b9-4e8e-b139-95643c2450ce-40703: 30+31,
b2fd8ffe-5322-4bdc-909d-a418906cebae-7469: 34+32,
744a974a-2811-4f79-ac63-f32daf005d7f-60108: 34+34,
41099d73-b9a6-4daf-857b-7436feb5f0a1-48336: 34+35]}, unionCH=null,
phase=CONFLICT_RESOLUTION, actualMembers=[85d60140-6da8-4618-9540-44d0bdec2b68-42234,
ed4eb9fb-1c0a-4822-9acb-6af063df4ba6-56678, fb4843d6-4073-4890-9afb-c6f8220cc2bd-60553,
398c8c9b-5e02-4699-956c-52a55802d546-42840, 41fb370e-10b9-4e8e-b139-95643c2450ce-40703,
b2fd8ffe-5322-4bdc-909d-a418906cebae-7469, 744a974a-2811-4f79-ac63-f32daf005d7f-60108,
41099d73-b9a6-4daf-857b-7436feb5f0a1-48336],
persistentUUIDs=[24211d0a-66bc-45c5-8ec7-4ec93d4ab8bb,
4ad1da72-5f44-4ec9-8c8c-d132501b4e20, 262918d4-fa1d-4b80-9ed3-2deb76736036,
860e5eb9-964e-4461-936f-810a8fa3bfd6, ae77f87c-722a-43cd-8041-fc5d660ccf44,
f617d3d9-e370-4efb-b000-ce0d35d04b73, e4ecc1db-b374-46e6-b5a0-a0d52eaf538a,
a1534115-8f12-4d8f-ad5a-f7f1c28f582d]}
2018-03-09 11:10:42,881 DEBUG
[transport-thread-41099d73-b9a6-4daf-857b-7436feb5f0a1-p4-t23]
(LocalTopologyManagerImpl.java:394) - Updating local topology for cache
BacnetAdapterCache: CacheTopology{id=23, rebalanceId=7,
currentCH=DefaultConsistentHash{ns=256, owners =
(6)[85d60140-6da8-4618-9540-44d0bdec2b68-42234: 40+45,
ed4eb9fb-1c0a-4822-9acb-6af063df4ba6-56678: 45+37,
fb4843d6-4073-4890-9afb-c6f8220cc2bd-60553: 45+42,
398c8c9b-5e02-4699-956c-52a55802d546-42840: 40+40,
41fb370e-10b9-4e8e-b139-95643c2450ce-40703: 41+51,
b2fd8ffe-5322-4bdc-909d-a418906cebae-7469: 45+41]},
pendingCH=DefaultConsistentHash{ns=256, owners =
(6)[85d60140-6da8-4618-9540-44d0bdec2b68-42234: 33+31,
ed4eb9fb-1c0a-4822-9acb-6af063df4ba6-56678: 43+20,
fb4843d6-4073-4890-9afb-c6f8220cc2bd-60553: 44+23,
398c8c9b-5e02-4699-956c-52a55802d546-42840: 43+17,
41fb370e-10b9-4e8e-b139-95643c2450ce-40703: 44+21,
b2fd8ffe-5322-4bdc-909d-a418906cebae-7469: 49+19]}, unionCH=null,
phase=CONFLICT_RESOLUTION, actualMembers=[85d60140-6da8-4618-9540-44d0bdec2b68-42234,
ed4eb9fb-1c0a-4822-9acb-6af063df4ba6-56678, fb4843d6-4073-4890-9afb-c6f8220cc2bd-60553,
398c8c9b-5e02-4699-956c-52a55802d546-42840, 41fb370e-10b9-4e8e-b139-95643c2450ce-40703,
b2fd8ffe-5322-4bdc-909d-a418906cebae-7469, 41099d73-b9a6-4daf-857b-7436feb5f0a1-48336],
persistentUUIDs=[24211d0a-66bc-45c5-8ec7-4ec93d4ab8bb,
4ad1da72-5f44-4ec9-8c8c-d132501b4e20, 262918d4-fa1d-4b80-9ed3-2deb76736036,
860e5eb9-964e-4461-936f-810a8fa3bfd6, ae77f87c-722a-43cd-8041-fc5d660ccf44,
f617d3d9-e370-4efb-b000-ce0d35d04b73, a1534115-8f12-4d8f-ad5a-f7f1c28f582d]}
2018-03-09 11:10:42,895 ERROR
[transport-thread-41099d73-b9a6-4daf-857b-7436feb5f0a1-p4-t23]
(LocalTopologyManagerImpl.java:272) - ISPN000452: Failed to update topology for cache
BacnetAdapterCache
java.lang.NullPointerException
at
org.infinispan.distribution.ch.impl.SyncConsistentHashFactory.union(SyncConsistentHashFactory.java:152)
~[infinispan-embedded-9.1.4.Final.jar:9.1.4.Final]
at
org.infinispan.distribution.ch.impl.SyncConsistentHashFactory.union(SyncConsistentHashFactory.java:53)
~[infinispan-embedded-9.1.4.Final.jar:9.1.4.Final]
at
org.infinispan.topology.LocalTopologyManagerImpl.doHandleTopologyUpdate(LocalTopologyManagerImpl.java:342)
~[infinispan-embedded-9.1.4.Final.jar:9.1.4.Final]
at
org.infinispan.topology.LocalTopologyManagerImpl.lambda$handleTopologyUpdate$1(LocalTopologyManagerImpl.java:270)
~[infinispan-embedded-9.1.4.Final.jar:9.1.4.Final]
{noformat}
2 subclusters failed to merge to 1 cluster -
IllegalLifecycleStateException
---------------------------------------------------------------------------
Key: ISPN-8976
URL: https://issues.jboss.org/browse/ISPN-8976
Project: Infinispan
Issue Type: Bug
Affects Versions: 9.1.4.Final
Reporter: Robert Cernak
Attachments: logs.zip
At the beginning I had a main cluster consisting of 8 nodes.
Then I disconnected the main switch to which these nodes were connected.
This split the main cluster into 2 subclusters, the first with 2 nodes and the second
with 6 nodes. This was expected.
After that I rebooted the nodes. After the reboot, the nodes again correctly formed 2
subclusters with 2 and 6 members.
After a long time, when all nodes were stable with low CPU load, I reconnected the main
switch, which should have led to the main cluster re-forming with 8 controllers.
However, the main cluster did not recover:
subcluster2 did not change: it still had 6 nodes connected and no new members
subcluster1: its nodes did not connect to subcluster2, and after about 30 minutes they
left the cluster.
When I checked the Infinispan logs of node1 from the first subcluster, I found an
IllegalLifecycleStateException for every created cache (see the attached logs.zip):
[transport-thread-744a974a-2811-4f79-ac63-f32daf005d7f-p4-t6]
(ClusterCacheStatus.java:599) - ISPN000228: Failed to recover cache XXX state after the
current node became the coordinator
org.infinispan.IllegalLifecycleStateException: Cache container has been stopped and
cannot be reused. Recreate the cache container.