[JBoss JIRA] (ISPN-9189) ExceptionEvictionTest test failure causes testng to panic
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-9189?page=com.atlassian.jira.plugin.... ]
Dan Berindei updated ISPN-9189:
-------------------------------
Status: Open (was: New)
> ExceptionEvictionTest test failure causes testng to panic
> ---------------------------------------------------------
>
> Key: ISPN-9189
> URL: https://issues.jboss.org/browse/ISPN-9189
> Project: Infinispan
> Issue Type: Bug
> Components: Test Suite - Core
> Affects Versions: 9.3.0.CR1
> Reporter: Gustavo Fernandes
> Assignee: Dan Berindei
>
> A single test failure apparently causes 300+ cascading failures due to cleanup issues in TestNG. The first logged failure is:
> {noformat}
> ERROR] clearContent(org.infinispan.eviction.impl.ExceptionEvictionTest) Time elapsed: 0.022 s <<< FAILURE!
> java.lang.AssertionError: expected:<0> but was:<1>
> at org.testng.AssertJUnit.fail(AssertJUnit.java:59)
> at org.testng.AssertJUnit.failNotEquals(AssertJUnit.java:364)
> at org.testng.AssertJUnit.assertEquals(AssertJUnit.java:80)
> at org.testng.AssertJUnit.assertEquals(AssertJUnit.java:170)
> at org.testng.AssertJUnit.assertEquals(AssertJUnit.java:177)
> at org.infinispan.eviction.impl.ExceptionEvictionTest.clearContent(ExceptionEvictionTest.java:197)
> {noformat}
> This is followed by 300+ occurrences of:
> {noformat}
> [ERROR] testExceptionOnInsert[REPL_SYNC, nodeCount=3, storageType=OBJECT, optimisticTransaction=false](org.infinispan.eviction.impl.ExceptionEvictionTest) Time elapsed: 0.001 s <<< FAILURE!
> java.lang.NullPointerException
> at org.infinispan.commons.test.RunningTestsRegistry.unregisterThreadWithTest(RunningTestsRegistry.java:39)
> at org.infinispan.commons.test.TestNGTestListener.onConfigurationFailure(TestNGTestListener.java:142)
> at org.testng.internal.Invoker.runConfigurationListeners(Invoker.java:1659)
> at org.testng.internal.Invoker.handleConfigurationFailure(Invoker.java:299)
> at org.testng.internal.Invoker.invokeConfigurations(Invoker.java:237)
> {noformat}
> The related build is https://ci.infinispan.org/job/Infinispan/job/master/625/
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
[JBoss JIRA] (ISPN-9189) ExceptionEvictionTest test failure causes testng to panic
by Gustavo Fernandes (JIRA)
Gustavo Fernandes created ISPN-9189:
---------------------------------------
Summary: ExceptionEvictionTest test failure causes testng to panic
Key: ISPN-9189
URL: https://issues.jboss.org/browse/ISPN-9189
Project: Infinispan
Issue Type: Bug
Components: Test Suite - Core
Affects Versions: 9.3.0.CR1
Reporter: Gustavo Fernandes
A single test failure apparently causes 300+ cascading failures due to cleanup issues in TestNG. The first logged failure is:
{noformat}
ERROR] clearContent(org.infinispan.eviction.impl.ExceptionEvictionTest) Time elapsed: 0.022 s <<< FAILURE!
java.lang.AssertionError: expected:<0> but was:<1>
at org.testng.AssertJUnit.fail(AssertJUnit.java:59)
at org.testng.AssertJUnit.failNotEquals(AssertJUnit.java:364)
at org.testng.AssertJUnit.assertEquals(AssertJUnit.java:80)
at org.testng.AssertJUnit.assertEquals(AssertJUnit.java:170)
at org.testng.AssertJUnit.assertEquals(AssertJUnit.java:177)
at org.infinispan.eviction.impl.ExceptionEvictionTest.clearContent(ExceptionEvictionTest.java:197)
{noformat}
This is followed by 300+ occurrences of:
{noformat}
[ERROR] testExceptionOnInsert[REPL_SYNC, nodeCount=3, storageType=OBJECT, optimisticTransaction=false](org.infinispan.eviction.impl.ExceptionEvictionTest) Time elapsed: 0.001 s <<< FAILURE!
java.lang.NullPointerException
at org.infinispan.commons.test.RunningTestsRegistry.unregisterThreadWithTest(RunningTestsRegistry.java:39)
at org.infinispan.commons.test.TestNGTestListener.onConfigurationFailure(TestNGTestListener.java:142)
at org.testng.internal.Invoker.runConfigurationListeners(Invoker.java:1659)
at org.testng.internal.Invoker.handleConfigurationFailure(Invoker.java:299)
at org.testng.internal.Invoker.invokeConfigurations(Invoker.java:237)
{noformat}
The related build is https://ci.infinispan.org/job/Infinispan/job/master/625/
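For illustration, a minimal hypothetical sketch (not the actual {{RunningTestsRegistry}} code) of how an unguarded map lookup in a TestNG listener can turn a single configuration failure into an NPE for every later callback:
{code:java}
// Hypothetical sketch only -- not the actual RunningTestsRegistry implementation.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class RunningTestsRegistrySketch {
   private final Map<Thread, String> runningTests = new ConcurrentHashMap<>();

   void registerThreadWithTest(String testName) {
      runningTests.put(Thread.currentThread(), testName);
   }

   void unregisterThreadWithTest() {
      // If the test was never registered on this thread (e.g. because a
      // configuration method failed first), remove() returns null...
      String testName = runningTests.remove(Thread.currentThread());
      // ...and any dereference below throws NullPointerException, which TestNG
      // then reports for every remaining configuration callback.
      System.out.println("Finished " + testName.toUpperCase());
      // A defensive "if (testName == null) return;" before the dereference
      // would stop the cascade.
   }
}
{code}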
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
[JBoss JIRA] (ISPN-9188) Cancelling the last segment doesn't remove transfer task
by Dan Berindei (JIRA)
Dan Berindei created ISPN-9188:
----------------------------------
Summary: Cancelling the last segment doesn't remove transfer task
Key: ISPN-9188
URL: https://issues.jboss.org/browse/ISPN-9188
Project: Infinispan
Issue Type: Bug
Components: Core
Affects Versions: 9.3.0.Beta1
Reporter: Dan Berindei
Assignee: Dan Berindei
Priority: Critical
Fix For: 9.3.0.CR1
If an {{InboundTransferTask}} has some completed segments and its last remaining segment is cancelled with {{cancelSegments()}} (because the local node is no longer an owner), the task is never completed or removed from the list of active transfers.
In the log below, NodeA adds segment 243 in topology 10. The transfer fails because NodeB shut down, so it is retried in topology 11 from NodeC.
{noformat}
13:19:17,057 TRACE (transport-thread-Test-NodeA-p22712-t2:[Topology-cluster-listener]) [StateConsumerImpl] Received new topology for cache cluster-listener, isRebalance = true, isMember = true, topology = CacheTopology{id=10, phase=READ_OLD_WRITE_ALL, rebalanceId=4, currentCH=PartitionerConsistentHash:DefaultConsistentHash{ns=256, owners = (3)[Test-NodeA-53146: 79+82, Test-NodeB-56589: 90+84, Test-NodeC-22289: 87+90]}, pendingCH=PartitionerConsistentHash:DefaultConsistentHash{ns=256, owners = (4)[Test-NodeA-53146: 55+54, Test-NodeB-56589: 67+60, Test-NodeC-22289: 67+69, Test-NodeD-8561: 67+73]}, unionCH=PartitionerConsistentHash:DefaultConsistentHash{ns=256, owners = (4)[Test-NodeA-53146: 79+87, Test-NodeB-56589: 90+85, Test-NodeC-22289: 87+93, Test-NodeD-8561: 0+140]}, actualMembers=[Test-NodeA-53146, Test-NodeB-56589, Test-NodeC-22289, Test-NodeD-8561], persistentUUIDs=[dff2268e-e59e-48cc-809d-dcfe0774d1d8, 01379420-7999-4abf-abd8-2777454c035a, e9090b94-dc69-49d8-a2dc-ae0ad7f18fe0, 8edade14-6827-4b67-93d2-49b93844d8ad]}
13:19:17,097 TRACE (transport-thread-Test-NodeA-p22712-t2:[Topology-cluster-listener]) [StateConsumerImpl] On cache cluster-listener we have: added segments: {18 71 91-92 243}; removed segments: {}
13:19:17,097 TRACE (transport-thread-Test-NodeA-p22712-t2:[Topology-cluster-listener]) [StateConsumerImpl] Adding transfer from Test-NodeB-56589 for segments {18 71 91 243}
13:19:17,098 TRACE (transport-thread-Test-NodeA-p22712-t2:[Topology-cluster-listener]) [StateConsumerImpl] Adding transfer from Test-NodeC-22289 for segments {92}
13:19:17,107 TRACE (transport-thread-Test-NodeA-p22712-t1:[Topology-cluster-listener]) [StateConsumerImpl] Received new topology for cache cluster-listener, isRebalance = false, isMember = true, topology = CacheTopology{id=11, phase=READ_OLD_WRITE_ALL, rebalanceId=4, currentCH=PartitionerConsistentHash:DefaultConsistentHash{ns=256, owners = (2)[Test-NodeA-53146: 122+39, Test-NodeC-22289: 134+43]}, pendingCH=PartitionerConsistentHash:DefaultConsistentHash{ns=256, owners = (2)[Test-NodeA-53146: 120+31, Test-NodeC-22289: 136+42]}, unionCH=PartitionerConsistentHash:DefaultConsistentHash{ns=256, owners = (2)[Test-NodeA-53146: 122+60, Test-NodeC-22289: 134+71]}, actualMembers=[Test-NodeA-53146, Test-NodeC-22289, Test-NodeD-8561], persistentUUIDs=[dff2268e-e59e-48cc-809d-dcfe0774d1d8, e9090b94-dc69-49d8-a2dc-ae0ad7f18fe0, 8edade14-6827-4b67-93d2-49b93844d8ad]}
13:19:17,140 TRACE (stateTransferExecutor-thread-Test-NodeA-p22713-t1:[StateRequest-cluster-listener]) [StateConsumerImpl] Inbound transfer finished: InboundTransferTask{segments={18 71 91 243}, finishedSegments={}, unfinishedSegments={18 71 91 243}, source=Test-NodeB-56589, isCancelled=false, completionFuture=java.util.concurrent.CompletableFuture@5237f653[Completed exceptionally], topologyId=10, timeout=240000, cacheName=cluster-listener}
13:19:17,156 TRACE (remote-thread-Test-NodeA-p22710-t5:[cluster-listener]) [StateConsumerImpl] Segments not received yet for cache cluster-listener: {Test-NodeB-56589=[InboundTransferTask{segments={18 71 91 243}, finishedSegments={}, unfinishedSegments={18 71 91 243}, source=Test-NodeB-56589, isCancelled=false, completionFuture=java.util.concurrent.CompletableFuture@5237f653[Completed exceptionally], topologyId=10, timeout=240000, cacheName=cluster-listener}]}
13:19:17,273 TRACE (transport-thread-Test-NodeA-p22712-t1:[Topology-cluster-listener]) [StateConsumerImpl] Removing inbound transfers from node Test-NodeB-56589 for segments {18 71 91 243}
13:19:17,274 TRACE (transport-thread-Test-NodeA-p22712-t1:[Topology-cluster-listener]) [StateConsumerImpl] Adding transfer from Test-NodeC-22289 for segments {18 71 91 243}
13:19:17,283 TRACE (transport-thread-Test-NodeA-p22712-t1:[Topology-cluster-listener]) [StateConsumerImpl] Received new topology for cache cluster-listener, isRebalance = true, isMember = true, topology = CacheTopology{id=12, phase=READ_OLD_WRITE_ALL, rebalanceId=5, currentCH=PartitionerConsistentHash:DefaultConsistentHash{ns=256, owners = (2)[Test-NodeA-53146: 122+39, Test-NodeC-22289: 134+43]}, pendingCH=PartitionerConsistentHash:DefaultConsistentHash{ns=256, owners = (3)[Test-NodeA-53146: 76+78, Test-NodeC-22289: 90+80, Test-NodeD-8561: 90+98]}, unionCH=PartitionerConsistentHash:DefaultConsistentHash{ns=256, owners = (3)[Test-NodeA-53146: 122+75, Test-NodeC-22289: 134+73, Test-NodeD-8561: 0+188]}, actualMembers=[Test-NodeA-53146, Test-NodeC-22289, Test-NodeD-8561], persistentUUIDs=[dff2268e-e59e-48cc-809d-dcfe0774d1d8, e9090b94-dc69-49d8-a2dc-ae0ad7f18fe0, 8edade14-6827-4b67-93d2-49b93844d8ad]}
13:19:17,285 WARN (remote-thread-Test-NodeA-p22710-t6:[cluster-listener]) [StateConsumerImpl] Discarding received cache entries for segment 243 of cache cluster-listener because they do not belong to this node.
13:19:17,286 TRACE (remote-thread-Test-NodeA-p22710-t6:[cluster-listener]) [StateConsumerImpl] Segments not received yet for cache cluster-listener: {Test-NodeC-22289=[InboundTransferTask{segments={18 71 91 243}, finishedSegments={18 71 91}, unfinishedSegments={243}, source=Test-NodeC-22289, isCancelled=false, completionFuture=java.util.concurrent.CompletableFuture@2755cd[Not completed, 1 dependents], topologyId=11, timeout=240000, cacheName=cluster-listener}]}
13:19:17,428 TRACE (transport-thread-Test-NodeA-p22712-t1:[Topology-cluster-listener]) [StateConsumerImpl] On cache cluster-listener we have: added segments: {45 54-55 63 90 95-98 114-115 119 138-140 155-156 167 174-175 205 220-222}; removed segments: {30 66-67 77 93 190-192 243}
13:19:17,429 TRACE (transport-thread-Test-NodeA-p22712-t1:[Topology-cluster-listener]) [InboundTransferTask] Partially cancelling inbound state transfer from node Test-NodeC-22289, segments {243}
13:19:17,429 TRACE (transport-thread-Test-NodeA-p22712-t1:[Topology-cluster-listener]) [StateConsumerImpl] Adding transfer from Test-NodeC-22289 for segments {45 54-55 63 90 95-98 114-115 119 138-140 155-156 167 174-175 205 220-222}
13:19:17,430 TRACE (transport-thread-Test-NodeA-p22712-t1:[Topology-cluster-listener]) [StateConsumerImpl] notifyEndOfStateTransferIfNeeded: no active transfers
{noformat}
The last message is incorrect: the rebalance phase is not confirmed because there are in fact 2 active transfers from NodeC: the one for segment 243, which wasn't completely cancelled, and the 45...222 one, which hasn't started yet.
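A minimal hypothetical sketch of the expected behaviour (not the actual {{InboundTransferTask}} code): cancelling the last remaining segment should also complete the task, so that it can be removed from the active transfers:
{code:java}
// Hypothetical sketch, not the actual InboundTransferTask implementation.
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.CompletableFuture;

class InboundTransferTaskSketch {
   private final Set<Integer> unfinishedSegments = new HashSet<>();
   final CompletableFuture<Void> completionFuture = new CompletableFuture<>();

   InboundTransferTaskSketch(Set<Integer> segments) {
      unfinishedSegments.addAll(segments);
   }

   synchronized void onSegmentsReceived(Set<Integer> segments) {
      unfinishedSegments.removeAll(segments);
      completeIfDone();
   }

   synchronized void cancelSegments(Set<Integer> cancelledSegments) {
      unfinishedSegments.removeAll(cancelledSegments);
      // Without this check, a task whose last unfinished segment is cancelled
      // (as with segment 243 above) never completes and is never removed from
      // the list of active transfers.
      completeIfDone();
   }

   private void completeIfDone() {
      if (unfinishedSegments.isEmpty()) {
         completionFuture.complete(null);
      }
   }
}
{code}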
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
[JBoss JIRA] (ISPN-9187) Lost segments logged when node leaves during rebalance
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-9187?page=com.atlassian.jira.plugin.... ]
Dan Berindei updated ISPN-9187:
-------------------------------
Sprint: Sprint 9.3.0.CR1
> Lost segments logged when node leaves during rebalance
> ------------------------------------------------------
>
> Key: ISPN-9187
> URL: https://issues.jboss.org/browse/ISPN-9187
> Project: Infinispan
> Issue Type: Bug
> Components: Core
> Affects Versions: 9.2.3.Final, 9.3.0.Beta1
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Fix For: 9.3.0.CR1
>
>
> When a node leaves during rebalance, we remove the leaver from the current CH and from the pending CH with {{ConsistentHashFactory.updateMembers()}}. However, since the 4-phase rebalance changes, joiners are also removed from the pending CH.
> In the following log, numOwners=2 and the pending CH had one segment (239) owned by a joiner (D) and a leaver (B). The updated pending CH has neither B nor D, which means both owners are lost and random owners (C and A) are elected. A sees that it was allocated new segments by {{updateMembers}} and logs that the segment has been lost:
> {noformat}
> 08:53:01,528 INFO (remote-thread-test-NodeA-p2-t6:[cluster-listener]) [CLUSTER] [Context=cluster-listener] ISPN100002: Starting rebalance with members [test-NodeA-15001, test-NodeB-55628, test-NodeC-62395, test-NodeD-29215], phase READ_OLD_WRITE_ALL, topology id 10
> 08:53:01,554 TRACE (transport-thread-test-NodeD-p28-t2:[Topology-cluster-listener]) [CacheTopology] Current consistent hash's routing table: test-NodeA-15001 primary: {... 237-238 244-245 248-250 252 254}, backup: {... 235-236 243 246-247 251 253}
> test-NodeB-55628 primary: {... 231 234 240 242}, backup: {... 232-233 239 241 255}
> test-NodeC-62395 primary: {... 232-233 235-236 239 241 243 246-247 251 253 255}, backup: {... 231 234 237-238 240 242 244-245 248-250 252 254}
> 08:53:01,554 TRACE (transport-thread-test-NodeD-p28-t2:[Topology-cluster-listener]) [CacheTopology] Pending consistent hash's routing table: test-NodeA-15001 primary: {... 237-238 245 248 252}, backup: {... 235-236 244 246-247 251}
> test-NodeB-55628 primary: {... 231 240 242}, backup: {... 230 239 241}
> test-NodeC-62395 primary: {... 232-233 235-236 241 243 246-247 251 253 255}, backup: {... 231 234 240 242 245 249-250 252 254}
> test-NodeD-29215 primary: {... 234 239 244 249-250 254}, backup: {... 232-233 237-238 243 248 253 255}
> 08:53:01,606 TRACE (remote-thread-test-NodeA-p2-t5:[cluster-listener]) [ClusterCacheStatus] Removed node test-NodeB-55628 from cache cluster-listener: members = [test-NodeA-15001, test-NodeC-62395, test-NodeD-29215], joiners = [test-NodeD-29215]
> 08:53:01,611 TRACE (remote-thread-test-NodeA-p2-t5:[cluster-listener]) [CacheTopology] Current consistent hash's routing table: test-NodeA-15001 primary: {... 237-238 244-245 248-250 252 254}, backup: {... 235-236 243 246-247 251 253}
> test-NodeC-62395 primary: {... 230-236 239-243 246-247 251 253 255}, backup: {... 237-238 244-245 248-250 252 254}
> 08:53:01,611 TRACE (remote-thread-test-NodeA-p2-t5:[cluster-listener]) [CacheTopology] Pending consistent hash's routing table: test-NodeA-15001 primary: {... 237-238 244-245 248 252}, backup: {... 235-236 239 246-247 251}
> test-NodeC-62395 primary: {... 227-236 239-243 246-247 249-251 253-255}, backup: {... 226 245 252}
> 08:53:01,613 TRACE (transport-thread-test-NodeA-p4-t1:[Topology-cluster-listener]) [StateTransferManagerImpl] Installing new cache topology CacheTopology{id=11, phase=READ_OLD_WRITE_ALL, rebalanceId=4, currentCH=DefaultConsistentHash{ns=256, owners = (2)[test-NodeA-15001: 134+45, test-NodeC-62395: 122+50]}, pendingCH=DefaultConsistentHash{ns=256, owners = (2)[test-NodeA-15001: 129+45, test-NodeC-62395: 127+28]}, unionCH=DefaultConsistentHash{ns=256, owners = (2)[test-NodeA-15001: 134+62, test-NodeC-62395: 122+68]}, actualMembers=[test-NodeA-15001, test-NodeC-62395, test-NodeD-29215], persistentUUIDs=[0506cc27-9762-4703-ad56-6a3bf7953529, c365b93f-e46c-4f11-ab46-6cafa2b2d92b, d3f21b0d-07f2-4089-b160-f754e719de83]} on cache cluster-listener
> 08:53:01,662 TRACE (transport-thread-test-NodeA-p4-t1:[Topology-cluster-listener]) [StateConsumerImpl] On cache cluster-listener we have: new segments: {1-18 21 30-53 56-63 67 69-73 75-78 81-85 88-101 107-115 117-118 125-134 136-139 142-156 158-165 168-170 172-174 177-179 181-188 193-211 214-226 228-229 235-239 243-254}; old segments: {1-15 30-53 56-63 69-73 75-77 81-85 88-101 107-115 117-118 125-131 136-139 142-145 150-156 158-165 168-170 172-174 177-179 183-188 193-211 214-218 220-226 228-229 235-238 243-254}
> 08:53:01,663 TRACE (transport-thread-test-NodeA-p4-t1:[Topology-cluster-listener]) [StateConsumerImpl] On cache cluster-listener we have: added segments: {16-18 21 67 78 132-134 146-149 181-182 219 239}; removed segments: {}
> 08:53:01,663 DEBUG (transport-thread-test-NodeA-p4-t1:[Topology-cluster-listener]) [StateConsumerImpl] Not requesting segments {16-18 21 67 78 132-134 146-149 181-182 219 239} because the last owner left the cluster
> {noformat}
> There isn't any visible inconsistency: A only owns segment 239 for writing, and the coordinator immediately starts a new rebalance, ignoring the pending CH that it sent out earlier. However, the new rebalance causes its own problems: ISPN-8240.
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
[JBoss JIRA] (ISPN-9187) Lost segments logged when node leaves during rebalance
by Dan Berindei (JIRA)
Dan Berindei created ISPN-9187:
----------------------------------
Summary: Lost segments logged when node leaves during rebalance
Key: ISPN-9187
URL: https://issues.jboss.org/browse/ISPN-9187
Project: Infinispan
Issue Type: Bug
Components: Core
Affects Versions: 9.3.0.Beta1, 9.2.3.Final
Reporter: Dan Berindei
Assignee: Dan Berindei
Fix For: 9.3.0.CR1
When a node leaves during rebalance, we remove the leaver from the current CH and from the pending CH with {{ConsistentHashFactory.updateMembers()}}. However, since the 4-phase rebalance changes, joiners are also removed from the pending CH.
In the following log, numOwners=2 and the pending CH had one segment (239) owned by a joiner (D) and a leaver (B). The updated pending CH has neither B nor D, which means both owners are lost and random owners (C and A) are elected. A sees that it was allocated new segments by {{updateMembers}} and logs that the segment has been lost:
{noformat}
08:53:01,528 INFO (remote-thread-test-NodeA-p2-t6:[cluster-listener]) [CLUSTER] [Context=cluster-listener] ISPN100002: Starting rebalance with members [test-NodeA-15001, test-NodeB-55628, test-NodeC-62395, test-NodeD-29215], phase READ_OLD_WRITE_ALL, topology id 10
08:53:01,554 TRACE (transport-thread-test-NodeD-p28-t2:[Topology-cluster-listener]) [CacheTopology] Current consistent hash's routing table: test-NodeA-15001 primary: {... 237-238 244-245 248-250 252 254}, backup: {... 235-236 243 246-247 251 253}
test-NodeB-55628 primary: {... 231 234 240 242}, backup: {... 232-233 239 241 255}
test-NodeC-62395 primary: {... 232-233 235-236 239 241 243 246-247 251 253 255}, backup: {... 231 234 237-238 240 242 244-245 248-250 252 254}
08:53:01,554 TRACE (transport-thread-test-NodeD-p28-t2:[Topology-cluster-listener]) [CacheTopology] Pending consistent hash's routing table: test-NodeA-15001 primary: {... 237-238 245 248 252}, backup: {... 235-236 244 246-247 251}
test-NodeB-55628 primary: {... 231 240 242}, backup: {... 230 239 241}
test-NodeC-62395 primary: {... 232-233 235-236 241 243 246-247 251 253 255}, backup: {... 231 234 240 242 245 249-250 252 254}
test-NodeD-29215 primary: {... 234 239 244 249-250 254}, backup: {... 232-233 237-238 243 248 253 255}
08:53:01,606 TRACE (remote-thread-test-NodeA-p2-t5:[cluster-listener]) [ClusterCacheStatus] Removed node test-NodeB-55628 from cache cluster-listener: members = [test-NodeA-15001, test-NodeC-62395, test-NodeD-29215], joiners = [test-NodeD-29215]
08:53:01,611 TRACE (remote-thread-test-NodeA-p2-t5:[cluster-listener]) [CacheTopology] Current consistent hash's routing table: test-NodeA-15001 primary: {... 237-238 244-245 248-250 252 254}, backup: {... 235-236 243 246-247 251 253}
test-NodeC-62395 primary: {... 230-236 239-243 246-247 251 253 255}, backup: {... 237-238 244-245 248-250 252 254}
08:53:01,611 TRACE (remote-thread-test-NodeA-p2-t5:[cluster-listener]) [CacheTopology] Pending consistent hash's routing table: test-NodeA-15001 primary: {... 237-238 244-245 248 252}, backup: {... 235-236 239 246-247 251}
test-NodeC-62395 primary: {... 227-236 239-243 246-247 249-251 253-255}, backup: {... 226 245 252}
08:53:01,613 TRACE (transport-thread-test-NodeA-p4-t1:[Topology-cluster-listener]) [StateTransferManagerImpl] Installing new cache topology CacheTopology{id=11, phase=READ_OLD_WRITE_ALL, rebalanceId=4, currentCH=DefaultConsistentHash{ns=256, owners = (2)[test-NodeA-15001: 134+45, test-NodeC-62395: 122+50]}, pendingCH=DefaultConsistentHash{ns=256, owners = (2)[test-NodeA-15001: 129+45, test-NodeC-62395: 127+28]}, unionCH=DefaultConsistentHash{ns=256, owners = (2)[test-NodeA-15001: 134+62, test-NodeC-62395: 122+68]}, actualMembers=[test-NodeA-15001, test-NodeC-62395, test-NodeD-29215], persistentUUIDs=[0506cc27-9762-4703-ad56-6a3bf7953529, c365b93f-e46c-4f11-ab46-6cafa2b2d92b, d3f21b0d-07f2-4089-b160-f754e719de83]} on cache cluster-listener
08:53:01,662 TRACE (transport-thread-test-NodeA-p4-t1:[Topology-cluster-listener]) [StateConsumerImpl] On cache cluster-listener we have: new segments: {1-18 21 30-53 56-63 67 69-73 75-78 81-85 88-101 107-115 117-118 125-134 136-139 142-156 158-165 168-170 172-174 177-179 181-188 193-211 214-226 228-229 235-239 243-254}; old segments: {1-15 30-53 56-63 69-73 75-77 81-85 88-101 107-115 117-118 125-131 136-139 142-145 150-156 158-165 168-170 172-174 177-179 183-188 193-211 214-218 220-226 228-229 235-238 243-254}
08:53:01,663 TRACE (transport-thread-test-NodeA-p4-t1:[Topology-cluster-listener]) [StateConsumerImpl] On cache cluster-listener we have: added segments: {16-18 21 67 78 132-134 146-149 181-182 219 239}; removed segments: {}
08:53:01,663 DEBUG (transport-thread-test-NodeA-p4-t1:[Topology-cluster-listener]) [StateConsumerImpl] Not requesting segments {16-18 21 67 78 132-134 146-149 181-182 219 239} because the last owner left the cluster
{noformat}
There isn't any visible inconsistency: A only owns segment 239 for writing, and the coordinator immediately starts a new rebalance, ignoring the pending CH that it sent out earlier. However, the new rebalance causes its own problems: ISPN-8240.
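The ownership arithmetic can be illustrated with a small hypothetical sketch (not the actual {{ConsistentHashFactory.updateMembers()}} code):
{code:java}
// Hypothetical sketch of the ownership arithmetic described above.
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class UpdateMembersSketch {
   public static void main(String[] args) {
      // Pending CH, numOwners=2: segment 239 is owned by joiner D and leaver B.
      List<String> pendingOwnersOf239 = new ArrayList<>(Arrays.asList("D", "B"));
      List<String> remainingMembers = Arrays.asList("A", "C");

      pendingOwnersOf239.remove("B"); // the leaver is removed
      pendingOwnersOf239.remove("D"); // since the 4-phase changes, the joiner is removed too

      if (pendingOwnersOf239.isEmpty()) {
         // Both owners are gone, so updateMembers() must elect owners that never
         // received the data, and the new owner logs that the segment was lost.
         pendingOwnersOf239.addAll(remainingMembers);
      }
      System.out.println("Segment 239 pending owners: " + pendingOwnersOf239);
   }
}
{code}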
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
[JBoss JIRA] (ISPN-9187) Lost segments logged when node leaves during rebalance
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-9187?page=com.atlassian.jira.plugin.... ]
Dan Berindei updated ISPN-9187:
-------------------------------
Status: Open (was: New)
> Lost segments logged when node leaves during rebalance
> ------------------------------------------------------
>
> Key: ISPN-9187
> URL: https://issues.jboss.org/browse/ISPN-9187
> Project: Infinispan
> Issue Type: Bug
> Components: Core
> Affects Versions: 9.2.3.Final, 9.3.0.Beta1
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Fix For: 9.3.0.CR1
>
>
> When a node leaves during rebalance, we remove the leaver from the current CH and from the pending CH with {{ConsistentHashFactory.updateMembers()}}. However, since the 4-phase rebalance changes, joiners are also removed from the pending CH.
> In the following log, numOwners=2 and the pending CH had one segment (239) owned by a joiner (D) and a leaver (B). The updated pending CH has neither B nor D, which means both owners are lost and random owners (C and A) are elected. A sees that it was allocated new segments by {{updateMembers}} and logs that the segment has been lost:
> {noformat}
> 08:53:01,528 INFO (remote-thread-test-NodeA-p2-t6:[cluster-listener]) [CLUSTER] [Context=cluster-listener] ISPN100002: Starting rebalance with members [test-NodeA-15001, test-NodeB-55628, test-NodeC-62395, test-NodeD-29215], phase READ_OLD_WRITE_ALL, topology id 10
> 08:53:01,554 TRACE (transport-thread-test-NodeD-p28-t2:[Topology-cluster-listener]) [CacheTopology] Current consistent hash's routing table: test-NodeA-15001 primary: {... 237-238 244-245 248-250 252 254}, backup: {... 235-236 243 246-247 251 253}
> test-NodeB-55628 primary: {... 231 234 240 242}, backup: {... 232-233 239 241 255}
> test-NodeC-62395 primary: {... 232-233 235-236 239 241 243 246-247 251 253 255}, backup: {... 231 234 237-238 240 242 244-245 248-250 252 254}
> 08:53:01,554 TRACE (transport-thread-test-NodeD-p28-t2:[Topology-cluster-listener]) [CacheTopology] Pending consistent hash's routing table: test-NodeA-15001 primary: {... 237-238 245 248 252}, backup: {... 235-236 244 246-247 251}
> test-NodeB-55628 primary: {... 231 240 242}, backup: {... 230 239 241}
> test-NodeC-62395 primary: {... 232-233 235-236 241 243 246-247 251 253 255}, backup: {... 231 234 240 242 245 249-250 252 254}
> test-NodeD-29215 primary: {... 234 239 244 249-250 254}, backup: {... 232-233 237-238 243 248 253 255}
> 08:53:01,606 TRACE (remote-thread-test-NodeA-p2-t5:[cluster-listener]) [ClusterCacheStatus] Removed node test-NodeB-55628 from cache cluster-listener: members = [test-NodeA-15001, test-NodeC-62395, test-NodeD-29215], joiners = [test-NodeD-29215]
> 08:53:01,611 TRACE (remote-thread-test-NodeA-p2-t5:[cluster-listener]) [CacheTopology] Current consistent hash's routing table: test-NodeA-15001 primary: {... 237-238 244-245 248-250 252 254}, backup: {... 235-236 243 246-247 251 253}
> test-NodeC-62395 primary: {... 230-236 239-243 246-247 251 253 255}, backup: {... 237-238 244-245 248-250 252 254}
> 08:53:01,611 TRACE (remote-thread-test-NodeA-p2-t5:[cluster-listener]) [CacheTopology] Pending consistent hash's routing table: test-NodeA-15001 primary: {... 237-238 244-245 248 252}, backup: {... 235-236 239 246-247 251}
> test-NodeC-62395 primary: {... 227-236 239-243 246-247 249-251 253-255}, backup: {... 226 245 252}
> 08:53:01,613 TRACE (transport-thread-test-NodeA-p4-t1:[Topology-cluster-listener]) [StateTransferManagerImpl] Installing new cache topology CacheTopology{id=11, phase=READ_OLD_WRITE_ALL, rebalanceId=4, currentCH=DefaultConsistentHash{ns=256, owners = (2)[test-NodeA-15001: 134+45, test-NodeC-62395: 122+50]}, pendingCH=DefaultConsistentHash{ns=256, owners = (2)[test-NodeA-15001: 129+45, test-NodeC-62395: 127+28]}, unionCH=DefaultConsistentHash{ns=256, owners = (2)[test-NodeA-15001: 134+62, test-NodeC-62395: 122+68]}, actualMembers=[test-NodeA-15001, test-NodeC-62395, test-NodeD-29215], persistentUUIDs=[0506cc27-9762-4703-ad56-6a3bf7953529, c365b93f-e46c-4f11-ab46-6cafa2b2d92b, d3f21b0d-07f2-4089-b160-f754e719de83]} on cache cluster-listener
> 08:53:01,662 TRACE (transport-thread-test-NodeA-p4-t1:[Topology-cluster-listener]) [StateConsumerImpl] On cache cluster-listener we have: new segments: {1-18 21 30-53 56-63 67 69-73 75-78 81-85 88-101 107-115 117-118 125-134 136-139 142-156 158-165 168-170 172-174 177-179 181-188 193-211 214-226 228-229 235-239 243-254}; old segments: {1-15 30-53 56-63 69-73 75-77 81-85 88-101 107-115 117-118 125-131 136-139 142-145 150-156 158-165 168-170 172-174 177-179 183-188 193-211 214-218 220-226 228-229 235-238 243-254}
> 08:53:01,663 TRACE (transport-thread-test-NodeA-p4-t1:[Topology-cluster-listener]) [StateConsumerImpl] On cache cluster-listener we have: added segments: {16-18 21 67 78 132-134 146-149 181-182 219 239}; removed segments: {}
> 08:53:01,663 DEBUG (transport-thread-test-NodeA-p4-t1:[Topology-cluster-listener]) [StateConsumerImpl] Not requesting segments {16-18 21 67 78 132-134 146-149 181-182 219 239} because the last owner left the cluster
> {noformat}
> There isn't any visible inconsistency: A only owns segment 239 for writing, and the coordinator immediately starts a new rebalance, ignoring the pending CH that it sent out earlier. However, the new rebalance causes its own problems: ISPN-8240.
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
[JBoss JIRA] (ISPN-7521) java.lang.ClassCastException: org.infinispan.distribution.ch.impl.DefaultConsistentHash cannot be cast to org.infinispan.distribution.ch.impl.ReplicatedConsistentHash
by Gustavo Fernandes (JIRA)
[ https://issues.jboss.org/browse/ISPN-7521?page=com.atlassian.jira.plugin.... ]
Gustavo Fernandes resolved ISPN-7521.
-------------------------------------
Resolution: Out of Date
> java.lang.ClassCastException: org.infinispan.distribution.ch.impl.DefaultConsistentHash cannot be cast to org.infinispan.distribution.ch.impl.ReplicatedConsistentHash
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: ISPN-7521
> URL: https://issues.jboss.org/browse/ISPN-7521
> Project: Infinispan
> Issue Type: Bug
> Components: Core
> Affects Versions: 9.0.0.CR1
> Reporter: Gustavo Fernandes
>
> This happened when running the full test suite, on test {{DistributedServerTaskDeploymentIT.testDeploy}}.
> Stacktrace:
> {noformat}
> 14:20:26,995 ERROR [org.infinispan.topology.LocalTopologyManagerImpl] (transport-thread--p4-t11) ISPN000230: Failed to start rebalance for cache memcachedCache: java.lang.ClassCastException: org.infinispan.distribution.ch.impl.DefaultConsistentHash cannot be cast to org.infinispan.distribution.ch.impl.ReplicatedConsistentHash
> at org.infinispan.distribution.ch.impl.SyncReplicatedConsistentHashFactory.union(SyncReplicatedConsistentHashFactory.java:26)
> at org.infinispan.topology.LocalTopologyManagerImpl.doHandleRebalance(LocalTopologyManagerImpl.java:486)
> at org.infinispan.topology.LocalTopologyManagerImpl.lambda$handleRebalance$3(LocalTopologyManagerImpl.java:450)
> at org.infinispan.executors.LimitedExecutor.runTasks(LimitedExecutor.java:144)
> at org.infinispan.executors.LimitedExecutor.access$100(LimitedExecutor.java:33)
> at org.infinispan.executors.LimitedExecutor$Runner.run(LimitedExecutor.java:174)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
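For context, a hypothetical sketch (not the actual {{SyncReplicatedConsistentHashFactory}} code, and using stub types) of how an unchecked cast in a {{union()}} implementation produces the trace above, and how a guarded variant would fail with a clearer message:
{code:java}
// Hypothetical sketch with stub types; not the real Infinispan classes.
interface ConsistentHash {}
class DefaultConsistentHash implements ConsistentHash {}
class ReplicatedConsistentHash implements ConsistentHash {}

class SyncReplicatedFactorySketch {
   ReplicatedConsistentHash union(ConsistentHash ch1, ConsistentHash ch2) {
      // An unchecked cast fails with ClassCastException, as in the log above,
      // when the factory is handed a DefaultConsistentHash during rebalance.
      // A guarded variant fails fast with a more descriptive error instead:
      if (!(ch1 instanceof ReplicatedConsistentHash) || !(ch2 instanceof ReplicatedConsistentHash)) {
         throw new IllegalArgumentException("Expected ReplicatedConsistentHash but got "
               + ch1.getClass().getSimpleName() + " / " + ch2.getClass().getSimpleName());
      }
      return unionReplicated((ReplicatedConsistentHash) ch1, (ReplicatedConsistentHash) ch2);
   }

   private ReplicatedConsistentHash unionReplicated(ReplicatedConsistentHash ch1,
                                                    ReplicatedConsistentHash ch2) {
      return ch1; // placeholder for the real union logic
   }
}
{code}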
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
[JBoss JIRA] (ISPN-7521) java.lang.ClassCastException: org.infinispan.distribution.ch.impl.DefaultConsistentHash cannot be cast to org.infinispan.distribution.ch.impl.ReplicatedConsistentHash
by Gustavo Fernandes (JIRA)
[ https://issues.jboss.org/browse/ISPN-7521?page=com.atlassian.jira.plugin.... ]
Gustavo Fernandes commented on ISPN-7521:
-----------------------------------------
I haven't seen it for a while, closing...
> java.lang.ClassCastException: org.infinispan.distribution.ch.impl.DefaultConsistentHash cannot be cast to org.infinispan.distribution.ch.impl.ReplicatedConsistentHash
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: ISPN-7521
> URL: https://issues.jboss.org/browse/ISPN-7521
> Project: Infinispan
> Issue Type: Bug
> Components: Core
> Affects Versions: 9.0.0.CR1
> Reporter: Gustavo Fernandes
>
> This happened when running the full test suite, on test {{DistributedServerTaskDeploymentIT.testDeploy}}.
> Stacktrace:
> {noformat}
> 14:20:26,995 ERROR [org.infinispan.topology.LocalTopologyManagerImpl] (transport-thread--p4-t11) ISPN000230: Failed to start rebalance for cache memcachedCache: java.lang.ClassCastException: org.infinispan.distribution.ch.impl.DefaultConsistentHash cannot be cast to org.infinispan.distribution.ch.impl.ReplicatedConsistentHash
> at org.infinispan.distribution.ch.impl.SyncReplicatedConsistentHashFactory.union(SyncReplicatedConsistentHashFactory.java:26)
> at org.infinispan.topology.LocalTopologyManagerImpl.doHandleRebalance(LocalTopologyManagerImpl.java:486)
> at org.infinispan.topology.LocalTopologyManagerImpl.lambda$handleRebalance$3(LocalTopologyManagerImpl.java:450)
> at org.infinispan.executors.LimitedExecutor.runTasks(LimitedExecutor.java:144)
> at org.infinispan.executors.LimitedExecutor.access$100(LimitedExecutor.java:33)
> at org.infinispan.executors.LimitedExecutor$Runner.run(LimitedExecutor.java:174)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
[JBoss JIRA] (ISPN-5323) Add a configuration option for partition handling to force DEGRADED mode
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-5323?page=com.atlassian.jira.plugin.... ]
Dan Berindei resolved ISPN-5323.
--------------------------------
Resolution: Won't Fix
> Add a configuration option for partition handling to force DEGRADED mode
> ------------------------------------------------------------------------
>
> Key: ISPN-5323
> URL: https://issues.jboss.org/browse/ISPN-5323
> Project: Infinispan
> Issue Type: Feature Request
> Components: Core
> Reporter: Wolf-Dieter Fink
> Labels: partition_handling
>
> Currently one part of the partition can stay AVAILABLE, and therefore it is not possible to read keys in a DEGRADED partition as they might be stale.
> But in some cases it might be worth being able to read such keys.
> In that case it must be ensured that no partition enters AVAILABLE mode and rebalances.
> An additional configuration option might force DEGRADED mode for all partitions; in this case it would be possible to have read access to all keys where the partition does not have all owners AND full access if all owners are available.
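For context, a minimal sketch of the partition handling configuration available in the 9.x programmatic API; it configures the existing {{whenSplit}} behaviour and is not the forced-DEGRADED option requested here:
{code:java}
// Sketch of the existing 9.x programmatic configuration; it does not implement
// the forced-DEGRADED behaviour requested in this issue.
import org.infinispan.configuration.cache.CacheMode;
import org.infinispan.configuration.cache.Configuration;
import org.infinispan.configuration.cache.ConfigurationBuilder;
import org.infinispan.partitionhandling.PartitionHandling;

public class PartitionHandlingConfigSketch {
   public static Configuration build() {
      return new ConfigurationBuilder()
            .clustering().cacheMode(CacheMode.DIST_SYNC)
            .partitionHandling().whenSplit(PartitionHandling.DENY_READ_WRITES)
            .build();
   }
}
{code}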
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
[JBoss JIRA] (ISPN-7521) java.lang.ClassCastException: org.infinispan.distribution.ch.impl.DefaultConsistentHash cannot be cast to org.infinispan.distribution.ch.impl.ReplicatedConsistentHash
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-7521?page=com.atlassian.jira.plugin.... ]
Dan Berindei commented on ISPN-7521:
------------------------------------
[~gustavonalle] I'm not seeing any failures in CI; does the test still fail for you?
> java.lang.ClassCastException: org.infinispan.distribution.ch.impl.DefaultConsistentHash cannot be cast to org.infinispan.distribution.ch.impl.ReplicatedConsistentHash
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: ISPN-7521
> URL: https://issues.jboss.org/browse/ISPN-7521
> Project: Infinispan
> Issue Type: Bug
> Components: Core
> Affects Versions: 9.0.0.CR1
> Reporter: Gustavo Fernandes
>
> This happened when running the full test suite, on test {{DistributedServerTaskDeploymentIT.testDeploy}}.
> Stacktrace:
> {noformat}
> 14:20:26,995 ERROR [org.infinispan.topology.LocalTopologyManagerImpl] (transport-thread--p4-t11) ISPN000230: Failed to start rebalance for cache memcachedCache: java.lang.ClassCastException: org.infinispan.distribution.ch.impl.DefaultConsistentHash cannot be cast to org.infinispan.distribution.ch.impl.ReplicatedConsistentHash
> at org.infinispan.distribution.ch.impl.SyncReplicatedConsistentHashFactory.union(SyncReplicatedConsistentHashFactory.java:26)
> at org.infinispan.topology.LocalTopologyManagerImpl.doHandleRebalance(LocalTopologyManagerImpl.java:486)
> at org.infinispan.topology.LocalTopologyManagerImpl.lambda$handleRebalance$3(LocalTopologyManagerImpl.java:450)
> at org.infinispan.executors.LimitedExecutor.runTasks(LimitedExecutor.java:144)
> at org.infinispan.executors.LimitedExecutor.access$100(LimitedExecutor.java:33)
> at org.infinispan.executors.LimitedExecutor$Runner.run(LimitedExecutor.java:174)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)