[JBoss JIRA] (ISPN-9188) Cancelling the last segment doesn't remove transfer task
by Vittorio Rigamonti (JIRA)
[ https://issues.jboss.org/browse/ISPN-9188?page=com.atlassian.jira.plugin.... ]
Vittorio Rigamonti updated ISPN-9188:
-------------------------------------
Fix Version/s: 9.3.0.Final
(was: 9.3.0.CR1)
> Cancelling the last segment doesn't remove transfer task
> --------------------------------------------------------
>
> Key: ISPN-9188
> URL: https://issues.jboss.org/browse/ISPN-9188
> Project: Infinispan
> Issue Type: Bug
> Components: Core
> Affects Versions: 9.3.0.Beta1
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Priority: Critical
> Fix For: 9.3.0.Final
>
>
> If an {{InboundTransferTask}} has some completed segments and the last remaining segment is cancelled with {{cancelSegments()}} (because the local node is no longer an owner), the task is never completed or removed from the list of active transfers.
> In the log below, NodeA adds segment 243 in topology 10. The transfer fails because NodeB shut down, so it's retried in topology 11 from NodeC.
> {noformat}
> 13:19:17,057 TRACE (transport-thread-Test-NodeA-p22712-t2:[Topology-cluster-listener]) [StateConsumerImpl] Received new topology for cache cluster-listener, isRebalance = true, isMember = true, topology = CacheTopology{id=10, phase=READ_OLD_WRITE_ALL, rebalanceId=4, currentCH=PartitionerConsistentHash:DefaultConsistentHash{ns=256, owners = (3)[Test-NodeA-53146: 79+82, Test-NodeB-56589: 90+84, Test-NodeC-22289: 87+90]}, pendingCH=PartitionerConsistentHash:DefaultConsistentHash{ns=256, owners = (4)[Test-NodeA-53146: 55+54, Test-NodeB-56589: 67+60, Test-NodeC-22289: 67+69, Test-NodeD-8561: 67+73]}, unionCH=PartitionerConsistentHash:DefaultConsistentHash{ns=256, owners = (4)[Test-NodeA-53146: 79+87, Test-NodeB-56589: 90+85, Test-NodeC-22289: 87+93, Test-NodeD-8561: 0+140]}, actualMembers=[Test-NodeA-53146, Test-NodeB-56589, Test-NodeC-22289, Test-NodeD-8561], persistentUUIDs=[dff2268e-e59e-48cc-809d-dcfe0774d1d8, 01379420-7999-4abf-abd8-2777454c035a, e9090b94-dc69-49d8-a2dc-ae0ad7f18fe0, 8edade14-6827-4b67-93d2-49b93844d8ad]}
> 13:19:17,097 TRACE (transport-thread-Test-NodeA-p22712-t2:[Topology-cluster-listener]) [StateConsumerImpl] On cache cluster-listener we have: added segments: {18 71 91-92 243}; removed segments: {}
> 13:19:17,097 TRACE (transport-thread-Test-NodeA-p22712-t2:[Topology-cluster-listener]) [StateConsumerImpl] Adding transfer from Test-NodeB-56589 for segments {18 71 91 243}
> 13:19:17,098 TRACE (transport-thread-Test-NodeA-p22712-t2:[Topology-cluster-listener]) [StateConsumerImpl] Adding transfer from Test-NodeC-22289 for segments {92}
> 13:19:17,107 TRACE (transport-thread-Test-NodeA-p22712-t1:[Topology-cluster-listener]) [StateConsumerImpl] Received new topology for cache cluster-listener, isRebalance = false, isMember = true, topology = CacheTopology{id=11, phase=READ_OLD_WRITE_ALL, rebalanceId=4, currentCH=PartitionerConsistentHash:DefaultConsistentHash{ns=256, owners = (2)[Test-NodeA-53146: 122+39, Test-NodeC-22289: 134+43]}, pendingCH=PartitionerConsistentHash:DefaultConsistentHash{ns=256, owners = (2)[Test-NodeA-53146: 120+31, Test-NodeC-22289: 136+42]}, unionCH=PartitionerConsistentHash:DefaultConsistentHash{ns=256, owners = (2)[Test-NodeA-53146: 122+60, Test-NodeC-22289: 134+71]}, actualMembers=[Test-NodeA-53146, Test-NodeC-22289, Test-NodeD-8561], persistentUUIDs=[dff2268e-e59e-48cc-809d-dcfe0774d1d8, e9090b94-dc69-49d8-a2dc-ae0ad7f18fe0, 8edade14-6827-4b67-93d2-49b93844d8ad]}
> 13:19:17,140 TRACE (stateTransferExecutor-thread-Test-NodeA-p22713-t1:[StateRequest-cluster-listener]) [StateConsumerImpl] Inbound transfer finished: InboundTransferTask{segments={18 71 91 243}, finishedSegments={}, unfinishedSegments={18 71 91 243}, source=Test-NodeB-56589, isCancelled=false, completionFuture=java.util.concurrent.CompletableFuture@5237f653[Completed exceptionally], topologyId=10, timeout=240000, cacheName=cluster-listener}
> 13:19:17,156 TRACE (remote-thread-Test-NodeA-p22710-t5:[cluster-listener]) [StateConsumerImpl] Segments not received yet for cache cluster-listener: {Test-NodeB-56589=[InboundTransferTask{segments={18 71 91 243}, finishedSegments={}, unfinishedSegments={18 71 91 243}, source=Test-NodeB-56589, isCancelled=false, completionFuture=java.util.concurrent.CompletableFuture@5237f653[Completed exceptionally], topologyId=10, timeout=240000, cacheName=cluster-listener}]}
> 13:19:17,273 TRACE (transport-thread-Test-NodeA-p22712-t1:[Topology-cluster-listener]) [StateConsumerImpl] Removing inbound transfers from node Test-NodeB-56589 for segments {18 71 91 243}
> 13:19:17,274 TRACE (transport-thread-Test-NodeA-p22712-t1:[Topology-cluster-listener]) [StateConsumerImpl] Adding transfer from Test-NodeC-22289 for segments {18 71 91 243}
> 13:19:17,283 TRACE (transport-thread-Test-NodeA-p22712-t1:[Topology-cluster-listener]) [StateConsumerImpl] Received new topology for cache cluster-listener, isRebalance = true, isMember = true, topology = CacheTopology{id=12, phase=READ_OLD_WRITE_ALL, rebalanceId=5, currentCH=PartitionerConsistentHash:DefaultConsistentHash{ns=256, owners = (2)[Test-NodeA-53146: 122+39, Test-NodeC-22289: 134+43]}, pendingCH=PartitionerConsistentHash:DefaultConsistentHash{ns=256, owners = (3)[Test-NodeA-53146: 76+78, Test-NodeC-22289: 90+80, Test-NodeD-8561: 90+98]}, unionCH=PartitionerConsistentHash:DefaultConsistentHash{ns=256, owners = (3)[Test-NodeA-53146: 122+75, Test-NodeC-22289: 134+73, Test-NodeD-8561: 0+188]}, actualMembers=[Test-NodeA-53146, Test-NodeC-22289, Test-NodeD-8561], persistentUUIDs=[dff2268e-e59e-48cc-809d-dcfe0774d1d8, e9090b94-dc69-49d8-a2dc-ae0ad7f18fe0, 8edade14-6827-4b67-93d2-49b93844d8ad]}
> 13:19:17,285 WARN (remote-thread-Test-NodeA-p22710-t6:[cluster-listener]) [StateConsumerImpl] Discarding received cache entries for segment 243 of cache cluster-listener because they do not belong to this node.
> 13:19:17,286 TRACE (remote-thread-Test-NodeA-p22710-t6:[cluster-listener]) [StateConsumerImpl] Segments not received yet for cache cluster-listener: {Test-NodeC-22289=[InboundTransferTask{segments={18 71 91 243}, finishedSegments={18 71 91}, unfinishedSegments={243}, source=Test-NodeC-22289, isCancelled=false, completionFuture=java.util.concurrent.CompletableFuture@2755cd[Not completed, 1 dependents], topologyId=11, timeout=240000, cacheName=cluster-listener}]}
> 13:19:17,428 TRACE (transport-thread-Test-NodeA-p22712-t1:[Topology-cluster-listener]) [StateConsumerImpl] On cache cluster-listener we have: added segments: {45 54-55 63 90 95-98 114-115 119 138-140 155-156 167 174-175 205 220-222}; removed segments: {30 66-67 77 93 190-192 243}
> 13:19:17,429 TRACE (transport-thread-Test-NodeA-p22712-t1:[Topology-cluster-listener]) [InboundTransferTask] Partially cancelling inbound state transfer from node Test-NodeC-22289, segments {243}
> 13:19:17,429 TRACE (transport-thread-Test-NodeA-p22712-t1:[Topology-cluster-listener]) [StateConsumerImpl] Adding transfer from Test-NodeC-22289 for segments {45 54-55 63 90 95-98 114-115 119 138-140 155-156 167 174-175 205 220-222}
> 13:19:17,430 TRACE (transport-thread-Test-NodeA-p22712-t1:[Topology-cluster-listener]) [StateConsumerImpl] notifyEndOfStateTransferIfNeeded: no active transfers
> {noformat}
> The last message is incorrect: the rebalance phase is not confirmed because there are 2 active transfers from NodeC: the 243 one, which wasn't completely cancelled, and the 45...222 one, which hasn't yet started.
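The fix this issue implies can be sketched as a toy model (illustrative Java only; the class and method names below are hypothetical, not Infinispan's actual API): cancelling segments must run the same completion check as finishing them, so that cancelling the last unfinished segment completes the task's future and lets it be removed from the active-transfer list.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.CompletableFuture;

// Minimal sketch, NOT the real InboundTransferTask: a transfer task whose
// future must complete when cancelSegments() removes the last unfinished
// segment, just as it does when the last segment finishes normally.
class TransferTaskSketch {
    private final Set<Integer> unfinishedSegments = new TreeSet<>();
    final CompletableFuture<Void> completionFuture = new CompletableFuture<>();

    TransferTaskSketch(Set<Integer> segments) {
        unfinishedSegments.addAll(segments);
    }

    void onSegmentFinished(int segment) {
        unfinishedSegments.remove(segment);
        completeIfDone();
    }

    // The bug described above corresponds to cancelSegments() skipping the
    // completeIfDone() check, leaving the task forever "active".
    void cancelSegments(Set<Integer> cancelled) {
        unfinishedSegments.removeAll(cancelled);
        completeIfDone();
    }

    private void completeIfDone() {
        if (unfinishedSegments.isEmpty()) {
            completionFuture.complete(null);
        }
    }
}
```

With the log's segments {18 71 91 243}: finishing 18, 71 and 91 leaves the future pending; cancelling 243 then completes it, so the caller can drop the task from its transfer map.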
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
[JBoss JIRA] (ISPN-9190) Make trace logging faster by writing a separate log file for each test
by Vittorio Rigamonti (JIRA)
[ https://issues.jboss.org/browse/ISPN-9190?page=com.atlassian.jira.plugin.... ]
Vittorio Rigamonti updated ISPN-9190:
-------------------------------------
Fix Version/s: 9.3.0.Final
(was: 9.3.0.CR1)
> Make trace logging faster by writing a separate log file for each test
> ----------------------------------------------------------------------
>
> Key: ISPN-9190
> URL: https://issues.jboss.org/browse/ISPN-9190
> Project: Infinispan
> Issue Type: Enhancement
> Components: Build
> Affects Versions: 9.3.0.Beta1
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Priority: Minor
> Fix For: 9.3.0.Final
>
>
> Logging each test to a separate file is cheaper because each test uses a separate appender, so there's no contention. The total size may also be smaller (with {{CompressedFileAppender}}) because the test name is repeated a lot and there isn't much overlap between different tests.
> Most importantly, keeping only the logs of failed tests becomes much faster.
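The idea can be sketched as a minimal per-test writer (hypothetical names; this is not Infinispan's actual {{CompressedFileAppender}}): each test gets its own file and writer, so appends never contend on a shared appender lock, the file can optionally be gzip-compressed, and discarding a passed test's log is a single file delete.

```java
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPOutputStream;

// Illustrative sketch only: one log file and writer per test, with no
// cross-test synchronization. Compression is optional via GZIP.
class PerTestLog implements AutoCloseable {
    private final Path file;
    private final Writer writer;

    PerTestLog(Path dir, String testName, boolean compress) throws IOException {
        file = dir.resolve(testName + (compress ? ".log.gz" : ".log"));
        writer = compress
            ? new OutputStreamWriter(new GZIPOutputStream(Files.newOutputStream(file)))
            : Files.newBufferedWriter(file);
    }

    void log(String line) throws IOException {
        writer.write(line);
        writer.write('\n');
    }

    @Override
    public void close() throws IOException {
        writer.close();
    }

    // Keeping only failed tests' logs reduces to deleting the file
    // when the test passed.
    void discardIfFailed(boolean failed) throws IOException {
        if (!failed) {
            Files.deleteIfExists(file);
        }
    }
}
```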
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
[JBoss JIRA] (ISPN-9194) EntityRegionAccessStrategyTest.testRemoveAll failing randomly
by Vittorio Rigamonti (JIRA)
[ https://issues.jboss.org/browse/ISPN-9194?page=com.atlassian.jira.plugin.... ]
Vittorio Rigamonti updated ISPN-9194:
-------------------------------------
Fix Version/s: (was: 9.3.0.CR1)
> EntityRegionAccessStrategyTest.testRemoveAll failing randomly
> -------------------------------------------------------------
>
> Key: ISPN-9194
> URL: https://issues.jboss.org/browse/ISPN-9194
> Project: Infinispan
> Issue Type: Bug
> Components: Hibernate Cache
> Affects Versions: 9.3.0.Beta1
> Reporter: Galder Zamarreño
> Assignee: Galder Zamarreño
> Labels: testsuite_stability
> Fix For: 9.3.0.Final
>
> Attachments: ispn9194-log-v3.tgz, ispn9194-log-v4.tgz
>
>
> {code}
> [INFO] Running org.infinispan.test.hibernate.cache.commons.entity.EntityRegionAccessStrategyTest
> [ERROR] Tests run: 153, Failures: 1, Errors: 0, Skipped: 17, Time elapsed: 23.447 s <<< FAILURE! - in org.infinispan.test.hibernate.cache.commons.entity.EntityRegionAccessStrategyTest
> [ERROR] testRemoveAll[JTA, REPL_SYNC, AccessType[read-only]](org.infinispan.test.hibernate.cache.commons.entity.EntityRegionAccessStrategyTest) Time elapsed: 0.01 s <<< FAILURE!
> java.lang.AssertionError: expected:<VALUE1/1> but was:<null>
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:834)
> at org.junit.Assert.assertEquals(Assert.java:118)
> at org.junit.Assert.assertEquals(Assert.java:144)
> at org.infinispan.test.hibernate.cache.commons.AbstractRegionAccessStrategyTest.evictOrRemoveAllTest(AbstractRegionAccessStrategyTest.java:490)
> at org.infinispan.test.hibernate.cache.commons.AbstractRegionAccessStrategyTest.testRemoveAll(AbstractRegionAccessStrategyTest.java:433)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> at org.hibernate.testing.junit4.ExtendedFrameworkMethod.invokeExplosively(ExtendedFrameworkMethod.java:45)
> at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
> at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.lang.Thread.run(Thread.java:748)
> {code}
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
[JBoss JIRA] (ISPN-9187) Lost segments logged when node leaves during rebalance
by Vittorio Rigamonti (JIRA)
[ https://issues.jboss.org/browse/ISPN-9187?page=com.atlassian.jira.plugin.... ]
Vittorio Rigamonti updated ISPN-9187:
-------------------------------------
Fix Version/s: 9.3.0.Final
(was: 9.3.0.CR1)
> Lost segments logged when node leaves during rebalance
> ------------------------------------------------------
>
> Key: ISPN-9187
> URL: https://issues.jboss.org/browse/ISPN-9187
> Project: Infinispan
> Issue Type: Bug
> Components: Core
> Affects Versions: 9.2.3.Final, 9.3.0.Beta1
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Fix For: 9.3.0.Final
>
>
> When a node leaves during rebalance, we remove the leaver from the current CH and from the pending CH with {{ConsistentHashFactory.updateMembers()}}. However, since the 4-phase rebalance changes, joiners are also removed from the pending CH.
> In the following log, numOwners=2 and the pending CH had one segment (239) owned by a joiner (D) and a leaver (B). The updated pending CH has neither B nor D, so both owners are lost and random owners (C and A) are elected instead. A sees that it was allocated new segments by {{updateMembers}} and logs that the segments have been lost:
> {noformat}
> 08:53:01,528 INFO (remote-thread-test-NodeA-p2-t6:[cluster-listener]) [CLUSTER] [Context=cluster-listener] ISPN100002: Starting rebalance with members [test-NodeA-15001, test-NodeB-55628, test-NodeC-62395, test-NodeD-29215], phase READ_OLD_WRITE_ALL, topology id 10
> 08:53:01,554 TRACE (transport-thread-test-NodeD-p28-t2:[Topology-cluster-listener]) [CacheTopology] Current consistent hash's routing table: test-NodeA-15001 primary: {... 237-238 244-245 248-250 252 254}, backup: {... 235-236 243 246-247 251 253}
> test-NodeB-55628 primary: {... 231 234 240 242}, backup: {... 232-233 239 241 255}
> test-NodeC-62395 primary: {... 232-233 235-236 239 241 243 246-247 251 253 255}, backup: {... 231 234 237-238 240 242 244-245 248-250 252 254}
> 08:53:01,554 TRACE (transport-thread-test-NodeD-p28-t2:[Topology-cluster-listener]) [CacheTopology] Pending consistent hash's routing table: test-NodeA-15001 primary: {... 237-238 245 248 252}, backup: {... 235-236 244 246-247 251}
> test-NodeB-55628 primary: {... 231 240 242}, backup: {... 230 239 241}
> test-NodeC-62395 primary: {... 232-233 235-236 241 243 246-247 251 253 255}, backup: {... 231 234 240 242 245 249-250 252 254}
> test-NodeD-29215 primary: {... 234 239 244 249-250 254}, backup: {... 232-233 237-238 243 248 253 255}
> 08:53:01,606 TRACE (remote-thread-test-NodeA-p2-t5:[cluster-listener]) [ClusterCacheStatus] Removed node test-NodeB-55628 from cache cluster-listener: members = [test-NodeA-15001, test-NodeC-62395, test-NodeD-29215], joiners = [test-NodeD-29215]
> 08:53:01,611 TRACE (remote-thread-test-NodeA-p2-t5:[cluster-listener]) [CacheTopology] Current consistent hash's routing table: test-NodeA-15001 primary: {... 237-238 244-245 248-250 252 254}, backup: {... 235-236 243 246-247 251 253}
> test-NodeC-62395 primary: {... 230-236 239-243 246-247 251 253 255}, backup: {... 237-238 244-245 248-250 252 254}
> 08:53:01,611 TRACE (remote-thread-test-NodeA-p2-t5:[cluster-listener]) [CacheTopology] Pending consistent hash's routing table: test-NodeA-15001 primary: {... 237-238 244-245 248 252}, backup: {... 235-236 239 246-247 251}
> test-NodeC-62395 primary: {... 227-236 239-243 246-247 249-251 253-255}, backup: {... 226 245 252}
> 08:53:01,613 TRACE (transport-thread-test-NodeA-p4-t1:[Topology-cluster-listener]) [StateTransferManagerImpl] Installing new cache topology CacheTopology{id=11, phase=READ_OLD_WRITE_ALL, rebalanceId=4, currentCH=DefaultConsistentHash{ns=256, owners = (2)[test-NodeA-15001: 134+45, test-NodeC-62395: 122+50]}, pendingCH=DefaultConsistentHash{ns=256, owners = (2)[test-NodeA-15001: 129+45, test-NodeC-62395: 127+28]}, unionCH=DefaultConsistentHash{ns=256, owners = (2)[test-NodeA-15001: 134+62, test-NodeC-62395: 122+68]}, actualMembers=[test-NodeA-15001, test-NodeC-62395, test-NodeD-29215], persistentUUIDs=[0506cc27-9762-4703-ad56-6a3bf7953529, c365b93f-e46c-4f11-ab46-6cafa2b2d92b, d3f21b0d-07f2-4089-b160-f754e719de83]} on cache cluster-listener
> 08:53:01,662 TRACE (transport-thread-test-NodeA-p4-t1:[Topology-cluster-listener]) [StateConsumerImpl] On cache cluster-listener we have: new segments: {1-18 21 30-53 56-63 67 69-73 75-78 81-85 88-101 107-115 117-118 125-134 136-139 142-156 158-165 168-170 172-174 177-179 181-188 193-211 214-226 228-229 235-239 243-254}; old segments: {1-15 30-53 56-63 69-73 75-77 81-85 88-101 107-115 117-118 125-131 136-139 142-145 150-156 158-165 168-170 172-174 177-179 183-188 193-211 214-218 220-226 228-229 235-238 243-254}
> 08:53:01,663 TRACE (transport-thread-test-NodeA-p4-t1:[Topology-cluster-listener]) [StateConsumerImpl] On cache cluster-listener we have: added segments: {16-18 21 67 78 132-134 146-149 181-182 219 239}; removed segments: {}
> 08:53:01,663 DEBUG (transport-thread-test-NodeA-p4-t1:[Topology-cluster-listener]) [StateConsumerImpl] Not requesting segments {16-18 21 67 78 132-134 146-149 181-182 219 239} because the last owner left the cluster
> {noformat}
> There isn't any visible inconsistency: A only owns segment 239 for writing, and the coordinator immediately starts a new rebalance, ignoring the pending CH that it sent out earlier. However, the new rebalance causes its own problems: ISPN-8240.
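The owner-loss scenario above can be reduced to a few lines (illustrative only; this is not the real {{ConsistentHashFactory.updateMembers()}} logic): when both the leaver (B) and, because of the 4-phase rebalance change, the joiner (D) are dropped from segment 239's pending owner list, the list ends up empty and replacement owners have to be picked at random.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the member update described above: the pending CH's owner
// list for a segment after removing leavers AND (unexpectedly) joiners.
class UpdateMembersSketch {
    static List<String> updatePendingOwners(List<String> owners,
                                            List<String> leavers,
                                            List<String> joiners) {
        List<String> updated = new ArrayList<>(owners);
        updated.removeAll(leavers);  // removing the leaver is expected
        updated.removeAll(joiners);  // removing the joiner too loses the segment
        return updated;
    }
}
```

For segment 239 with owners [D, B], leaver B and joiner D, the updated owner list is empty, which is exactly the "last owner left the cluster" situation the log reports.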
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)