[JBoss JIRA] (ISPN-9536) Distributed stream iteration timeout during join
by Diego Lovison (Jira)
[ https://issues.jboss.org/browse/ISPN-9536?page=com.atlassian.jira.plugin.... ]
Diego Lovison updated ISPN-9536:
--------------------------------
Tester: Diego Lovison
> Distributed stream iteration timeout during join
> ------------------------------------------------
>
> Key: ISPN-9536
> URL: https://issues.jboss.org/browse/ISPN-9536
> Project: Infinispan
> Issue Type: Bug
> Components: Core
> Affects Versions: 9.4.0.CR3
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Priority: Major
> Fix For: 9.4.0.Final
>
>
> {{LocalStreamManagerImpl}} checks in several places that the cache status is {{RUNNING}}, and if not it suspects all local segments with this log message:
> {noformat}
> [LocalStreamManagerImpl] Cache status is no longer running, all segments are now suspect for Test-NodeD-257370
> {noformat}
> This check is incorrect, because a node may receive distributed streaming requests before it is running: the initial state transfer could finish before all the components in the cache finish starting.
> If the check fails because the cache is still starting, the originator of the distributed stream iteration will assume that the remote cache has a newer topology. Because there is no newer topology, the originator blocks for {{DistributedCacheTimeout.timeout()}} and then fails.
> {noformat}
> 10:48:06,137 DEBUG (transport-thread-Test-NodeE-p660-t3:[Topology-org.infinispan.CONFIG]) [LocalTopologyManagerImpl] Updating local topology for cache org.infinispan.CONFIG: CacheTopology{id=9, phase=NO_REBALANCE, rebalanceId=3, currentCH=ReplicatedConsistentHash{ns = 256, owners = (5)[Test-NodeC-45735: 55, Test-NodeA-61957: 55, Test-NodeE-27687: 52, Test-NodeB-13641: 45, Test-NodeD-25737: 49]}, pendingCH=null, unionCH=null, actualMembers=[Test-NodeC-45735, Test-NodeA-61957, Test-NodeE-27687, Test-NodeB-13641, Test-NodeD-25737], persistentUUIDs=[5fccba8a-802a-423a-8e3b-ec9c1892a5ae, 5d04de93-79ce-4eff-a827-2a4bbdf54080, 70b79733-dbbe-4622-a5f6-38d3ec7771ff, 87e8c232-21a0-4416-8c81-4829f4619414, 78146ad7-7594-4161-9a7e-c4da0cc79f42]}
> 10:48:06,156 TRACE (remote-thread-Test-NodeE-p651-t2:[]) [LocalStreamManagerImpl] Request Test-NodeD-257370 completed for segments {2 4 12 20 24 27 31 36 38 42 47 56 58 66 72 75 77 87-88 93 100-101 109 120 125 148 150 159 166 172 179-180 188 194-195 198 204-205 212 216 219-220 222 226 233 236-237 239 242-244 255} with {} suspected segments
> 10:48:06,157 TRACE (remote-thread-Test-NodeE-p651-t2:[]) [LocalStreamManagerImpl] Cache status is no longer running, all segments are now suspect for Test-NodeD-257370
> 10:48:06,158 TRACE (ForkThread-5,CounterConcurrentStartTest:[org.infinispan.CONFIG]) [DefaultCacheManager] Cache org.infinispan.CONFIG started
> 10:48:06,165 TRACE (remote-thread-Test-NodeD-p649-t2:[]) [ClusterStreamManagerImpl] Received response from Test-NodeE-27687 with a completed response true for id Test-NodeD-257370 with {2 4 12 20 24 27 31 36 38 42 47 56 58 66 72 75 77 87-88 93 100-101 109 120 125 148 150 159 166 172 179-180 188 194-195 198 204-205 212 216 219-220 222 226 233 236-237 239 242-244 255} suspected segments.
> 10:48:06,165 TRACE (ForkThread-9,CounterConcurrentStartTest:[]) [DistributedCacheStream] Found {2 4 12 20 24 27 31 36 38 42 47 56 58 66 72 75 77 87-88 93 100-101 109 120 125 148 150 159 166 172 179-180 188 194-195 198 204-205 212 216 219-220 222 226 233 236-237 239 242-244 255} lost segments for identifier Test-NodeD-257370
> 10:48:06,165 TRACE (ForkThread-9,CounterConcurrentStartTest:[]) [DistributedCacheStream] Waiting for topology 10 to continue stream operation with segments {2 4 12 20 24 27 31 36 38 42 47 56 58 66 72 75 77 87-88 93 100-101 109 120 125 148 150 159 166 172 179-180 188 194-195 198 204-205 212 216 219-220 222 226 233 236-237 239 242-244 255}
> 10:48:06,165 TRACE (ForkThread-9,CounterConcurrentStartTest:[]) [StateTransferLockImpl] Waiting for topology 10 to be installed, current topology is 9
> 10:48:36,168 TRACE (ForkThread-9,CounterConcurrentStartTest:[]) [BasicComponentRegistryImpl] Changed status of org.infinispan.counter.api.CounterManager to FAILED
> org.infinispan.manager.EmbeddedCacheManagerStartupException: org.infinispan.commons.CacheException: Unable to invoke method public void org.infinispan.counter.impl.manager.EmbeddedCounterManager.start() on object of type EmbeddedCounterManager
> ...
> Caused by: java.util.concurrent.TimeoutException
> at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771) ~[?:1.8.0_171]
> at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915) ~[?:1.8.0_171]
> at org.infinispan.stream.impl.AbstractCacheStream.performOperationRehashAware(AbstractCacheStream.java:330) ~[classes/:?]
> at org.infinispan.stream.impl.AbstractCacheStream.performOperation(AbstractCacheStream.java:229) ~[classes/:?]
> at org.infinispan.stream.impl.DistributedCacheStream.anyMatch(DistributedCacheStream.java:406) ~[classes/:?]
> at org.infinispan.util.AbstractDelegatingCacheStream.anyMatch(AbstractDelegatingCacheStream.java:300) ~[classes/:?]
> at org.infinispan.CacheStream.anyMatch(CacheStream.java:462) ~[classes/:?]
> at org.infinispan.counter.impl.manager.CounterConfigurationManager.start(CounterConfigurationManager.java:90) ~[classes/:?]
> at org.infinispan.counter.impl.manager.EmbeddedCounterManager.start(EmbeddedCounterManager.java:89) ~[classes/:?]
> {noformat}
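The difference between "not running yet" and "no longer running" that the description relies on can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual Infinispan code or the actual fix: the `Status` enum, the `SHUTTING_DOWN` set and both method names are hypothetical and exist only for this example.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch: the names below are illustrative, not Infinispan's API.
final class StreamStatusCheckSketch {
    // Simplified lifecycle states, assumed for this example.
    enum Status { INITIALIZING, RUNNING, STOPPING, TERMINATED }

    // States in which the cache is really going away (an assumption of this sketch).
    private static final Set<Status> SHUTTING_DOWN =
            EnumSet.of(Status.STOPPING, Status.TERMINATED);

    // The check as described in the report: any status other than RUNNING
    // marks all local segments as suspect, including a cache that is still starting.
    static boolean suspectsAllSegmentsStrict(Status status) {
        return status != Status.RUNNING;
    }

    // One possible alternative: only suspect segments when the cache is shutting down,
    // so a request that arrives during startup is not answered with suspected segments.
    static boolean suspectsAllSegmentsLenient(Status status) {
        return SHUTTING_DOWN.contains(status);
    }

    public static void main(String[] args) {
        for (Status status : Status.values()) {
            System.out.println(status
                    + ": strict=" + suspectsAllSegmentsStrict(status)
                    + ", lenient=" + suspectsAllSegmentsLenient(status));
        }
    }
}
```

With the strict check, a node in `INITIALIZING` reports every segment as suspect, which matches the behaviour in the log above where the originator then waits for a topology that never arrives.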
--
This message was sent by Atlassian Jira
(v7.12.1#712002)
7 years, 4 months
[JBoss JIRA] (ISPN-9701) TransactionTable does not shutdown gracefully
by tommaso borgato (Jira)
[ https://issues.jboss.org/browse/ISPN-9701?page=com.atlassian.jira.plugin.... ]
tommaso borgato updated ISPN-9701:
----------------------------------
Comment: was deleted
(was: revalidate after backport)
> TransactionTable does not shutdown gracefully
> ---------------------------------------------
>
> Key: ISPN-9701
> URL: https://issues.jboss.org/browse/ISPN-9701
> Project: Infinispan
> Issue Type: Bug
> Components: Transactions
> Affects Versions: 9.2.4.Final, 9.3.5.Final, 9.4.1.Final
> Reporter: Paul Ferraro
> Assignee: Paul Ferraro
> Priority: Critical
> Fix For: 10.0.0.Alpha2, 9.4.3.Final
>
>
> Here's a sample stacktrace during shutdown:
> {noformat}
> 16:54:15,033 WARN [org.wildfly.clustering.web.undertow] (default task-1) ISPN000472: Cache manager is stopping: org.infinispan.IllegalLifecycleStateException: ISPN000472: Cache manager is stopping
> at org.infinispan.marshall.core.GlobalMarshaller.getExternalizer(GlobalMarshaller.java:420)
> at org.infinispan.marshall.core.GlobalMarshaller.writeNonNullableObject(GlobalMarshaller.java:400)
> at org.infinispan.marshall.core.GlobalMarshaller.writeNullableObject(GlobalMarshaller.java:355)
> at org.infinispan.marshall.core.GlobalMarshaller.writeObjectOutput(GlobalMarshaller.java:183)
> at org.infinispan.marshall.core.GlobalMarshaller.writeObjectOutput(GlobalMarshaller.java:176)
> at org.infinispan.marshall.core.GlobalMarshaller.objectToBuffer(GlobalMarshaller.java:305)
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.marshallRequest(JGroupsTransport.java:1009)
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.sendCommand(JGroupsTransport.java:1209)
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.performAsyncRemoteInvocation(JGroupsTransport.java:1105)
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.invokeRemotelyAsync(JGroupsTransport.java:246)
> at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotelyAsync(RpcManagerImpl.java:291)
> at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:323)
> at org.infinispan.transaction.impl.TransactionTable.removeTransactionInfoRemotely(TransactionTable.java:900)
> at org.infinispan.transaction.impl.TransactionTable.releaseLocksForCompletedTransaction(TransactionTable.java:886)
> at org.infinispan.transaction.xa.XaTransactionTable.forgetSuccessfullyCompletedTransaction(XaTransactionTable.java:195)
> at org.infinispan.transaction.xa.XaTransactionTable.commit(XaTransactionTable.java:128)
> at org.infinispan.transaction.xa.TransactionXaAdapter.commit(TransactionXaAdapter.java:68)
> at org.infinispan.commons.tx.TransactionImpl.finishResource(TransactionImpl.java:419)
> at org.infinispan.commons.tx.TransactionImpl.commitResources(TransactionImpl.java:466)
> at org.infinispan.commons.tx.TransactionImpl.runCommit(TransactionImpl.java:335)
> at org.infinispan.commons.tx.TransactionImpl.commit(TransactionImpl.java:110)
> {noformat}
> The problem seems to be that shutDownGracefully() first waits for the localTransactions map to be empty. However, when the cache is clustered, releaseLocksForCompletedTransaction(...) removes the transaction from the localTransactions map *before* invoking removeTransactionInfoRemotely(...), which means that the subsequent TxCompletionNotificationCommand can fail to marshal (see above), or the transport might close before this command is sent.
> A naive fix would simply reorder the removeLocalTransaction(...) to happen after the call to removeTransactionInfoRemotely(...) within the releaseLocksForCompletedTransaction(...) method, but I'm sure there's more to it.
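The ordering problem described above can be illustrated with a small single-threaded model. This is only a sketch under stated assumptions, not the real `TransactionTable`: the class, its fields and its methods are hypothetical stand-ins, and the remote notification is replaced by a log entry that records whether shutdown would already be allowed to proceed at that moment.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the ordering issue; not Infinispan's TransactionTable.
final class TxTableOrderingSketch {
    private final Set<String> localTransactions = new HashSet<>();
    private final List<String> events = new ArrayList<>();

    void begin(String tx) {
        localTransactions.add(tx);
    }

    // Stand-in for the condition a graceful shutdown waits on.
    boolean shutdownMayProceed() {
        return localTransactions.isEmpty();
    }

    // Stand-in for the remote notification; records the shutdown condition at send time.
    private void notifyRemotely(String tx) {
        events.add("notify " + tx + " shutdownMayProceed=" + shutdownMayProceed());
    }

    // Order described in the report: remove locally first, then notify remotely.
    void releaseRemoveFirst(String tx) {
        localTransactions.remove(tx);
        notifyRemotely(tx);
    }

    // The "naive fix" order: notify remotely first, then remove locally.
    void releaseNotifyFirst(String tx) {
        notifyRemotely(tx);
        localTransactions.remove(tx);
    }

    // Runs one transaction through the chosen order and returns the recorded events.
    static List<String> run(boolean notifyFirst) {
        TxTableOrderingSketch table = new TxTableOrderingSketch();
        table.begin("tx1");
        if (notifyFirst) {
            table.releaseNotifyFirst("tx1");
        } else {
            table.releaseRemoveFirst("tx1");
        }
        return table.events;
    }

    public static void main(String[] args) {
        System.out.println("remove first: " + run(false));
        System.out.println("notify first: " + run(true));
    }
}
```

In the remove-first order the notification is sent while the shutdown condition is already satisfied, which is the window in which the marshaller or transport may have stopped. The model ignores concurrency and any other reasons the real code may have for its ordering.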
[JBoss JIRA] (ISPN-9701) TransactionTable does not shutdown gracefully
by tommaso borgato (Jira)
[ https://issues.jboss.org/browse/ISPN-9701?page=com.atlassian.jira.plugin.... ]
tommaso borgato commented on ISPN-9701:
---------------------------------------
revalidate after backport
[JBoss JIRA] (ISPN-9762) Cache hangs during rebalancing
by Gustavo Fernandes (Jira)
[ https://issues.jboss.org/browse/ISPN-9762?page=com.atlassian.jira.plugin.... ]
Gustavo Fernandes edited comment on ISPN-9762 at 11/27/18 1:40 AM:
-------------------------------------------------------------------
EDIT: updated the log file URL
Link to trace log https://drive.google.com/open?id=1-gZUdVGf-Sz-MQZxkKs2KUylMT0-wXvH
was (Author: schernolyas):
Link to trace log https://drive.google.com/open?id=1-gZUdVGf-Sz-MQZxkKs2KUylMT0-wXvH
> Cache hangs during rebalancing
> ------------------------------
>
> Key: ISPN-9762
> URL: https://issues.jboss.org/browse/ISPN-9762
> Project: Infinispan
> Issue Type: Bug
> Affects Versions: 9.4.2.Final
> Reporter: Sergey Chernolyas
> Assignee: Ryan Emerson
> Priority: Blocker
> Attachments: hang_node.txt, normal_node.txt, stat_bad_node.png, stat_good_node.png
>
>
> I have a cluster with two nodes. One node starts without problems. The second node hangs while rebalancing the DEVICES cache.
> Configuration of the cache:
> {code:xml}
> <distributed-cache name="DEVICES" owners="2" segments="256" mode="SYNC">
>     <state-transfer await-initial-transfer="true" enabled="true" timeout="2400000" chunk-size="2048"/>
>     <partition-handling when-split="ALLOW_READ_WRITES" merge-policy="PREFERRED_ALWAYS"/>
>     <memory>
>         <object size="300000" strategy="REMOVE"/>
>     </memory>
>     <rocksdb-store preload="true" path="/data/rocksdb/devices/data">
>         <expiration path="/data/rocksdb/devices/expired"/>
>     </rocksdb-store>
>     <indexing index="LOCAL">
>         <property name="default.indexmanager">org.infinispan.query.indexmanager.InfinispanIndexManager</property>
>         <property name="default.directory_provider">infinispan</property>
>         <property name="default.worker.execution">async</property>
>         <property name="default.index_flush_interval">500</property>
>         <property name="default.indexwriter.merge_factor">30</property>
>         <property name="default.indexwriter.merge_max_size">1024</property>
>         <property name="default.indexwriter.ram_buffer_size">256</property>
>         <property name="default.locking_cachename">LuceneIndexesLocking_devices</property>
>         <property name="default.data_cachename">LuceneIndexesData_devices</property>
>         <property name="default.metadata_cachename">LuceneIndexesMetadata_devices</property>
>     </indexing>
>     <expiration max-idle="172800000"/>
> </distributed-cache>
> {code}
> The cache contains 70 000 elements.
[JBoss JIRA] (ISPN-9762) Cache hangs during rebalancing
by Gustavo Fernandes (Jira)
[ https://issues.jboss.org/browse/ISPN-9762?page=com.atlassian.jira.plugin.... ]
Gustavo Fernandes edited comment on ISPN-9762 at 11/27/18 1:32 AM:
-------------------------------------------------------------------
Link to trace log https://drive.google.com/open?id=1-gZUdVGf-Sz-MQZxkKs2KUylMT0-wXvH
was (Author: schernolyas):
Link to trace log https://yadi.sk/d/5OW7GafEHvXdWQ
[JBoss JIRA] (ISPN-9699) Cluster member owning no data
by Radoslav Husar (Jira)
[ https://issues.jboss.org/browse/ISPN-9699?page=com.atlassian.jira.plugin.... ]
Radoslav Husar updated ISPN-9699:
---------------------------------
Description:
Currently, you can set {{capacity-factor}} to zero on a cache if you don't want a node to own any segment.
If you want a node to own no data at all, you could set this property on all declared caches. But this wouldn't work for internal caches anyway (locks, counters).
It would be nice to have a global switch.
This would be useful when your app needs to be a cluster member for discovery/membership but is deployed mostly for processing. Indeed, when such nodes are added/removed from the cluster:
* there would be no data loss
* the cluster would be back in healthy state faster as no data would be moved around.
was:
Currently, you can set {{capacity-factor}} to zero on a cache if you don't want a node to own any segment.
If you want a node to own no data at all, you could set this property on all declared cached. But this wouldn't work for internal caches anyway (locks, counters).
It would be nice to have a global switch.
This would be useful when your app needs to be a cluster member for discovery/membership but is deployed mostly for processing. Indeed, when such nodes are added/removed from the cluster:
* there would be no data loss
* the cluster would be back in healthy state faster as no data would be moved around.
> Cluster member owning no data
> -----------------------------
>
> Key: ISPN-9699
> URL: https://issues.jboss.org/browse/ISPN-9699
> Project: Infinispan
> Issue Type: Feature Request
> Components: Core
> Affects Versions: 9.4.1.Final
> Reporter: Thomas Segismont
> Assignee: Katia Aresti
> Priority: Major
> Fix For: 10.0.0.Alpha2, 9.4.3.Final
>
>
> Currently, you can set {{capacity-factor}} to zero on a cache if you don't want a node to own any segment.
> If you want a node to own no data at all, you could set this property on all declared caches. But this wouldn't work for internal caches anyway (locks, counters).
> It would be nice to have a global switch.
> This would be useful when your app needs to be a cluster member for discovery/membership but is deployed mostly for processing. Indeed, when such nodes are added/removed from the cluster:
> * there would be no data loss
> * the cluster would be back in healthy state faster as no data would be moved around.
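The effect of a zero capacity factor can be shown with a toy model. The sketch below is an assumption-laden illustration, not Infinispan's consistent hash: the class name, the `assign` method and the "least loaded relative to capacity" rule are all hypothetical and chosen only to show that a node with capacity factor zero ends up owning no segments.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model only; this is not how Infinispan computes its consistent hash.
final class CapacityFactorSketch {
    // Assigns each segment to the node with the lowest owned/capacity ratio.
    // Nodes with a capacity factor of zero are never chosen.
    static Map<String, Integer> assign(Map<String, Float> capacityFactors, int numSegments) {
        Map<String, Integer> owned = new LinkedHashMap<>();
        for (String node : capacityFactors.keySet()) {
            owned.put(node, 0);
        }
        for (int segment = 0; segment < numSegments; segment++) {
            String best = null;
            double bestLoad = Double.MAX_VALUE;
            for (Map.Entry<String, Float> entry : capacityFactors.entrySet()) {
                float capacity = entry.getValue();
                if (capacity <= 0f) {
                    continue; // zero capacity: this node owns no data
                }
                double load = owned.get(entry.getKey()) / (double) capacity;
                if (load < bestLoad) {
                    bestLoad = load;
                    best = entry.getKey();
                }
            }
            if (best == null) {
                throw new IllegalStateException("no node with a positive capacity factor");
            }
            owned.merge(best, 1, Integer::sum);
        }
        return owned;
    }

    public static void main(String[] args) {
        Map<String, Float> factors = new LinkedHashMap<>();
        factors.put("A", 1f);
        factors.put("B", 1f);
        factors.put("C", 0f); // processing-only member
        System.out.println(assign(factors, 256));
    }
}
```

Because node `C` never owns a segment in this model, adding or removing it would not require moving any data, which is the benefit the request describes.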
[JBoss JIRA] (ISPN-8524) ScatteredDelayedAvailabilityUpdateTest.testDelayedAvailabilityUpdate5 failing randomly
by Diego Lovison (Jira)
[ https://issues.jboss.org/browse/ISPN-8524?page=com.atlassian.jira.plugin.... ]
Diego Lovison closed ISPN-8524.
-------------------------------
> ScatteredDelayedAvailabilityUpdateTest.testDelayedAvailabilityUpdate5 failing randomly
> --------------------------------------------------------------------------------------
>
> Key: ISPN-8524
> URL: https://issues.jboss.org/browse/ISPN-8524
> Project: Infinispan
> Issue Type: Bug
> Components: Core
> Affects Versions: 9.2.0.Alpha2
> Reporter: Galder Zamarreño
> Assignee: Radim Vansa
> Priority: Major
> Labels: testsuite_stability
> Fix For: 9.4.0.Final
>
> Attachments: ScatteredDelayedAvailabilityUpdateTest_ISPN-7919_RpcManager_ResponseCollector_20171212.log.gz
>
>
> http://ci.infinispan.org/job/Infinispan/job/PR-5556/17/
> org.infinispan.partitionhandling.ScatteredDelayedAvailabilityUpdateTest.testDelayedAvailabilityUpdate5[SCATTERED_SYNC]
> {code}
> java.util.concurrent.TimeoutException
> at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
> at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
> at org.infinispan.partitionhandling.ScatteredDelayedAvailabilityUpdateTest.testDelayedAvailabilityUpdate(ScatteredDelayedAvailabilityUpdateTest.java:75)
> at org.infinispan.partitionhandling.DelayedAvailabilityUpdateTest.testDelayedAvailabilityUpdate5(DelayedAvailabilityUpdateTest.java:39)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> ... Removed 16 stack frames
> {code}
[JBoss JIRA] (ISPN-9415) Client topology is not updated after cache becomes degraded
by Diego Lovison (Jira)
[ https://issues.jboss.org/browse/ISPN-9415?page=com.atlassian.jira.plugin.... ]
Diego Lovison closed ISPN-9415.
-------------------------------
> Client topology is not updated after cache becomes degraded
> -----------------------------------------------------------
>
> Key: ISPN-9415
> URL: https://issues.jboss.org/browse/ISPN-9415
> Project: Infinispan
> Issue Type: Bug
> Components: Server
> Affects Versions: 9.4.0.Beta1, 9.3.1.Final
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Priority: Major
> Fix For: 9.4.0.CR3
>
>
> When a new server is started, or after a merge, the other servers may see it as an owner in the consistent hash before they see its server address in the address cache ({{___hotRodTopologyCache}}). When a server needs to send a topology update but some of the servers are missing from the address cache, it can't send the topology update, so it tries to send a "partial update" that excludes the missing servers from the segment owners. In order to send the full topology update when the address cache is populated, the partial topology update has to be sent with a smaller topology id, and that means it is only sent if {{serverTopologyId >= clientTopologyId + 2}}.
> When the cluster splits and the cache becomes degraded, the servers in the other partition are removed from the address cache, but the list of segment owners is not updated, and the topology id is only incremented by 1. The address cache is incomplete, but a partial update cannot be sent, so the client keeps the old topology and keeps trying to connect to the servers in the other partition.
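The topology id rule above can be worked through with a small decision function. This is a sketch under stated assumptions, not the Hot Rod server code: the class, the `Update` enum and `decide` are hypothetical, and the choice of `serverTopologyId - 1` as the "smaller" id for a partial update is an assumption made only to be consistent with the `+ 2` condition quoted in the description.

```java
// Hypothetical sketch of the decision described above; not the Hot Rod server code.
final class PartialTopologyUpdateSketch {
    enum Update { NONE, FULL, PARTIAL }

    // Assumed for illustration: a partial update carries an id one below the server's id,
    // so that a later full update with the server's id still looks newer to the client.
    static int partialUpdateId(int serverTopologyId) {
        return serverTopologyId - 1;
    }

    static Update decide(int serverTopologyId, int clientTopologyId, boolean addressCacheComplete) {
        if (serverTopologyId <= clientTopologyId) {
            return Update.NONE; // client is already up to date (assumed rule)
        }
        if (addressCacheComplete) {
            return Update.FULL;
        }
        // A partial update is only useful if its smaller id is still newer than the client's.
        return serverTopologyId >= clientTopologyId + 2 ? Update.PARTIAL : Update.NONE;
    }

    public static void main(String[] args) {
        // Degraded-mode case from the description: id incremented by only 1, address cache incomplete.
        System.out.println(decide(10, 9, false));
        // Id advanced by 2 with an incomplete address cache.
        System.out.println(decide(11, 9, false));
    }
}
```

In the degraded-mode case the server is only one topology id ahead and the address cache is incomplete, so neither a full nor a partial update is sent and the client keeps its old topology.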