[JBoss JIRA] (ISPN-4766) Cache can't start if coordinator leaves during join and joiner becomes the new coordinator
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-4766?page=com.atlassian.jira.plugin.... ]
Dan Berindei updated ISPN-4766:
-------------------------------
Priority: Critical (was: Major)
> Cache can't start if coordinator leaves during join and joiner becomes the new coordinator
> ------------------------------------------------------------------------------------------
>
> Key: ISPN-4766
> URL: https://issues.jboss.org/browse/ISPN-4766
> Project: Infinispan
> Issue Type: Bug
> Components: Core
> Affects Versions: 7.0.0.Beta2
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Priority: Critical
> Labels: testsuite_stability
> Fix For: 7.0.0.CR1
>
>
> When the joiner becomes the coordinator, it tries to recover the current cache topologies, but it receives only one expected member and no current topology. This causes an NPE in ClusterCacheStatus:
> {noformat}
> 22:51:49,547 ERROR (transport-thread-NodeB-p21124-t1:) [ClusterCacheStatus] ISPN000228: Failed to recover cache dist state after the current node became the coordinator
> java.lang.NullPointerException
> at org.infinispan.partionhandling.impl.PreferAvailabilityStrategy.onPartitionMerge(PreferAvailabilityStrategy.java:104)
> at org.infinispan.topology.ClusterCacheStatus.doMergePartitions(ClusterCacheStatus.java:452)
> at org.infinispan.topology.ClusterTopologyManagerImpl.recoverClusterStatus(ClusterTopologyManagerImpl.java:260)
> at org.infinispan.topology.ClusterTopologyManagerImpl.handleClusterView(ClusterTopologyManagerImpl.java:180)
> at org.infinispan.topology.ClusterTopologyManagerImpl$ClusterViewListener$1.run(ClusterTopologyManagerImpl.java:427)
> {noformat}
> The LocalTopologyManagerImpl waits briefly after receiving the SuspectException and retries, but this time it receives a {@code null} initial topology, causing another NPE:
> {noformat}
> 22:51:51,319 DEBUG (testng-GlobalKeySetTaskTest:) [LocalTopologyManagerImpl] Error sending join request for cache dist to coordinator
> java.lang.NullPointerException
> at org.infinispan.topology.LocalTopologyManagerImpl.resetLocalTopologyBeforeRebalance(LocalTopologyManagerImpl.java:222)
> at org.infinispan.topology.LocalTopologyManagerImpl.handleTopologyUpdate(LocalTopologyManagerImpl.java:191)
> at org.infinispan.topology.LocalTopologyManagerImpl.join(LocalTopologyManagerImpl.java:105)
> at org.infinispan.statetransfer.StateTransferManagerImpl.start(StateTransferManagerImpl.java:108)
> {noformat}
> This cycle repeats until the state transfer timeout expires.
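A defensive fix would be to guard the merge against a missing recovered topology. The sketch below is illustrative only (the class and method names are hypothetical, not Infinispan's actual internals): when the old coordinator dies before sending its topology, the new coordinator falls back to a fresh initial topology instead of dereferencing null.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical sketch, not Infinispan's real API: guard topology recovery
// against a null recovered topology instead of throwing an NPE.
public class TopologyRecoverySketch {

    static final class CacheTopology {
        final int topologyId;
        final List<String> members;
        CacheTopology(int topologyId, List<String> members) {
            this.topologyId = topologyId;
            this.members = members;
        }
    }

    // Returns the recovered topology, or a fresh initial topology (id 0)
    // when nothing could be recovered from the previous coordinator.
    static CacheTopology mergeOrInitialize(CacheTopology recovered, List<String> expectedMembers) {
        if (recovered == null) {
            return new CacheTopology(0, expectedMembers);
        }
        return recovered;
    }

    public static void main(String[] args) {
        CacheTopology t = mergeOrInitialize(null, Collections.singletonList("NodeB"));
        System.out.println(t.topologyId + ":" + t.members);
    }
}
```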
--
This message was sent by Atlassian JIRA
(v6.3.1#6329)
[JBoss JIRA] (ISPN-4766) Cache can't start if coordinator leaves during join and joiner becomes the new coordinator
by Dan Berindei (JIRA)
Dan Berindei created ISPN-4766:
----------------------------------
Summary: Cache can't start if coordinator leaves during join and joiner becomes the new coordinator
Key: ISPN-4766
URL: https://issues.jboss.org/browse/ISPN-4766
Project: Infinispan
Issue Type: Bug
Components: Core
Affects Versions: 7.0.0.Beta2
Reporter: Dan Berindei
Assignee: Dan Berindei
Fix For: 7.0.0.CR1
When the joiner becomes the coordinator, it tries to recover the current cache topologies, but it receives only one expected member and no current topology. This causes an NPE in ClusterCacheStatus:
{noformat}
22:51:49,547 ERROR (transport-thread-NodeB-p21124-t1:) [ClusterCacheStatus] ISPN000228: Failed to recover cache dist state after the current node became the coordinator
java.lang.NullPointerException
at org.infinispan.partionhandling.impl.PreferAvailabilityStrategy.onPartitionMerge(PreferAvailabilityStrategy.java:104)
at org.infinispan.topology.ClusterCacheStatus.doMergePartitions(ClusterCacheStatus.java:452)
at org.infinispan.topology.ClusterTopologyManagerImpl.recoverClusterStatus(ClusterTopologyManagerImpl.java:260)
at org.infinispan.topology.ClusterTopologyManagerImpl.handleClusterView(ClusterTopologyManagerImpl.java:180)
at org.infinispan.topology.ClusterTopologyManagerImpl$ClusterViewListener$1.run(ClusterTopologyManagerImpl.java:427)
{noformat}
The LocalTopologyManagerImpl waits briefly after receiving the SuspectException and retries, but this time it receives a {@code null} initial topology, causing another NPE:
{noformat}
22:51:51,319 DEBUG (testng-GlobalKeySetTaskTest:) [LocalTopologyManagerImpl] Error sending join request for cache dist to coordinator
java.lang.NullPointerException
at org.infinispan.topology.LocalTopologyManagerImpl.resetLocalTopologyBeforeRebalance(LocalTopologyManagerImpl.java:222)
at org.infinispan.topology.LocalTopologyManagerImpl.handleTopologyUpdate(LocalTopologyManagerImpl.java:191)
at org.infinispan.topology.LocalTopologyManagerImpl.join(LocalTopologyManagerImpl.java:105)
at org.infinispan.statetransfer.StateTransferManagerImpl.start(StateTransferManagerImpl.java:108)
{noformat}
This cycle repeats until the state transfer timeout expires.
--
[JBoss JIRA] (ISPN-4752) Implement native getAll/putAll operations in Hot Rod 2.0
by Galder Zamarreño (JIRA)
[ https://issues.jboss.org/browse/ISPN-4752?page=com.atlassian.jira.plugin.... ]
Galder Zamarreño commented on ISPN-4752:
----------------------------------------
Implementing these operations requires considerable changes, particularly at the decode level, where state phases need to be looped over and multiple keys/params/values need to be tracked. On top of that, and more importantly, when the puts are called we don't want to call a synchronous put, since that would be hugely inefficient server-side. Much better would be to call putAsync(), collect the result of each, and then, once all have completed (whether with success or failure), send the reply back. The current decoder would hugely benefit from a more async approach rather than the current one, which relies heavily on sync calls. Separating the Memcached/HotRod decoders would make all this much easier as well. With all this in mind, Infinispan 7.0 is too late to apply these changes, so moving this up to the next version, which is currently 7.1.
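The async composition described in the comment above can be sketched as follows. This is a minimal standalone illustration, not the actual server decoder: a plain ConcurrentMap stands in for the cache, and CompletableFuture stands in for the async put API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class PutAllAsyncSketch {

    // Fire one async put per entry, then complete the reply only after every
    // put has finished, as the comment suggests, instead of blocking on each
    // put in turn.
    static CompletableFuture<Void> putAllAsync(ConcurrentMap<String, String> cache,
                                               Map<String, String> entries) {
        CompletableFuture<?>[] puts = entries.entrySet().stream()
                .map(e -> CompletableFuture.runAsync(() -> cache.put(e.getKey(), e.getValue())))
                .toArray(CompletableFuture[]::new);
        // allOf completes once all puts have completed.
        return CompletableFuture.allOf(puts);
    }

    public static void main(String[] args) {
        ConcurrentMap<String, String> cache = new ConcurrentHashMap<>();
        Map<String, String> entries = new LinkedHashMap<>();
        entries.put("k1", "v1");
        entries.put("k2", "v2");
        putAllAsync(cache, entries).join();  // send the reply only after this completes
        System.out.println(cache.size());
    }
}
```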
> Implement native getAll/putAll operations in Hot Rod 2.0
> --------------------------------------------------------
>
> Key: ISPN-4752
> URL: https://issues.jboss.org/browse/ISPN-4752
> Project: Infinispan
> Issue Type: Feature Request
> Components: Remote Protocols
> Reporter: Galder Zamarreño
> Assignee: Galder Zamarreño
> Fix For: 8.0.0.Final
>
>
> To help make RemoteCache.getAll() and RemoteCache.putAll() operations more efficient.
--
[JBoss JIRA] (ISPN-4752) Implement native getAll/putAll operations in Hot Rod 2.0
by Galder Zamarreño (JIRA)
[ https://issues.jboss.org/browse/ISPN-4752?page=com.atlassian.jira.plugin.... ]
Galder Zamarreño edited comment on ISPN-4752 at 9/25/14 7:28 AM:
-----------------------------------------------------------------
Implementing these operations requires considerable changes, particularly at the decode level, where state phases need to be looped over and multiple keys/params/values need to be tracked. On top of that, and more importantly, when the puts are called we don't want to call a synchronous put, since that would be hugely inefficient server-side. Much better would be to call putAsync(), collect the result of each, and then, once all have completed (whether with success or failure), send the reply back. The current decoder would hugely benefit from a more async approach rather than the current one, which relies heavily on sync calls. Separating the Memcached/HotRod decoders would make all this much easier as well. With all this in mind, Infinispan 7.0 is too late to apply these changes, so moving this up to the next version, which is currently 8.0, but more likely it'll be 7.1.
was (Author: galder.zamarreno):
Implementing these operations requires considerable changes, particularly at the decode level, where state phases need to be looped over and multiple keys/params/values need to be tracked. On top of that, and more importantly, when the puts are called we don't want to call a synchronous put, since that would be hugely inefficient server-side. Much better would be to call putAsync(), collect the result of each, and then, once all have completed (whether with success or failure), send the reply back. The current decoder would hugely benefit from a more async approach rather than the current one, which relies heavily on sync calls. Separating the Memcached/HotRod decoders would make all this much easier as well. With all this in mind, Infinispan 7.0 is too late to apply these changes, so moving this up to the next version, which is currently 7.1.
> Implement native getAll/putAll operations in Hot Rod 2.0
> --------------------------------------------------------
>
> Key: ISPN-4752
> URL: https://issues.jboss.org/browse/ISPN-4752
> Project: Infinispan
> Issue Type: Feature Request
> Components: Remote Protocols
> Reporter: Galder Zamarreño
> Assignee: Galder Zamarreño
> Fix For: 8.0.0.Final
>
>
> To help make RemoteCache.getAll() and RemoteCache.putAll() operations more efficient.
--