[JBoss JIRA] (ISPN-4903) ServerFailureRetrySingleOwnerTest doesn't actually test client retry
by Tristan Tarrant (JIRA)
[ https://issues.jboss.org/browse/ISPN-4903?page=com.atlassian.jira.plugin.... ]
Tristan Tarrant updated ISPN-4903:
----------------------------------
Fix Version/s: 9.4.0.Final
(was: 9.4.0.CR3)
> ServerFailureRetrySingleOwnerTest doesn't actually test client retry
> --------------------------------------------------------------------
>
> Key: ISPN-4903
> URL: https://issues.jboss.org/browse/ISPN-4903
> Project: Infinispan
> Issue Type: Bug
> Components: Server, Test Suite - Server
> Affects Versions: 7.0.0.CR2
> Reporter: Dan Berindei
> Fix For: 9.4.0.Final
>
> Attachments: ServerFailureRetrySingleOwnerTest.java
>
>
> With {{useSynchronization = true}} (the default, before ISPN-4166 is integrated), the {{SuspectException}} thrown by the listener is swallowed by the transaction manager and the client doesn't retry. The test doesn't pick that up because the exception is thrown _after_ the entry was updated in the data container (a regular SuspectException would be thrown before).
> I changed the configuration to {{useSynchronization = false}}, but it didn't work because the {{SuspectException}} is wrapped in a {{CacheListenerException}}, so the client throws an exception instead of retrying. I also changed the test to use an interceptor instead of a listener, but then I got a {{ClassCastException}}:
> {noformat}
> Caused by: java.lang.ClassCastException: [B cannot be cast to org.infinispan.container.entries.CacheEntry
> at org.infinispan.cache.impl.CacheImpl.getCacheEntry(CacheImpl.java:424)
> at org.infinispan.cache.impl.CacheImpl.getCacheEntry(CacheImpl.java:429)
> at org.infinispan.server.hotrod.Decoder2x$.customReadKey(Decoder2x.scala:285)
> at org.infinispan.server.hotrod.HotRodDecoder.customDecodeKey(HotRodDecoder.scala:156)
> at org.infinispan.server.core.AbstractProtocolDecoder.org$infinispan$server$core$AbstractProtocolDecoder$$decodeKey(AbstractProtocolDecoder.scala:176)
> at org.infinispan.server.core.AbstractProtocolDecoder.decodeDispatch(AbstractProtocolDecoder.scala:71) ... 14 more
> {noformat}
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
7 years, 6 months
[JBoss JIRA] (ISPN-4879) Log a clear error message when an incompatible node joins the cluster
by Tristan Tarrant (JIRA)
[ https://issues.jboss.org/browse/ISPN-4879?page=com.atlassian.jira.plugin.... ]
Tristan Tarrant updated ISPN-4879:
----------------------------------
Fix Version/s: 9.4.0.Final
(was: 9.4.0.CR3)
> Log a clear error message when an incompatible node joins the cluster
> ---------------------------------------------------------------------
>
> Key: ISPN-4879
> URL: https://issues.jboss.org/browse/ISPN-4879
> Project: Infinispan
> Issue Type: Feature Request
> Components: Core
> Affects Versions: 7.0.0.CR2
> Reporter: Dan Berindei
> Fix For: 9.4.0.Final
>
>
> We don't check the Infinispan version when a node joins the cluster. If the node has an incompatible version, it will most likely fail to join, but the error message is not at all straightforward. As an example:
> {noformat}
> Exception in thread "main" org.infinispan.commons.CacheException: Unable
> to invoke method public void
> org.infinispan.statetransfer.StateTransferManagerImpl.start() throws
> java.lang.Exception on object of type StateTransferManagerImpl
> at
> org.infinispan.commons.util.ReflectionUtil.invokeAccessibly(ReflectionUtil.java:170)
> at
> org.infinispan.factories.AbstractComponentRegistry$PrioritizedMethod.invoke(AbstractComponentRegistry.java:869)
> at
> org.infinispan.factories.AbstractComponentRegistry.invokeStartMethods(AbstractComponentRegistry.java:638)
> at
> org.infinispan.factories.AbstractComponentRegistry.internalStart(AbstractComponentRegistry.java:627)
> at
> org.infinispan.factories.AbstractComponentRegistry.start(AbstractComponentRegistry.java:530)
> at
> org.infinispan.factories.ComponentRegistry.start(ComponentRegistry.java:216)
> at org.infinispan.cache.impl.CacheImpl.start(CacheImpl.java:764)
> at
> org.infinispan.manager.DefaultCacheManager.wireAndStartCache(DefaultCacheManager.java:584)
> at
> org.infinispan.manager.DefaultCacheManager.createCache(DefaultCacheManager.java:539)
> at
> org.infinispan.manager.DefaultCacheManager.getCache(DefaultCacheManager.java:416)
> at ch.nexustelecom.lbd.engine.ImsiCache.init(ImsiCache.java:49)
> at
> ch.nexustelecom.dexclient.engine.DefaultDexClientEngine.init(DefaultDexClientEngine.java:120)
> at ch.nexustelecom.dexclient.DexClient.initClient(DexClient.java:169)
> at
> ch.nexustelecom.dexclient.tool.DexClientManager.startup(DexClientManager.java:196)
> at
> ch.nexustelecom.dexclient.tool.DexClientManager.main(DexClientManager.java:83)
> Caused by: org.infinispan.commons.CacheException:
> java.lang.ClassNotFoundException:
> org.infinispan.partionhandling.impl.AvailabilityMode
> at org.infinispan.commons.util.Util.rewrapAsCacheException(Util.java:655)
> at
> org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommand(CommandAwareRpcDispatcher.java:176)
> at
> org.infinispan.remoting.transport.jgroups.JGroupsTransport.invokeRemotely(JGroupsTransport.java:536)
> at
> org.infinispan.topology.LocalTopologyManagerImpl.executeOnCoordinator(LocalTopologyManagerImpl.java:388)
> at
> org.infinispan.topology.LocalTopologyManagerImpl.join(LocalTopologyManagerImpl.java:102)
> at
> org.infinispan.statetransfer.StateTransferManagerImpl.start(StateTransferManagerImpl.java:108)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
> at java.lang.reflect.Method.invoke(Unknown Source)
> at
> org.infinispan.commons.util.ReflectionUtil.invokeAccessibly(ReflectionUtil.java:168)
> ... 14 more
> Caused by: java.lang.ClassNotFoundException:
> org.infinispan.partionhandling.impl.AvailabilityMode
> at java.net.URLClassLoader$1.run(Unknown Source)
> at java.net.URLClassLoader$1.run(Unknown Source)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(Unknown Source)
> at java.lang.ClassLoader.loadClass(Unknown Source)
> at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
> at java.lang.ClassLoader.loadClass(Unknown Source)
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Unknown Source)
> at
> org.jboss.marshalling.AbstractClassResolver.loadClass(AbstractClassResolver.java:131)
> at
> org.jboss.marshalling.AbstractClassResolver.resolveClass(AbstractClassResolver.java:112)
> at
> org.jboss.marshalling.river.RiverUnmarshaller.doReadClassDescriptor(RiverUnmarshaller.java:1002)
> at
> org.jboss.marshalling.river.RiverUnmarshaller.doReadNewObject(RiverUnmarshaller.java:1239)
> at
> org.jboss.marshalling.river.RiverUnmarshaller.doReadObject(RiverUnmarshaller.java:272)
> at
> org.jboss.marshalling.river.RiverUnmarshaller.doReadObject(RiverUnmarshaller.java:209)
> at
> org.jboss.marshalling.AbstractObjectInput.readObject(AbstractObjectInput.java:41)
> at
> org.infinispan.topology.CacheStatusResponse$Externalizer.readObject(CacheStatusResponse.java:76)
> at
> org.infinispan.topology.CacheStatusResponse$Externalizer.readObject(CacheStatusResponse.java:62)
> at
> org.infinispan.marshall.core.ExternalizerTable$ExternalizerAdapter.readObject(ExternalizerTable.java:424)
> at
> org.infinispan.marshall.core.ExternalizerTable.readObject(ExternalizerTable.java:221)
> at
> org.infinispan.marshall.core.JBossMarshaller$ExternalizerTableProxy.readObject(JBossMarshaller.java:148)
> at
> org.jboss.marshalling.river.RiverUnmarshaller.doReadObject(RiverUnmarshaller.java:351)
> at
> org.jboss.marshalling.river.RiverUnmarshaller.doReadObject(RiverUnmarshaller.java:209)
> at
> org.jboss.marshalling.AbstractObjectInput.readObject(AbstractObjectInput.java:41)
> at
> org.infinispan.remoting.responses.SuccessfulResponse$Externalizer.readObject(SuccessfulResponse.java:79)
> at
> org.infinispan.remoting.responses.SuccessfulResponse$Externalizer.readObject(SuccessfulResponse.java:64)
> at
> org.infinispan.marshall.core.ExternalizerTable$ExternalizerAdapter.readObject(ExternalizerTable.java:424)
> at
> org.infinispan.marshall.core.ExternalizerTable.readObject(ExternalizerTable.java:221)
> at
> org.infinispan.marshall.core.JBossMarshaller$ExternalizerTableProxy.readObject(JBossMarshaller.java:148)
> at
> org.jboss.marshalling.river.RiverUnmarshaller.doReadObject(RiverUnmarshaller.java:351)
> at
> org.jboss.marshalling.river.RiverUnmarshaller.doReadObject(RiverUnmarshaller.java:209)
> at
> org.jboss.marshalling.AbstractObjectInput.readObject(AbstractObjectInput.java:41)
> at
> org.infinispan.commons.marshall.jboss.AbstractJBossMarshaller.objectFromObjectStream(AbstractJBossMarshaller.java:135)
> at
> org.infinispan.marshall.core.VersionAwareMarshaller.objectFromByteBuffer(VersionAwareMarshaller.java:101)
> at
> org.infinispan.commons.marshall.AbstractDelegatingMarshaller.objectFromByteBuffer(AbstractDelegatingMarshaller.java:80)
> at
> org.infinispan.remoting.transport.jgroups.MarshallerAdapter.objectFromBuffer(MarshallerAdapter.java:28)
> at
> org.jgroups.blocks.RequestCorrelator.receiveMessage(RequestCorrelator.java:390)
> at org.jgroups.blocks.RequestCorrelator.receive(RequestCorrelator.java:250)
> at
> org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:674)
> at org.jgroups.JChannel.up(JChannel.java:733)
> at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:1030)
> at org.jgroups.protocols.pbcast.STATE_TRANSFER.up(STATE_TRANSFER.java:146)
> at org.jgroups.protocols.RSVP.up(RSVP.java:190)
> at org.jgroups.protocols.FRAG2.up(FRAG2.java:165)
> at org.jgroups.protocols.FlowControl.up(FlowControl.java:390)
> at org.jgroups.protocols.FlowControl.up(FlowControl.java:379)
> at org.jgroups.protocols.pbcast.GMS.up(GMS.java:1042)
> at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:234)
> at org.jgroups.protocols.UNICAST3.deliverMessage(UNICAST3.java:1034)
> at org.jgroups.protocols.UNICAST3.handleDataReceived(UNICAST3.java:752)
> at org.jgroups.protocols.UNICAST3.up(UNICAST3.java:399)
> at org.jgroups.protocols.pbcast.NAKACK2.up(NAKACK2.java:610)
> at org.jgroups.protocols.BARRIER.up(BARRIER.java:152)
> at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:155)
> at org.jgroups.protocols.FD_ALL.up(FD_ALL.java:200)
> at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:297)
> at org.jgroups.protocols.MERGE3.up(MERGE3.java:288)
> at org.jgroups.protocols.Discovery.up(Discovery.java:277)
> at org.jgroups.protocols.TP.passMessageUp(TP.java:1568)
> at org.jgroups.protocols.TP$MyHandler.run(TP.java:1787)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
> at java.lang.Thread.run(Unknown Source)
> {noformat}
> Optionally, we could allow the user to configure an "application version" and prevent nodes with different application versions from joining the same cluster.
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
7 years, 6 months
[JBoss JIRA] (ISPN-4846) State transfer keeps trying to fetch transaction data after the cache was stopped
by Tristan Tarrant (JIRA)
[ https://issues.jboss.org/browse/ISPN-4846?page=com.atlassian.jira.plugin.... ]
Tristan Tarrant updated ISPN-4846:
----------------------------------
Fix Version/s: 9.4.0.Final
(was: 9.4.0.CR3)
> State transfer keeps trying to fetch transaction data after the cache was stopped
> ---------------------------------------------------------------------------------
>
> Key: ISPN-4846
> URL: https://issues.jboss.org/browse/ISPN-4846
> Project: Infinispan
> Issue Type: Bug
> Components: Core, State Transfer
> Affects Versions: 7.0.0.CR1
> Reporter: Dan Berindei
> Fix For: 9.4.0.Final
>
>
> StateConsumerImpl doesn't check if the cache is stopped while fetching transaction data, it only stops when it's no longer able to find providers for transactions.
> However, JGroupsTransport throws a generic CacheException when the channel is stopped. The state transfer thread can enter a busy-wait loop, retrying to get the transaction data and immediately getting the CacheException, filling the log with messages like this:
> {noformat}
> 19:32:28,237 WARN (remote-thread-NodeN-p42592-t1:) [StateConsumerImpl] ISPN000209: Failed to retrieve transactions for segments [10, 11, 12, 13, 14, 15, 17, 16, 19, 18, 21, 20, 23, 22, 25, 24, 27, 26, 29, 28, 42, 43, 40, 41, 46, 47, 44, 45, 51, 50, 49, 48, 55, 54, 53, 52, 59, 58, 57, 56] of cache testCache from node NodeM-53416
> org.infinispan.commons.CacheException: java.lang.IllegalStateException: channel is not connected
> at org.infinispan.commons.util.Util.rewrapAsCacheException(Util.java:655)
> at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommand(CommandAwareRpcDispatcher.java:176)
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.invokeRemotely(JGroupsTransport.java:536)
> at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:290)
> at org.infinispan.statetransfer.StateConsumerImpl.getTransactions(StateConsumerImpl.java:766)
> at org.infinispan.statetransfer.StateConsumerImpl.requestTransactions(StateConsumerImpl.java:685)
> at org.infinispan.statetransfer.StateConsumerImpl.addTransfers(StateConsumerImpl.java:629)
> at org.infinispan.statetransfer.StateConsumerImpl.onTopologyUpdate(StateConsumerImpl.java:331)
> at org.infinispan.statetransfer.StateTransferManagerImpl.doTopologyUpdate(StateTransferManagerImpl.java:195)
> at org.infinispan.statetransfer.StateTransferManagerImpl.access$000(StateTransferManagerImpl.java:43)
> at org.infinispan.statetransfer.StateTransferManagerImpl$1.rebalance(StateTransferManagerImpl.java:116)
> {noformat}
> We should check is the cache is stopped before retrying in StateConsumerImpl.requestTransactions. I also think we should change the stop order - it would make sense to stop the remote executor threads and the RpcDispatcher before we stop the channel.
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
7 years, 6 months
[JBoss JIRA] (ISPN-5241) Cache topology updates should use the NO_FC flag
by Tristan Tarrant (JIRA)
[ https://issues.jboss.org/browse/ISPN-5241?page=com.atlassian.jira.plugin.... ]
Tristan Tarrant updated ISPN-5241:
----------------------------------
Fix Version/s: 9.4.0.Final
(was: 9.4.0.CR3)
> Cache topology updates should use the NO_FC flag
> ------------------------------------------------
>
> Key: ISPN-5241
> URL: https://issues.jboss.org/browse/ISPN-5241
> Project: Infinispan
> Issue Type: Bug
> Components: Core, State Transfer
> Affects Versions: 7.1.0.Final
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Priority: Critical
> Fix For: 9.4.0.Final
>
>
> Topology updates are sent while holding the ClusterCacheStatus lock, so they should never block. However, when MFC is present, the topology update can block waiting for enough credits. As most CacheTopologyControlCommands need to acquire the ClusterCacheStatus lock, this can easily lead to a full remote-executor pool (and OOB pool) and the appearance of a deadlock.
> What's more, if one node is not responsive, it can block all the other nodes from receiving further topology updates. Topology updates should be as prompt as possible, so we should use the NO_FC flag to ensure that each node receives topology updates as soon as possible.
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
7 years, 6 months
[JBoss JIRA] (ISPN-5163) A write operation with the SKIP_LOCKING flag can roll back the transaction
by Tristan Tarrant (JIRA)
[ https://issues.jboss.org/browse/ISPN-5163?page=com.atlassian.jira.plugin.... ]
Tristan Tarrant updated ISPN-5163:
----------------------------------
Fix Version/s: 9.4.0.Final
(was: 9.4.0.CR3)
> A write operation with the SKIP_LOCKING flag can roll back the transaction
> --------------------------------------------------------------------------
>
> Key: ISPN-5163
> URL: https://issues.jboss.org/browse/ISPN-5163
> Project: Infinispan
> Issue Type: Bug
> Components: Core
> Affects Versions: 7.0.3.Final, 7.1.0.Beta1
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Fix For: 9.4.0.Final
>
>
> When a write operation has the SKIP_LOCKING flag, it does not send a {{LockControlCommand}} to the primary owner, but it can send a {{ClusteredGetCommand}} with {{acquireRemoteLocks=true}} instead. The {{ClusteredGetCommmand}} will then execute a {{LockControlCommand}} with the origin not set properly, and {{TxInterceptor}} will roll back the transaction because the originator ({{null}}) appears to have left the cluster.
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
7 years, 6 months
[JBoss JIRA] (ISPN-5093) Granularity of remote event listener implementations doing the same job
by Tristan Tarrant (JIRA)
[ https://issues.jboss.org/browse/ISPN-5093?page=com.atlassian.jira.plugin.... ]
Tristan Tarrant updated ISPN-5093:
----------------------------------
Fix Version/s: 9.4.0.Final
(was: 9.4.0.CR3)
> Granularity of remote event listener implementations doing the same job
> -----------------------------------------------------------------------
>
> Key: ISPN-5093
> URL: https://issues.jboss.org/browse/ISPN-5093
> Project: Infinispan
> Issue Type: Enhancement
> Components: Remote Protocols
> Reporter: Galder Zamarreño
> Assignee: Galder Zamarreño
> Fix For: 9.4.0.Final
>
>
> Currently, if N clients add the same listener to a cache that does the same job, e.g. keeping a near cache consistent, this results in N server-side cluster listeners created, each potentially installed in different nodes. If one of those nodes fails, all clients that had a listener registered to that node will have to find a different node for this listener.
> The downsides of this approach is that there are as many cluster listeners installed as clients have added listeners (or have near cache enabled), which might not very efficient. If a node goes down, all clients that have cluster listeners there need to failover to some other node.
> The advantage of this approach is simplicity of the approach to decide where to add the listener and where to failover to.
> For this type of scenarios, an alternative set up might be worth exploring:
> If all these client side listeners are interested in exactly the same events, and the client ID would be exposed via the RemoteCache API, a server side cluster listener multi-plexing between all these clients could be potentially built. In other words, instead of having N clients register N cluster listeners, the first client would register the cluster listener with a client listener ID, and if more registrations were added with the same client listener ID, the connections would be added to the existing cluster listener implementation.
> The maximise the efficiency of this solution, all clients (even running in different JMVs), given the same client listener ID, should agree upon the node to add the listener in. For a distributed cache, hashing on the cache name would work. For replicated caches, since there's no hashing available, the first node of the view could be used.
> Since the logic to be executed server-side varies between being the first node adding the client listener vs the others, synchronization would be added to make sure that the first invocation only creates the cluster listener, and the others simply add the channel to the listener.
> Failover is a bit more tricky too, because if the node with the cluster listener goes down, all the clients have to failover, which again exposes a 1st vs the others type of logic.
> Advantages of this approach is the reduction in number of cluster listeners and potentially efficiency coming from a single cluster listener implementation server side.
> The disadvantages come from the server side logic to add/failover a cluster listener, which need to take into account if the listener is present or not. Other disadvantages come from needing the clients to use some specific routing for adding listeners for same node.
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
7 years, 6 months