[JBoss JIRA] (ISPN-4903) ServerFailureRetrySingleOwnerTest doesn't actually test client retry
by Tristan Tarrant (Jira)
[ https://issues.jboss.org/browse/ISPN-4903?page=com.atlassian.jira.plugin.... ]
Tristan Tarrant updated ISPN-4903:
----------------------------------
Fix Version/s: 9.4.8.Final
(was: 9.4.7.Final)
> ServerFailureRetrySingleOwnerTest doesn't actually test client retry
> --------------------------------------------------------------------
>
> Key: ISPN-4903
> URL: https://issues.jboss.org/browse/ISPN-4903
> Project: Infinispan
> Issue Type: Bug
> Components: Server, Test Suite - Server
> Affects Versions: 7.0.0.CR2
> Reporter: Dan Berindei
> Priority: Major
> Fix For: 9.4.8.Final
>
> Attachments: ServerFailureRetrySingleOwnerTest.java
>
>
> With {{useSynchronization = true}} (the default, before ISPN-4166 is integrated), the {{SuspectException}} thrown by the listener is swallowed by the transaction manager and the client doesn't retry. The test doesn't pick that up because the exception is thrown _after_ the entry was updated in the data container (a regular SuspectException would be thrown before).
> I changed the configuration to {{useSynchronization = false}}, but it didn't work because the {{SuspectException}} is wrapped in a {{CacheListenerException}}, so the client throws an exception instead of retrying. I also changed the test to use an interceptor instead of a listener, but then I got a {{ClassCastException}}:
> {noformat}
> Caused by: java.lang.ClassCastException: [B cannot be cast to org.infinispan.container.entries.CacheEntry
> at org.infinispan.cache.impl.CacheImpl.getCacheEntry(CacheImpl.java:424)
> at org.infinispan.cache.impl.CacheImpl.getCacheEntry(CacheImpl.java:429)
> at org.infinispan.server.hotrod.Decoder2x$.customReadKey(Decoder2x.scala:285)
> at org.infinispan.server.hotrod.HotRodDecoder.customDecodeKey(HotRodDecoder.scala:156)
> at org.infinispan.server.core.AbstractProtocolDecoder.org$infinispan$server$core$AbstractProtocolDecoder$$decodeKey(AbstractProtocolDecoder.scala:176)
> at org.infinispan.server.core.AbstractProtocolDecoder.decodeDispatch(AbstractProtocolDecoder.scala:71)
> ... 14 more
> {noformat}
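One way to make the retry decision robust to the wrapping described above is to inspect the whole cause chain instead of only the top-level exception. This is a hypothetical sketch: `isRetriable` is not an Infinispan method, and the two exception classes are local stand-ins for Infinispan's `SuspectException` and `CacheListenerException`, so the example compiles on its own.

```java
/**
 * Hypothetical helper: decides whether a server-side failure should be reported to the
 * client as retriable, even when the SuspectException is wrapped in another exception.
 */
public class RetriableErrorCheck {

   // Local stand-in for Infinispan's SuspectException (not the real class).
   static class SuspectException extends RuntimeException {
      SuspectException(String message) {
         super(message);
      }
   }

   // Local stand-in for Infinispan's CacheListenerException (not the real class).
   static class CacheListenerException extends RuntimeException {
      CacheListenerException(Throwable cause) {
         super(cause);
      }
   }

   /** Returns true if the exception or any of its causes is a SuspectException. */
   static boolean isRetriable(Throwable t) {
      // Bounded walk, so a pathological cyclic cause chain cannot loop forever.
      for (int depth = 0; t != null && depth < 32; depth++, t = t.getCause()) {
         if (t instanceof SuspectException) {
            return true;
         }
      }
      return false;
   }

   public static void main(String[] args) {
      Throwable wrapped = new CacheListenerException(new SuspectException("node left"));
      System.out.println(isRetriable(wrapped));                          // true
      System.out.println(isRetriable(new IllegalStateException("boom"))); // false
   }
}
```

Unwrapping only answers "is this retriable"; it does not address the other half of the report, that with `useSynchronization = true` the exception never reaches the client at all.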
--
This message was sent by Atlassian Jira
(v7.12.1#712002)
[JBoss JIRA] (ISPN-4879) Log a clear error message when an incompatible node joins the cluster
by Tristan Tarrant (Jira)
[ https://issues.jboss.org/browse/ISPN-4879?page=com.atlassian.jira.plugin.... ]
Tristan Tarrant updated ISPN-4879:
----------------------------------
Fix Version/s: 9.4.8.Final
(was: 9.4.7.Final)
> Log a clear error message when an incompatible node joins the cluster
> ---------------------------------------------------------------------
>
> Key: ISPN-4879
> URL: https://issues.jboss.org/browse/ISPN-4879
> Project: Infinispan
> Issue Type: Feature Request
> Components: Core
> Affects Versions: 7.0.0.CR2
> Reporter: Dan Berindei
> Priority: Major
> Fix For: 9.4.8.Final
>
>
> We don't check the Infinispan version when a node joins the cluster. If the node has an incompatible version, it will most likely fail to join, but the error message is not at all straightforward. As an example:
> {noformat}
> Exception in thread "main" org.infinispan.commons.CacheException: Unable to invoke method public void org.infinispan.statetransfer.StateTransferManagerImpl.start() throws java.lang.Exception on object of type StateTransferManagerImpl
> at org.infinispan.commons.util.ReflectionUtil.invokeAccessibly(ReflectionUtil.java:170)
> at org.infinispan.factories.AbstractComponentRegistry$PrioritizedMethod.invoke(AbstractComponentRegistry.java:869)
> at org.infinispan.factories.AbstractComponentRegistry.invokeStartMethods(AbstractComponentRegistry.java:638)
> at org.infinispan.factories.AbstractComponentRegistry.internalStart(AbstractComponentRegistry.java:627)
> at org.infinispan.factories.AbstractComponentRegistry.start(AbstractComponentRegistry.java:530)
> at org.infinispan.factories.ComponentRegistry.start(ComponentRegistry.java:216)
> at org.infinispan.cache.impl.CacheImpl.start(CacheImpl.java:764)
> at org.infinispan.manager.DefaultCacheManager.wireAndStartCache(DefaultCacheManager.java:584)
> at org.infinispan.manager.DefaultCacheManager.createCache(DefaultCacheManager.java:539)
> at org.infinispan.manager.DefaultCacheManager.getCache(DefaultCacheManager.java:416)
> at ch.nexustelecom.lbd.engine.ImsiCache.init(ImsiCache.java:49)
> at ch.nexustelecom.dexclient.engine.DefaultDexClientEngine.init(DefaultDexClientEngine.java:120)
> at ch.nexustelecom.dexclient.DexClient.initClient(DexClient.java:169)
> at ch.nexustelecom.dexclient.tool.DexClientManager.startup(DexClientManager.java:196)
> at ch.nexustelecom.dexclient.tool.DexClientManager.main(DexClientManager.java:83)
> Caused by: org.infinispan.commons.CacheException: java.lang.ClassNotFoundException: org.infinispan.partionhandling.impl.AvailabilityMode
> at org.infinispan.commons.util.Util.rewrapAsCacheException(Util.java:655)
> at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommand(CommandAwareRpcDispatcher.java:176)
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.invokeRemotely(JGroupsTransport.java:536)
> at org.infinispan.topology.LocalTopologyManagerImpl.executeOnCoordinator(LocalTopologyManagerImpl.java:388)
> at org.infinispan.topology.LocalTopologyManagerImpl.join(LocalTopologyManagerImpl.java:102)
> at org.infinispan.statetransfer.StateTransferManagerImpl.start(StateTransferManagerImpl.java:108)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
> at java.lang.reflect.Method.invoke(Unknown Source)
> at org.infinispan.commons.util.ReflectionUtil.invokeAccessibly(ReflectionUtil.java:168)
> ... 14 more
> Caused by: java.lang.ClassNotFoundException: org.infinispan.partionhandling.impl.AvailabilityMode
> at java.net.URLClassLoader$1.run(Unknown Source)
> at java.net.URLClassLoader$1.run(Unknown Source)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(Unknown Source)
> at java.lang.ClassLoader.loadClass(Unknown Source)
> at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
> at java.lang.ClassLoader.loadClass(Unknown Source)
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Unknown Source)
> at org.jboss.marshalling.AbstractClassResolver.loadClass(AbstractClassResolver.java:131)
> at org.jboss.marshalling.AbstractClassResolver.resolveClass(AbstractClassResolver.java:112)
> at org.jboss.marshalling.river.RiverUnmarshaller.doReadClassDescriptor(RiverUnmarshaller.java:1002)
> at org.jboss.marshalling.river.RiverUnmarshaller.doReadNewObject(RiverUnmarshaller.java:1239)
> at org.jboss.marshalling.river.RiverUnmarshaller.doReadObject(RiverUnmarshaller.java:272)
> at org.jboss.marshalling.river.RiverUnmarshaller.doReadObject(RiverUnmarshaller.java:209)
> at org.jboss.marshalling.AbstractObjectInput.readObject(AbstractObjectInput.java:41)
> at org.infinispan.topology.CacheStatusResponse$Externalizer.readObject(CacheStatusResponse.java:76)
> at org.infinispan.topology.CacheStatusResponse$Externalizer.readObject(CacheStatusResponse.java:62)
> at org.infinispan.marshall.core.ExternalizerTable$ExternalizerAdapter.readObject(ExternalizerTable.java:424)
> at org.infinispan.marshall.core.ExternalizerTable.readObject(ExternalizerTable.java:221)
> at org.infinispan.marshall.core.JBossMarshaller$ExternalizerTableProxy.readObject(JBossMarshaller.java:148)
> at org.jboss.marshalling.river.RiverUnmarshaller.doReadObject(RiverUnmarshaller.java:351)
> at org.jboss.marshalling.river.RiverUnmarshaller.doReadObject(RiverUnmarshaller.java:209)
> at org.jboss.marshalling.AbstractObjectInput.readObject(AbstractObjectInput.java:41)
> at org.infinispan.remoting.responses.SuccessfulResponse$Externalizer.readObject(SuccessfulResponse.java:79)
> at org.infinispan.remoting.responses.SuccessfulResponse$Externalizer.readObject(SuccessfulResponse.java:64)
> at org.infinispan.marshall.core.ExternalizerTable$ExternalizerAdapter.readObject(ExternalizerTable.java:424)
> at org.infinispan.marshall.core.ExternalizerTable.readObject(ExternalizerTable.java:221)
> at org.infinispan.marshall.core.JBossMarshaller$ExternalizerTableProxy.readObject(JBossMarshaller.java:148)
> at org.jboss.marshalling.river.RiverUnmarshaller.doReadObject(RiverUnmarshaller.java:351)
> at org.jboss.marshalling.river.RiverUnmarshaller.doReadObject(RiverUnmarshaller.java:209)
> at org.jboss.marshalling.AbstractObjectInput.readObject(AbstractObjectInput.java:41)
> at org.infinispan.commons.marshall.jboss.AbstractJBossMarshaller.objectFromObjectStream(AbstractJBossMarshaller.java:135)
> at org.infinispan.marshall.core.VersionAwareMarshaller.objectFromByteBuffer(VersionAwareMarshaller.java:101)
> at org.infinispan.commons.marshall.AbstractDelegatingMarshaller.objectFromByteBuffer(AbstractDelegatingMarshaller.java:80)
> at org.infinispan.remoting.transport.jgroups.MarshallerAdapter.objectFromBuffer(MarshallerAdapter.java:28)
> at org.jgroups.blocks.RequestCorrelator.receiveMessage(RequestCorrelator.java:390)
> at org.jgroups.blocks.RequestCorrelator.receive(RequestCorrelator.java:250)
> at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:674)
> at org.jgroups.JChannel.up(JChannel.java:733)
> at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:1030)
> at org.jgroups.protocols.pbcast.STATE_TRANSFER.up(STATE_TRANSFER.java:146)
> at org.jgroups.protocols.RSVP.up(RSVP.java:190)
> at org.jgroups.protocols.FRAG2.up(FRAG2.java:165)
> at org.jgroups.protocols.FlowControl.up(FlowControl.java:390)
> at org.jgroups.protocols.FlowControl.up(FlowControl.java:379)
> at org.jgroups.protocols.pbcast.GMS.up(GMS.java:1042)
> at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:234)
> at org.jgroups.protocols.UNICAST3.deliverMessage(UNICAST3.java:1034)
> at org.jgroups.protocols.UNICAST3.handleDataReceived(UNICAST3.java:752)
> at org.jgroups.protocols.UNICAST3.up(UNICAST3.java:399)
> at org.jgroups.protocols.pbcast.NAKACK2.up(NAKACK2.java:610)
> at org.jgroups.protocols.BARRIER.up(BARRIER.java:152)
> at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:155)
> at org.jgroups.protocols.FD_ALL.up(FD_ALL.java:200)
> at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:297)
> at org.jgroups.protocols.MERGE3.up(MERGE3.java:288)
> at org.jgroups.protocols.Discovery.up(Discovery.java:277)
> at org.jgroups.protocols.TP.passMessageUp(TP.java:1568)
> at org.jgroups.protocols.TP$MyHandler.run(TP.java:1787)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
> at java.lang.Thread.run(Unknown Source)
> {noformat}
> Optionally, we could allow the user to configure an "application version" and prevent nodes with different application versions from joining the same cluster.
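A minimal sketch of the proposed check, assuming the joiner sends its Infinispan version and an optional user-configured application version with the join request, and the coordinator answers with an explicit error. `NodeVersion` and `checkJoin` are hypothetical names, the version strings are sample data, and strict string equality is a simplifying assumption; a real check might only compare part of the version.

```java
import java.util.Objects;
import java.util.Optional;

/** Hypothetical sketch of a version check performed by the coordinator on join. */
public class JoinVersionCheck {

   /** Hypothetical join metadata sent by every node; not an Infinispan class. */
   static final class NodeVersion {
      final String node;
      final String infinispanVersion;
      final String applicationVersion; // null if the user didn't configure one

      NodeVersion(String node, String infinispanVersion, String applicationVersion) {
         this.node = node;
         this.infinispanVersion = infinispanVersion;
         this.applicationVersion = applicationVersion;
      }
   }

   /** Returns an error message if the joiner is incompatible, or empty if it may join. */
   static Optional<String> checkJoin(NodeVersion coordinator, NodeVersion joiner) {
      // Assumption: Infinispan versions must match exactly.
      if (!coordinator.infinispanVersion.equals(joiner.infinispanVersion)) {
         return Optional.of(String.format(
               "Node %s cannot join the cluster: it runs Infinispan %s, but the cluster runs %s",
               joiner.node, joiner.infinispanVersion, coordinator.infinispanVersion));
      }
      // Optional user-defined "application version", compared the same way (null-safe).
      if (!Objects.equals(coordinator.applicationVersion, joiner.applicationVersion)) {
         return Optional.of(String.format(
               "Node %s cannot join the cluster: application version %s differs from the cluster's %s",
               joiner.node, joiner.applicationVersion, coordinator.applicationVersion));
      }
      return Optional.empty();
   }

   public static void main(String[] args) {
      NodeVersion coordinator = new NodeVersion("NodeA", "7.0.0.CR2", "1.0");
      NodeVersion joiner = new NodeVersion("NodeB", "6.0.2.Final", "1.0");
      // Prints an explicit error instead of a ClassNotFoundException deep in unmarshalling.
      System.out.println(checkJoin(coordinator, joiner).orElse("compatible"));
   }
}
```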
[JBoss JIRA] (ISPN-4846) State transfer keeps trying to fetch transaction data after the cache was stopped
by Tristan Tarrant (Jira)
[ https://issues.jboss.org/browse/ISPN-4846?page=com.atlassian.jira.plugin.... ]
Tristan Tarrant updated ISPN-4846:
----------------------------------
Fix Version/s: 9.4.8.Final
(was: 9.4.7.Final)
> State transfer keeps trying to fetch transaction data after the cache was stopped
> ---------------------------------------------------------------------------------
>
> Key: ISPN-4846
> URL: https://issues.jboss.org/browse/ISPN-4846
> Project: Infinispan
> Issue Type: Bug
> Components: Core, State Transfer
> Affects Versions: 7.0.0.CR1
> Reporter: Dan Berindei
> Priority: Major
> Fix For: 9.4.8.Final
>
>
> StateConsumerImpl doesn't check whether the cache is stopped while fetching transaction data; it only stops when it can no longer find providers for the transactions.
> However, JGroupsTransport throws a generic CacheException when the channel is stopped, so the state transfer thread can enter a busy-wait loop: it retries fetching the transaction data, immediately gets the CacheException again, and fills the log with messages like this:
> {noformat}
> 19:32:28,237 WARN (remote-thread-NodeN-p42592-t1:) [StateConsumerImpl] ISPN000209: Failed to retrieve transactions for segments [10, 11, 12, 13, 14, 15, 17, 16, 19, 18, 21, 20, 23, 22, 25, 24, 27, 26, 29, 28, 42, 43, 40, 41, 46, 47, 44, 45, 51, 50, 49, 48, 55, 54, 53, 52, 59, 58, 57, 56] of cache testCache from node NodeM-53416
> org.infinispan.commons.CacheException: java.lang.IllegalStateException: channel is not connected
> at org.infinispan.commons.util.Util.rewrapAsCacheException(Util.java:655)
> at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommand(CommandAwareRpcDispatcher.java:176)
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.invokeRemotely(JGroupsTransport.java:536)
> at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:290)
> at org.infinispan.statetransfer.StateConsumerImpl.getTransactions(StateConsumerImpl.java:766)
> at org.infinispan.statetransfer.StateConsumerImpl.requestTransactions(StateConsumerImpl.java:685)
> at org.infinispan.statetransfer.StateConsumerImpl.addTransfers(StateConsumerImpl.java:629)
> at org.infinispan.statetransfer.StateConsumerImpl.onTopologyUpdate(StateConsumerImpl.java:331)
> at org.infinispan.statetransfer.StateTransferManagerImpl.doTopologyUpdate(StateTransferManagerImpl.java:195)
> at org.infinispan.statetransfer.StateTransferManagerImpl.access$000(StateTransferManagerImpl.java:43)
> at org.infinispan.statetransfer.StateTransferManagerImpl$1.rebalance(StateTransferManagerImpl.java:116)
> {noformat}
> We should check whether the cache is stopped before retrying in StateConsumerImpl.requestTransactions. I also think we should change the stop order: it would make sense to stop the remote executor threads and the RpcDispatcher before we stop the channel.
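A sketch of the first fix, under the assumption that the retry loop can see a "stopped" flag for the cache: the flag is checked before every attempt, and a short backoff replaces the busy-wait. `fetchWithRetry`, the flag, and the attempt/backoff parameters are hypothetical; this is not the actual `StateConsumerImpl.requestTransactions` code.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Supplier;

/** Hypothetical retry loop that gives up as soon as the cache is stopped. */
public class StoppableRetry {

   static <T> T fetchWithRetry(Supplier<T> fetch, AtomicBoolean cacheStopped,
                               int maxAttempts, long backoffMillis) {
      RuntimeException lastFailure = null;
      for (int attempt = 1; attempt <= maxAttempts; attempt++) {
         // Check the lifecycle state before every attempt instead of retrying blindly.
         if (cacheStopped.get()) {
            throw new IllegalStateException("Cache is stopped, giving up on transaction data", lastFailure);
         }
         try {
            return fetch.get();
         } catch (RuntimeException e) {
            lastFailure = e;
            // Back off briefly instead of busy-waiting and flooding the log.
            try {
               Thread.sleep(backoffMillis);
            } catch (InterruptedException ie) {
               Thread.currentThread().interrupt();
               throw new IllegalStateException("Interrupted while retrying", ie);
            }
         }
      }
      throw new IllegalStateException("Gave up after " + maxAttempts + " attempts", lastFailure);
   }

   public static void main(String[] args) {
      AtomicBoolean stopped = new AtomicBoolean(false);
      int[] calls = {0};
      try {
         fetchWithRetry(() -> {
            calls[0]++;
            stopped.set(true); // simulate the cache being stopped while the fetch fails
            throw new RuntimeException("channel is not connected");
         }, stopped, 100, 10);
      } catch (IllegalStateException e) {
         // Prints: Cache is stopped, giving up on transaction data after 1 call(s)
         System.out.println(e.getMessage() + " after " + calls[0] + " call(s)");
      }
   }
}
```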
[JBoss JIRA] (ISPN-5574) Define high-level cache capabilities
by Tristan Tarrant (Jira)
[ https://issues.jboss.org/browse/ISPN-5574?page=com.atlassian.jira.plugin.... ]
Tristan Tarrant updated ISPN-5574:
----------------------------------
Fix Version/s: 9.4.8.Final
(was: 9.4.7.Final)
> Define high-level cache capabilities
> ------------------------------------
>
> Key: ISPN-5574
> URL: https://issues.jboss.org/browse/ISPN-5574
> Project: Infinispan
> Issue Type: Feature Request
> Components: Configuration, Core
> Affects Versions: 7.2.3.Final
> Reporter: Dan Berindei
> Priority: Minor
> Fix For: 9.4.8.Final
>
>
> Infinispan's configuration is very flexible, and it's sometimes hard to figure out how different settings affect things like cache consistency.
> For example, the lucene-directory module uses the fairly complicated {{Configurations.noDataLossOnJoiner()}} method to validate that a cache is safe for storing lucene indexes.
> Another example is users who would like to use a store for backup, but don't want the cache to read from the store for M/R tasks or when get(k) doesn't find the key in memory.
> One idea would be to define a set of "capabilities" like "state-transfer-complete" or "all-data-in-memory". The user could then add those capabilities to the cache definition, and the cache would refuse to start if its configuration violated them. The capabilities would also be used internally, to improve the error message when a feature requires a particular combination of settings.
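To make the idea concrete, here is a hypothetical model in which each declared capability is checked against a simplified set of settings and every violation is reported at once. The `Capability` values mirror the two examples in the issue; the `Settings` fields are illustrative assumptions and do not map one-to-one to Infinispan configuration attributes.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

/** Hypothetical model of declared cache capabilities validated at startup. */
public class CapabilityCheck {

   enum Capability { STATE_TRANSFER_COMPLETE, ALL_DATA_IN_MEMORY }

   /** Simplified, hypothetical settings; not Infinispan's configuration classes. */
   static final class Settings {
      boolean fetchStateOnJoin;
      boolean waitForInitialState;
      boolean evictionEnabled;
   }

   /** Returns one message per violation; an empty list means the cache may start. */
   static List<String> violations(Set<Capability> required, Settings s) {
      List<String> errors = new ArrayList<>();
      if (required.contains(Capability.STATE_TRANSFER_COMPLETE)) {
         if (!s.fetchStateOnJoin) {
            errors.add("state-transfer-complete requires fetching state on join");
         }
         if (!s.waitForInitialState) {
            errors.add("state-transfer-complete requires waiting for the initial state transfer");
         }
      }
      if (required.contains(Capability.ALL_DATA_IN_MEMORY) && s.evictionEnabled) {
         errors.add("all-data-in-memory is incompatible with eviction");
      }
      return errors;
   }

   public static void main(String[] args) {
      Settings s = new Settings();
      s.fetchStateOnJoin = true;
      s.evictionEnabled = true;
      // Both capabilities declared: the cache would refuse to start and list each problem.
      violations(EnumSet.allOf(Capability.class), s).forEach(System.out::println);
   }
}
```

Collecting all violations, rather than failing on the first one, is what would give the clearer error messages the issue asks for.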
[JBoss JIRA] (ISPN-5572) Exposed JMX MBeans should be separate components
by Tristan Tarrant (Jira)
[ https://issues.jboss.org/browse/ISPN-5572?page=com.atlassian.jira.plugin.... ]
Tristan Tarrant updated ISPN-5572:
----------------------------------
Fix Version/s: 9.4.8.Final
(was: 9.4.7.Final)
> Exposed JMX MBeans should be separate components
> ------------------------------------------------
>
> Key: ISPN-5572
> URL: https://issues.jboss.org/browse/ISPN-5572
> Project: Infinispan
> Issue Type: Task
> Components: Core
> Affects Versions: 8.0.0.Alpha2, 7.2.3.Final
> Reporter: Dan Berindei
> Priority: Major
> Fix For: 9.4.8.Final
>
>
> We currently expose internal components as JMX MBeans, and that makes our JMX "API" very unstructured. The exposed MBeans should be separate components, and the only concern in their interfaces should be ease of use.
> One example of JMX getting in the way of refactoring is {{CacheMgmtInterceptor}}. The interceptor chain is dynamic, so it should be possible to insert the interceptor only when statistics are enabled. But because the {{statisticsEnabled}} attribute is on the interceptor itself, that becomes a lot trickier, and we had to introduce a separate configuration attribute that disables statistics permanently (ISPN-5542).
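A sketch of the separation using only the JDK's `javax.management` API. The JMX-facing `CacheStatistics` component owns the `StatisticsEnabled` attribute and inserts or removes the interceptor, so the interceptor itself no longer needs to be an MBean. The interceptor class and the list standing in for the interceptor chain are hypothetical simplifications, and the `ObjectName` domain is made up for the example.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.Attribute;
import javax.management.MBeanServer;
import javax.management.ObjectName;
import javax.management.StandardMBean;

public class StatsMBeanSketch {

   /** Public management interface: its only concern is ease of use. */
   public interface CacheStatisticsMBean {
      boolean isStatisticsEnabled();
      void setStatisticsEnabled(boolean enabled);
      long getHits();
   }

   /** Hypothetical stand-in for the interceptor that actually counts hits. */
   static final class StatsInterceptor {
      final AtomicLong hits = new AtomicLong();
      void onHit() { hits.incrementAndGet(); }
   }

   /** The exposed component: owns the enabled flag and adds/removes the interceptor. */
   static final class CacheStatistics implements CacheStatisticsMBean {
      private final CopyOnWriteArrayList<StatsInterceptor> chain; // hypothetical interceptor chain
      private final StatsInterceptor interceptor = new StatsInterceptor();

      CacheStatistics(CopyOnWriteArrayList<StatsInterceptor> chain) { this.chain = chain; }

      @Override public boolean isStatisticsEnabled() { return chain.contains(interceptor); }

      @Override public void setStatisticsEnabled(boolean enabled) {
         if (enabled) {
            chain.addIfAbsent(interceptor); // insert only when statistics are enabled
         } else {
            chain.remove(interceptor);
         }
      }

      @Override public long getHits() { return interceptor.hits.get(); }
   }

   public static void main(String[] args) throws Exception {
      CopyOnWriteArrayList<StatsInterceptor> chain = new CopyOnWriteArrayList<>();
      CacheStatistics stats = new CacheStatistics(chain);

      MBeanServer server = ManagementFactory.getPlatformMBeanServer();
      ObjectName name = new ObjectName("example:type=CacheStatistics,name=testCache");
      server.registerMBean(new StandardMBean(stats, CacheStatisticsMBean.class), name);

      // Enabling statistics over JMX inserts the interceptor into the chain.
      server.setAttribute(name, new Attribute("StatisticsEnabled", true));
      chain.forEach(StatsInterceptor::onHit); // simulate one cache hit flowing through the chain
      System.out.println(server.getAttribute(name, "Hits")); // 1
   }
}
```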
--
This message was sent by Atlassian Jira
(v7.12.1#712002)
[View Less]
6 years
[JBoss JIRA] (ISPN-5570) Cross-site: retry backup commands
by Tristan Tarrant (Jira)
[ https://issues.jboss.org/browse/ISPN-5570?page=com.atlassian.jira.plugin.... ]
Tristan Tarrant updated ISPN-5570:
----------------------------------
Fix Version/s: 9.4.8.Final
(was: 9.4.7.Final)
> Cross-site: retry backup commands
> ---------------------------------
>
> Key: ISPN-5570
> URL: https://issues.jboss.org/browse/ISPN-5570
> Project: Infinispan
> Issue Type: Bug
> Components: Core, Cross-Site Replication
> Affects Versions: 7.2.3.Final
> Reporter: Dan Berindei
> Priority: Major
> Fix For: 9.4.8.Final
>
>
> There are 3 phases in a backup RPC, each with its own failure causes:
> 1. Sender -> Local site master: failures are caused by the site master shutting down or crashing, or by a network split.
> 2. Local site master -> Remote site master:
> 2.1. The local site master is no longer a site master, e.g. because it's shutting down or because it's no longer the coordinator after a merge.
> 2.2. The remote site master is no longer a site master.
> 2.3. The link between the local site and the remote site is down.
> 3. Remote site master -> Backup targets.
> Replication failures in phase 3 are handled by retrying (except for TimeoutExceptions), because {{BaseBackupReceiver}} uses regular cache methods to perform the updates.
> But replication failures in phases 1 and 2 are not handled in any way, except for causing the remote site to be taken offline after a certain number of replication failures (if backup is synchronous). We should instead retry backup RPCs when we get a {{SuspectException}} or {{UnreachableException}}, and perhaps even when we get no response (2.2?), and only stop when the timeout expires or when the backup is taken offline.
> Async backup probably needs retrying as well, and perhaps even a more sophisticated approach like I-RAC (ISPN-2634).
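A minimal sketch of the proposed retry policy for synchronous backups: retry on `SuspectException` or `UnreachableException`, and stop when the call succeeds, the timeout expires, or the site is taken offline. The exception classes are local stand-ins, and `backupWithRetry` and its parameters are hypothetical; whether to also retry when there is no response (case 2.2) is left open, as in the issue.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;
import java.util.function.Supplier;

/** Hypothetical retry policy for synchronous cross-site backup RPCs. */
public class BackupRetry {

   // Local stand-ins for the exceptions named in the issue (not the real classes).
   static class SuspectException extends RuntimeException {
      SuspectException(String message) { super(message); }
   }

   static class UnreachableException extends RuntimeException {
      UnreachableException(String message) { super(message); }
   }

   /**
    * Retries the backup call on SuspectException or UnreachableException until it succeeds,
    * the timeout expires, or the backup site is taken offline. Other exceptions propagate.
    */
   static <T> T backupWithRetry(Supplier<T> backupCall, BooleanSupplier siteOffline,
                                long timeoutMillis, long retryDelayMillis) {
      long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
      while (true) {
         if (siteOffline.getAsBoolean()) {
            throw new IllegalStateException("Backup site was taken offline");
         }
         try {
            return backupCall.get();
         } catch (SuspectException | UnreachableException e) {
            // Overflow-safe comparison of System.nanoTime() values.
            if (System.nanoTime() - deadline >= 0) {
               throw new IllegalStateException("Backup timed out", e);
            }
            try {
               Thread.sleep(retryDelayMillis);
            } catch (InterruptedException ie) {
               Thread.currentThread().interrupt();
               throw new IllegalStateException("Interrupted while retrying backup", ie);
            }
         }
      }
   }

   public static void main(String[] args) {
      int[] attempts = {0};
      String result = backupWithRetry(() -> {
         // Fail twice, e.g. while the local site master is being replaced, then succeed.
         if (++attempts[0] < 3) {
            throw new SuspectException("site master left");
         }
         return "ok";
      }, () -> false, 1_000, 10);
      System.out.println(result + " after " + attempts[0] + " attempts"); // ok after 3 attempts
   }
}
```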