[JBoss JIRA] (ISPN-3712) HotRod Rolling Upgrades from Delirium 5.2.4 to latest server needs dropped org.infinispan.util.ByteArrayKey
by RH Bugzilla Integration (JIRA)
[ https://issues.jboss.org/browse/ISPN-3712?page=com.atlassian.jira.plugin.... ]
RH Bugzilla Integration commented on ISPN-3712:
-----------------------------------------------
Tomas Sykora <tsykora(a)redhat.com> changed the Status of [bug 1030485|https://bugzilla.redhat.com/show_bug.cgi?id=1030485] from ON_QA to VERIFIED
> HotRod Rolling Upgrades from Delirium 5.2.4 to latest server needs dropped org.infinispan.util.ByteArrayKey
> -----------------------------------------------------------------------------------------------------------
>
> Key: ISPN-3712
> URL: https://issues.jboss.org/browse/ISPN-3712
> Project: Infinispan
> Issue Type: Bug
> Affects Versions: 6.0.0.CR1
> Environment: Processing HotRod rolling upgrades from source Infinispan 'Delirium' 5.2.4.Final-redhat-1 server to the latest Infinispan 'Infinium' 6.0.0-SNAPSHOT. Using old standalone.xml for source node and examples/standalone-hotrod-rolling-upgrade.xml for new target server.
> Reporter: Tomas Sykora
> Assignee: Tristan Tarrant
> Labels: 620
> Fix For: 6.0.0.Final, 7.0.0.Alpha1
>
>
> When the upgrade runs from 6.0.0-SNAPSHOT to 6.0.0-SNAPSHOT it works fine. It looks like a backwards-compatibility issue with the older server.
> See stacktrace:
> javax.management.MBeanException
> at org.infinispan.jmx.ResourceDMBean.invoke(ResourceDMBean.java:273)
> at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)
> at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:791)
> at org.jboss.as.jmx.PluggableMBeanServerImpl$TcclMBeanServer.invoke(PluggableMBeanServerImpl.java:527)
> at org.jboss.as.jmx.PluggableMBeanServerImpl.invoke(PluggableMBeanServerImpl.java:263)
> at org.jboss.remotingjmx.protocol.v2.ServerProxy$InvokeHandler.handle(ServerProxy.java:915)
> at org.jboss.remotingjmx.protocol.v2.ServerCommon$MessageReciever$1.run(ServerCommon.java:152)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> at java.lang.Thread.run(Thread.java:722)
> Caused by: java.lang.reflect.InvocationTargetException
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:601)
> at org.infinispan.jmx.ResourceDMBean.invoke(ResourceDMBean.java:271)
> ... 9 more
> Caused by: org.infinispan.commons.CacheException: java.lang.ClassNotFoundException: org.infinispan.util.ByteArrayKey from [Module "org.infinispan:main" from local module loader @4099a992 (finder: local module finder @284bd160 (roots: /home/tsykora/programs/eclipseWorkspace/Infinispan-server-tsykora/infinispan-server/testsuite/target/server/node1/modules,/home/tsykora/programs/eclipseWorkspace/Infinispan-server-tsykora/infinispan-server/testsuite/target/server/node1/modules/system/layers/base))]
> at org.infinispan.persistence.remote.upgrade.HotRodTargetMigrator.synchronizeData(HotRodTargetMigrator.java:63)
> at org.infinispan.upgrade.RollingUpgradeManager.synchronizeData(RollingUpgradeManager.java:59)
> ... 14 more
> Caused by: java.lang.ClassNotFoundException: org.infinispan.util.ByteArrayKey from [Module "org.infinispan:main" from local module loader @4099a992 (finder: local module finder @284bd160 (roots: /home/tsykora/programs/eclipseWorkspace/Infinispan-server-tsykora/infinispan-server/testsuite/target/server/node1/modules,/home/tsykora/programs/eclipseWorkspace/Infinispan-server-tsykora/infinispan-server/testsuite/target/server/node1/modules/system/layers/base))]
> at org.jboss.modules.ModuleClassLoader.findClass(ModuleClassLoader.java:190)
> at org.jboss.modules.ConcurrentClassLoader.performLoadClassUnchecked(ConcurrentClassLoader.java:468)
> at org.jboss.modules.ConcurrentClassLoader.performLoadClassChecked(ConcurrentClassLoader.java:456)
> at org.jboss.modules.ConcurrentClassLoader.performLoadClass(ConcurrentClassLoader.java:398)
> at org.jboss.modules.ConcurrentClassLoader.loadClass(ConcurrentClassLoader.java:120)
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:264)
> at org.jboss.marshalling.AbstractClassResolver.loadClass(AbstractClassResolver.java:135)
> at org.jboss.marshalling.AbstractClassResolver.resolveClass(AbstractClassResolver.java:116)
> at org.jboss.marshalling.river.RiverUnmarshaller.doReadClassDescriptor(RiverUnmarshaller.java:947)
> at org.jboss.marshalling.river.RiverUnmarshaller.doReadNewObject(RiverUnmarshaller.java:1259)
> at org.jboss.marshalling.river.RiverUnmarshaller.doReadObject(RiverUnmarshaller.java:276)
> at org.jboss.marshalling.river.RiverUnmarshaller.doReadObject(RiverUnmarshaller.java:213)
> at org.jboss.marshalling.river.RiverUnmarshaller.doReadCollectionObject(RiverUnmarshaller.java:184)
> at org.jboss.marshalling.river.RiverUnmarshaller.readCollectionData(RiverUnmarshaller.java:777)
> at org.jboss.marshalling.river.RiverUnmarshaller.doReadObject(RiverUnmarshaller.java:656)
> at org.jboss.marshalling.river.RiverUnmarshaller.doReadObject(RiverUnmarshaller.java:213)
> at org.jboss.marshalling.AbstractObjectInput.readObject(AbstractObjectInput.java:37)
> at org.infinispan.commons.marshall.jboss.AbstractJBossMarshaller.objectFromObjectStream(AbstractJBossMarshaller.java:140)
> at org.infinispan.commons.marshall.jboss.AbstractJBossMarshaller.objectFromByteBuffer(AbstractJBossMarshaller.java:118)
> at org.infinispan.commons.marshall.AbstractMarshaller.objectFromByteBuffer(AbstractMarshaller.java:82)
> at org.infinispan.persistence.remote.upgrade.HotRodTargetMigrator.synchronizeData(HotRodTargetMigrator.java:61)
> ... 15 more
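For context, a minimal sketch of what the failing step amounts to (hedged: the key name and the client-side framing below are illustrative assumptions, not the actual HotRodTargetMigrator source, which runs server-side): the target fetches the key dump that the source server serialized and unmarshals it, and unmarshalling fails because the 5.2.x dump references org.infinispan.util.ByteArrayKey, which no longer exists on the 6.0 classpath.
{code:java}
import java.util.Set;

import org.infinispan.client.hotrod.RemoteCache;
import org.infinispan.client.hotrod.RemoteCacheManager;
import org.infinispan.client.hotrod.configuration.ConfigurationBuilder;
import org.infinispan.commons.marshall.jboss.GenericJBossMarshaller;

public class KeyDumpReadSketch {

   // Hypothetical name for the well-known entry under which the source
   // server dumps its key set; the real constant lives in the migrator code.
   static final String KNOWN_KEYS = "___MigrationManager_HotRod_KnownKeys___";

   public static void main(String[] args) throws Exception {
      ConfigurationBuilder cb = new ConfigurationBuilder();
      cb.addServer().host("127.0.0.1").port(11222); // the source server
      RemoteCacheManager rcm = new RemoteCacheManager(cb.build());
      RemoteCache<String, byte[]> cache = rcm.getCache();

      byte[] dump = cache.get(KNOWN_KEYS);
      GenericJBossMarshaller marshaller = new GenericJBossMarshaller();

      // The 5.2.x server serialized the set as Set<ByteArrayKey>. On a 6.0
      // classpath org.infinispan.util.ByteArrayKey no longer exists, so this
      // call fails with ClassNotFoundException (which the migrator wraps in
      // a CacheException), exactly as in the stack trace above.
      Set<?> keys = (Set<?>) marshaller.objectFromByteBuffer(dump);
      System.out.println("Read " + keys.size() + " keys");
      rcm.stop();
   }
}
{code}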
[JBoss JIRA] (ISPN-3879) RHQ management -- server plugin -- two X-Site related operations are failing.
by Tomas Sykora (JIRA)
Tomas Sykora created ISPN-3879:
----------------------------------
Summary: RHQ management -- server plugin -- two X-Site related operations are failing.
Key: ISPN-3879
URL: https://issues.jboss.org/browse/ISPN-3879
Project: Infinispan
Issue Type: Bug
Affects Versions: 6.0.1.Final
Reporter: Tomas Sykora
Assignee: Mircea Markus
The only information I have been able to gather so far:
java.lang.Exception: JBAS011002: Failed to invoke operation: null, rolled-back=true
at org.rhq.core.pc.operation.OperationInvocation.run(OperationInvocation.java:278)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:679)
[JBoss JIRA] (ISPN-3659) Cache stop should clear thread-local ExtendedRiverMarshaller or their instance caches
by InfinispanUser 0815 (JIRA)
[ https://issues.jboss.org/browse/ISPN-3659?page=com.atlassian.jira.plugin.... ]
InfinispanUser 0815 edited comment on ISPN-3659 at 1/8/14 9:14 AM:
-------------------------------------------------------------------
-
was (Author: infinispan_user0815):
Thank you for your email.
I am out of the office until December 13th, 2013 inclusive. Your email will not be forwarded automatically.
Kind regards
Florian Gebhard
> Cache stop should clear thread-local ExtendedRiverMarshaller or their instance caches
> -------------------------------------------------------------------------------------
>
> Key: ISPN-3659
> URL: https://issues.jboss.org/browse/ISPN-3659
> Project: Infinispan
> Issue Type: Bug
> Components: Marshalling
> Affects Versions: 5.3.0.Final, 6.0.0.CR1
> Reporter: Dan Berindei
> Assignee: Pedro Ruivo
> Priority: Critical
> Labels: 620
> Fix For: 6.0.1.Final, 7.0.0.Alpha1, 7.0.0.Final
>
>
> The fix for ISPN-2372 was incomplete. We now clear the references from the thread-local marshallers to the cache itself, so it can be garbage-collected, but we don't clear the marshallers or their instance caches. If the cache values' object graph is very large, the cached marshallers will use up a lot of memory that could be garbage-collected (assuming the cache was indeed restarted and only one cache instance is running now).
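A minimal sketch of the leak pattern being described, with illustrative names rather than the real ExtendedRiverMarshaller internals: a per-thread marshaller keeps an instance cache that grows with the largest object graph ever marshalled, and unless cache stop clears the ThreadLocal (or at least the instance caches), that graph stays reachable from every pool thread that ever marshalled something.
{code:java}
import java.util.IdentityHashMap;
import java.util.Map;

class PerThreadMarshaller {
   // Grows with the largest object graph ever marshalled on this thread.
   final Map<Object, Integer> instanceCache = new IdentityHashMap<Object, Integer>();
}

class MarshallerPool {
   private static final ThreadLocal<PerThreadMarshaller> MARSHALLER =
         new ThreadLocal<PerThreadMarshaller>() {
            @Override
            protected PerThreadMarshaller initialValue() {
               return new PerThreadMarshaller();
            }
         };

   PerThreadMarshaller get() {
      return MARSHALLER.get();
   }

   // What the fix proposed here amounts to: on cache stop, drop the cached
   // marshaller (or at least clear its instance cache) for this thread.
   void stop() {
      MARSHALLER.remove();
   }
}
{code}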
[JBoss JIRA] (ISPN-3878) Unhandled failing ST cancel leads to deadlock
by Radim Vansa (JIRA)
[ https://issues.jboss.org/browse/ISPN-3878?page=com.atlassian.jira.plugin.... ]
Radim Vansa edited comment on ISPN-3878 at 1/8/14 8:53 AM:
-----------------------------------------------------------
I think I've found why the RSVP response does not arrive: a TxCompletionNotificationCommand (from another node that has already installed the new topology) is sent as non-OOB and waits until the topology is installed locally, so the ordered RSVP response cannot be delivered behind it. Only after the cancel command times out is the topology change finished (in SCI.onTopologyUpdate: finally { ... }), and only then can the ordered commands arrive.
was (Author: rvansa):
I think I've found why the RSVP response does not arrive: as TxCompletionNotificationCommand is sent as non-OOB and this waits until the topology is installed, the ordered RSVP response cannot be delivered. After the cancel command times out, the topology change is finished (in SCI.onTopologyUpdate: finally { ... }) and only then the ordered commands can arrive.
> Unhandled failing ST cancel leads to deadlock
> ---------------------------------------------
>
> Key: ISPN-3878
> URL: https://issues.jboss.org/browse/ISPN-3878
> Project: Infinispan
> Issue Type: Bug
> Components: State transfer
> Affects Versions: 6.0.1.Final
> Reporter: Radim Vansa
> Assignee: Dan Berindei
> Priority: Critical
>
> Two concurrent rebalances can lead to deadlock. An example of two rebalances executing in parallel: the coordinator is leaving the cluster, sends REBALANCE_START, and dies; the new coordinator then recovers the cluster status and sends REBALANCE_START as well.
> 1. A node is requesting segments for the old topology; StateConsumerImpl.isTransferThreadRunning is set to true
> 2. The node waits for a StateResponseCommand in SCI: InboundTransferTask.awaitCompletion()
> 3. A new rebalance starts, changing the CH - the requested segment is not in the new CH
> 4. Some state transfers are cancelled; the cancel command is sent and takes a long time
> 5. The StateResponseCommand is received, but SCI.applyState finds that the segment is no longer owned, so the task is neither completed nor cancelled
> 6. Later, we get a TimeoutException from InboundTransferTask.sendCancelCommand, and no more cancellations are executed
> Result: the inbound transfer thread is stuck and the rebalance never completes (a toy model of the stuck latch follows below).
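A toy model of steps 2, 5 and 6 above, assuming a much-simplified InboundTransferTask (the real class is more involved): the latch that a successful state response releases is never counted down when the segment is no longer owned, and the timed-out cancel leaves nothing else to release it.
{code:java}
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeoutException;

class InboundTransferSketch {

   private final CountDownLatch completion = new CountDownLatch(1);

   // Step 2: the transfer thread blocks here waiting for the state response.
   void awaitCompletion() throws InterruptedException {
      completion.await();
   }

   // Step 5: the response arrives, but the segment is no longer owned,
   // so the latch is NOT released.
   void onStateResponse(boolean segmentStillOwned) {
      if (segmentStillOwned) {
         completion.countDown();
      }
   }

   // Step 6: the synchronous cancel RPC times out (its RSVP ack is stuck
   // behind ordered traffic); the exception propagates and nothing ever
   // counts the latch down, so awaitCompletion() blocks forever.
   void sendCancelCommand() throws TimeoutException {
      throw new TimeoutException("cancel command timed out");
   }
}
{code}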
[JBoss JIRA] (ISPN-3878) Unhandled failing ST cancel leads to deadlock
by Radim Vansa (JIRA)
[ https://issues.jboss.org/browse/ISPN-3878?page=com.atlassian.jira.plugin.... ]
Radim Vansa commented on ISPN-3878:
-----------------------------------
I think I've found why the RSVP response does not arrive: the TxCompletionNotificationCommand is sent as non-OOB and waits until the topology is installed, so the ordered RSVP response cannot be delivered behind it. Only after the cancel command times out is the topology change finished (in SCI.onTopologyUpdate: finally { ... }), and only then can the ordered commands arrive.
[JBoss JIRA] (ISPN-3878) Unhandled failing ST cancel leads to deadlock
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-3878?page=com.atlassian.jira.plugin.... ]
Dan Berindei commented on ISPN-3878:
------------------------------------
Summarizing the discussion with Radim on IRC: the cancel command is already sent asynchronously, but it also uses the RSVP flag, which makes it half-synchronous. And the RSVP ACK message is not tagged as OOB, which means it can easily be delayed by a random asynchronous command that takes too long (maybe because it's waiting for the new topology).
With UNICAST3, RSVP should no longer be necessary, so we can fix this by removing the code that sets the RSVP flag automatically for all the state transfer commands. We should also change {{InboundTransferTask.sendCancelCommand()}} to send the cancel commands as OOB.
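At the JGroups level, the proposed change amounts to something like the sketch below (simplified: the real code goes through Infinispan's JGroupsTransport rather than building raw Messages, so treat this as an illustration of the flags, not the actual patch):
{code:java}
import org.jgroups.Address;
import org.jgroups.JChannel;
import org.jgroups.Message;

class CancelDispatchSketch {

   void sendCancel(JChannel channel, Address target, byte[] serializedCommand)
         throws Exception {
      Message msg = new Message(target);
      msg.setBuffer(serializedCommand);
      // Before: state transfer commands also carried Message.Flag.RSVP,
      // making this "async" send wait for an ack that is itself delivered
      // in order and can be delayed behind slow ordered traffic.
      // Proposed: OOB only, so the cancel bypasses the ordered delivery
      // queue; with UNICAST3 the RSVP ack should no longer be needed.
      msg.setFlag(Message.Flag.OOB);
      channel.send(msg);
   }
}
{code}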
[JBoss JIRA] (ISPN-3878) Unhandled failing ST cancel leads to deadlock
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-3878?page=com.atlassian.jira.plugin.... ]
Dan Berindei edited comment on ISPN-3878 at 1/8/14 7:45 AM:
------------------------------------------------------------
We should also change {{InboundTransferTask.sendCancelCommand()}} to send the cancel commands as OOB.
was (Author: dan.berindei):
Summarizing the discussion with Radim on IRC: the cancel command is already sent asynchronously, but it also uses the RSVP flag, which makes it half-synchronous... And the RSVP ACK message is not tagged as OOB, which means it can be easily delayed by a random asynchronous command that takes too long (maybe because it's waiting for the new topology).
With UNICAST3, RSVP should no longer be necessary, so we can fix this by removing the code that sets the RSVP flag automatically for all the state transfer commands. We should also change {{InboundTransferTask.sendCancelCommand()}} to send the cancel commands as OOB.
[JBoss JIRA] (ISPN-3878) Unhandled failing ST cancel leads to deadlock
by Radim Vansa (JIRA)
[ https://issues.jboss.org/browse/ISPN-3878?page=com.atlassian.jira.plugin.... ]
Radim Vansa commented on ISPN-3878:
-----------------------------------
The cause of the TimeoutException is the RSVP protocol: the RSVP response was delayed for more than 60 seconds (our replication timeout). The response is NOT sent with the OOB flag, so the processing of other messages can delay its delivery.
As Dan recommended, with UNICAST3 the RSVP protocol should in theory no longer be necessary. I am now running the tests without RSVP.
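A JGroups-free toy of the head-of-line blocking being described: ordered (non-OOB) delivery behaves like a single-threaded queue per sender, so one slow handler delays everything queued behind it, including the RSVP response.
{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class OrderedDeliverySketch {
   public static void main(String[] args) {
      // A single-threaded executor models per-sender ordered delivery.
      ExecutorService ordered = Executors.newSingleThreadExecutor();

      // A command stuck waiting (e.g. for the new topology) for 65 seconds.
      ordered.submit(new Runnable() {
         public void run() {
            try {
               Thread.sleep(65000);
            } catch (InterruptedException ignored) {
            }
         }
      });

      // The RSVP response queued behind it is delivered only after 65 s,
      // i.e. after the 60 s replication timeout has already fired.
      ordered.submit(new Runnable() {
         public void run() {
            System.out.println("RSVP response delivered (too late)");
         }
      });
      ordered.shutdown();
   }
}
{code}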
[JBoss JIRA] (ISPN-3878) Unhandled failing ST cancel leads to deadlock
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-3878?page=com.atlassian.jira.plugin.... ]
Dan Berindei commented on ISPN-3878:
------------------------------------
I think the cancel command can't be sent asynchronously, because we want to know that nobody is sending state by the time the new rebalance starts. (The cancelling of the transfer tasks should happen during the handling of the CH_UPDATE that's sent by the new coordinator, not during the REBALANCE_START that follows.)
On the other hand, perhaps we don't need the CANCEL_STATE_TRANSFER commands at all, and we could just cancel all outbound transfer tasks when we install a new cache topology without a pending CH in StateProviderImpl.
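A hedged sketch of that alternative, with illustrative names rather than the actual StateProviderImpl API: keep a registry of outbound transfer tasks and cancel them locally whenever a topology without a pending CH is installed, so no cancel RPC is needed at all.
{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class OutboundTransferRegistrySketch {

   // Maps segment id to a cancellation action for its outbound transfer.
   private final Map<Integer, Runnable> cancelActions =
         new ConcurrentHashMap<Integer, Runnable>();

   void register(int segmentId, Runnable cancelAction) {
      cancelActions.put(segmentId, cancelAction);
   }

   // Hypothetical callback invoked on every topology installation.
   void onTopologyInstalled(boolean hasPendingCH) {
      if (!hasPendingCH) {
         // The rebalance is over: nobody should still be receiving state,
         // so cancel every outbound transfer locally, without any RPC.
         for (Runnable cancel : cancelActions.values()) {
            cancel.run();
         }
         cancelActions.clear();
      }
   }
}
{code}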