[infinispan-issues] [JBoss JIRA] (ISPN-1806) Potential race condition results in StateTransferInProgressException on view change
Dan Berindei (JIRA)
jira-events at lists.jboss.org
Tue Feb 7 10:12:49 EST 2012
[ https://issues.jboss.org/browse/ISPN-1806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12664792#comment-12664792 ]
Dan Berindei commented on ISPN-1806:
------------------------------------
I missed something in the log: FD_SOCK did in fact suspect `node-udp-1`, and it was kicked out of the cluster. However, the node appears to rejoin the cluster about 50ms after it was killed:
{noformat}
20:22:36,602 INFO [org.jboss.as.clustering.CoreGroupCommunicationService.lifecycle.cluster] (Incoming-9,null) JBAS010247: New cluster view for partition cluster (id: 2, delta: -1, merge: false) : [node-udp-0/cluster]
20:22:36,603 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (Incoming-9,null) ISPN000094: Received new cluster view: [node-udp-0/cluster|2] [node-udp-0/cluster]
20:22:36,659 INFO [org.jboss.as.clustering.CoreGroupCommunicationService.lifecycle.cluster] (Incoming-11,null) JBAS010247: New cluster view for partition cluster (id: 3, delta: 1, merge: false) : [node-udp-0/cluster, node-udp-1/cluster]
20:22:36,659 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (Incoming-11,null) ISPN000094: Received new cluster view: [node-udp-0/cluster|3] [node-udp-0/cluster, node-udp-1/cluster]
{noformat}
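The gap between the two view changes can be checked directly from the timestamps. This is only a minimal sketch, assuming every log line starts with an `HH:MM:SS,mmm` timestamp as in the excerpt above; the helper names `parse_ts` and `gap_ms` are made up for illustration, and the sample lines are abbreviated copies of the view 2 and view 3 messages.

```python
from datetime import datetime, timedelta

def parse_ts(line):
    """Parse the leading HH:MM:SS,mmm timestamp of a log line (assumed format)."""
    return datetime.strptime(line.split()[0], "%H:%M:%S,%f")

def gap_ms(earlier, later):
    """Whole milliseconds between the timestamps of two log lines."""
    return (parse_ts(later) - parse_ts(earlier)) // timedelta(milliseconds=1)

# Abbreviated copies of the JBAS010247 lines for view 2 (removal) and view 3 (re-join).
removed = "20:22:36,602 INFO JBAS010247: New cluster view (id: 2, delta: -1)"
rejoined = "20:22:36,659 INFO JBAS010247: New cluster view (id: 3, delta: 1)"

print(gap_ms(removed, rejoined))  # 57
```

The result is 57ms between the two views, which matches the "about 50ms" estimate.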
After about 10 seconds, `node-udp-1` is restarted, and this time it joins the cluster properly. However, the "ghost" of the previous `node-udp-1` instance is still in the view:
{noformat}
20:22:49,479 INFO [stdout] (pool-5-thread-1) GMS: address=node-udp-1/cluster, cluster=cluster, physical address=127.0.0.1:55300
20:22:49,549 INFO [org.jboss.as.clustering.CoreGroupCommunicationService.lifecycle.cluster] (Incoming-13,null) JBAS010247: New cluster view for partition cluster (id: 4, delta: 1, merge: false) : [node-udp-0/cluster, node-udp-1/cluster, node-udp-1/cluster]
20:22:49,552 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (Incoming-13,null) ISPN000094: Received new cluster view: [node-udp-0/cluster|4] [node-udp-0/cluster, node-udp-1/cluster, node-udp-1/cluster]
{noformat}
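A duplicated member like this is easy to miss when reading a long log by eye. The following is a rough sketch of how the ISPN000094 lines could be scanned for views that list the same printed address more than once. The regular expression and the function name `duplicate_members` are my own assumptions based on the message layout in the excerpt above, not anything provided by Infinispan or JGroups, and it only compares the printed logical names.

```python
import re
from collections import Counter

# Assumed layout: "... [coordinator|viewId] [member, member, ...]"
VIEW_RE = re.compile(r"\|(\d+)\]\s*\[([^\]]*)")

def duplicate_members(line):
    """Return (view_id, sorted duplicated addresses), or None if no view is found."""
    match = VIEW_RE.search(line)
    if match is None:
        return None
    view_id = int(match.group(1))
    members = [m.strip() for m in match.group(2).split(",") if m.strip()]
    dups = sorted(m for m, n in Counter(members).items() if n > 1)
    return view_id, dups

view3 = ("ISPN000094: Received new cluster view: [node-udp-0/cluster|3] "
         "[node-udp-0/cluster, node-udp-1/cluster]")
view4 = ("ISPN000094: Received new cluster view: [node-udp-0/cluster|4] "
         "[node-udp-0/cluster, node-udp-1/cluster, node-udp-1/cluster]")

print(duplicate_members(view3))  # (3, [])
print(duplicate_members(view4))  # (4, ['node-udp-1/cluster'])
```

Two entries with the same printed name may still be distinct underlying addresses, so a hit here only flags a view worth a closer look.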
30 seconds after `node-udp-1` was killed, FD kicks it out of the cluster again:
{noformat}
20:23:20,224 INFO [org.jboss.as.clustering.CoreGroupCommunicationService.lifecycle.cluster] (Incoming-15,null) JBAS010247: New cluster view for partition cluster (id: 5, delta: -1, merge: false) : [node-udp-0/cluster, node-udp-1/cluster]
{noformat}
This looks like it could be a JGroups issue, but I don't think it could have caused the hang; the cause is still ISPN-1814.
> Potential race condition results in StateTransferInProgressException on view change
> -----------------------------------------------------------------------------------
>
> Key: ISPN-1806
> URL: https://issues.jboss.org/browse/ISPN-1806
> Project: Infinispan
> Issue Type: Bug
> Components: State transfer
> Affects Versions: 5.1.0.FINAL
> Reporter: Paul Ferraro
> Assignee: Dan Berindei
> Priority: Critical
> Fix For: 5.2.0.FINAL
>
> Attachments: org.jboss.as.test.clustering.unmanaged.singleton.SingletonTestCase-output.txt
>
>
> I'm not sure yet if this is an Infinispan or AS bug. In summary, I'm performing cache operations from a @ViewChanged event. Occasionally this results in an endless loop of "Failed to prepare view CacheView" error messages and upon timeout, a StateTransferInProgressException. I've attached the server log containing the eventual thread dump.