[
https://issues.jboss.org/browse/ISPN-1806?page=com.atlassian.jira.plugin....
]
Dan Berindei commented on ISPN-1806:
------------------------------------
I missed something in the log: FD_SOCK did in fact suspect `node-udp-1`, and it was kicked
out of the cluster. However, the node appears to rejoin the cluster about 50ms after it
was killed:
{noformat}
20:22:36,602 INFO
[org.jboss.as.clustering.CoreGroupCommunicationService.lifecycle.cluster]
(Incoming-9,null) JBAS010247: New cluster view for partition cluster (id: 2, delta: -1,
merge: false) : [node-udp-0/cluster]
20:22:36,603 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport]
(Incoming-9,null) ISPN000094: Received new cluster view: [node-udp-0/cluster|2]
[node-udp-0/cluster]
20:22:36,659 INFO
[org.jboss.as.clustering.CoreGroupCommunicationService.lifecycle.cluster]
(Incoming-11,null) JBAS010247: New cluster view for partition cluster (id: 3, delta: 1,
merge: false) : [node-udp-0/cluster, node-udp-1/cluster]
20:22:36,659 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport]
(Incoming-11,null) ISPN000094: Received new cluster view: [node-udp-0/cluster|3]
[node-udp-0/cluster, node-udp-1/cluster]
{noformat}
About 10 seconds later `node-udp-1` is restarted, and this second time it joins the
cluster properly. However, the "ghost" of the previous `node-udp-1` instance is still in the view:
{noformat}
20:22:49,479 INFO [stdout] (pool-5-thread-1) GMS: address=node-udp-1/cluster,
cluster=cluster, physical address=127.0.0.1:55300
20:22:49,549 INFO
[org.jboss.as.clustering.CoreGroupCommunicationService.lifecycle.cluster]
(Incoming-13,null) JBAS010247: New cluster view for partition cluster (id: 4, delta: 1,
merge: false) : [node-udp-0/cluster, node-udp-1/cluster, node-udp-1/cluster]
20:22:49,552 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport]
(Incoming-13,null) ISPN000094: Received new cluster view: [node-udp-0/cluster|4]
[node-udp-0/cluster, node-udp-1/cluster, node-udp-1/cluster]
{noformat}
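To spot this kind of duplicate entry in a long log, something like the following could help. It is only a minimal sketch using plain Java string handling, not part of Infinispan or JGroups; the class and method names (`ViewDuplicates`, `duplicateMembers`) are hypothetical, and it assumes the member list is the last bracketed group on a view line, as in the ISPN000094 messages above.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, for illustration only.
final class ViewDuplicates {

    // Returns the logical names that occur more than once in the member list
    // of a "Received new cluster view" line, in order of first repetition.
    static List<String> duplicateMembers(String viewLine) {
        // Assumption: the member list is the last "[...]" group on the line.
        int open = viewLine.lastIndexOf('[');
        if (open < 0) {
            return new ArrayList<>();
        }
        // Tolerate a missing closing bracket by reading to the end of the line.
        int close = viewLine.indexOf(']', open);
        String body = close < 0
                ? viewLine.substring(open + 1)
                : viewLine.substring(open + 1, close);

        Set<String> seen = new LinkedHashSet<>();
        Set<String> duplicates = new LinkedHashSet<>();
        for (String member : body.split(",")) {
            String name = member.trim();
            // Set.add returns false when the name was already present.
            if (!name.isEmpty() && !seen.add(name)) {
                duplicates.add(name);
            }
        }
        return new ArrayList<>(duplicates);
    }
}
```

Run against the view 4 line above, this should report `node-udp-1/cluster`, and nothing for views 2 and 3. It compares only the printed logical names, so it cannot tell which of the two entries is the stale one.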
30 seconds after `node-udp-1` was killed, FD kicks it out of the cluster again:
{noformat}
20:23:20,224 INFO
[org.jboss.as.clustering.CoreGroupCommunicationService.lifecycle.cluster]
(Incoming-15,null) JBAS010247: New cluster view for partition cluster (id: 5, delta: -1,
merge: false) : [node-udp-0/cluster, node-udp-1/cluster]
{noformat}
This looks like it could be a JGroups issue, but I don't think it could have caused
the hang-up - the cause is still ISPN-1814.
Potential race condition results in StateTransferInProgressException on view change
-----------------------------------------------------------------------------------
Key: ISPN-1806
URL: https://issues.jboss.org/browse/ISPN-1806
Project: Infinispan
Issue Type: Bug
Components: State transfer
Affects Versions: 5.1.0.FINAL
Reporter: Paul Ferraro
Assignee: Dan Berindei
Priority: Critical
Fix For: 5.2.0.FINAL
Attachments: org.jboss.as.test.clustering.unmanaged.singleton.SingletonTestCase-output.txt
I'm not sure yet if this is an Infinispan or AS bug. In summary, I'm performing
cache operations from a @ViewChanged event. Occasionally this results in an endless loop
of "Failed to prepare view CacheView" error messages and upon timeout, a
StateTransferInProgressException. I've attached the server log containing the
eventual thread dump.