[
https://issues.jboss.org/browse/ISPN-1806?page=com.atlassian.jira.plugin....
]
Dan Berindei commented on ISPN-1806:
------------------------------------
According to the attached log, there is no problem with the
commit/StateTransferInProgressException - the transaction thread is blocked because
CacheViewsManagerImpl is not able to install a new cache view. I created a separate issue
to describe the problem: ISPN-1814.
There is another problem apparent in the log: JGroups didn't exclude the killed cluster
member from the view, even after it failed to ACK the new view:
{noformat}
20:22:49,552 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport]
(Incoming-13,null) ISPN000094: Received new cluster view: [node-udp-0/cluster|4]
[node-udp-0/cluster, node-udp-1/cluster, node-udp-1/cluster]
20:22:52,497 WARNING [org.jgroups.protocols.pbcast.GMS] (pool-5-thread-1)
JOIN(node-udp-1/cluster) sent to node-udp-0/cluster timed out (after 3000 ms), retrying
20:22:54,549 WARNING [org.jgroups.protocols.pbcast.GMS]
(ViewHandler,cluster,node-udp-0/cluster) node-udp-0/cluster: failed to collect all ACKs
(expected=2) for view [node-udp-0/cluster|4] after 5000ms, missing ACKs from
[node-udp-1/cluster]
20:22:54,835 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport]
(pool-9-thread-1) ISPN000094: Received new cluster view: [node-udp-0/cluster|4]
[node-udp-0/cluster, 1903f28a-f7d1-1488-62a3-03a2df0f5b62, node-udp-1/cluster]
...
20:23:20,174 INFO [org.jboss.as.clustering.CoreGroupCommunicationService.cluster]
(VERIFY_SUSPECT.TimerThread,cluster,node-udp-1/cluster) JBAS010254: Suspected member:
1903f28a-f7d1-1488-62a3-03a2df0f5b62
{noformat}
I'm not sure why this happens - is FD/FD_ALL enabled in the JGroups configuration?
And finally, there are some issues with the log itself:
* The log messages from Incoming threads have a 'null' instead of the node name.
Not sure if it's related to the (caught) exception in
{{CoreGroupCommunicationService$MembershipListenerImpl.viewAccepted}}, since the OOB
threads look fine.
* The restarted node should start each time with a different name; it's hard to
understand what's happening with two {{node-udp-1/cluster}}s in the same cluster.
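To illustrate the last point, here is a minimal sketch of one way to give each restart a distinct logical name. It is only an assumption about how the test could do it: the class {{NodeNames}} and the method {{uniqueNodeName}} are hypothetical, and the log does not show where the node name is actually configured, so the result would have to be passed to whatever sets the name (for example JGroups' {{JChannel.setName(String)}}, if the channel is created by hand).

```java
import java.util.UUID;

// Hypothetical helper, not part of Infinispan, JGroups or AS.
public final class NodeNames {

    private NodeNames() {
    }

    /**
     * Appends a short random suffix to the base name so that every (re)start
     * of the same node shows up under a different logical name in the view.
     */
    public static String uniqueNodeName(String baseName) {
        // The first 8 hex characters of a random UUID are enough to tell restarts apart in a log.
        String suffix = UUID.randomUUID().toString().substring(0, 8);
        return baseName + "-" + suffix;
    }

    public static void main(String[] args) {
        // Two "starts" of the same node produce two different names.
        System.out.println(uniqueNodeName("node-udp-1"));
        System.out.println(uniqueNodeName("node-udp-1"));
    }
}
```

With names like these, the old and the new incarnation of the node would be distinguishable in the view instead of both appearing as {{node-udp-1/cluster}}.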
Potential race condition results in StateTransferInProgressException on view change
-----------------------------------------------------------------------------------
Key: ISPN-1806
URL: https://issues.jboss.org/browse/ISPN-1806
Project: Infinispan
Issue Type: Bug
Components: State transfer
Affects Versions: 5.1.0.FINAL
Reporter: Paul Ferraro
Assignee: Dan Berindei
Priority: Critical
Attachments: org.jboss.as.test.clustering.unmanaged.singleton.SingletonTestCase-output.txt
I'm not sure yet if this is an Infinispan or AS bug. In summary, I'm performing
cache operations from a @ViewChanged event. Occasionally this results in an endless loop
of "Failed to prepare view CacheView" error messages and, upon timeout, a
StateTransferInProgressException. I've attached the server log containing the
eventual thread dump.