[infinispan-issues] [JBoss JIRA] (ISPN-1806) Potential race condition results in StateTransferInProgressException on view change
Dan Berindei (JIRA)
jira-events at lists.jboss.org
Mon Jan 30 19:33:48 EST 2012
[ https://issues.jboss.org/browse/ISPN-1806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662912#comment-12662912 ]
Dan Berindei commented on ISPN-1806:
------------------------------------
According to the attached log, there is no problem with the commit/StateTransferInProgressException itself - the transaction thread is blocked because CacheViewsManagerImpl is not able to install a new cache view. I created a separate issue to describe that problem: ISPN-1814.
The log also shows another problem: JGroups apparently didn't exclude the killed cluster member from the view, even after it failed to ACK the new view:
{noformat}
20:22:49,552 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (Incoming-13,null) ISPN000094: Received new cluster view: [node-udp-0/cluster|4] [node-udp-0/cluster, node-udp-1/cluster, node-udp-1/cluster]
20:22:52,497 WARNING [org.jgroups.protocols.pbcast.GMS] (pool-5-thread-1) JOIN(node-udp-1/cluster) sent to node-udp-0/cluster timed out (after 3000 ms), retrying
20:22:54,549 WARNING [org.jgroups.protocols.pbcast.GMS] (ViewHandler,cluster,node-udp-0/cluster) node-udp-0/cluster: failed to collect all ACKs (expected=2) for view [node-udp-0/cluster|4] after 5000ms, missing ACKs from [node-udp-1/cluster]
20:22:54,835 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (pool-9-thread-1) ISPN000094: Received new cluster view: [node-udp-0/cluster|4] [node-udp-0/cluster, 1903f28a-f7d1-1488-62a3-03a2df0f5b62, node-udp-1/cluster]
...
20:23:20,174 INFO [org.jboss.as.clustering.CoreGroupCommunicationService.cluster] (VERIFY_SUSPECT.TimerThread,cluster,node-udp-1/cluster) JBAS010254: Suspected member: 1903f28a-f7d1-1488-62a3-03a2df0f5b62
{noformat}
I'm not sure why this happens - is FD/FD_ALL enabled in the JGroups configuration?
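One quick way to answer that question is to scan the protocol stack definition for a failure detection protocol. The following is only a minimal sketch: it assumes the stack is available as a JGroups-style XML document whose child elements are named after the protocols, and the class name {{FailureDetectionCheck}} and the sample stack are hypothetical. A stack defined through the AS clustering subsystem may use a different layout and would need a different lookup.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class FailureDetectionCheck {
    // Protocol names to look for (the two mentioned in the comment above).
    private static final List<String> FD_PROTOCOLS = Arrays.asList("FD", "FD_ALL");

    /** Returns the failure detection protocols found as direct children of the root element. */
    public static List<String> findFailureDetectors(String stackXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(stackXml.getBytes(StandardCharsets.UTF_8)));
            List<String> found = new ArrayList<>();
            NodeList children = doc.getDocumentElement().getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                if (child.getNodeType() == Node.ELEMENT_NODE
                        && FD_PROTOCOLS.contains(child.getNodeName())) {
                    found.add(child.getNodeName());
                }
            }
            return found;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse protocol stack XML", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical, heavily shortened stack; a real one carries many attributes.
        String stack = "<config><UDP/><PING/><FD_ALL/><VERIFY_SUSPECT/></config>";
        System.out.println(findFailureDetectors(stack)); // prints [FD_ALL]
    }
}
```

An empty result would suggest that neither FD nor FD_ALL is configured in the stack that was checked.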
And finally, there are some issues with the log itself:
* The log messages from Incoming threads have a 'null' instead of the node name. Not sure if it's related to the (caught) exception in {{CoreGroupCommunicationService$MembershipListenerImpl.viewAccepted}}, since the OOB threads look fine.
* The restarted node should start each time with a different name; it's hard to understand what's happening with two {{node-udp-1/cluster}} entries in the same cluster.
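As an illustration of the second point, the test could derive a fresh logical name for every restart. This is just a sketch: the {{NodeNames}} class and {{uniqueName}} helper are hypothetical, and how the resulting name is handed to the AS instance or the JGroups channel depends on the test setup and is not shown here.

```java
import java.util.UUID;

public class NodeNames {
    /** Appends a short random suffix so that each restart gets a distinct logical name. */
    public static String uniqueName(String baseName) {
        // The first 8 hex characters of a random UUID are enough to tell restarts apart in a log.
        String suffix = UUID.randomUUID().toString().substring(0, 8);
        return baseName + "-" + suffix;
    }

    public static void main(String[] args) {
        // Two "restarts" of the same node yield two different names.
        System.out.println(uniqueName("node-udp-1"));
        System.out.println(uniqueName("node-udp-1"));
    }
}
```

With names like these, the old and the new incarnation of a restarted node can be told apart in the view listings.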
> Potential race condition results in StateTransferInProgressException on view change
> -----------------------------------------------------------------------------------
>
> Key: ISPN-1806
> URL: https://issues.jboss.org/browse/ISPN-1806
> Project: Infinispan
> Issue Type: Bug
> Components: State transfer
> Affects Versions: 5.1.0.FINAL
> Reporter: Paul Ferraro
> Assignee: Dan Berindei
> Priority: Critical
> Attachments: org.jboss.as.test.clustering.unmanaged.singleton.SingletonTestCase-output.txt
>
>
> I'm not sure yet if this is an Infinispan or AS bug. In summary, I'm performing cache operations from a @ViewChanged event. Occasionally this results in an endless loop of "Failed to prepare view CacheView" error messages and upon timeout, a StateTransferInProgressException. I've attached the server log containing the eventual thread dump.