[infinispan-issues] [JBoss JIRA] (ISPN-1806) Potential race condition results in StateTransferInProgressException on view change

Monday, 30 January 2012

    [ https://issues.jboss.org/browse/ISPN-1806?page=com.atlassian.jira.plugin.... ]

Dan Berindei commented on ISPN-1806:
------------------------------------

According to the attached log, there is no problem with the
commit/StateTransferInProgressException - the transaction thread is blocked because
CacheViewsManagerImpl is not able to install a new cache view. I created a separate issue
to describe the problem: ISPN-1814.

There is another problem apparent in the log: JGroups didn't exclude the
killed cluster member from the view even after it failed to ACK the new view:

{noformat}
20:22:49,552 INFO  [org.infinispan.remoting.transport.jgroups.JGroupsTransport]
(Incoming-13,null) ISPN000094: Received new cluster view: [node-udp-0/cluster|4]
[node-udp-0/cluster, node-udp-1/cluster, node-udp-1/cluster]
20:22:52,497 WARNING [org.jgroups.protocols.pbcast.GMS] (pool-5-thread-1)
JOIN(node-udp-1/cluster) sent to node-udp-0/cluster timed out (after 3000 ms), retrying
20:22:54,549 WARNING [org.jgroups.protocols.pbcast.GMS]
(ViewHandler,cluster,node-udp-0/cluster) node-udp-0/cluster: failed to collect all ACKs
(expected=2) for view [node-udp-0/cluster|4] after 5000ms, missing ACKs from
[node-udp-1/cluster]
20:22:54,835 INFO  [org.infinispan.remoting.transport.jgroups.JGroupsTransport]
(pool-9-thread-1) ISPN000094: Received new cluster view: [node-udp-0/cluster|4]
[node-udp-0/cluster, 1903f28a-f7d1-1488-62a3-03a2df0f5b62, node-udp-1/cluster]
...
20:23:20,174 INFO  [org.jboss.as.clustering.CoreGroupCommunicationService.cluster]
(VERIFY_SUSPECT.TimerThread,cluster,node-udp-1/cluster) JBAS010254: Suspected member:
1903f28a-f7d1-1488-62a3-03a2df0f5b62
{noformat}

I'm not sure why this happens - is FD/FD_ALL enabled in the JGroups configuration?
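
One way to answer that question quickly is to scan the protocol stack definition for the failure detection protocols. The following is only a minimal sketch: it assumes the stack is available as a standalone JGroups XML file whose root element lists the protocols as direct children, and the class and method names ({{FailureDetectionCheck}}, {{failureDetectors}}) are hypothetical. If the stack is defined elsewhere (for example in the application server configuration), the same check would need to be adapted.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of JGroups or Infinispan.
final class FailureDetectionCheck {

    // Only the two protocols asked about in the comment are checked here.
    private static final Set<String> FD_PROTOCOLS = Set.of("FD", "FD_ALL");

    // Returns the failure detection protocols found as direct children
    // of the root element, in the order they appear in the stack.
    static List<String> failureDetectors(String stackXml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(stackXml)));
        List<String> found = new ArrayList<>();
        NodeList children = doc.getDocumentElement().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE
                    && FD_PROTOCOLS.contains(child.getNodeName())) {
                found.add(child.getNodeName());
            }
        }
        return found;
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample stack, for illustration only.
        String sample =
                "<config><UDP/><PING/><FD_ALL/><VERIFY_SUSPECT/><pbcast.GMS/></config>";
        System.out.println(failureDetectors(sample));
    }
}
```

An empty result would suggest that neither FD nor FD_ALL is in the stack, which would be consistent with the killed member staying in the view for so long.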

And finally, there are some issues with the log itself:
* The log messages from Incoming threads have a 'null' instead of the node name.
Not sure if it's related to the (caught) exception in
{{CoreGroupCommunicationService$MembershipListenerImpl.viewAccepted}}, since the OOB
threads look fine.
* The restarted node should start with a different name each time; it's hard to
understand what's happening with two {{node-udp-1/cluster}} members in the same cluster.
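
For the second point, a minimal sketch of one way to make the name unique per start is to append a short random suffix to the base name. The class and method names here ({{NodeNames}}, {{uniqueNodeName}}) are hypothetical, and how the resulting name is handed to the transport depends on the setup (with a plain JGroups channel it could be passed to {{JChannel.setName}} before connecting).

```java
import java.util.UUID;

// Hypothetical helper, not part of Infinispan or the application server.
final class NodeNames {

    // Appends the first 8 hex characters of a random UUID to the base name,
    // so that every (re)start of the same node gets a distinct logical name.
    static String uniqueNodeName(String baseName) {
        String suffix = UUID.randomUUID().toString().substring(0, 8);
        return baseName + "-" + suffix;
    }

    public static void main(String[] args) {
        // Two starts of the "same" node yield two different names.
        System.out.println(uniqueNodeName("node-udp-1"));
        System.out.println(uniqueNodeName("node-udp-1"));
    }
}
```

With names like these, the old and the restarted instance would be distinguishable in both the view listing and the log thread names.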

...
 Potential race condition results in StateTransferInProgressException on view change
 -----------------------------------------------------------------------------------

                 Key: ISPN-1806
                 URL: https://issues.jboss.org/browse/ISPN-1806
             Project: Infinispan
          Issue Type: Bug
          Components: State transfer
    Affects Versions: 5.1.0.FINAL
            Reporter: Paul Ferraro
            Assignee: Dan Berindei
            Priority: Critical
         Attachments:
org.jboss.as.test.clustering.unmanaged.singleton.SingletonTestCase-output.txt

 I'm not sure yet if this is an Infinispan or AS bug.  In summary, I'm performing
cache operations from a @ViewChanged event.  Occasionally this results in an endless loop
of "Failed to prepare view CacheView" error messages and, upon timeout, a
StateTransferInProgressException.  I've attached the server log containing the
eventual thread dump.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.jboss.org/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
