[infinispan-issues] [JBoss JIRA] (ISPN-1508) Failures during hotrod server cluster formation

Thu Nov 17 07:25:40 EST 2011

    [ https://issues.jboss.org/browse/ISPN-1508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12643675#comment-12643675 ] 

Dan Berindei commented on ISPN-1508:
------------------------------------

Ok, looks like there was something else going on: the state transfer for a distributed cache may start before the state transfer for the topology cache. I'm not sure whether that is because all the caches are started in parallel or whether it only happens when the cluster forms through a merge (because getCache() will return on the 2nd node before doing any state transfer).

The topology cache needs to be fully populated before we can create consistent hashes for the other caches, so if the topology cache's state transfer has not run yet, all the other (distributed) caches will fail to install the new cache view.
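
To make the ordering requirement concrete, here is a minimal sketch of deferring view installation until the topology state is available. It is an illustration under assumptions, not Infinispan code: the class StartupOrderSketch and all of its method names are hypothetical, and "installing a view" is reduced to recording the cache name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical illustration only; none of these names come from Infinispan.
final class StartupOrderSketch {
    // Completed once the (hypothetical) topology cache has received its state.
    private final CompletableFuture<Void> topologyReady = new CompletableFuture<>();
    // Names of caches whose view installation has been allowed to proceed.
    private final List<String> installed = new ArrayList<>();

    // Defers the view installation for a cache until topology state is available.
    CompletableFuture<Void> installViewWhenReady(String cacheName) {
        return topologyReady.thenRun(() -> {
            synchronized (installed) {
                installed.add(cacheName);
            }
        });
    }

    // Called when the topology cache's state transfer has finished.
    void topologyStateTransferFinished() {
        topologyReady.complete(null);
    }

    // Returns a snapshot of the caches that have installed their view.
    List<String> installedCaches() {
        synchronized (installed) {
            return new ArrayList<>(installed);
        }
    }
}
```

With this shape, a distributed cache that starts early simply waits instead of failing its view installation. The order in which several waiting caches proceed is not specified.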

We do retry the cache view installation, so this should only have caused some error messages on startup. Unfortunately there is another problem with the way cache view installation is cancelled: it allows the PREPARE_VIEW and ROLLBACK_VIEW commands to run in reverse order on the remote nodes, which seems to lead to a cycle of failed cache view installations.
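
One way to picture the reordering problem, and a possible guard against it, is the sketch below. It is only an assumption about how a receiver could tolerate a rollback that arrives before its prepare; it is not the actual fix for this issue, and ViewInstallGuard and its methods are hypothetical names.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration only; not Infinispan's actual view-installation code.
final class ViewInstallGuard {
    // View ids that have been rolled back. A real implementation would
    // need to prune this set; the sketch lets it grow.
    private final Set<Integer> rolledBack = new HashSet<>();
    // The view currently prepared on this node, or null if there is none.
    private Integer pendingViewId;

    // Returns true if the prepare was accepted, false if the same view
    // was already rolled back (the rollback overtook the prepare).
    synchronized boolean prepare(int viewId) {
        if (rolledBack.contains(viewId)) {
            return false;
        }
        pendingViewId = viewId;
        return true;
    }

    // Records the rollback even if no prepare for this view was seen yet.
    synchronized void rollback(int viewId) {
        rolledBack.add(viewId);
        if (pendingViewId != null && pendingViewId == viewId) {
            pendingViewId = null;
        }
    }

    // True while a prepared view is waiting to be committed or rolled back.
    synchronized boolean hasPending() {
        return pendingViewId != null;
    }
}
```

Without the rolledBack check, a rollback for view 5 followed by a late prepare for view 5 would leave the node holding a prepared view that nobody will ever commit or roll back, which would block the next installation attempt.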

> Failures during hotrod server cluster formation
> -----------------------------------------------
>
>                 Key: ISPN-1508
>                 URL: https://issues.jboss.org/browse/ISPN-1508
>             Project: Infinispan
>          Issue Type: Bug
>    Affects Versions: 5.1.0.BETA3
>            Reporter: Michal Linhard
>            Assignee: Dan Berindei
>            Priority: Critical
>             Fix For: 5.1.0.BETA5
>
>
> failing build: https://hudson.qa.jboss.com/hudson/job/edg-60-client-stress-test/87
> server logs: https://hudson.qa.jboss.com/hudson/job/edg-60-client-stress-test/87/artifact/report/run1/serverlogs.zip
> in this test I'm starting 4 standalone infinispan hotrod servers (startServer.sh style)
> and they take too long to form a cluster (more than 2 min)
> there are errors related to CacheViewControlCommand
> the infinispan version is 5.1.0-SNAPSHOT (c642be2c8a64c13e7e74283f38e2037cef9a362f) + a quick patch for ISPN-1498
