Galder Zamarreño created ISPN-1769:
--------------------------------------
Summary: Rehashing hang ups leading to nodes starting as sole member in
cluster
Key: ISPN-1769
URL:
https://issues.jboss.org/browse/ISPN-1769
Project: Infinispan
Issue Type: Bug
Components: Distributed Cache
Affects Versions: 5.1.0.CR4
Reporter: Galder Zamarreño
Assignee: Dan Berindei
Priority: Blocker
Fix For: 5.1.0.FINAL
Attachments: join-timeout-dan.zip, join-timeout-threaddump.zip
It seems we still have rehashing issues when starting nodes; see the email below:
{quote}
Hey Dan,
I'm trying to run the old Infinispan Lab where I start 4 nodes, each of which starts a
DIST_SYNC cache (Infinispan 5.1.0.CR4) with these configurations:
new ConfigurationBuilder()
    .clustering()
        .cacheMode(CacheMode.DIST_SYNC)
        .l1().disable()
    .jmxStatistics()
    .build();

new DefaultCacheManager(
    GlobalConfigurationBuilder.defaultClusteredBuilder()
        .transport()
            .addProperty("configurationFile", "jgroups.xml")
        .build()
);
And I got a hang in one of the nodes that ended up starting on its own. This is very
similar to the issues we had back in November.
I don't have TRACE logs yet but I have a thread dump of all the nodes which you can
find attached.
It's run in AS7 domain mode, so the output of all processes is mixed up. The node that
doesn't start in time is 'Server:server-four', so you can grep for that.
There are barely 7 entries in memory, so it should not hang up like this.
I'm gonna try to get some TRACE logs.
{quote}
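Since the output of all AS7 processes is interleaved, a small filter can help isolate the lines for 'Server:server-four' before reading the thread dump. The following is only a minimal sketch: the class and method names (ServerLogFilter, filterByTag) are hypothetical, and it assumes each relevant line contains the server tag as plain text, which may not hold for every line of a multi-line stack trace.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper for illustration; not part of Infinispan or AS7.
class ServerLogFilter {

    // Returns only the lines that contain the given server tag, in original order.
    static List<String> filterByTag(List<String> lines, String tag) {
        return lines.stream()
                .filter(line -> line.contains(tag))
                .collect(Collectors.toList());
    }

    // Reads the mixed output from standard input and prints the matching lines.
    // The tag defaults to the node named in the email; pass another tag as args[0].
    public static void main(String[] args) throws IOException {
        String tag = args.length > 0 ? args[0] : "Server:server-four";
        try (BufferedReader in = new BufferedReader(new InputStreamReader(System.in))) {
            List<String> all = in.lines().collect(Collectors.toList());
            filterByTag(all, tag).forEach(System.out::println);
        }
    }
}
```

Piping the mixed output into this class on standard input has roughly the same effect as grepping for the server name.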