Dennis Reed created ISPN-2904:
---------------------------------
Summary: Race condition in cache startup causes state transfer timeout
Key: ISPN-2904
URL: https://issues.jboss.org/browse/ISPN-2904
Project: Infinispan
Issue Type: Bug
Components: State transfer
Affects Versions: 5.1.7.Final
Reporter: Dennis Reed
Assignee: Mircea Markus
When multiple caches are started at the same time (as an EAP domain mode deployment does),
one cache can time out during state transfer and abort startup.
This is caused by a race condition: the master node accepts incoming requests before it
can process them, because it is still starting.
As a result, the other node's REQUEST_JOIN is ignored, and that node times out after
waiting the full 60 seconds (timeout=60000 in the trace below).
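
To illustrate the kind of guard that would close this window, here is a minimal sketch of a startup gate. It assumes the inbound request path can be made to wait on a latch before dispatching a command. The class StartupGate and its methods are hypothetical names made up for this example; they are not Infinispan or JGroups APIs, and this is not necessarily how the issue should be fixed.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical illustration only; not part of Infinispan or JGroups. */
public class StartupGate {
    // Opened exactly once, after every component needed to process commands has started.
    private final CountDownLatch started = new CountDownLatch(1);

    /** Called by the startup thread once the node can actually process requests. */
    public void markStarted() {
        started.countDown();
    }

    /**
     * Called on the inbound request thread before a command is dispatched.
     * Waits up to timeoutMillis for startup to finish. If startup does not
     * finish in time, the command is rejected explicitly, so the sender sees
     * a failure it can react to rather than a silently ignored request.
     */
    public String handle(String command, long timeoutMillis) {
        try {
            if (!started.await(timeoutMillis, TimeUnit.MILLISECONDS)) {
                return "REJECTED:" + command;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return "REJECTED:" + command;
        }
        return "HANDLED:" + command;
    }
}
```

With such a gate, a REQUEST_JOIN that arrives before startup completes either waits until the node is ready or receives an explicit rejection, instead of being dropped while the sender blocks until its own timeout.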
[node1]
10:47:23,390 TRACE [org.infinispan.remoting.transport.jgroups.JGroupsTransport]
(ServerService Thread Pool -- 65) dests=[master:server-two/web],
command=CacheViewControlCommand{cache=repl, type=REQUEST_JOIN,
sender=master:server-one/web, newViewId=0, newMembers=null, oldViewId=0, oldMembers=null},
mode=SYNCHRONOUS_IGNORE_LEAVERS, timeout=60000
10:47:23,396 TRACE [org.jgroups.protocols.TCP] (ServerService Thread Pool -- 65) sending
msg to master:server-two/web, src=master:server-one/web, headers are RequestCorrelator:
id=200, type=REQ, id=7, rsp_expected=true, RSVP: REQ(4), UNICAST2: DATA, seqno=27, TCP:
[channel_name=web]
...
10:48:23,404 ERROR [org.jboss.msc.service.fail] (ServerService Thread Pool -- 65)
MSC000001: Failed to start service jboss.infinispan.web.repl:
org.jboss.msc.service.StartException in service jboss.infinispan.web.repl:
org.infinispan.CacheException: Unable to invoke method public void
org.infinispan.statetransfer.BaseStateTransferManagerImpl.waitForJoinToComplete() throws
java.lang.InterruptedException on object of type ReplicatedStateTransferManagerImpl
[node2]
10:47:23,352 TRACE [org.infinispan.factories.GlobalComponentRegistry] (MSC service thread
1-6) Registering component
Component{instance=org.infinispan.marshall.jboss.ExternalizerTable@3f9c437d,
name=org.infinispan.marshall.jboss.ExternalizerTable} under name
org.infinispan.marshall.jboss.ExternalizerTable
...
10:47:23,397 TRACE [org.jgroups.protocols.TCP] (OOB-19,null) received [dst:
master:server-two/web, src: master:server-one/web (4 headers), size=54 bytes,
flags=OOB|DONT_BUNDLE|RSVP], headers are RequestCorrelator: id=200, type=REQ, id=7,
rsp_expected=true, RSVP: REQ(4), UNICAST2: DATA, seqno=27, TCP: [channel_name=web]
10:47:23,398 TRACE [org.jgroups.blocks.RequestCorrelator] (OOB-19,null) calling
(org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher) with request 7
10:47:23,398 TRACE [org.infinispan.marshall.jboss.ExternalizerTable] (OOB-19,null) Either
the marshaller has stopped or hasn't started. Read externalizers are not properly
populated: {}
10:47:23,398 TRACE [org.infinispan.marshall.jboss.ExternalizerTable] (OOB-19,null) Cache
manager is shutting down and type (id=74) cannot be resolved (thread not interrupted)
10:47:23,400 TRACE [org.jgroups.blocks.RequestCorrelator] (OOB-19,null) sending rsp for 7
to master:server-one/web
...
10:47:23,522 TRACE [org.infinispan.factories.GlobalComponentRegistry] (ServerService
Thread Pool -- 64) Invoking start method public void
org.infinispan.marshall.jboss.ExternalizerTable.start() on component
org.infinispan.marshall.jboss.ExternalizerTable