[jboss-dev-forums] [Design of Clustering on JBoss (Clusters/JBoss)] - Re: JBAS-4608 Adding FLUSH to 4.2.x stacks
bstansberry@jboss.com
do-not-reply at jboss.com
Wed Aug 15 10:16:55 EDT 2007
Agreed on the point release change. If another minor release ever comes out of 4.x (more likely than you might imagine ;) ) then we can consider it.
Re: chasing down what changed, to unify things on this thread I'm going to cut and paste the relevant log analysis from David's support case that led to this issue. Background is David runs a demo app very similar to the ejb lab in the clustering course. 2 nodes on same machine; client in 3rd process exercises a simple counter SFSB in a loop; kill nodes and observe failover. David runs with 2 client processes, not one like the lab. I don't think that's relevant. Problem occurs when the killed node is restarted -- during the restart, client invocations pause for extended periods.
>From the support case:
What's going on here is the SFSB service is attempting to replicate a change before the newly joined node has received a view. The JGroups channel on the new node drops the RPC message. The sender then blocks waiting for a response.
Node1 installs a view with node2 included:
2007-08-09 15:13:14,397 INFO [org.jboss.ha.framework.interfaces.HAPartition.lifecycle.DefaultPartition] New cluster view for partition DefaultPartition (id: 5, delta: 1) : [192.168.200.1:1099, 192.168.200.2:1099]
Client requests on node1 result in session replication RPCs:
2007-08-09 15:13:14,431 INFO [STDOUT] Total arrested guys: 79
2007-08-09 15:13:14,432 TRACE [org.jboss.ha.framework.interfaces.HAPartition.DefaultPartition] callMethodOnCluster(true), objName=SessionState-'/HASessionState/Default', methodName=_setState, members=[192.168.200.2:33177]
2007-08-09 15:13:14,432 TRACE [org.jboss.ha.framework.server.HAPartitionImpl] dests=[192.168.200.2:33177], method_call=SessionState-'/HASessionState/Default'._setState(GangDispute, org.jboss.ha.hasessionstate.server.PackagedSessionImpl at 1209d56), mode=2, timeout=60000
2007-08-09 15:13:14,432 TRACE [org.jboss.ha.framework.server.HAPartitionImpl] real_dests=[192.168.200.2:33177]
2007-08-09 15:13:14,432 TRACE [org.jgroups.blocks.GroupRequest] sending request (id=1186686794432)
2007-08-09 15:13:14,432 TRACE [org.jgroups.protocols.pbcast.NAKACK] sending 192.168.200.1:33167#27
2007-08-09 15:13:14,433 TRACE [org.jgroups.protocols.UDP] sending msg to null (src=192.168.200.1:33167), headers are {NAKACK=[MSG, seqno=27], MessageDispatcher=[Header: name=MessageDispatcher, type=REQ, id=1186686794432, rsp_expected=true], dest_mbrs=[192.168.200.2:33177], UDP=[channel_name=DefaultPartition]}
2007-08-09 15:13:14,433 TRACE [org.jgroups.protocols.UDP] received (mcast) 940 bytes from 192.168.200.1:33168
2007-08-09 15:13:14,433 TRACE [org.jgroups.protocols.UDP] message is [dst: 228.1.2.3:45566, src: 192.168.200.1:33167 (3 headers), size = 759 bytes], headers are {NAKACK=[MSG, seqno=27], MessageDispatcher=[Header: name=MessageDispatcher, type=REQ, id=1186686794432, rsp_expected=true], dest_mbrs=[192.168.200.2:33177], UDP=[channel_name=DefaultPartition]}
2007-08-09 15:13:14,434 TRACE [org.jgroups.protocols.pbcast.NAKACK] 192.168.200.1:33167: received 192.168.200.1:33167#27
2007-08-09 15:13:14,434 TRACE [org.jgroups.blocks.RequestCorrelator] discarded request from 192.168.200.1:33167 as we are not part of destination list (local_addr=192.168.200.1:33167, hdr=[Header: name=MessageDispatcher, type=REQ, id=1186686794432, rsp_expected=true], dest_mbrs=[192.168.200.2:33177])
2007-08-09 15:13:14,580 INFO [STDOUT] Total arrested guys: 79
2007-08-09 15:13:14,581 TRACE [org.jboss.ha.framework.interfaces.HAPartition.DefaultPartition] callMethodOnCluster(true), objName=SessionState-'/HASessionState/Default', methodName=_setState, members=[192.168.200.2:33177]
2007-08-09 15:13:14,581 TRACE [org.jboss.ha.framework.server.HAPartitionImpl] dests=[192.168.200.2:33177], method_call=SessionState-'/HASessionState/Default'._setState(GangDispute, org.jboss.ha.hasessionstate.server.PackagedSessionImpl at 347cdb), mode=2, timeout=60000
2007-08-09 15:13:14,581 TRACE [org.jboss.ha.framework.server.HAPartitionImpl] real_dests=[192.168.200.2:33177]
2007-08-09 15:13:14,581 TRACE [org.jgroups.blocks.GroupRequest] sending request (id=1186686794581)
Node2 gets and discards the RPC messages:
2007-08-09 15:13:14,433 TRACE [org.jgroups.protocols.UDP] received (mcast) 940 bytes from 192.168.200.1:33168
2007-08-09 15:13:14,434 TRACE [org.jgroups.protocols.UDP] message is [dst: 228.1.2.3:45566, src: 192.168.200.1:33167 (3 headers), size = 759 bytes], headers are {NAKACK=[MSG, seqno=27], MessageDispatcher=[Header: name=MessageDispatcher, type=REQ, id=1186686794432, rsp_expected=true], dest_mbrs=[192.168.200.2:33177], UDP=[channel_name=DefaultPartition]}
2007-08-09 15:13:14,434 TRACE [org.jgroups.protocols.pbcast.NAKACK] message was discarded (not yet server)
2007-08-09 15:13:14,582 TRACE [org.jgroups.protocols.UDP] received (mcast) 940 bytes from 192.168.200.1:33168
2007-08-09 15:13:14,583 TRACE [org.jgroups.protocols.UDP] message is [dst: 228.1.2.3:45566, src: 192.168.200.1:33167 (3 headers), size = 759 bytes], headers are {NAKACK=[MSG, seqno=28], MessageDispatcher=[Header: name=MessageDispatcher, type=REQ, id=1186686794581, rsp_expected=true], dest_mbrs=[192.168.200.2:33177], UDP=[channel_name=DefaultPartition]}
2007-08-09 15:13:14,583 TRACE [org.jgroups.protocols.pbcast.NAKACK] message was discarded (not yet server)
Node2 installs view:
2007-08-09 15:13:14,712 DEBUG [org.jboss.ha.framework.interfaces.HAPartition.DefaultPartition] ViewAccepted: initial members set
A minute later the RPCs time out on node1:
2007-08-09 15:14:14,443 TRACE [org.jgroups.blocks.GroupRequest] call did not execute correctly, request is [GroupRequest:
req_id=1186686794432
caller=192.168.200.1:33167
192.168.200.2:33177: sender=192.168.200.2:33177, retval=null, received=false, suspected=false
request_msg: [dst: , src: 192.168.200.1:33167 (3 headers), size = 759 bytes]
rsp_mode: GET_ALL
done: false
timeout: 60000
expected_mbrs: 0
]
Full logs with org.jgroups and org.jboss.ha are attached to David's support case 00017603.
View the original post : http://www.jboss.com/index.html?module=bb&op=viewtopic&p=4074449#4074449
Reply to the post : http://www.jboss.com/index.html?module=bb&op=posting&mode=reply&p=4074449
More information about the jboss-dev-forums
mailing list