Agreed on the point release change. If another minor release ever comes out of 4.x (more
likely than you might imagine ;) ) then we can consider it.
Re: chasing down what changed, to unify things on this thread I'm going to cut and
paste the relevant log analysis from David's support case that led to this issue.
Background is David runs a demo app very similar to the ejb lab in the clustering course.
2 nodes on same machine; client in 3rd process exercises a simple counter SFSB in a loop;
kill nodes and observe failover. David runs with 2 client processes, not one like the
lab. I don't think that's relevant. Problem occurs when the killed node is
restarted -- during the restart, client invocations pause for extended periods.
From the support case:
What's going on here is the SFSB service is attempting to replicate a change before
the newly joined node has received a view. The JGroups channel on the new node drops the
RPC message. The sender then blocks waiting for a response.
Node1 installs a view with node2 included:
2007-08-09 15:13:14,397 INFO
[org.jboss.ha.framework.interfaces.HAPartition.lifecycle.DefaultPartition] New cluster
view for partition DefaultPartition (id: 5, delta: 1) : [192.168.200.1:1099,
192.168.200.2:1099]
Client requests on node1 result in session replication RPCs:
2007-08-09 15:13:14,431 INFO [STDOUT] Total arrested guys: 79
2007-08-09 15:13:14,432 TRACE
[org.jboss.ha.framework.interfaces.HAPartition.DefaultPartition]
callMethodOnCluster(true), objName=SessionState-'/HASessionState/Default',
methodName=_setState, members=[192.168.200.2:33177]
2007-08-09 15:13:14,432 TRACE [org.jboss.ha.framework.server.HAPartitionImpl]
dests=[192.168.200.2:33177],
method_call=SessionState-'/HASessionState/Default'._setState(GangDispute,
org.jboss.ha.hasessionstate.server.PackagedSessionImpl@1209d56), mode=2, timeout=60000
2007-08-09 15:13:14,432 TRACE [org.jboss.ha.framework.server.HAPartitionImpl]
real_dests=[192.168.200.2:33177]
2007-08-09 15:13:14,432 TRACE [org.jgroups.blocks.GroupRequest] sending request
(id=1186686794432)
2007-08-09 15:13:14,432 TRACE [org.jgroups.protocols.pbcast.NAKACK] sending
192.168.200.1:33167#27
2007-08-09 15:13:14,433 TRACE [org.jgroups.protocols.UDP] sending msg to null
(src=192.168.200.1:33167), headers are {NAKACK=[MSG, seqno=27], MessageDispatcher=[Header:
name=MessageDispatcher, type=REQ, id=1186686794432, rsp_expected=true],
dest_mbrs=[192.168.200.2:33177], UDP=[channel_name=DefaultPartition]}
2007-08-09 15:13:14,433 TRACE [org.jgroups.protocols.UDP] received (mcast) 940 bytes from
192.168.200.1:33168
2007-08-09 15:13:14,433 TRACE [org.jgroups.protocols.UDP] message is [dst:
228.1.2.3:45566, src: 192.168.200.1:33167 (3 headers), size = 759 bytes], headers are
{NAKACK=[MSG, seqno=27], MessageDispatcher=[Header: name=MessageDispatcher, type=REQ,
id=1186686794432, rsp_expected=true], dest_mbrs=[192.168.200.2:33177],
UDP=[channel_name=DefaultPartition]}
2007-08-09 15:13:14,434 TRACE [org.jgroups.protocols.pbcast.NAKACK] 192.168.200.1:33167:
received 192.168.200.1:33167#27
2007-08-09 15:13:14,434 TRACE [org.jgroups.blocks.RequestCorrelator] discarded request
from 192.168.200.1:33167 as we are not part of destination list
(local_addr=192.168.200.1:33167, hdr=[Header: name=MessageDispatcher, type=REQ,
id=1186686794432, rsp_expected=true], dest_mbrs=[192.168.200.2:33177])
2007-08-09 15:13:14,580 INFO [STDOUT] Total arrested guys: 79
2007-08-09 15:13:14,581 TRACE
[org.jboss.ha.framework.interfaces.HAPartition.DefaultPartition]
callMethodOnCluster(true), objName=SessionState-'/HASessionState/Default',
methodName=_setState, members=[192.168.200.2:33177]
2007-08-09 15:13:14,581 TRACE [org.jboss.ha.framework.server.HAPartitionImpl]
dests=[192.168.200.2:33177],
method_call=SessionState-'/HASessionState/Default'._setState(GangDispute,
org.jboss.ha.hasessionstate.server.PackagedSessionImpl@347cdb), mode=2, timeout=60000
2007-08-09 15:13:14,581 TRACE [org.jboss.ha.framework.server.HAPartitionImpl]
real_dests=[192.168.200.2:33177]
2007-08-09 15:13:14,581 TRACE [org.jgroups.blocks.GroupRequest] sending request
(id=1186686794581)
Node2 gets and discards the RPC messages:
2007-08-09 15:13:14,433 TRACE [org.jgroups.protocols.UDP] received (mcast) 940 bytes from
192.168.200.1:33168
2007-08-09 15:13:14,434 TRACE [org.jgroups.protocols.UDP] message is [dst:
228.1.2.3:45566, src: 192.168.200.1:33167 (3 headers), size = 759 bytes], headers are
{NAKACK=[MSG, seqno=27], MessageDispatcher=[Header: name=MessageDispatcher, type=REQ,
id=1186686794432, rsp_expected=true], dest_mbrs=[192.168.200.2:33177],
UDP=[channel_name=DefaultPartition]}
2007-08-09 15:13:14,434 TRACE [org.jgroups.protocols.pbcast.NAKACK] message was discarded
(not yet server)
2007-08-09 15:13:14,582 TRACE [org.jgroups.protocols.UDP] received (mcast) 940 bytes from
192.168.200.1:33168
2007-08-09 15:13:14,583 TRACE [org.jgroups.protocols.UDP] message is [dst:
228.1.2.3:45566, src: 192.168.200.1:33167 (3 headers), size = 759 bytes], headers are
{NAKACK=[MSG, seqno=28], MessageDispatcher=[Header: name=MessageDispatcher, type=REQ,
id=1186686794581, rsp_expected=true], dest_mbrs=[192.168.200.2:33177],
UDP=[channel_name=DefaultPartition]}
2007-08-09 15:13:14,583 TRACE [org.jgroups.protocols.pbcast.NAKACK] message was discarded
(not yet server)
Node2 installs view:
2007-08-09 15:13:14,712 DEBUG
[org.jboss.ha.framework.interfaces.HAPartition.DefaultPartition] ViewAccepted: initial
members set
A minute later the RPCs time out on node1:
2007-08-09 15:14:14,443 TRACE [org.jgroups.blocks.GroupRequest] call did not execute
correctly, request is [GroupRequest:
req_id=1186686794432
caller=192.168.200.1:33167
192.168.200.2:33177: sender=192.168.200.2:33177, retval=null, received=false,
suspected=false
request_msg: [dst: , src: 192.168.200.1:33167 (3 headers), size = 759 bytes]
rsp_mode: GET_ALL
done: false
timeout: 60000
expected_mbrs: 0
]
Full logs with org.jgroups and org.jboss.ha are attached to David's support case
00017603.
View the original post :
http://www.jboss.com/index.html?module=bb&op=viewtopic&p=4074449#...
Reply to the post :
http://www.jboss.com/index.html?module=bb&op=posting&mode=reply&a...