[jboss-user] [JBossCache] - Problem when a node tries to join the cluster
jiafan1
do-not-reply at jboss.com
Thu Nov 16 10:13:42 EST 2006
Hello,
I have a problem when I try to bring up a new node in an existing cluster. The new node seems to wait
a long time before it receives the join response. With the PING timeout set to 2 seconds, the new node
always timed out, concluded that it was the first member of the group, and therefore could not fetch
the in-memory state from the other nodes during startup. Some time later it did join the group, and
from then on everything worked fine. Until then, however, the system did not work properly because the
cache state on the new node was not synchronized. (I use JBoss TreeCache.)
I then changed the PING timeout to 20 seconds. In most cases the new node can now join the existing
group, but in some cases it still times out and considers itself the first member of the group. I run
JBoss TreeCache and Tomcat on a Fedora Core 6 Linux box, and the cluster has only two nodes. Why does
it take so long for the new node to join the existing group?
The trace information and the JBoss Cache configuration are included below. Any hints are welcome.
Thanks in advance,
Jian Fang
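As a rough illustration of the behaviour described above, here is a simplified model of the discovery step. It is a sketch, not the JGroups PingWaiter code: it assumes that discovery ends as soon as num_initial_members responses have arrived or the PING timeout expires, and that a node which received no responses creates the group itself. The class DiscoveryModel and its methods are hypothetical.

```java
import java.util.Arrays;

/**
 * Hypothetical, simplified model of initial member discovery.
 * NOT the JGroups PingWaiter implementation; it only assumes that discovery
 * stops when num_initial_members replies have arrived or the PING timeout expires.
 */
public final class DiscoveryModel {

    public enum Outcome { JOIN_EXISTING_GROUP, CREATE_GROUP_AS_FIRST_MEMBER }

    /** What one discovery round decided, how long it waited and how many replies it counted. */
    public static final class Result {
        public final Outcome outcome;
        public final long waitedMillis;
        public final int responses;

        Result(Outcome outcome, long waitedMillis, int responses) {
            this.outcome = outcome;
            this.waitedMillis = waitedMillis;
            this.responses = responses;
        }
    }

    /**
     * @param responseArrivalMillis arrival times (ms after GET_MBRS_REQ was sent) of
     *                              GET_MBRS_RSP replies from other members
     * @param timeoutMillis         PING timeout
     * @param numInitialMembers     PING num_initial_members
     */
    public static Result discover(long[] responseArrivalMillis, long timeoutMillis, int numInitialMembers) {
        long[] sorted = responseArrivalMillis.clone();
        Arrays.sort(sorted);
        int received = 0;
        for (long t : sorted) {
            if (t > timeoutMillis) {
                break; // replies arriving after the timeout are never seen
            }
            received++;
            if (received >= numInitialMembers) {
                // Enough replies: stop waiting before the timeout expires.
                return new Result(Outcome.JOIN_EXISTING_GROUP, t, received);
            }
        }
        // Timeout expired: join if anyone answered, otherwise start a new group alone.
        Outcome outcome = received > 0 ? Outcome.JOIN_EXISTING_GROUP
                                       : Outcome.CREATE_GROUP_AS_FIRST_MEMBER;
        return new Result(outcome, timeoutMillis, received);
    }

    public static void main(String[] args) {
        // Two-node cluster: only one other member can reply, so with
        // num_initial_members=2 this model always waits the full timeout.
        Result ok = discover(new long[] {150}, 2000, 2);
        System.out.println(ok.outcome + " after " + ok.waitedMillis + " ms");

        // The only reply arrives after the 2 s timeout: the node creates its own group.
        Result late = discover(new long[] {2500}, 2000, 2);
        System.out.println(late.outcome + " after " + late.waitedMillis + " ms");
    }
}
```

In this model a longer timeout only helps when the existing member's reply is slow; if the reply never arrives, the node ends up alone regardless of the timeout.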
--------------------------------
Trace information:
--------------------------------
11: class org.jgroups.protocols.PerfHeader
12: class org.jgroups.protocols.PIGGYBACK$PiggybackHeader
13: class org.jgroups.protocols.PingHeader
--
2006-11-15 13:03:26,292 (main) org.jgroups.protocols.AUTOCONF.senseMaxFragSize
DEBUG: frag_size=64000
2006-11-15 13:03:26,361 (main) org.jgroups.protocols.pbcast.GMS.setImpl
DEBUG: changed role to org.jgroups.protocols.pbcast.ClientGmsImpl
2006-11-15 13:03:26,415 (DownHandler (STABLE)) org.jgroups.protocols.pbcast.STABLE.startStableTask
DEBUG: stable task started
2006-11-15 13:03:26,420 (DownHandler (UDP)) org.jgroups.protocols.UDP.start
DEBUG: creating sockets and starting threads
2006-11-15 13:03:26,421 (DownHandler (UDP)) org.jgroups.protocols.UDP.createSockets
INFO : sockets will use interface 10.20.2.37
2006-11-15 13:03:26,426 (DownHandler (UDP)) org.jgroups.protocols.UDP.createSockets
--
mcast_send_sock: bound to 10.20.2.37:33043, send buffer size=320000, receive buffer size=320000
2006-11-15 13:03:26,433 (DownHandler (UDP)) org.jgroups.protocols.UDP.bindToInterfaces
DEBUG: joined /224.0.0.75:7500 on eth0
2006-11-15 13:03:26,435 (DownHandler (UDP)) org.jgroups.protocols.UDP.bindToInterfaces
DEBUG: joined /224.0.0.75:7500 on lo
2006-11-15 13:03:26,438 (DownHandler (UDP)) org.jgroups.protocols.UDP.startThreads
DEBUG: created unicast receiver thread
2006-11-15 13:03:26,440 (UpHandler (STATE_TRANSFER)) org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.passUp
DEBUG: setting local_addr (null) to 10.20.2.37:33042
2006-11-15 13:03:26,442 (PingSender) org.jgroups.protocols.PingSender.run
DEBUG: sending GET_MBRS_REQ
2006-11-15 13:03:26,442 (PingWaiter) org.jgroups.protocols.PingWaiter.findInitialMembers
DEBUG: waiting for initial members: time_to_wait=2000, got 0 rsps
2006-11-15 13:03:26,447 (DownHandler (UDP)) org.jgroups.protocols.UDP.down
DEBUG: sending msg to null (src=10.20.2.37:33042), headers are {PING=[PING: type=GET_MBRS_REQ, arg=null], UDP=[channel_name=JTVCORE-Cluster]}
2006-11-15 13:03:26,450 (UDP mcast receiver) org.jgroups.protocols.UDP.receive
DEBUG: received (mcast)57 bytes from 10.20.2.37:33043
2006-11-15 13:03:26,454 (UDP mcast receiver) org.jgroups.protocols.UDP.handleIncomingMessage
DEBUG: message is [dst: 228.8.8.8:45566, src: 10.20.2.37:33042 (2 headers), size = 0 bytes], headers are {UDP=[channel_name=JTVCORE-Cluster], PING=[PING: type=GET_MBRS_REQ, arg=null]}
2006-11-15 13:03:26,455 (UpHandler (PING)) org.jgroups.protocols.PING.up
DEBUG: discarded my own discovery request
2006-11-15 13:03:27,445 (PingSender) org.jgroups.protocols.PingSender.run
DEBUG: sending GET_MBRS_REQ
2006-11-15 13:03:27,447 (DownHandler (UDP)) org.jgroups.protocols.UDP.down
DEBUG: sending msg to null (src=10.20.2.37:33042), headers are {PING=[PING: type=GET_MBRS_REQ, arg=null], UDP=[channel_name=JTVCORE-Cluster]}
2006-11-15 13:03:27,448 (UDP mcast receiver) org.jgroups.protocols.UDP.receive
DEBUG: received (mcast)57 bytes from 10.20.2.37:33043
2006-11-15 13:03:27,449 (UDP mcast receiver) org.jgroups.protocols.UDP.handleIncomingMessage
DEBUG: message is [dst: 228.8.8.8:45566, src: 10.20.2.37:33042 (2 headers), size = 0 bytes], headers are {UDP=[channel_name=JTVCORE-Cluster], PING=[PING: type=GET_MBRS_REQ, arg=null]}
2006-11-15 13:03:27,450 (UpHandler (PING)) org.jgroups.protocols.PING.up
DEBUG: discarded my own discovery request
2006-11-15 13:03:27,557 (UDP mcast receiver) org.jgroups.protocols.UDP.receive
DEBUG: received (mcast)57 bytes from 10.20.2.38:32789
2006-11-15 13:03:27,558 (UDP mcast receiver) org.jgroups.protocols.UDP.handleIncomingMessage
DEBUG: message is [dst: 228.8.8.8:45566, src: 10.20.2.38:32788 (2 headers), size = 0 bytes], headers are {UDP=[channel_name=JTVCORE-Cluster], PING=[PING: type=GET_MBRS_REQ, arg=null]}
2006-11-15 13:03:27,559 (UpHandler (PING)) org.jgroups.protocols.PING.up
DEBUG: received GET_MBRS_REQ from 10.20.2.38:32788, sending response [PING: type=GET_MBRS_RSP, arg=[own_addr=10.20.2.37:33042, coord_addr=10.20.2.37:33042, is_server=false]]
2006-11-15 13:03:27,560 (DownHandler (UDP)) org.jgroups.protocols.UDP.down
DEBUG: sending msg to 10.20.2.38:32788 (src=10.20.2.37:33042), headers are {PING=[PING: type=GET_MBRS_RSP, arg=[own_addr=10.20.2.37:33042, coord_addr=10.20.2.37:33042, is_server=false]], UDP=[channel_name=JTVCORE-Cluster]}
2006-11-15 13:03:28,445 (PingWaiter) org.jgroups.protocols.PingWaiter.findInitialMembers
DEBUG: initial mbrs are []
2006-11-15 13:03:28,446 (DownHandler (GMS)) org.jgroups.protocols.pbcast.GMS.join
DEBUG: initial_mbrs are []
2006-11-15 13:03:28,447 (DownHandler (GMS)) org.jgroups.protocols.pbcast.GMS.join
DEBUG: no initial members discovered: creating group as first member
2006-11-15 13:03:28,448 (DownHandler (GMS)) org.jgroups.protocols.pbcast.GMS.installView
DEBUG: [local_addr=10.20.2.37:33042] view is [10.20.2.37:33042|0] [10.20.2.37:33042]
2006-11-15 13:03:28,456 (main) org.jboss.cache.TreeCache.startService
INFO : TreeCache local address is 10.20.2.37:33042
2006-11-15 13:03:28,459 (UpHandler (STATE_TRANSFER)) org.jboss.cache.TreeCache.viewAccepted
INFO : viewAccepted(): [10.20.2.37:33042|0] [10.20.2.37:33042]
2006-11-15 13:03:28,463 (DownHandler (STATE_TRANSFER)) org.jgroups.protocols.pbcast.STATE_TRANSFER.down
DEBUG: GET_STATE: first member (no state)
2006-11-15 13:03:28,466 (DownHandler (STABLE)) org.jgroups.protocols.pbcast.STABLE.resetDigest
DEBUG: resetting digest from NAKACK: [10.20.2.37:33042#-1]
2006-11-15 13:03:28,469 (DownHandler (GMS)) org.jgroups.protocols.pbcast.GMS.setImpl
DEBUG: 10.20.2.37:33042 changed role to org.jgroups.protocols.pbcast.CoordGmsImpl
2006-11-15 13:03:28,469 (DownHandler (GMS)) org.jgroups.protocols.pbcast.GMS.becomeSingletonMember
DEBUG: created group (first member). My view is [10.20.2.37:33042|0], impl is org.jgroups.protocols.pbcast.CoordGmsImpl
2006-11-15 13:03:28,483 (main) org.jboss.cache.TreeCache.fetchStateOnStartup
INFO : State could not be retrieved (we are the first member in group)
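To see how long discovery actually took in the trace, one can subtract the timestamp of the "waiting for initial members" line from that of the "initial mbrs are []" line. A minimal sketch, assuming the timestamp format shown above (yyyy-MM-dd HH:mm:ss,SSS); TraceTiming is a hypothetical helper:

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper for measuring intervals between lines of the trace above. */
public final class TraceTiming {

    // Matches timestamps such as "2006-11-15 13:03:26,442" (comma before the milliseconds).
    private static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    /** Parses the 23-character timestamp at the start of a trace line. */
    public static LocalDateTime timestampOf(String line) {
        return LocalDateTime.parse(line.substring(0, 23), FMT);
    }

    /** Milliseconds elapsed between two trace lines. */
    public static long millisBetween(String earlier, String later) {
        return Duration.between(timestampOf(earlier), timestampOf(later)).toMillis();
    }

    public static void main(String[] args) {
        String start = "2006-11-15 13:03:26,442 (PingWaiter) org.jgroups.protocols.PingWaiter.findInitialMembers";
        String end = "2006-11-15 13:03:28,445 (PingWaiter) org.jgroups.protocols.PingWaiter.findInitialMembers";
        System.out.println(millisBetween(start, end) + " ms");
    }
}
```

For the trace above this gives 2003 ms, i.e. the full 2-second time_to_wait elapsed without any response being recorded.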
----------------------
configuration
----------------------
jboss:service=Naming
jboss:service=TransactionManager
<!-- Configure the TransactionManager -->
<!-- org.jboss.cache.DummyTransactionManagerLookup-->
com.jtv.core.resource.impl.spring.cache.JBossCacheTransactionManagerLookup
<!--
Node locking scheme:
OPTIMISTIC
PESSIMISTIC (default)
-->
<!-- PESSIMISTIC -->
<!--
Node locking level : SERIALIZABLE
REPEATABLE_READ (default)
READ_COMMITTED
READ_UNCOMMITTED
NONE
-->
REPEATABLE_READ
<!--
Valid modes are LOCAL
REPL_ASYNC
REPL_SYNC
INVALIDATION_ASYNC
INVALIDATION_SYNC
-->
REPL_SYNC
<!-- Name of cluster. Needs to be the same on all nodes in order for them
to find each other
-->
JTVCORE-Cluster
<!-- JGroups protocol stack properties. Can also be a URL,
e.g. file:/home/bela/default.xml
-->
<UDP mcast_send_buf_size="320000"
mcast_port="45566"
ucast_recv_buf_size="320000"
mcast_addr="228.8.8.8"
bind_addr="10.20.2.37"
loopback="false"
mcast_recv_buf_size="320000"
max_bundle_size="60000"
max_bundle_timeout="30"
use_incoming_packet_handler="false"
use_outgoing_packet_handler="false"
ucast_send_buf_size="320000"
ip_ttl="32"
enable_bundling="false"/>
<PING timeout="20000"
num_initial_members="2"
up_thread="true"
down_thread="true"/>
<MERGE2 max_interval="10000"
min_interval="5000"/>
<FD timeout="2000"
max_tries="3"
shun="true"
up_thread="true"
down_thread="true"/>
<VERIFY_SUSPECT timeout="1500"/>
<pbcast.NAKACK max_xmit_size="8192"
use_mcast_xmit="false"
gc_lag="50"
retransmit_timeout="600,1200,2400,4800"/>
<!--
- desired_avg_gossip: periodically sends STABLE messages around. 0 disables this
- max_bytes: max number of bytes received from anyone until a STABLE message is sent. Use either this or
desired_avg_gossip, but not both! 0 disables it.
- stability_delay: range (number of milliseconds) that we wait until sending a STABILITY message.
This prevents STABILITY multicast storms. If max_bytes is used, this should be set to a low value (> 0, though!).
-->
<pbcast.STABLE stability_delay="1000"
desired_avg_gossip="20000"
max_bytes="0"/>
<FRAG frag_size="8192"
down_thread="true"
up_thread="true"/>
<VIEW_SYNC avg_send_interval="20000" down_thread="true" up_thread="true" />
<pbcast.GMS print_local_addr="true"
join_timeout="3000"
join_retry_timeout="2000"
shun="true"/>
<pbcast.STATE_TRANSFER
down_thread="true"
up_thread="true"/>
<!--
Whether or not to fetch state on joining a cluster
NOTE this used to be called FetchStateOnStartup and has been renamed to be more descriptive.
-->
<!-- -->
true
<!-- -->
<!--
The max amount of time (in milliseconds) we wait until the
initial state (i.e. the contents of the cache) is retrieved from
existing members in a clustered environment
-->
20000
<!--
Number of milliseconds to wait until all responses for a
synchronous call have been received.
-->
20000
<!-- Max number of milliseconds to wait for a lock acquisition -->
15000
<!-- Name of the eviction policy class. -->
<!--attribute name="EvictionPolicyClass"></attribute-->
<!--
org.jboss.cache.eviction.LRUPolicy
-->
<!-- Specific eviction policy configurations. This is LRU -->
5
<!-- Cache wide default /com/jtv/core-->
5000
0
2000
0
10000
0
<!--
Indicate whether to use region based marshalling or not. Set this to true if you are running under a scoped
class loader, e.g., inside an application server. Default is "false".
-->
false
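The pbcast.STABLE comment in the configuration states a few rules for desired_avg_gossip, max_bytes and stability_delay. A hypothetical checker (not part of JGroups or JBoss Cache) that encodes only those stated rules might look like this:

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper, not part of JGroups: checks pbcast.STABLE values against
 * the rules given in the configuration comment (0 disables desired_avg_gossip or
 * max_bytes; use one of them, not both; stability_delay > 0 when max_bytes is used).
 */
public final class StableConfigCheck {

    public static List<String> check(long desiredAvgGossip, long maxBytes, long stabilityDelay) {
        List<String> warnings = new ArrayList<>();
        if (desiredAvgGossip > 0 && maxBytes > 0) {
            warnings.add("use either desired_avg_gossip or max_bytes, not both");
        }
        if (desiredAvgGossip == 0 && maxBytes == 0) {
            warnings.add("desired_avg_gossip and max_bytes are both 0, so both triggers are disabled");
        }
        if (maxBytes > 0 && stabilityDelay <= 0) {
            warnings.add("stability_delay should be a low value > 0 when max_bytes is used");
        }
        return warnings;
    }

    public static void main(String[] args) {
        // Values from the configuration in this post.
        System.out.println(check(20000, 0, 1000)); // prints "[]"
    }
}
```

With the values used here (desired_avg_gossip=20000, max_bytes=0, stability_delay=1000) it reports no problems.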
View the original post : http://www.jboss.com/index.html?module=bb&op=viewtopic&p=3986551#3986551