Hi,
Under load, a replicate() message sent by one node of the cluster takes too long to appear
on the other node. The log snippet follows.
Note: the clocks on both nodes are synchronized.
| Node1:
| 2007-05-16 23:37:59,583 DEBUG [app-nbdvjlljy7o3|10.253.205.12-1503236585|5fbb8cb142088f50] invoking method _put; id:3(null, /ipunity/mgcpstack/sentCommandCache/48400013, item, com.ipunity.ri.jain.protocol.ip.mgcp.SelfRetransmitTask@7b24c2, true), members=[10.253.205.16:53498, 10.253.205.15:50987], mode=REPL_SYNC, exclude_self=true, timeout=10000
| 2007-05-16 23:37:59,583 DEBUG [app-nbdvjlljy7o3|10.253.205.12-1503236585|5fbb8cb142088f50] Broadcasting call _put; id:3(null, /ipunity/mgcpstack/sentCommandCache/48400013, item, com.ipunity.ri.jain.protocol.ip.mgcp.SelfRetransmitTask@7b24c2, true) to recipient list null
| 2007-05-16 23:37:59,583 DEBUG [app-nbdvjlljy7o3|10.253.205.12-1503236585|5fbb8cb142088f50] callRemoteMethods(): valid members are [10.253.205.15:50987] method: _replicate; id:13(_put; id:3(null, /ipunity/mgcpstack/sentCommandCache/48400013, item, com.ipunity.ri.jain.protocol.ip.mgcp.SelfRetransmitTask@7b24c2, true))
| 2007-05-16 23:37:59,583 DEBUG [app-nbdvjlljy7o3|10.253.205.12-1503236585|5fbb8cb142088f50] Marshalling object _replicate; id:13(_put; id:3(null, /ipunity/mgcpstack/sentCommandCache/48400013, item, com.ipunity.ri.jain.protocol.ip.mgcp.SelfRetransmitTask@7b24c2, true))
| 2007-05-16 23:37:59,583 DEBUG [app-nbdvjlljy7o3|10.253.205.12-1503236585|5fbb8cb142088f50] Warning: using object serialization for class com.ipunity.ri.jain.protocol.ip.mgcp.SelfRetransmitTask
|
| Node2:
| 2007-05-16 23:38:12,726 DEBUG [] 10.253.205.15:50987 received call _put; id:3(null, /ipunity/mgcpstack/sentCommandCache/48400013, item, com.ipunity.ri.jain.protocol.ip.mgcp.SelfRetransmitTask@1d38cb3, true)
| 2007-05-16 23:38:12,726 DEBUG [] (10.253.205.15:50987) call on method [_put; id:3(null, /ipunity/mgcpstack/sentCommandCache/48400013, item, com.ipunity.ri.jain.protocol.ip.mgcp.SelfRetransmitTask@1d38cb3, true)]
| 2007-05-16 23:38:12,726 DEBUG [] PessimisticLockInterceptor invoked for method _put; id:3(null, /ipunity/mgcpstack/sentCommandCache/48400013, item, com.ipunity.ri.jain.protocol.ip.mgcp.SelfRetransmitTask@1d38cb3, true)
| 2007-05-16 23:38:12,726 DEBUG [] Attempting to lock node /ipunity/mgcpstack/sentCommandCache/48400013 for owner Thread[UpHandler (STATE_TRANSFER),5,Pooled Threads]
|
Observe that the replicate() message appeared on Node2 only about 13 seconds after Node1
sent it (23:37:59,583 versus 23:38:12,726). My replication timeout is 10 seconds, hence I
get replication exceptions from node2 in the cluster. I would rather not increase the
replication timeout, because that would cause performance issues in my application.
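For anyone who wants to check the gap, here is a minimal sketch that computes it from the two log timestamps and compares it with the SyncReplTimeout value. The class and method names are made up for illustration, and it uses java.time (Java 8 or later), which is an assumption about the JVM you run it on, not about my application.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class ReplicationDelay {
    // Timestamp layout of the log lines above (comma before the milliseconds).
    private static final DateTimeFormatter LOG_TS =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    /** Milliseconds elapsed between two log timestamps. */
    static long delayMillis(String sent, String received) {
        LocalDateTime t0 = LocalDateTime.parse(sent, LOG_TS);
        LocalDateTime t1 = LocalDateTime.parse(received, LOG_TS);
        return Duration.between(t0, t1).toMillis();
    }

    public static void main(String[] args) {
        long syncReplTimeout = 10000; // SyncReplTimeout from treecache.xml, in ms
        long delay = delayMillis("2007-05-16 23:37:59,583",   // Node1: invoking _put
                                 "2007-05-16 23:38:12,726");  // Node2: received call _put
        System.out.println("delay=" + delay + " ms, exceeds timeout: "
                + (delay > syncReplTimeout));
    }
}
```

With the two timestamps from the logs this gives 13143 ms, which is above the 10000 ms timeout.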
My guess is that all the JGroups receive threads were busy handling other messages in the
cluster. Is there a way to specify the thread pool size, or to turn off thread pooling
so that messages are handled as they arrive?
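To make the guess concrete, here is a toy model of it in plain java.util.concurrent. It is not JGroups code and uses no JGroups API; the class name, method name, and numbers are all hypothetical. It queues a few slow handlers ahead of one quick message and measures how long the quick message waits before a worker picks it up.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class ReceiverBacklog {
    /**
     * Submits `backlog` slow handlers to a pool of `threads` workers, then one
     * quick message, and returns how long the quick message waited (in ms).
     */
    static long waitForLastMessage(int threads, int backlog, long handlerMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Stand-in for a handler that is busy, e.g. waiting on a lock.
            Runnable slowHandler = () -> {
                try {
                    Thread.sleep(handlerMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            };
            // The quick message only records the moment it starts running.
            Callable<Long> quickMessage = System::nanoTime;

            long start = System.nanoTime();
            for (int i = 0; i < backlog; i++) {
                pool.submit(slowHandler);
            }
            Future<Long> handledAt = pool.submit(quickMessage);
            return TimeUnit.NANOSECONDS.toMillis(handledAt.get() - start);
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("1 worker:  " + waitForLastMessage(1, 3, 50) + " ms");
        System.out.println("4 workers: " + waitForLastMessage(4, 3, 50) + " ms");
    }
}
```

With one worker the quick message has to wait for every handler queued ahead of it; with more workers than queued handlers it runs almost immediately. Whether this is what actually happens inside the receive path here is exactly what I am asking.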
My treecache.xml is as follows:
| <?xml version="1.0" encoding="UTF-8"?>
|
| <!-- ===================================================================== -->
| <!-- -->
| <!-- Sample TreeCache Service Configuration -->
| <!-- -->
| <!-- ===================================================================== -->
|
| <server>
|
| <classpath codebase="./lib" archives="jboss-cache.jar, jgroups.jar"/>
|
|
| <!-- ==================================================================== -->
| <!-- Defines TreeCache configuration -->
| <!-- ==================================================================== -->
|
| <mbean code="org.jboss.cache.TreeCache"
| name="jboss.cache:service=TreeCache">
|
| <depends>jboss:service=Naming</depends>
| <depends>jboss:service=TransactionManager</depends>
|
| <!--
| Configure the TransactionManager
| -->
| <attribute name="TransactionManagerLookupClass">com.ipunity.common.cache.WeblogicTransactionManagerLookup</attribute>
|
| <!--
| Isolation level : SERIALIZABLE
| REPEATABLE_READ (default)
| READ_COMMITTED
| READ_UNCOMMITTED
| NONE
| -->
| <attribute name="IsolationLevel">READ_COMMITTED</attribute>
|
| <!--
| Valid modes are LOCAL, REPL_ASYNC and REPL_SYNC
| -->
| <attribute name="CacheMode">REPL_SYNC</attribute>
|
| <!--
| Just used for async repl: use a replication queue
| -->
| <attribute name="UseReplQueue">false</attribute>
|
| <!--
| Replication interval for replication queue (in ms)
| -->
| <attribute name="ReplQueueInterval">0</attribute>
|
| <!--
| Max number of elements which trigger replication
| -->
| <attribute name="ReplQueueMaxElements">0</attribute>
|
| <!-- Name of cluster. Needs to be the same for all clusters, in order
| to find each other
| -->
| <attribute name="ClusterName">IPUnity-Cluster-2</attribute>
|
| <!-- JGroups protocol stack properties. Can also be a URL,
| e.g. file:/home/bela/default.xml
| <attribute name="ClusterProperties"></attribute>
| -->
|
| <attribute name="ClusterConfig">
| <config>
| <!-- UDP: if you have a multihomed machine,
| set the bind_addr attribute to the appropriate NIC IP address, e.g bind_addr="192.168.0.2"
| -->
| <!-- UDP: On Windows machines, because of the media sense feature
| being broken with multicast (even after disabling media sense)
| set the loopback attribute to true -->
| <UDP mcast_addr="224.10.10.16" mcast_port="45568"
| ip_ttl="64" ip_mcast="true"
| mcast_send_buf_size="150000" mcast_recv_buf_size="80000"
| ucast_send_buf_size="150000" ucast_recv_buf_size="80000"
| loopback="false" bind_addr="10.253.205.16"/>
| <PING timeout="2000" num_initial_members="3"
| up_thread="false" down_thread="false"/>
| <MERGE2 min_interval="10000" max_interval="20000"/>
| <!-- <FD shun="true" up_thread="true" down_thread="true" />-->
| <FD_SOCK/>
| <VERIFY_SUSPECT timeout="1500"
| up_thread="false" down_thread="false"/>
| <pbcast.NAKACK gc_lag="50" retransmit_timeout="600,1200,2400,4800"
| max_xmit_size="8192" up_thread="false" down_thread="false"/>
| <UNICAST timeout="600,1200,2400" window_size="100" min_threshold="10"
| down_thread="false"/>
| <pbcast.STABLE desired_avg_gossip="20000"
| up_thread="false" down_thread="false"/>
| <FRAG frag_size="8192"
| down_thread="false" up_thread="false"/>
| <pbcast.GMS join_timeout="5000" join_retry_timeout="2000"
| shun="true" print_local_addr="true"/>
| <pbcast.STATE_TRANSFER up_thread="true" down_thread="true"/>
| </config>
| </attribute>
|
|
| <!--
| Whether or not to fetch state on joining a cluster
| -->
| <attribute name="FetchStateOnStartup">true</attribute>
|
| <!--
| The max amount of time (in milliseconds) we wait until the
| initial state (ie. the contents of the cache) are retrieved from
| existing members in a clustered environment
| -->
| <attribute name="InitialStateRetrievalTimeout">5000</attribute>
|
| <!--
| Number of milliseconds to wait until all responses for a
| synchronous call have been received.
| -->
| <attribute name="SyncReplTimeout">10000</attribute>
|
| <!-- Max number of milliseconds to wait for a lock acquisition -->
| <attribute name="LockAcquisitionTimeout">15000</attribute>
|
| <!-- Name of the eviction policy class. Not supported now. -->
| <attribute name="EvictionPolicyClass"></attribute>
|
| <!--
| <attribute name="CacheLoaderClass">org.jboss.cache.loader.bdbje.BdbjeCacheLoader</attribute>
| <attribute name="CacheLoaderConfig">c:\tmp\bdbje</attribute>
| <attribute name="CacheLoaderShared">true</attribute>
| <attribute name="CacheLoaderPreload">/</attribute>
| -->
|
| <!--
| <attribute name="CacheLoaderClass">org.jboss.cache.loader.FileCacheLoader</attribute>
| <attribute name="CacheLoaderConfig">/tmp</attribute>
| <attribute name="CacheLoaderShared">true</attribute>
| <attribute name="CacheLoaderPreload">/</attribute>
| -->
|
|
| </mbean>
|
|
| <!-- Uncomment to get a graphical view of the TreeCache MBean above -->
| <!-- <mbean code="org.jboss.cache.TreeCacheView" name="jboss.cache:service=TreeCacheView">-->
| <!-- <depends>jboss.cache:service=TreeCache</depends>-->
| <!-- <attribute name="CacheService">jboss.cache:service=TreeCache</attribute>-->
| <!-- </mbean>-->
|
|
| </server>
|
Version details follow:
JBC version - 1.4.1SP3
Application server - WebLogic
Any help would be appreciated.
Regards,
Himadri
View the original post :
http://www.jboss.com/index.html?module=bb&op=viewtopic&p=4046298#...