[jboss-user] [Clustering/JBoss] - How to configure JGroups on hosts with redundant network lin

mkrzemien do-not-reply at jboss.com
Fri Aug 31 05:14:02 EDT 2007


In our production environment, every host has redundant network links to protect against a single link failure. Does anyone have an example or best practices for configuring JGroups so that it keeps working when a single link fails?

We built a prototype, but it failed; details are below.

Thank you in advance.
Kind regards 
Mariusz

Version: JBossCache 1.4.1 SP3, JGroups 2.4.1

Environment: a LAN of two hosts, each with two NICs (eth0, eth1). The hosts are connected directly (eth0-to-eth0, eth1-to-eth1) and configured as a single IPv4 subnet. JGroups is meant to communicate on both interfaces using multicast (see the configuration below).
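
As a sanity check on the environment, the state of both interfaces can be inspected from Java before JGroups starts. The sketch below uses only the standard `java.net.NetworkInterface` API; the class name `NicCheck` and the helper `describe` are made up for illustration and are not part of JGroups or JBossCache. It only reports what the JVM sees and says nothing about how JGroups itself chooses an interface.

```java
import java.net.NetworkInterface;
import java.net.SocketException;

// Hypothetical helper: reports whether a named interface exists,
// is up, and supports multicast, as seen by the JVM.
class NicCheck {
    static String describe(String name) {
        try {
            NetworkInterface nic = NetworkInterface.getByName(name);
            if (nic == null) {
                return name + ": not found";
            }
            return name + ": up=" + nic.isUp()
                    + ", multicast=" + nic.supportsMulticast();
        } catch (SocketException e) {
            // The OS refused the query; report it instead of failing.
            return name + ": error (" + e.getMessage() + ")";
        }
    }

    public static void main(String[] args) {
        // Defaults to the two interface names used in this setup.
        String[] names = args.length > 0 ? args : new String[] {"eth0", "eth1"};
        for (String name : names) {
            System.out.println(describe(name));
        }
    }
}
```

Running it before and after pulling a cable shows whether the JVM notices the link change at all.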

Test description: 
- both links are connected
- one JBossCache instance is started on each node
- replication working correctly
- disconnected link eth1-to-eth1
- replication working correctly
- reconnected link eth1-to-eth1, disconnected link eth0-to-eth0
- replication working correctly
! after a short time (around 5 seconds) both instances report an exception about the other node (see below) and terminate because the exception is not caught

I don't know whether simply catching the exception is enough. Seen from the application level, JGroups/JBossCache appears to have a problem with this configuration.
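
For reference, "simply catching the exception" would look roughly like the sketch below: the failing call is wrapped, retried a few times with a pause, and only then reported as failed. `RetryingCaller` and `callWithRetry` are hypothetical names, and the simulated timeout in `main` stands in for the real cache call; nothing here is JBossCache or JGroups API. It only keeps the application alive and does not address why the response times out after the link switch.

```java
import java.util.concurrent.Callable;

// Hypothetical wrapper: retries an operation that may fail transiently.
class RetryingCaller {
    static <T> T callWithRetry(Callable<T> op, int maxAttempts, long pauseMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                last = e; // remember the failure and try again
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(pauseMillis);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting to retry", ie);
                }
            }
        }
        // Every attempt failed: surface the last cause to the caller.
        throw new IllegalStateException("all " + maxAttempts + " attempts failed", last);
    }

    public static void main(String[] args) {
        final int[] calls = {0};
        // Simulated operation: fails twice, then succeeds.
        String result = callWithRetry(() -> {
            if (++calls[0] < 3) {
                throw new RuntimeException("Response timed out (simulated)");
            }
            return "replicated";
        }, 5, 10L);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

In the test program the body of the `Callable` would be the `remove` call that appears in the stack trace below. Whether a retry ever succeeds depends on the cluster recovering on the remaining link, which is exactly what is in doubt here.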

Configuration details:
                <UDP mcast_addr="228.8.8.8" mcast_port="45566"
                    ip_ttl="64" ip_mcast="true" 
                    mcast_send_buf_size="150000" mcast_recv_buf_size="80000"
                    ucast_send_buf_size="150000" ucast_recv_buf_size="80000"
                    loopback="false"
                    receive_on_all_interfaces="true"
                    send_on_all_interfaces="true"
                    receive_interfaces="eth0,eth1"
                    send_interfaces="eth0,eth1"/>
                <PING timeout="2000" num_initial_members="3"
                    up_thread="false" down_thread="false"/>
                <MERGE2 min_interval="10000" max_interval="20000"/>
                <!--        <FD shun="true" up_thread="true" own_thread="true" />-->
                <FD_SOCK/>
                <VERIFY_SUSPECT timeout="1500" up_thread="false" down_thread="false"/>
                <pbcast.NAKACK gc_lag="50" retransmit_timeout="600,1200,2400,4800"
                    max_xmit_size="8192" up_thread="false" down_thread="false"/>
                <UNICAST timeout="600,1200,2400" window_size="100" min_threshold="10"
                    down_thread="false"/>
                <pbcast.STABLE desired_avg_gossip="20000"
                    up_thread="false" down_thread="false"/>
                <pbcast.GMS join_timeout="5000" join_retry_timeout="2000"
                    shun="true" print_local_addr="true"/>
                <FC max_credits="2000000" down_thread="false" up_thread="false"
                    min_threshold="0.20"/>
                <FRAG frag_size="8192" down_thread="false" up_thread="true"/>
                <pbcast.STATE_TRANSFER up_thread="true" down_thread="true"/>

Logs with exception:
[2007-08-30 15:20:29,796|DEBUG|main; |org.jgroups.blocks.GroupRequest(execute:195)]: call did not execute correctly, request is [GroupRequest:
req_id=1188480009786
caller=10.10.0.2:32781
10.10.0.1:32781: sender=10.10.0.1:32781, retval=null, received=false, suspected=false

request_msg: [dst: , src: 10.10.0.2:32781 (3 headers), size = 34 bytes]
rsp_mode: GET_ALL
done: false
timeout: 20000
expected_mbrs: 0
]
[2007-08-30 15:20:29,796|DEBUG|main; |org.jgroups.blocks.RpcDispatcher(callRemoteMethods:193)]: responses: [sender=10.10.0.1:32781, retval=null, received=false, suspected=false]

[2007-08-30 15:20:29,797|DEBUG|main; |org.jboss.cache.TreeCache(callRemoteMethods:4405)]: (10.10.0.2:32781): responses for method _replicate:
[sender=10.10.0.1:32781, retval=null, received=false, suspected=false]

[2007-08-30 15:20:29,798|DEBUG|main; |org.jboss.cache.interceptors.BaseRpcInterceptor(replicateCall:118)]: responses=[org.jboss.cache.ReplicationException: rsp=sender=10.10.0.1:32781, retval=null, received=false, suspected=false]
[2007-08-30 15:20:29,800|DEBUG|main; |org.jboss.cache.interceptors.BaseRpcInterceptor(checkResponses:79)]: Received Throwable from remote node
org.jboss.cache.ReplicationException: rsp=sender=10.10.0.1:32781, retval=null, received=false, suspected=false
        at org.jboss.cache.TreeCache.callRemoteMethods(TreeCache.java:4422)
        at org.jboss.cache.TreeCache.callRemoteMethods(TreeCache.java:4344)
        at org.jboss.cache.TreeCache.callRemoteMethods(TreeCache.java:4455)
        at org.jboss.cache.interceptors.BaseRpcInterceptor.replicateCall(BaseRpcInterceptor.java:110)
        at org.jboss.cache.interceptors.BaseRpcInterceptor.replicateCall(BaseRpcInterceptor.java:88)
        at org.jboss.cache.interceptors.ReplicationInterceptor.handleReplicatedMethod(ReplicationInterceptor.java:124)
        at org.jboss.cache.interceptors.ReplicationInterceptor.invoke(ReplicationInterceptor.java:88)
        at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:68)
        at org.jboss.cache.interceptors.TxInterceptor.handleNonTxMethod(TxInterceptor.java:365)
        at org.jboss.cache.interceptors.TxInterceptor.invoke(TxInterceptor.java:160)
        at org.jboss.cache.interceptors.Interceptor.invoke(Interceptor.java:68)
        at org.jboss.cache.interceptors.CacheMgmtInterceptor.invoke(CacheMgmtInterceptor.java:183)
        at org.jboss.cache.TreeCache.invokeMethod(TreeCache.java:5863)
        at org.jboss.cache.TreeCache.remove(TreeCache.java:3929)
        at org.jboss.cache.TreeCache.remove(TreeCache.java:3915)
        at test.jbcache.DistributedTree.remove(DistributedTree.java:41)
        at test.jbcache.DistributedTest.handleSession(DistributedTest.java:46)
        at test.jbcache.DistributedTest.main(DistributedTest.java:78)
Caused by: org.jboss.cache.lock.TimeoutException: Response timed out: sender=10.10.0.1:32781, retval=null, received=false, suspected=false
        at org.jboss.cache.TreeCache.callRemoteMethods(TreeCache.java:4420)
        ... 17 more

View the original post : http://www.jboss.com/index.html?module=bb&op=viewtopic&p=4079887#4079887
