I switched back to UDP today based on your feedback. Our configuration
resembles the one below. As I said, we mostly just increased buffer sizes and
timeouts. If you ask why I tweaked a particular parameter, my answer would be
that it seemed like a good idea based on the JGroups documentation. UDP seemed
a little more problematic than TCP, though I'm not sure why.
<config xmlns="urn:org:jgroups"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="urn:org:jgroups file:schema/JGroups-2.8.xsd">
    <UDP mcast_addr="${jgroups.udp.mcast_addr:228.6.7.8}"
         mcast_port="${jgroups.udp.mcast_port:46655}"
         tos="8"
         ucast_recv_buf_size="20000000"
         ucast_send_buf_size="640000"
         mcast_recv_buf_size="25000000"
         mcast_send_buf_size="640000"
         loopback="true"
         discard_incompatible_packets="true"
         max_bundle_size="4000000"
         max_bundle_timeout="30"
         ip_ttl="${jgroups.udp.ip_ttl:2}"
         enable_bundling="true"
         enable_diagnostics="false"
         thread_naming_pattern="pl"
         thread_pool.enabled="true"
         thread_pool.min_threads="2"
         thread_pool.max_threads="30"
         thread_pool.keep_alive_time="5000"
         thread_pool.queue_enabled="true"
         thread_pool.queue_max_size="1000"
         thread_pool.rejection_policy="Discard"
         oob_thread_pool.enabled="true"
         oob_thread_pool.min_threads="2"
         oob_thread_pool.max_threads="30"
         oob_thread_pool.keep_alive_time="5000"
         oob_thread_pool.queue_enabled="true"
         oob_thread_pool.queue_max_size="1000"
         oob_thread_pool.rejection_policy="Discard"/>
    <PING timeout="360000" num_initial_members="400"
          break_on_coord_rsp="false"/>
    <MERGE2 max_interval="30000" min_interval="10000"/>
    <FD_SOCK/>
    <FD_ALL/>
    <BARRIER/>
    <pbcast.NAKACK use_stats_for_retransmission="false"
                   exponential_backoff="0"
                   use_mcast_xmit="true" gc_lag="0"
                   retransmit_timeout="300,600,1200,2400,3600,4800"
                   discard_delivered_msgs="true"/>
    <UNICAST timeout="300,600,1200,2400,3600,4800"/>
    <pbcast.STABLE stability_delay="1000"
                   desired_avg_gossip="50000"
                   max_bytes="1000000"/>
    <pbcast.GMS print_local_addr="false" join_timeout="60000"
                view_bundling="true" use_flush_if_present="false"/>
    <UFC max_credits="2000000" min_threshold="0.20"/>
    <MFC max_credits="2000000" min_threshold="0.20"/>
    <FRAG2 frag_size="2000000"/>
    <pbcast.STREAMING_STATE_TRANSFER/>
    <!-- <pbcast.STATE_TRANSFER/> -->
    <pbcast.FLUSH timeout="0"/>
</config>
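In case it helps anyone reproduce this, here is a minimal sketch of how a node
can be brought up on the stack above. The file name "jgroups-udp.xml" and the
cluster name "large-cluster" are placeholders, not our actual values:

    import org.jgroups.JChannel;

    public class StartNode {
        public static void main(String[] args) throws Exception {
            // Parse the protocol stack from the XML shown above
            JChannel ch = new JChannel("jgroups-udp.xml");
            // Join (or form) the cluster; discovery goes through PING
            ch.connect("large-cluster");
            System.out.println("Current view: " + ch.getView());
            // ... application work happens here ...
            ch.close();
        }
    }

One caveat: with enable_diagnostics="false" in the UDP element above, JGroups'
probe.sh (mentioned further down the thread) cannot query these nodes; it needs
to be set to "true" while diagnosing.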
-----Original Message-----
From: infinispan-dev-bounces@lists.jboss.org
[mailto:infinispan-dev-bounces@lists.jboss.org] On Behalf Of Bela Ban
Sent: Saturday, March 19, 2011 1:15 PM
To: infinispan-dev@lists.jboss.org
Subject: Re: [infinispan-dev] Infinispan Large Scale support
Hard to believe that TCP would be better, as TCP creates a mesh of
connections; for 400 nodes, with every node sending, you'll have roughly
400*400 connections!
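Back-of-the-envelope: with N = 400, that is N * (N-1) = 159,600 connection
endpoints across the cluster, i.e. roughly 80,000 distinct TCP connections,
each with its own socket buffers and timers.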
I always had a much better experience with UDP
On 3/19/11 2:37 PM, david marion wrote:
Initially yes, but I think we are getting better stability using TCP. I
switched it back to TCP yesterday. I can post specifics of what I did in the
TCP configuration, but the short story is I increased a lot of the timeout
values to get it to work.

Dave Marion
> Date: Sat, 19 Mar 2011 10:50:54 +0100
> From: bban@redhat.com
> To: infinispan-dev@lists.jboss.org
> Subject: Re: [infinispan-dev] Infinispan Large Scale support
>
> On 3/18/11 10:35 PM, Dave wrote:
>> Won't be able to get CR4 uploaded, policy dictates that I wait until final
>> release. However, I was able to get 431 nodes up and running as a replicated
>> cluster and 115 nodes up as a distributed cluster. For the 430 node cache, I
>> was able to get it started with no problems about 50% of the time. When they
>> formed multiple clusters they merged together only some of the time. It
>> really does appear to be a startup issue at this point. We have not pushed
>> it hard enough yet to see what happens at this scale under load.
>>
>> Any idea when CR4 will be FINAL?
>>
>> Are there any tools to help diagnose problems / performance at this scale (I
>> ended up writing my own monitor program)?
>
> Yes, there's probe.sh at the JGroups level. I created a JIRA to provide
> a sample for large clusters. You said you based your config on udp.xml,
> correct?
>
> [1] https://issues.jboss.org/browse/JGRP-1307
>
> --
> Bela Ban
> Lead JGroups / Clustering Team
> JBoss
--
Bela Ban
Lead JGroups / Clustering Team
JBoss
_______________________________________________
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev