[infinispan-dev] Infinispan Large Scale support
Galder Zamarreño
galder at redhat.com
Tue Mar 22 11:02:35 EDT 2011
Just cc'ing both lists so that you don't have to repeat the email and we can get everyone's feedback.
On Mar 22, 2011, at 8:44 AM, Bela Ban wrote:
> I cross-posted this to the JGroups mailing lists [1]
>
>
> [1] https://sourceforge.net/mail/?group_id=6081
>
>
> On 3/22/11 2:05 AM, Dave wrote:
>> I switched back to UDP today based on your feedback. Our config resembles
>> the one below. Like I said, we just increased sizes and timeouts. If you
>> ask me why I tweaked a certain parameter, my response would be that it seemed
>> like a good idea based on the JGroups documentation. UDP seemed a little
>> more problematic than TCP, though I'm not sure why.
>>
>> <config xmlns="urn:org:jgroups"
>> xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>> xsi:schemaLocation="urn:org:jgroups file:schema/JGroups-2.8.xsd">
>> <UDP
>> mcast_addr="${jgroups.udp.mcast_addr:228.6.7.8}"
>> mcast_port="${jgroups.udp.mcast_port:46655}"
>> tos="8"
>> ucast_recv_buf_size="20000000"
>> ucast_send_buf_size="640000"
>> mcast_recv_buf_size="25000000"
>> mcast_send_buf_size="640000"
>> loopback="true"
>> discard_incompatible_packets="true"
>> max_bundle_size="4000000"
>> max_bundle_timeout="30"
>> ip_ttl="${jgroups.udp.ip_ttl:2}"
>> enable_bundling="true"
>> enable_diagnostics="false"
>>
>> thread_naming_pattern="pl"
>>
>> thread_pool.enabled="true"
>> thread_pool.min_threads="2"
>> thread_pool.max_threads="30"
>> thread_pool.keep_alive_time="5000"
>> thread_pool.queue_enabled="true"
>> thread_pool.queue_max_size="1000"
>> thread_pool.rejection_policy="Discard"
>>
>> oob_thread_pool.enabled="true"
>> oob_thread_pool.min_threads="2"
>> oob_thread_pool.max_threads="30"
>> oob_thread_pool.keep_alive_time="5000"
>> oob_thread_pool.queue_enabled="true"
>> oob_thread_pool.queue_max_size="1000"
>> oob_thread_pool.rejection_policy="Discard"
>> />
>>
>> <PING timeout="360000" num_initial_members="400"
>> break_on_coord_rsp="false"/>
>> <MERGE2 max_interval="30000" min_interval="10000"/>
>> <FD_SOCK/>
>> <FD_ALL/>
>> <BARRIER />
>> <pbcast.NAKACK use_stats_for_retransmission="false"
>> exponential_backoff="0"
>> use_mcast_xmit="true" gc_lag="0"
>> retransmit_timeout="300,600,1200,2400,3600,4800"
>> discard_delivered_msgs="true"/>
>> <UNICAST timeout="300,600,1200,2400,3600,4800"/>
>> <pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000"
>> max_bytes="1000000"/>
>> <pbcast.GMS print_local_addr="false" join_timeout="60000"
>> view_bundling="true" use_flush_if_present="false"/>
>> <UFC max_credits="2000000" min_threshold="0.20"/>
>> <MFC max_credits="2000000" min_threshold="0.20"/>
>> <FRAG2 frag_size="2000000" />
>> <pbcast.STREAMING_STATE_TRANSFER/>
>> <!--<pbcast.STATE_TRANSFER/> -->
>> <pbcast.FLUSH timeout="0"/>
>> </config>
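>>
>> One thing worth checking with buffer sizes this large (a minimal sketch of my
>> own, not part of the original config): the OS has to allow them. On Linux the
>> kernel silently caps socket buffers at net.core.rmem_max / net.core.wmem_max,
>> so the 25,000,000 bytes requested for mcast_recv_buf_size may not actually be
>> granted. Something like the following prints the effective size:
>>
>> import java.net.DatagramSocket;
>>
>> public class CheckRecvBuf {
>>     public static void main(String[] args) throws Exception {
>>         DatagramSocket sock = new DatagramSocket();
>>         // Request the same 25 MB the transport config above asks for ...
>>         sock.setReceiveBufferSize(25000000);
>>         // ... and print what the OS actually granted; on Linux this is
>>         // capped at net.core.rmem_max.
>>         System.out.println("Effective recv buffer: "
>>                 + sock.getReceiveBufferSize() + " bytes");
>>         sock.close();
>>     }
>> }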
>>
>>
>> -----Original Message-----
>> From: infinispan-dev-bounces at lists.jboss.org
>> [mailto:infinispan-dev-bounces at lists.jboss.org] On Behalf Of Bela Ban
>> Sent: Saturday, March 19, 2011 1:15 PM
>> To: infinispan-dev at lists.jboss.org
>> Subject: Re: [infinispan-dev] Infinispan Large Scale support
>>
>> Hard to believe that TCP would be better, as TCP creates a mesh of
>> connections; for 400 nodes, with every node sending, you'll have roughly
>> 400*400 connections!
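>>
>> A rough sketch of that arithmetic (my own illustration, with an assumed class
>> name; it just counts connections, nothing JGroups-specific):
>>
>> public class MeshConnections {
>>     public static void main(String[] args) {
>>         int n = 400;
>>         // Every node talks to every other node, so there are roughly n*n
>>         // per-node connection endpoints, or about n*(n-1)/2 distinct TCP
>>         // connections cluster-wide.
>>         long endpoints = (long) n * n;
>>         long distinctConnections = (long) n * (n - 1) / 2;
>>         // With UDP/IP multicast, a single datagram socket per node suffices.
>>         System.out.println("TCP endpoints (n*n):      ~" + endpoints);
>>         System.out.println("Distinct TCP connections: ~" + distinctConnections);
>>         System.out.println("UDP multicast sockets:     " + n);
>>     }
>> }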
>>
>> I've always had a much better experience with UDP.
>>
>> On 3/19/11 2:37 PM, david marion wrote:
>>>
>>> Initially yes, but I think we are getting better stability using TCP. I
>>> switched it back to TCP yesterday. I can post specifics of what I did in
>>> the TCP configuration, but the short story is I increased a lot of the
>>> timeout values to get it to work.
>>>
>>> Dave Marion
>>>
>>>
>>>> Date: Sat, 19 Mar 2011 10:50:54 +0100
>>>> From: bban at redhat.com
>>>> To: infinispan-dev at lists.jboss.org
>>>> Subject: Re: [infinispan-dev] Infinispan Large Scale support
>>>>
>>>>
>>>>
>>>> On 3/18/11 10:35 PM, Dave wrote:
>>>>> Won't be able to get CR4 uploaded, policy dictates that I wait until final
>>>>> release. However, I was able to get 431 nodes up and running as a replicated
>>>>> cluster and 115 nodes up as a distributed cluster. For the 430-node cache, I
>>>>> was able to get it started with no problems about 50% of the time. When they
>>>>> formed multiple clusters they merged together only some of the time. It
>>>>> really does appear to be a startup issue at this point. We have not pushed
>>>>> it hard enough yet to see what happens at this scale under load.
>>>>>
>>>>>
>>>>>
>>>>> Any idea when CR4 will be FINAL?
>>>>>
>>>>> Are there any tools to help diagnose problems / performance at this scale
>>>>> (I ended up writing my own monitor program)?
>>>>
>>>>
>>>> Yes, there's probe.sh at the JGroups level. I created a JIRA [1] to provide
>>>> a sample for large clusters. You said you based your config on udp.xml,
>>>> correct?
>>>>
>>>> [1] https://issues.jboss.org/browse/JGRP-1307
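>>>>
>>>> probe.sh is just a wrapper around org.jgroups.tests.Probe, so the same
>>>> multicast query can also be fired from Java; a minimal sketch (hypothetical
>>>> class name) is below. Note that members only answer if the transport has
>>>> enable_diagnostics="true"; the config posted above sets it to false, so it
>>>> would need to be switched on for probe to see anything.
>>>>
>>>> public class RunProbe {
>>>>     public static void main(String[] args) throws Exception {
>>>>         // Sends a diagnostics request to the default probe multicast
>>>>         // address/port and prints whatever cluster members respond with.
>>>>         org.jgroups.tests.Probe.main(new String[0]);
>>>>     }
>>>> }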
>>>>
>>>> --
>>>> Bela Ban
>>>> Lead JGroups / Clustering Team
>>>> JBoss
>>>
>>>
>>>
>>>
>>
>
> --
> Bela Ban
> Lead JGroups / Clustering Team
> JBoss
--
Galder Zamarreño
Sr. Software Engineer
Infinispan, JBoss Cache