[infinispan-dev] Infinispan Benchmarks

Bela Ban bban at redhat.com
Thu Nov 19 02:31:22 EST 2009


Can you do one thing to exclude JGroups from the equation?

Run the Infinispan nodes in LOCAL (not DIST- or REPL-) mode. Check your 
access times.

If you run Infinispan in XXX-SYNC mode, you incur the round trip times 
for replication / distribution on top of your client access times.
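
As a quick sanity check, something along these lines (a minimal sketch using 
the 4.0-era programmatic API; the default cache manager configuration runs 
in LOCAL mode, so no JGroups stack is started):

    import org.infinispan.Cache;
    import org.infinispan.manager.DefaultCacheManager;

    public class LocalTimingCheck {
        public static void main(String[] args) {
            // Default configuration = LOCAL mode: no transport, no replication.
            DefaultCacheManager cm = new DefaultCacheManager();
            Cache<String, byte[]> cache = cm.getCache();

            cache.put("key-1", new byte[100]);

            long start = System.nanoTime();
            cache.get("key-1");
            long micros = (System.nanoTime() - start) / 1000;
            System.out.println("local get took " + micros + " us");

            cm.stop();
        }
    }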

Do you run the *same* images on EC2 as on your local hosts? E.g. TCP 
settings such as buffer sizes and nagling on/off (the net.core.* sysctls) 
might be different on EC2 than they are locally.
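
It's also worth pinning those options explicitly on the benchmark client / 
server sockets rather than relying on the image defaults; plain java.net 
calls, nothing Infinispan-specific (node address and buffer sizes below are 
just placeholders):

    import java.io.IOException;
    import java.net.Socket;

    public class SocketOptionsCheck {
        public static void main(String[] args) throws IOException {
            // Connect to whichever node/port the benchmark talks to.
            Socket s = new Socket(args[0], Integer.parseInt(args[1]));
            s.setTcpNoDelay(true);              // turn nagling off explicitly
            s.setSendBufferSize(256 * 1024);    // the OS may cap these via
            s.setReceiveBufferSize(256 * 1024); // net.core.rmem_max / wmem_max
            System.out.println("nodelay=" + s.getTcpNoDelay()
                    + " sndbuf=" + s.getSendBufferSize()
                    + " rcvbuf=" + s.getReceiveBufferSize());
            s.close();
        }
    }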

I'm not sure what network speed EC2 guarantees (if any!). What you could 
do is measure the raw speed available between 2 EC2 nodes with iperf.

Note that you *share* the bandwidth between your nodes with other 
virtualized instances, so replication / distribution *may* be the limiting 
factor!
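
If iperf isn't on the images, even a crude single-stream check gives a 
ballpark figure. A rough substitute (not iperf itself, just a sketch): run 
it with "server" on one node and "client <server address>" on the other:

    import java.io.InputStream;
    import java.io.OutputStream;
    import java.net.ServerSocket;
    import java.net.Socket;

    public class RawThroughput {
        static final int PORT = 5001;            // arbitrary free port

        public static void main(String[] args) throws Exception {
            byte[] buf = new byte[64 * 1024];
            if (args[0].equals("server")) {
                // Receiver: count bytes and elapsed time for one connection.
                ServerSocket ss = new ServerSocket(PORT);
                Socket s = ss.accept();
                InputStream in = s.getInputStream();
                long total = 0;
                long start = System.nanoTime();
                int n;
                while ((n = in.read(buf)) != -1)
                    total += n;
                double secs = (System.nanoTime() - start) / 1e9;
                System.out.println(total + " bytes in " + secs + " s = "
                        + (long) (total * 8 / secs / 1e6) + " Mbit/s");
                s.close();
                ss.close();
            } else {
                // Sender: blast ~100 MB at the receiver and exit.
                Socket s = new Socket(args[1], PORT);
                OutputStream out = s.getOutputStream();
                for (int i = 0; i < 1600; i++)   // 1600 * 64 KB = 100 MB
                    out.write(buf);
                s.close();
            }
        }
    }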


Michael Lawson (mshindo) wrote:
> We have ruled out the possibility of the problem being related to JGroups,
> as the same configuration performs fine when run locally (not on Amazon EC2).
>
> *Let me outline the testing more specifically:*
>
> I have created a very simple socket client and server to communicate with
> the Infinispan nodes. This provides a mechanism to connect and to send get
> and insert commands, coupled with the required data, to the targeted
> Infinispan nodes. These insertions and retrievals are then timed from the
> client. As it stands this system works perfectly in a local environment on
> my own network. However, as soon as we attempt to test on the Amazon EC2
> cloud, which is required for benchmarking against other products, the
> retrieval times jump from under a millisecond to around 160ms, depending on
> the value size and the number of nodes in the cluster.
>
> The reason we are testing using this client -> server model is that we are
> also testing concurrency, to see what happens when we send thousands of
> requests from different sources.
>
> I have used TCPPING both locally and on the Amazon cloud (as multicasting
> is not allowed in that environment), and the results are exactly the same
> as before: perfect numbers locally, bad numbers remotely. This is proving
> to be quite a mystery.
>
> I have uploaded my client and server code online:
> http://pastebin.org/54960.
>
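> In essence the timed path on the client side is this kind of round trip
> (a stripped-down sketch, not the exact pastebin code; names, port and the
> command format are simplified):
>
>     import java.io.ObjectInputStream;
>     import java.io.ObjectOutputStream;
>     import java.net.Socket;
>
>     public class GetTimer {
>         public static void main(String[] args) throws Exception {
>             // args[0] = address of one Infinispan node's socket server
>             Socket socket = new Socket(args[0], 9000);
>             ObjectOutputStream out = new ObjectOutputStream(socket.getOutputStream());
>             ObjectInputStream in = new ObjectInputStream(socket.getInputStream());
>
>             long start = System.nanoTime();
>             out.writeObject("GET key-1");   // command via standard serialization
>             out.flush();
>             Object value = in.readObject(); // node replies with the cached value
>             long micros = (System.nanoTime() - start) / 1000;
>
>             System.out.println("get round trip: " + micros + " us, value = " + value);
>             socket.close();
>         }
>     }
>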
> Any clues?
>
> On Wed, Nov 18, 2009 at 4:34 PM, Michael Lawson (mshindo) <
> michael at sphinix.com> wrote:
>
>   
>> Are there any official socket clients available?
>>
>>
>> On Tue, Nov 17, 2009 at 11:40 PM, Manik Surtani <manik at jboss.org> wrote:
>>
>>     
>>> On 17 Nov 2009, at 04:54, Michael Lawson (mshindo) wrote:
>>>
>>> The benchmarking in question is simple insertions and retrievals run via
>>> sockets. These benchmarks return better results when run on a local
>>> machine; however, the testing in question is being done on the Amazon EC2
>>> cloud. Running on EC2 was a problem in itself, but I followed the
>>> instructions on a blog and used an XML file to configure the transport
>>> properties.
>>>
>>> <config xmlns="urn:org:jgroups"
>>>         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>>>         xsi:schemaLocation="urn:org:jgroups file:schema/JGroups-2.8.xsd">
>>>     <TCP bind_port="7800" />
>>>     <TCPPING timeout="3000"
>>>              initial_hosts="${jgroups.tcpping.initial_hosts:10.209.166.79[7800],10.209.198.176[7800],10.208.199.223[7800],10.208.190.224[7800],10.208.70.112[7800]}"
>>>              port_range="1"
>>>              num_initial_members="3" />
>>>     <MERGE2 max_interval="30000" min_interval="10000" />
>>>     <FD_SOCK />
>>>     <FD timeout="10000" max_tries="5" />
>>>     <VERIFY_SUSPECT timeout="1500" />
>>>     <pbcast.NAKACK use_mcast_xmit="false" gc_lag="0"
>>>                    retransmit_timeout="300,600,1200,2400,4800"
>>>                    discard_delivered_msgs="true" />
>>>     <UNICAST timeout="300,600,1200" />
>>>     <pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000" max_bytes="400000" />
>>>     <pbcast.GMS print_local_addr="true" join_timeout="3000" view_bundling="true" />
>>>     <FC max_credits="2000000" min_threshold="0.10" />
>>>     <FRAG2 frag_size="60000" />
>>>     <pbcast.STREAMING_STATE_TRANSFER />
>>> </config>
>>>
>>> I have a theory that perhaps the introduction of TCPPING in the JGroups
>>> file results in some form of polling before the actual get request is
>>> processed and returned. Could this be the case?
>>>
>>>
>>> It could be - JGroups also has an experimental protocol called S3_PING
>>> which could help.
>>>
>>>
>>> http://javagroups.cvs.sourceforge.net/viewvc/javagroups/JGroups/src/org/jgroups/protocols/S3_PING.java?revision=1.2&view=markup
>>>
>>> Another approach for discovery in an EC2 environment is to use a
>>> GossipRouter, but I'd give S3_PING a try first.
>>>
>>> Cheers
>>> Manik
>>>
>>>
>>>
>>>
>>> On Tue, Nov 17, 2009 at 12:03 AM, Manik Surtani <manik at jboss.org> wrote:
>>>
>>>       
>>>> Hi Michael
>>>>
>>>> Could you please detail your benchmark test a bit more?  We have done
>>>> some internal benchmarks as well and things do look significantly different.
>>>>  Could you also tell us which version you have been benchmarking?  We've
>>>> made some significant changes to DIST between CR1 and CR2 with regards to
>>>> performance.
>>>>
>>>> FYI, we use the CacheBenchFwk [1] to help benchmark stuff; you may find
>>>> this useful too.
>>>>
>>>> Cheers
>>>> Manik
>>>>
>>>> [1] http://cachebenchfwk.sourceforge.net
>>>>
>>>>
>>>> On 15 Nov 2009, at 22:00, Michael Lawson (mshindo) wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I have been performing some benchmark testing on Infinispan running in
>>>>> distributed mode, with some unexpected results.
>>>>>
>>>>> For an insertion with a key size of 100 bytes and a value size of 100
>>>>> bytes, the insertion time was 0.13ms and retrieval was 128.06ms.
>>>>>
>>>>> Communication with the Infinispan nodes is being done via a socket
>>>>> interface, using standard Java serialization.
>>>>>
>>>>> The retrieval time is consistently high in comparison to other systems,
>>>>> and I am wondering whether there are some other benchmark reports
>>>>> floating around that I can compare results with.
>>>>> --
>>>>> Michael Lawson
>>>>>
>>>> --
>>>> Manik Surtani
>>>> manik at jboss.org
>>>> Lead, Infinispan
>>>> Lead, JBoss Cache
>>>> http://www.infinispan.org
>>>> http://www.jbosscache.org
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>> --
>>> Michael Lawson
>>>
>>>
>>>
>>> --
>>> Manik Surtani
>>> manik at jboss.org
>>> Lead, Infinispan
>>> Lead, JBoss Cache
>>> http://www.infinispan.org
>>> http://www.jbosscache.org
>>>
>>>
>>>
>>>
>>>
>>
>> --
>> Michael Lawson (mshindo)
>>
>>
>>     
>
>
>   

-- 
Bela Ban
Lead JGroups / Clustering Team
JBoss



