[infinispan-dev] again: "no physical address"

Bela Ban bban at redhat.com
Thu Feb 2 01:53:54 EST 2012


I can also reproduce it now, in plain JGroups: I simply create 12 members 
in a loop...

I don't need the bombastic Transactional test for that.
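
In other words, the reproducer is essentially just a loop like this (class 
name, config file name and cluster name are placeholders, not the actual 
test code):

    import java.util.ArrayList;
    import java.util.List;
    import org.jgroups.JChannel;

    public class StartupLoop {
        public static void main(String[] args) throws Exception {
            List<JChannel> channels = new ArrayList<JChannel>();
            for (int i = 0; i < 12; i++) {
                JChannel ch = new JChannel("udp.xml"); // stand-in for the real stack config
                ch.connect("ISPN");                    // all 12 members join the same cluster
                channels.add(ch);
            }
            // keep the channels around; the "no physical address" warnings
            // show up while the later members are joining
        }
    }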

Looking into it.

On 2/2/12 7:46 AM, Dan Berindei wrote:
> Hi Bela
>
> I think I found why you weren't seeing the warnings. The
> bench-log4j.xml in GitHub master is configured to log only to the log
> file (benchmark.log). If you add an <appender-ref ref="CONSOLE"/>
> you'll see the warnings on the console as well.
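>
> In other words, the root logger section of bench-log4j.xml would end up
> looking roughly like this (I'm guessing the existing file appender is
> called "FILE"; use whatever name the file already declares):
>
>    <root>
>       <priority value="INFO"/>
>       <appender-ref ref="FILE"/>
>       <appender-ref ref="CONSOLE"/>
>    </root>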
>
> I am now able to reproduce it pretty reliably, even from IntelliJ. I
> created a run configuration for the Transactional class, I set the
> module to infinispan-5.1-SNAPSHOT, and I added these JVM arguments:
>
> -Dbench.loops=3000 -Dbench.writerThreads=33 -Dbench.readerThreads=10
> -Dbench.dist=true -Dbench.transactional=true -Dbench.numkeys=500
> -Dbench.payloadsize=10240 -Dbench.nodes=12 -Djava.net.preferIPv4Stack=true
> -Dlog4j.configuration=bench-log4j.xml
>
>
> In Sanne's thread dump it was a ClusteredGetCommand that was waiting
> for a physical address, but in my test runs it's usually a JOIN that's
> failing. It seems to happen because we got a discovery
> response from 3 nodes (i.e. our num_initial_members), but
> none of them was the coordinator:
>
> 2012-02-02 08:04:41,672 TRACE [org.jgroups.protocols.PING] (main)
> discovery took 8 ms: responses: 3 total (2 servers (0 coord), 1
> clients)
> 2012-02-02 08:04:41,672 TRACE [org.jgroups.protocols.pbcast.GMS]
> (main) localhost-47135: initial_mbrs are localhost-36539
> localhost-30921
> 2012-02-02 08:04:41,672 DEBUG [org.jgroups.protocols.pbcast.GMS]
> (main) election results: {localhost-32436=2}
> 2012-02-02 08:04:41,672 DEBUG [org.jgroups.protocols.pbcast.GMS]
> (main) sending JOIN(localhost-47135) to localhost-32436
> 2012-02-02 08:04:41,673 TRACE [org.jgroups.protocols.UNICAST2] (main)
> localhost-47135: created connection to localhost-32436 (conn_id=0)
> 2012-02-02 08:04:41,673 TRACE [org.jgroups.protocols.UNICAST2] (main)
> localhost-47135 -->  DATA(localhost-32436: #1, conn_id=0, first)
> 2012-02-02 08:04:41,673 TRACE [org.jgroups.protocols.UDP] (main)
> sending msg to localhost-32436, src=localhost-47135, headers are GMS:
> GmsHeader[JOIN_REQ]: mbr=localhost-47135, UNICAST2: DATA, seqno=1,
> first, UDP: [channel_name=ISPN]
> 2012-02-02 08:04:41,673 WARN  [org.jgroups.protocols.UDP] (main)
> localhost-47135: no physical address for localhost-32436, dropping
> message
>
>
> Cheers
> Dan
>
>
> On Wed, Feb 1, 2012 at 7:12 PM, Sanne Grinovero <sanne at infinispan.org> wrote:
>> On 1 February 2012 16:40, Bela Ban <bban at redhat.com> wrote:
>>> Your benchmark is giving me the creeps !
>>
>> Manik was the original author; I've only been adapting it slightly to
>> identify performance issues. I wouldn't have used Maven either, but
>> it's serving me well, especially since it turns out I have to
>> change the JGroups version frequently ;-)
>>
>>> First, which version of JGroups / Infinispan does this use ? Second, is
>>> there a way to start this in an IDE rather than through maven ? Third, I
>>> don't think bench-jgroups.xml is picked up at all ! How do I make a change
>>> to bench-jgroups.xml and have Transactional *use* it ?
>>
>> 1) The intention is to compare Infinispan versions and see how we
>> make progress, so the JGroups version is whatever each
>> Infinispan version was released with, or, in the case of a SNAPSHOT, the
>> version it currently depends on.
>>
>> So, assuming you have a fresh snapshot of Infinispan 5.1.0-SNAPSHOT,
>> that means JGroups 3.0.4.Final.
>>
>> 2) As I pointed out in the previous mail, I've reconfigured it for you
>> to not pick up bench-jgroups.xml but to go with Infinispan's default
>> UDP configuration.
>> You have to comment out line 15 of bench.sh to use bench-jgroups.xml,
>> but then I can't reproduce the issue anymore.
>>
>>> This maven crap hides so much, I never know what's going on under the covers
>>> !@$@ Do I have to do mvn install or mvn package when I make a change to
>>> bench-jgroups.xml ?
>>
>> Aren't you using bench.sh, as I pointed out in my
>> previous mail? It does mvn install before running it.
>> But it's also configured to NOT use bench-jgroups.xml, but rather
>> jgroups-udp.xml.
>>
>>> I don't think I'll ever switch to this f*cking piece of shit !
>>
>> I don't think Maven is to be blamed today! What's wrong?
>>
>> Anyway, one of the nice things about this little benchmark is that
>> it's a single class with a main method, so you can just import it into
>> your IDE and run it. Any IDE will pick up the correct dependencies, thanks to
>> Maven. Just be aware that if you do, it will use the default test properties
>> hardcoded in the test class org.infinispan.benchmark.Transactional:
>> please set the same properties as bench.sh does if you
>> want to run the same configuration I do.
>>
>> Just ping me if you have any doubts.
>>
>> Cheers,
>> Sanne
>>
>>>
>>> Please ping me as soon as you've calmed down... :-)
>>> Cheers,
>>>
>>>
>>>
>>>
>>> On 2/1/12 4:52 PM, Sanne Grinovero wrote:
>>>>
>>>> On 1 February 2012 15:18, Bela Ban <bban at redhat.com> wrote:
>>>>>
>>>>>
>>>>>
>>>>> On 2/1/12 10:25 AM, Dan Berindei wrote:
>>>>>
>>>>>>> That's not the way it works; at startup of F, it sends its IP address
>>>>>>> with the discovery request. Everybody returns its IP address with the
>>>>>>> discovery response, so even though we have F only talking to A (the
>>>>>>> coordinator) initially, F will also know the IP addresses of A, B, C, D
>>>>>>> and E.
>>>>>>>
>>>>>>
>>>>>> Ok, I stand corrected... since we start all the nodes on the same
>>>>>> thread, each of them should reply to the discovery request of the next
>>>>>> nodes.
>>>>>
>>>>>
>>>>>
>>>>> Hmm, can you reproduce this every time ? If so, can you send me the
>>>>> program so I can run it here ?
>>>>>
>>>>>
>>>>>> However, num_initial_members was set to 3 (the Infinispan default).
>>>>>> Could that make PING not wait for all the responses? If that's the
>>>>>> case, then I suggest we set a (much) higher num_initial_members and a
>>>>>> lower timeout in the default configuration.
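>>>>>> Something like this, purely as an illustration (the exact values
>>>>>> would need discussion):
>>>>>>
>>>>>>     <PING timeout="1000" num_initial_members="12"/>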
>>>>>
>>>>>
>>>>>
>>>>> Yes, the discovery could return quickly, but the responses would still be
>>>>> processed even if they were received later, so I don't think that's the issue.
>>>>>
>>>>> The initial discovery should discover *all* IP addresses; triggering a
>>>>> discovery later because an IP address wasn't found should always
>>>>> be the exceptional case !
>>>>>
>>>>> If you start members in turn, then they should easily form a cluster and
>>>>> not even merge. Here's what can happen on a merge:
>>>>> - The view is A|1={A,B}, both A and B have IP addresses for A and B
>>>>> - The view splits into A|2={A} and B|2={B}
>>>>> - A now marks B's IP address as removable and B marks A's IP address as
>>>>> removable
>>>>> - If the cache grows to over 500 entries
>>>>> (TP.logical_addr_cache_max_size) or TP.logical_addr_cache_expiration
>>>>> milliseconds elapse (whichever comes first), the entries marked as
>>>>> removable are removed
>>>>> - If, *before that*, the merge view A|3={A,B} is installed, A unmarks B
>>>>> and B unmarks A, so the entries won't get removed
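>>>>>
>>>>> For reference, those two knobs live on the transport; in an XML stack that
>>>>> looks something like this (shown with what I believe are the defaults):
>>>>>
>>>>>     <UDP ...
>>>>>          logical_addr_cache_max_size="500"
>>>>>          logical_addr_cache_expiration="120000"/>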
>>>>>
>>>>> So a hypothesis for how those IP addresses get removed could be that the
>>>>> cluster had a couple of merges that didn't heal for 2 minutes (?). Hard
>>>>> to believe, though...
>>>>>
>>>>> We have to get to the bottom of this, so it would be great if you had a
>>>>> program that reproduces this that I could run myself. The main
>>>>> question is why the IP address for the target is gone and/or why the IP
>>>>> address wasn't received in the first place.
>>>>>
>>>>> In any case, replacing MERGE2 with MERGE3 might help a bit, as MERGE3
>>>>> [1] periodically broadcasts each member's logical address, logical name and
>>>>> IP address:
>>>>> "An INFO message carries the logical name and physical address of a
>>>>> member. Compared to MERGE2, this allows us to immediately send messages
>>>>> to newly merged members, and not have to solicit this information first."
>>>>> (copied from the documentation)
>>>>>
>>>>>
>>>>>
>>>>>>>> Note that everything is blocked at this point; we
>>>>>>>> won't send another message in the entire cluster until we get the
>>>>>>>> physical address.
>>>>>
>>>>>
>>>>>
>>>>> Understood. Let me see if I can block sending of the message for a max
>>>>> time (say 2 seconds) until I get the IP address. Not very nice, and I
>>>>> prefer a different approach (plus we need to see why this happens in the
>>>>> first place anyway)...
>>>>>
>>>>>
>>>>>>> As I said, this is an exceptional case, probably caused by Sanne
>>>>>>> starting 12 channels inside the same JVM at the same time, therefore
>>>>>>> causing a traffic spike, which results in dropped discovery requests or
>>>>>>> responses.
>>>>>>>
>>>>>>
>>>>>> Bela, we create the caches on a single thread, so we never have more
>>>>>> than one node joining at the same time.
>>>>>> At most we could have some extra activity if one node can't join the
>>>>>> existing cluster and starts a separate partition, but hardly enough to
>>>>>> cause congestion.
>>>>>
>>>>>
>>>>>
>>>>> Hmm, that indeed doesn't sound like an issue...
>>>>>
>>>>>
>>>>>>> After that, when F wants to talk to C, it asks the cluster for C's IP
>>>>>>> address, and that should take a few ms at most.
>>>>>>>
>>>>>>
>>>>>> Ok, so when F wanted to send the ClusteredGetCommand request to C,
>>>>>> PING got the physical address right away. But the ClusteredGetCommand
>>>>>> had to wait for STABLE to kick in and for C to ask for retransmission
>>>>>> (because we didn't send any other messages).
>>>>>
>>>>>
>>>>>
>>>>> Yep. Before I implement some blocking until we have the IP address, or a
>>>>> timeout elapses, I'd like to try to get to the bottom of this problem
>>>>> first !
>>>>>
>>>>>
>>>>>> Maybe *we* should use RSVP for our ClusteredGetCommands, since those
>>>>>> can never block... Actually, we don't want to retransmit the request
>>>>>> if we already got a response from another node, so it would be best if
>>>>>> we could ask for retransmission of a particular request explicitly ;-)
>>>>>
>>>>>
>>>>>
>>>>> I'd rather implement the blocking approach above ! :-)
>>>>>
>>>>>
>>>>>> I wonder if we could also decrease desired_avg_gossip and
>>>>>> stability_delay in STABLE. After all, an extra STABLE round can't slow
>>>>>> us down when we're not doing anything, and when we are busy we're going
>>>>>> to hit the max_bytes limit much sooner than the desired_avg_gossip time
>>>>>> limit anyway.
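>>>>>> I'm thinking of something along these lines (numbers picked only for
>>>>>> the sake of the example):
>>>>>>
>>>>>>     <pbcast.STABLE desired_avg_gossip="20000" stability_delay="500"/>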
>>>>>
>>>>>
>>>>>
>>>>> I don't think this is a good idea as it will generate more traffic. The
>>>>> stable task is not skipped when we have a lot of traffic, so this will
>>>>> compound the issue.
>>>>>
>>>>>
>>>>>>>> I'm also not sure what to make of these lines:
>>>>>>>>
>>>>>>>>>>> [org.jgroups.protocols.UDP] sanne-55119: no physical address for
>>>>>>>>>>> sanne-53650, dropping message
>>>>>>>>>>> [org.jgroups.protocols.pbcast.GMS] JOIN(sanne-55119) sent to
>>>>>>>>>>> sanne-53650 timed out (after 3000 ms), retrying
>>>>>>>>
>>>>>>>>
>>>>>>>> It appears that sanne-55119 knows the logical name of sanne-53650, and
>>>>>>>> the fact that it's the coordinator, but not its physical address.
>>>>>>>> Shouldn't all of this information have arrived at the same time?
>>>>>>>
>>>>>>>
>>>>>>> Hmm, correct. However, the logical names are kept in (a static)
>>>>>>> UUID.cache and the IP addresses in TP.logical_addr_cache.
>>>>>>>
>>>>>>
>>>>>> Ah, so if we have 12 nodes in the same VM they automatically know each
>>>>>> other's logical name - they don't need PING at all!
>>>>>
>>>>>
>>>>>
>>>>> Yes. Note that logical names are not the problem; even if we evict some
>>>>> logical name from the cache (and we do this only for removed members),
>>>>> JGroups will still work as it only needs UUIDs and IP addresses.
>>>>>
>>>>>
>>>>>> Does the logical cache get cleared on channel stop? I think that would
>>>>>> explain another weird thing I was seeing in the test suite logs,
>>>>>> sometimes everyone in a cluster would suddenly forget everyone else's
>>>>>> logical name and start logging UUIDs.
>>>>>
>>>>>
>>>>>
>>>>> On a view change, we remove all entries which are *not* in the new view.
>>>>> However, 'removing' is again simply marking those members as
>>>>> 'removable', and only if the cache grows beyond 500
>>>>> (-Djgroups.uuid_cache.max_entries=500) entries will all entries older
>>>>> than 5 seconds (-Djgroups.uuid_cache.max_age=5000) be removed. (There is
>>>>> no separate reaper task running for this).
>>>>>
>>>>> So, yes, this can happen, but on the next discovery round, we'll have
>>>>> the correct values. Again, as I said, UUID.cache is not as important as
>>>>> TP.logical_addr_cache.
>>>>>
>>>>>
>>>>>> This is running the Transactional benchmark, so it would be simpler if
>>>>>> we enabled PING trace in the configuration and disabled it before the
>>>>>> actual benchmark starts. I'm going to try it myself :)
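>>>>>> Concretely, I mean temporarily adding something like this to
>>>>>> bench-log4j.xml and switching it back to a higher level once all the
>>>>>> nodes are up:
>>>>>>
>>>>>>     <logger name="org.jgroups.protocols.PING">
>>>>>>        <level value="TRACE"/>
>>>>>>     </logger>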
>>>>>
>>>>>
>>>>>
>>>>> How do you run 12 instances ? Did you change something in the config ?
>>>>> I'd be interested in trying the *exact* same config you're running, to
>>>>> see what's going on !
>>>>
>>>>
>>>> Pushed it for you, committing the exact configuration changes as well:
>>>>
>>>> git clone git://github.com/Sanne/InfinispanStartupBenchmark.git
>>>> cd InfinispanStartupBenchmark
>>>> git checkout my
>>>> sh bench.sh
>>>>
>>>> If you look into bench.sh
>>>> (as it is at
>>>> https://github.com/Sanne/InfinispanStartupBenchmark/blob/my/bench.sh
>>>> )
>>>> lines 9, 15 and 28 should be the most interesting.
>>>>
>>>> You need to run with Infinispan's default configuration to reproduce the
>>>> issue,
>>>> but I wouldn't mind you commenting on
>>>>
>>>> https://github.com/Sanne/InfinispanStartupBenchmark/blob/my/benchmark/src/main/resources/bench-jgroups.xml
>>>> as well. That's what I'm now using as default for further performance
>>>> tests.
>>>>
>>>> Cheers,
>>>> Sanne
>>>>
>>>>>
>>>>>
>>>>> [1] http://www.jgroups.org/manual-3.x/html/protlist.html#MERGE3
>>>>>
>>>>> --
>>>>> Bela Ban
>>>>> Lead JGroups (http://www.jgroups.org)
>>>>> JBoss / Red Hat
>>>>> _______________________________________________
>>>>> infinispan-dev mailing list
>>>>> infinispan-dev at lists.jboss.org
>>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>>
>>>
>>> --
>>> Bela Ban
>>> Lead JGroups (http://www.jgroups.org)
>>> JBoss / Red Hat
>>
>> _______________________________________________
>> infinispan-dev mailing list
>> infinispan-dev at lists.jboss.org
>> https://lists.jboss.org/mailman/listinfo/infinispan-dev

-- 
Bela Ban
Lead JGroups (http://www.jgroups.org)
JBoss / Red Hat

