[infinispan-dev] again: "no physical address"
Bela Ban
bban at redhat.com
Wed Feb 1 02:48:27 EST 2012
On 1/31/12 10:55 PM, Dan Berindei wrote:
> Hi Bela
>
> I guess it's pretty clear now... In Sanne's thread dump the main
> thread is blocked in a cache.put() call after the cluster has
> supposedly already formed:
>
> "org.infinispan.benchmark.Transactional.main()" prio=10
> tid=0x00007ff4045de000 nid=0x7c92 in Object.wait()
> [0x00007ff40919d000]
> java.lang.Thread.State: TIMED_WAITING (on object monitor)
> at java.lang.Object.wait(Native Method)
> - waiting on <0x00000007f61997d0> (a
> org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher$FutureCollator)
> at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher$FutureCollator.getResponseList(CommandAwareRpcDispatcher.java:372)
> ...
> at org.infinispan.distribution.DistributionManagerImpl.retrieveFromRemoteSource(DistributionManagerImpl.java:169)
> ...
> at org.infinispan.CacheSupport.put(CacheSupport.java:52)
> at org.infinispan.benchmark.Transactional.start(Transactional.java:110)
> at org.infinispan.benchmark.Transactional.main(Transactional.java:70)
>
> State transfer was disabled, so during the cluster startup the nodes
> only had to communicate with the coordinator and not between them. The
> put command had to get the old value from another node, so it needed
> the physical address and had to block until PING would retrieve it.
That's not the way it works; at startup of F, it sends its IP address
with the discovery request. Everybody returns its IP address with the
discovery response, so even though we have F only talking to A (the
coordinator) initially, F will also know the IP addresses of A,B,C,D and E.
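A minimal sketch of that exchange, as a simplified stdlib-only model (not the real JGroups PING code): each discovery request piggybacks the joiner's physical address, and each response carries the responder's, so after one round F knows everyone's address and everyone knows F's, even though F only initiated contact through discovery.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative model only, not the actual JGroups Discovery/PING code:
// a discovery round in which requests and responses piggyback addresses.
public class DiscoverySketch {
    // per-node cache: logical name -> physical address
    static Map<String, Map<String, String>> caches = new HashMap<>();

    static void discover(String joiner, String joinerAddr, Map<String, String> members) {
        Map<String, String> joinerCache = caches.computeIfAbsent(joiner, k -> new HashMap<>());
        joinerCache.put(joiner, joinerAddr);
        for (Map.Entry<String, String> m : members.entrySet()) {
            // the request carries the joiner's address ...
            caches.computeIfAbsent(m.getKey(), k -> new HashMap<>()).put(joiner, joinerAddr);
            // ... and each response carries the responder's address
            joinerCache.put(m.getKey(), m.getValue());
        }
    }

    public static void main(String[] args) {
        Map<String, String> existing = new HashMap<>();
        existing.put("A", "192.168.1.1:7800"); // coordinator
        existing.put("B", "192.168.1.2:7800");
        discover("F", "192.168.1.6:7800", existing);
        // F now knows every member's physical address, and B knows F's
        System.out.println(caches.get("F"));
        System.out.println(caches.get("B").get("F"));
    }
}
```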
> Does PING use RSVP
No: (1) I don't want a dependency of Discovery on RSVP and (2) the
discovery is unreliable; discovery requests or responses can get dropped.
> or does it wait for the normal STABLE timeout for retransmission?
> Note that everything is blocked at this point, we
> won't send another message in the entire cluster until we got the physical address.
As I said, this is an exceptional case, probably caused by Sanne
starting 12 channels inside the same JVM, at the same time, therefore
causing a traffic spike, which results in dropped discovery requests or
responses.
After that, when F wants to talk to C, it asks the cluster for C's IP
address, and that should be a few ms at most.
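A hedged sketch of that fallback path (the names here, like whoHas, are illustrative stand-ins for the actual query the transport performs, not the JGroups API): when no physical address is cached for the destination, the message is dropped and an address query goes out; once a reply arrives and is cached, the retry succeeds.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Illustrative only: the "ask the cluster for C's IP address" fallback.
// whoHas() stands in for the address query that the real stack performs.
public class AddressLookupSketch {
    private final Map<String, String> physicalCache = new HashMap<>();
    private final Map<String, String> clusterDirectory; // what the other members know

    AddressLookupSketch(Map<String, String> clusterDirectory) {
        this.clusterDirectory = clusterDirectory;
    }

    // Returns true if the message could be sent
    boolean send(String dest, String msg) {
        String addr = physicalCache.get(dest);
        if (addr == null) {
            // no physical address: drop the message and query the cluster
            Optional<String> reply = whoHas(dest);
            reply.ifPresent(a -> physicalCache.put(dest, a));
            return false; // a later retry (or retransmission) succeeds
        }
        return true; // would hand msg to the socket here
    }

    private Optional<String> whoHas(String dest) {
        return Optional.ofNullable(clusterDirectory.get(dest));
    }
}
```

The first send attempt fails and triggers the lookup; the retry finds the cached address and goes through, which is why the delay is normally only a round trip.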
> I'm sure you've already considered it before, but why not make the
> physical addresses a part of the view installation message? This
> should ensure that every node can communicate with every other node by
> the time the view is installed.
There are a few reasons:
- I don't want to make GMS dependent on physical addresses. GMS is
completely independent and shouldn't know about physical addresses
- At the time GMS kicks in, it's already too late. Remember, F needs to
send a unicast JOIN request to A, but at this point it doesn't yet know
A's address
- MERGE{2,3} also use discovery to detect sub-partitions to be merged,
so discovery needs to be a separate piece of functionality
- A View is already big as it is; I've managed to reduce its size, but
adding physical addresses would blow it up even more, especially in
large clusters
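A rough back-of-the-envelope estimate of that growth, with assumed per-entry sizes (16 bytes for a UUID, 6 bytes for an IPv4 address plus port; these are illustrative, not measured JGroups wire sizes):

```java
// Back-of-the-envelope view-size estimate; the byte counts are
// assumptions for illustration, not measured JGroups wire sizes.
public class ViewSizeSketch {
    static long viewBytes(int members, boolean withPhysical) {
        int uuidBytes = 16;    // a logical address (UUID)
        int physicalBytes = 6; // assumed IPv4 address + port
        int perMember = uuidBytes + (withPhysical ? physicalBytes : 0);
        return (long) members * perMember;
    }

    public static void main(String[] args) {
        for (int n : new int[] {10, 1000}) {
            System.out.printf("%d members: %d bytes -> %d bytes with physical addresses%n",
                    n, viewBytes(n, false), viewBytes(n, true));
        }
    }
}
```

Under these assumptions a 1000-member view would grow from roughly 16 KB to 22 KB of member data alone, and views are sent to every member on each membership change.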
> I'm also not sure what to make of these lines:
>
>>>> [org.jgroups.protocols.UDP] sanne-55119: no physical address for
>>>> sanne-53650, dropping message
>>>> [org.jgroups.protocols.pbcast.GMS] JOIN(sanne-55119) sent to
>>>> sanne-53650 timed out (after 3000 ms), retrying
>
> It appears that sanne-55119 knows the logical name of sanne-53650, and
> the fact that it's coordinator, but not its physical address.
> Shouldn't all of this information have arrived at the same time?
Hmm, correct. However, the logical names are kept in (a static)
UUID.cache and the IP addresses in TP.logical_addr_cache.
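A simplified model of those two separate caches (UUID.cache and TP.logical_addr_cache are the real names mentioned above; the code is only a sketch of the idea, not the JGroups implementation): because the two maps are populated independently, a node can know a peer's logical name while still missing its physical address, which is exactly what the log shows.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Simplified model of the two independent caches: logical names
// (~ UUID.cache) and physical addresses (~ TP.logical_addr_cache).
public class TwoCachesSketch {
    static Map<UUID, String> logicalNames = new HashMap<>();
    static Map<UUID, String> physicalAddrs = new HashMap<>();

    public static void main(String[] args) {
        UUID coord = UUID.randomUUID();
        // the logical name arrived (e.g. with an earlier message) ...
        logicalNames.put(coord, "sanne-53650");
        // ... but the discovery response carrying the address was dropped,
        // so physicalAddrs has no entry and sending must fail
        boolean canSend = physicalAddrs.containsKey(coord);
        System.out.println(logicalNames.get(coord) + " known, sendable=" + canSend);
    }
}
```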
I suggest doing the following when this happens (can you reproduce this?):
- Before: set enable_diagnostics=true in UDP
- probe.sh op=UDP.printLogicalAddressCache // you can replace probe.sh
with java -jar jgroups.jar org.jgroups.tests.Probe
This dumps the logical caches, so you can see whether that information
is absent.
You could also enable tracing for PING:
probe.sh op=PING.setLevel["trace"]
--
Bela Ban
Lead JGroups (http://www.jgroups.org)
JBoss / Red Hat