On 1/31/12 10:55 PM, Dan Berindei wrote:
Hi Bela
I guess it's pretty clear now... In Sanne's thread dump the main
thread is blocked in a cache.put() call after the cluster has
supposedly already formed:
"org.infinispan.benchmark.Transactional.main()" prio=10 tid=0x00007ff4045de000 nid=0x7c92 in Object.wait() [0x00007ff40919d000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x00000007f61997d0> (a org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher$FutureCollator)
        at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher$FutureCollator.getResponseList(CommandAwareRpcDispatcher.java:372)
        ...
        at org.infinispan.distribution.DistributionManagerImpl.retrieveFromRemoteSource(DistributionManagerImpl.java:169)
        ...
        at org.infinispan.CacheSupport.put(CacheSupport.java:52)
        at org.infinispan.benchmark.Transactional.start(Transactional.java:110)
        at org.infinispan.benchmark.Transactional.main(Transactional.java:70)
State transfer was disabled, so during the cluster startup the nodes
only had to communicate with the coordinator and not between them. The
put command had to get the old value from another node, so it needed
the physical address and had to block until PING would retrieve it.
That's not the way it works; at startup of F, it sends its IP address
with the discovery request. Everybody returns its IP address with the
discovery response, so even though we have F only talking to A (the
coordinator) initially, F will also know the IP addresses of A,B,C,D and E.
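
To make the exchange above concrete, here is a minimal sketch of it in plain Java. It is not JGroups code: the Node class, the discover() method and the addresses are made up for illustration, and it assumes that no discovery request or response is lost.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the discovery exchange; not JGroups code.
class DiscoverySketch {

    // One cluster member: a logical name, a physical address, and a local
    // cache that maps logical names to physical addresses.
    static final class Node {
        final String logicalName;
        final String physicalAddress;
        final Map<String, String> addressCache = new LinkedHashMap<>();

        Node(String logicalName, String physicalAddress) {
            this.logicalName = logicalName;
            this.physicalAddress = physicalAddress;
        }
    }

    // The joiner's request carries its own address. Every member that
    // receives it caches that address and answers with its own, which the
    // joiner caches in turn. Assumption: nothing is dropped on the way.
    static void discover(Node joiner, List<Node> members) {
        for (Node member : members) {
            member.addressCache.put(joiner.logicalName, joiner.physicalAddress);
            joiner.addressCache.put(member.logicalName, member.physicalAddress);
        }
    }

    public static void main(String[] args) {
        List<Node> members = List.of(
                new Node("A", "192.168.1.1:7800"),
                new Node("B", "192.168.1.2:7800"),
                new Node("C", "192.168.1.3:7800"),
                new Node("D", "192.168.1.4:7800"),
                new Node("E", "192.168.1.5:7800"));
        Node f = new Node("F", "192.168.1.6:7800");

        discover(f, members);

        // F only joins through A, but it now knows everybody's address.
        System.out.println(f.addressCache.keySet()); // prints [A, B, C, D, E]
    }
}
```

If some of the responses were dropped, F's cache would simply be missing those entries, which is the situation discussed below.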
Does PING use RSVP
No: (1) I don't want a dependency of Discovery on RSVP and (2) the
discovery is unreliable; discovery requests or responses can get dropped.
or does it wait for the normal STABLE timeout for retransmission?
Note that everything is blocked at this point; we won't send another
message in the entire cluster until we get the physical address.
As I said, this is an exceptional case, probably caused by Sanne
starting 12 channels inside the same JVM, at the same time, therefore
causing a traffic spike, which results in dropped discovery requests or
responses.
After that, when F wants to talk to C, it asks the cluster for C's IP
address, and that should be a few ms at most.
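
A rough sketch of that send path, again in plain Java and not taken from JGroups: the class name, the fields and the lookupReplyArrives flag are hypothetical. It assumes that a message without a known physical address is dropped (as the "dropping message" log line suggests) and that some upper layer retries the send later.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a transport's send path; not JGroups code.
class SendPathSketch {
    // Addresses this node already knows (logical name -> physical address).
    final Map<String, String> addressCache = new HashMap<>();
    // What the rest of the cluster would answer to an address request.
    final Map<String, String> clusterDirectory;
    final List<String> delivered = new ArrayList<>();
    int dropped = 0;

    SendPathSketch(Map<String, String> clusterDirectory) {
        this.clusterDirectory = clusterDirectory;
    }

    // Returns true if the message went out, false if it was dropped because
    // the destination's physical address was unknown. lookupReplyArrives
    // models the unreliable lookup: the request or its response may be lost.
    boolean send(String dest, String payload, boolean lookupReplyArrives) {
        String physical = addressCache.get(dest);
        if (physical == null) {
            dropped++; // "no physical address for ..., dropping message"
            if (lookupReplyArrives && clusterDirectory.containsKey(dest)) {
                addressCache.put(dest, clusterDirectory.get(dest));
            }
            return false;
        }
        delivered.add(dest + "@" + physical + ": " + payload);
        return true;
    }

    public static void main(String[] args) {
        SendPathSketch f = new SendPathSketch(Map.of("C", "192.168.1.3:7800"));
        System.out.println(f.send("C", "put", false)); // false: dropped, lookup lost
        System.out.println(f.send("C", "put", true));  // false: dropped, lookup answered
        System.out.println(f.send("C", "put", true));  // true: address is cached now
    }
}
```

In the normal case only the first send is dropped and the lookup is answered within a few ms. When the lookup itself is lost, every retry of the upper layer is dropped as well until one lookup gets through.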
I'm sure you've already considered it before, but why not make the
physical addresses a part of the view installation message? This
should ensure that every node can communicate with every other node by
the time the view is installed.
There are a few reasons:
- I don't want to make GMS dependent on physical addresses. GMS is
completely independent and shouldn't know about physical addresses
- At the time GMS kicks in, it's already too late. Remember, F needs to
send a unicast JOIN request to A, but at this point it doesn't yet know
A's address
- MERGE{2,3} also use discovery to detect sub-partitions to be merged,
so discovery needs to be a separate piece of functionality
- A View is already big as it is, and I've managed to reduce its size
even more, but adding physical addresses would blow up the size of View
even more, especially in large clusters
I'm also not sure what to make of these lines:
>>> [org.jgroups.protocols.UDP] sanne-55119: no physical address for
>>> sanne-53650, dropping message
>>> [org.jgroups.protocols.pbcast.GMS] JOIN(sanne-55119) sent to
>>> sanne-53650 timed out (after 3000 ms), retrying
It appears that sanne-55119 knows the logical name of sanne-53650, and
the fact that it's the coordinator, but not its physical address.
Shouldn't all of this information have arrived at the same time?
Hmm, correct. However, the logical names are kept in (a static)
UUID.cache and the IP addresses in TP.logical_addr_cache.
I suggest doing the following when this happens (can you reproduce this?):
- Before: set enable_diagnostics=true in UDP
- probe.sh op=UDP.printLogicalAddressCache // you can replace probe.sh
with java -jar jgroups.jar org.jgroups.tests.Probe
Here you can dump the logical caches, to see whether this information is
absent.
You could also enable tracing for PING:
probe.sh op=PING.setLevel["trace"]
--
Bela Ban
Lead JGroups (http://www.jgroups.org)
JBoss / Red Hat