This happens every now and then, when multiple nodes join at the same
time on the same host and PING has a small num_initial_mbrs.
Since 2.8, the identity of a member is not an IP address:port anymore,
but a UUID. The UUID has to be mapped to an IP address (and port), and
every member maintains a table of UUID-to-IP-address mappings. This
table is populated at startup, but the shipping of the UUID/IP address
association is unreliable (in the case of UDP), so packets do get
dropped when there are traffic spikes, like concurrent startup, or when
high CPU usage slows things down.
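
As a rough illustration (this is not the actual JGroups code, and all
names below are made up), you can think of that table as a concurrent
map from logical UUID to physical IP:port, something like:

import java.net.InetSocketAddress;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Illustrative sketch only: the logical-to-physical address cache kept by each member.
class PhysicalAddressCache {
    private final ConcurrentMap<UUID, InetSocketAddress> cache =
            new ConcurrentHashMap<UUID, InetSocketAddress>();

    // Entries are learned at startup and from discovery responses.
    void add(UUID logical, InetSocketAddress physical) {
        cache.putIfAbsent(logical, physical);
    }

    // Returns null when we have "no physical address" for that member yet.
    InetSocketAddress get(UUID logical) {
        return cache.get(logical);
    }
}
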
If we need to send a unicast message to P, and the table doesn't have a
mapping for P, PING multicasts a discovery request, and drops the
message. Every member responds with the IP address of P, which is then
added to the table. The next time the message is sent (through
retransmission), P's IP address will be available, and the unicast send
should succeed.
Of course, if the multicast or unicast response is dropped too, we'll
run this protocol again... and again... and again, until we finally
have a valid IP address for P.
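
In pseudo-Java, the send path described above looks roughly like this.
Again, this is only a sketch, not the real transport code;
multicastWhoHasRequest() and sendOverWire() are stand-ins for what PING
and UDP actually do:

import java.net.InetSocketAddress;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class UnicastSendSketch {
    private final ConcurrentMap<UUID, InetSocketAddress> cache =
            new ConcurrentHashMap<UUID, InetSocketAddress>();

    void sendUnicast(UUID dest, byte[] payload) {
        InetSocketAddress physical = cache.get(dest);
        if (physical == null) {
            // No mapping yet: ask the cluster for dest's address and drop this message.
            multicastWhoHasRequest(dest); // responses end up in the cache
            return;                       // retransmission will resend the message later
        }
        sendOverWire(physical, payload);
    }

    // Stand-ins for the real discovery request and datagram send.
    private void multicastWhoHasRequest(UUID dest) { /* ... */ }
    private void sendOverWire(InetSocketAddress physical, byte[] payload) { /* ... */ }
}
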
On 1/31/12 11:29 AM, Manik Surtani wrote:
I have sporadically seen this before when running some perf tests as
well … curious to know what's up.
On 30 Jan 2012, at 17:45, Sanne Grinovero wrote:
> Hi Bela,
> this is the same error we were having in Boston when preparing the
> Infinispan nodes for some of the demos. So I hadn't seen it for a long
> time, but today it returned, just in time to add a special twist to my
> performance tests.
>
> Dan,
> when this happened it looked like I had a deadlock: the benchmark was
> not making any more progress, and it looked like all the nodes were
> waiting for answers. JConsole didn't detect a deadlock, and
> unfortunately I don't have any more logs than this from either JGroups
> or Infinispan (since it was supposed to be a performance test!).
>
> I'm attaching a threaddump in case it interests you, but I hope not:
> this is a DIST test with 12 nodes (all in the same VM, from which this
> dump was taken). I didn't have time to inspect it myself as I have to
> run, and I think the interesting news here is the "no physical
> address" message.
>
> ideas?
>
> [org.jboss.logging] Logging Provider: org.jboss.logging.Log4jLoggerProvider
> [org.jgroups.protocols.UDP] sanne-55119: no physical address for
> sanne-53650, dropping message
> [org.jgroups.protocols.pbcast.GMS] JOIN(sanne-55119) sent to
> sanne-53650 timed out (after 3000 ms), retrying
> [org.jgroups.protocols.pbcast.GMS] sanne-55119 already present;
> returning existing view [sanne-53650|5] [sanne-53650, sanne-49978,
> sanne-27401, sanne-4741, sanne-29196, sanne-55119]
> [org.jgroups.protocols.UDP] sanne-39563: no physical address for
> sanne-53650, dropping message
> [org.jgroups.protocols.pbcast.GMS] JOIN(sanne-39563) sent to
> sanne-53650 timed out (after 3000 ms), retrying
> [org.jgroups.protocols.pbcast.GMS] sanne-39563 already present;
> returning existing view [sanne-53650|6] [sanne-53650, sanne-49978,
> sanne-27401, sanne-4741, sanne-29196, sanne-55119, sanne-39563]
> [org.jgroups.protocols.UDP] sanne-18071: no physical address for
> sanne-39563, dropping message
> [org.jgroups.protocols.UDP] sanne-18071: no physical address for
> sanne-55119, dropping message
> <threadDump.txt>
--
Bela Ban
Lead JGroups (http://www.jgroups.org)
JBoss / Red Hat