A possible explanation for this:
Say we're not finding message 17 in [4 | 15 | 25]: 4 is the lowest
message we *garbage collected*, 15 the highest we *delivered* and 25 the
highest we *received* so far.
When different threads sent messages 15-18, they could have sent them
in the order 15 -> 18 -> 16 -> 17. (Messages are only ordered at the
receiver.)
If the retransmit task at the receiver kicked in *before* 17 was added
to the sender's retransmission table, then message #17 would not be
found there, and the sender would log this warning. A few microseconds
later, #17 would be added and a retransmission request for it would
succeed, although the receiver is *not* likely to ask for #17 again, as
it has probably received the message by then.
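To make the interleaving above concrete, here is a minimal, single-threaded sketch that replays it step by step. It is only a toy model: SenderTableSketch, add(), retransmit() and replay() are names made up for illustration and are not the JGroups classes or API. It assumes the table is just a map from seqno to message.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentSkipListMap;

/** Toy model of a sender-side retransmission table; NOT the JGroups class. */
final class SenderTableSketch {
    // seqno -> payload, for the messages the sender has added so far
    private final ConcurrentSkipListMap<Long, String> table = new ConcurrentSkipListMap<>();

    /** Called by whichever sender thread gets there first; the order is arbitrary. */
    void add(long seqno) {
        table.put(seqno, "msg-" + seqno);
    }

    /** Handles a retransmit request; false is the "not found" warning case. */
    boolean retransmit(long seqno) {
        return table.containsKey(seqno);
    }

    /** Replays the order 15 -> 18 -> 16 -> 17 with a request for 17 arriving early. */
    static List<Boolean> replay() {
        SenderTableSketch sender = new SenderTableSketch();
        List<Boolean> results = new ArrayList<>();
        sender.add(15);
        sender.add(18);
        sender.add(16);
        results.add(sender.retransmit(17)); // request arrives before 17 is added
        sender.add(17);
        results.add(sender.retransmit(17)); // the same request a moment later
        return results;
    }

    public static void main(String[] args) {
        System.out.println(replay()); // prints [false, true]
    }
}
```

The first lookup misses only because of timing, which is why the warning is harmless in this scenario.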
This is *not* incorrect, but I mitigated it in 3.3.x by dividing message
gaps at the receiver into 2 groups, old and new, somewhat like a
generational garbage collector: the most recent missing messages are not
requested in the first retransmission round, only once they have
'survived' one round. In other words, with 3.3.0.x, you should see far
fewer of these warnings!
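The old/new split can be sketched roughly as follows. This is an illustration of the idea only, not the code that went into 3.3: GenerationalGapsSketch and tick() are hypothetical names, and it assumes a gap becomes "old" after being seen missing on exactly one earlier retransmit tick.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Toy receiver-side gap tracker with two generations; hypothetical, not JGroups code. */
final class GenerationalGapsSketch {
    // gaps that were already missing on the previous retransmit tick
    private Set<Long> seenLastTick = new HashSet<>();

    /**
     * Runs one retransmit tick.
     * @param missingNow seqnos currently missing at the receiver
     * @return the seqnos to request: only gaps that were also missing on the previous tick
     */
    List<Long> tick(Set<Long> missingNow) {
        List<Long> toRequest = new ArrayList<>();
        for (long seqno : new TreeSet<>(missingNow)) {
            if (seenLastTick.contains(seqno)) {
                toRequest.add(seqno); // old gap: it survived one tick, so ask for it
            }
            // otherwise it is a new gap: give the sender one tick to catch up
        }
        seenLastTick = new HashSet<>(missingNow);
        return toRequest;
    }
}
```

With this sketch, a gap for #17 that shows up on one tick and is filled before the next one is never requested, so the sender never gets a chance to log the warning for it.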
On 4/9/13 5:00 PM, Alan Field wrote:
Hey Bela,
A couple of weeks ago, I was trying to run the client stress test comparison between JDG
and Coherence under JProfiler. These test runs were not successful, because one of the
nodes in the cluster would always crash. However, I was also seeing missing element log
messages from JGroups, like this:
(https://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/jdg-61-radargun-jdg-...)
16:48:09,363 WARN [org.jgroups.protocols.UNICAST2] (OOB-9,edg-perf01-16155)
edg-perf01-16155: (requester=edg-perf02-56801) message edg-perf02-56801::8723243 not found
in retransmission table of edg-perf02-56801:
[8722840 | 8722840 | 8723253] (411 elements, 2 missing)
16:48:12,327 WARN [org.jgroups.protocols.UNICAST2] (OOB-67,edg-perf01-16155)
edg-perf01-16155: (requester=edg-perf03-22539) message edg-perf03-22539::9484784 not found
in retransmission table of edg-perf03-22539:
[9484613 | 9484613 | 9484807] (193 elements, 1 missing)
16:48:12,794 WARN [org.jgroups.protocols.UNICAST2] (OOB-53,edg-perf01-16155)
edg-perf01-16155: (requester=edg-perf03-22539) message edg-perf03-22539::9484784 not found
in retransmission table of edg-perf03-22539:
[9484783 | 9484783 | 9484840] (56 elements, 1 missing)
I don't know if these messages have any relation to running the jobs with JProfiler,
but I wanted to ask you about them. In this job configuration, 4 nodes are used in the
cluster (edg-perf01 to edg-perf04), but JProfiler is only attached to the JVM on
edg-perf01.
Thanks,
Alan
--
Bela Ban, JGroups lead (http://www.jgroups.org)