GMS sometimes ignores view bundling timeout
-------------------------------------------
Key: JGRP-2028
URL:
https://issues.jboss.org/browse/JGRP-2028
Project: JGroups
Issue Type: Bug
Affects Versions: 3.6.8
Reporter: Dan Berindei
Assignee: Bela Ban
Priority: Minor
Fix For: 3.6.9, 4.0
{{GMS.ViewHandler.run()}} has this code:
{code}
wait_time=timeout - (System.nanoTime() - start_time);
if(wait_time > 0 && firstRequest.canBeProcessedTogether(firstRequest)) {
// JGRP-1438
long wait_time_ms=TimeUnit.MILLISECONDS.convert(wait_time,
TimeUnit.NANOSECONDS);
queue.waitUntilClosed(wait_time_ms); // misnomer: waits until element has been
added or q closed
}
{code}
The problem is {{wait_time_ms}} will be {{0}} if {{0 < wait_time < 1_000_000}}.
{{queue.waitUntilClosed(wait_time_ms)}} then calls {{Object.wait(0)}} and blocks forever.
Fortunately, the joiners re-sends the {{JOIN_REQ}} message after {{GMS.join_timeout}} (3s
by default), so all that happens is that the view is delayed by 3s. It does cause some
random failures in tests that expect the view to form in a precise amount of time,
though.
{noformat}
17:04:53,034 DEBUG (ForkThread-3,InitialClusterSizeTest:) [GMS] NodeF-49399: sending
JOIN(NodeF-49399) to NodeE-37644
17:04:53,034 TRACE (ForkThread-3,InitialClusterSizeTest:) [TCP_NIO2] NodeF-49399: sending
msg to NodeE-37644, src=NodeF-49399, headers are GMS: GmsHeader[JOIN_REQ]:
mbr=NodeF-49399, UNICAST3: DATA, seqno=1, first, TP: [cluster_name=ISPN]
17:04:53,035 DEBUG (ForkThread-4,InitialClusterSizeTest:) [GMS] NodeG-49307: sending
JOIN(NodeG-49307) to NodeE-37644
17:04:53,035 TRACE (ForkThread-4,InitialClusterSizeTest:) [TCP_NIO2] NodeG-49307: sending
msg to NodeE-37644, src=NodeG-49307, headers are GMS: GmsHeader[JOIN_REQ]:
mbr=NodeG-49307, UNICAST3: DATA, seqno=1, first, TP: [cluster_name=ISPN]
17:04:53,060 TRACE (INT-1,NodeE-37644:) [TCP_NIO2] NodeE-37644: received [dst:
NodeE-37644, src: NodeF-49399 (3 headers), size=0 bytes, flags=OOB|INTERNAL], headers are
GMS: GmsHeader[JOIN_REQ]: mbr=NodeF-49399, UNICAST3: DATA, seqno=1, first, TP:
[cluster_name=ISPN]
17:04:53,060 TRACE (INT-2,NodeE-37644:) [TCP_NIO2] NodeE-37644: received [dst:
NodeE-37644, src: NodeG-49307 (3 headers), size=0 bytes, flags=OOB|INTERNAL], headers are
GMS: GmsHeader[JOIN_REQ]: mbr=NodeG-49307, UNICAST3: DATA, seqno=1, first, TP:
[cluster_name=ISPN]
17:04:53,060 TRACE (INT-1,NodeE-37644:) [UNICAST3] NodeE-37644 <-- DATA(NodeF-49399:
#1, conn_id=0, first)
17:04:53,060 TRACE (INT-2,NodeE-37644:) [UNICAST3] NodeE-37644 <-- DATA(NodeG-49307:
#1, conn_id=0, first)
17:04:53,061 TRACE (INT-2,NodeE-37644:) [UNICAST3] NodeE-37644: delivering NodeG-49307#1
17:04:53,061 TRACE (INT-1,NodeE-37644:) [UNICAST3] NodeE-37644: delivering NodeF-49399#1
17:04:56,035 WARN (ForkThread-3,InitialClusterSizeTest:) [GMS] NodeF-49399:
JOIN(NodeF-49399) sent to NodeE-37644 timed out (after 3000 ms), on try 1
17:04:56,035 WARN (ForkThread-4,InitialClusterSizeTest:) [GMS] NodeG-49307:
JOIN(NodeG-49307) sent to NodeE-37644 timed out (after 3000 ms), on try 1
17:04:56,036 DEBUG (ForkThread-3,InitialClusterSizeTest:) [GMS] NodeF-49399: sending
JOIN(NodeF-49399) to NodeE-37644
17:04:56,036 TRACE (ForkThread-3,InitialClusterSizeTest:) [TCP_NIO2] NodeF-49399: sending
msg to NodeE-37644, src=NodeF-49399, headers are GMS: GmsHeader[JOIN_REQ]:
mbr=NodeF-49399, UNICAST3: DATA, seqno=2, TP: [cluster_name=ISPN]
17:04:56,036 DEBUG (ForkThread-4,InitialClusterSizeTest:) [GMS] NodeG-49307: sending
JOIN(NodeG-49307) to NodeE-37644
17:04:56,036 TRACE (ForkThread-4,InitialClusterSizeTest:) [TCP_NIO2] NodeG-49307: sending
msg to NodeE-37644, src=NodeG-49307, headers are GMS: GmsHeader[JOIN_REQ]:
mbr=NodeG-49307, UNICAST3: DATA, seqno=2, TP: [cluster_name=ISPN]
17:04:56,057 TRACE (INT-1,NodeE-37644:) [TCP_NIO2] NodeE-37644: received [dst:
NodeE-37644, src: NodeF-49399 (3 headers), size=0 bytes, flags=OOB|INTERNAL], headers are
GMS: GmsHeader[JOIN_REQ]: mbr=NodeF-49399, UNICAST3: DATA, seqno=2, TP:
[cluster_name=ISPN]
17:04:56,057 TRACE (INT-2,NodeE-37644:) [TCP_NIO2] NodeE-37644: received [dst:
NodeE-37644, src: NodeG-49307 (3 headers), size=0 bytes, flags=OOB|INTERNAL], headers are
GMS: GmsHeader[JOIN_REQ]: mbr=NodeG-49307, UNICAST3: DATA, seqno=2, TP:
[cluster_name=ISPN]
17:04:56,057 TRACE (INT-2,NodeE-37644:) [UNICAST3] NodeE-37644: delivering NodeG-49307#2
17:04:56,057 TRACE (INT-1,NodeE-37644:) [UNICAST3] NodeE-37644: delivering NodeF-49399#2
# finally
17:04:56,057 TRACE (ViewHandler,NodeE-37644:) [GMS] NodeE-37644: joiners=[NodeG-49307,
NodeF-49399], suspected=[], leaving=[], new view: [NodeE-37644|1] (3) [NodeE-37644,
NodeG-49307, NodeF-49399]
{noformat}