[jboss-jira] [JBoss JIRA] (JGRP-2028) GMS sometimes ignores view bundling timeout

Bela Ban (JIRA) issues at jboss.org
Wed Mar 16 13:25:00 EDT 2016


     [ https://issues.jboss.org/browse/JGRP-2028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bela Ban updated JGRP-2028:
---------------------------
    Fix Version/s: 3.6.9
                   4.0


> GMS sometimes ignores view bundling timeout
> -------------------------------------------
>
>                 Key: JGRP-2028
>                 URL: https://issues.jboss.org/browse/JGRP-2028
>             Project: JGroups
>          Issue Type: Bug
>    Affects Versions: 3.6.8
>            Reporter: Dan Berindei
>            Assignee: Bela Ban
>            Priority: Minor
>             Fix For: 3.6.9, 4.0
>
>
> {{GMS.ViewHandler.run()}} has this code:
> {code}
>     wait_time=timeout - (System.nanoTime() - start_time);
>     if(wait_time > 0 && firstRequest.canBeProcessedTogether(firstRequest)) { // JGRP-1438
>         long wait_time_ms=TimeUnit.MILLISECONDS.convert(wait_time, TimeUnit.NANOSECONDS);
>         queue.waitUntilClosed(wait_time_ms); // misnomer: waits until element has been added or q closed
>     }
> {code}
> The problem is {{wait_time_ms}} will be {{0}} if {{0 < wait_time < 1_000_000}}. {{queue.waitUntilClosed(wait_time_ms)}} then calls {{Object.wait(0)}} and blocks forever.
> Fortunately, the joiners re-sends the {{JOIN_REQ}} message after {{GMS.join_timeout}} (3s by default), so all that happens is that the view is delayed by 3s. It does cause some random failures in tests that expect the view to form in a precise amount of time, though.
> {noformat}
> 17:04:53,034 DEBUG (ForkThread-3,InitialClusterSizeTest:) [GMS] NodeF-49399: sending JOIN(NodeF-49399) to NodeE-37644
> 17:04:53,034 TRACE (ForkThread-3,InitialClusterSizeTest:) [TCP_NIO2] NodeF-49399: sending msg to NodeE-37644, src=NodeF-49399, headers are GMS: GmsHeader[JOIN_REQ]: mbr=NodeF-49399, UNICAST3: DATA, seqno=1, first, TP: [cluster_name=ISPN]
> 17:04:53,035 DEBUG (ForkThread-4,InitialClusterSizeTest:) [GMS] NodeG-49307: sending JOIN(NodeG-49307) to NodeE-37644
> 17:04:53,035 TRACE (ForkThread-4,InitialClusterSizeTest:) [TCP_NIO2] NodeG-49307: sending msg to NodeE-37644, src=NodeG-49307, headers are GMS: GmsHeader[JOIN_REQ]: mbr=NodeG-49307, UNICAST3: DATA, seqno=1, first, TP: [cluster_name=ISPN]
> 17:04:53,060 TRACE (INT-1,NodeE-37644:) [TCP_NIO2] NodeE-37644: received [dst: NodeE-37644, src: NodeF-49399 (3 headers), size=0 bytes, flags=OOB|INTERNAL], headers are GMS: GmsHeader[JOIN_REQ]: mbr=NodeF-49399, UNICAST3: DATA, seqno=1, first, TP: [cluster_name=ISPN]
> 17:04:53,060 TRACE (INT-2,NodeE-37644:) [TCP_NIO2] NodeE-37644: received [dst: NodeE-37644, src: NodeG-49307 (3 headers), size=0 bytes, flags=OOB|INTERNAL], headers are GMS: GmsHeader[JOIN_REQ]: mbr=NodeG-49307, UNICAST3: DATA, seqno=1, first, TP: [cluster_name=ISPN]
> 17:04:53,060 TRACE (INT-1,NodeE-37644:) [UNICAST3] NodeE-37644 <-- DATA(NodeF-49399: #1, conn_id=0, first)
> 17:04:53,060 TRACE (INT-2,NodeE-37644:) [UNICAST3] NodeE-37644 <-- DATA(NodeG-49307: #1, conn_id=0, first)
> 17:04:53,061 TRACE (INT-2,NodeE-37644:) [UNICAST3] NodeE-37644: delivering NodeG-49307#1
> 17:04:53,061 TRACE (INT-1,NodeE-37644:) [UNICAST3] NodeE-37644: delivering NodeF-49399#1
> 17:04:56,035 WARN  (ForkThread-3,InitialClusterSizeTest:) [GMS] NodeF-49399: JOIN(NodeF-49399) sent to NodeE-37644 timed out (after 3000 ms), on try 1
> 17:04:56,035 WARN  (ForkThread-4,InitialClusterSizeTest:) [GMS] NodeG-49307: JOIN(NodeG-49307) sent to NodeE-37644 timed out (after 3000 ms), on try 1
> 17:04:56,036 DEBUG (ForkThread-3,InitialClusterSizeTest:) [GMS] NodeF-49399: sending JOIN(NodeF-49399) to NodeE-37644
> 17:04:56,036 TRACE (ForkThread-3,InitialClusterSizeTest:) [TCP_NIO2] NodeF-49399: sending msg to NodeE-37644, src=NodeF-49399, headers are GMS: GmsHeader[JOIN_REQ]: mbr=NodeF-49399, UNICAST3: DATA, seqno=2, TP: [cluster_name=ISPN]
> 17:04:56,036 DEBUG (ForkThread-4,InitialClusterSizeTest:) [GMS] NodeG-49307: sending JOIN(NodeG-49307) to NodeE-37644
> 17:04:56,036 TRACE (ForkThread-4,InitialClusterSizeTest:) [TCP_NIO2] NodeG-49307: sending msg to NodeE-37644, src=NodeG-49307, headers are GMS: GmsHeader[JOIN_REQ]: mbr=NodeG-49307, UNICAST3: DATA, seqno=2, TP: [cluster_name=ISPN]
> 17:04:56,057 TRACE (INT-1,NodeE-37644:) [TCP_NIO2] NodeE-37644: received [dst: NodeE-37644, src: NodeF-49399 (3 headers), size=0 bytes, flags=OOB|INTERNAL], headers are GMS: GmsHeader[JOIN_REQ]: mbr=NodeF-49399, UNICAST3: DATA, seqno=2, TP: [cluster_name=ISPN]
> 17:04:56,057 TRACE (INT-2,NodeE-37644:) [TCP_NIO2] NodeE-37644: received [dst: NodeE-37644, src: NodeG-49307 (3 headers), size=0 bytes, flags=OOB|INTERNAL], headers are GMS: GmsHeader[JOIN_REQ]: mbr=NodeG-49307, UNICAST3: DATA, seqno=2, TP: [cluster_name=ISPN]
> 17:04:56,057 TRACE (INT-2,NodeE-37644:) [UNICAST3] NodeE-37644: delivering NodeG-49307#2
> 17:04:56,057 TRACE (INT-1,NodeE-37644:) [UNICAST3] NodeE-37644: delivering NodeF-49399#2
> # finally
> 17:04:56,057 TRACE (ViewHandler,NodeE-37644:) [GMS] NodeE-37644: joiners=[NodeG-49307, NodeF-49399], suspected=[], leaving=[], new view: [NodeE-37644|1] (3) [NodeE-37644, NodeG-49307, NodeF-49399]
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.11#64026)


More information about the jboss-jira mailing list