[jboss-jira] [JBoss JIRA] (JGRP-2463) TransferQueueBundler: Message to stopped node blocks the bundler thread

Bela Ban (Jira) issues at jboss.org
Wed Apr 1 03:17:27 EDT 2020


    [ https://issues.redhat.com/browse/JGRP-2463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14016690#comment-14016690 ] 

Bela Ban commented on JGRP-2463:
--------------------------------

Hi [~dan.berindei]
Can this be reproduced?

Connecting to a port on which no process is listening is normally rejected right away (the sender gets a TCP RST back, or an ICMP error if a firewall rejects the packet), so the connection attempt fails immediately and this should not be an issue.
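
To check the fast-failure case locally, a minimal sketch along these lines could be used. It is plain {{java.net}} code, not JGroups code; the class and method names are made up for illustration, and the 2000 ms timeout only mirrors the {{TCP.sock_conn_timeout}} default mentioned in the issue. It assumes the loopback port it picks stays unused between the two calls.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

/** Hypothetical probe (not part of JGroups): how long does a connect to a closed port take? */
public class ClosedPortProbe {

    /** Picks a loopback port that was free a moment ago by binding to port 0 and closing again. */
    static int closedLoopbackPort() throws IOException {
        try (ServerSocket probe = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            return probe.getLocalPort();
        }
    }

    /**
     * Returns the milliseconds until the connect was refused, or -1 if something accepted it.
     * If nothing answers at all (packets dropped), connect() throws SocketTimeoutException
     * after timeoutMs instead, which is the blocking case described in the issue.
     */
    static long millisUntilRefused(InetSocketAddress target, int timeoutMs) throws IOException {
        long start = System.nanoTime();
        try (Socket sock = new Socket()) {
            sock.connect(target, timeoutMs);
            return -1;
        } catch (ConnectException refused) {
            return (System.nanoTime() - start) / 1_000_000;
        }
    }

    public static void main(String[] args) throws IOException {
        InetSocketAddress target =
            new InetSocketAddress(InetAddress.getLoopbackAddress(), closedLoopbackPort());
        System.out.println("refused after " + millisUntilRefused(target, 2000) + " ms");
    }
}
```

On a host that silently drops packets to closed ports, the same call would instead wait for the full timeout, which matches the scenario in the issue description.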

This scenario _can_ occur when:
* The receiver doesn't read messages off of the TCP socket -> the TCP send-window at the sender will become 0 and the sender will block
* The receiver is suspended ({{CTRL-Z}} or {{kill -SIGTSTP PID}}); this ends up being the same scenario as above

We _could_ experiment with a bundler that has 1 queue per destination (and 1 associated thread dequeuing), with {{RED}} dropping messages before/when the queue gets full. However, this is too complicated a change...
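
For discussion only, the per-destination idea could look roughly like the sketch below. It is not a JGroups class: {{PerDestinationBundler}} and its {{transport}} callback (standing in for {{TP.doSend()}}) are hypothetical, there is no batching, and it simply drops at the tail when a queue is full rather than doing the probabilistic early dropping that {{RED}} implies.

```java
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.BiConsumer;

/** Hypothetical sketch (not a JGroups class): one bounded queue plus one sender thread per destination. */
public class PerDestinationBundler<D, M> implements AutoCloseable {
    private final Map<D, BlockingQueue<M>> queues = new ConcurrentHashMap<>();
    private final Map<D, Thread> senders = new ConcurrentHashMap<>();
    private final AtomicLong dropped = new AtomicLong();
    private final int capacity;
    private final BiConsumer<D, M> transport; // stand-in for TP.doSend(); allowed to block

    public PerDestinationBundler(int capacity, BiConsumer<D, M> transport) {
        this.capacity = capacity;
        this.transport = transport;
    }

    /** Never blocks the caller; returns false and counts a drop if this destination's queue is full. */
    public boolean send(D dest, M msg) {
        boolean accepted = queues.computeIfAbsent(dest, this::startSender).offer(msg);
        if (!accepted)
            dropped.incrementAndGet();
        return accepted;
    }

    public long dropped() {
        return dropped.get();
    }

    /** Creates the queue for a new destination and starts the thread that drains it. */
    private BlockingQueue<M> startSender(D dest) {
        BlockingQueue<M> queue = new ArrayBlockingQueue<>(capacity);
        Thread sender = new Thread(() -> drain(dest, queue), "bundler-" + dest);
        sender.setDaemon(true);
        senders.put(dest, sender);
        sender.start();
        return queue;
    }

    private void drain(D dest, BlockingQueue<M> queue) {
        try {
            while (!Thread.currentThread().isInterrupted())
                transport.accept(dest, queue.take()); // a slow destination stalls only this thread
        } catch (InterruptedException stop) {
            Thread.currentThread().interrupt();
        }
    }

    /** Interrupts all sender threads; queued messages are discarded. */
    @Override
    public void close() {
        senders.values().forEach(Thread::interrupt);
    }
}
```

With this shape, a {{transport}} call that hangs for one destination only fills that destination's queue, and {{send()}} to other destinations keeps working. The cost is one thread and one queue per destination, plus the loss of cross-destination ordering and batching, which is part of why this is a bigger change than it looks.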

I think we should use {{TCP_NIO2}} for scenarios in which TCP writes can block. I guess I should move JGRP-2108 up... wdyt?
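
To illustrate why a non-blocking transport helps here, the sketch below uses plain {{java.nio}} (it is not how {{TCP_NIO2}} is implemented, and the class name and the 256 MB safety cap are made up). A peer accepts the connection but never reads; with a non-blocking channel, {{write()}} returns 0 once the buffers are full instead of parking the calling thread, as a blocking socket write would.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

/** Hypothetical demo (not JGroups code): non-blocking writes to a receiver that never reads. */
public class NonBlockingWriteDemo {
    private static final long SAFETY_CAP = 256L * 1024 * 1024; // stop even if buffers are huge

    /** Returns the number of bytes accepted before write() first returned 0 (or the cap was hit). */
    static long fillUntilWouldBlock() throws IOException {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            try (SocketChannel client = SocketChannel.open(server.getLocalAddress());
                 SocketChannel peer = server.accept()) { // accepted, but never read from
                client.configureBlocking(false);
                ByteBuffer chunk = ByteBuffer.allocate(64 * 1024);
                long total = 0;
                while (total < SAFETY_CAP) {
                    chunk.clear();
                    int written = client.write(chunk);
                    if (written == 0)
                        return total; // buffers full: the caller is told, not blocked
                    total += written;
                }
                return total;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("bytes accepted before the write would block: " + fillUntilWouldBlock());
    }
}
```

A sender built on this would still have to decide what to do with the unsent data (queue it, drop it, or wait for write readiness), but the bundler thread itself stays free to serve other destinations.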

> TransferQueueBundler: Message to stopped node blocks the bundler thread
> -----------------------------------------------------------------------
>
>                 Key: JGRP-2463
>                 URL: https://issues.redhat.com/browse/JGRP-2463
>             Project: JGroups
>          Issue Type: Bug
>    Affects Versions: 4.2.1
>            Reporter: Dan Berindei
>            Assignee: Bela Ban
>            Priority: Major
>             Fix For: 4.2.2, 5.0.0.Alpha4
>
>
> {{TransferQueueBundler}} sends all the messages from a single thread. When one of the {{TP.doSend()}} calls blocks, the bundler thread no longer makes any progress, and it doesn't send messages to any destination, even if {{TP.doSend()}} is only slow for one particular destination.
> One example is when sending a message to a stopped node, e.g. the coordinator sending a {{LEAVE_RSP}} after the leaver has already stopped. The bundler thread calls {{TP.doSend()}}, the connection no longer exists, so it ends up calling {{BaseServer.createConnection()}}. If the stopped node's machine is no longer up or it is configured to drop messages to closed ports, opening the connection blocks the bundler thread for {{TCP.sock_conn_timeout}} (default: 2s).
> {{UNICAST3}} also retransmits the highest sent message every {{UNICAST3.xmit_interval}} (default: 500ms), for {{UNICAST3.max_retransmit_time}} (default: 1 min), so the bundler thread will block more than once for the same message.
> I assume the bundler thread will also block if the transport is {{TCP}}, one of the destinations is overloaded, and the TCP connection's send buffer is full. Normally applications try to spread the workload evenly among members, but e.g. with RELAY2 not all the members will be site masters.


