[
https://issues.redhat.com/browse/JGRP-2463?page=com.atlassian.jira.plugin...
]
Bela Ban commented on JGRP-2463:
--------------------------------
{quote}
I now have another theory: each TransferQueueBundler.run() iteration drains the entire
contents of the queue into remove_queue, then tries to send the messages one by one. If
there's an exception (e.g. java.net.ConnectException) sending any of those messages,
it's only caught at the end of the iteration, and the next iteration drops all the
unsent messages with remove_queue.clear().
{quote}
No, that's not true: {{sendBundledMessages()}} will never throw an exception, as
{{sendSingleMessage()}} and {{sendMessageList()}} catch (and log) all exceptions.
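
As an illustration of the catch-and-log pattern described above, here is a minimal sketch. It is not the JGroups source: {{CatchAndLogSketch}}, {{Sender}} and {{sendAll()}} are hypothetical names, and the messages are plain strings. It only shows that when each send is wrapped in its own try/catch, a failure for one message does not prevent the remaining messages in the batch from being sent.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not JGroups code: each send is wrapped in its own
// try/catch, so a failure for one message never escapes the loop.
class CatchAndLogSketch {
    // Assumed stand-in for a transport send that may fail.
    interface Sender {
        void send(String msg) throws Exception;
    }

    // Sends every message in the batch and returns those that succeeded.
    static List<String> sendAll(List<String> batch, Sender sender) {
        List<String> sent = new ArrayList<>();
        for (String msg : batch) {
            try {
                sender.send(msg);
                sent.add(msg);
            } catch (Exception e) {
                // Caught and logged per message; the loop continues.
                System.err.println("failed sending " + msg + ": " + e);
            }
        }
        return sent;
    }

    public static void main(String[] args) {
        Sender flaky = msg -> {
            if (msg.equals("B"))
                throw new java.net.ConnectException("connection refused");
        };
        System.out.println(sendAll(List.of("A", "B", "C"), flaky)); // prints [A, C]
    }
}
```
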
TransferQueueBundler: Message to stopped node blocks the bundler thread
-----------------------------------------------------------------------
Key: JGRP-2463
URL: https://issues.redhat.com/browse/JGRP-2463
Project: JGroups
Issue Type: Bug
Affects Versions: 4.2.1
Reporter: Dan Berindei
Assignee: Bela Ban
Priority: Major
Fix For: 4.2.2, 5.0.0.Alpha4
{{TransferQueueBundler}} sends all the messages from a single thread. When one of the
{{TP.doSend()}} calls blocks, the bundler thread no longer makes any progress, and it
doesn't send messages to any destination, even if {{TP.doSend()}} is only slow for one
particular destination.
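
The effect can be sketched with a small simulation. This is not JGroups code: {{HeadOfLineSketch}}, {{Msg}} and {{completionTimes()}} are hypothetical names, and the blocking times are assumed values chosen for illustration. It models one thread that sends queued messages strictly in order, so a single slow send delays every message queued behind it, whatever its destination.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simulation, not JGroups code: one thread sends queued
// messages strictly in order, so a slow send delays everything behind it.
class HeadOfLineSketch {
    // A queued message: its destination and how long the send blocks (ms).
    static final class Msg {
        final String dest;
        final long blockMs;
        Msg(String dest, long blockMs) { this.dest = dest; this.blockMs = blockMs; }
    }

    // Returns, per destination, the simulated time (ms) at which its send completes.
    static Map<String, Long> completionTimes(List<Msg> queue) {
        Map<String, Long> done = new LinkedHashMap<>();
        long clock = 0;
        for (Msg m : queue) {
            clock += m.blockMs;      // the single thread is busy for the whole send
            done.put(m.dest, clock);
        }
        return done;
    }

    public static void main(String[] args) {
        List<Msg> queue = List.of(
            new Msg("stopped-node", 2000), // assumed: blocks for a 2s connect timeout
            new Msg("healthy-1", 1),
            new Msg("healthy-2", 1));
        System.out.println(completionTimes(queue));
        // prints {stopped-node=2000, healthy-1=2001, healthy-2=2002}
    }
}
```

In this simulation the two healthy destinations need only 1 ms each, yet neither send completes until the slow one has finished.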
One example is when sending a message to a stopped node, e.g. the coordinator sending a
{{LEAVE_RSP}} after the leaver has already stopped. The bundler thread calls
{{TP.doSend()}}, the connection no longer exists, so it ends up calling
{{BaseServer.createConnection()}}. If the stopped node's machine is no longer up, or it
is configured to drop messages to closed ports, opening the connection blocks the bundler
thread for {{TCP.sock_conn_timeout}} (default: 2s).
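
The blocking behaviour of a connect with a timeout can be reproduced with plain {{java.net}} calls. The sketch below is not the JGroups connection code: {{ConnectTimeoutSketch}} and {{blockedMillis()}} are hypothetical names, and the address in {{main()}} is an assumption (192.0.2.1 is a documentation-only address, so the result depends on the local network). It measures how long the calling thread is held up by one connection attempt.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical sketch, not JGroups code: measures how long a caller is
// blocked by a single connection attempt with a connect timeout.
class ConnectTimeoutSketch {
    // Returns the time (ms) the calling thread spent in the attempt, whether
    // the connection succeeded, was refused, or timed out.
    static long blockedMillis(String host, int port, int timeoutMs) {
        long start = System.nanoTime();
        try (Socket sock = new Socket()) {
            // Blocks the caller until connected, failed, or timeoutMs elapsed.
            sock.connect(new InetSocketAddress(host, port), timeoutMs);
        } catch (IOException e) {
            // Includes SocketTimeoutException when the timeout expires.
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        // Assumed target and port; 2000 ms mirrors the default mentioned above.
        long ms = blockedMillis("192.0.2.1", 7800, 2000);
        System.out.println("caller was blocked for about " + ms + " ms");
    }
}
```

If the target silently drops packets, the call is expected to return only after roughly the full timeout; if the port is actively refused, it usually returns almost immediately.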
{{UNICAST3}} also retransmits the highest sent message every {{UNICAST3.xmit_interval}}
(default: 500ms), for {{UNICAST3.max_retransmit_time}} (default: 1 min), so the bundler
thread will block more than once for the same message.
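
A rough back-of-the-envelope estimate, using the defaults quoted above, gives a sense of scale. It rests on a pessimistic assumption that is not confirmed by the report: that every retransmission triggers its own connection attempt and blocks for the full connect timeout. {{RetransmitBlockingEstimate}} and its methods are hypothetical names.

```java
// Hypothetical estimate, not JGroups code. Assumes each retransmission
// blocks the bundler thread for the full connect timeout.
class RetransmitBlockingEstimate {
    // Number of retransmissions scheduled within the retransmit window.
    static long retransmissions(long maxRetransmitMs, long xmitIntervalMs) {
        return maxRetransmitMs / xmitIntervalMs;
    }

    // Upper bound on total blocked time if every attempt hits the timeout.
    static long worstCaseBlockedMs(long retransmissions, long connTimeoutMs) {
        return retransmissions * connTimeoutMs;
    }

    public static void main(String[] args) {
        long n = retransmissions(60_000, 500);          // 1 min / 500ms
        long blocked = worstCaseBlockedMs(n, 2_000);    // 2s per attempt
        System.out.println(n);        // prints 120
        System.out.println(blocked);  // prints 240000
    }
}
```

Under that assumption the retransmissions alone would demand more blocked time (240s) than the 1 min retransmit window contains, so the bundler thread could be blocked for essentially the whole window.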
I assume the bundler thread will also block if the transport is {{TCP}}, one of the
destinations is overloaded, and the TCP connection's send buffer is full. Normally
applications try to spread the workload evenly among members, but with {{RELAY2}}, for
example, not all members are site masters.