Bela Ban commented on JGRP-1171:
--------------------------------
See my comments on JGRP-1147 for more details.
Address cache in TP protocol never removes inactive members, which
causes enormous delays sending multicast messages using TCP
-------------------------------------------------------------------------------------------------------------------------------
Key: JGRP-1171
URL:
https://jira.jboss.org/jira/browse/JGRP-1171
Project: JGroups
Issue Type: Bug
Affects Versions: 2.8, 2.9
Reporter: Fedor Cherepanov
Assignee: Bela Ban
Fix For: 2.10
org.jgroups.blocks.LazyRemovalCache, used in org.jgroups.protocols.TP, removes marked cache
items only when its size exceeds max_elements, which is set to 20 in TP.
I'm using JGroups (tried 2.8 and 2.9) with JBoss Cache 3.2.1 over the TCP protocol.
I investigated why replication time increases by about a second (from around 50ms
initially) whenever a node leaves the cluster.
Here's what I found:
When a node leaves the cluster and the view changes:
1. TP calls logical_addr_cache.retainAll(members);
2. LazyRemovalCache.retainAll updates the map, setting the removable flag to true on those
members that are not in the view.
3. LazyRemovalCache.checkMaxSizeExceeded NEVER removes them from the cache because
its size is always less than max_elements, which is 20.
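The three steps above can be modelled with a small sketch. This is not the actual org.jgroups.blocks.LazyRemovalCache source; the class name LazyCacheSketch, the Slot holder, and the method bodies are assumptions written only to reproduce the behaviour described in this report.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of the lazy-removal behaviour described above.
// It is NOT the real JGroups implementation.
class LazyCacheSketch<K, V> {
    private static final class Slot<V> {
        final V value;
        boolean removable; // set when the key is no longer in the view
        Slot(V value) { this.value = value; }
    }

    private final Map<K, Slot<V>> map = new HashMap<>();
    private final int maxElements;

    LazyCacheSketch(int maxElements) { this.maxElements = maxElements; }

    void add(K key, V value) {
        map.put(key, new Slot<>(value));
        checkMaxSizeExceeded();
    }

    V get(K key) {
        Slot<V> slot = map.get(key);
        return slot == null ? null : slot.value;
    }

    // Step 2: members outside the new view are only flagged, not removed.
    void retainAll(Collection<K> members) {
        for (Map.Entry<K, Slot<V>> e : map.entrySet()) {
            if (!members.contains(e.getKey())) {
                e.getValue().removable = true;
            }
        }
        checkMaxSizeExceeded();
    }

    // Step 3: flagged entries are purged only when the map outgrows maxElements.
    private void checkMaxSizeExceeded() {
        if (map.size() > maxElements) {
            map.values().removeIf(slot -> slot.removable);
        }
    }

    int size() { return map.size(); }

    boolean contains(K key) { return map.containsKey(key); }
}
```

With maxElements set to 20 and a three-node cluster, a departed member is flagged but stays in the map, because the size check never passes. With a limit smaller than the cluster size, the same retainAll call would purge it.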
Then, when a multicast message is sent:
1. BasicTCP.sendMulticast calls TP.sendToAllPhysicalAddresses
2. TP.sendToAllPhysicalAddresses iterates through all values in logical_addr_cache
calling sendUnicast for each
3. logical_addr_cache contains all the nodes, including those killed, and TP tries to connect
to each of them, which causes enormous delays
As a result, replication time increases by one connection timeout for every node
removed from the cluster.
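The cumulative effect can be estimated with a rough sketch. The class MulticastDelaySketch and its method are hypothetical and do no real networking; the 1000 ms connect timeout is an assumption chosen to match the "about a second" observed per departed node, and the 50 ms baseline is the figure from this report.

```java
import java.util.Map;

// Hypothetical cost model, not JGroups code: a multicast over TCP is sent as
// one unicast per cached address, and each stale address costs one connect timeout.
class MulticastDelaySketch {
    static long estimateSendMillis(Map<String, Boolean> cachedMembers,
                                   long baseMillis,
                                   long connectTimeoutMillis) {
        long total = baseMillis; // time to reach the live members
        for (boolean alive : cachedMembers.values()) {
            if (!alive) {
                total += connectTimeoutMillis; // blocked connecting to a dead node
            }
        }
        return total;
    }
}
```

Under these assumptions, a cache holding three live and two dead members gives 50 + 2 * 1000 = 2050 ms per multicast, and the figure keeps growing with every further node that leaves.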