[jboss-jira] [JBoss JIRA] Updated: (JGRP-1171) Address cache in TP protocol never removes inactive members, which causes enormous delays sending multicast messages using TCP
Bela Ban (JIRA)
jira-events at lists.jboss.org
Mon Mar 15 08:28:37 EDT 2010
[ https://jira.jboss.org/jira/browse/JGRP-1171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Bela Ban updated JGRP-1171:
---------------------------
Fix Version/s: 2.10
> Address cache in TP protocol never removes inactive members, which causes enormous delays sending multicast messages using TCP
> -------------------------------------------------------------------------------------------------------------------------------
>
> Key: JGRP-1171
> URL: https://jira.jboss.org/jira/browse/JGRP-1171
> Project: JGroups
> Issue Type: Bug
> Affects Versions: 2.8, 2.9
> Reporter: Fedor Cherepanov
> Assignee: Bela Ban
> Fix For: 2.10
>
>
> org.jgroups.blocks.LazyRemovalCache, used in org.jgroups.protocols.TP, removes marked cache items only when its size exceeds max_elements, which is set to 20 in TP.
> I'm using JGroups (tried 2.8 and 2.9) with jboss-cache 3.2.1 over the TCP protocol. I investigated why replication time, initially around 50 ms, increases by a second whenever a node leaves the cluster.
> Here's what I found:
> When a node leaves the cluster and the view changes:
> 1. TP calls logical_addr_cache.retainAll(members);
> 2. LazyRemovalCache.retainAll updates the map, setting removable flag to true on those members that are not in the view.
> 3. LazyRemovalCache.checkMaxSizeExceeded NEVER removes them from the cache because its size is always less than max_elements, which is 20.
> Then, when a multicast message is sent:
> 1. BasicTCP.sendMulticast calls TP.sendToAllPhysicalAddresses
> 2. TP.sendToAllPhysicalAddresses iterates through all values in logical_addr_cache, calling sendUnicast for each
> 3. logical_addr_cache still contains all the nodes, including those killed, so TP tries to connect to each of them, which causes enormous delays
> This causes replication time to increase by one connection timeout for every node removed from the cluster.
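
The behaviour described above can be reproduced with a small stand-alone model. The sketch below is not the JGroups implementation: LazyCacheSketch, LazyCache and targetsAfterViewChange are hypothetical names, and members are plain strings rather than addresses. It only assumes what the report states, namely that retainAll marks absent members as removable and that marked entries are purged only once the cache holds more than max_elements entries.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class LazyCacheSketch {

    /** Simplified model of a lazy-removal cache (hypothetical, not the JGroups class). */
    static class LazyCache {
        // member -> "removable" flag; insertion order kept for readable output
        private final Map<String, Boolean> removable = new LinkedHashMap<>();
        private final int maxElements;

        LazyCache(int maxElements) {
            this.maxElements = maxElements;
        }

        void add(String member) {
            removable.put(member, false);
            checkMaxSizeExceeded();
        }

        /** Marks members that are not in the view; does not remove them directly. */
        void retainAll(Set<String> view) {
            removable.replaceAll((member, marked) -> marked || !view.contains(member));
            checkMaxSizeExceeded();
        }

        /** Purges marked entries only when the cache has grown past maxElements. */
        private void checkMaxSizeExceeded() {
            if (removable.size() > maxElements) {
                removable.values().removeIf(marked -> marked);
            }
        }

        /** All cached members, including those merely marked as removable. */
        List<String> values() {
            return new ArrayList<>(removable.keySet());
        }
    }

    /** Members a send-to-all loop would still visit after the view shrinks. */
    static List<String> targetsAfterViewChange(int maxElements,
                                               List<String> initial,
                                               Set<String> view) {
        LazyCache cache = new LazyCache(maxElements);
        initial.forEach(cache::add);
        cache.retainAll(view);
        return cache.values();
    }

    public static void main(String[] args) {
        List<String> initial = List.of("A", "B", "C");
        Set<String> view = Set.of("A", "B"); // node C has left the cluster

        // Limit of 20 as in the report: C is marked but never purged.
        System.out.println(targetsAfterViewChange(20, initial, view)); // [A, B, C]

        // A limit below the cache size triggers the purge.
        System.out.println(targetsAfterViewChange(2, initial, view));  // [A, B]
    }
}
```

With the limit of 20 from the report, the departed member C is still returned by values(), so a loop that sends to every cached entry would keep trying to reach it. Only when the limit is smaller than the cache size does the purge run.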
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: https://jira.jboss.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira