[jboss-jira] [JBoss JIRA] Commented: (JGRP-1171) Address cache in TP protocol never removes inactive members, which causes enormous delays sending multicast messages using TCP

Wednesday, 31 March 2010

    [ https://jira.jboss.org/jira/browse/JGRP-1171?page=com.atlassian.jira.plug... ]

Bela Ban commented on JGRP-1171:
--------------------------------

See my comments on JGRP-1147 for more details.

...
 Address cache in TP protocol never removes inactive members, which
causes enormous delays sending multicast messages using TCP

-------------------------------------------------------------------------------------------------------------------------------

                 Key: JGRP-1171
                 URL: https://jira.jboss.org/jira/browse/JGRP-1171
             Project: JGroups
          Issue Type: Bug
    Affects Versions: 2.8, 2.9
            Reporter: Fedor Cherepanov
            Assignee: Bela Ban
             Fix For: 2.10

 org.jgroups.blocks.LazyRemovalCache, used in org.jgroups.protocols.TP, removes marked cache
items only when its size exceeds max_elements, which is set to 20 in TP.
 I'm using JGroups (tried 2.8 and 2.9) with jboss-cache 3.2.1 over the TCP protocol.
I tried to find out why, whenever a node leaves the cluster, replication time
increases by about a second (it is around 50 ms initially).
 Here's what I found:
 When a node leaves the cluster and the view changes:
 1. TP calls logical_addr_cache.retainAll(members);
 2. LazyRemovalCache.retainAll updates the map, setting the removable flag to true on those
members that are not in the view.
 3. LazyRemovalCache.checkMaxSizeExceeded NEVER removes them from the cache because
its size is always less than max_elements, which is 20.
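 The three steps above can be reproduced with a small stand-alone model. This is
only a sketch written for illustration, not the JGroups source: the class name
LazyCacheSketch, the add() and values() methods and the String keys are assumptions;
only retainAll, checkMaxSizeExceeded, the removable flag and the limit of 20 come
from the description.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified model of the behaviour described above; NOT the JGroups source.
class LazyCacheSketch<K, V> {
    private static final class Entry<V> {
        final V value;
        boolean removable; // set when the key is no longer in the view
        Entry(V value) { this.value = value; }
    }

    private final Map<K, Entry<V>> map = new LinkedHashMap<>();
    private final int maxElements;

    LazyCacheSketch(int maxElements) { this.maxElements = maxElements; }

    void add(K key, V value) {
        map.put(key, new Entry<>(value));
        checkMaxSizeExceeded();
    }

    // Marks entries whose keys are not in 'members'; it does not delete them.
    void retainAll(Collection<K> members) {
        for (Map.Entry<K, Entry<V>> e : map.entrySet()) {
            if (!members.contains(e.getKey())) {
                e.getValue().removable = true;
            }
        }
        checkMaxSizeExceeded();
    }

    // Purges marked entries only once the cache has outgrown maxElements.
    private void checkMaxSizeExceeded() {
        if (map.size() <= maxElements) {
            return; // small clusters always take this branch
        }
        map.values().removeIf(entry -> entry.removable);
    }

    // Returns every cached value, including those marked removable.
    List<V> values() {
        List<V> out = new ArrayList<>();
        for (Entry<V> e : map.values()) {
            out.add(e.value);
        }
        return out;
    }

    int size() { return map.size(); }
}
```

 In this model, a cache created with a limit of 20 that holds three members still
holds three entries after retainAll is called with only two of them, so values()
keeps returning the address of the departed member.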
 Afterwards, when a multicast message is sent:
 1. BasicTCP.sendMulticast calls TP.sendToAllPhysicalAddresses
 2. TP.sendToAllPhysicalAddresses iterates through all values in logical_addr_cache,
calling sendUnicast for each
 3. logical_addr_cache contains all the nodes, including those killed, and TP tries to connect
to each of them, which causes enormous delays
 As a result, replication time increases by one connection timeout for every node
removed from the cluster.
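 The linear growth can be illustrated with a back-of-the-envelope cost model. This
is a sketch under stated assumptions, not a measurement: the class
MulticastDelaySketch, the Target type and the figures of 25 ms per send and
1000 ms per connection timeout are all hypothetical and chosen only to show the
shape of the problem.

```java
import java.util.Arrays;
import java.util.List;

// Back-of-the-envelope cost model; all names and numbers here are hypothetical.
class MulticastDelaySketch {
    static final class Target {
        final String address;
        final boolean alive;
        Target(String address, boolean alive) {
            this.address = address;
            this.alive = alive;
        }
    }

    // Mimics a loop that sends one unicast per cached address:
    // a live target costs sendMillis, a dead one costs a full connect timeout.
    static long estimateMulticastMillis(List<Target> cached,
                                        long sendMillis,
                                        long connectTimeoutMillis) {
        long total = 0;
        for (Target t : cached) {
            total += t.alive ? sendMillis : connectTimeoutMillis;
        }
        return total;
    }

    public static void main(String[] args) {
        List<Target> cached = Arrays.asList(
                new Target("A", true), new Target("B", true),
                new Target("C", false), new Target("D", false));
        System.out.println(estimateMulticastMillis(cached, 25, 1000) + " ms");
    }
}
```

 Each stale entry left in the cache adds one full timeout to every multicast, so the
delay grows with the number of nodes that have left and never shrinks again.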
-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
https://jira.jboss.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
