[jboss-jira] [JBoss JIRA] Commented: (JGRP-1073) merge_view does not correctly update logical_addr_cache

Nomane Nomane (JIRA) jira-events at lists.jboss.org
Mon Nov 9 05:24:06 EST 2009


    [ https://jira.jboss.org/jira/browse/JGRP-1073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12493740#action_12493740 ] 

Nomane Nomane commented on JGRP-1073:
-------------------------------------

I have checked the BasicTCP class (function sendUnicast): you handle suspected members by catching the exception and keeping a suspect list, so that no further messages are sent to them.

However, regarding the cache: even though the suspect list contains elements, we continue (in this ticket's use case) to try to connect to member A until the end of the program.

You said that "It will take 120 seconds for A to get removed, but only if there's another add() or remove()". Do you mean that the stale member in the cache will be removed only when a new view is sent?
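
To make sure I understand the quoted sentence, here is a minimal sketch of a cache with that kind of lazy removal. This is not the JGroups code: the class name LazyExpiryCache, its methods, and the injectable clock are all hypothetical, and it only models the behaviour as I read it. remove() merely marks an entry, and a marked entry is physically dropped only when a later add() or remove() happens after the maximum age (120 seconds in the quote) has passed.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

/** Hypothetical illustration only; NOT the JGroups implementation. */
public class LazyExpiryCache<K, V> {

    private static final class Entry<V> {
        final V value;
        volatile long removedAt = -1; // -1 means "not marked for removal"
        Entry(V value) { this.value = value; }
    }

    private final Map<K, Entry<V>> map = new ConcurrentHashMap<>();
    private final long maxAgeMillis;  // e.g. 120_000 for 120 seconds
    private final LongSupplier clock; // injectable time source (assumption, eases testing)

    public LazyExpiryCache(long maxAgeMillis, LongSupplier clock) {
        this.maxAgeMillis = maxAgeMillis;
        this.clock = clock;
    }

    /** Adds (or replaces) an entry, then reaps expired marked entries. */
    public void add(K key, V value) {
        map.put(key, new Entry<>(value));
        reap();
    }

    /** Only marks the entry as removed, then reaps expired marked entries. */
    public void remove(K key) {
        Entry<V> e = map.get(key);
        if (e != null && e.removedAt < 0) {
            e.removedAt = clock.getAsLong();
        }
        reap();
    }

    /** Lookups never reap, so a marked entry stays visible until the next add()/remove(). */
    public V get(K key) {
        Entry<V> e = map.get(key);
        return e == null ? null : e.value;
    }

    public boolean contains(K key) {
        return map.containsKey(key);
    }

    /** Drops entries that were marked at least maxAgeMillis ago. */
    private void reap() {
        long now = clock.getAsLong();
        map.values().removeIf(e -> e.removedAt >= 0 && now - e.removedAt >= maxAgeMillis);
    }
}
```

In this model, if A is marked as removed and no further add() or remove() ever occurs, get(A) keeps returning A's physical address indefinitely, which would match what I observe on node C.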

> merge_view does not correctly update logical_addr_cache
> -------------------------------------------------------
>
>                 Key: JGRP-1073
>                 URL: https://jira.jboss.org/jira/browse/JGRP-1073
>             Project: JGroups
>          Issue Type: Bug
>    Affects Versions: 2.8
>            Reporter: Nomane Nomane
>            Assignee: Bela Ban
>             Fix For: 2.8
>
>         Attachments: jgroup.log, tcp.xml
>
>
> I have a setup with a TCP JGroups cluster consisting of members A and B; within a very short period of time, member A is replaced by member C.
> A=10.62.2.68:7800(ACORES) B=10.62.3.108:7800(OULALI) and C=10.62.2.65:7800(jerome)
> There is a problem on C: the logical_addr_cache is not updated correctly and member A is still in the list of addresses. Therefore, whenever I try to send a message, the TCP stack tries to contact A even though it is down.
> Analysis:
> I use TCPPING for TCP discovery.
> When C starts up, it contacts member B and retrieves the list of cluster members:
> >2009-10-15 17:53:36,489 - received GET_MBRS_RSP from OULALI-79: own_addr=ACORES-23748, view id=[OULALI-79|8], is_server=true, is_coord=false, 
> >logical_name=ACORES-23748, physical_addrs=10.62.2.68:7800
> >2009-10-15 17:53:36,489 - message is [dst: jerome-57964, src: OULALI-79 (2 headers), size=0 bytes, flags=OOB], headers are TCPPING: [PING: 
> >type=GET_MBRS_RSP, arg=own_addr=f5bca2fa-95d5-0791-e4c4-5628f9288962, view id=[OULALI-79|8], is_server=true, is_coord=false, 
> >logical_name=ACORES-33477, physical_addrs=10.62.2.68:7800], TCP: [channel_name=RelayDevCluster]
> After this, member A is discarded and B installs an updated view, which C receives:
> >2009-10-15 17:53:40,689 - new_view=[OULALI-79|9] [OULALI-79, jerome-57964]
> >2009-10-15 17:53:40,689 - jerome-57964: view is [OULALI-79|9] [OULALI-79, jerome-57964]
> >2009-10-15 17:53:40,689 - VIEW_CHANGE received: [OULALI-79, jerome-57964]
> But member A is still contacted whenever I try to send a message:
> >2009-10-15 17:53:45,733 - failure sending message to 10.62.2.68:7800
> >java.lang.Exception: connection to 10.62.2.68:7800 could not be established
> I have attached the logs and the configuration file of node C.
