[jboss-jira] [JBoss JIRA] Commented: (JGRP-1073) merge_view does not correctly update logical_addr_cache

Mon Nov 9 05:07:05 EST 2009

    [ https://jira.jboss.org/jira/browse/JGRP-1073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12493734#action_12493734 ] 

Bela Ban commented on JGRP-1073:
--------------------------------

No, but we cannot remove the non-members of a given view from the logical address cache.

However, here's what I did to solve this (in BasicTCP):
- When sending a message and skip_suspected_mbrs is true, if there's an exception, we add the destination to the suspected_mbrs list
- When sending a message, and the dest is part of suspected_mbrs, we drop the message
- When receiving traffic from that member (e.g. through a new TCP connection), we remove the member from suspected_mbrs
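
The three rules above can be sketched roughly as follows. This is only an illustration of the bookkeeping, not the actual BasicTCP code: the class name SuspectedMemberFilter, the Sender interface, and the use of plain strings as addresses are all assumptions made for the example.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of the suspected-members bookkeeping; not JGroups code. */
class SuspectedMemberFilter {

    /** Stand-in for the transport's connect-and-write step (assumed for this sketch). */
    interface Sender {
        void send(String dest, byte[] payload) throws Exception;
    }

    private final boolean skipSuspectedMembers;
    private final Sender sender;
    // Destinations to which a send has failed and from which nothing has been received since.
    private final Set<String> suspectedMembers = ConcurrentHashMap.newKeySet();

    SuspectedMemberFilter(boolean skipSuspectedMembers, Sender sender) {
        this.skipSuspectedMembers = skipSuspectedMembers;
        this.sender = sender;
    }

    /** Returns true if the message was passed to the sender without error, false otherwise. */
    boolean send(String dest, byte[] payload) {
        // Rule 2: drop messages addressed to a suspected member.
        if (suspectedMembers.contains(dest)) {
            return false;
        }
        try {
            sender.send(dest, payload);
            return true;
        } catch (Exception e) {
            // Rule 1: on a send failure, remember the destination as suspected.
            if (skipSuspectedMembers) {
                suspectedMembers.add(dest);
            }
            return false;
        }
    }

    /** Rule 3: traffic from a member shows it is reachable again, so stop suspecting it. */
    void onReceive(String src) {
        suspectedMembers.remove(src);
    }

    boolean isSuspected(String member) {
        return suspectedMembers.contains(member);
    }
}
```

With this in place, a stale entry in the logical address cache costs one failed connection attempt rather than one per message; the entry itself stays in the cache.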

The test suite passes with these changes.

> merge_view does not correctly update logical_addr_cache
> -------------------------------------------------------
>
>                 Key: JGRP-1073
>                 URL: https://jira.jboss.org/jira/browse/JGRP-1073
>             Project: JGroups
>          Issue Type: Bug
>    Affects Versions: 2.8
>            Reporter: Nomane Nomane
>            Assignee: Bela Ban
>             Fix For: 2.8
>
>         Attachments: jgroup.log, tcp.xml
>
>
> I have a setup with a TCP JGroups cluster consisting of members A and B; within a very short period of time, member A is replaced by member C.
> A=10.62.2.68:7800(ACORES) B=10.62.3.108:7800(OULALI) and C=10.62.2.65:7800(jerome)
> There is a problem on C: the logical_addr_cache is not updated correctly, and member A is still in the list of addresses. Therefore, whenever I try to send a message, the TCP stack tries to contact A even though it is down.
> Analysis:
> I use TCPPING for TCP discovery.
> When C starts up, it contacts member B and retrieves the list of cluster members:
> >2009-10-15 17:53:36,489 - received GET_MBRS_RSP from OULALI-79: own_addr=ACORES-23748, view id=[OULALI-79|8], is_server=true, is_coord=false, 
> >logical_name=ACORES-23748, physical_addrs=10.62.2.68:7800
> >2009-10-15 17:53:36,489 - message is [dst: jerome-57964, src: OULALI-79 (2 headers), size=0 bytes, flags=OOB], headers are TCPPING: [PING: 
> >type=GET_MBRS_RSP, arg=own_addr=f5bca2fa-95d5-0791-e4c4-5628f9288962, view id=[OULALI-79|8], is_server=true, is_coord=false, 
> >logical_name=ACORES-33477, physical_addrs=10.62.2.68:7800], TCP: [channel_name=RelayDevCluster]
> Afterwards, member A is discarded and B updates the view so that it contains only B and C:
> >2009-10-15 17:53:40,689 - new_view=[OULALI-79|9] [OULALI-79, jerome-57964]
> >2009-10-15 17:53:40,689 - jerome-57964: view is [OULALI-79|9] [OULALI-79, jerome-57964]
> >2009-10-15 17:53:40,689 - VIEW_CHANGE received: [OULALI-79, jerome-57964]
> But member A is still contacted whenever I try to send a message:
> >2009-10-15 17:53:45,733 - failure sending message to 10.62.2.68:7800
> >java.lang.Exception: connection to 10.62.2.68:7800 could not be established
> I have attached the logs and the configuration file of node C.
