[jboss-jira] [JBoss JIRA] Commented: (JGRP-550) When a new member joins a group and requests the cache from the coordinator, it always fails first time

Tue Jul 10 07:57:31 EDT 2007

    [ http://jira.jboss.com/jira/browse/JGRP-550?page=comments#action_12368354 ] 

Bela Ban commented on JGRP-550:
-------------------------------

Your logs indicate that "localhost" (in TCPPING) might resolve to 127.0.0.1, which would prevent the members from finding each other.

My startup params were:
 java -Dlog4j.configuration=file:c:\log4j.properties -Djgroups.bind_addr=192.168.5.2 jgroup.example.SimpleExample one
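To check whether a host name such as "localhost" resolves to a loopback address on a given machine, something like the following could be used. This is only a minimal sketch using the standard java.net API; the class name LocalhostCheck and the method resolvesToLoopback are made up for illustration and are not part of JGroups or of the attached example.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper, not part of JGroups or of example.zip.
public class LocalhostCheck {

    // Returns true if the host name (or literal IP address) resolves to a
    // loopback address such as 127.0.0.1. Returns false if it resolves to
    // a non-loopback address or cannot be resolved at all.
    static boolean resolvesToLoopback(String host) {
        try {
            return InetAddress.getByName(host).isLoopbackAddress();
        } catch (UnknownHostException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults to "localhost"; pass another host name as the first argument.
        String host = args.length > 0 ? args[0] : "localhost";
        if (resolvesToLoopback(host)) {
            System.out.println(host + " resolves to a loopback address");
        } else {
            System.out.println(host + " does not resolve to a loopback address (or could not be resolved)");
        }
    }
}
```

If the check reports a loopback address for the host name used in TCPPING, using the machine's real IP address there (together with -Djgroups.bind_addr as above) may be worth trying.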

> When a new member joins a group and requests the cache from the coordinator, it always fails first time
> -------------------------------------------------------------------------------------------------------
>
>                 Key: JGRP-550
>                 URL: http://jira.jboss.com/jira/browse/JGRP-550
>             Project: JGroups
>          Issue Type: Bug
>    Affects Versions: 2.5
>         Environment: Example code was run on Windows
>            Reporter: Dipak Kothari
>         Assigned To: Bela Ban
>             Fix For: 2.5
>
>         Attachments: example.zip
>
>
> Service A starts - becomes coordinator.  After it has started properly, Service B is started.  As part of the Join, it
> 1) Gets the members using TCPPING
> 2) Determines the coordinator
> 3) Joins
> 4) Applies the view change via installView.  This re-adjusts the members and closes connections to any peers that are no longer members (so the connection to Service A is removed).
> 5) Requests the cache from the coordinator.  In response, Service A tries to send the cache but fails because the peer connection has been closed.  It tries twice and then removes the connection.  Service B times out and tries again, and this time it succeeds.  This happens every time.  I don't think this should happen: the coordinator should return the cache, since it knows where it needs to be sent.
> I have added additional trace statements (these start with APM:) to show the flow as I understand it.  I have deliberately set get_cache_timeout to a high number to highlight this.  I have also provided the source and protocol properties in the zip for convenience.  There are 2 logs from the run I carried out: cord.log is the coordinator's log and cord1.log is the second service's log.
> Please let me know if there is a workaround or a fix I can apply.  If I have misconfigured the properties, please advise how to rectify it.
> To run the example, run the bat script passing in the service name.  Note, the service name needs to be unique as the log name is based on this.
