[jboss-jira] [JBoss JIRA] Commented: (JGRP-844) Discovery: make it a singleton with a shared transport

Thu Oct 23 09:43:20 EDT 2008

    [ https://jira.jboss.org/jira/browse/JGRP-844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12435113#action_12435113 ] 

Brian Stansberry commented on JGRP-844:
---------------------------------------

AIUI, this assumes a homogeneous deployment of channels on top of the shared discovery, which may not be true in general and definitely isn't true during transient periods such as startup. Scenario:

Two nodes, A and B, with two services 1 and 2 that open channels on top of a shared discovery. A and B are starting/deploying services concurrently. Order of start is:

1) A1
2) B1
3) B2
4) A2

The problem, I think, is in step 3: B2's GMS will think A is the coordinator and try to send JOIN msgs to A2, which doesn't exist yet.

That specific scenario could be recoverable, i.e. A2 will eventually start, at which point B2's JOIN retries will succeed. But if service 2 weren't deployed on A at all, there would be no step 4. In that case I'd think there'd need to be some mechanism by which GMS could eventually force a real discovery, bypassing the cached data.
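
To make the scenario concrete, here is a minimal sketch of the fallback I have in mind. It is a toy model, not JGroups code: the class SharedDiscoverySketch, its methods and the maxCachedAttempts parameter are all hypothetical names made up for illustration, and the "cache" only models what node B's shared discovery would hold.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

// Toy model of the scenario; none of these names are JGroups classes or APIs.
class SharedDiscoverySketch {
    // service id -> nodes that currently have a channel open for that service
    private final Map<Integer, TreeSet<String>> channels = new HashMap<>();
    // coordinator node as cached by the (hypothetical) shared discovery
    private String cachedCoordinator;

    void start(String node, int service) {
        channels.computeIfAbsent(service, s -> new TreeSet<>()).add(node);
    }

    void cacheCoordinator(String node) {
        cachedCoordinator = node;
    }

    boolean isRunning(String node, int service) {
        TreeSet<String> nodes = channels.get(service);
        return nodes != null && nodes.contains(node);
    }

    // "Real" discovery for one service: pick the first node that actually
    // runs it. join() calls this before the joiner has registered itself,
    // so an empty result means the joiner becomes coordinator.
    String discover(String self, int service) {
        TreeSet<String> nodes = channels.get(service);
        return (nodes == null || nodes.isEmpty()) ? self : nodes.first();
    }

    // Try the cached coordinator a bounded number of times, then bypass the
    // cache and run a real discovery. Returns the coordinator that was used.
    String join(String self, int service, int maxCachedAttempts) {
        for (int i = 0; i < maxCachedAttempts; i++) {
            if (cachedCoordinator != null && isRunning(cachedCoordinator, service)) {
                start(self, service);
                return cachedCoordinator; // JOIN accepted by the cached coordinator
            }
            // A real GMS would wait between retries; omitted in this sketch.
        }
        String coordinator = discover(self, service);
        start(self, service);
        return coordinator;
    }

    public static void main(String[] args) {
        SharedDiscoverySketch cluster = new SharedDiscoverySketch();
        cluster.start("A", 1);                           // step 1: A1
        String coord1 = cluster.join("B", 1, 3);         // step 2: B1 finds A1
        cluster.cacheCoordinator(coord1);                // shared discovery caches A
        String coord2 = cluster.join("B", 2, 3);         // step 3: A2 does not exist
        System.out.println("service 1 coordinator: " + coord1);
        System.out.println("service 2 coordinator: " + coord2);
    }
}
```

With service 2 never deployed on A, the cached attempts in step 3 all fail and B2 ends up as its own coordinator once the real discovery runs. If A2 had been started before B2 gave up, the cached answer would have been good enough.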

> Discovery: make it a singleton with a shared transport
> ------------------------------------------------------
>
>                 Key: JGRP-844
>                 URL: https://jira.jboss.org/jira/browse/JGRP-844
>             Project: JGroups
>          Issue Type: Feature Request
>            Reporter: Bela Ban
>            Assignee: Bela Ban
>             Fix For: 2.8
>
>
> When we have a shared transport and 5 channels on top of it, every channel will run the discovery protocol. If it is the first node in a cluster, this will take 5 * Discovery.timeout ms in total.
> Now, if the 5 channels didn't just share the transport, but also the discovery protocol, then only the first channel to start would have to wait for Discovery.timeout ms. It would then cache the results of that discovery and, when view changes are received, replace the contents of the cache with view information.
> The remaining 4 channels would then not even need to run the discovery phase; the discovery protocol would simply use the current view to return the coordinator. This means that instead of 5 * timeout, we have 1 * timeout!
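
The arithmetic in the quoted description can be worked through with a small sketch. The 3000 ms timeout below is an assumed value chosen only for illustration (the issue does not state one), and DiscoveryStartupCost is a hypothetical helper, not part of JGroups.

```java
// Hypothetical helper that only restates the arithmetic from the issue text.
class DiscoveryStartupCost {
    // Every channel runs its own discovery on the first node of a cluster.
    static long perChannelDiscovery(int channels, long timeoutMs) {
        return channels * timeoutMs;
    }

    // Only the first channel waits; the rest read the cached result or view.
    static long sharedDiscovery(int channels, long timeoutMs) {
        return channels > 0 ? timeoutMs : 0;
    }

    public static void main(String[] args) {
        long timeoutMs = 3000; // assumed value, not taken from the issue
        System.out.println(perChannelDiscovery(5, timeoutMs)); // prints 15000
        System.out.println(sharedDiscovery(5, timeoutMs));     // prints 3000
    }
}
```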
