[jboss-jira] [JBoss JIRA] (JGRP-1841) Discovery: reduce number of discovery requests to backend store

Fri May 23 05:28:56 EDT 2014

     [ https://issues.jboss.org/browse/JGRP-1841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bela Ban updated JGRP-1841:
---------------------------

    Summary: Discovery: reduce number of discovery requests to backend store   (was: Static discovery protocol)


> Discovery: reduce number of discovery requests to backend store 
> ----------------------------------------------------------------
>
>                 Key: JGRP-1841
>                 URL: https://issues.jboss.org/browse/JGRP-1841
>             Project: JGroups
>          Issue Type: Feature Request
>            Reporter: Bela Ban
>            Assignee: Bela Ban
>             Fix For: 3.5
>
>
> A new protocol similar to {{FILE_PING}} (perhaps a subclass?) which lists *all* members in a boostrap file in the following format:
> || Logical name || UUID || IP address:port || coord ||
> | A | 1 | 192.168.1.5:7800 | true |
> | B | 2 | 192.168.1.6:7800 | false |
> | C | 3 | 192.168.1.7:7800 | false |
> The file could be located on a (shared) file system, S3, a DB table or a cloud store.
> *This could possibly be an alternative impl of {{FILE_PING}}, {{S3_PING}}, {{GOOGLE_PING}}, {{JDBC_PING}} etc.*
> On startup, the static discovery protocol reads this file and populates the {{UUID.cache}} and {{TP.logical_addr_cache}} caches in the transport.
> Once this is done, there is no need for lookups as the caches should have the complete information. Note that {{TP.logical_addr_cache_max_size}} should be greater than the max number of nodes.
> When nodes are started, they need to be given the logical name and UUID indicated in the file. The former can be done via {{JChannel.name(String name)}}, the latter should be done via an {{AddressGenerator}}.
> Note that UUIDs cannot be reused, so when a channel is disconnected and subsequently reconnected, the address generator should pick a different UUID (perhaps a random one). This should be reflected in the config file as well.
> Also, nodes need to be started in the order in which they are listed. The coordinator to contact for joining the cluster is marked, so ideally only 1 JOIN req-rsp round is needed.
> The goals of this protocol are:
> * Used when IP multicast is not available
> * Quick startup
> * Reducing the number of calls to the cloud store (latency!)
> ** Instead of N calls to the (cloud) store, only 1 call is needed (to read the file)
> * Large clusters: using N-1 for multicast simulation quickly generates too much traffic in the discovery phase
> h5. Coordinator changes
> When the coordinator changes, the new coordinator needs to update the file; changing the coordinatorship so nodes started after this will contact the right coordinator.
> h5. New members (not listed) join
> This could be handled by either changing the bootstrap file manually or dynamically:
> * The new member reads the file and sends an INFO message with its UUID, logical_name and IP address to all members
> ** (This is done before sending a JOIN request to the coordinator)
> * Every member updates their local cache when receiving the INFO message
> * The coordinator, upon reception of an INFO message, updates the file
> ** This ensures that only 1 node (the coord) updates the file and prevents corruption of the file through concurrent updates


--
This message was sent by Atlassian JIRA
(v6.2.3#6260)