[jboss-jira] [JBoss JIRA] (JGRP-1841) Static discovery protocol

Fri May 23 02:44:56 EDT 2014

     [ https://issues.jboss.org/browse/JGRP-1841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bela Ban updated JGRP-1841:
---------------------------

    Description: 
A new protocol similar to {{FILE_PING}} (perhaps a subclass?) which lists *all* members in a bootstrap file in the following format:
|| Logical name || UUID || IP address:port || coord ||
| A | 1 | 192.168.1.5:7800 | true |
| B | 2 | 192.168.1.6:7800 | false |
| C | 3 | 192.168.1.7:7800 | false |
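
To make the format concrete, here is a minimal parsing sketch. It assumes the file stores the rows exactly as written in the table above (pipe-delimited, with a {{||}} header row); the issue does not fix an on-disk format. {{BootstrapEntry}} and {{BootstrapFileParser}} are hypothetical names, not JGroups classes.

```java
import java.util.ArrayList;
import java.util.List;

/** One row of the bootstrap file (hypothetical type, not part of JGroups). */
final class BootstrapEntry {
    final String logicalName;
    final String uuid;
    final String host;
    final int port;
    final boolean coord;

    BootstrapEntry(String logicalName, String uuid, String host, int port, boolean coord) {
        this.logicalName = logicalName;
        this.uuid = uuid;
        this.host = host;
        this.port = port;
        this.coord = coord;
    }
}

/** Parses rows such as "| A | 1 | 192.168.1.5:7800 | true |" (assumed format). */
final class BootstrapFileParser {
    static List<BootstrapEntry> parse(List<String> lines) {
        List<BootstrapEntry> entries = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            // Skip blank lines and the "|| ... ||" header row.
            if (trimmed.isEmpty() || trimmed.startsWith("||")) continue;
            // Drop the leading and trailing '|', then split the remaining cells.
            String[] cells = trimmed.replaceAll("^\\|", "").replaceAll("\\|$", "").split("\\|");
            if (cells.length != 4)
                throw new IllegalArgumentException("expected 4 columns: " + line);
            String addr = cells[2].trim();
            int colon = addr.lastIndexOf(':');
            if (colon < 0)
                throw new IllegalArgumentException("expected ip:port: " + addr);
            entries.add(new BootstrapEntry(
                cells[0].trim(),
                cells[1].trim(),
                addr.substring(0, colon),
                Integer.parseInt(addr.substring(colon + 1)),
                Boolean.parseBoolean(cells[3].trim())));
        }
        return entries;
    }

    /** Returns the entry marked as coordinator, or null if no row is marked. */
    static BootstrapEntry coordinator(List<BootstrapEntry> entries) {
        for (BootstrapEntry e : entries)
            if (e.coord) return e;
        return null;
    }
}
```

A joining node would call {{parse()}} once and use {{coordinator()}} to find the member to send its JOIN request to.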

The file could be located on a (shared) file system, S3, a DB table or a cloud store.

*This could possibly be an alternative impl of {{FILE_PING}}, {{S3_PING}}, {{GOOGLE_PING}}, {{JDBC_PING}} etc.*

On startup, the static discovery protocol reads this file and populates the {{UUID.cache}} and {{TP.logical_addr_cache}} caches in the transport.

Once this is done, there is no need for lookups as the caches should have the complete information. Note that {{TP.logical_addr_cache_max_size}} should be greater than the max number of nodes.
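
The startup step could look roughly like the sketch below. The two maps only stand in for {{UUID.cache}} and {{TP.logical_addr_cache}}; {{StaticCaches}} is a hypothetical class, and the size check is one possible way to enforce the note about {{TP.logical_addr_cache_max_size}}.

```java
import java.net.InetSocketAddress;
import java.util.LinkedHashMap;
import java.util.Map;

/** Stand-in for the two transport caches (hypothetical, not JGroups code). */
final class StaticCaches {
    // UUID -> logical name, playing the role of UUID.cache
    final Map<String, String> nameByUuid = new LinkedHashMap<>();
    // UUID -> physical address, playing the role of TP.logical_addr_cache
    final Map<String, InetSocketAddress> addrByUuid = new LinkedHashMap<>();
    private final int maxSize;

    StaticCaches(int maxSize) {
        this.maxSize = maxSize;
    }

    /** Each row is {logicalName, uuid, host, port}, as read from the bootstrap file. */
    void populate(String[][] rows) {
        // The cache limit must be greater than the number of nodes,
        // otherwise entries would be evicted and lookups needed again.
        if (maxSize <= rows.length)
            throw new IllegalStateException(
                "cache max size " + maxSize + " must exceed node count " + rows.length);
        for (String[] r : rows) {
            nameByUuid.put(r[1], r[0]);
            addrByUuid.put(r[1],
                InetSocketAddress.createUnresolved(r[2], Integer.parseInt(r[3])));
        }
    }
}
```

After {{populate()}} returns, every UUID in the file resolves locally to a name and an address, which is what removes the need for later lookups.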

When nodes are started, they need to be given the logical name and UUID indicated in the file. The former can be done via {{JChannel.name(String name)}}, the latter should be done via an {{AddressGenerator}}.

Note that UUIDs cannot be reused, so when a channel is disconnected and subsequently reconnected, the address generator should pick a different UUID (perhaps a random one). This should be reflected in the config file as well.
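
A minimal sketch of the "configured UUID first, fresh UUID on reconnect" rule follows. In JGroups this logic would sit behind an {{AddressGenerator}}; to stay self-contained the sketch uses {{java.util.UUID}} and a hypothetical {{StaticUuidGenerator}} class instead of the JGroups types.

```java
import java.util.UUID;

/** Hands out the UUID from the bootstrap file once, then random ones (hypothetical). */
final class StaticUuidGenerator {
    private final UUID configured;
    private boolean used;

    StaticUuidGenerator(UUID configured) {
        this.configured = configured;
    }

    /** Called each time the channel connects and needs an address. */
    synchronized UUID next() {
        if (!used) {
            // First connect: use the UUID listed in the file.
            used = true;
            return configured;
        }
        // Reconnect: UUIDs cannot be reused, so pick a random one.
        return UUID.randomUUID();
    }
}
```

The sketch does not update the file; whichever UUID is picked on reconnect would still have to be written back, as noted above.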

Also, nodes need to be started in the order in which they are listed. The coordinator to contact for joining the cluster is marked, so ideally only one JOIN request-response round is needed.

The goals of this protocol are:
* Used when IP multicast is not available
* Quick startup
* Reducing the number of calls to the cloud store (latency!)
** Instead of N calls to the (cloud) store, only 1 call is needed (to read the file)
* Large clusters: simulating multicast by sending N-1 unicast messages quickly generates too much traffic in the discovery phase

h5. Coordinator changes
When the coordinator changes, the new coordinator needs to update the file and mark itself as coordinator, so that nodes started afterwards contact the right coordinator.
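
As an illustration, the rewrite could be as simple as flipping the coord column. The sketch assumes the pipe-delimited rows shown in the table above; {{CoordinatorMarker}} is a hypothetical helper, and reading and writing the actual store is left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Rewrites the coord column of the bootstrap rows (hypothetical helper). */
final class CoordinatorMarker {
    static List<String> markCoordinator(List<String> rows, String newCoordName) {
        List<String> out = new ArrayList<>();
        boolean found = false;
        for (String row : rows) {
            String trimmed = row.trim();
            String[] cells = trimmed.replaceAll("^\\|", "").replaceAll("\\|$", "").split("\\|");
            // Pass the header row and anything that is not a 4-column row through unchanged.
            if (trimmed.startsWith("||") || cells.length != 4) {
                out.add(row);
                continue;
            }
            // Exactly one row ends up with coord=true: the new coordinator's.
            boolean isCoord = cells[0].trim().equals(newCoordName);
            found |= isCoord;
            out.add("| " + cells[0].trim() + " | " + cells[1].trim()
                + " | " + cells[2].trim() + " | " + isCoord + " |");
        }
        if (!found)
            throw new IllegalArgumentException("unknown member: " + newCoordName);
        return out;
    }
}
```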

h5. New members (not listed) join
This could be handled by either changing the bootstrap file manually or dynamically:
* The new member reads the file and sends an INFO message with its UUID, logical_name and IP address to all members
** (This is done before sending a JOIN request to the coordinator)
* Every member updates their local cache when receiving the INFO message
* The coordinator, upon reception of an INFO message, updates the file
** This ensures that only 1 node (the coord) updates the file and prevents corruption of the file through concurrent updates
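
The dynamic variant could be sketched as follows. {{InfoMessage}} and {{MemberNode}} are hypothetical types, the in-memory list stands in for the shared bootstrap file, and message transport is omitted; only the "everyone updates the cache, only the coordinator writes" rule is shown.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** INFO payload sent by a joining member before its JOIN request (hypothetical). */
final class InfoMessage {
    final String uuid;
    final String logicalName;
    final String address; // "ip:port"

    InfoMessage(String uuid, String logicalName, String address) {
        this.uuid = uuid;
        this.logicalName = logicalName;
        this.address = address;
    }
}

/** An existing member reacting to INFO messages (hypothetical). */
final class MemberNode {
    private final boolean coordinator;
    private final List<String> bootstrapFile; // stand-in for the shared file
    final Map<String, String> cache = new HashMap<>(); // UUID -> "name@ip:port"

    MemberNode(boolean coordinator, List<String> bootstrapFile) {
        this.coordinator = coordinator;
        this.bootstrapFile = bootstrapFile;
    }

    void onInfo(InfoMessage info) {
        // Every member updates its local cache.
        boolean isNew = cache.put(info.uuid, info.logicalName + "@" + info.address) == null;
        // Only the coordinator appends to the file, and only once per UUID,
        // so there is a single writer and no concurrent updates.
        if (coordinator && isNew) {
            bootstrapFile.add("| " + info.logicalName + " | " + info.uuid
                + " | " + info.address + " | false |");
        }
    }
}
```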

  was:
A new protocol similar to {{FILE_PING}} (perhaps a subclass?) which lists *all* members in a bootstrap file in the following format:
|| Logical name || UUID || IP address:port || coord ||
| A | 1 | 192.168.1.5:7800 | true |
| B | 2 | 192.168.1.6:7800 | false |
| C | 3 | 192.168.1.7:7800 | false |

The file could be located on a (shared) file system, S3, a DB table or a cloud store.

*This could possibly be an alternative impl of {{FILE_PING}}, {{S3_PING}}, {{GOOGLE_PING}}, {{JDBC_PING}} etc.*

On startup, the static discovery protocol reads this file and populates the {{UUID.cache}} and {{TP.logical_addr_cache}} caches in the transport.

Once this is done, there is no need for lookups as the caches should have the complete information. Note that {{TP.logical_addr_cache_max_size}} should be greater than the max number of nodes.

When nodes are started, they need to be given the logical name and UUID indicated in the file. The former can be done via {{JChannel.name(String name)}}, the latter should be done via an {{AddressGenerator}}.

Note that UUIDs cannot be reused, so when a channel is disconnected and subsequently reconnected, the address generator should pick a different UUID (perhaps a random one). This should be reflected in the config file as well.

Also, nodes need to be started in the order in which they are listed. The coordinator to contact for joining the cluster is marked, so ideally only one JOIN request-response round is needed.

The goals of this protocol are:
* Used when IP multicast is not available
* Quick startup
** Instead of N calls to the (cloud) store, only 1 call is needed (to read the file)
* Large clusters: simulating multicast by sending N-1 unicast messages quickly generates too much traffic in the discovery phase

h5. Coordinator changes
When the coordinator changes, the new coordinator needs to update the file and mark itself as coordinator, so that nodes started afterwards contact the right coordinator.

h5. New members (not listed) join
This could be handled by either changing the bootstrap file manually or dynamically:
* The new member reads the file and sends an INFO message with its UUID, logical_name and IP address to all members
** (This is done before sending a JOIN request to the coordinator)
* Every member updates their local cache when receiving the INFO message
* The coordinator, upon reception of an INFO message, updates the file
** This ensures that only 1 node (the coord) updates the file and prevents corruption of the file through concurrent updates

> Static discovery protocol
> -------------------------
>
>                 Key: JGRP-1841
>                 URL: https://issues.jboss.org/browse/JGRP-1841
>             Project: JGroups
>          Issue Type: Feature Request
>            Reporter: Bela Ban
>            Assignee: Bela Ban
>             Fix For: 3.5
>
