[jboss-jira] [JBoss JIRA] Commented: (JGRP-860) GUI to test whether cluster nodes find each other ok

Fri Dec 12 09:48:36 EST 2008

    [ https://jira.jboss.org/jira/browse/JGRP-860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12442341#action_12442341 ] 

Galder Zamarreno commented on JGRP-860:
---------------------------------------

Bela, as we discussed in Brno, I had an idea for a tool that would
integrate the 3 tests that pretty much verify that nodes will be able to
cluster together without any issues:

- multicast test
- view demo
- large state test

Each of them tests a different scenario that I've seen fail in customers'
environments before. For example: sometimes multicasting does not work at all,
sometimes multicast works but nodes do not rejoin after a stop/start, and other
times large chunks cannot be transferred, leading to farming issues that are
rooted in network issues.

My idea was not to have a central GUI of any sort but potentially to have some kind
of coordinator (or the 1st node where the script is executed) and to start an
"all-in-one" test on each node, indicating how many nodes will be in the cluster.

Then, this test would first execute a multicast test where each individual node 
sends a message containing something different and each node verifies that
it's received X number of messages.
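
A rough sketch of the per-node check described above, in plain Java with no JGroups dependency. MulticastCheck and its method are hypothetical names, and it assumes X equals the cluster size N, i.e. each node also sees its own message via multicast loopback:

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical helper, not part of JGroups. */
final class MulticastCheck {

    /**
     * @param nodes    names of all nodes expected in the cluster
     * @param received for each node, the senders whose message it actually saw
     * @return for each node that missed something, the senders it never heard
     *         from; an empty map means the multicast test passed
     */
    static Map<String, Set<String>> missing(Set<String> nodes,
                                            Map<String, Set<String>> received) {
        Map<String, Set<String>> result = new TreeMap<>();
        for (String node : nodes) {
            // Assumption: a node expects one message from every node, itself included.
            Set<String> notSeen = new TreeSet<>(nodes);
            notSeen.removeAll(received.getOrDefault(node, Collections.<String>emptySet()));
            if (!notSeen.isEmpty()) {
                result.put(node, notSeen);
            }
        }
        return result;
    }
}
```

Reporting which senders each node missed, rather than just a count, should make it easier to tell a one-way multicast problem from a node that is completely isolated.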

Once the coordinator has figured out that all nodes have seen the messages from
all other nodes, it can move on to the next test, which is a combination of the
view demo and the large state test:

- start with the xml configuration, the number of expected nodes in the cluster (N) and the state size to transfer (e.g. 1 MB)
- coordinator checks that the view contains the expected number of nodes.
- the N-1 nodes that are plain members (not the coordinator) must have received the state correctly (they know the size they should expect).
- coordinator instructs the 2nd node in the cluster to die/shut down and restart its channel after a timeout T.
- coordinator checks that after the instruction to shut down, the view is updated correctly.
- coordinator checks that after waiting long enough, the 2nd node is part of the view again and has received the state correctly.
- repeat this with each of the N-1 nodes that are supposed to form the cluster.
- once the current coordinator, c1, has done this with all nodes, it will shut down its own channel so that a new coordinator is elected.
- new coordinator, c2, will do the same thing all over again with the rest of the nodes.
- test ends when you've done all permutations c1-cN.
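
To get a feel for how many restarts this involves, here is a small sketch that only computes the order of steps; it does not touch any channel. RestartPlan is a hypothetical name. It assumes the first member of the view acts as coordinator, that a restarted node rejoins at the end of the view, and that a coordinator that has shut down its channel stays out of the remaining rounds:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical planner, not part of JGroups: lists the steps, runs nothing. */
final class RestartPlan {

    static List<String> steps(List<String> initialView) {
        List<String> steps = new ArrayList<>();
        Deque<String> view = new ArrayDeque<>(initialView);
        while (view.size() > 1) {
            String coordinator = view.peekFirst();
            // Snapshot of the plain members at the start of this round.
            List<String> members = new ArrayList<>(view);
            members.remove(0);
            for (String target : members) {
                // Assumption: the restarted node leaves and rejoins at the end of the view.
                view.remove(target);
                view.addLast(target);
                steps.add(coordinator + " restarts " + target);
            }
            // Assumption: the old coordinator does not rejoin, so the next member takes over.
            view.removeFirst();
            steps.add(coordinator + " shuts down");
        }
        return steps;
    }
}
```

Under these assumptions a cluster of N nodes goes through N(N-1)/2 restarts in total, which is worth keeping in mind when picking the timeout T and the state size.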

Some of Sacha's thoughts could be integrated with this, e.g. if the all-in-one test
fails with the default configuration, try a TCP/MPING configuration, etc.
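
The fallback could be as simple as trying a list of configurations in order and stopping at the first one that passes. A minimal sketch, where ConfigFallback is a hypothetical name and the all-in-one test is passed in as a predicate because it does not exist yet:

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

/** Hypothetical helper, not part of JGroups. */
final class ConfigFallback {

    /**
     * @param configs      configuration file names, in order of preference
     * @param allInOneTest stand-in for the all-in-one test; returns true if the
     *                     cluster forms and passes all checks with that config
     * @return the first configuration that passes, or empty if none does
     */
    static Optional<String> firstWorking(List<String> configs,
                                         Predicate<String> allInOneTest) {
        for (String config : configs) {
            if (allInOneTest.test(config)) {
                return Optional.of(config);
            }
        }
        return Optional.empty();
    }
}
```

A call might look like firstWorking(Arrays.asList("udp.xml", "tcp-mping.xml"), test), where "tcp-mping.xml" is a made-up file name for the TCP/MPING configuration.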

Not sure whether adding benchmarking is a top priority here though. A tool that quickly
and effectively shows that something is wrong in the network is more important IMO.

Maybe we can discuss this when you're in NE next week?

As always, I'd love to do this but just gotta find the time to do it.

> GUI to test whether cluster nodes find each other ok
> ----------------------------------------------------
>
>                 Key: JGRP-860
>                 URL: https://jira.jboss.org/jira/browse/JGRP-860
>             Project: JGroups
>          Issue Type: Feature Request
>            Reporter: Bela Ban
>            Assignee: Bela Ban
>             Fix For: 2.8
>
>
> - New DiscoveryServer (GUI) which listens on a TCP socket at a given port
> - Then the nodes are started and connect to the DiscoveryServer via TCP
> - Connected nodes are shown in a listbox in the DiscoveryServer's GUI
> - The TCP connection between a node and the DS is the *control channel*, and is independent of JGroups (doesn't use a JGroups config)
> - The control channel is used to (a) register nodes and (b) send new JGroups configurations from the DS to the nodes
> - The GUI has 4 tabs: UDP, TCP dynamic (TCP:MPING config), TCP static (TCP:TCPPING) and External discovery (TCP:TCPGOSSIP).
> - Each tab has configuration specific options, e.g. for UDP we have
>   - multicast address
>   - multicast port
>   - TTL
>   - checkbox: use flush or not
>   - checkbox: use state transfer or not
>   - etc etc etc
> There are 2 action buttons: "Discover" and "Save config". The latter saves the current configuration to a file (XML format). The "Discover" button sends the current config to all nodes, via the control channel. Each node then creates a channel with the received config and connects to the cluster. The DiscoveryServer GUI (which also joins the same cluster) shows all nodes in a second listbox. When the number of nodes in the first and second listbox is the same, this shows that the config works. In that case, the user can press "Save config" and use the generated config with all cluster nodes.
> - In case UDP doesn't work or a user wants a TCP based config from the onset, the user simply clicks on the TCP tab, fills in the required information and clicks on "Discover".
> - The initial config for each tab is provided by the corresponding default shipped with JGroups, e.g. UDP uses udp.xml
> In a second stage, we could not only verify that the cluster can be established, but also run some perf tests over it.
