[
https://jira.jboss.org/jira/browse/JGRP-860?page=com.atlassian.jira.plugi...
]
Galder Zamarreno commented on JGRP-860:
---------------------------------------
Bela, as we discussed in Brno, I had the idea of building a tool that would
integrate the 3 tests that pretty much verify that nodes will be able to
cluster together without any issues:
- multicast test
- view demo
- large state test
Each of them tests a different scenario that I've seen fail in customers' environments
before. For example: sometimes multicasting does not work at all, sometimes
multicast works but nodes do not rejoin after a stop/start, and other times large
chunks cannot be transferred, leading to farming issues that are rooted in network
issues.
My idea was not to have a central GUI of any sort but potentially to have some kind
of coordinator (or the 1st node where the script is executed) and to start an
"all-in-one" test on each node, indicating how many nodes would be in the cluster.
Then, this test would first execute a multicast test where each individual node
sends a message containing something different and each node verifies that
it's received X number of messages.
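
As a rough sketch of the check the coordinator could run at the end of the multicast
step: the function and parameter names below are made up for illustration, and it
assumes each node reports the set of senders it heard from and is expected to hear
from every other node (whether a node also counts its own message is left open above).

```python
def missing_senders(expected_nodes, heard_by_node):
    """Return {node: senders it never heard from}; empty dict means success.

    expected_nodes: names of all nodes that should be in the cluster.
    heard_by_node:  {node: iterable of sender names that node received from}.
    Assumption: a node only needs to hear from the *other* nodes.
    """
    expected = set(expected_nodes)
    missing = {}
    for node in expected:
        heard = set(heard_by_node.get(node, ()))
        gap = (expected - {node}) - heard
        if gap:
            missing[node] = gap
    return missing


def multicast_ok(expected_nodes, heard_by_node):
    """True when every node heard from every other node."""
    return not missing_senders(expected_nodes, heard_by_node)
```

Reporting which senders are missing, rather than just a count, would make it easier
to tell one-way multicast problems apart from a node that is cut off entirely.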
Once the coordinator has figured out that all nodes have seen all messages from all
other nodes, you can move on to the next test, which is a combination of view demo
and large state test:
- start with xml configuration and number of expected nodes in cluster (N) and state size
to transfer (e.g. 1 MB)
- coordinator checks that view contains the number of expected nodes.
- N-1 nodes that are only members must have received the state correctly (they know the
size they should expect).
- coordinator instructs 2nd node in cluster to die/shutdown and restart channel after T
timeout.
- coordinator checks that after instruction to shutdown, the view is updated correctly.
- coordinator checks that after waiting long enough, the 2nd node is part of the view
again and 2nd node received state correctly.
- repeat this with N-1 nodes that are supposed to form the cluster.
- once the current coordinator, c1, has done this with all nodes, it will shut down its own
channel so that a new coordinator is elected.
- new coordinator, c2, will do the same thing all over again with the rest of the nodes.
- test ends when you've done all permutations c1-cN.
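
The order of steps above could be planned roughly as follows. This is only a
simulation of the sequencing, not JGroups code: the view is modelled as an ordered
list whose first member is the coordinator, a restarted node rejoins at the end of
the list, and (my assumption, the steps above don't say) a coordinator that shuts
down its channel also rejoins at the end. `rotation_plan` and the step names are
hypothetical.

```python
def rotation_plan(nodes):
    """Return the ordered list of steps for the view/state part of the test.

    Each step is a tuple (action, coordinator, member):
      ("restart", c, m)      -> c tells m to shut down and restart after T,
                                then checks the view and m's received state
      ("step_down", c, None) -> c shuts down its own channel
    """
    view = list(nodes)          # view[0] is the current coordinator
    plan = []
    for _ in range(len(nodes)):
        coord = view[0]
        for member in list(view[1:]):
            plan.append(("restart", coord, member))
            view.remove(member)  # member leaves, view shrinks to N-1
            view.append(member)  # member rejoins at the tail, view is N again
        plan.append(("step_down", coord, None))
        view.remove(coord)       # next member in the view becomes coordinator
        view.append(coord)       # assumption: old coordinator rejoins
    return plan
```

For N nodes this gives N rounds of N-1 restarts each, so the run time grows with
N squared times the timeout T, which may matter when choosing T.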
Some of Sacha's thoughts could be integrated with this, e.g. if the all-in-one test fails
with the default configuration, try a TCP/MPING configuration, etc.
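
A minimal sketch of that fallback, assuming the all-in-one test can be wrapped in a
callable that takes a configuration name and returns True on success. Both
`first_working_config` and the `run_all_in_one` callable are hypothetical, and the
configuration names are just labels.

```python
def first_working_config(config_names, run_all_in_one):
    """Try each configuration in order.

    Returns (name_of_first_passing_config_or_None, list_of_configs_that_failed).
    """
    failed = []
    for name in config_names:
        if run_all_in_one(name):
            return name, failed
        failed.append(name)
    return None, failed
```

Keeping the list of failed configurations means the tool can still report that,
say, the default configuration did not work even when a later one did.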
Not sure whether adding benchmarking is a top priority here though. A tool that
quickly and effectively shows that something is wrong in the network is more
important IMO.
Maybe we can discuss this when you're in NE next week?
As always, I'd love to do this but I've just gotta find the time to do it.
GUI to test whether cluster nodes find each other ok
----------------------------------------------------
Key: JGRP-860
URL: https://jira.jboss.org/jira/browse/JGRP-860
Project: JGroups
Issue Type: Feature Request
Reporter: Bela Ban
Assignee: Bela Ban
Fix For: 2.8
- New DiscoveryServer (GUI) which listens on a TCP socket at a given port
- Then the nodes are started and connect to the DiscoveryServer via TCP
- Connected nodes are shown in a listbox in the DiscoveryServer's GUI
- The TCP connection between a node and the DS is the *control channel*, and is
independent of JGroups (doesn't use a JGroups config)
- The control channel is used to (a) register nodes and (b) send new JGroups
configurations from the DS to the nodes
- The GUI has 4 tabs: UDP, TCP dynamic (TCP:MPING config), TCP static (TCP:TCPPING) and
External discovery (TCP:TCPGOSSIP).
- Each tab has configuration specific options, e.g. for UDP we have
- multicast address
- multicast port
- TTL
- checkbox: use flush or not
- checkbox: use state transfer or not
- etc etc etc
There are 2 action buttons: "Discover" and "Save config". The latter
saves the current configuration to a file (XML format). The "Discover" button
sends the current config to all nodes, via the control channel. Each node then creates a
channel with the received config and connects to the cluster. The DiscoveryServer GUI
(which also joins the same cluster) shows all nodes in a second listbox. When the number
of nodes in the first and second listbox is the same, then this shows that the config
works. In that case, the user can press "Save config" and use the generated
config with all cluster nodes.
- In case UDP doesn't work or a user wants a TCP based config from the outset, the
user simply clicks on the TCP tab, fills in the required information and clicks on
"Discover".
- The initial config for each tab is provided by the corresponding default shipped with
JGroups, e.g. UDP uses udp.xml
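
The comparison between the two listboxes could look roughly like this. It is a
sketch under assumptions: it compares node names rather than just counts, and it
leaves the DiscoveryServer's own cluster membership out of the second list (the
description does not say whether the DS shows itself). `compare_listboxes` and its
parameters are hypothetical names.

```python
def compare_listboxes(registered, view_members, server_name):
    """Compare nodes registered over the control channel with the cluster view.

    registered:   node names connected to the DS via TCP (first listbox)
    view_members: member names in the cluster view seen by the DS (second listbox)
    server_name:  the DS's own name in the view, excluded from the comparison
    """
    registered = set(registered)
    joined = set(view_members) - {server_name}
    return {
        "config_works": registered == joined,
        "missing": sorted(registered - joined),     # registered but never joined
        "unexpected": sorted(joined - registered),  # in the view but not registered
    }
```

Comparing names instead of counts would also catch the case where a stray node
from another cluster on the same multicast address makes the numbers match by
accident.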
In a second stage, we could also not just verify that the cluster can be established, but
also run some perf tests over it.