Bela Ban commented on JGRP-2266:
--------------------------------
I didn't just remove the loop: the reconnect task is now also cancelled when there are
no stubs left to reconnect to in {{reconnect_list}}.
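A minimal Java sketch of the behaviour described in the comment, under stated assumptions: each scheduled run makes exactly one connection attempt per stub (no inner loop), and the periodic task cancels itself once the reconnect list is empty. The names Stub, ReconnectTask, connect() and start() are hypothetical stand-ins, not the actual JGroups RouterStubManager code.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for a router stub (not the JGroups RouterStub API). */
interface Stub {
    /** Makes a single connection attempt; returns true on success. */
    boolean connect();
}

/** Hypothetical reconnect task: one attempt per stub per run, no spinning. */
class ReconnectTask implements Runnable {
    // Snapshot iteration lets us remove connected stubs while iterating
    final List<Stub> reconnectList = new CopyOnWriteArrayList<>();
    private volatile ScheduledFuture<?> future;

    /** Schedules this task periodically on the given timer. */
    void start(ScheduledExecutorService timer, long intervalMs) {
        future = timer.scheduleWithFixedDelay(this, intervalMs, intervalMs, TimeUnit.MILLISECONDS);
    }

    @Override
    public void run() {
        // Try each disconnected stub exactly once; a failed stub simply
        // waits for the next scheduled run instead of being retried in a loop
        for (Stub stub : reconnectList) {
            if (stub.connect()) {
                reconnectList.remove(stub);
            }
        }
        // Nothing left to reconnect: cancel the periodic task
        if (reconnectList.isEmpty() && future != null) {
            future.cancel(false);
        }
    }
}
```

The retry interval now comes from the scheduler rather than from a busy loop, so an unreachable TCPGOSSIP host costs one attempt per interval instead of a full CPU. The sketch omits rescheduling the task when a stub disconnects again after cancellation; a real implementation would need to restart it in that case.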
RouterStubManager.run() endless reconnect loop burning a CPU
------------------------------------------------------------
Key: JGRP-2266
URL: https://issues.jboss.org/browse/JGRP-2266
Project: JGroups
Issue Type: Bug
Affects Versions: 4.0.11
Environment: Small cluster (~20 nodes); some nodes are connected through OpenVPN
tunnels. macOS and Linux nodes.
Reporter: Emmeran Seehuber
Assignee: Bela Ban
Fix For: 4.0.12
Attachments: cs_stack.xml
RouterStubManager.run() tries in a loop to reconnect all stubs that are currently not
connected. When, for whatever reason, one of these stubs cannot be connected, the method
spins in an endless loop and burns a CPU.
This happens, for example, when the VPN tunnel is down or one of the TCPGOSSIP hosts is down.
I don't know whether it is really necessary to loop here, but at the very least it should
call Thread.yield() or sleep(). Since this run() method is called periodically anyway, an
endless loop shouldn't be required here, should it? Maybe loop only, e.g., three times and
then give up?
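The reporter's suggestion (a few attempts with a pause in between, then give up) could be sketched in Java as below. BoundedRetry and tryConnect() are hypothetical names, and the connection attempt is modelled as a plain BooleanSupplier rather than any real JGroups call; maxAttempts = 3 would match the "three times" suggested above.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical helper: bounded retries with a pause, instead of an endless spin. */
final class BoundedRetry {
    private BoundedRetry() {}

    /**
     * Calls attempt up to maxAttempts times, sleeping pauseMs between failed
     * attempts. Returns true as soon as an attempt succeeds, false after giving up.
     */
    static boolean tryConnect(BooleanSupplier attempt, int maxAttempts, long pauseMs) {
        for (int i = 1; i <= maxAttempts; i++) {
            if (attempt.getAsBoolean()) {
                return true;
            }
            if (i < maxAttempts) {
                try {
                    Thread.sleep(pauseMs); // give the CPU away instead of spinning
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve interrupt status
                    return false;
                }
            }
        }
        return false; // give up; the next periodic run can try again
    }
}
```

Because run() is invoked periodically, giving up after a bounded number of attempts loses nothing: the stub is simply retried on the next invocation.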
Since all nodes in the cluster are iMac workstations or dedicated Linux render slaves,
burning a CPU is very annoying. The CPU should rather be spent on the Blender render jobs
or on the interactive work people are doing on their iMacs. (JGroups is used here to
distribute render jobs within the cluster.)