[
https://jira.jboss.org/jira/browse/JGRP-100?page=com.atlassian.jira.plugi...
]
Bela Ban commented on JGRP-100:
-------------------------------
Further points to look at:
GMS:
- Is the single coordinator a bottleneck? When a coord hangs, it cannot admit
new nodes into the cluster until it is suspected and excluded
- Use view_bundling
Failure detection:
- Replace FD with FD_ALL
TP, timers:
- Increase thread pools
NAKACK:
- Set use_mcast_xmit="true" if UDP is used. UDP is recommended anyway for large
clusters
- Use stats for xmit_timeouts ?
PBCAST:
- Revisit ?
SMACK:
- Use instead of NAKACK ? Doesn't require GMS
- Improve: don't ack every single message !
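The "don't ack every single message" idea can be sketched as a receiver that sends one cumulative ack per batch. This is a minimal illustration, not SMACK's actual code: the class name BatchingAcker, the ackEvery batch size and the in-order delivery assumption are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not JGroups code: acknowledge in batches
// instead of once per received message.
final class BatchingAcker {
    private final int ackEvery;          // assumed batch size (not a real SMACK property)
    private long highestDelivered = 0;   // highest in-order seqno seen so far
    private long lastAcked = 0;          // highest seqno already acknowledged
    private final List<Long> acksSent = new ArrayList<>();

    BatchingAcker(int ackEvery) {
        if (ackEvery < 1) throw new IllegalArgumentException("ackEvery must be >= 1");
        this.ackEvery = ackEvery;
    }

    // Called per message; seqnos are assumed to arrive in order, starting at 1.
    void onMessage(long seqno) {
        if (seqno != highestDelivered + 1) return; // gaps and duplicates are ignored in this sketch
        highestDelivered = seqno;
        if (highestDelivered - lastAcked >= ackEvery) flush();
    }

    // Records one cumulative ack covering everything up to highestDelivered.
    void flush() {
        if (highestDelivered > lastAcked) {
            acksSent.add(highestDelivered);
            lastAcked = highestDelivered;
        }
    }

    List<Long> acksSent() { return acksSent; }
}
```

A real implementation would presumably also call flush() from a timer, so that the tail of a burst is acked even when the batch never fills up.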
FC:
- Increase max_credits
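Why a larger max_credits may help can be seen in a toy model of credit-based flow control. The sketch below is an assumption-laden simplification, not FC's algorithm: the class CreditSender is hypothetical, there is a single receiver, and a stall is modelled as an immediate full replenishment.

```java
// Toy model, not JGroups code: counts how often a sender runs out of credits.
final class CreditSender {
    private final long maxCredits; // plays the role of max_credits (in bytes)
    private long credits;          // credits currently available
    private int stalls = 0;        // number of times the sender had to wait

    CreditSender(long maxCredits) {
        if (maxCredits < 1) throw new IllegalArgumentException("maxCredits must be >= 1");
        this.maxCredits = maxCredits;
        this.credits = maxCredits;
    }

    // Sends a message of the given size, stalling first if credits are insufficient.
    void send(int bytes) {
        if (bytes > maxCredits) throw new IllegalArgumentException("message larger than maxCredits");
        if (bytes > credits) {
            stalls++;              // in a real stack the sender would block here
            credits = maxCredits;  // simplification: receiver replenishes in full
        }
        credits -= bytes;
    }

    int stalls() { return stalls; }
}
```

In this model, sending 1000 messages of 1000 bytes each stalls 99 times with 10,000 credits and 9 times with 100,000 credits. Each stall stands for a round trip to the receivers, which is what gets more expensive as the cluster grows.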
FRAG2:
- Use instead of FRAG
- Get latest version (fix for https://jira.jboss.org/jira/browse/JGRP-800)
Streaming API (?):
- For large values ?
Discovery:
- Use only servers, not clients (https://jira.jboss.org/jira/browse/JGRP-735)
- Workaround: set num_responses to a large value, so we do get some server responses
Large-scale JGroups
-------------------
Key: JGRP-100
URL:
https://jira.jboss.org/jira/browse/JGRP-100
Project: JGroups
Issue Type: Feature Request
Reporter: Bela Ban
Assignee: Bela Ban
Fix For: 2.x
- Run JGroups on hundreds of nodes (either physical or simulated).
- Determine a protocol stack that can be used for large-scale execution
- Example:
- Coordinator may be SPOF. If coord is hung, messages will be sent, but no new views
will be generated
- Retransmission: retransmit from anyone (not sender, otherwise we have NAK implosion)
- Look at PBCAST
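The "retransmit from anyone" point can be sketched as follows: a receiver that misses a message asks a randomly chosen member for it rather than the original sender, so that the sender is not flooded with NAKs. This is an illustration only. The class XmitTargetPicker, the use of plain strings as member addresses and the fallback to the sender are assumptions, not NAKACK behaviour.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical sketch, not JGroups code: choose whom to ask for a missing message.
final class XmitTargetPicker {
    // Returns a random member other than ourselves and the original sender.
    // Falls back to the sender when no other member is available.
    static String pick(List<String> members, String self, String sender, Random rnd) {
        List<String> candidates = new ArrayList<>();
        for (String m : members) {
            if (!m.equals(self) && !m.equals(sender)) candidates.add(m);
        }
        if (candidates.isEmpty()) return sender;
        return candidates.get(rnd.nextInt(candidates.size()));
    }
}
```

With hundreds of receivers each picking independently, the retransmission requests for one lost message are spread over the whole view instead of converging on a single node. The chosen member may not have the message either, so a caller would need to retry with another pick.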
--
This message is automatically generated by JIRA.