Dan Berindei resolved ISPN-4480.
--------------------------------
Fix Version/s: 7.0.0.Beta1
Resolution: Done
JGroups 3.5.0.Beta9 includes fixes for the 2 issues.
Messages sent to leavers can clog the JGroups bundler thread
------------------------------------------------------------
Key: ISPN-4480
URL: https://issues.jboss.org/browse/ISPN-4480
Project: Infinispan
Issue Type: Bug
Security Level: Public (Everyone can see)
Components: Core
Affects Versions: 6.0.2.Final
Reporter: Dan Berindei
Assignee: Dan Berindei
Fix For: 7.0.0.Beta1
In a stress test that repeatedly kills nodes while performing read/write operations, the
TransferQueueBundler thread seems to spend a lot of time waiting for physical addresses:
{noformat}
06:40:10,316 WARN [org.radargun.utils.Utils] (pool-5-thread-1) Stack for thread TransferQueueBundler,default,apex953-14666:
java.lang.Thread.sleep(Native Method)
org.jgroups.util.Util.sleep(Util.java:1504)
org.jgroups.util.Util.sleepRandom(Util.java:1574)
org.jgroups.protocols.TP.sendToSingleMember(TP.java:1685)
org.jgroups.protocols.TP.doSend(TP.java:1670)
org.jgroups.protocols.TP$TransferQueueBundler.sendBundledMessages(TP.java:2476)
org.jgroups.protocols.TP$TransferQueueBundler.sendMessages(TP.java:2392)
org.jgroups.protocols.TP$TransferQueueBundler.run(TP.java:2383)
java.lang.Thread.run(Thread.java:744)
{noformat}
Two bugs related to this are already fixed in JGroups 3.5.0.Beta2+: JGRP-1814 and
JGRP-1815.
A third issue, JGRP-1858, covers a special case where the physical address could be
removed from the cache too soon, exacerbating the effect of JGRP-1815.
We can work around the problem by changing the JGroups configuration:
* TP.logical_addr_cache_expiration=86400000
** Only expire addresses after 1 day
* TP.physical_addr_max_fetch_attempts=1
** Sleep for only 20ms waiting for the physical address (the default is 3 attempts, 1500ms)
* UNICAST3.conn_close_timeout=10000
** Drop the pending messages to leavers sooner
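
As a rough sketch, the settings above might look like this in a JGroups XML stack file. The attribute names and values are the ones listed above. The UDP transport element and the elided protocols are assumptions for illustration: the TP attributes go on whichever concrete transport the stack uses (for example TCP instead of UDP), and the rest of the stack stays as it is.
{noformat}
<config xmlns="urn:org:jgroups">
    <!-- Transport (assumed UDP here): keep logical addresses cached for 1 day
         and try only once to fetch a missing physical address -->
    <UDP logical_addr_cache_expiration="86400000"
         physical_addr_max_fetch_attempts="1" />

    <!-- ... other protocols of the existing stack, unchanged ... -->

    <!-- Drop the pending messages to leavers sooner -->
    <UNICAST3 conn_close_timeout="10000" />
</config>
{noformat}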