[infinispan-issues] [JBoss JIRA] (ISPN-11407) XSite backup commands should be sent from a blocking thread
Pedro Ruivo (Jira)
issues at jboss.org
Mon Mar 2 12:34:22 EST 2020
Pedro Ruivo created ISPN-11407:
----------------------------------
Summary: XSite backup commands should be sent from a blocking thread
Key: ISPN-11407
URL: https://issues.redhat.com/browse/ISPN-11407
Project: Infinispan
Issue Type: Enhancement
Components: Core
Affects Versions: 9.4.18.Final, 10.1.2.Final, 11.0.0.Alpha1
Reporter: Dan Berindei
Assignee: Dan Berindei
Fix For: 9.4.19.Final
XSite backup commands usually need more processing on the receiving site than local cluster commands do on the receiving node, which means there's a much higher chance that {{channel.send(message)}} will block.
{{UFC}}, {{UFC_NB}}, {{MFC}} and {{MFC_NB}} all block when there are not enough credits.
The _NB variants have an additional queue as a safety net, but that only delays the blocking: it's the same as increasing {{max_credits}} by {{max_queue_size}}, except with less work for {{UNICAST3}}/{{NAKACK2}}.
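The equivalence can be illustrated with a toy model. This is only a sketch under simplifying assumptions: {{CreditModel}} and its methods are hypothetical names, not JGroups classes, all messages have the same size, and the receiver never replenishes credits. It counts how many sends return before the sender would have to block.

```java
// Toy model only: CreditModel is a hypothetical name, not a JGroups class.
// Assumes the receiver never sends credits back, so credits only decrease.
final class CreditModel {
    private long credits;            // remaining credits, in bytes
    private long queuedBytes;        // bytes parked in the _NB-style queue
    private final long maxQueueSize; // 0 models the plain (blocking) variants

    CreditModel(long maxCredits, long maxQueueSize) {
        this.credits = maxCredits;
        this.maxQueueSize = maxQueueSize;
    }

    // Returns true if the send would return immediately,
    // false if the real sender would have to block.
    boolean trySend(long size) {
        if (credits >= size) {
            credits -= size;         // enough credits: send right away
            return true;
        }
        if (queuedBytes + size <= maxQueueSize) {
            queuedBytes += size;     // no credits: park in the extra queue
            return true;
        }
        return false;                // no credits and queue full: block
    }

    // Counts the sends that complete before the first blocking one.
    static int sendsBeforeBlocking(long maxCredits, long maxQueueSize, long messageSize) {
        if (messageSize <= 0) {
            throw new IllegalArgumentException("messageSize must be positive");
        }
        CreditModel model = new CreditModel(maxCredits, maxQueueSize);
        int sends = 0;
        while (model.trySend(messageSize)) {
            sends++;
        }
        return sends;
    }
}
```

In this model, 2000 credits plus a 1000-byte queue accept the same number of 100-byte messages as 3000 credits with no queue; the queue only moves the point at which the sender blocks.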
{{TCP}} and {{UDP}} also block if their send buffer is full. Using a bundler like {{transfer-queue}} instead of the default {{no-bundler}} will only delay the blocking until the bundler's queue is also full.
The biggest problem arises when XSite backup commands are sent from a JGroups thread and {{channel.send(message)}} blocks that thread. If the JGroups thread pool becomes full, it cannot process more messages, not even responses from the remote site.
JGroups creates temporary threads to process internal messages when its thread pool is full, but not even that can help when the other nodes' thread pools are also full:
{noformat}
"jgroups-temp-thread-5728,_ma267mlvjdg015:dal_mcom_perf" #11443 prio=5 os_prio=0 tid=0x000000000906f800 nid=0x26cb waiting on condition [0x00007fb0b7b0a000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000005f3bce048> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at java.util.concurrent.ArrayBlockingQueue.put(ArrayBlockingQueue.java:353)
at org.jgroups.protocols.TransferQueueBundler.send(TransferQueueBundler.java:97)
at org.jgroups.protocols.TP.send(TP.java:1441)
at org.jgroups.protocols.TP._send(TP.java:1195)
at org.jgroups.protocols.TP.down(TP.java:1111)
...
at org.jgroups.protocols.FlowControl.sendCredit(FlowControl.java:480)
at org.jgroups.protocols.FlowControl.handleCreditRequest(FlowControl.java:469)
at org.jgroups.protocols.FlowControl.handleUpEvent(FlowControl.java:379)
at org.jgroups.protocols.FlowControl.up(FlowControl.java:350)
{noformat}
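The idea in the summary, sending from a blocking thread instead of the receiving thread, could look roughly like the sketch below. Everything here is an assumption for illustration: {{BlockingSendOffloader}}, {{sendBackupAsync}} and {{drainOne}} are hypothetical names, the bounded queue merely stands in for a full bundler queue, and none of this is the actual Infinispan or JGroups code.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: none of these names are Infinispan or JGroups APIs.
final class BlockingSendOffloader {
    // Stand-in for a bounded bundler queue: put() parks the caller when full.
    private final BlockingQueue<String> transportQueue;
    // Dedicated executor for calls that may block.
    private final ExecutorService blockingExecutor;

    BlockingSendOffloader(int queueCapacity, ExecutorService blockingExecutor) {
        this.transportQueue = new ArrayBlockingQueue<>(queueCapacity);
        this.blockingExecutor = blockingExecutor;
    }

    // Stand-in for channel.send(message): may block the calling thread.
    private void blockingSend(String message) throws InterruptedException {
        transportQueue.put(message);
    }

    // Safe to call from a message-processing thread: it returns immediately,
    // and any blocking happens on the dedicated executor instead.
    CompletableFuture<Void> sendBackupAsync(String message) {
        return CompletableFuture.runAsync(() -> {
            try {
                blockingSend(message);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while sending", e);
            }
        }, blockingExecutor);
    }

    // Stand-in for the transport taking one message off the queue.
    // Returns null if nothing arrives within one second.
    String drainOne() throws InterruptedException {
        return transportQueue.poll(1, TimeUnit.SECONDS);
    }
}
```

With this shape, a full queue stalls only the threads of {{blockingExecutor}}; the thread that received the original command stays free to process responses and credit messages.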