[infinispan-issues] [JBoss JIRA] (ISPN-11373) XSite backup commands should be sent from a blocking thread
Pedro Ruivo (Jira)
issues at jboss.org
Mon Mar 2 12:34:23 EST 2020
[ https://issues.redhat.com/browse/ISPN-11373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Pedro Ruivo updated ISPN-11373:
-------------------------------
Status: Resolved (was: Pull Request Sent)
Resolution: Done
> XSite backup commands should be sent from a blocking thread
> -----------------------------------------------------------
>
> Key: ISPN-11373
> URL: https://issues.redhat.com/browse/ISPN-11373
> Project: Infinispan
> Issue Type: Enhancement
> Components: Core
> Affects Versions: 9.4.18.Final, 10.1.2.Final, 11.0.0.Alpha1
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Priority: Major
> Fix For: 9.4.19.Final
>
>
> XSite backup commands usually need more processing on the receiving site than local cluster commands do on the receiving node, so {{channel.send(message)}} is much more likely to block.
> {{UFC}}, {{UFC_NB}}, {{MFC}} and {{MFC_NB}} all block when there are not enough credits.
> The _NB variants have an additional queue as a safety net, but that only delays the blocking: it's the same as increasing {{max_credits}} by {{max_queue_size}}, except with less work for {{UNICAST3}}/{{NAKACK2}}.
> {{TCP}} and {{UDP}} also block if their send buffer is full. Using a bundler like {{transfer-queue}} instead of the default {{no-bundler}} will only delay the blocking until the bundler's queue is also full.
> The biggest problem is when XSite backup commands are sent from a JGroups thread, and {{channel.send(message)}} blocks the thread. If the JGroups thread pool becomes full, it cannot process more messages, not even responses from the remote site.
> JGroups creates temporary threads to process internal messages when its thread pool is full, but not even that can help when the other nodes' thread pools are also full:
> {noformat}
> "jgroups-temp-thread-5728,_ma267mlvjdg015:dal_mcom_perf" #11443 prio=5 os_prio=0 tid=0x000000000906f800 nid=0x26cb waiting on condition [0x00007fb0b7b0a000]
> java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for <0x00000005f3bce048> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
> at java.util.concurrent.ArrayBlockingQueue.put(ArrayBlockingQueue.java:353)
> at org.jgroups.protocols.TransferQueueBundler.send(TransferQueueBundler.java:97)
> at org.jgroups.protocols.TP.send(TP.java:1441)
> at org.jgroups.protocols.TP._send(TP.java:1195)
> at org.jgroups.protocols.TP.down(TP.java:1111)
> ...
> at org.jgroups.protocols.FlowControl.sendCredit(FlowControl.java:480)
> at org.jgroups.protocols.FlowControl.handleCreditRequest(FlowControl.java:469)
> at org.jgroups.protocols.FlowControl.handleUpEvent(FlowControl.java:379)
> at org.jgroups.protocols.FlowControl.up(FlowControl.java:350)
> {noformat}
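A minimal sketch of the idea in the issue title, not the actual Infinispan change: a request-handling thread hands the possibly blocking send to a separate executor, so it stays free to process responses. Everything named here is hypothetical. {{sendFromBlockingThread}} is an illustrative helper, the single-thread {{requestPool}} stands in for the JGroups thread pool, {{blockingPool}} stands in for a blocking executor, and a {{CountDownLatch}} stands in for {{channel.send(message)}} waiting on credits.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class BlockingSendSketch {

    // Hypothetical helper: run a possibly blocking send on the given executor
    // instead of on the calling (request-handling) thread.
    static CompletableFuture<Void> sendFromBlockingThread(Runnable blockingSend, Executor blockingExecutor) {
        return CompletableFuture.runAsync(blockingSend, blockingExecutor);
    }

    // Returns true if the request pool could still process a "response"
    // while the offloaded send was blocked.
    static boolean responseProcessedWhileSendBlocked() {
        // Assumption: one thread models a request pool that is easy to exhaust.
        ExecutorService requestPool = Executors.newFixedThreadPool(1);
        // Assumption: a separate pool whose threads are allowed to block.
        ExecutorService blockingPool = Executors.newCachedThreadPool();
        CountDownLatch credits = new CountDownLatch(1);     // released when "credits" arrive
        CountDownLatch sendStarted = new CountDownLatch(1); // signals that the send is running
        try {
            // Stand-in for channel.send(message): blocks until credits are released.
            Runnable blockingSend = () -> {
                sendStarted.countDown();
                try {
                    credits.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            };

            // The request thread only hands off the send and returns immediately.
            CompletableFuture<Void> send = CompletableFuture
                    .supplyAsync(() -> sendFromBlockingThread(blockingSend, blockingPool), requestPool)
                    .get(5, TimeUnit.SECONDS);
            boolean started = sendStarted.await(5, TimeUnit.SECONDS);

            // The only request thread is free again, so it can process a response.
            String response = CompletableFuture
                    .supplyAsync(() -> "response", requestPool)
                    .get(5, TimeUnit.SECONDS);
            boolean stillBlocked = !send.isDone();

            credits.countDown();             // credits arrive, the send can finish
            send.get(5, TimeUnit.SECONDS);
            return started && stillBlocked && "response".equals(response);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            requestPool.shutdownNow();
            blockingPool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(responseProcessedWhileSendBlocked());
    }
}
```

If {{blockingSend}} ran directly on the request thread instead, the single request thread would stay parked in the send and the response could not be processed until credits arrived, which is the starvation described above.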
--
This message was sent by Atlassian Jira
(v7.13.8#713008)