[JBoss JIRA] (JGRP-1675) Threads stuck in FlowControl.decrementIfEnoughCredits
by Sebastiano Vigna (JIRA)
[ https://issues.jboss.org/browse/JGRP-1675?page=com.atlassian.jira.plugin.... ]
Sebastiano Vigna commented on JGRP-1675:
----------------------------------------
In case someone can offer a suggestion, here is our configuration:
<config xmlns="urn:org:jgroups"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/JGroups-3.3.xsd">
    <UDP
         mcast_port="${jgroups.udp.mcast_port:45588}"
         tos="8"
         ucast_recv_buf_size="5M"
         ucast_send_buf_size="640K"
         mcast_recv_buf_size="5M"
         mcast_send_buf_size="640K"
         loopback="true"
         max_bundle_size="64K"
         max_bundle_timeout="30"
         ip_ttl="${jgroups.udp.ip_ttl:8}"
         enable_diagnostics="true"
         thread_naming_pattern="cl"
         timer_type="new3"
         timer.min_threads="4"
         timer.max_threads="10"
         timer.keep_alive_time="3000"
         timer.queue_max_size="500"
         thread_pool.enabled="true"
         thread_pool.min_threads="2"
         thread_pool.max_threads="8"
         thread_pool.keep_alive_time="5000"
         thread_pool.queue_enabled="true"
         thread_pool.queue_max_size="10000"
         thread_pool.rejection_policy="discard"
         oob_thread_pool.enabled="true"
         oob_thread_pool.min_threads="1"
         oob_thread_pool.max_threads="8"
         oob_thread_pool.keep_alive_time="5000"
         oob_thread_pool.queue_enabled="false"
         oob_thread_pool.queue_max_size="100"
         oob_thread_pool.rejection_policy="discard"/>
    <PING timeout="2000"
          num_initial_members="20"/>
    <MERGE3 max_interval="30000"
            min_interval="10000"/>
    <FD_SOCK/>
    <FD_ALL timeout="120000"
            interval="10000"
    />
    <VERIFY_SUSPECT timeout="1500" />
    <BARRIER />
    <pbcast.NAKACK2 xmit_interval="500"
                    xmit_table_num_rows="100"
                    xmit_table_msgs_per_row="2000"
                    xmit_table_max_compaction_time="30000"
                    max_msg_batch_size="500"
                    use_mcast_xmit="false"
                    discard_delivered_msgs="true"/>
    <UNICAST3 xmit_interval="500"
              xmit_table_num_rows="100"
              xmit_table_msgs_per_row="2000"
              xmit_table_max_compaction_time="60000"
              conn_expiry_timeout="0"
              max_msg_batch_size="500"/>
    <pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000"
                   max_bytes="4M"/>
    <pbcast.GMS print_local_addr="true" join_timeout="3000"
                view_bundling="true"/>
    <UFC max_credits="2M"
         min_threshold="0.4"/>
    <MFC max_credits="2M"
         min_threshold="0.4"/>
    <FRAG2 frag_size="60K" />
    <RSVP resend_interval="2000" timeout="10000"/>
</config>
> Threads stuck in FlowControl.decrementIfEnoughCredits
> -----------------------------------------------------
>
> Key: JGRP-1675
> URL: https://issues.jboss.org/browse/JGRP-1675
> Project: JGroups
> Issue Type: Bug
> Affects Versions: 3.4
> Reporter: Radim Vansa
> Assignee: Bela Ban
> Fix For: 3.5
>
> Attachments: jgroups-udp-radim.xml, RemoteGetStressTest.java, UPerf2.java
>
>
> I have recently observed a repeated situation where many (or all) threads were stuck waiting for credits in the FlowControl protocol.
> The credit request was not handled on the other node because it is a non-OOB message and some messages sent before it (actually many of them, cause unknown) had been lost, so the request was waiting for them to be retransmitted.
> However, they were not retransmitted because the retransmission request was not received: all OOB threads were stuck in the FlowControl protocol, since they had handled some other request and tried to send a response, and the response could not be sent until FlowControl got credits.
> The probability of this situation could be lowered by tagging the credit request as OOB, so that it is handled immediately. If the credit replenishment message were then processed in the regular OOB pool, that pool could already be depleted by many requests, but setting up the internal thread pool would solve the problem.
> Another option would be to allow releasing a thread from FlowControl (letting it send the message even without credits) if it has been waiting there for too long.
> h3. Workaround
> It appears that setting MFC and UFC max_credits to 10M, or removing these protocols altogether, works around this issue.
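The blocking described above can be modeled with a small credit counter: a sender takes credits before sending and blocks while it has too few, and only a replenishment from the receiver wakes it up. The following is a minimal Java sketch under that assumption, not the JGroups FlowControl code; the class CreditGate and the parameters maxWaitMillis and sendAnyway are hypothetical, the latter modeling the suggestion to release a thread that has waited too long.

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical credit gate: a simplified model of credit-based flow control.
 * NOT the JGroups FlowControl implementation; all names are illustrative.
 */
final class CreditGate {
    private final long maxCredits;
    private long credits;

    CreditGate(long maxCredits) {
        this.maxCredits = maxCredits;
        this.credits = maxCredits;
    }

    /**
     * Blocks until 'bytes' credits are available or 'maxWaitMillis' elapses.
     * Returns true if the caller may send. On timeout it returns 'sendAnyway':
     * true models "release the thread and send without credits", false models giving up.
     */
    synchronized boolean decrementIfEnoughCredits(long bytes, long maxWaitMillis, boolean sendAnyway) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(maxWaitMillis);
        while (credits < bytes) {
            long remaining = deadline - System.nanoTime();
            if (remaining <= 0) {
                return sendAnyway; // no credits are deducted in either case
            }
            try {
                // Releases the monitor while waiting; a thread dump shows TIMED_WAITING here.
                TimeUnit.NANOSECONDS.timedWait(this, remaining);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
        credits -= bytes;
        return true;
    }

    /** Invoked when a credit replenishment from the receiver is processed. */
    synchronized void replenish(long bytes) {
        credits = Math.min(maxCredits, credits + bytes);
        notifyAll(); // wake blocked senders so they re-check their credits
    }

    synchronized long available() {
        return credits;
    }
}
```

In the reported scenario the replenishment is never sent, because the credit request is queued behind lost messages whose retransmission cannot proceed; in this model replenish() is then never called, and a sender without a bounded wait stays blocked.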
--
This message was sent by Atlassian JIRA
(v6.3.11#6341)
[JBoss JIRA] (JGRP-1675) Threads stuck in FlowControl.decrementIfEnoughCredits
by Sebastiano Vigna (JIRA)
[ https://issues.jboss.org/browse/JGRP-1675?page=com.atlassian.jira.plugin.... ]
Sebastiano Vigna commented on JGRP-1675:
----------------------------------------
This bug is still present in 3.6.2.Final. All our threads got stuck like this one:
"ParsingThread-63" #166 prio=3 os_prio=0 tid=0x00007f8dfd225000 nid=0x38a4 in Object.wait() [0x00007f8e01d7e000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at org.jgroups.protocols.FlowControl$Credit.decrementIfEnoughCredits(FlowControl.java:582)
- locked <0x000000070704e0d8> (a org.jgroups.protocols.FlowControl$Credit)
at org.jgroups.protocols.UFC.handleDownMessage(UFC.java:126)
at org.jgroups.protocols.FlowControl.down(FlowControl.java:330)
at org.jgroups.protocols.FlowControl.down(FlowControl.java:353)
at org.jgroups.protocols.FRAG2.down(FRAG2.java:136)
at org.jgroups.protocols.RSVP.down(RSVP.java:153)
What is the official recommendation? Should we remove the UFC protocol?
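The dump shows the thread in TIMED_WAITING inside FlowControl$Credit.decrementIfEnoughCredits, called from UFC.handleDownMessage. To check from inside a running JVM how many threads are parked there, one could scan stack traces with the standard java.lang.management API. A minimal sketch: StuckThreadScanner and threadsIn are hypothetical names, and matching on the exact frame name assumes the JGroups version that produced this trace.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical diagnostic helper: lists live threads whose stack contains a given method. */
final class StuckThreadScanner {
    private StuckThreadScanner() {}

    static List<String> threadsIn(String className, String methodName) {
        List<String> matches = new ArrayList<>();
        // false, false: skip locked monitor/synchronizer details, only stack traces are needed
        for (ThreadInfo info : ManagementFactory.getThreadMXBean().dumpAllThreads(false, false)) {
            if (info == null) {
                continue;
            }
            for (StackTraceElement frame : info.getStackTrace()) {
                if (frame.getClassName().equals(className) && frame.getMethodName().equals(methodName)) {
                    matches.add(info.getThreadName() + " (" + info.getThreadState() + ")");
                    break; // one match per thread is enough
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Frame taken from the JGRP-1675 dump; nested classes appear with '$' in stack frames.
        for (String t : threadsIn("org.jgroups.protocols.FlowControl$Credit", "decrementIfEnoughCredits")) {
            System.out.println(t);
        }
    }
}
```

Running the scan periodically and flagging a list that stays non-empty would help spot the hang before it is noticed as a stalled cluster.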
--
[JBoss JIRA] (WFLY-4435) Mixed domain tests fail on infinispan/jgroups subsystems
by Tomaz Cerar (JIRA)
[ https://issues.jboss.org/browse/WFLY-4435?page=com.atlassian.jira.plugin.... ]
Tomaz Cerar commented on WFLY-4435:
-----------------------------------
Leave this be for the time being; mixed domain mode tests are going to be completely changed in the next few weeks.
Also, transformation to 7.1.x and 7.2.x is being dropped.
> Mixed domain tests fail on infinispan/jgroups subsystems
> --------------------------------------------------------
>
> Key: WFLY-4435
> URL: https://issues.jboss.org/browse/WFLY-4435
> Project: WildFly
> Issue Type: Bug
> Affects Versions: 9.0.0.Alpha1
> Reporter: Radoslav Husar
> Assignee: Radoslav Husar
>
> The same problem happens when attempting to use the 2_0 version of these subsystems in the current configuration.
> {noformat}
> [Host Controller] 15:33:41,155 INFO [org.jboss.as.controller.management-deprecated] (Controller Boot Thread) WFLYCTL0028: Attribute enabled is deprecated, and it might be removed in future version!
> [Host Controller] 15:33:41,161 INFO [org.jboss.as.controller.management-deprecated] (Controller Boot Thread) WFLYCTL0028: Attribute default-clustered-sfsb-cache is deprecated, and it might be removed in future version!
> [Host Controller] 15:33:41,174 ERROR [org.jboss.as.controller.management-operation] (Controller Boot Thread) WFLYCTL0013: Operation ("add") failed - address: ([
> [Host Controller]     ("profile" => "full-ha"),
> [Host Controller]     ("subsystem" => "jgroups"),
> [Host Controller]     ("channel" => "server")
> [Host Controller] ]) - failure description: "WFLYCTL0175: Resource [
> [Host Controller]     (\"profile\" => \"full-ha\"),
> [Host Controller]     (\"subsystem\" => \"jgroups\")
> [Host Controller] ] does not exist; a resource at address [
> [Host Controller]     (\"profile\" => \"full-ha\"),
> [Host Controller]     (\"subsystem\" => \"jgroups\"),
> [Host Controller]     (\"channel\" => \"server\")
> [Host Controller] ] cannot be created until all ancestor resources have been added"
> [Host Controller] 15:33:41,183 FATAL [org.jboss.as.host.controller] (Controller Boot Thread) WFLYHC0034: Host Controller boot has failed in an unrecoverable manner; exiting. See previous messages for details.
> {noformat}
--
[JBoss JIRA] (WFLY-4435) Mixed domain tests fail on infinispan/jgroups subsystems
by Radoslav Husar (JIRA)
Radoslav Husar created WFLY-4435:
------------------------------------
Summary: Mixed domain tests fail on infinispan/jgroups subsystems
Key: WFLY-4435
URL: https://issues.jboss.org/browse/WFLY-4435
Project: WildFly
Issue Type: Bug
Affects Versions: 9.0.0.Alpha1
Reporter: Radoslav Husar
Assignee: Jason Greene
The same problem happens when attempting to use the 2_0 version of these subsystems in the current configuration.
{noformat}
[Host Controller] 15:33:41,155 INFO [org.jboss.as.controller.management-deprecated] (Controller Boot Thread) WFLYCTL0028: Attribute enabled is deprecated, and it might be removed in future version!
[Host Controller] 15:33:41,161 INFO [org.jboss.as.controller.management-deprecated] (Controller Boot Thread) WFLYCTL0028: Attribute default-clustered-sfsb-cache is deprecated, and it might be removed in future version!
[Host Controller] 15:33:41,174 ERROR [org.jboss.as.controller.management-operation] (Controller Boot Thread) WFLYCTL0013: Operation ("add") failed - address: ([
[Host Controller]     ("profile" => "full-ha"),
[Host Controller]     ("subsystem" => "jgroups"),
[Host Controller]     ("channel" => "server")
[Host Controller] ]) - failure description: "WFLYCTL0175: Resource [
[Host Controller]     (\"profile\" => \"full-ha\"),
[Host Controller]     (\"subsystem\" => \"jgroups\")
[Host Controller] ] does not exist; a resource at address [
[Host Controller]     (\"profile\" => \"full-ha\"),
[Host Controller]     (\"subsystem\" => \"jgroups\"),
[Host Controller]     (\"channel\" => \"server\")
[Host Controller] ] cannot be created until all ancestor resources have been added"
[Host Controller] 15:33:41,183 FATAL [org.jboss.as.host.controller] (Controller Boot Thread) WFLYHC0034: Host Controller boot has failed in an unrecoverable manner; exiting. See previous messages for details.
{noformat}
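The WFLYCTL0175 failure states the rule being violated: the add for [profile=full-ha, subsystem=jgroups, channel=server] is rejected because its ancestor [profile=full-ha, subsystem=jgroups] does not exist. That rule amounts to a prefix check on the resource address. The following Java toy model illustrates only that check; it is not WildFly's controller code, and ResourceTree, missingAncestor and the "key=value" string encoding of address elements are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Toy model (not WildFly code): a resource can be added only if every
 * proper prefix of its address (every ancestor) already exists.
 */
final class ResourceTree {
    private final Set<List<String>> existing = new HashSet<>();

    /** Returns the first missing ancestor address, or null if all ancestors exist. */
    List<String> missingAncestor(List<String> address) {
        for (int i = 1; i < address.size(); i++) {
            List<String> prefix = new ArrayList<>(address.subList(0, i));
            if (!existing.contains(prefix)) {
                return prefix;
            }
        }
        return null; // top-level addresses have no ancestors to check
    }

    /** Mimics an "add" operation: rejected while an ancestor is missing. */
    void add(List<String> address) {
        List<String> missing = missingAncestor(address);
        if (missing != null) {
            throw new IllegalStateException("Resource " + missing + " does not exist; a resource at address "
                    + address + " cannot be created until all ancestor resources have been added");
        }
        existing.add(new ArrayList<>(address));
    }
}
```

In these terms the boot failure corresponds to adding the channel=server resource before the subsystem=jgroups resource in profile full-ha.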
--
[JBoss JIRA] (WFLY-4435) Mixed domain tests fail on infinispan/jgroups subsystems
by Radoslav Husar (JIRA)
[ https://issues.jboss.org/browse/WFLY-4435?page=com.atlassian.jira.plugin.... ]
Radoslav Husar reassigned WFLY-4435:
------------------------------------
Assignee: Radoslav Husar (was: Jason Greene)
--