[jboss-jira] [JBoss JIRA] (WFLY-10047) OOM caused by jgroups objects UNICAST3$SenderEntry#1
Erich Duda (JIRA)
issues at jboss.org
Mon Mar 19 08:52:00 EDT 2018
[ https://issues.jboss.org/browse/WFLY-10047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13547775#comment-13547775 ]
Erich Duda edited comment on WFLY-10047 at 3/19/18 8:51 AM:
------------------------------------------------------------
In the log I can see several messages \[1\] before the OOM happened. I suspect they are expected, since node-2 was killed.
Both servers are running on a single host and communicate over localhost. Do you know what could have caused these warnings to be logged?
\[1\]
{code}
13:31:33,794 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,node-2) JGRP000032: node-2: no physical address for bb5fc2e1-deb1-30b6-0f2d-90a2b5239c6c, dropping message
13:31:35,804 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,node-2) JGRP000032: node-2: no physical address for bb5fc2e1-deb1-30b6-0f2d-90a2b5239c6c, dropping message
13:31:37,804 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,node-2) JGRP000032: node-2: no physical address for bb5fc2e1-deb1-30b6-0f2d-90a2b5239c6c, dropping message
13:31:39,804 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,node-2) JGRP000032: node-2: no physical address for bb5fc2e1-deb1-30b6-0f2d-90a2b5239c6c, dropping message
13:31:41,804 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,node-2) JGRP000032: node-2: no physical address for bb5fc2e1-deb1-30b6-0f2d-90a2b5239c6c, dropping message
13:31:43,804 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,node-2) JGRP000032: node-2: no physical address for bb5fc2e1-deb1-30b6-0f2d-90a2b5239c6c, dropping message
{code}
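When triaging a longer log, a small script can show how many of these warnings were emitted and for which unresolved address. The following is only a sketch: the regular expression is derived from the six lines in the excerpt above, and the function name `count_dropped_by_target` is hypothetical, not part of JGroups or WildFly.

```python
import re
from collections import Counter

# Pattern modelled on the JGRP000032 lines in the excerpt above (an assumption:
# other JGroups versions or log formats may word the message differently).
WARN_RE = re.compile(
    r"WARN\s+\[org\.jgroups\.protocols\.\w+\].*?"
    r"JGRP000032: (?P<local>[^:\s]+): "
    r"no physical address for (?P<target>[^,\s]+), dropping message"
)


def count_dropped_by_target(lines):
    """Count JGRP000032 warnings per (local node, unresolved target address)."""
    counts = Counter()
    for line in lines:
        match = WARN_RE.search(line)
        if match:
            counts[(match.group("local"), match.group("target"))] += 1
    return counts


# Two lines copied from the excerpt, plus one unrelated line that is ignored.
sample = [
    "13:31:33,794 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,node-2) "
    "JGRP000032: node-2: no physical address for "
    "bb5fc2e1-deb1-30b6-0f2d-90a2b5239c6c, dropping message",
    "13:31:35,804 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,node-2) "
    "JGRP000032: node-2: no physical address for "
    "bb5fc2e1-deb1-30b6-0f2d-90a2b5239c6c, dropping message",
    "13:31:36,000 INFO some unrelated line",
]

result = count_dropped_by_target(sample)
```

If a single target address accounts for all warnings, as in the excerpt, that suggests messages are still being addressed to one member whose physical address is no longer known.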
bq. The analysis is very likely incorrect – the SenderEntry are entries kept for retransmit – none of this is indicative of a leak (note that this size can be adjusted with xmit_table_msgs_per_row and xmit_table_num_rows).
If the OOM was caused by the "cache" for retransmits, shouldn't the default size of that cache be lowered so that it cannot cause an OOM?
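To put the two settings from the quoted remark in perspective, here is a back-of-the-envelope sketch that multiplies them out. It assumes the retransmit table starts as `xmit_table_num_rows` rows of `xmit_table_msgs_per_row` slots each. The values 100 and 1024 and the 1 KiB average message size are illustrative assumptions, not figures taken from this issue, and the helper names are hypothetical.

```python
def xmit_table_slots(num_rows, msgs_per_row):
    """Number of message slots in a table of num_rows x msgs_per_row."""
    return num_rows * msgs_per_row


def estimated_payload_bytes(num_rows, msgs_per_row, avg_msg_bytes, peers=1):
    """Rough payload held if every slot is full, for `peers` unicast peers.

    Ignores per-object overhead; avg_msg_bytes is an assumed average.
    """
    return xmit_table_slots(num_rows, msgs_per_row) * avg_msg_bytes * peers


# Illustrative values only (assumptions, not measured in this issue).
slots = xmit_table_slots(100, 1024)                     # 102400 slots
payload = estimated_payload_bytes(100, 1024, 1024)      # 104857600 bytes
payload_mib = payload / (1024 * 1024)                   # 100.0 MiB
```

This ignores per-object overhead, and the table may be resized at runtime, so the result is a rough initial figure and not an upper bound.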
> OOM caused by jgroups objects UNICAST3$SenderEntry#1
> ----------------------------------------------------
>
> Key: WFLY-10047
> URL: https://issues.jboss.org/browse/WFLY-10047
> Project: WildFly
> Issue Type: Bug
> Components: Clustering
> Affects Versions: 13.0.0.Beta1
> Reporter: Erich Duda
> Assignee: Paul Ferraro
> Priority: Blocker
> Attachments: heapdump.png
>
>
> JGroups objects UNICAST3$SenderEntry#1 caused an OOM on the WildFly server during boot. See the attached picture. !heapdump.png|thumbnail!
> *User impact:* If users use JGroups for clustering, the server may hit an OOM, which can cause undefined behavior.
> The *blocker* priority was set because this is a regression against previous versions of WildFly and the OOM is a serious error that prevents the server from working properly.
> The issue was hit in the following scenario.
> # start two servers (nodes) in a cluster with one queue
> # producer starts sending messages to the queue on node-1
> # node-2 is killed and restarted while messages are being sent <---- *The test failed here, when node-2 was started again after it had been killed.*
> # start a consumer on node-2 that reads messages from the queue
> # servers are stopped
> WildFly was built from the following source code:
> repo: https://github.com/jmesnil/wildfly
> branch: WFLY-9407_upgrade_artemis_2.5.0
> commit SHA: 06c878a313d3cad323889d017e60fd5533204d1a
> JGroups version: 4.0.10.Final
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)