[jboss-jira] [JBoss JIRA] (WFLY-10047) OOM caused by jgroups objects UNICAST3$SenderEntry#1

Radoslav Husar (JIRA) issues at jboss.org
Mon Mar 19 07:54:00 EDT 2018


    [ https://issues.jboss.org/browse/WFLY-10047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13547743#comment-13547743 ] 

Radoslav Husar commented on WFLY-10047:
---------------------------------------

The analysis is very likely incorrect -- the SenderEntry objects hold messages kept for retransmission, so none of this is indicative of a leak (note that the size of this table can be adjusted with xmit_table_msgs_per_row and xmit_table_num_rows).

Nevertheless, the real question is why there are so many retransmits. While this might be a bug, the typical cause is host misconfiguration -- can you check for dropped packets? Also, the test machines are clearly not configured correctly, as seen in the logs:

{noformat}13:29:53,746 WARN  [org.jgroups.protocols.UDP] (ServerService Thread Pool -- 75) JGRP000015: the receive buffer of socket MulticastSocket was set to 20.00MB, but the OS only allocated 212.99KB. This might lead to performance problems. Please set your max receive buffer in the OS correctly (e.g. net.core.rmem_max on Linux){noformat}
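
To make the two checks concrete (dropped packets and the OS receive buffer limit), here is a minimal sketch for a Linux host. It assumes the standard /proc/net/snmp and /proc/sys/net/core/rmem_max files are readable; the function names and the 20,000,000-byte figure (an approximation of the "20.00MB" in the warning) are illustrative only and are not part of JGroups or WildFly.

```python
from pathlib import Path

# Approximation of the "20.00MB" requested in the JGRP000015 warning (assumption).
REQUESTED_BYTES = 20_000_000


def parse_udp_counters(snmp_text):
    """Return the Udp counters from /proc/net/snmp content as a dict of ints."""
    # The file holds two "Udp:" lines: a header with names, then the values.
    udp_lines = [line.split() for line in snmp_text.splitlines()
                 if line.startswith("Udp:")]
    if len(udp_lines) < 2:
        return {}
    header, values = udp_lines[0], udp_lines[1]
    return {name: int(value) for name, value in zip(header[1:], values[1:])}


def read_rmem_max(path="/proc/sys/net/core/rmem_max"):
    """Return the OS-wide maximum socket receive buffer size in bytes."""
    return int(Path(path).read_text().strip())


if __name__ == "__main__":
    counters = parse_udp_counters(Path("/proc/net/snmp").read_text())
    # Non-zero and growing values here suggest datagrams are being dropped.
    print("UDP InErrors:    ", counters.get("InErrors"))
    print("UDP RcvbufErrors:", counters.get("RcvbufErrors"))

    rmem_max = read_rmem_max()
    print("net.core.rmem_max:", rmem_max)
    if rmem_max < REQUESTED_BYTES:
        print("rmem_max is below the requested receive buffer size")
```

If RcvbufErrors keeps growing while the test runs, the receive buffer is probably too small for the traffic, which would be consistent with the warning above and with the high number of retransmits.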

> OOM caused by jgroups objects UNICAST3$SenderEntry#1
> ----------------------------------------------------
>
>                 Key: WFLY-10047
>                 URL: https://issues.jboss.org/browse/WFLY-10047
>             Project: WildFly
>          Issue Type: Bug
>          Components: Clustering
>    Affects Versions: 13.0.0.Beta1
>            Reporter: Erich Duda
>            Assignee: Paul Ferraro
>            Priority: Blocker
>         Attachments: heapdump.png
>
>
> JGroups objects UNICAST3$SenderEntry#1 caused an OOM on the WildFly server during boot. See the attached picture.  !heapdump.png|thumbnail! 
> *User impact:* If users use JGroups for clustering, the server may hit an OOM, which can cause undefined behavior.
> The *blocker* priority was set because this is a regression against previous versions of WildFly and the OOM is a serious error that prevents the server from working properly.
> The issue was hit in the following scenario.
> # start two servers (nodes) in cluster with one queue
> # producer starts to send messages to queue to node-1
> # node-2 is killed and restarted while messages are being sent <---- *Here the test failed, when node-2 was started again after it had been killed.*
> # start consumer on node-2 which reads messages from queue
> # servers are stopped
> WildFly was built from the following source code:
> repo: https://github.com/jmesnil/wildfly
> branch: WFLY-9407_upgrade_artemis_2.5.0
> commit SHA: 06c878a313d3cad323889d017e60fd5533204d1a
> JGroups version: 4.0.10.Final


