[jboss-jira] [JBoss JIRA] (WFLY-6583) Session leak on SmartOS hosts

Wed May 4 12:47:00 EDT 2016

    [ https://issues.jboss.org/browse/WFLY-6583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13201130#comment-13201130 ] 

Michael Noack edited comment on WFLY-6583 at 5/4/16 12:46 PM:
--------------------------------------------------------------

[~pferraro] I certainly understand why this could lead to wrong assumptions on a live system. However, on my test system, where I control every single request made to the system and thus every session ever created, I would expect this management operation to return zero once the system has been sitting without requests for longer than the configured session timeout.

And this is indeed what I see on KVM-virtualized CentOS instances!

The very same configuration never returns to zero sessions on said SmartOS instances, or on virtual CentOS instances using a BrandZ kernel. I can also see the live system running out of memory (OOM) every few days if it is not restarted daily. So those sessions do indeed still hang around and take up memory.
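The expectation described above can be written down as a small check. This is a hypothetical helper, not part of WildFly: `IdleSessionCheck`, `unexplainedSessions` and the `graceMillis` slack are illustrative names and assumptions, and the active-session count is assumed to come from whichever management operation is being queried. The grace period is included because a container may expire sessions from a background task rather than at the exact instant the timeout elapses.

```java
/**
 * Hypothetical helper (not part of WildFly): once a server has received no
 * requests for longer than the session timeout, the active-session count
 * should have dropped to zero, so anything still reported is unexplained.
 */
public final class IdleSessionCheck {

    private IdleSessionCheck() {}

    /**
     * @param lastRequestMillis      time of the last request the server received
     * @param nowMillis              time at which the active-session count was read
     * @param sessionTimeoutMillis   configured session timeout
     * @param graceMillis            assumed slack for asynchronous expiration
     * @param reportedActiveSessions count reported by the management operation
     * @return sessions that should already have expired, or 0 if the idle
     *         period is still too short to draw a conclusion
     */
    public static int unexplainedSessions(long lastRequestMillis, long nowMillis,
                                          long sessionTimeoutMillis, long graceMillis,
                                          int reportedActiveSessions) {
        long idle = nowMillis - lastRequestMillis;
        if (idle <= sessionTimeoutMillis + graceMillis) {
            return 0; // sessions may legitimately still be alive
        }
        return reportedActiveSessions; // every remaining session is unexplained
    }

    public static void main(String[] args) {
        long minute = 60_000L;
        // Idle for 45 minutes, 30-minute timeout, 5 minutes of slack, 37 sessions reported.
        System.out.println(unexplainedSessions(0, 45 * minute, 30 * minute, 5 * minute, 37)); // prints 37
    }
}
```

On the KVM instances this check would report zero after an idle period; on the affected instances it would report the leaked sessions.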

At first I thought we might be hitting a bug in one of the replication libraries, which is why I resumed testing with wildfly-10.0.0.Final. But since I can reproduce the issue with an application as simple as this one: https://github.com/liweinan/cluster-demo, I no longer believe this. I've since been able to link the issue to infrastructure instances on Joyent's platform and can reproduce it reliably. KVM instances are free of this session leak. If you want, you can provide me with a public SSH key and I can give you access to such a CentOS 7 test instance with a BrandZ kernel so you can take a look for yourself.

I'm about to migrate our live system to said KVM instances because of this, despite their being twice as expensive, since this is the only workaround I've found so far. I'm not complaining here, and I'm well aware that this is very suspicious: the OS kernel shouldn't be able to affect the session handling of an application running on top of a JVM. I'm merely documenting this here for reference in case others see similar issues.

> Session leak on SmartOS hosts
> -----------------------------
>
>                 Key: WFLY-6583
>                 URL: https://issues.jboss.org/browse/WFLY-6583
>             Project: WildFly
>          Issue Type: Bug
>          Components: Clustering
>    Affects Versions: 8.2.0.Final, 9.0.0.Final, 10.0.0.Final
>         Environment: CentOS 7 or SmartOS instance using Joyent's Infrastructure/Bare metal container.
> [root@979638eb-b45c-45b3-9fdb-d7f48276e4ef /]# java -version
> java version "1.8.0_77"
> Java(TM) SE Runtime Environment (build 1.8.0_77-b03)
> Java HotSpot(TM) 64-Bit Server VM (build 25.77-b03, mixed mode)
> [root@979638eb-b45c-45b3-9fdb-d7f48276e4ef /]# uname -a
> Linux 979638eb-b45c-45b3-9fdb-d7f48276e4ef 3.10.0 BrandZ virtual linux x86_64 x86_64 x86_64 GNU/Linux
> [root@979638eb-b45c-45b3-9fdb-d7f48276e4ef /]# cat /etc/issue
> \S
> Kernel \r on an \m
>            Reporter: Michael Noack
>            Assignee: Paul Ferraro
>            Priority: Minor
>
> When running WildFly 8.2.0.Final, 9.0.0.Final or 10.0.0.Final in domain mode using the full-ha profile, some sessions never get closed when running on SmartOS or on a BrandZ kernel on SmartOS. The number of unclosed sessions rises slowly: with one session created per second on each server, roughly 30-50 sessions are left unclosed on each server. I've been tracking this issue for almost a year now and at first handled it by restarting the entire cluster. It took me a while to connect the dots here.
> When registering an HttpSessionListener and logging every sessionCreated(HttpSessionEvent se) and sessionDestroyed(HttpSessionEvent se) call, one can clearly see that some sessions never generate the sessionDestroyed event.
> The problem disappears when running the very same setup on a KVM instance of CentOS 6 or 7 (regardless whether the KVM host is SmartOS or Linux).
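The listener-based diagnosis described in the issue can be taken one step further by recording which sessions are still outstanding, rather than only logging events. The sketch below is hypothetical: `SessionLeakTracker` and its methods are illustrative names, not part of WildFly or the Servlet API, and the class is kept free of the servlet dependency so it compiles on a plain JDK. In a deployment, `created` and `destroyed` would be called from `HttpSessionListener.sessionCreated` and `sessionDestroyed`, passing `se.getSession().getId()` and `se.getSession().getCreationTime()`. The `accessed` method assumes some per-request hook, such as a servlet `Filter`, because the listener itself does not see session accesses.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical helper: lists sessions that outlived the timeout without a destroy event. */
public class SessionLeakTracker {
    // session id -> time (ms) of the most recent activity we know about
    private final Map<String, Long> lastSeen = new ConcurrentHashMap<>();

    /** Would be called from HttpSessionListener.sessionCreated. */
    public void created(String id, long nowMillis) {
        lastSeen.put(id, nowMillis);
    }

    /** Assumed per-request hook (e.g. a Filter); refreshes only sessions we know about. */
    public void accessed(String id, long nowMillis) {
        lastSeen.computeIfPresent(id, (k, v) -> nowMillis);
    }

    /** Would be called from HttpSessionListener.sessionDestroyed. */
    public void destroyed(String id) {
        lastSeen.remove(id);
    }

    /** Sessions created but not yet destroyed. */
    public int activeCount() {
        return lastSeen.size();
    }

    /** Sessions idle for longer than timeoutMillis that never produced a destroy event. */
    public Set<String> suspectedLeaks(long nowMillis, long timeoutMillis) {
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, Long> e : lastSeen.entrySet()) {
            if (nowMillis - e.getValue() > timeoutMillis) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        SessionLeakTracker tracker = new SessionLeakTracker();
        long timeout = 30 * 60 * 1000L; // example 30-minute session timeout
        tracker.created("a", 0);
        tracker.created("b", 1_000);
        tracker.destroyed("a"); // "a" expired normally
        // Two timeouts later with no requests, "b" should have been destroyed too.
        long later = 2 * timeout;
        System.out.println(tracker.activeCount() + " " + tracker.suspectedLeaks(later, timeout)); // prints "1 [b]"
    }
}
```

After the system has been idle for longer than the session timeout, a non-empty `suspectedLeaks` result names the sessions that never produced a `sessionDestroyed` event, which makes the leak easier to correlate with server logs than a bare count.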

--
This message was sent by Atlassian JIRA
(v6.4.11#64026)