[JBoss JIRA] (WFLY-7322) LDAP referrals do not work in Elytron ldap-realm
by Jan Kalina (JIRA)
[ https://issues.jboss.org/browse/WFLY-7322?page=com.atlassian.jira.plugin.... ]
Jan Kalina reopened WFLY-7322:
------------------------------
> LDAP referrals do not work in Elytron ldap-realm
> ------------------------------------------------
>
> Key: WFLY-7322
> URL: https://issues.jboss.org/browse/WFLY-7322
> Project: WildFly
> Issue Type: Bug
> Components: Security
> Reporter: Ondrej Lukas
> Assignee: Jan Kalina
> Priority: Blocker
> Fix For: 11.0.0.Alpha1
>
>
> LDAP referrals cannot be used in the Elytron {{ldap-realm}}. The LDAP realm is currently not prepared to work with referrals at all:
> * {{ldap-realm}} does not include any options that enable working with LDAP referrals (PicketBox uses the {{baseFilter}} option, which can be configured to also return referral objects)
> * the implementation of {{org.wildfly.security.auth.realm.ldap.LdapSecurityRealm}} does not include any logic that handles referrals
> Referrals are an important feature of LDAP. They have to be covered by Elytron => requested Blocker flag.
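> For illustration only (this is plain JNDI, not the realm's code; the server URL, base DN and filter below are placeholders): referral behaviour is controlled by the {{Context.REFERRAL}} environment property, which is the kind of switch {{ldap-realm}} would need to expose.
> {code}
> import java.util.Hashtable;
> import javax.naming.Context;
> import javax.naming.NamingEnumeration;
> import javax.naming.directory.InitialDirContext;
> import javax.naming.directory.SearchControls;
> import javax.naming.directory.SearchResult;
>
> public class ReferralSearch {
>     public static void main(String[] args) throws Exception {
>         Hashtable<String, Object> env = new Hashtable<>();
>         env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
>         env.put(Context.PROVIDER_URL, "ldap://localhost:10389"); // placeholder server
>         // "follow" chases referrals transparently, "throw" surfaces them as
>         // ReferralException; without the property set, referral objects found
>         // during a search are not usable results.
>         env.put(Context.REFERRAL, "follow");
>         InitialDirContext ctx = new InitialDirContext(env);
>         SearchControls controls = new SearchControls();
>         controls.setSearchScope(SearchControls.SUBTREE_SCOPE);
>         NamingEnumeration<SearchResult> results =
>                 ctx.search("dc=example,dc=com", "(uid=jduke)", controls);
>         while (results.hasMore()) {
>             System.out.println(results.next().getNameInNamespace());
>         }
>         ctx.close();
>     }
> }
> {code}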
--
This message was sent by Atlassian JIRA
(v7.2.3#72005)
[JBoss JIRA] (WFLY-7693) Specify detailed HttpServerAuthenticationMechanismFactory interface contract
by Martin Choma (JIRA)
[ https://issues.jboss.org/browse/WFLY-7693?page=com.atlassian.jira.plugin.... ]
Martin Choma moved JBEAP-7554 to WFLY-7693:
-------------------------------------------
Project: WildFly (was: JBoss Enterprise Application Platform)
Key: WFLY-7693 (was: JBEAP-7554)
Workflow: GIT Pull Request workflow (was: CDW with loose statuses v1)
Component/s: Security
(was: Security)
> Specify detailed HttpServerAuthenticationMechanismFactory interface contract
> ----------------------------------------------------------------------------
>
> Key: WFLY-7693
> URL: https://issues.jboss.org/browse/WFLY-7693
> Project: WildFly
> Issue Type: Bug
> Components: Security
> Reporter: Martin Choma
> Priority: Critical
>
> Please specify a detailed contract for {{HttpServerAuthenticationMechanismFactory}}.
> Describe which parameters are allowed to be null and what happens in that case. Also describe whether null return values are allowed from interface methods and when that could happen.
> You can consider {{javax.security.sasl.SaslServerFactory}} as an example of a detailed contract.
> For example:
> * Is the {{properties}} parameter of {{getMechanismNames()}} allowed to be null?
> * Is {{getMechanismNames()}} allowed to return null?
> * Are any of the {{createAuthenticationMechanism()}} parameters allowed to be null?
> ** In the {{ServerMechanismFactoryImpl}} implementation {{properties}} must not be null - is that a general rule?
> {code}
> java.lang.IllegalArgumentException: Parameter 'properties' may not be null
> at org.wildfly.common.Assert.checkNotNullParamChecked(Assert.java:69)
> at org.wildfly.common.Assert.checkNotNullParam(Assert.java:47)
> at org.wildfly.security.http.impl.ServerMechanismFactoryImpl.createAuthenticationMechanism(ServerMechanismFactoryImpl.java:79)
> {code}
> ** In the {{ServerMechanismFactoryImpl}} implementation {{callbackHandler}} must not be null - is that a general rule?
> {code}
> java.lang.IllegalArgumentException: Parameter 'callbackHandler' may not be null
> at org.wildfly.common.Assert.checkNotNullParamChecked(Assert.java:69)
> at org.wildfly.common.Assert.checkNotNullParam(Assert.java:47)
> at org.wildfly.security.http.impl.ServerMechanismFactoryImpl.createAuthenticationMechanism(ServerMechanismFactoryImpl.java:80)
> {code}
> ** In the {{ServerMechanismFactoryImpl}} implementation {{mechanismName}} must not be null - is that a general rule?
> {code}
> java.lang.IllegalArgumentException: Parameter 'mechanismName' may not be null
> at org.wildfly.common.Assert.checkNotNullParamChecked(Assert.java:69)
> at org.wildfly.common.Assert.checkNotNullParam(Assert.java:47)
> at org.wildfly.security.http.impl.ServerMechanismFactoryImpl.createAuthenticationMechanism(ServerMechanismFactoryImpl.java:78)
> {code}
> I would suggest wrapping the {{java.lang.IllegalArgumentException}} in an {{HttpAuthenticationException}}. Otherwise the possibility of an {{IllegalArgumentException}} should be documented in the contract.
> * Is {{createAuthenticationMechanism()}} allowed to return null?
> Filing as Critical, as this interface is expected to be implemented by custom factories.
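> To make the questions concrete, here is a minimal sketch of a delegating wrapper that pins down one possible reading of the contract (the method signatures follow the Elytron interface under discussion; the delegate pattern, the message-only exception constructor and all null-handling choices are proposals, not documented behaviour):
> {code}
> import java.util.Map;
> import javax.security.auth.callback.CallbackHandler;
> import org.wildfly.security.http.HttpAuthenticationException;
> import org.wildfly.security.http.HttpServerAuthenticationMechanism;
> import org.wildfly.security.http.HttpServerAuthenticationMechanismFactory;
>
> // Illustrative wrapper, not part of Elytron: it fixes answers to the
> // questions above on behalf of a delegate factory.
> public final class ContractEnforcingFactory implements HttpServerAuthenticationMechanismFactory {
>
>     private final HttpServerAuthenticationMechanismFactory delegate;
>
>     public ContractEnforcingFactory(HttpServerAuthenticationMechanismFactory delegate) {
>         this.delegate = delegate;
>     }
>
>     @Override
>     public String[] getMechanismNames(Map<String, ?> properties) {
>         // Proposal: tolerate a null properties map and never return null.
>         String[] names = delegate.getMechanismNames(properties);
>         return names != null ? names : new String[0];
>     }
>
>     @Override
>     public HttpServerAuthenticationMechanism createAuthenticationMechanism(String mechanismName,
>             Map<String, ?> properties, CallbackHandler callbackHandler) throws HttpAuthenticationException {
>         // Proposal: report null arguments through the checked exception already
>         // in the signature instead of an unchecked IllegalArgumentException.
>         if (mechanismName == null || properties == null || callbackHandler == null) {
>             throw new HttpAuthenticationException("mechanismName, properties and callbackHandler may not be null");
>         }
>         return delegate.createAuthenticationMechanism(mechanismName, properties, callbackHandler);
>     }
> }
> {code}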
--
This message was sent by Atlassian JIRA
(v7.2.3#72005)
[JBoss JIRA] (WFCORE-1344) Adding a module on remote host doesn't work as expected
by Daniel Kreuter (JIRA)
[ https://issues.jboss.org/browse/WFCORE-1344?page=com.atlassian.jira.plugi... ]
Daniel Kreuter closed WFCORE-1344.
----------------------------------
Resolution: Won't Fix
Works for me.
> Adding a module on remote host doesn't work as expected
> -------------------------------------------------------
>
> Key: WFCORE-1344
> URL: https://issues.jboss.org/browse/WFCORE-1344
> Project: WildFly Core
> Issue Type: Feature Request
> Components: CLI
> Reporter: Daniel Kreuter
> Assignee: Alexey Loubyansky
>
> Hi,
> I connected to a remote WildFly instance via jboss-cli.bat (from Windows to Linux) and tried to install the MySQL module via
> {code}
> module add --name com.mysql --resources=C:\mysql....jar --dependencies=javax.api,javax.transaction.api
> {code}
> What I expected was that the module would be installed on the remote host. What actually happened was that the CLI installed the module on my local WildFly instance instead.
> Regards
> Daniel
--
This message was sent by Atlassian JIRA
(v7.2.3#72005)
[JBoss JIRA] (JGRP-2137) JGroups: one slow/stuck node slows/freezes entire cluster
by Bharad S (JIRA)
[ https://issues.jboss.org/browse/JGRP-2137?page=com.atlassian.jira.plugin.... ]
Bharad S commented on JGRP-2137:
--------------------------------
Thanks Bela, this is extremely useful. Appreciate the quick response!
> JGroups: one slow/stuck node slows/freezes entire cluster
> ---------------------------------------------------------
>
> Key: JGRP-2137
> URL: https://issues.jboss.org/browse/JGRP-2137
> Project: JGroups
> Issue Type: Bug
> Affects Versions: 3.6.4
> Environment: Multi-node cluster. Uses TUNNEL mode with GossipRouter, TCP.
> Reporter: Bharad S
> Assignee: Bela Ban
> Attachments: replication-server.xml
>
>
> We have a multi-node cluster with one node (say Node A) running the gossip router. We use TUNNEL mode, i.e., the other nodes in the cluster can talk to each other only via Node A. If one of the nodes in the cluster (say Node B) is slow in reading or gets stuck while reading from the channel, it affects the entire cluster. Inter-node gossip also gets stuck and the nodes fall out of the cluster.
> A thread dump on Node A indicates that the 'ConnectionHandler' for Node B is stuck (at SocketOutputStream.socketWrite) while holding a lock, thus blocking the ConnectionHandlers for all other nodes.
> --snip (from thread dump on Node A) --
> "gossip-handlers-129" #1088 daemon prio=5 os_prio=0 tid=0x00007f65d20ce800 nid=0x2353 runnable [0x00007f6557efd000]
> java.lang.Thread.State: RUNNABLE
> at java.net.SocketOutputStream.socketWrite0(Native Method)
> at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:109)
> at java.net.SocketOutputStream.write(SocketOutputStream.java:153)
> at sun.security.ssl.OutputRecord.writeBuffer(OutputRecord.java:431)
> at sun.security.ssl.OutputRecord.write(OutputRecord.java:417)
> at sun.security.ssl.SSLSocketImpl.writeRecordInternal(SSLSocketImpl.java:857)
> at sun.security.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:828)
> at sun.security.ssl.AppOutputStream.write(AppOutputStream.java:123)
> - locked <0x00000005f2445028> (a sun.security.ssl.AppOutputStream)
> at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
> at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
> - locked <0x00000005f248a210> (a java.io.BufferedOutputStream)
> at java.io.DataOutputStream.flush(DataOutputStream.java:123)
> at org.jgroups.stack.GossipRouter.sendToMember(GossipRouter.java:607)
> - locked <0x00000005f248a1f0> (a java.io.DataOutputStream)
> at org.jgroups.stack.GossipRouter.sendToAllMembersInGroup(GossipRouter.java:590)
> - locked <0x00000005d4aa1458> (a java.util.concurrent.ConcurrentHashMap)
> at org.jgroups.stack.GossipRouter.route(GossipRouter.java:487)
> at org.jgroups.stack.GossipRouter.access$800(GossipRouter.java:63)
> at org.jgroups.stack.GossipRouter$ConnectionHandler.readLoop(GossipRouter.java:753)
> at org.jgroups.stack.GossipRouter$ConnectionHandler.run(GossipRouter.java:706)
> at java.lang.Thread.run(Thread.java:745)
> --snip end--
> Other gossip-handler threads on Node A (serving the other nodes in the cluster) wait to acquire the lock on the ConnectionHandler map at the following place: GossipRouter.java, method sendToAllMembersInGroup
> --snip--
> "gossip-handlers-128"
> #1078 daemon prio=5 os_prio=0 tid=0x00007f65d20ce000 nid=0x2343 waiting
> for monitor entry [0x00007f654c258000]
> java.lang.Thread.State: BLOCKED (on object monitor)
> at org.jgroups.stack.GossipRouter.sendToAllMembersInGroup(GossipRouter.java:583)
> - waiting to lock <0x00000005d4aa1458> (a java.util.concurrent.ConcurrentHashMap)
> at org.jgroups.stack.GossipRouter.route(GossipRouter.java:487)
> at org.jgroups.stack.GossipRouter.access$800(GossipRouter.java:63)
> at org.jgroups.stack.GossipRouter$ConnectionHandler.readLoop(GossipRouter.java:753)
> at org.jgroups.stack.GossipRouter$ConnectionHandler.run(GossipRouter.java:706)
> at java.lang.Thread.run(Thread.java:745)
> "gossip-handlers-127"
> #1073 daemon prio=5 os_prio=0 tid=0x00007f65d01a6800 nid=0x233c waiting
> for monitor entry [0x00007f6697afb000]
> java.lang.Thread.State: BLOCKED (on object monitor)
> at org.jgroups.stack.GossipRouter.sendToAllMembersInGroup(GossipRouter.java:583)
> - waiting to lock <0x00000005d4aa1458> (a java.util.concurrent.ConcurrentHashMap)
> at org.jgroups.stack.GossipRouter.route(GossipRouter.java:487)
> at org.jgroups.stack.GossipRouter.access$800(GossipRouter.java:63)
> at org.jgroups.stack.GossipRouter$ConnectionHandler.readLoop(GossipRouter.java:753)
> at org.jgroups.stack.GossipRouter$ConnectionHandler.run(GossipRouter.java:706)
> at java.lang.Thread.run(Thread.java:745)
> --snip end--
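> In code terms, the dumps boil down to the following schematic (a paraphrase of the pattern visible above, not the actual GossipRouter source): the fan-out iterates the shared member map under a lock and does a blocking {{flush()}} per member, so one stalled socket holds up every handler.
> {code}
> import java.io.DataOutputStream;
> import java.util.Map;
> import java.util.concurrent.ConcurrentHashMap;
>
> class BlockingFanOut {
>     final Map<String, DataOutputStream> members = new ConcurrentHashMap<>();
>
>     void sendToAllMembersInGroup(byte[] msg) throws Exception {
>         synchronized (members) {         // the <0x...d4aa1458> monitor in the dumps
>             for (DataOutputStream out : members.values()) {
>                 synchronized (out) {     // per-member stream lock
>                     out.write(msg);
>                     out.flush();         // blocks while the peer's TCP window is full
>                 }
>             }
>         }
>     }
> }
> {code}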
> If Node B were to go down, JGroups quickly takes it out of the cluster and there is no problem. But if it stays in the cluster and is slow/stuck, is there a way to avoid the rest of the cluster getting affected? We'd appreciate any help/pointers. Thanks.
--
This message was sent by Atlassian JIRA
(v7.2.3#72005)
[JBoss JIRA] (WFCORE-2046) KeyManager synchronization issue when using IBM JDK
by Josef Cacek (JIRA)
[ https://issues.jboss.org/browse/WFCORE-2046?page=com.atlassian.jira.plugi... ]
Josef Cacek commented on WFCORE-2046:
-------------------------------------
I've added a simple reproducer (JUnit test).
> KeyManager synchronization issue when using IBM JDK
> ---------------------------------------------------
>
> Key: WFCORE-2046
> URL: https://issues.jboss.org/browse/WFCORE-2046
> Project: WildFly Core
> Issue Type: Bug
> Components: Security
> Reporter: Josef Cacek
> Assignee: Darran Lofthouse
> Priority: Blocker
> Attachments: test-app-ibm-jdk-keymanager-sync.zip
>
>
> We hit a {{KeyManagerFactory}}-related synchronization issue in the {{org.jboss.as.domain.management.security.AbstractKeyManagerService.createKeyManagers(boolean)}} method on the IBM JDK. The issue occurs when there are several security realms with SSL identities in EAP and their keystores have different passwords.
> As the ApplicationRealm (in EAP 7.1) has a preconfigured SSL identity, the risk that customers will hit this when they add their own security realm with an SSL identity is high. We hit this issue in more than 10% of runs on our machines.
> Our debugging suggests the problem is located in the IBM JDK implementation of {{javax.net.ssl.KeyManagerFactorySpi}} (class {{com.ibm.jsse2.ae$a}}).
> The workflow:
> # the user calls {{keyManagerFactory.init(keyStore, keystorePassword)}}, which invokes {{com.ibm.jsse2.ae$a.engineInit(KeyStore keyStore, char[] password)}}
> # the password (from the second method parameter) is stored into the static field {{com.ibm.jsse2.ae.d}}, and in the next step the field is used as a parameter for creating a new object, {{new com.ibm.jsse2.aw(keyStore, d)}}
> # the previous step is not synchronized, so when multiple threads call {{keyManagerFactory.init()}} with different passwords, the wrong password may be used for retrieving a key from a keystore.
> *Possible workaround*
> We could work around this issue on the EAP side (until it is fixed in the JDK) by synchronizing the {{keyManagerFactory.init()}} call in {{AbstractKeyManagerService.createKeyManagers(boolean)}} when the IBM JDK is used.
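> A minimal sketch of that workaround, assuming a single JVM-wide lock guards every {{init()}} call (the class below is illustrative, not the actual service code):
> {code}
> import java.security.KeyStore;
> import javax.net.ssl.KeyManager;
> import javax.net.ssl.KeyManagerFactory;
>
> public final class SynchronizedKeyManagerInit {
>
>     // One JVM-wide lock: the race is in a static field of the IBM JSSE
>     // factory, so every init() caller must agree on the same monitor.
>     private static final Object INIT_LOCK = new Object();
>
>     public static KeyManager[] createKeyManagers(KeyStore keyStore, char[] keystorePassword)
>             throws Exception {
>         KeyManagerFactory factory =
>                 KeyManagerFactory.getInstance(KeyManagerFactory.getDefaultAlgorithm());
>         if (System.getProperty("java.vendor", "").contains("IBM")) {
>             // Workaround path: serialize init() calls on the IBM JDK only.
>             synchronized (INIT_LOCK) {
>                 factory.init(keyStore, keystorePassword);
>             }
>         } else {
>             factory.init(keyStore, keystorePassword);
>         }
>         return factory.getKeyManagers();
>     }
> }
> {code}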
--
This message was sent by Atlassian JIRA
(v7.2.3#72005)
[JBoss JIRA] (WFCORE-2046) KeyManager synchronization issue when using IBM JDK
by Josef Cacek (JIRA)
[ https://issues.jboss.org/browse/WFCORE-2046?page=com.atlassian.jira.plugi... ]
Josef Cacek updated WFCORE-2046:
--------------------------------
Attachment: test-app-ibm-jdk-keymanager-sync.zip
> KeyManager synchronization issue when using IBM JDK
> ---------------------------------------------------
>
> Key: WFCORE-2046
> URL: https://issues.jboss.org/browse/WFCORE-2046
> Project: WildFly Core
> Issue Type: Bug
> Components: Security
> Reporter: Josef Cacek
> Assignee: Darran Lofthouse
> Priority: Blocker
> Attachments: test-app-ibm-jdk-keymanager-sync.zip
>
>
> We hit a {{KeyManagerFactory}}-related synchronization issue in the {{org.jboss.as.domain.management.security.AbstractKeyManagerService.createKeyManagers(boolean)}} method on the IBM JDK. The issue occurs when there are several security realms with SSL identities in EAP and their keystores have different passwords.
> As the ApplicationRealm (in EAP 7.1) has a preconfigured SSL identity, the risk that customers will hit this when they add their own security realm with an SSL identity is high. We hit this issue in more than 10% of runs on our machines.
> Our debugging suggests the problem is located in the IBM JDK implementation of {{javax.net.ssl.KeyManagerFactorySpi}} (class {{com.ibm.jsse2.ae$a}}).
> The workflow:
> # the user calls {{keyManagerFactory.init(keyStore, keystorePassword)}}, which invokes {{com.ibm.jsse2.ae$a.engineInit(KeyStore keyStore, char[] password)}}
> # the password (from the second method parameter) is stored into the static field {{com.ibm.jsse2.ae.d}}, and in the next step the field is used as a parameter for creating a new object, {{new com.ibm.jsse2.aw(keyStore, d)}}
> # the previous step is not synchronized, so when multiple threads call {{keyManagerFactory.init()}} with different passwords, the wrong password may be used for retrieving a key from a keystore.
> *Possible workaround*
> We could work around this issue on the EAP side (until it is fixed in the JDK) by synchronizing the {{keyManagerFactory.init()}} call in {{AbstractKeyManagerService.createKeyManagers(boolean)}} when the IBM JDK is used.
--
This message was sent by Atlassian JIRA
(v7.2.3#72005)
[JBoss JIRA] (JGRP-2137) JGroups: one slow/stuck node slows/freezes entire cluster
by Bharad S (JIRA)
[ https://issues.jboss.org/browse/JGRP-2137?page=com.atlassian.jira.plugin.... ]
Bharad S updated JGRP-2137:
---------------------------
Description:
We have a multi-node cluster with one node (say Node A) running the gossip router. We use TUNNEL mode, i.e., the other nodes in the cluster can talk to each other only via Node A. If one of the nodes in the cluster (say Node B) is slow in reading or gets stuck while reading from the channel, it affects the entire cluster. Inter-node gossip also gets stuck and the nodes fall out of the cluster.
A thread dump on Node A indicates that the 'ConnectionHandler' for Node B is stuck (at SocketOutputStream.socketWrite) while holding a lock, thus blocking the ConnectionHandlers for all other nodes.
--snip (from thread dump on Node A) --
"gossip-handlers-129" #1088 daemon prio=5 os_prio=0 tid=0x00007f65d20ce800 nid=0x2353 runnable [0x00007f6557efd000]
java.lang.Thread.State: RUNNABLE
at java.net.SocketOutputStream.socketWrite0(Native Method)
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:109)
at java.net.SocketOutputStream.write(SocketOutputStream.java:153)
at sun.security.ssl.OutputRecord.writeBuffer(OutputRecord.java:431)
at sun.security.ssl.OutputRecord.write(OutputRecord.java:417)
at sun.security.ssl.SSLSocketImpl.writeRecordInternal(SSLSocketImpl.java:857)
at sun.security.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:828)
at sun.security.ssl.AppOutputStream.write(AppOutputStream.java:123)
- locked <0x00000005f2445028> (a sun.security.ssl.AppOutputStream)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
- locked <0x00000005f248a210> (a java.io.BufferedOutputStream)
at java.io.DataOutputStream.flush(DataOutputStream.java:123)
at org.jgroups.stack.GossipRouter.sendToMember(GossipRouter.java:607)
- locked <0x00000005f248a1f0> (a java.io.DataOutputStream)
at org.jgroups.stack.GossipRouter.sendToAllMembersInGroup(GossipRouter.java:590)
- locked <0x00000005d4aa1458> (a java.util.concurrent.ConcurrentHashMap)
at org.jgroups.stack.GossipRouter.route(GossipRouter.java:487)
at org.jgroups.stack.GossipRouter.access$800(GossipRouter.java:63)
at org.jgroups.stack.GossipRouter$ConnectionHandler.readLoop(GossipRouter.java:753)
at org.jgroups.stack.GossipRouter$ConnectionHandler.run(GossipRouter.java:706)
at java.lang.Thread.run(Thread.java:745)
--snip end--
Other gossip-handler threads on Node A (serving the other nodes in the cluster) wait to acquire the lock on the ConnectionHandler map at the following place: GossipRouter.java, method sendToAllMembersInGroup
--snip--
"gossip-handlers-128"
#1078 daemon prio=5 os_prio=0 tid=0x00007f65d20ce000 nid=0x2343 waiting
for monitor entry [0x00007f654c258000]
java.lang.Thread.State: BLOCKED (on object monitor)
at org.jgroups.stack.GossipRouter.sendToAllMembersInGroup(GossipRouter.java:583)
- waiting to lock <0x00000005d4aa1458> (a java.util.concurrent.ConcurrentHashMap)
at org.jgroups.stack.GossipRouter.route(GossipRouter.java:487)
at org.jgroups.stack.GossipRouter.access$800(GossipRouter.java:63)
at org.jgroups.stack.GossipRouter$ConnectionHandler.readLoop(GossipRouter.java:753)
at org.jgroups.stack.GossipRouter$ConnectionHandler.run(GossipRouter.java:706)
at java.lang.Thread.run(Thread.java:745)
"gossip-handlers-127"
#1073 daemon prio=5 os_prio=0 tid=0x00007f65d01a6800 nid=0x233c waiting
for monitor entry [0x00007f6697afb000]
java.lang.Thread.State: BLOCKED (on object monitor)
at org.jgroups.stack.GossipRouter.sendToAllMembersInGroup(GossipRouter.java:583)
- waiting to lock <0x00000005d4aa1458> (a java.util.concurrent.ConcurrentHashMap)
at org.jgroups.stack.GossipRouter.route(GossipRouter.java:487)
at org.jgroups.stack.GossipRouter.access$800(GossipRouter.java:63)
at org.jgroups.stack.GossipRouter$ConnectionHandler.readLoop(GossipRouter.java:753)
at org.jgroups.stack.GossipRouter$ConnectionHandler.run(GossipRouter.java:706)
at java.lang.Thread.run(Thread.java:745)
--snip end--
If Node B were to go down, JGroups quickly takes it out of the cluster and there is no problem. But if it stays in the cluster and is slow/stuck, is there a way to avoid the rest of the cluster getting affected? We'd appreciate any help/pointers. Thanks.
was:
We have a multi-node cluster with one node (say Node A) running the gossip router. We use TUNNEL mode, i.e., the other nodes in the cluster can talk to each other only via Node A. If one of the nodes in the cluster (say Node B) is slow in reading or gets stuck while reading from the channel, it affects the entire cluster. Inter-node gossip also gets stuck and the nodes fall out of the cluster.
A thread dump on Node A indicates that the 'ConnectionHandler' for Node B is stuck (at SocketOutputStream.socketWrite) while holding a lock, thus blocking the ConnectionHandlers for all other nodes.
--snip (from thread dump on Node A) --
"gossip-handlers-129" #1088 daemon prio=5 os_prio=0 tid=0x00007f65d20ce800 nid=0x2353 runnable [0x00007f6557efd000]
java.lang.Thread.State: RUNNABLE
at java.net.SocketOutputStream.socketWrite0(Native Method)
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:109)
at java.net.SocketOutputStream.write(SocketOutputStream.java:153)
at sun.security.ssl.OutputRecord.writeBuffer(OutputRecord.java:431)
at sun.security.ssl.OutputRecord.write(OutputRecord.java:417)
at sun.security.ssl.SSLSocketImpl.writeRecordInternal(SSLSocketImpl.java:857)
at sun.security.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:828)
at sun.security.ssl.AppOutputStream.write(AppOutputStream.java:123)
- locked <0x00000005f2445028> (a sun.security.ssl.AppOutputStream)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
- locked <0x00000005f248a210> (a java.io.BufferedOutputStream)
at java.io.DataOutputStream.flush(DataOutputStream.java:123)
at org.jgroups.stack.GossipRouter.sendToMember(GossipRouter.java:607)
- locked <0x00000005f248a1f0> (a java.io.DataOutputStream)
at org.jgroups.stack.GossipRouter.sendToAllMembersInGroup(GossipRouter.java:590)
- locked <0x00000005d4aa1458> (a java.util.concurrent.ConcurrentHashMap)
at org.jgroups.stack.GossipRouter.route(GossipRouter.java:487)
at org.jgroups.stack.GossipRouter.access$800(GossipRouter.java:63)
at org.jgroups.stack.GossipRouter$ConnectionHandler.readLoop(GossipRouter.java:753)
at org.jgroups.stack.GossipRouter$ConnectionHandler.run(GossipRouter.java:706)
at java.lang.Thread.run(Thread.java:745)
--snip end--
Other gossip-handler threads on Node A (serving the other nodes in the cluster) wait to acquire the lock on the ConnectionHandler map at the following place: GossipRouter.java, method sendToAllMembersInGroup
--snip--
"gossip-handlers-128"
#1078 daemon prio=5 os_prio=0 tid=0x00007f65d20ce000 nid=0x2343 waiting
for monitor entry [0x00007f654c258000]
java.lang.Thread.State: BLOCKED (on object monitor)
at org.jgroups.stack.GossipRouter.sendToAllMembersInGroup(GossipRouter.java:583)
- waiting to lock <0x00000005d4aa1458> (a java.util.concurrent.ConcurrentHashMap)
at org.jgroups.stack.GossipRouter.route(GossipRouter.java:487)
at org.jgroups.stack.GossipRouter.access$800(GossipRouter.java:63)
at org.jgroups.stack.GossipRouter$ConnectionHandler.readLoop(GossipRouter.java:753)
at org.jgroups.stack.GossipRouter$ConnectionHandler.run(GossipRouter.java:706)
at java.lang.Thread.run(Thread.java:745)
"gossip-handlers-127"
#1073 daemon prio=5 os_prio=0 tid=0x00007f65d01a6800 nid=0x233c waiting
for monitor entry [0x00007f6697afb000]
java.lang.Thread.State: BLOCKED (on object monitor)
at org.jgroups.stack.GossipRouter.sendToAllMembersInGroup(GossipRouter.java:583)
- waiting to lock <0x00000005d4aa1458> (a java.util.concurrent.ConcurrentHashMap)
at org.jgroups.stack.GossipRouter.route(GossipRouter.java:487)
at org.jgroups.stack.GossipRouter.access$800(GossipRouter.java:63)
at org.jgroups.stack.GossipRouter$ConnectionHandler.readLoop(GossipRouter.java:753)
at org.jgroups.stack.GossipRouter$ConnectionHandler.run(GossipRouter.java:706)
at java.lang.Thread.run(Thread.java:745)
--snip end--
If Node B were to go down, it is quickly taken out of the cluster and there is no problem. But if it stays in the cluster and is slow, is there a way to avoid the rest of the cluster getting affected? We'd appreciate any help/pointers. Thanks.
> JGroups: one slow/stuck node slows/freezes entire cluster
> ---------------------------------------------------------
>
> Key: JGRP-2137
> URL: https://issues.jboss.org/browse/JGRP-2137
> Project: JGroups
> Issue Type: Bug
> Affects Versions: 3.6.4
> Environment: Multi-node cluster. Uses TUNNEL mode with GossipRouter, TCP.
> Reporter: Bharad S
> Assignee: Bela Ban
> Attachments: replication-server.xml
>
>
> We have a multi-node cluster with one node (say Node A) running the gossip router. We use TUNNEL mode, i.e., the other nodes in the cluster can talk to each other only via Node A. If one of the nodes in the cluster (say Node B) is slow in reading or gets stuck while reading from the channel, it affects the entire cluster. Inter-node gossip also gets stuck and the nodes fall out of the cluster.
> A thread dump on Node A indicates that the 'ConnectionHandler' for Node B is stuck (at SocketOutputStream.socketWrite) while holding a lock, thus blocking the ConnectionHandlers for all other nodes.
> --snip (from thread dump on Node A) --
> "gossip-handlers-129" #1088 daemon prio=5 os_prio=0 tid=0x00007f65d20ce800 nid=0x2353 runnable [0x00007f6557efd000]
> java.lang.Thread.State: RUNNABLE
> at java.net.SocketOutputStream.socketWrite0(Native Method)
> at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:109)
> at java.net.SocketOutputStream.write(SocketOutputStream.java:153)
> at sun.security.ssl.OutputRecord.writeBuffer(OutputRecord.java:431)
> at sun.security.ssl.OutputRecord.write(OutputRecord.java:417)
> at sun.security.ssl.SSLSocketImpl.writeRecordInternal(SSLSocketImpl.java:857)
> at sun.security.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:828)
> at sun.security.ssl.AppOutputStream.write(AppOutputStream.java:123)
> - locked <0x00000005f2445028> (a sun.security.ssl.AppOutputStream)
> at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
> at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
> - locked <0x00000005f248a210> (a java.io.BufferedOutputStream)
> at java.io.DataOutputStream.flush(DataOutputStream.java:123)
> at org.jgroups.stack.GossipRouter.sendToMember(GossipRouter.java:607)
> - locked <0x00000005f248a1f0> (a java.io.DataOutputStream)
> at org.jgroups.stack.GossipRouter.sendToAllMembersInGroup(GossipRouter.java:590)
> - locked <0x00000005d4aa1458> (a java.util.concurrent.ConcurrentHashMap)
> at org.jgroups.stack.GossipRouter.route(GossipRouter.java:487)
> at org.jgroups.stack.GossipRouter.access$800(GossipRouter.java:63)
> at org.jgroups.stack.GossipRouter$ConnectionHandler.readLoop(GossipRouter.java:753)
> at org.jgroups.stack.GossipRouter$ConnectionHandler.run(GossipRouter.java:706)
> at java.lang.Thread.run(Thread.java:745)
> --snip end--
> Other gossip-handler threads on Node A (serving the other nodes in the cluster) wait to acquire the lock on the ConnectionHandler map at the following place: GossipRouter.java, method sendToAllMembersInGroup
> --snip--
> "gossip-handlers-128"
> #1078 daemon prio=5 os_prio=0 tid=0x00007f65d20ce000 nid=0x2343 waiting
> for monitor entry [0x00007f654c258000]
> java.lang.Thread.State: BLOCKED (on object monitor)
> at org.jgroups.stack.GossipRouter.sendToAllMembersInGroup(GossipRouter.java:583)
> - waiting to lock <0x00000005d4aa1458> (a java.util.concurrent.ConcurrentHashMap)
> at org.jgroups.stack.GossipRouter.route(GossipRouter.java:487)
> at org.jgroups.stack.GossipRouter.access$800(GossipRouter.java:63)
> at org.jgroups.stack.GossipRouter$ConnectionHandler.readLoop(GossipRouter.java:753)
> at org.jgroups.stack.GossipRouter$ConnectionHandler.run(GossipRouter.java:706)
> at java.lang.Thread.run(Thread.java:745)
> "gossip-handlers-127"
> #1073 daemon prio=5 os_prio=0 tid=0x00007f65d01a6800 nid=0x233c waiting
> for monitor entry [0x00007f6697afb000]
> java.lang.Thread.State: BLOCKED (on object monitor)
> at org.jgroups.stack.GossipRouter.sendToAllMembersInGroup(GossipRouter.java:583)
> - waiting to lock <0x00000005d4aa1458> (a java.util.concurrent.ConcurrentHashMap)
> at org.jgroups.stack.GossipRouter.route(GossipRouter.java:487)
> at org.jgroups.stack.GossipRouter.access$800(GossipRouter.java:63)
> at org.jgroups.stack.GossipRouter$ConnectionHandler.readLoop(GossipRouter.java:753)
> at org.jgroups.stack.GossipRouter$ConnectionHandler.run(GossipRouter.java:706)
> at java.lang.Thread.run(Thread.java:745)
> --snip end--
> If Node B were to go down, JGroups quickly takes it out of the cluster and there is no problem. But if it stays in the cluster and is slow/stuck, is there a way to avoid the rest of the cluster getting affected? We'd appreciate any help/pointers. Thanks.
--
This message was sent by Atlassian JIRA
(v7.2.3#72005)
[JBoss JIRA] (JGRP-2137) JGroups: one slow/stuck node slows/freezes entire cluster
by Bharad S (JIRA)
[ https://issues.jboss.org/browse/JGRP-2137?page=com.atlassian.jira.plugin.... ]
Bharad S updated JGRP-2137:
---------------------------
Attachment: replication-server.xml
> JGroups: one slow/stuck node slows/freezes entire cluster
> ---------------------------------------------------------
>
> Key: JGRP-2137
> URL: https://issues.jboss.org/browse/JGRP-2137
> Project: JGroups
> Issue Type: Bug
> Affects Versions: 3.6.4
> Environment: Multi-node cluster. Uses TUNNEL mode with GossipRouter, TCP.
> Reporter: Bharad S
> Assignee: Bela Ban
> Attachments: replication-server.xml
>
>
> We have a multi-node cluster with one node (say Node A) running the gossip router. We use TUNNEL mode, i.e., the other nodes in the cluster can talk to each other only via Node A. If one of the nodes in the cluster (say Node B) is slow in reading or gets stuck while reading from the channel, it affects the entire cluster. Inter-node gossip also gets stuck and the nodes fall out of the cluster.
> A thread dump on Node A indicates that the 'ConnectionHandler' for Node B is stuck (at SocketOutputStream.socketWrite) while holding a lock, thus blocking the ConnectionHandlers for all other nodes.
> --snip (from thread dump on Node A) --
> "gossip-handlers-129" #1088 daemon prio=5 os_prio=0 tid=0x00007f65d20ce800 nid=0x2353 runnable [0x00007f6557efd000]
> java.lang.Thread.State: RUNNABLE
> at java.net.SocketOutputStream.socketWrite0(Native Method)
> at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:109)
> at java.net.SocketOutputStream.write(SocketOutputStream.java:153)
> at sun.security.ssl.OutputRecord.writeBuffer(OutputRecord.java:431)
> at sun.security.ssl.OutputRecord.write(OutputRecord.java:417)
> at sun.security.ssl.SSLSocketImpl.writeRecordInternal(SSLSocketImpl.java:857)
> at sun.security.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:828)
> at sun.security.ssl.AppOutputStream.write(AppOutputStream.java:123)
> - locked <0x00000005f2445028> (a sun.security.ssl.AppOutputStream)
> at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
> at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
> - locked <0x00000005f248a210> (a java.io.BufferedOutputStream)
> at java.io.DataOutputStream.flush(DataOutputStream.java:123)
> at org.jgroups.stack.GossipRouter.sendToMember(GossipRouter.java:607)
> - locked <0x00000005f248a1f0> (a java.io.DataOutputStream)
> at org.jgroups.stack.GossipRouter.sendToAllMembersInGroup(GossipRouter.java:590)
> - locked <0x00000005d4aa1458> (a java.util.concurrent.ConcurrentHashMap)
> at org.jgroups.stack.GossipRouter.route(GossipRouter.java:487)
> at org.jgroups.stack.GossipRouter.access$800(GossipRouter.java:63)
> at org.jgroups.stack.GossipRouter$ConnectionHandler.readLoop(GossipRouter.java:753)
> at org.jgroups.stack.GossipRouter$ConnectionHandler.run(GossipRouter.java:706)
> at java.lang.Thread.run(Thread.java:745)
> --snip end--
> Other gossip-handler threads on Node A (serving the other nodes in the cluster) wait to acquire the lock on the ConnectionHandler map at the following place: GossipRouter.java, method sendToAllMembersInGroup
> --snip--
> "gossip-handlers-128"
> #1078 daemon prio=5 os_prio=0 tid=0x00007f65d20ce000 nid=0x2343 waiting
> for monitor entry [0x00007f654c258000]
> java.lang.Thread.State: BLOCKED (on object monitor)
> at org.jgroups.stack.GossipRouter.sendToAllMembersInGroup(GossipRouter.java:583)
> - waiting to lock <0x00000005d4aa1458> (a java.util.concurrent.ConcurrentHashMap)
> at org.jgroups.stack.GossipRouter.route(GossipRouter.java:487)
> at org.jgroups.stack.GossipRouter.access$800(GossipRouter.java:63)
> at org.jgroups.stack.GossipRouter$ConnectionHandler.readLoop(GossipRouter.java:753)
> at org.jgroups.stack.GossipRouter$ConnectionHandler.run(GossipRouter.java:706)
> at java.lang.Thread.run(Thread.java:745)
> "gossip-handlers-127"
> #1073 daemon prio=5 os_prio=0 tid=0x00007f65d01a6800 nid=0x233c waiting
> for monitor entry [0x00007f6697afb000]
> java.lang.Thread.State: BLOCKED (on object monitor)
> at org.jgroups.stack.GossipRouter.sendToAllMembersInGroup(GossipRouter.java:583)
> - waiting to lock <0x00000005d4aa1458> (a java.util.concurrent.ConcurrentHashMap)
> at org.jgroups.stack.GossipRouter.route(GossipRouter.java:487)
> at org.jgroups.stack.GossipRouter.access$800(GossipRouter.java:63)
> at org.jgroups.stack.GossipRouter$ConnectionHandler.readLoop(GossipRouter.java:753)
> at org.jgroups.stack.GossipRouter$ConnectionHandler.run(GossipRouter.java:706)
> at java.lang.Thread.run(Thread.java:745)
> --snip end--
> If Node B were to go down, it is quickly taken out of the cluster and there is no problem. But if it stays in the cluster and is slow, is there a way to avoid the rest of the cluster getting affected? We'd appreciate any help/pointers. Thanks.
--
This message was sent by Atlassian JIRA
(v7.2.3#72005)
[JBoss JIRA] (JGRP-2137) JGroups: one slow/stuck node slows/freezes entire cluster
by Bela Ban (JIRA)
[ https://issues.jboss.org/browse/JGRP-2137?page=com.atlassian.jira.plugin.... ]
Bela Ban resolved JGRP-2137.
----------------------------
Resolution: Won't Fix
Use 3.6.5 or later, which provides non-blocking NIO for GossipRouter.
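For anyone stuck on 3.6.4, here is the essence of the non-blocking approach, reduced to a generic, self-contained sketch (this is not the actual GossipRouter code; the port and buffer size are arbitrary): each connection gets its own pending-write queue, so a slow reader backs up only its own queue instead of a shared lock.
{code}
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.ArrayDeque;
import java.util.Queue;

// Toy selector-based fan-out: every message received from one connection is
// forwarded to all others. Writes never block; bytes a slow peer cannot
// accept yet wait in that peer's own queue until its socket is writable.
public class NonBlockingFanOut {

    @SuppressWarnings("unchecked")
    private static Queue<ByteBuffer> pending(SelectionKey key) {
        return (Queue<ByteBuffer>) key.attachment();
    }

    private static void send(SelectionKey key, ByteBuffer msg) {
        pending(key).add(msg.duplicate());
        key.interestOps(key.interestOps() | SelectionKey.OP_WRITE);
    }

    public static void main(String[] args) throws Exception {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(12001)); // arbitrary port
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);
        while (true) {
            selector.select();
            for (SelectionKey key : selector.selectedKeys()) {
                if (key.isAcceptable()) {
                    SocketChannel ch = server.accept();
                    ch.configureBlocking(false);
                    ch.register(selector, SelectionKey.OP_READ, new ArrayDeque<ByteBuffer>());
                } else if (key.isReadable()) {
                    SocketChannel ch = (SocketChannel) key.channel();
                    ByteBuffer buf = ByteBuffer.allocate(4096);
                    if (ch.read(buf) < 0) {   // peer closed: drop it, nobody else blocks
                        key.cancel();
                        ch.close();
                        continue;
                    }
                    buf.flip();
                    for (SelectionKey other : selector.keys()) {
                        if (other != key && other.isValid()
                                && other.channel() instanceof SocketChannel) {
                            send(other, buf);
                        }
                    }
                } else if (key.isWritable()) {
                    SocketChannel ch = (SocketChannel) key.channel();
                    Queue<ByteBuffer> q = pending(key);
                    while (!q.isEmpty()) {
                        ByteBuffer head = q.peek();
                        ch.write(head);
                        if (head.hasRemaining()) {
                            break;            // socket buffer full; retry on next OP_WRITE
                        }
                        q.remove();
                    }
                    if (q.isEmpty()) {        // drained: stop asking for OP_WRITE
                        key.interestOps(SelectionKey.OP_READ);
                    }
                }
            }
            selector.selectedKeys().clear();
        }
    }
}
{code}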
> JGroups: one slow/stuck node slows/freezes entire cluster
> ---------------------------------------------------------
>
> Key: JGRP-2137
> URL: https://issues.jboss.org/browse/JGRP-2137
> Project: JGroups
> Issue Type: Bug
> Affects Versions: 3.6.4
> Environment: Multi-node cluster. Uses TUNNEL mode with GossipRouter, TCP.
> Reporter: Bharad S
> Assignee: Bela Ban
>
> We have a multi-node cluster with one node (say Node A) running the gossip router. We use TUNNEL mode, i.e., the other nodes in the cluster can talk to each other only via Node A. If one of the nodes in the cluster (say Node B) is slow in reading or gets stuck while reading from the channel, it affects the entire cluster. Inter-node gossip also gets stuck and the nodes fall out of the cluster.
> A thread dump on Node A indicates that the 'ConnectionHandler' for Node B is stuck (at SocketOutputStream.socketWrite) while holding a lock, thus blocking the ConnectionHandlers for all other nodes.
> --snip (from thread dump on Node A) --
> "gossip-handlers-129" #1088 daemon prio=5 os_prio=0 tid=0x00007f65d20ce800 nid=0x2353 runnable [0x00007f6557efd000]
> java.lang.Thread.State: RUNNABLE
> at java.net.SocketOutputStream.socketWrite0(Native Method)
> at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:109)
> at java.net.SocketOutputStream.write(SocketOutputStream.java:153)
> at sun.security.ssl.OutputRecord.writeBuffer(OutputRecord.java:431)
> at sun.security.ssl.OutputRecord.write(OutputRecord.java:417)
> at sun.security.ssl.SSLSocketImpl.writeRecordInternal(SSLSocketImpl.java:857)
> at sun.security.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:828)
> at sun.security.ssl.AppOutputStream.write(AppOutputStream.java:123)
> - locked <0x00000005f2445028> (a sun.security.ssl.AppOutputStream)
> at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
> at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
> - locked <0x00000005f248a210> (a java.io.BufferedOutputStream)
> at java.io.DataOutputStream.flush(DataOutputStream.java:123)
> at org.jgroups.stack.GossipRouter.sendToMember(GossipRouter.java:607)
> - locked <0x00000005f248a1f0> (a java.io.DataOutputStream)
> at org.jgroups.stack.GossipRouter.sendToAllMembersInGroup(GossipRouter.java:590)
> - locked <0x00000005d4aa1458> (a java.util.concurrent.ConcurrentHashMap)
> at org.jgroups.stack.GossipRouter.route(GossipRouter.java:487)
> at org.jgroups.stack.GossipRouter.access$800(GossipRouter.java:63)
> at org.jgroups.stack.GossipRouter$ConnectionHandler.readLoop(GossipRouter.java:753)
> at org.jgroups.stack.GossipRouter$ConnectionHandler.run(GossipRouter.java:706)
> at java.lang.Thread.run(Thread.java:745)
> --snip end--
> Other gossip-handler threads on Node A (serving the other nodes in the cluster) wait to acquire the lock on the ConnectionHandler map at the following place: GossipRouter.java, method sendToAllMembersInGroup
> --snip--
> "gossip-handlers-128"
> #1078 daemon prio=5 os_prio=0 tid=0x00007f65d20ce000 nid=0x2343 waiting
> for monitor entry [0x00007f654c258000]
> java.lang.Thread.State: BLOCKED (on object monitor)
> at org.jgroups.stack.GossipRouter.sendToAllMembersInGroup(GossipRouter.java:583)
> - waiting to lock <0x00000005d4aa1458> (a java.util.concurrent.ConcurrentHashMap)
> at org.jgroups.stack.GossipRouter.route(GossipRouter.java:487)
> at org.jgroups.stack.GossipRouter.access$800(GossipRouter.java:63)
> at org.jgroups.stack.GossipRouter$ConnectionHandler.readLoop(GossipRouter.java:753)
> at org.jgroups.stack.GossipRouter$ConnectionHandler.run(GossipRouter.java:706)
> at java.lang.Thread.run(Thread.java:745)
> "gossip-handlers-127"
> #1073 daemon prio=5 os_prio=0 tid=0x00007f65d01a6800 nid=0x233c waiting
> for monitor entry [0x00007f6697afb000]
> java.lang.Thread.State: BLOCKED (on object monitor)
> at org.jgroups.stack.GossipRouter.sendToAllMembersInGroup(GossipRouter.java:583)
> - waiting to lock <0x00000005d4aa1458> (a java.util.concurrent.ConcurrentHashMap)
> at org.jgroups.stack.GossipRouter.route(GossipRouter.java:487)
> at org.jgroups.stack.GossipRouter.access$800(GossipRouter.java:63)
> at org.jgroups.stack.GossipRouter$ConnectionHandler.readLoop(GossipRouter.java:753)
> at org.jgroups.stack.GossipRouter$ConnectionHandler.run(GossipRouter.java:706)
> at java.lang.Thread.run(Thread.java:745)
> --snip end--
> If Node B were to go down, it is quickly taken out of the cluster and there is no problem. But if it stays in the cluster and is slow, is there a way to avoid the rest of the cluster getting affected? We'd appreciate any help/pointers. Thanks.
--
This message was sent by Atlassian JIRA
(v7.2.3#72005)