[JBoss JIRA] (JGRP-2361) Error related to Jgroup and Database connection is getting reset
by karthikeyan Aruljothi (Jira)
[ https://issues.jboss.org/browse/JGRP-2361?page=com.atlassian.jira.plugin.... ]
karthikeyan Aruljothi commented on JGRP-2361:
---------------------------------------------
I found this JAR in our project: jgroups-3.6.11.Final.jar, so the version is 3.6.11.
I have attached jgroups-tcp.xml and the configuration details for your reference.
[^Jgroup node configuration.txt]
[^jgroups-tcp.xml]
In addition, one of the servers throws the following "connect timed out" exception while the servers are starting:
08/01 01:58:30.300 | at org.jgroups.protocols.TransferQueueBundler.run(TransferQueueBundler.java:105) [jgroups-3.6.11.Final.jar:3.6.11.Final]
INFO | jvm 1 | main | 2019/08/01 01:58:30.300 | at java.lang.Thread.run(Thread.java:748) [?:1.8.0_211]
INFO | jvm 1 | main | 2019/08/01 01:58:30.600 | ERROR [TransferQueueBundler,hybris-broadcast,hybrisnode-0] [TCP] JGRP000036: hybrisnode-0: exception sending bundled msgs: java.net.SocketTimeoutException: connect timed out
INFO | jvm 1 | main | 2019/08/01 01:58:30.600 | java.net.SocketTimeoutException: connect timed out
INFO | jvm 1 | main | 2019/08/01 01:58:30.600 | at java.net.PlainSocketImpl.socketConnect(Native Method) ~[?:1.8.0_211]
INFO | jvm 1 | main | 2019/08/01 01:58:30.600 | at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350) ~[?:1.8.0_211]
INFO | jvm 1 | main | 2019/08/01 01:58:30.600 | at java.net.
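To narrow down where the "connect timed out" comes from, here is a minimal diagnostic sketch, assuming Java 8+ on each node. It attempts a plain TCP connect from one node to each peer's JGroups port with an explicit timeout, which is the same kind of call that fails in the stack trace above. `PeerProbe`, the host names and port 7800 are hypothetical placeholders, not part of JGroups or Hybris; take the real values from jgroups-tcp.xml (the TCP `bind_port` and the discovery protocol's host list). A TIMED_OUT result means no answer arrived within the timeout, which typically suggests packets being dropped (for example by a firewall or a wrong address) rather than refused.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.SocketTimeoutException;

/** Hypothetical diagnostic helper; not part of JGroups or Hybris. */
public class PeerProbe {

    /** Outcome of a single TCP connect attempt. */
    public enum Result { REACHABLE, TIMED_OUT, OTHER_ERROR }

    /** Tries to open a TCP connection to host:port within timeoutMs milliseconds. */
    public static Result probe(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return Result.REACHABLE;
        } catch (SocketTimeoutException e) {
            // No response within timeoutMs: same exception type as "connect timed out" in the log.
            return Result.TIMED_OUT;
        } catch (IOException e) {
            // e.g. ConnectException (nothing listening) or UnknownHostException.
            return Result.OTHER_ERROR;
        }
    }

    public static void main(String[] args) {
        // Hypothetical peers and port: replace with the values from your own jgroups-tcp.xml.
        String[] peers = {"hybrisnode-1.example", "hybrisnode-2.example"};
        int port = 7800;
        for (String peer : peers) {
            System.out.println(peer + ":" + port + " -> " + probe(peer, port, 3000));
        }
    }
}
```

Running it from every node against every other node shows whether the problem is basic TCP reachability between specific pairs, independent of the JGroups stack.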
> Error related to Jgroup and Database connection is getting reset
> ----------------------------------------------------------------
>
> Key: JGRP-2361
> URL: https://issues.jboss.org/browse/JGRP-2361
> Project: JGroups
> Issue Type: Bug
> Affects Versions: 3.6.11
> Environment: Hybris running on tomcat - Centos 7
> Reporter: karthikeyan Aruljothi
> Assignee: Bela Ban
> Priority: Major
> Attachments: Jgroup error in preprod-000.txt, Jgroup node configuration.txt, Jgroups blocking and terminating connection.txt, Jgroups error in console.txt, error Jgroups.txt, jgroups-tcp.xml
>
>
> Hi,
> We are facing an issue with our cluster configuration, and because of it the JVM also takes longer to respond. After clearing the cache or restarting all nodes, the application works as expected.
> When the issue arises, one CPU core goes to 100% utilization and the server has to be restarted; otherwise it never processes any request. Below is our configuration in local.properties. Error logs are also attached; they show errors related to JGroups blocking and connections being terminated between nodes.
> Please let us know what exactly is causing the slowness and then blocking the whole server.
> The cluster configuration for each node and the error logs are attached.
> In addition, we get the following warnings during deployment and server restarts:
> WARN [localhost-startStop-1] [GMS] hybrisnode-0: JOIN(hybrisnode-0) sent to hybrisnode-2 timed out (after 3000 ms), on try 3
> WARN [pool-3-thread-1] [GMS] hybrisnode-3: JOIN(hybrisnode-3) sent to hybrisnode-1 timed out (after 3000 ms), on try 4
--
This message was sent by Atlassian Jira
(v7.12.1#712002)
[JBoss JIRA] (JGRP-2361) Error related to Jgroup and Database connection is getting reset
by karthikeyan Aruljothi (Jira)
[ https://issues.jboss.org/browse/JGRP-2361?page=com.atlassian.jira.plugin.... ]
karthikeyan Aruljothi updated JGRP-2361:
----------------------------------------
Attachment: Jgroup node configuration.txt
[JBoss JIRA] (JGRP-2361) Error related to Jgroup and Database connection is getting reset
by karthikeyan Aruljothi (Jira)
[ https://issues.jboss.org/browse/JGRP-2361?page=com.atlassian.jira.plugin.... ]
karthikeyan Aruljothi updated JGRP-2361:
----------------------------------------
Attachment: jgroups-tcp.xml
[JBoss JIRA] (JGRP-2234) Unlocked locks stay locked forever
by Bela Ban (Jira)
[ https://issues.jboss.org/browse/JGRP-2234?page=com.atlassian.jira.plugin.... ]
Bela Ban commented on JGRP-2234:
--------------------------------
Nope. But since I do have quite a few lock-related issues in 4.1.2, I'll get to them after a jgroups-raft issue I need to fix first...
> Unlocked locks stay locked forever
> ----------------------------------
>
> Key: JGRP-2234
> URL: https://issues.jboss.org/browse/JGRP-2234
> Project: JGroups
> Issue Type: Bug
> Reporter: Bram Klein Gunnewiek
> Assignee: Bela Ban
> Priority: Major
> Fix For: 4.0.11, 3.6.18
>
> Attachments: ClusterSplitLockTest.java, jg_clusterlock_output_testfail.txt
>
>
> As discussed on the mailing list, we have issues where locks from the central lock protocol (CENTRAL_LOCK) stay locked forever when the coordinator of the cluster disconnects. We can reproduce this with the attached ClusterSplitLockTest.java. It's a race condition, and we need to run the test many times (sometimes more than 20) before we encounter a failure.
> What we think is happening:
> In a three-node cluster (nodes A, B and C, where node A is the coordinator), unlock requests from B and/or C can be missed when node A leaves and B and/or C don't have the new view installed yet. For example, when node B takes over coordination, it creates the lock table based on the backups. Let's say node C has locked the lock named 'lockX'. Node C unlocks 'lockX' just after node A (gracefully) leaves and sends the unlock request to node A, since node C doesn't have the correct view installed yet. Node B has recreated the lock table, in which 'lockX' is locked by node C. Node C doesn't resend the unlock request, so 'lockX' stays locked forever.
> Attached are the TestNG test we wrote and the output of a test failure.
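The sequence described above can be replayed as a deterministic, single-threaded model. This is a minimal sketch, assuming Java 9+, and it is not JGroups code: `LostUnlockModel`, the plain maps standing in for the coordinator's lock table and its backup, and the `resendOnViewChange` flag are all hypothetical. The flag models one possible mitigation (a member resending unacknowledged unlock requests once it installs a view with a new coordinator); it is not necessarily how the issue was fixed in 4.0.11 / 3.6.18.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of the lost-unlock race; not JGroups code. */
public class LostUnlockModel {

    /**
     * Replays the reported sequence and returns the owner of "lockX" in the
     * new coordinator's (B's) lock table afterwards; null means unlocked.
     */
    public static String simulate(boolean resendOnViewChange) {
        Set<String> alive = new HashSet<>(Set.of("A", "B", "C"));

        // 1. A is coordinator and records that C holds lockX; B keeps a backup copy.
        Map<String, String> tableOnA = new HashMap<>(Map.of("lockX", "C"));
        Map<String, String> backupOnB = new HashMap<>(tableOnA);

        // 2. A leaves gracefully, but C has not installed the new view yet.
        alive.remove("A");
        String coordSeenByC = "A";

        // 3. C unlocks lockX and sends the request to the coordinator it knows about.
        Set<String> unackedUnlocksOnC = new HashSet<>(Set.of("lockX"));
        if (alive.contains(coordSeenByC)) {
            tableOnA.remove("lockX");
            unackedUnlocksOnC.remove("lockX");
        } // otherwise A is gone and the unlock request is silently lost

        // 4. B takes over and rebuilds its lock table from the backup (lockX -> C).
        Map<String, String> tableOnB = new HashMap<>(backupOnB);

        // 5. C installs the new view with B as coordinator.
        if (resendOnViewChange) {
            for (String lock : unackedUnlocksOnC) {
                tableOnB.remove(lock, "C"); // release only locks that C actually owns
            }
            unackedUnlocksOnC.clear();
        }
        // Without a resend, nothing ever releases lockX on B.
        return tableOnB.get("lockX");
    }

    public static void main(String[] args) {
        System.out.println("no resend:   owner = " + simulate(false));
        System.out.println("with resend: owner = " + simulate(true));
    }
}
```

Running `main` prints `owner = C` without the resend and `owner = null` with it: the unlock only survives the coordinator change if some member re-delivers it to the new coordinator.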