[JBoss JIRA] (JGRP-2245) JGroup JDBC_PING is not clearing the crashed members
by Bela Ban (JIRA)
[ https://issues.jboss.org/browse/JGRP-2245?page=com.atlassian.jira.plugin.... ]
Bela Ban commented on JGRP-2245:
--------------------------------
Actually, I can now see it: https://search.maven.org/#artifactdetails%7Corg.jgroups%7Cjgroups%7C4.0.1...
> JGroup JDBC_PING is not clearing the crashed members
> ----------------------------------------------------
>
> Key: JGRP-2245
> URL: https://issues.jboss.org/browse/JGRP-2245
> Project: JGroups
> Issue Type: Bug
> Affects Versions: 4.0.8
> Reporter: Sibin Karnavar
> Assignee: Bela Ban
> Fix For: 4.0.10
>
>
> 1) In AWS cloud environments, the IP address changes when a node crashes and a new cluster node is created to replace it.
> 2) In this situation, JGroups does not clear logical_addr_cache and gets confused when we restart the cluster nodes.
> 3) logical_addr_cache_max_size and the eviction did not work because the cache keeps getting updated from the ping data, so its entries are never marked as removable.
> I think the issue is that the
> handleView method always rewrites the entire cache to the DB on a view change. So even if we clear the table with the help of the flags (remove_all_data_on_view_change and remove_old_coords_on_view_change), it gets rewritten to the table.
> {code:java}
> // remove all files which are not from the current members
> protected void handleView(View new_view, View old_view, boolean coord_changed) {
>     if(is_coord) {
>         if(coord_changed) {
>             if(remove_all_data_on_view_change)
>                 removeAll(cluster_name);
>             else if(remove_old_coords_on_view_change) {
>                 Address old_coord=old_view != null? old_view.getCreator() : null;
>                 if(old_coord != null)
>                     remove(cluster_name, old_coord);
>             }
>         }
>         if(coord_changed || View.diff(old_view, new_view)[1].length > 0) {
>             writeAll();
>             if(remove_all_data_on_view_change || remove_old_coords_on_view_change)
>                 startInfoWriter();
>         }
>     }
>     else if(coord_changed) // I'm no longer the coordinator
>         remove(cluster_name, local_addr);
> }
> {code}
> 4) Because of the crashed members (non-existent IP addresses), we are getting a lot of socket timeouts:
> sendToMembers of TP tries to send messages to the old crashed members and writes error logs during startup.
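The effect described in point 3 can be illustrated with a small toy model. This is only a sketch under stated assumptions: the class StaleEntryDemo, its two methods, and the sample addresses are hypothetical and are not JGroups APIs, and the second method is not claimed to be the change that went into 4.0.10. It only shows why rewriting the table from the whole cache brings a crashed member back, while restricting the rewrite to the current view does not.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model of a discovery table; none of these names are JGroups APIs. */
public class StaleEntryDemo {

    /** Rewrites the table from the whole cache, as the report describes. */
    public static Set<String> rewriteFromCache(Map<String, String> cache) {
        return new LinkedHashSet<>(cache.keySet());
    }

    /** Rewrites the table from the cache, keeping only current view members. */
    public static Set<String> rewriteFromView(Map<String, String> cache, List<String> view) {
        Set<String> table = new LinkedHashSet<>(cache.keySet());
        table.retainAll(view); // drop every entry that is not in the view
        return table;
    }

    public static void main(String[] args) {
        // Hypothetical logical-name -> IP cache; "C" crashed and its IP is gone.
        Map<String, String> cache = new LinkedHashMap<>();
        cache.put("A", "10.0.0.1");
        cache.put("B", "10.0.0.2");
        cache.put("C", "10.0.0.3");
        List<String> view = Arrays.asList("A", "B");

        System.out.println("from cache: " + rewriteFromCache(cache)); // [A, B, C]
        System.out.println("from view:  " + rewriteFromView(cache, view)); // [A, B]
    }
}
```

In this model the stale entry for C survives any table cleanup as long as the cache itself still holds it, which matches the behaviour the reporter observed.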
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
6 years, 2 months
[JBoss JIRA] (JGRP-2245) JGroup JDBC_PING is not clearing the crashed members
by Bela Ban (JIRA)
[ https://issues.jboss.org/browse/JGRP-2245?page=com.atlassian.jira.plugin.... ]
Bela Ban commented on JGRP-2245:
--------------------------------
It's here [1], but it seems it hasn't yet synced with maven central.
[1] https://repository.jboss.org/nexus/content/groups/public/org/jgroups/jgro...
[JBoss JIRA] (DROOLS-2289) The response time jump add in stability test
by LIANG LIU (JIRA)
[ https://issues.jboss.org/browse/DROOLS-2289?page=com.atlassian.jira.plugi... ]
LIANG LIU updated DROOLS-2289:
------------------------------
Attachment: InputFact.java
> The response time jump add in stability test
> ---------------------------------------------
>
> Key: DROOLS-2289
> URL: https://issues.jboss.org/browse/DROOLS-2289
> Project: Drools
> Issue Type: Quality Risk
> Components: kie server
> Affects Versions: 7.5.0.Final
> Environment: tomcat version:8.5.24 jmeter version:jakarta-jmeter-2.5.1 Drools version:7.5.0.Final
> cpu:2x6 x86_64, memory:32G, sysOS:CentOS release 6.3
> Reporter: LIANG LIU
> Assignee: Maciej Swiderski
> Attachments: InputFact.java, image-2018-02-02-14-51-31-166.png, tomcat8_test.jmx
>
>
> !image-2018-02-02-14-51-31-166.png|thumbnail!
> In a stability test, the response time jumps (see the figure above). There are 20 rules, like:
> rule "1"
>     salience 9999
> when
>     $inputFact:InputFact(stateless_data["black_test"]== 1)
> then
>     result.put(drools.getRule().getName(),1);
> end
> Is this a problem with how I use it, or do some parameters need to be set?
[JBoss JIRA] (DROOLS-2289) The response time jump add in stability test
by Edson Tirelli (JIRA)
[ https://issues.jboss.org/browse/DROOLS-2289?page=com.atlassian.jira.plugi... ]
Edson Tirelli commented on DROOLS-2289:
---------------------------------------
Can you provide a reproducer, please?
> The response time jump add in stability test
> ---------------------------------------------
>
> Key: DROOLS-2289
> URL: https://issues.jboss.org/browse/DROOLS-2289
> Project: Drools
> Issue Type: Quality Risk
> Components: kie server
> Affects Versions: 7.5.0.Final
> Environment: tomcat version:8.5.24 jmeter version:jakarta-jmeter-2.5.1 Drools version:7.5.0.Final
> cpu:2x6 x86_64, memory:32G, sysOS:CentOS release 6.3
> Reporter: LIANG LIU
> Assignee: Edson Tirelli
> Attachments: image-2018-02-02-14-51-31-166.png, tomcat8_test.jmx
[JBoss JIRA] (DROOLS-2289) The response time jump add in stability test
by Edson Tirelli (JIRA)
[ https://issues.jboss.org/browse/DROOLS-2289?page=com.atlassian.jira.plugi... ]
Edson Tirelli reassigned DROOLS-2289:
-------------------------------------
Assignee: Maciej Swiderski (was: Edson Tirelli)
> The response time jump add in stability test
> ---------------------------------------------
>
> Key: DROOLS-2289
> URL: https://issues.jboss.org/browse/DROOLS-2289
> Project: Drools
> Issue Type: Quality Risk
> Components: kie server
> Affects Versions: 7.5.0.Final
> Environment: tomcat version:8.5.24 jmeter version:jakarta-jmeter-2.5.1 Drools version:7.5.0.Final
> cpu:2x6 x86_64, memory:32G, sysOS:CentOS release 6.3
> Reporter: LIANG LIU
> Assignee: Maciej Swiderski
> Attachments: image-2018-02-02-14-51-31-166.png, tomcat8_test.jmx
[JBoss JIRA] (JGRP-2245) JGroup JDBC_PING is not clearing the crashed members
by Sibin Karnavar (JIRA)
[ https://issues.jboss.org/browse/JGRP-2245?page=com.atlassian.jira.plugin.... ]
Sibin Karnavar commented on JGRP-2245:
--------------------------------------
I tried to add this dependency through Maven but was not able to download 4.0.10.Final. Do I need to wait longer for it to become available on Maven Central?