[JBoss JIRA] (WFLY-8929) Race condition if timers overlap due to long running execution and short schedules if database persistence is used
by Wolf-Dieter Fink (JIRA)
[ https://issues.jboss.org/browse/WFLY-8929?page=com.atlassian.jira.plugin.... ]
Wolf-Dieter Fink updated WFLY-8929:
-----------------------------------
Attachment: server1.log
> Race condition if timers overlap due to long running execution and short schedules if database persistence is used
> ------------------------------------------------------------------------------------------------------------------
>
> Key: WFLY-8929
> URL: https://issues.jboss.org/browse/WFLY-8929
> Project: WildFly
> Issue Type: Bug
> Components: EJB
> Environment: Configure DB persistence for timers as file-persistence will not have a persistence check for shouldRun to lock the timer execution.
> Reporter: Wolf-Dieter Fink
> Assignee: Stuart Douglas
> Attachments: server-extract.log, server1.log
>
>
> If timers (here a calendar timer) run longer than their schedule interval, or scheduling/processing gets stuck due to a thread or CPU bottleneck, it is possible that the persistence updates overlap.
> The issue seems to be that task(1) tries to finish the timer while task(2) is about to start but sees the concurrency.
> Because of a race condition between the two threads updating the object, the DB is updated with the 'old' next timeout, but the internal Timer instance is updated with the next possible schedule.
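The overlap described above can be pictured with a small, deterministic replay of one possible interleaving. This is only a simplified model under stated assumptions, not WildFly's timer code: the class `TimerOverlapSketch`, its two fields, and the method `replayInterleaving` are hypothetical names, and the steps run sequentially in a single thread to mimic the order in which two tasks might touch the shared state.

```java
// Simplified model of the described overlap; NOT WildFly's actual timer code.
// All class, field, and method names are hypothetical.
class TimerOverlapSketch {
    // Stand-in for the persisted next-timeout value in the database.
    static long dbNextTimeout;
    // Stand-in for the in-memory Timer instance shared by both tasks.
    static long memoryNextTimeout;

    /** Replays one problematic interleaving and returns {db, memory}. */
    static long[] replayInterleaving(long start, long intervalMillis) {
        dbNextTimeout = start;
        memoryNextTimeout = start;

        // task(1) finishes and computes the next timeout from the state it saw.
        long task1Next = start + intervalMillis;

        // task(2) fires for the following schedule before task(1) has persisted
        // and moves the shared in-memory timer one interval further.
        memoryNextTimeout = task1Next + intervalMillis;

        // task(1) now persists its value, which is already stale.
        dbNextTimeout = task1Next;

        return new long[] { dbNextTimeout, memoryNextTimeout };
    }

    public static void main(String[] args) {
        long[] state = replayInterleaving(0L, 1000L);
        // prints "db=1000 memory=2000"
        System.out.println("db=" + state[0] + " memory=" + state[1]);
    }
}
```

After the replay the persisted value and the in-memory value disagree by one interval, which matches the symptom in the description: the DB holds the 'old' next timeout while the Timer instance already points at a later schedule.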
--
This message was sent by Atlassian JIRA
(v7.2.3#72005)
8 years, 11 months
[JBoss JIRA] (WFCORE-2954) Elytron sasl-authentication-factory does not offer mechanisms in configured order
by Farah Juma (JIRA)
[ https://issues.jboss.org/browse/WFCORE-2954?page=com.atlassian.jira.plugi... ]
Farah Juma moved JBEAP-11529 to WFCORE-2954:
--------------------------------------------
Project: WildFly Core (was: JBoss Enterprise Application Platform)
Key: WFCORE-2954 (was: JBEAP-11529)
Workflow: GIT Pull Request workflow (was: CDW with loose statuses v1)
Component/s: Security
(was: Security)
Affects Version/s: (was: 7.1.0.DR19)
> Elytron sasl-authentication-factory does not offer mechanisms in configured order
> ---------------------------------------------------------------------------------
>
> Key: WFCORE-2954
> URL: https://issues.jboss.org/browse/WFCORE-2954
> Project: WildFly Core
> Issue Type: Bug
> Components: Security
> Reporter: Farah Juma
> Assignee: Farah Juma
> Priority: Blocker
> Labels: eap71_priority
>
> The application server does not offer SASL mechanisms in the order defined in the Elytron sasl-authentication-factory. See Steps to Reproduce for more details.
> A screenshot from Wireshark's 'follow TCP stream' view is attached. {{JBOSS-LOCAL-USER}} is offered before {{DIGEST-MD5}}.
[JBoss JIRA] (JGRP-2195) [JGRP00012] discarded message from different cluster with JGroups Upgrade
by Swathi Kumar (JIRA)
Swathi Kumar created JGRP-2195:
----------------------------------
Summary: [JGRP00012] discarded message from different cluster with JGroups Upgrade
Key: JGRP-2195
URL: https://issues.jboss.org/browse/JGRP-2195
Project: JGroups
Issue Type: Bug
Affects Versions: 3.4
Environment: All OS(Linux, AIX, Windows, Solaris)
Reporter: Swathi Kumar
Assignee: Bela Ban
Greetings Team.
We recently upgraded the jgroups jars from version 2_5_2/jgroups-all.jar to 3_4_0/jgroups-3.4.0.Alpha2.jar.
Since the upgrade, we are seeing *[JGRP00012] discarded message from different cluster* messages every other second on all nodes in the cluster.
Note also that this issue started to occur only when we switched the protocol from UDP to TCP. If we go back to UDP, we no longer see these *WARN* messages.
We no longer support UDP in our application, so going back to UDP is not an option.
We have several hundred customers in the field who are using our product with this upgraded jgroups jar, and they have started to raise tickets against our product.
We do not understand why the upgrade produces such a large number of WARN messages. Is there an issue with this version of the jgroups jar?
Sample WARN messages are shown below:
[2017-06-13 11:56:38.117] ALL 000000000000 GLOBAL_SCOPE 141694 [OOB-1,Sterling_NodeInfo_group,dublr005vm-24633] WARN org.jgroups.protocols.TCP - [JGRP00012] discarded message from different cluster Sterling_NodeInfo_group_WFC (our cluster is Sterling_NodeInfo_group). Sender was dublr005vm-2060
[2017-06-13 11:56:41.72] ALL 000000000000 GLOBAL_SCOPE 145297 [OOB-1,Sterling_NodeInfo_group_WFC,dublr005vm-2060] WARN org.jgroups.protocols.TCP - [JGRP00012] discarded message from different cluster Sterling_NodeInfo_group (our cluster is Sterling_NodeInfo_group_WFC). Sender was dublr005vm-24633
We have the following jgroups config properties in our application to create 3 channels (for security reasons, a dummy host name is used here):
jgroups_cluster.property_string=TCP(bind_addr=host_name_A;bind_port=34061):TCPPING(initial_hosts=host_name_A[34061],host_name_A[44061],host_name_A[54061];port_range=1;timeout=5000;num_initial_members=2):MERGE2(min_interval=3000;max_interval=5000):FD_ALL(interval=5000;timeout=20000):FD(timeout=5000;max_tries=48):VERIFY_SUSPECT(timeout=1500):pbcast.NAKACK(retransmit_timeout=100,200,300,600,1200,2400,4800;discard_delivered_msgs=true):pbcast.STABLE(stability_delay=1000;desired_avg_gossip=20000;max_bytes=0):pbcast.GMS(print_local_addr=true;join_timeout=5000)
jgroups_cluster.distribution_property_string=TCP(bind_port= 34061;thread_pool_rejection_policy=run):TCPPING(initial_hosts=host_name_A[34061],host_name_A[44061],host_name_A[54061];port_range=1;timeout=5000;num_initial_members=2):MERGE2(min_interval=3000;max_interval=5000):FD_SOCK:FD(timeout=5000;max_tries=48):VERIFY_SUSPECT(timeout=1500):pbcast.NAKACK(retransmit_timeout=3000;discard_delivered_msgs=true):pbcast.STABLE(stability_delay=1000;desired_avg_gossip=20000;max_bytes=0):pbcast.GMS(join_timeout=5000;print_local_addr=true)
jgroups_cluster.lock.protocolStack=TCP(bind_addr=host_name_A;bind_port=34062;):TCPPING(initial_hosts=host_name_A[34062],host_name_A[44062],host_name_A[54062];port_range=1;timeout=5000;num_initial_members=2):MERGE2(min_interval=3000;max_interval=5000):FD_ALL(interval=5000;timeout=20000):FD(timeout=5000;max_tries=48):VERIFY_SUSPECT(timeout=1500):pbcast.NAKACK(retransmit_timeout=100,200,300,600,1200,2400,4800;discard_delivered_msgs=true):pbcast.STABLE(stability_delay=1000;desired_avg_gossip=20000;max_bytes=0):pbcast.GMS(print_local_addr=true;join_timeout=5000)
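Because the three property strings above are long, it can help to break them into protocols and properties before comparing them. The following is a minimal sketch in plain Java that does not use any JGroups API: the class `ProtocolStackSketch` and its `parse` method are hypothetical helpers, and the parsing assumes that protocols are separated by `:` and properties by `;`, with no `:` inside a value (true for the strings shown here, but not guaranteed in general). The two sample strings in `main` are shortened copies of the first and third properties above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for reading an old-style JGroups property string.
// Plain string handling only; this is not part of the JGroups API.
class ProtocolStackSketch {

    /** Maps protocol name -> (property name -> value), in configured order. */
    static Map<String, Map<String, String>> parse(String stack) {
        Map<String, Map<String, String>> protocols = new LinkedHashMap<>();
        for (String part : stack.split(":")) {
            part = part.trim();
            if (part.isEmpty()) {
                continue;
            }
            int open = part.indexOf('(');
            String name = open < 0 ? part : part.substring(0, open);
            Map<String, String> props = new LinkedHashMap<>();
            if (open >= 0) {
                int close = part.lastIndexOf(')');
                int end = close > open ? close : part.length();
                for (String kv : part.substring(open + 1, end).split(";")) {
                    int eq = kv.indexOf('=');
                    if (eq > 0) {
                        // trim() tolerates stray spaces such as "bind_port= 34061"
                        props.put(kv.substring(0, eq).trim(), kv.substring(eq + 1).trim());
                    }
                }
            }
            protocols.put(name, props);
        }
        return protocols;
    }

    public static void main(String[] args) {
        // Shortened copies of two of the property strings from the report.
        String cluster = "TCP(bind_addr=host_name_A;bind_port=34061):"
                + "TCPPING(initial_hosts=host_name_A[34061],host_name_A[44061],host_name_A[54061];port_range=1)";
        String lock = "TCP(bind_addr=host_name_A;bind_port=34062;):"
                + "TCPPING(initial_hosts=host_name_A[34062],host_name_A[44062],host_name_A[54062];port_range=1)";
        for (String stack : new String[] { cluster, lock }) {
            Map<String, Map<String, String>> parsed = parse(stack);
            System.out.println(parsed.get("TCP").get("bind_port")
                    + " -> " + parsed.get("TCPPING").get("initial_hosts"));
        }
    }
}
```

Applied to all three strings, this puts `TCP.bind_port` and `TCPPING.initial_hosts` side by side, so it is easy to see which channels are configured with the same ports and initial hosts.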
Test considerations:
1. For in-house testing, I created a 3-node cluster.
2. All 3 nodes reside on the same box.
If you need any further information please let me know.
Regards
Swathi BN
[JBoss JIRA] (WFLY-8929) Race condition if timers overlap due to long running execution and short schedules if database persistence is used
by Wolf-Dieter Fink (JIRA)
[ https://issues.jboss.org/browse/WFLY-8929?page=com.atlassian.jira.plugin.... ]
Wolf-Dieter Fink reassigned WFLY-8929:
--------------------------------------
Assignee: Stuart Douglas
> Race condition if timers overlap due to long running execution and short schedules if database persistence is used
> ------------------------------------------------------------------------------------------------------------------
>
> Key: WFLY-8929
> URL: https://issues.jboss.org/browse/WFLY-8929
> Project: WildFly
> Issue Type: Bug
> Components: EJB
> Environment: Configure DB persistence for timers as file-persistence will not have a persistence check for shouldRun to lock the timer execution.
> Reporter: Wolf-Dieter Fink
> Assignee: Stuart Douglas
> Attachments: server-extract.log
>
>
> If timers (here a calendar timer) run longer than their schedule interval, or scheduling/processing gets stuck due to a thread or CPU bottleneck, it is possible that the persistence updates overlap.
> The issue seems to be that task(1) tries to finish the timer while task(2) is about to start but sees the concurrency.
> Because of a race condition between the two threads updating the object, the DB is updated with the 'old' next timeout, but the internal Timer instance is updated with the next possible schedule.
[JBoss JIRA] (WFCORE-2953) Add a socket-handler resource to the logging subsystem
by James Perkins (JIRA)
James Perkins created WFCORE-2953:
-------------------------------------
Summary: Add a socket-handler resource to the logging subsystem
Key: WFCORE-2953
URL: https://issues.jboss.org/browse/WFCORE-2953
Project: WildFly Core
Issue Type: Feature Request
Components: Logging
Reporter: James Perkins
Assignee: James Perkins
Fix For: 4.0.0.Beta1
The logging subsystem should include a resource for a {{socket-handler}}. This will be useful for structured formatters, such as the [{{json-formatter}}|WFCORE-2951].
[JBoss JIRA] (WFCORE-2951) Add JSON and Logstash formatter resources to the logging subsystem
by James Perkins (JIRA)
James Perkins created WFCORE-2951:
-------------------------------------
Summary: Add JSON and Logstash formatter resources to the logging subsystem
Key: WFCORE-2951
URL: https://issues.jboss.org/browse/WFCORE-2951
Project: WildFly Core
Issue Type: Feature Request
Components: Logging
Reporter: James Perkins
Assignee: James Perkins
Fix For: 4.0.0.Beta1
The logging subsystem should include resources for a {{json-formatter}} and probably a {{logstash-formatter}}. The {{logstash-formatter}} is likely just an extension of the {{json-formatter}} with some predefined attributes that logstash requires.
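To illustrate what a JSON formatter does in general, here is a minimal sketch built on `java.util.logging.Formatter` from the JDK. It is not the proposed WildFly {{json-formatter}} resource and says nothing about how that resource would be implemented or configured: the class name `JsonFormatterSketch` and the chosen keys (`timestamp`, `level`, `loggerName`, `message`) are assumptions made for the example.

```java
import java.util.logging.Formatter;
import java.util.logging.Level;
import java.util.logging.LogRecord;

// Illustrative only: a tiny JSON formatter on top of java.util.logging.
// The class name and the key names are assumptions for this sketch.
class JsonFormatterSketch extends Formatter {

    @Override
    public String format(LogRecord record) {
        // One JSON object per line, so that a consumer can read records line by line.
        return "{\"timestamp\":" + record.getMillis()
                + ",\"level\":\"" + escape(record.getLevel().getName()) + "\""
                + ",\"loggerName\":\"" + escape(String.valueOf(record.getLoggerName())) + "\""
                + ",\"message\":\"" + escape(String.valueOf(formatMessage(record))) + "\"}\n";
    }

    /** Escapes the characters that may not appear raw inside a JSON string. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        LogRecord record = new LogRecord(Level.INFO, "Started in {0} ms");
        record.setParameters(new Object[] { 1234 });
        record.setLoggerName("org.example.demo");
        System.out.print(new JsonFormatterSketch().format(record));
    }
}
```

A Logstash-style variant, as suggested above, could extend such a formatter and add the predefined attributes that Logstash expects.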
[JBoss JIRA] (WFLY-8929) Race condition if timers overlap due to long running execution and short schedules if database persistence is used
by Wolf-Dieter Fink (JIRA)
[ https://issues.jboss.org/browse/WFLY-8929?page=com.atlassian.jira.plugin.... ]
Wolf-Dieter Fink updated WFLY-8929:
-----------------------------------
Steps to Reproduce:
Create a simple @Singleton bean and use a method like this:
    @Schedule(second = "*/1", minute = "*", hour = "*", persistent = true)
    private void longRunning4parallelExecution() {
        LOGGER.info("Start long running scheduled method");
        // Sleep for almost the whole one-second schedule interval.
        int millis = 988;
        LOGGER.info("sleeping " + millis + " milli seconds.");
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            // Restore the interrupt flag instead of swallowing the exception.
            Thread.currentThread().interrupt();
        }
        LOGGER.info("Finished executing concurrent schedule");
    }
The timer often shows concurrent WARN messages and should get stuck after running for a few minutes.
was:
Create a simple @Singleton bean and use a method like this:
    @Schedule(second = "*/1", minute = "*", hour = "*", persistent = true)
    private void longRunning4parallelExecution() {
        LOGGER.info("Start long running scheduled method");
        int sec = 988;
        LOGGER.info("sleeping " + sec + " milli seconds.");
        try {
            Thread.sleep(sec);
        } catch (InterruptedException e) {}
        LOGGER.info("Finished executing concurrent schedule");
    }
The timer often shows concurrent WARN messages and should get stuck after running for a few minutes.
> Race condition if timers overlap due to long running execution and short schedules if database persistence is used
> ------------------------------------------------------------------------------------------------------------------
>
> Key: WFLY-8929
> URL: https://issues.jboss.org/browse/WFLY-8929
> Project: WildFly
> Issue Type: Bug
> Components: EJB
> Environment: Configure DB persistence for timers as file-persistence will not have a persistence check for shouldRun to lock the timer execution.
> Reporter: Wolf-Dieter Fink
> Attachments: server-extract.log
>
>
> If timers (here a calendar timer) run longer than their schedule interval, or scheduling/processing gets stuck due to a thread or CPU bottleneck, it is possible that the persistence updates overlap.
> The issue seems to be that task(1) tries to finish the timer while task(2) is about to start but sees the concurrency.
> Because of a race condition between the two threads updating the object, the DB is updated with the 'old' next timeout, but the internal Timer instance is updated with the next possible schedule.
[JBoss JIRA] (WFLY-8929) Race condition if timers overlap due to long running execution and short schedules if database persistence is used
by Wolf-Dieter Fink (JIRA)
[ https://issues.jboss.org/browse/WFLY-8929?page=com.atlassian.jira.plugin.... ]
Wolf-Dieter Fink updated WFLY-8929:
-----------------------------------
Steps to Reproduce:
Create a simple @Singleton bean and use a method like this:
    @Schedule(second = "*/1", minute = "*", hour = "*", persistent = true)
    private void longRunning4parallelExecution() {
        LOGGER.info("Start long running scheduled method");
        // Sleep for almost the whole one-second schedule interval.
        int millis = 988;
        LOGGER.info("sleeping " + millis + " milli seconds.");
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            // Restore the interrupt flag instead of swallowing the exception.
            Thread.currentThread().interrupt();
        }
        LOGGER.info("Finished executing concurrent schedule");
    }
The timer often shows concurrent WARN messages and should get stuck after running for a few minutes.
> Race condition if timers overlap due to long running execution and short schedules if database persistence is used
> ------------------------------------------------------------------------------------------------------------------
>
> Key: WFLY-8929
> URL: https://issues.jboss.org/browse/WFLY-8929
> Project: WildFly
> Issue Type: Bug
> Components: EJB
> Environment: Configure DB persistence for timers as file-persistence will not have a persistence check for shouldRun to lock the timer execution.
> Reporter: Wolf-Dieter Fink
> Attachments: server-extract.log
>
>
> If timers (here a calendar timer) run longer than their schedule interval, or scheduling/processing gets stuck due to a thread or CPU bottleneck, it is possible that the persistence updates overlap.
> The issue seems to be that task(1) tries to finish the timer while task(2) is about to start but sees the concurrency.
> Because of a race condition between the two threads updating the object, the DB is updated with the 'old' next timeout, but the internal Timer instance is updated with the next possible schedule.