[JBoss JIRA] (JGRP-2360) Deadlock while acquiring a distributed lock consecutively by the same thread in a loop
by Daniel Klosinski (Jira)
[ https://issues.jboss.org/browse/JGRP-2360?page=com.atlassian.jira.plugin.... ]
Daniel Klosinski commented on JGRP-2360:
----------------------------------------
Hi [~belaban],
Unfortunately, we can't upgrade to 4.x, as versions 3 and 4 are not compatible and we cannot afford downtime.
I've created a pull request. Could you please review it?
> Deadlock while acquiring a distributed lock consecutively by the same thread in a loop
> -------------------------------------------------------------------------------------
>
> Key: JGRP-2360
> URL: https://issues.jboss.org/browse/JGRP-2360
> Project: JGroups
> Issue Type: Bug
> Affects Versions: 3.6.18, 4.1.1
> Environment: JGroups-4.1.1-Final
> Red Hat 4.4.7-23
> JDK 1.8.0_202
> Reporter: Daniel Klosinski
> Assignee: Bela Ban
> Priority: Major
> Fix For: 4.1.2
>
> Attachments: DLTest.java, DistributedLockRepoducer.zip, log.log
>
>
> A deadlock intermittently happens when the same thread on the same VM acquires a distributed lock consecutively in a loop. Here is a code snippet for which this issue can occur:
> {code}
> for(String s : list) {
>     Lock lock=lock_service.getLock("test_lock_name");
>     lock.lock();
>     // perform business logic
>     lock.unlock();
> }
> {code}
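> For context, a fuller version of the reproducer might look like the sketch below. The config file name and cluster name are placeholders (not from the attached reproducer); the protocol stack is assumed to contain CENTRAL_LOCK, as the logs show:
> {code}
> import java.util.Arrays;
> import java.util.List;
> import java.util.concurrent.locks.Lock;
> import org.jgroups.JChannel;
> import org.jgroups.blocks.locking.LockService;
>
> public class LockLoop {
>     public static void main(String[] args) throws Exception {
>         // placeholder config; the stack must include CENTRAL_LOCK
>         JChannel ch=new JChannel("central-lock.xml");
>         ch.connect("lock-cluster");
>         LockService lock_service=new LockService(ch);
>         List<String> list=Arrays.asList("a", "b", "c", "d");
>         for(String s : list) {
>             Lock lock=lock_service.getLock("test_lock_name");
>             lock.lock();
>             try {
>                 // perform business logic
>             }
>             finally {
>                 lock.unlock();
>             }
>         }
>         ch.close();
>     }
> }
> {code}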
> Running such a loop, I get a deadlock after a few iterations; in the attached logs the program hung after 3 iterations.
> During troubleshooting, I found that lock_id is not incremented for the new distributed lock. In the first two loop iterations everything was fine; at the third iteration lock_id did not increase:
> {code}
> 2019-07-15 16:03:32 TRACE CENTRAL_LOCK:163 - svc-2-sps-34594 --> svc-1-sps-4688: GRANT_LOCK[test_lock_name, lock_id=1, owner=svc-2-sps-34594::1]
> 2019-07-15 16:03:32 TRACE CENTRAL_LOCK:163 - svc-2-sps-34594 <-- svc-1-sps-4688: LOCK_GRANTED[test_lock_name, lock_id=1, owner=svc-2-sps-34594::1, sender=svc-1-sps-4688]
> 2019-07-15 16:03:32 TRACE CENTRAL_LOCK:163 - svc-2-sps-34594 --> svc-1-sps-4688: RELEASE_LOCK[test_lock_name, lock_id=1, owner=svc-2-sps-34594::1]
> 2019-07-15 16:03:32 TRACE CENTRAL_LOCK:163 - svc-2-sps-34594 <-- svc-1-sps-4688: RELEASE_LOCK_OK[test_lock_name, lock_id=1, owner=svc-2-sps-34594::1, sender=svc-1-sps-4688]
> 2019-07-15 16:03:32 TRACE CENTRAL_LOCK:163 - svc-2-sps-34594 --> svc-1-sps-4688: GRANT_LOCK[test_lock_name, lock_id=2, owner=svc-2-sps-34594::1]
> 2019-07-15 16:03:32 TRACE CENTRAL_LOCK:163 - svc-2-sps-34594 <-- svc-1-sps-4688: LOCK_GRANTED[test_lock_name, lock_id=2, owner=svc-2-sps-34594::1, sender=svc-1-sps-4688]
> 2019-07-15 16:03:32 TRACE CENTRAL_LOCK:163 - svc-2-sps-34594 --> svc-1-sps-4688: RELEASE_LOCK[test_lock_name, lock_id=2, owner=svc-2-sps-34594::1]
> 2019-07-15 16:03:32 TRACE CENTRAL_LOCK:163 - svc-2-sps-34594 <-- svc-1-sps-4688: RELEASE_LOCK_OK[test_lock_name, lock_id=2, owner=svc-2-sps-34594::1, sender=svc-1-sps-4688]
> 2019-07-15 16:03:32 TRACE CENTRAL_LOCK:163 - svc-2-sps-34594 --> svc-1-sps-4688: GRANT_LOCK[test_lock_name, lock_id=2, owner=svc-2-sps-34594::1]
> 2019-07-15 16:03:32 TRACE CENTRAL_LOCK:163 - svc-2-sps-34594 <-- svc-1-sps-4688: CREATE_LOCK[test_lock_name, owner=svc-2-sps-34594::1, sender=svc-1-sps-4688]
> 2019-07-15 16:03:32 TRACE CENTRAL_LOCK:163 - svc-2-sps-34594 <-- svc-1-sps-4688: LOCK_GRANTED[test_lock_name, lock_id=2, owner=svc-2-sps-34594::1, sender=svc-1-sps-4688]
> {code}
> I added a few extra log statements to the JGroups-4.1.1.Final code and realized that the second client lock was not removed from the client lock table before the third client lock was created. The issue lies in the piece of code below. The owner consists of an address and a thread ID. If the same thread on the same VM creates a distributed lock consecutively, and there is an existing entry in the client lock table for that owner, a new lock won't be created; the old client lock will be reused to acquire the new distributed lock:
> {code}
> protected synchronized ClientLock getLock(String name, Owner owner, boolean create_if_absent) {
>     Map<Owner,ClientLock> owners=table.get(name);
>     if(owners == null) {
>         if(!create_if_absent)
>             return null;
>         owners=Util.createConcurrentMap(20);
>         Map<Owner,ClientLock> existing=table.putIfAbsent(name,owners);
>         if(existing != null)
>             owners=existing;
>     }
>     ClientLock lock=owners.get(owner);
>     if(lock == null) {
>         if(!create_if_absent)
>             return null;
>         lock=createLock(name, owner);
>         owners.put(owner, lock);
>     }
>     return lock;
> }
> {code}
> I believe this issue was introduced by the fix for JGRP-2234 and is caused by a race condition. The logic that deletes the client lock from the client lock table is now executed when the client's VM receives the RELEASE_LOCK_OK message from the coordinator. Previously, this deletion was executed by the thread that called the unlock() method; now it is executed by a separate thread that handles RELEASE_LOCK_OK from the coordinator, and this is why we have a race condition. Here is the sequence that leads to the deadlock:
> 1. Create the client lock (lock_id=2)
> 2. Send GRANT_LOCK (lock_id=2) to the coordinator
> 3. Receive LOCK_GRANTED (lock_id=2) from the coordinator
> 4. Send RELEASE_LOCK (lock_id=2) to the coordinator
> 5. Call the lock() method in the same thread (a new client lock won't be created, as there is an existing entry in the client lock table for this owner)
> 6. Receive RELEASE_LOCK_OK and delete the client lock from the client lock table
> 7. Send GRANT_LOCK (lock_id=2) to the coordinator
> 8. Receive LOCK_GRANTED (lock_id=2) from the coordinator
> 9. There is no entry in the client lock table, so it is not possible to find the thread that needs to be notified.
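> One possible direction for a fix (a sketch under assumptions only, not an actual patch) would be for getLock() to treat a client lock that is already pending release as absent, so a fresh ClientLock with a new lock_id is created. isReleasing() below is a hypothetical accessor; the RELEASE_LOCK_OK handler would then also have to remove the table entry only if it still maps to the same ClientLock instance, e.g. via ConcurrentMap.remove(key, value):
> {code}
> ClientLock lock=owners.get(owner);
> if(lock == null || lock.isReleasing()) { // hypothetical check: entry is about to be removed
>     if(!create_if_absent)
>         return null;
>     lock=createLock(name, owner);        // a fresh lock gets its own lock_id
>     owners.put(owner, lock);
> }
> return lock;
> {code}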
> I am attaching a simple program that can be used to reproduce the issue, along with the generated logs.
[JBoss JIRA] (WFLY-12462) Incompatible conflicting binding Exception and EJB naming
by Alexey Usharovski (Jira)
[ https://issues.jboss.org/browse/WFLY-12462?page=com.atlassian.jira.plugin... ]
Alexey Usharovski commented on WFLY-12462:
------------------------------------------
Feel free to contact me with any questions related to this Jira issue.
> Incompatible conflicting binding Exception and EJB naming
> ---------------------------------------------------------
>
> Key: WFLY-12462
> URL: https://issues.jboss.org/browse/WFLY-12462
> Project: WildFly
> Issue Type: Bug
> Components: CDI / Weld, EJB
> Affects Versions: 17.0.1.Final
> Environment: MacOS High Sierra
> Reporter: Alexey Usharovski
> Assignee: Cheng Fang
> Priority: Minor
> Attachments: server.log.zip, simplewebapp-lesson-6-ejb.zip, war-content.txt.zip
>
>
> Weird exception for a very simple EJB with a @Remote-annotated interface at deploy time on WildFly 17.0.1.Final:
> {code:java}
> @Stateless
> public class UserServiceImpl implements UserServiceRemote {
>     @Override
>     public List<UserRepr> getAllUsers() {
>         return null;
>     }
> }
>
> @Remote
> public interface UserServiceRemote {
>     List<UserRepr> getAllUsers();
> }
> {code}
> The exception is
> {code}
> Caused by: java.lang.IllegalArgumentException: WFLYEE0047: Incompatible conflicting binding at java:jboss/exported/simple-webapp/UserServiceImpl!ru.geekbrains.jsf.UserServiceRemote
> source: org.jboss.as.ejb3.deployment.processors.EjbJndiBindingsDeploymentUnitProcessor$2@6aba1c4b"},
> "WFLYCTL0412: Required services that are not installed:" => [ "jboss.deployment.unit.\"simple-webapp.war\".beanmanager", "jboss.deployment.unit.\"simple-webapp.war\".WeldStartService" ],
> "WFLYCTL0180: Services with missing/unavailable dependencies" => [ "jboss.deployment.unit.\"simple-webapp.war\".batch.artifact.factory is missing [jboss.deployment.unit.\"simple-webapp.war\".beanmanager]",
> "jboss.deployment.unit.\"simple-webapp.war\".weld.weldClassIntrospector is missing [jboss.deployment.unit.\"simple-webapp.war\".beanmanager, jboss.deployment.unit.\"simple-webapp.war\".WeldStartService]" ] }
> {code}
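> For reference, the binding named in the error is what a remote client would resolve: names under java:jboss/exported/ are exposed to remote JNDI clients. A hedged sketch of such a lookup, assuming a default local WildFly 17 (the provider URL is an assumption):
> {code:java}
> import java.util.Properties;
> import javax.naming.Context;
> import javax.naming.InitialContext;
> import ru.geekbrains.jsf.UserServiceRemote;
>
> public class RemoteClient {
>     public static void main(String[] args) throws Exception {
>         Properties props=new Properties();
>         props.put(Context.INITIAL_CONTEXT_FACTORY,
>                   "org.wildfly.naming.client.WildFlyInitialContextFactory");
>         props.put(Context.PROVIDER_URL, "remote+http://localhost:8080"); // assumption: default local server
>         Context ctx=new InitialContext(props);
>         // name relative to java:jboss/exported/, taken from the error message
>         UserServiceRemote svc=(UserServiceRemote)
>             ctx.lookup("simple-webapp/UserServiceImpl!ru.geekbrains.jsf.UserServiceRemote");
>         System.out.println(svc.getAllUsers());
>     }
> }
> {code}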
> Note that the problem can be resolved by undeploying the application and restarting the WildFly server. Maybe something is wrong with the JNDI content?
> I've tried to look through the web console but found nothing interesting.
> The full application code is on GitHub: https://github.com/usharik/simplewebapp/tree/lesson-6-ejb
[JBoss JIRA] (WFCORE-4623) Intermittent failures in IdentityOperationsTestCase
by Ashley Abdel-Sayed (Jira)
[ https://issues.jboss.org/browse/WFCORE-4623?page=com.atlassian.jira.plugi... ]
Ashley Abdel-Sayed reassigned WFCORE-4623:
------------------------------------------
Assignee: Ashley Abdel-Sayed (was: Darran Lofthouse)
> Intermittent failures in IdentityOperationsTestCase
> ---------------------------------------------------
>
> Key: WFCORE-4623
> URL: https://issues.jboss.org/browse/WFCORE-4623
> Project: WildFly Core
> Issue Type: Bug
> Components: Security
> Reporter: Brian Stansberry
> Assignee: Ashley Abdel-Sayed
> Priority: Major
>
> IdentityOperationsTestCase fails intermittently, producing a set of 21 failures. When this happens, the entire job seems to time out.
> https://ci.wildfly.org/project.html?projectId=WildFlyCore_PullRequest&bui...
> The problem seems to involve a server not being able to reach MSC stability during boot, and then a lot of problems trying to roll back the boot op. The latter are kind of noise, i.e. the stack-trace portion of the snippet below. The key thing is the failure to reach MSC stability.
> {code}
> 17:11:15,658 INFO (main) [org.wildfly.security] <Version.java:55> ELY00001: WildFly Elytron version 1.10.0.CR5
> 17:11:15,878 INFO (main) [org.jboss.msc] <ServiceContainerImpl.java:90> JBoss MSC version 1.4.8.Final
> 17:11:15,893 INFO (main) [org.jboss.threads] <Version.java:52> JBoss Threads version 2.3.3.Final
> 17:11:16,027 TRACE (main) [org.wildfly.security] <SecurityDomain.java:1056> Building security domain with defaultRealmName Empty.
> 17:11:16,037 TRACE (main) [org.wildfly.security] <SecurityDomain.java:708> Role mapping: principal [anonymous] -> decoded roles [] -> realm mapped roles [] -> domain mapped roles []
> 17:11:16,312 TRACE (MSC service thread 1-2) [org.wildfly.extension.elytron] <ProviderDefinitions.java:238> Loaded providers [WildFlyElytron version 1.0]
> 17:16:16,313 ERROR (Controller Boot Thread) [org.jboss.as.controller.management-operation] <OperationContextImpl.java:489> WFLYCTL0348: Timeout after [300] seconds waiting for service container stability. Operation will roll back. Step that first updated the service container was 'add' at address '[("path" => "jboss.server.data.dir")]'
> 17:16:21,322 ERROR (Controller Boot Thread) [org.jboss.as.controller.management-operation] <AbstractOperationContext.java:1525> WFLYCTL0190: Step handler org.jboss.as.controller.AbstractAddStepHandler$1@b0b65f for operation add at address [
> ("subsystem" => "elytron"),
> ("filesystem-realm" => "FileSystemRealm")
> ] failed handling operation rollback -- java.util.concurrent.TimeoutException: java.util.concurrent.TimeoutException
> at org.jboss.as.controller.OperationContextImpl.waitForRemovals(OperationContextImpl.java:523)
> at org.jboss.as.controller.AbstractOperationContext$Step.handleResult(AbstractOperationContext.java:1518)
> at org.jboss.as.controller.AbstractOperationContext$Step.finalizeInternal(AbstractOperationContext.java:1472)
> at org.jboss.as.controller.AbstractOperationContext$Step.finalizeStep(AbstractOperationContext.java:1455)
> at org.jboss.as.controller.AbstractOperationContext$Step.access$400(AbstractOperationContext.java:1319)
> at org.jboss.as.controller.AbstractOperationContext.executeResultHandlerPhase(AbstractOperationContext.java:876)
> at org.jboss.as.controller.AbstractOperationContext.processStages(AbstractOperationContext.java:726)
> at org.jboss.as.controller.AbstractOperationContext.executeOperation(AbstractOperationContext.java:467)
> at org.jboss.as.controller.OperationContextImpl.executeOperation(OperationContextImpl.java:1413)
> at org.jboss.as.controller.ModelControllerImpl.boot(ModelControllerImpl.java:495)
> {code}