[jboss-jira] [JBoss JIRA] (WFCORE-218) wildfly web management console hangs during deploy from cli

Brian Stansberry (JIRA) issues at jboss.org
Mon Nov 28 15:53:00 EST 2016


    [ https://issues.jboss.org/browse/WFCORE-218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13330480#comment-13330480 ] 

Brian Stansberry edited comment on WFCORE-218 at 11/28/16 3:52 PM:
-------------------------------------------------------------------

I briefly considered not holding any long-lasting topology lock and simply getting the set of hosts under a short-lived lock. But that is not reliable:

1) T1 is doing a domain-wide write. On the DC, OperationCoordinatorStepHandler gathers the registered servers and creates DomainSlaveHandler to do the HC rollout.
2) New HC starts, connects, gets exclusive lock, starts registration stuff.
3) T1 gets to the Stage.MODEL handler that detects a write, tries to get exclusive lock, blocks
4) New HC reg is completed, exclusive lock released
5) T1 gets lock, proceeds
6) T1 gets to DomainSlaveHandler, rolls out the change to the set of slaves provided in 1) above, which does not include New HC.
7) New HC misses the update.
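
The interleaving above can be replayed deterministically in a minimal sketch. Everything here is a hypothetical stand-in rather than WildFly code: the class SnapshotRaceSketch, its methods, and the host names "hostB" and "newHC" are assumptions made for illustration. It only shows why a copy of the host set taken under a short-lived lock goes stale once the lock is released.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for the DC's host registry; none of this is WildFly code.
class SnapshotRaceSketch {
    private final Set<String> registeredHosts = new LinkedHashSet<>();
    private final ReentrantLock exclusiveLock = new ReentrantLock();

    // Steps 2-4: a host controller registers while holding the exclusive lock.
    void register(String host) {
        exclusiveLock.lock();
        try {
            registeredHosts.add(host);
        } finally {
            exclusiveLock.unlock();
        }
    }

    // Step 1: copy the host set under a short-lived lock, then release it.
    List<String> snapshotHosts() {
        exclusiveLock.lock();
        try {
            return new ArrayList<>(registeredHosts);
        } finally {
            exclusiveLock.unlock();
        }
    }

    // Steps 5-6: the write takes the lock and updates only the hosts it was handed.
    List<String> rollOut(List<String> targets) {
        exclusiveLock.lock();
        try {
            return new ArrayList<>(targets);
        } finally {
            exclusiveLock.unlock();
        }
    }

    // Replays the interleaving in order and returns the hosts that got the update.
    static List<String> replay() {
        SnapshotRaceSketch dc = new SnapshotRaceSketch();
        dc.register("hostB");                       // already registered before T1 starts
        List<String> snapshot = dc.snapshotHosts(); // step 1: T1 captures the host set
        dc.register("newHC");                       // steps 2-4: New HC registers
        return dc.rollOut(snapshot);                // steps 5-6: T1 rolls out to the stale set
    }

    public static void main(String[] args) {
        System.out.println("updated: " + replay());
    }
}
```

In this replay the returned list never contains "newHC", which corresponds to step 7.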

The situation with servers I believe is simpler. There the set of host and server proxies is a ref to the complete, dynamically updated set. Which servers get called depends on the rollout plan. The rollout plan is created after Stage.MODEL, so the exclusive lock will be held when it is created. So any "New Server" joining in a race with the change will either a) block in registration acquiring the exclusive lock until after the change is complete or b) cause the change to block in Stage.MODEL until reg is complete, with New Server then being picked up by DomainRolloutStepHandler the same as if it had been registered before the change op even began.

The way the server case is handled by DomainRolloutStepHandler suggests a possible easy fix for the host case as well. DomainSlaveHandler should be constructed with a ref to the complete dynamically updated map of host proxies (the way DomainRolloutStepHandler is). It should also be given the set of host names to update, or null if the update is global. If the set of host names is not null, that means the op only targets particular hosts, with no possibility of that set being added to in the course of execution. So, if the change is global, the write lock in a Stage.MODEL step will ensure that any new host is either registered before DomainSlaveHandler executes, or is blocking waiting for the change op to complete. If the change is not global, the registration of a new slave is irrelevant to DomainSlaveHandler; it just works with the set of hosts it knows about.
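
A rough sketch of that idea, under stated assumptions: LiveMapRolloutSketch is a hypothetical analogue of the proposed DomainSlaveHandler change, not its real constructor or API, and the proxies are plain strings. It holds a reference to the live map instead of a copy, takes a nullable set of host names, and resolves its targets only when it executes.

```java
import java.util.Map;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical analogue of the proposed DomainSlaveHandler change; not WildFly code.
class LiveMapRolloutSketch {
    private final Map<String, String> liveHostProxies; // shared reference, never copied
    private final Set<String> targetHosts;             // null means the update is global

    LiveMapRolloutSketch(Map<String, String> liveHostProxies, Set<String> targetHosts) {
        this.liveHostProxies = liveHostProxies;
        this.targetHosts = targetHosts;
    }

    // Called at execution time, which for a write is assumed to be after the
    // Stage.MODEL step has acquired the exclusive lock.
    SortedSet<String> resolveTargets() {
        SortedSet<String> result = new TreeSet<>();
        for (String host : liveHostProxies.keySet()) {
            if (targetHosts == null || targetHosts.contains(host)) {
                result.add(host);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> proxies = new ConcurrentHashMap<>();
        proxies.put("hostB", "proxy-B");
        LiveMapRolloutSketch global = new LiveMapRolloutSketch(proxies, null);
        LiveMapRolloutSketch targeted = new LiveMapRolloutSketch(proxies, Set.of("hostB"));
        proxies.put("newHC", "proxy-new"); // registration completes before execution
        System.out.println("global:   " + global.resolveTargets());
        System.out.println("targeted: " + targeted.resolveTargets());
    }
}
```

Because the global handler reads the live map at execution time, a host registered after construction is still included, while the targeted handler ignores it.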


Reads still need some thought though. The current behavior of overly aggressively taking the exclusive lock prevents some possible scenarios, like a client periodically reading a bunch of metrics getting a failure because a host or server is removed by another op in the middle of the read. This could be a real scenario now that things like multi-process reads and the query op are supported.


was (Author: brian.stansberry):
I briefly considered not holding any long lasting topology lock and simply getting the set of hosts under a short lived lock. But that is not reliable:

1) T1 is doing a domain-wide write, on DC OperationCoordinatorStepHandler gathers the registered servers and creates DomainSlaveHandler to do the HC rollout.
2) New HC starts, connects, gets exclusive lock, starts registration stuff.
3) T1 gets to the Stage.MODEL handler that detects a write, tries to get exclusive lock, blocks
4) New HC reg is completed, exclusive lock released
5) T1 gets lock, proceeds
6) T1 gets to DomainSlaveHandler, rolls out the change to the set of slaves provided in 1) above, which does not include New HC.
7) New HC misses the update.

The situation with servers I believe is simpler. There the set of host and server proxies is the complete set. Which servers get called depends on the rollout plan. The rollout plan is created after Stage.MODEL, so the exclusive lock will be held when it is created. So any "New Server" joining in a race with the change will either a) block in registration acquiring the exclusive lock until after the change is complete or b) cause the change to block in Stage.MODEL until reg is complete, with New Server then being picked up by DomainRolloutStepHandler the same as if it had been registered before the change op even began.

The way the server case is handled by DomainRolloutStepHandler suggests a possible easy fix for the host case as well. DomainSlaveHandler should be constructed with a ref to the dynamically changing map of host proxies (the way DomainRolloutStepHandler is). It should also be given the set of host names to update, or null if the update is global. If the set of host names is not null, that means the op only targets particular hosts, with no possibility of that set being added to in the course of execution. So, if the change is global, the write lock in a Stage.MODEL step will ensure that any new host is either registered before DomainSlaveHandler executes, or is blocking waiting for the change op to complete. If the change is not global, the registration of a new slave is irrelevant to DomainSlaveHandler; it just works with the set of hosts it knows about.

> wildfly web management console hangs during deploy from cli
> -----------------------------------------------------------
>
>                 Key: WFCORE-218
>                 URL: https://issues.jboss.org/browse/WFCORE-218
>             Project: WildFly Core
>          Issue Type: Bug
>          Components: Domain Management
>    Affects Versions: 1.0.0.Alpha1
>            Reporter: Ian Kent
>         Attachments: threaddump-1415735255304.tdump
>
>
> We are running WildFly in domain mode with the following configuration.
> host A running domain controller
> host B running host controller with one app server
> host C running host controller with one app server
> host D running host controller with one app server
> When we deploy a war using jboss-cli, the web console is blocked for usage until the deploy completes. I have run jvisualvm and it does not appear that the domain controller process is starved for resources (cpu, memory, threads).



--
This message was sent by Atlassian JIRA
(v7.2.3#72005)

