[JBoss JIRA] (WFLY-2742) Domain rollout operation timeouts
by Brian Stansberry (JIRA)
[ https://issues.jboss.org/browse/WFLY-2742?page=com.atlassian.jira.plugin.... ]
Brian Stansberry updated WFLY-2742:
-----------------------------------
Parent: (was: WFLY-2740)
Issue Type: Feature Request (was: Sub-task)
> Domain rollout operation timeouts
> ---------------------------------
>
> Key: WFLY-2742
> URL: https://issues.jboss.org/browse/WFLY-2742
> Project: WildFly
> Issue Type: Feature Request
> Components: Domain Management
> Reporter: Brian Stansberry
> Assignee: Brian Stansberry
> Fix For: 9.0.0.Beta1
>
>
> Portion of the parent task related to rolling out changes to a domain.
> The expectation is that single process timeouts (WFLY-2741) will handle most failure conditions related to domain rollouts. For example, if a single server hangs and prevents completion of the rollout, that server will eventually time out, allowing the domain-wide rollout to continue. Timeouts in the domain rollout code serve as a second line of defense:
> 1) In case of protocol or other problems that prevent the calling process from learning about the timeout on the remote process
> 2) In case of bugs in the single process timeout handling on the remote process
> 3) In mixed domain cases where remote hosts are running previous versions and do not have the timeout function
> Potential places to add timeouts:
> DomainSlaveHandler->HostControllerUpdateTask.ProxyOperationListener.retrievePreparedOperation()
> -- where the master HC waits for responses from slaves
> RollingServerGroupUpdateTask.run() -> ServerTaskExecutor.ServerOperationListener.retrievePreparedOperation()
> -- timeout here means 1 server didn't respond, but need to move on to next
> ConcurrentServerGroupUpdateTask.run() -> ServerTaskExecutor.ServerOperationListener.retrievePreparedOperation()
> -- timeout here means none of the remaining servers have responded within the timeout
> DomainRolloutStepHandler.finalizeOp() -> future.get()
> ---- the ServerGroupUpdateTask should fail in the normal phase, so any timeout here would indicate a problem committing the tx or a comms problem getting back the response
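A minimal sketch of the "second line of defense" idea: the caller bounds its own wait on a pending result instead of relying on the remote process to time out. This is plain java.util.concurrent code, not the WildFly implementation; the class name, the Outcome enum and the choice to cancel on timeout are assumptions made for illustration.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of WildFly: bounds the wait on a pending response.
final class BoundedWaitSketch {
    enum Outcome { COMPLETED, TIMED_OUT, FAILED, INTERRUPTED }

    static <T> Outcome await(Future<T> future, long timeout, TimeUnit unit) {
        try {
            future.get(timeout, unit);          // blocks for at most the given timeout
            return Outcome.COMPLETED;
        } catch (TimeoutException e) {
            future.cancel(true);                // assumption: give up on the slow responder
            return Outcome.TIMED_OUT;
        } catch (ExecutionException e) {
            return Outcome.FAILED;              // the remote side reported a failure
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return Outcome.INTERRUPTED;
        }
    }

    public static void main(String[] args) {
        // A future that never completes stands in for a hung server.
        Future<String> hung = new CompletableFuture<>();
        System.out.println(await(hung, 100, TimeUnit.MILLISECONDS));
    }
}
```

In a rolling update, a TIMED_OUT outcome would let the caller move on to the next server; in a concurrent update it would mean none of the remaining servers answered in time.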
--
This message was sent by Atlassian JIRA
(v6.3.8#6338)
11 years, 5 months
[JBoss JIRA] (WFLY-2741) Single process management operation timeouts
by Brian Stansberry (JIRA)
[ https://issues.jboss.org/browse/WFLY-2741?page=com.atlassian.jira.plugin.... ]
Brian Stansberry updated WFLY-2741:
-----------------------------------
Parent: (was: WFLY-2740)
Issue Type: Feature Request (was: Sub-task)
> Single process management operation timeouts
> --------------------------------------------
>
> Key: WFLY-2741
> URL: https://issues.jboss.org/browse/WFLY-2741
> Project: WildFly
> Issue Type: Feature Request
> Components: Domain Management
> Reporter: Brian Stansberry
> Assignee: Brian Stansberry
> Fix For: 9.0.0.Alpha1
>
>
> Portion of the parent task related to operation execution within a single process (i.e. not involving inter-process calls in a domain).
> Potential points to apply the timeout:
> ModelControllerImpl.acquireLock()
> Questionable, as blocking here means waiting for another op, not a problem with the current op
> ModelControllerImpl.awaitContainerMonitor() -> ContainerStateMonitor.await() -> StabilityMonitor.awaitStability()
> This is the typical blocking point in a server op
> -- on way in, fail the op
> -- on way out (OperationContextImpl.releaseStepLocks()) log a WARN/ERROR
> ModelControllerImpl.awaitContainerStateChangedReport()
> -- on the way in, before VERIFY
> OperationContextImpl.waitForRemovals()
> -- on way in, before VERIFY
> -- on way out, in handleRollback
> ---- problem: this is called repeatedly
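The two policies noted above (fail the op on the way in, only log on the way out) can be sketched roughly as follows. A CountDownLatch stands in for the container stability monitor; the class, the Phase enum and the log sink are hypothetical names and do not reflect the actual ModelControllerImpl code.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: a bounded stability wait whose timeout handling depends on the phase.
final class StabilityWaitSketch {
    enum Phase { ON_WAY_IN, ON_WAY_OUT }

    /** Returns true once stable; on timeout, throws on the way in or logs on the way out. */
    static boolean awaitStable(CountDownLatch stable, long timeoutMs, Phase phase, StringBuilder log) {
        boolean ok;
        try {
            ok = stable.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // treat interruption like a failed wait
            ok = false;
        }
        if (ok) {
            return true;
        }
        if (phase == Phase.ON_WAY_IN) {
            // Nothing has been committed yet, so failing the operation is safe.
            throw new IllegalStateException("Timed out after " + timeoutMs + " ms awaiting stability");
        }
        // On the way out the operation outcome is already decided, so only report it.
        log.append("WARN: timed out after ").append(timeoutMs).append(" ms awaiting stability\n");
        return false;
    }
}
```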
[JBoss JIRA] (WFCORE-360) HTTP management API section in the Admin Guide
by Brian Stansberry (JIRA)
[ https://issues.jboss.org/browse/WFCORE-360?page=com.atlassian.jira.plugin... ]
Brian Stansberry moved WFLY-748 to WFCORE-360:
----------------------------------------------
Project: WildFly Core (was: WildFly)
Key: WFCORE-360 (was: WFLY-748)
Component/s: Domain Management
(was: Domain Management)
Fix Version/s: (was: No Release)
> HTTP management API section in the Admin Guide
> ----------------------------------------------
>
> Key: WFCORE-360
> URL: https://issues.jboss.org/browse/WFCORE-360
> Project: WildFly Core
> Issue Type: Task
> Components: Domain Management
> Reporter: Brian Stansberry
> Assignee: Emmanuel Hugonnet
>
> We need a technical guide to the HTTP management API.
> The "Management clients" section gives a basic overview. But it should reference a separate section that documents the usage of GET and POST, the URL patterns, the JSON representation of DMR etc.
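As an illustration of what such a section might cover, the sketch below builds the same attribute read in the two styles: a GET with query parameters and a POST with a JSON-encoded operation. It assumes the default endpoint http://localhost:9990/management and the server-state attribute; the class and method names are made up for this example, and no request is actually sent.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: only constructs request strings, it performs no I/O.
final class HttpMgmtRequestSketch {
    // Assumption: default management endpoint of a local standalone server.
    static final String ENDPOINT = "http://localhost:9990/management";

    /** GET style: the operation and its parameters travel in the query string. */
    static String getUrl(String attribute) {
        return ENDPOINT + "?operation=attribute&name="
                + URLEncoder.encode(attribute, StandardCharsets.UTF_8);
    }

    /** POST style: the operation is sent as a JSON document in the request body. */
    static String postBody(String attribute) {
        // Assumes the attribute name needs no JSON escaping.
        return "{\"operation\":\"read-attribute\",\"name\":\"" + attribute + "\"}";
    }

    public static void main(String[] args) {
        System.out.println("GET  " + getUrl("server-state"));
        System.out.println("POST " + ENDPOINT + "  " + postBody("server-state"));
    }
}
```

The POST body would be sent with a Content-Type of application/json and the management user's credentials.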
[JBoss JIRA] (WFCORE-358) Add a generic query facility
by Brian Stansberry (JIRA)
[ https://issues.jboss.org/browse/WFCORE-358?page=com.atlassian.jira.plugin... ]
Brian Stansberry moved WFLY-708 to WFCORE-358:
----------------------------------------------
Project: WildFly Core (was: WildFly)
Key: WFCORE-358 (was: WFLY-708)
Component/s: Domain Management
(was: Domain Management)
Fix Version/s: (was: Awaiting Volunteers)
> Add a generic query facility
> ----------------------------
>
> Key: WFCORE-358
> URL: https://issues.jboss.org/browse/WFCORE-358
> Project: WildFly Core
> Issue Type: Feature Request
> Components: Domain Management
> Reporter: Heiko Rupp
> Labels: rhq
>
> To find out which resources (e.g. JDBC drivers) are deployed on a server group, I have to visit each managed server and query it. This is cumbersome:
> - First, the server group should have the same drivers on all of its servers, so asking the server group should be enough.
> - Second, because the server group does not even know its connected managed servers (there is no :list-servers operation), I potentially need to loop over the whole domain, including many remote HCs, to query for e.g. the JDBC drivers or other data.
> There should be a mechanism to fire queries that are then relayed to the HCs, with the answers combined and returned.
> Still, I don't understand why a server group, if it is meant to hold an identical configuration for all its managed servers (apart from VM params, ports and IPs), does not simply have "proxy methods" like :installed-drivers-list or :server-list.
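The relay-and-combine mechanism requested above could look roughly like this: one query is handed to every host, and the per-host answers are merged into a single result keyed by host name. Each host is modelled as a plain function from query to answer; that modelling, and all names here, are assumptions for illustration rather than an existing WildFly API.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// Hypothetical sketch of a fan-out query with combined results.
final class FanOutQuerySketch {
    /** Sends the same query to every host and collects the answers, sorted by host name. */
    static <R> Map<String, R> queryAll(Map<String, Function<String, R>> hosts, String query) {
        Map<String, R> combined = new TreeMap<>();
        for (Map.Entry<String, Function<String, R>> host : hosts.entrySet()) {
            combined.put(host.getKey(), host.getValue().apply(query));
        }
        return combined;
    }

    public static void main(String[] args) {
        // Two stand-in host controllers answering a made-up "installed-drivers" query.
        Map<String, Function<String, List<String>>> hosts = new TreeMap<>();
        hosts.put("hc-a", q -> List.of("h2"));
        hosts.put("hc-b", q -> List.of("h2", "postgresql"));
        System.out.println(queryAll(hosts, "installed-drivers"));
    }
}
```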
[JBoss JIRA] (WFCORE-359) Failure to start the JVM should trigger a management update plan to rollback
by Brian Stansberry (JIRA)
[ https://issues.jboss.org/browse/WFCORE-359?page=com.atlassian.jira.plugin... ]
Brian Stansberry moved WFLY-744 to WFCORE-359:
----------------------------------------------
Project: WildFly Core (was: WildFly)
Key: WFCORE-359 (was: WFLY-744)
Component/s: Domain Management
(was: Domain Management)
Fix Version/s: (was: Awaiting Volunteers)
> Failure to start the JVM should trigger a management update plan to rollback
> ----------------------------------------------------------------------------
>
> Key: WFCORE-359
> URL: https://issues.jboss.org/browse/WFCORE-359
> Project: WildFly Core
> Issue Type: Task
> Components: Domain Management
> Reporter: Andrig Miller
>
> I have come across situations where certain JVM options, typically related to NUMA options, can cause the JVM to segfault. With our domain model, we expose the ability to set the JVM options, so I think this certainly would be a possibility.
> So, as we discussed on the Andiamo call today, this type of situation should cause a rollback of the management update plan, if the JVM fails to start.
[JBoss JIRA] (WFCORE-356) Make socket binding interface and port values available for use in property substitution
by Brian Stansberry (JIRA)
[ https://issues.jboss.org/browse/WFCORE-356?page=com.atlassian.jira.plugin... ]
Brian Stansberry moved WFLY-702 to WFCORE-356:
----------------------------------------------
Project: WildFly Core (was: WildFly)
Key: WFCORE-356 (was: WFLY-702)
Component/s: Domain Management
(was: Domain Management)
Fix Version/s: (was: Awaiting Volunteers)
> Make socket binding interface and port values available for use in property substitution
> ----------------------------------------------------------------------------------------
>
> Key: WFCORE-356
> URL: https://issues.jboss.org/browse/WFCORE-356
> Project: WildFly Core
> Issue Type: Feature Request
> Components: Domain Management
> Reporter: Brian Stansberry
>
> See linked dev list thread for an example use case (remote socket binding used in a driver-specific JDBC URL).
> Some gotchas noted on the thread that need to be resolved:
> A tricky thing to deal with is interfaces and socket bindings are
> actually on-demand services. They aren't resolved until they are
> demanded. Nothing in this connection-url scenario would demand the
> socket binding, so it won't be resolved, and until it's resolved no
> system property could be set.
> That could be handled by adding an "auto-start" attribute on the
> socket-binding config. Not particularly intuitive though.
> Another thing to deal with is interface resolution. With the exception
> of the <loopback address="127.0.0.4"/> criteria Scott added, resolving
> an interface means finding a NIC on the machine that matches the
> criteria. If that can't be done, it's an error condition. To avoid that
> there would need to be some new criteria added (e.g.
> <remote-inet-address value="10.0.0.53"/>) or an attribute added to the
> existing inet-address criteria (e.g. <inet-address value="10.0.0.53"
> local="false"/>). (There is a separate JIRA for this issue: AS7-1614)
> A minor issue relates to changing the configuration of a socket binding.
> Basically, we try and track whether you've used a particular
> SocketBinding service to open a socket; if not we let you change the
> binding config without requiring a server restart or reload. This
> connection-url stuff won't use the SocketBinding service to create a
> socket, so the binding will be editable at runtime with no
> reload/restart required. But there's no dependency relationship between
> the binding and the datasource, so that change is not going to be
> reflected in the datasource. This is just an example of the general
> problem with using system properties as an injection mechanism.
> I don't think this last point is a blocker; it just requires documentation.
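To make the use case concrete, here is a small sketch of expression substitution in a connection URL. The property names (remote.db.host, remote.db.port) and the resolver class are invented for this example and are not the names WildFly would expose. An expression whose property has not been set is left untouched, which mirrors the on-demand gotcha above: until the binding is resolved there is nothing to substitute.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical resolver for ${name} expressions backed by a property map.
final class BindingSubstitutionSketch {
    private static final Pattern EXPR = Pattern.compile("\\$\\{([^}]+)\\}");

    static String resolve(String text, Map<String, String> props) {
        Matcher m = EXPR.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = props.get(m.group(1));
            // Leave unresolved expressions as-is instead of failing.
            String replacement = (value != null) ? value : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> props = Map.of("remote.db.host", "10.0.0.53", "remote.db.port", "5432");
        System.out.println(resolve("jdbc:example://${remote.db.host}:${remote.db.port}/test", props));
    }
}
```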