[JBoss JIRA] (WFCORE-378) Domain rollout operation timeouts
by Brian Stansberry (JIRA)
[ https://issues.jboss.org/browse/WFCORE-378?page=com.atlassian.jira.plugin... ]
Brian Stansberry updated WFCORE-378:
------------------------------------
Git Pull Request: https://github.com/wildfly/wildfly-core/pull/1305
> Domain rollout operation timeouts
> ---------------------------------
>
> Key: WFCORE-378
> URL: https://issues.jboss.org/browse/WFCORE-378
> Project: WildFly Core
> Issue Type: Feature Request
> Components: Domain Management
> Reporter: Brian Stansberry
> Assignee: Brian Stansberry
> Fix For: 3.0.0.Alpha1
>
>
> Portion of the parent task related to rolling out changes to a domain.
> The expectation is that single-process timeouts (WFLY-2741) will handle most failure conditions related to domain rollouts. For example, if a single server hangs and prevents completion of the rollout, that server will eventually time out, allowing the domain-wide rollout to continue. Timeouts in the domain rollout code serve as a second line of defense:
> 1) In case of protocol or other problems that prevent the calling process from learning about the timeout on the remote process
> 2) In case of bugs in the single process timeout handling on the remote process
> 3) In mixed domain cases where remote hosts are running previous versions and do not have the timeout function
> Potential places to add timeouts:
> DomainSlaveHandler->HostControllerUpdateTask.ProxyOperationListener.retrievePreparedOperation()
> -- where the master HC waits for responses from slaves
> RollingServerGroupUpdateTask.run() -> ServerTaskExecutor.ServerOperationListener.retrievePreparedOperation()
> -- a timeout here means one server didn't respond, but the rollout needs to move on to the next server
> ConcurrentServerGroupUpdateTask.run() -> ServerTaskExecutor.ServerOperationListener.retrievePreparedOperation()
> -- a timeout here means none of the remaining servers have responded within the timeout
> DomainRolloutStepHandler.finalizeOp() -> future.get()
> -- the ServerGroupUpdateTask should fail in the normal phase, so any timeout here would indicate a problem committing the transaction or a communication problem getting the response back
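The "second line of defense" idea in the list above can be illustrated with a bounded wait on a future. This is only a minimal sketch using plain java.util.concurrent; the class RolloutTimeoutSketch, the Outcome enum and the awaitPrepared method are hypothetical names for illustration and are not taken from the WildFly Core code base.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class RolloutTimeoutSketch {

    /** Possible results of waiting for one remote process (hypothetical enum). */
    enum Outcome { PREPARED, FAILED, TIMED_OUT }

    /**
     * Waits for a remote "prepared" response, but never longer than the given limit.
     * A timeout is returned as an outcome so the caller can move on to the next server.
     */
    static Outcome awaitPrepared(Future<?> response, long timeout, TimeUnit unit) {
        try {
            response.get(timeout, unit);
            return Outcome.PREPARED;
        } catch (ExecutionException e) {
            // The remote process reported a failure of its own.
            return Outcome.FAILED;
        } catch (TimeoutException e) {
            // No response at all within the limit: stop waiting.
            response.cancel(true);
            return Outcome.TIMED_OUT;
        } catch (InterruptedException e) {
            // Preserve the interrupt flag and treat the wait as failed.
            Thread.currentThread().interrupt();
            return Outcome.FAILED;
        }
    }

    public static void main(String[] args) {
        Future<String> hung = new CompletableFuture<>(); // never completes
        System.out.println(awaitPrepared(hung, 100, TimeUnit.MILLISECONDS)); // TIMED_OUT
    }
}
```

Reporting the timeout as a value rather than an exception is just one possible design; it keeps a rolling update loop simple because each server produces exactly one outcome.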
--
This message was sent by Atlassian JIRA
(v6.4.11#64026)
10 years, 5 months
[JBoss JIRA] (WFCORE-1134) A java.lang.Error in Stage.DONE can lead to hangs
by Brian Stansberry (JIRA)
[ https://issues.jboss.org/browse/WFCORE-1134?page=com.atlassian.jira.plugi... ]
Brian Stansberry updated WFCORE-1134:
-------------------------------------
Git Pull Request: https://github.com/wildfly/wildfly-core/pull/1305
> A java.lang.Error in Stage.DONE can lead to hangs
> -------------------------------------------------
>
> Key: WFCORE-1134
> URL: https://issues.jboss.org/browse/WFCORE-1134
> Project: WildFly Core
> Issue Type: Bug
> Components: Domain Management
> Affects Versions: 2.0.1.Final
> Reporter: Brian Stansberry
> Assignee: Brian Stansberry
> Fix For: 3.0.0.Alpha1
>
>
> As part of investigating https://bugzilla.redhat.com/show_bug.cgi?id=1259767 I've looked into general handling of situations when java.lang.Error is thrown during management operation execution.
> Handling prior to Stage.DONE looks ok, with the error caught and logged, and the failure-description set on the response. But if the error happens after commit or during rollback, it doesn't always properly trigger a response. In the intra-domain case the remote node that has the problem tries to send a failure response to the calling HC, but uses the wrong request id, resulting in that response being dropped and the calling HC continuing to wait.
> A post-commit Error seems pretty unlikely. A rollback error seems more possible, as a rollback may involve installing significant numbers of services.
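To make the "wrong request id" failure mode concrete, here is a small sketch of request/response correlation on the calling side. It is an illustration only: RequestCorrelationSketch and its methods are hypothetical and do not reflect the actual management protocol classes.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

final class RequestCorrelationSketch {

    // Requests awaiting a response on the calling side, keyed by request id (hypothetical structure).
    private final Map<Integer, CompletableFuture<String>> pending = new ConcurrentHashMap<>();

    /** Registers a request and returns the future the caller will wait on. */
    CompletableFuture<String> send(int requestId) {
        CompletableFuture<String> future = new CompletableFuture<>();
        pending.put(requestId, future);
        return future;
    }

    /** Returns true if the response matched a pending request, false if it was dropped. */
    boolean onResponse(int requestId, String failureDescription) {
        CompletableFuture<String> future = pending.remove(requestId);
        if (future == null) {
            return false; // unknown id: the response is dropped and the caller keeps waiting
        }
        future.complete(failureDescription);
        return true;
    }

    public static void main(String[] args) {
        RequestCorrelationSketch caller = new RequestCorrelationSketch();
        CompletableFuture<String> response = caller.send(7);
        System.out.println(caller.onResponse(8, "rollback failed")); // false (wrong id)
        System.out.println(response.isDone());                       // false
        System.out.println(caller.onResponse(7, "rollback failed")); // true
        System.out.println(response.isDone());                       // true
    }
}
```

With the wrong id the caller's future never completes, which is the hang described in this issue unless some timeout bounds the wait.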
--
[JBoss JIRA] (WFCORE-1220) Try closing the channel if java.lang.Error prevents sending error responses to the client
by Brian Stansberry (JIRA)
Brian Stansberry created WFCORE-1220:
----------------------------------------
Summary: Try closing the channel if java.lang.Error prevents sending error responses to the client
Key: WFCORE-1220
URL: https://issues.jboss.org/browse/WFCORE-1220
Project: WildFly Core
Issue Type: Sub-task
Components: Domain Management
Reporter: Brian Stansberry
Assignee: Brian Stansberry
Beyond the basic work on WFCORE-1134, look into further reaction when Errors occur and the server or HC cannot even send an error response to the caller. If we get to this point, the task has already failed to handle a problem and now we can't notify the remote side either. Most likely this is an OOME situation, although any Error here means a major issue.
Options:
1) Try to close the channel to disconnect this process from the remote end so it doesn't disrupt the remote end. If this is an intra-HC or HC-server connection, a successful close will remove this process from normal domain control. If this is a server, the HC still has some control over the server via the ProcessController, e.g. to handle a 'kill' or 'destroy' management op. If this is a slave HC, the slave is disconnected from the domain. Either a server or a slave HC may try to reconnect, although it's likely better if that fails and the user just restarts the process.
If the remote side is an end user client (e.g. CLI) then a successful close will be noticed by the client. The user can reconnect or take action to terminate this process.
2) Commit suicide via SystemExiter.exit. But I'm not certain complete termination of the process is how we want to deal with problems with management requests. Probably some sort of configurable policy would be better.
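Option 1 could look roughly like the sketch below: attempt the failure response, and if even that fails, close the channel so the remote side at least notices a disconnect. The ResponseChannel interface and the reportOrDisconnect method are hypothetical stand-ins invented for this illustration; they are not the JBoss Remoting or WildFly Core API.

```java
import java.io.Closeable;
import java.io.IOException;

final class ErrorFallbackSketch {

    /** Hypothetical stand-in for a management channel. */
    interface ResponseChannel extends Closeable {
        void sendFailure(String failureDescription) throws IOException;
    }

    /**
     * Tries to report a failure to the caller. If that itself fails, including
     * with a java.lang.Error such as OutOfMemoryError, falls back to closing the
     * channel. Returns true only if the failure response was sent.
     */
    static boolean reportOrDisconnect(ResponseChannel channel, String failureDescription) {
        try {
            channel.sendFailure(failureDescription);
            return true;
        } catch (IOException | Error sendProblem) {
            try {
                channel.close();
            } catch (IOException | Error closeProblem) {
                // Nothing more can be done here; a process-level policy (option 2) would take over.
            }
            return false;
        }
    }
}
```

Catching Error is normally discouraged; it is done here only because the scenario under discussion is precisely the one where an Error has already prevented normal handling.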
--
[JBoss JIRA] (WFLY-5840) mod_cluster proxies doesn't accept expressions
by Rafael Benevides (JIRA)
[ https://issues.jboss.org/browse/WFLY-5840?page=com.atlassian.jira.plugin.... ]
Rafael Benevides commented on WFLY-5840:
----------------------------------------
Thanks
> mod_cluster proxies doesn't accept expressions
> ----------------------------------------------
>
> Key: WFLY-5840
> URL: https://issues.jboss.org/browse/WFLY-5840
> Project: WildFly
> Issue Type: Feature Request
> Components: Clustering
> Affects Versions: 9.0.2.Final
> Reporter: Rafael Benevides
> Assignee: Radoslav Husar
> Labels: mod_cluster
>
> proxy-list was deprecated in WFLY-457
> Now I'm forced to use proxies, but it doesn't accept expressions.
> I want to create a customisable Docker image in which the mod_cluster IP can be configured at runtime.
> I'd like that WildFly image to accept the following command:
> {noformat}
> /subsystem=modcluster/mod-cluster-config=configuration:write-attribute(name=proxies,value=[${modcluster.host:modcluster}:${modcluster.port:80}])
> {noformat}
> but it fails with:{color:red} "Services that may be the cause:" => ["jboss.outbound-socket-binding.\"${modcluster.host:modcluster}:${modcluster.port:80}\""]{color}
> With
> {noformat}
> /subsystem=modcluster/:add-proxy(host=${modcluster.host:modcluster},port=80)
> {noformat}
> it fails with {color:red}WFLYMODCLS0016: No IP address could be resolved for the specified host of the proxy.{color}
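For readers unfamiliar with the ${name:default} syntax used in the commands above, the following sketch shows what resolving such an expression means. It is a simplified illustration, not WildFly's actual expression resolver: ExpressionSketch and resolve are hypothetical names, nested expressions are not handled, and properties come from a plain map instead of system properties.

```java
import java.util.Collections;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ExpressionSketch {

    // Matches ${name} or ${name:default}; nested expressions are not handled.
    private static final Pattern EXPRESSION =
            Pattern.compile("\\$\\{([^}:]+)(?::([^}]*))?\\}");

    /** Replaces each expression with the property value, or with its default if unset. */
    static String resolve(String value, Map<String, String> properties) {
        Matcher matcher = EXPRESSION.matcher(value);
        StringBuffer resolved = new StringBuffer();
        while (matcher.find()) {
            String replacement = properties.get(matcher.group(1));
            if (replacement == null) {
                replacement = matcher.group(2); // default after the colon, may be absent
            }
            if (replacement == null) {
                throw new IllegalArgumentException("Cannot resolve expression: " + matcher.group());
            }
            matcher.appendReplacement(resolved, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(resolved);
        return resolved.toString();
    }

    public static void main(String[] args) {
        String proxy = "${modcluster.host:modcluster}:${modcluster.port:80}";
        System.out.println(resolve(proxy, Collections.<String, String>emptyMap()));
        // prints "modcluster:80"
        System.out.println(resolve(proxy, Collections.singletonMap("modcluster.host", "10.0.0.5")));
        // prints "10.0.0.5:80"
    }
}
```

The error messages quoted above suggest the attribute value was used as a literal string without any such resolution step, which is what this feature request asks to change.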
--
[JBoss JIRA] (WFLY-5846) WaitTimeInterceptor should store the start wait time in the context only if statistics are enabled.
by Andrej Golovnin (JIRA)
Andrej Golovnin created WFLY-5846:
-------------------------------------
Summary: WaitTimeInterceptor should store the start wait time in the context only if statistics are enabled.
Key: WFLY-5846
URL: https://issues.jboss.org/browse/WFLY-5846
Project: WildFly
Issue Type: Enhancement
Components: EJB
Affects Versions: 10.0.0.CR4
Reporter: Andrej Golovnin
Currently the WaitTimeInterceptor always stores the start wait time in the interceptor context. But the ExecutionTimeInterceptor uses this value only if the statistics are enabled on the component. Therefore the WaitTimeInterceptor should be changed to store the start wait time in the context only if statistics are enabled.
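The proposed change amounts to guarding the write to the context with the statistics flag. The sketch below illustrates the idea with invented types: Context, START_WAIT_TIME, beforeInvocation and waitTimeMillis are hypothetical and do not mirror the real WaitTimeInterceptor, ExecutionTimeInterceptor or InterceptorContext classes.

```java
import java.util.HashMap;
import java.util.Map;

final class WaitTimeSketch {

    /** Hypothetical stand-in for the interceptor context's private data. */
    static final class Context {
        final Map<String, Object> data = new HashMap<>();
    }

    static final String START_WAIT_TIME = "startWaitTime"; // hypothetical key

    /** Stores the start wait time only when statistics are enabled for the component. */
    static void beforeInvocation(Context context, boolean statisticsEnabled) {
        if (statisticsEnabled) {
            context.data.put(START_WAIT_TIME, System.currentTimeMillis());
        }
    }

    /** Returns the elapsed wait time in milliseconds, or -1 if no start time was recorded. */
    static long waitTimeMillis(Context context) {
        Object start = context.data.get(START_WAIT_TIME);
        return start == null ? -1L : System.currentTimeMillis() - (Long) start;
    }
}
```

When statistics are disabled, this avoids a clock read, a boxing allocation and a map write on every invocation.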
--
[JBoss JIRA] (WFLY-5846) WaitTimeInterceptor should store the start wait time in the context only if the statistics are enabled.
by Andrej Golovnin (JIRA)
[ https://issues.jboss.org/browse/WFLY-5846?page=com.atlassian.jira.plugin.... ]
Andrej Golovnin updated WFLY-5846:
----------------------------------
Summary: WaitTimeInterceptor should store the start wait time in the context only if the statistics are enabled. (was: WaitTimeInterceptor should store the start wait time in the context only if statistics are enabled.)
> WaitTimeInterceptor should store the start wait time in the context only if the statistics are enabled.
> -------------------------------------------------------------------------------------------------------
>
> Key: WFLY-5846
> URL: https://issues.jboss.org/browse/WFLY-5846
> Project: WildFly
> Issue Type: Enhancement
> Components: EJB
> Affects Versions: 10.0.0.CR4
> Reporter: Andrej Golovnin
>
> Currently the WaitTimeInterceptor always stores the start wait time in the interceptor context. But the ExecutionTimeInterceptor uses this value only if the statistics are enabled on the component. Therefore the WaitTimeInterceptor should be changed to store the start wait time in the context only if statistics are enabled.
--
[JBoss JIRA] (WFLY-5522) split artemis journal in its own module
by Tom Jenkinson (JIRA)
[ https://issues.jboss.org/browse/WFLY-5522?page=com.atlassian.jira.plugin.... ]
Tom Jenkinson commented on WFLY-5522:
-------------------------------------
Thanks Jeff
> split artemis journal in its own module
> ---------------------------------------
>
> Key: WFLY-5522
> URL: https://issues.jboss.org/browse/WFLY-5522
> Project: WildFly
> Issue Type: Enhancement
> Components: JMS, Transactions
> Affects Versions: 10.0.0.CR2
> Reporter: Jeff Mesnil
> Assignee: Jeff Mesnil
> Priority: Minor
>
> The org.jboss.jts module depends on the org.apache.activemq.artemis module only for its journal functionality, but it pulls in all of that module's dependencies (including messaging stuff) that are not used at all.
> The org.apache.activemq.artemis module should be split, with a separate module for its journal containing the journal jar, the commons jar and the native lib required for libAIO.
--