We have been observing testsuite execution failures due to:
(i) servers failing to shut down correctly, blocking the startup of subsequent
servers
(ii) servers timing out on startup
Some of these failures arise on platforms which have been newly introduced for JBPAPP
testing, some of which (e.g. HPUX) cause the AS to start more slowly. These build
failures wreak havoc on any attempts to automate the execution of the testsuite.
I've been investigating these problems. Here is what I have found:
1. The standard means of starting and stopping servers is to:
(i) define a server configuration in the server:config element in the file
testsuite/import/server-config.xml
(ii) use server:start conf="" to start the configuration
(iii) use server:stop conf="" to stop the configuration
server:start will exec the program org.jboss.Main and wait up to 120 seconds for the
server to reach a started state (the server is 'started' if it responds to an HTTP
request). If the server fails to respond, the ant task StartServerTask will raise a
BuildException with the message "Error starting server " and the build will be
terminated (if ant is set up to do so).
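
To make the startup check concrete, here is a minimal Java sketch of the poll-until-timeout idea, assuming that 'started' simply means 'answers an HTTP request'. This is not the StartServerTask source: the class StartupWait, its method names, and the 2 second connect/read timeouts are made up for illustration.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of the JBoss testsuite.
final class StartupWait {

    // Polls the probe until it succeeds or the timeout elapses.
    // Returns true if the probe succeeded in time, false otherwise.
    static boolean waitUntil(BooleanSupplier probe, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (true) {
            if (probe.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    // A probe that treats any HTTP response as "server is up" (an assumption).
    static BooleanSupplier httpProbe(String url) {
        return () -> {
            try {
                HttpURLConnection c =
                        (HttpURLConnection) URI.create(url).toURL().openConnection();
                c.setConnectTimeout(2000); // assumed value
                c.setReadTimeout(2000);    // assumed value
                c.getResponseCode();
                c.disconnect();
                return true;
            } catch (IOException e) {
                return false; // connection refused, timeout, etc.
            }
        };
    }
}
```

A call such as StartupWait.waitUntil(StartupWait.httpProbe("http://localhost:8080/"), 120000, 1000) would mirror the 120 second limit; the URL and the one second poll interval are assumptions. A caller that wants the build to fail on timeout would throw when the method returns false.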
server:stop will exec the program org.jboss.Shutdown and will wait up to 120 seconds for
the server to shut down (determined by a call to Process.exitValue()). If the server does
not shut down, the message "Failed to shutdown server before timeout. Destroying the
process" is output and an attempt is made to destroy the process. No build exception is
raised and the build will continue.
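
The stop behaviour described above can be sketched in Java as follows. This is an illustration only, not the actual task code: the class ServerStopper and its method are hypothetical, and it assumes Java 8 or later for Process.waitFor(long, TimeUnit).

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of the JBoss testsuite.
final class ServerStopper {

    // Waits for the server process to exit after Shutdown has been exec'ed.
    // Returns true if it exited on its own, false if it had to be destroyed.
    // No exception is thrown on timeout, so the build would continue.
    static boolean stopOrDestroy(Process server, long timeout, TimeUnit unit) {
        try {
            if (server.waitFor(timeout, unit)) {
                return true;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        System.err.println(
                "Failed to shutdown server before timeout. Destroying the process");
        server.destroy();
        return false;
    }
}
```

Returning a boolean instead of failing silently lets the caller decide whether a forced kill should fail the build or only be reported.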
The target jboss-all-config-tests uses this approach, and it has been known to fail to
correctly shut down the 'all' server instance, blocking the startup of
tests-security-manager, which follows it in the testsuite. This needs further
investigation. A similar issue was raised (JBQA-405) and the code was changed by Anil to
add the class library commons-logging.jar to the stopServerClasspath(), because
org.jboss.Shutdown may fail to execute when classes it needs are not present on the
classpath. I believe this was in response to 'minimal' not shutting down, and this was a
repeatable error. However, the changes made for that JIRA issue are not present in the
current distribution versions.
2. There are some old mechanisms for starting and stopping servers, which some tests are
using:
<start-jboss>
- executes the java program org.jboss.Main in the background (with 64m memory overriding)
- doesn't block, so we need to use <wait-on-host> to wait until startup has
completed
<wait-on-host>
- waits for a host to start, by periodically trying to make a connection to
http://hostname:8080
- fails the build and spits out a message if not up in 60 seconds
<stop-jboss>
- executes the java program org.jboss.Shutdown in the background
- does not block once executed, therefore we need to follow this with
<wait-on-shutdown>
<wait-on-shutdown>
- waits for a host to shut down, by checking for the presence of
"[org.jboss.system.server.Server] Shutdown complete" in server.log
- fails the build and spits out a message if not shut down in 60 seconds
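
As an illustration of the log check that <wait-on-shutdown> relies on, here is a minimal Java sketch that looks for the "Shutdown complete" entry with the Server logger category written either fully qualified or abbreviated. The class ShutdownLogCheck is hypothetical and the sample log lines in main are made up; only the two marker strings are taken from the JBoss logs.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the JBoss testsuite.
final class ShutdownLogCheck {

    static final String MARKER = "Shutdown complete";

    // True if the line carries the shutdown marker under either category form.
    static boolean isShutdownLine(String line) {
        return line.contains("[org.jboss.system.server.Server] " + MARKER)
                || line.contains("[Server] " + MARKER);
    }

    // True if any line of the log signals that shutdown finished.
    static boolean shutdownComplete(List<String> logLines) {
        for (String line : logLines) {
            if (isShutdownLine(line)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Made-up sample lines, for demonstration only.
        List<String> log = Arrays.asList(
                "INFO  [Server] Some other message",
                "INFO  [Server] Shutdown complete");
        System.out.println(shutdownComplete(log));
    }
}
```

Matching on both forms keeps the check working if the logging pattern abbreviates the category name.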
These old, and really deprecated, methods of startup and shutdown are inferior to
server:start and server:stop, in that:
- they will not try to kill a process which won't shut down, but only fail the build
- they also use their own timeouts, which are 60 secs instead of 120 secs. Given that the
JBoss configuration 'all' takes 45 secs to start on a fast machine with no other
processes running, 60 secs leaves little margin for (i) slow hosts, (ii) hosts with
several JBoss instances running, or (iii) complex configuration startups.
- the wait-on-shutdown target seems to be broken, as the log entry for system shutdown
has changed from "[org.jboss.system.server.Server] Shutdown complete" to
"[Server] Shutdown complete".
There are a number of targets which use <start-jboss>/<wait-on-host> and
<stop-jboss>/<wait-on-shutdown> and so are susceptible to timeouts.
They are:
* tests-compatibility-pooledInvokers
* tests-compatibility
* tests-jacc-securitymgr
* tests-jacc-security-external
* tests-jacc-security-allstarrole
What is worse, the latter three mix a server:start startup with
<stop-jboss>/<wait-on-shutdown>.
Thus, in trying to fix the problem of the testsuite failing due to (i) servers failing to
start up because other servers have not cleanly shut down and (ii) server startups timing
out and causing the build to fail, it might help to:
1. rework the targets above to use server:start and server:stop, so that at least we
have consistent behaviour applied to stopping and starting servers.
2. re-investigate the problem with 'all' not shutting down, which may be due to an
OOME. One approach would be to introduce TRACE statements which trace the execution of
steps between a client exec'ing Shutdown and the shutdown() request being processed
on the server side.
I'll go ahead and make these changes to a temporary copy of the source tree and test
them out as much as I can.
If anyone has any comments or suggestions, please feel free to reply.
View the original post :
http://www.jboss.com/index.html?module=bb&op=viewtopic&p=4029136#...