[Design of JBoss internal QA (Test Suite)] - Testsuite runs failing due to server failed startups/timeout
by rachmatowicz@jboss.com
We have been observing testsuite execution failures due to:
(i) servers failing to shut down correctly, blocking subsequent server starts
(ii) servers timing out on startup
Some of these failures arise on platforms which have been newly introduced for JBPAPP testing, where some platforms (e.g. HPUX) cause the AS to start more slowly. These build failures wreak havoc on any attempts to automate the execution of the testsuite.
I've been investigating these problems. Here is what I have found:
1. The standard means of starting and stopping servers is to:
(i) define a server configuration in the server:config element in the file testsuite/import/server-config.xml
(ii) use server:start conf="" to start the configuration
(iii) use server:stop conf="" to stop the configuration
server:start will exec the program org.jboss.Main and wait up to 120 seconds for the server to reach a started state (the server is 'started' if it responds to an HTTP request); if the server fails to respond, the ant task StartServerTask will raise a BuildException with the message "Error starting server " and the build will be terminated (if ant is set up to do so).
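The 'started' check described above can be sketched roughly as follows. This is a minimal illustration, not the StartServerTask source: the class name StartupProbe, its method names, and the one-second connect/read timeouts are all assumptions made for the example.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper for illustration only; not the StartServerTask code.
class StartupProbe {

    // True if the server sends back any HTTP response at all.
    static boolean respondsToHttp(String url) {
        try {
            HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setConnectTimeout(1000); // assumed per-attempt timeouts
            conn.setReadTimeout(1000);
            conn.getResponseCode();       // any status code means the server answered
            conn.disconnect();
            return true;
        } catch (IOException e) {
            return false;                 // refused or timed out: not started yet
        }
    }

    // Polls until the server answers or the overall timeout expires.
    static boolean waitForStart(String url, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (System.nanoTime() - deadline < 0) {
            if (respondsToHttp(url)) {
                return true;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }
}
```

A caller would use something like StartupProbe.waitForStart("http://localhost:8080/", 120000, 1000) and fail the build when it returns false.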
server:stop will exec the program org.jboss.Shutdown and wait up to 120 seconds for the server to shut down (determined by a call to Process.exitValue()). If the server does not shut down, the message "Failed to shutdown server before timeout. Destroying the process" is logged and an attempt is made to destroy the process. No build exception is raised and the build will continue.
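The wait-then-destroy behaviour can be sketched like this. It is only an illustration under assumptions: the class and method names are hypothetical, and it uses Process.waitFor(long, TimeUnit) (Java 8 and later) in place of polling exitValue(), which is a substitution made for brevity and not what the ant task does.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper for illustration only; not the server:stop task code.
class ShutdownWait {

    // Returns true if the process exited by itself within the timeout,
    // false if it had to be destroyed.
    static boolean waitOrDestroy(Process server, long timeoutSeconds) {
        try {
            if (server.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
                return true;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        System.err.println(
            "Failed to shutdown server before timeout. Destroying the process");
        server.destroy(); // no exception is raised, so the build would continue
        return false;
    }
}
```

Because the method only reports the outcome and never throws, a server that had to be destroyed goes unnoticed unless the caller checks the return value, which matches the "build will continue" behaviour described above.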
The target jboss-all-config-tests uses this approach, and it has been known to fail to correctly shut down the 'all' server instance, blocking the startup of tests-security-manager, which follows it in the testsuite. This needs further investigation. A similar issue was raised (JBQA-405) and the code was changed by Anil to add the class library commons-logging.jar to the stopServerClasspath(), as when exec'ing org.jboss.Shutdown, it may fail to execute due to classes it needs not being present on the classpath. I believe this was in response to 'minimal' not shutting down, and this was a repeatable error. However, the changes made for that JIRA issue are not present in the current distribution versions.
2. There are some old mechanisms for starting and stopping servers, which some tests are using:
<start-jboss>
- executes the java program org.jboss.Main in the background (with 64m memory overriding)
- doesn't block, so we need to use <wait-on-host> to wait until startup has completed
<wait-on-host>
- waits for a host to start, by periodically trying to make a connection to http://hostname:8080
- fails the build and spits out a message if not up in 60 seconds
<stop-jboss>
- executes the java program org.jboss.Shutdown in the background
- does not block once executed, therefore we need to follow this with <wait-on-shutdown>
<wait-on-shutdown>
- waits for a host to shut down, by checking for the presence of "[org.jboss.system.server.Server] Shutdown complete" in server.log
- fails the build and spits out a message if not down in 60 seconds
These old, and really deprecated, methods of startup and shutdown are inferior to server:start and server:stop, in that:
- they will not try to kill a process which won't shut down, but only fail the build
- they also use their own timeouts, which are 60 secs, instead of 120 secs. Given that JBoss configuration 'all' takes 45 secs to start on a fast machine, with no other processes running, this does not account for (i) slow hosts (ii) hosts with several jboss instances running (iii) complex configuration startups.
- the wait-on-shutdown target seems to be broken, as the log entry for system shutdown has changed from "[org.jboss.system.server.Server] Shutdown complete" to "[Server] Shutdown complete".
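If a log-based check were kept, one way to make it tolerate both forms of the log entry quoted above is an optional package prefix in the pattern. This is a sketch only; the class and method names are hypothetical and the real target is defined in ant, not Java.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only.
class ShutdownMarker {

    // Accepts "[org.jboss.system.server.Server] Shutdown complete"
    // as well as the shorter "[Server] Shutdown complete".
    private static final Pattern SHUTDOWN_COMPLETE = Pattern.compile(
        "\\[(?:org\\.jboss\\.system\\.server\\.)?Server\\] Shutdown complete");

    // True if the given log text contains either form of the marker.
    static boolean isShutdownComplete(String logText) {
        return SHUTDOWN_COMPLETE.matcher(logText).find();
    }
}
```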
There are a number of targets which use <start-jboss>/<wait-on-host> and <stop-jboss>/<wait-on-shutdown> and so are susceptible to timeouts.
They are:
* tests-compatibility-pooledInvokers
* tests-compatibility
* tests-jacc-securitymgr
* tests-jacc-security-external
* tests-jacc-security-allstarrole
What is worse, the latter three mix a server:start startup with <stop-jboss>/<wait-on-shutdown>.
Thus, to fix the problem of the testsuite failing due to (i) servers failing to start up because other servers have not cleanly shut down and (ii) server startups timing out and causing the build to fail, it might help to:
1. rework the targets above to use server:start and server:stop, so that at least we have consistent behaviour applied to stopping and starting servers.
2. re-investigate the problem with 'all' not shutting down, which may be due to an OOME. One approach would be to introduce TRACE statements which trace the execution of steps between a client exec'ing Shutdown and the shutdown() request being processed on the server side.
I'll go ahead and make these changes to a temporary copy of the source tree and test them out as much as I can.
If anyone has any comments or suggestions, please feel free to reply.
View the original post : http://www.jboss.com/index.html?module=bb&op=viewtopic&p=4029136#4029136
Reply to the post : http://www.jboss.com/index.html?module=bb&op=posting&mode=reply&p=4029136
[Design of Messaging on JBoss (Messaging/JBoss)] - Re: Test cases take a long time
by sergeypk
The problem goes away on JDK 1.5 (I used JDK 1.6 originally). I've enabled logs and found that this is what happens:
1. First 300 open/close pairs proceed without any delay.
It seems that JBoss Remoting tries to pool threads processing connections and 300 is the thread pool size. There seems to be some kind of a leak here in that the sockets are not closed when JBoss Messaging is done with the connections. I don't know yet whether Messaging or Remoting is responsible for it.
2. After the first 300 connections have been established, the thread pool is full, thus the 301st createConnection call blocks.
3. After 1 minute has passed, JRE 1.5 runs finalizers that clean up the resources, freeing the 300 threads and allowing the test to proceed for the next 300 connections.
JRE 1.6 does not run finalizers at this point (don't know the reason), thus under JRE 1.6 the test hangs indefinitely.
Thus there are three one-minute pauses during the test (after the 300th, 600th, and 900th iteration) on JDK 1.5, leading to the 3.5-minute running time.
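The stall at the 301st connection can be reproduced in miniature with a bounded pool whose slots are never handed back. This is only a model of the behaviour described above, not JBoss Remoting or Messaging code: a Semaphore stands in for the worker thread pool, and the class and method names are hypothetical.

```java
import java.util.concurrent.Semaphore;

// Hypothetical model for illustration only.
class PoolExhaustionDemo {

    // Returns the 1-based index of the first attempt that finds no free
    // worker, or -1 if every attempt gets one.
    static int firstBlockedAttempt(int poolSize, int attempts, boolean releaseOnClose) {
        Semaphore workers = new Semaphore(poolSize);
        for (int i = 1; i <= attempts; i++) {
            if (!workers.tryAcquire()) {
                return i;          // a real createConnection call would block here
            }
            if (releaseOnClose) {
                workers.release(); // closing the connection frees its worker
            }
        }
        return -1;
    }
}
```

With a pool of 300 and no release on close, firstBlockedAttempt(300, 1000, false) returns 301; with release on close, all 1000 attempts succeed and it returns -1.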
View the original post : http://www.jboss.com/index.html?module=bb&op=viewtopic&p=4029117#4029117
Reply to the post : http://www.jboss.com/index.html?module=bb&op=posting&mode=reply&p=4029117
[Design of JBoss ESB] - Re: Embedded FTP Server and FTP Clients
by mark.little@jboss.com
"weston.price(a)jboss.com" wrote : anonymous wrote :
| | Weston, I think we can all agree that NIH is bad. However, so is believing that everything that has gone before is necessarily the right way of doing things.
| |
|
| Agreed. My point wasn't to assert that it's necessarily correct, or the right way. It was merely to put into play the idea that these things might be worth evaluating, that's it.
|
Absolutely. Even if existing implementations might not look suitable to a "loosely coupled pattern" (just using that as an example), it may be possible to leverage them (e.g., wrap them, or whatever), such that you only end up writing 10% new code. I'd much prefer that to us writing 90% new code.
anonymous wrote :
| Unfortunately, on the flip side of NIH, we also have the exact opposite issue: incorporating technologies that are not only not a good fit, but don't actually solve the initial problem that they were ostensibly designed to address.
|
Yes, I agree with you again. One thing we have to remember with JBossESB is that a lot of the code comes from the Rosetta acquisition and was implemented over 3 years ago. So for example I'm not sure where JBoss JCA was in those days ;-) But purely from a management side of things, if there's something similar/exactly the same elsewhere in JEMS (or other OSS communities), we should look at its applicability. I'd rather you had the headache of support than me!
View the original post : http://www.jboss.com/index.html?module=bb&op=viewtopic&p=4029113#4029113
Reply to the post : http://www.jboss.com/index.html?module=bb&op=posting&mode=reply&p=4029113
[Design of JBoss ESB] - Re: Embedded FTP Server and FTP Clients
by mark.little@jboss.com
"bill.burke(a)jboss.com" wrote : Mark, this is what we've been arguing about privately over email. JBPM provides support for long-term flows, and because the flow definition/metadata can either be persisted or travel with the message, federation would be supported.
|
I'm pretty sure we've established that this is a good thing for us to support on the ESB.
anonymous wrote :
| In my mind, even compensating transactions could be covered in the process flow definition. You either rollback in a local transaction to your original save point in the message flow and start over, or you catch the problem and transition to the compensating part of the defined message flow.
|
Until I see an implementation of a process engine that uses a transaction engine as its core for compensations, I'm not convinced that compensations are truly supported. Something like WS-BA (or one of the other dozen or so compensation transaction models) requires durability, coordination and reliability characteristics that are often just ignored by process engines (both commercial and open source), leaving a lot up to the user.
View the original post : http://www.jboss.com/index.html?module=bb&op=viewtopic&p=4029110#4029110
Reply to the post : http://www.jboss.com/index.html?module=bb&op=posting&mode=reply&p=4029110
[Design of JBoss ESB] - Re: Embedded FTP Server and FTP Clients
by weston.price@jboss.com
anonymous wrote :
| Weston, people have to pay for the Librados stuff, don't they? It wouldn't be very hard at all to write our own FTP/SMTP/POP/IMAP inbound/outbound connectors.
|
That's why I mentioned that Librados is a semi partner. I believe we have some type of relationship with them that might allow us to deploy their stuff in some version or another. At the very least, I am sure something could be worked out with them for ESB.
The question is never about the difficulty, it's about doing the bare amount of research to determine whether or not we can get something like this off the shelf.
Let's be realistic: what we do is not rocket science, though we may like to think so. Like the '1000 monkeys writing Shakespeare' thing, throw enough clowns in front of a terminal, give them enough time and they will eventually write EJB3...oh wait...nevermind :-)
IMO we have a hard core case of NIH at JBoss. It permeates most projects and ends up costing us more than we ultimately realize until it's too late (you can insert one of 10 technologies we wrote from scratch in AS in this category).
Sure, I could sit down and write my own concurrency abstractions (ThreadPool, Semaphore etc) but why not just use what's in the VM? Similarly, we could sit down and write a bunch of adapters to handle FTP, SMTP, POP, but if we can possibly get a tested piece of code off the shelf, it may be beneficial to look into it.
BTW we already have POP/IMAP via the Mail adapter :-)
View the original post : http://www.jboss.com/index.html?module=bb&op=viewtopic&p=4029106#4029106
Reply to the post : http://www.jboss.com/index.html?module=bb&op=posting&mode=reply&p=4029106