[jboss-as7-dev] Clustering tests
Brian Stansberry
brian.stansberry at redhat.com
Thu Mar 8 12:14:12 EST 2012
On 3/8/12 10:48 AM, Richard Achmatowicz wrote:
> Kabir, what exactly is admin mode? Is it the same server profile but
> with controls on communication from clients?
>
Simplest explanation:
[standalone@localhost:9999 /] :read-operation-description(name=reload)
{
    "outcome" => "success",
    "result" => {
        "operation-name" => "reload",
        "description" => "Reloads the server by shutting down all its services and starting again. The JVM itself is not restarted.",
        "request-properties" => {"admin-only" => {
            "type" => BOOLEAN,
            "description" => "Whether the server should start in running mode ADMIN_ONLY when it restarts. An ADMIN_ONLY server will start any configured management interfaces and accept management requests, but will not start services used for handling end user requests.",
            "expressions-allowed" => false,
            "required" => false,
            "nillable" => true
        }},
        "reply-properties" => {},
        "read-only" => false
    }
}
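For illustration, the stop/start mapping suggested further down the thread could be written as the two CLI operations built below. This is only a sketch: the class and method names are hypothetical, and the only things taken from the output above are the reload operation and its admin-only parameter. It just builds the operation strings; sending them to a server is left out.

```java
/** Hypothetical helper: maps container "stop"/"start" onto CLI reload operations. */
final class ReloadCommands {
    private ReloadCommands() {}

    /** "Stop": reload into ADMIN_ONLY so services for end user requests are not started. */
    static String stop() {
        return reload(true);
    }

    /** "Start": reload back into the normal running mode. */
    static String start() {
        return reload(false);
    }

    /** Builds the reload operation in CLI syntax with the admin-only parameter set. */
    static String reload(boolean adminOnly) {
        return ":reload(admin-only=" + adminOnly + ")";
    }
}
```

ReloadCommands.stop() yields the operation one would type at the CLI prompt to take the server "down" without killing the JVM.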
> The problem is that you're replacing a dead server (no communication
> channels) with a live server in admin mode (with live communication
> channels). So IIUC, you depend on the admin mode working correctly to
> block communications from the outside.
True. It's a tradeoff. I certainly wouldn't advocate doing this every
place we want to restart a server. But some places it might make sense.
If we end up stopping and starting servers 100 times over the course of
a clustering testsuite run, some of those may not really need a full
process restart.
If an admin-only server has a running JGroups channel, the JGroups
subsystem is broken. When you start in admin-only mode, the
Stage.RUNTIME handlers for subsystems should not execute.
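A toy model of that expectation, not the real AS7 controller code: the class, the enums and the bootSubsystem method are all made up for illustration, and only the Stage.RUNTIME name and the admin-only behaviour come from this thread.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model (hypothetical types) of which stages run for a subsystem at boot. */
final class AdminOnlySketch {
    enum RunningMode { NORMAL, ADMIN_ONLY }
    enum Stage { MODEL, RUNTIME }

    private AdminOnlySketch() {}

    /** Returns the stages that would execute for a subsystem in the given mode. */
    static List<Stage> bootSubsystem(RunningMode mode) {
        List<Stage> executed = new ArrayList<>();
        // The configuration model is populated in every mode, so it can be managed.
        executed.add(Stage.MODEL);
        // Runtime services (for example a JGroups channel) are only started
        // when the server is in normal mode.
        if (mode == RunningMode.NORMAL) {
            executed.add(Stage.RUNTIME);
        }
        return executed;
    }
}
```

In this model an admin-only boot never reaches the runtime stage, which is why an open channel socket on an admin-only server would point to a bug in the subsystem.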
> Also, if JGroups detected failed
> group members using a standard OS ping mechanism (which it doesn't),
> then this could change JGroups view of failed vs non-failed hosts - that
> is, if admin mode did not also block pings.
If JGroups is using simple OS pings (which it isn't) to validate the
existence of a working peer, then JGroups is badly broken. ;) And if it
started doing that and it found the socket for the channel open on an
admin-only peer, then the JGroups subsystem is broken.
Bottom line, I think this is a good thing to think about and keep in
mind as the clustering testsuite grows. It's not a no-brainer thing to
do, but it is one possible way to slow down the rapid increase in
testsuite execution time that we are seeing. Execution time is a real
issue that impacts quality, since a major quality factor is our ability
to have all patches manually reviewed and tested before they are merged.
>
>
> On 03/08/2012 11:28 AM, Radoslav Husar wrote:
>> On 03/06/2012 06:34 PM, Kabir Khan wrote:
>>> Hi,
>>>
>>> I've had a vague thought. I'm not familiar with the clustering tests,
>>> but at a glance there seems to be a lot of stopping and starting of
>>> servers going on? These tests run very slowly, so if all this
>> Hi Kabir,
>>
>> Thanks for the thought. Yes, that is true, and it surely does
>> contribute to the slowness.
>>
>> The reason for all the stopping is that we need to use manual
>> containers (for simulating failover, state transfer, undeploy,
>> singleton, etc). The containers are set to manual mode and unmanaged
>> deployments.
>>
>> Unfortunately, this currently means that the containers are stopped
>> after *each* test class and then started again. ARQ does this. We would
>> also need to be able to order the test class executions. In most of
>> these cases (currently all), the running container could be reused
>> (even though we shut it down in the middle of the test).
>>
>> For this we would probably need support in Arquillian (class order +
>> don't turn off container after class?). Aslak, WDYT?
>>
>>> stopping/starting is contributing to the slowness, it could be an
>>> idea to explore changing the container so that
>>>
>>> stop -> executes a reload on the server so that it is reloaded in
>>> admin-only mode
>>> start -> executes a reload on the server so that it is reloaded in
>>> 'normal' mode
>>>
>>> Radoslav, do you see any issues with this?
>> How would this actually work? Use auto containers and use stop and start
>> to simulate up and down without having to do any fix in ARQ? Hmm, that's
>> interesting.
>>
>> This would however prohibit use of stop by kill or custom "kill" and
>> probably use of different container confs as well.
>>
>> However, this seems to me too much of a workaround (for instance the
>> server's JVM never shuts down).
>>
>> Rado
>> _______________________________________________
>> jboss-as7-dev mailing list
>> jboss-as7-dev at lists.jboss.org
>> https://lists.jboss.org/mailman/listinfo/jboss-as7-dev
--
Brian Stansberry
Principal Software Engineer
JBoss by Red Hat