[jboss-as7-dev] Clustering tests

Brian Stansberry brian.stansberry at redhat.com
Thu Mar 8 12:14:12 EST 2012


On 3/8/12 10:48 AM, Richard Achmatowicz wrote:
> Kabir, what exactly is admin mode? Is it the same server profile but
> with controls on communication from clients?
>

Simplest explanation:

[standalone@localhost:9999 /] :read-operation-description(name=reload)
{
     "outcome" => "success",
     "result" => {
         "operation-name" => "reload",
         "description" => "Reloads the server by shutting down all its services and starting again. The JVM itself is not restarted.",
         "request-properties" => {"admin-only" => {
             "type" => BOOLEAN,
             "description" => "Whether the server should start in running mode ADMIN_ONLY when it restarts. An ADMIN_ONLY server will start any configured management interfaces and accept management requests, but will not start services used for handling end user requests.",
             "expressions-allowed" => false,
             "required" => false,
             "nillable" => true
         }},
         "reply-properties" => {},
         "read-only" => false
     }
}

> The problem is that you're replacing a dead server (no communication
> channels) with a live server in admin mode (with live communication
> channels). So IIUC, you depend on the admin mode working correctly to
> block communications from the outside.

True. It's a tradeoff. I certainly wouldn't advocate doing this every 
place we want to restart a server. But some places it might make sense. 
If we end up stopping and starting servers 100 times over the course of 
a clustering testsuite run, some of those may not really need a full 
process restart.
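
To make the idea concrete, here is a minimal sketch of what "stop" and 
"start" could look like if they were mapped onto the reload operation 
and its admin-only parameter. ManagementChannel and SoftRestartContainer 
are hypothetical names made up for this illustration; they are not 
Arquillian or AS7 classes, and the channel just receives the operation 
text rather than talking to a real server.

```java
import java.util.ArrayList;
import java.util.List;

// Demo entry point: records the operations instead of sending them anywhere.
class SoftRestartDemo {
    public static void main(String[] args) {
        List<String> sent = new ArrayList<>();
        SoftRestartContainer container = new SoftRestartContainer(sent::add);
        container.stop();   // reload into ADMIN_ONLY
        container.start();  // reload back into normal mode
        sent.forEach(System.out::println);
    }
}

// Hypothetical stand-in for whatever delivers an operation to the
// server's management interface (assumption, not a real API).
interface ManagementChannel {
    void execute(String operation);
}

// Hypothetical container wrapper: stop/start become reloads of the same JVM.
final class SoftRestartContainer {
    private final ManagementChannel channel;
    private boolean adminOnly;

    SoftRestartContainer(ManagementChannel channel) {
        this.channel = channel;
    }

    // Simulated "stop": management interfaces stay up, but services that
    // handle end user requests are not started.
    void stop() {
        channel.execute(":reload(admin-only=true)");
        adminOnly = true;
    }

    // Simulated "start": reload again in normal running mode.
    void start() {
        channel.execute(":reload(admin-only=false)");
        adminOnly = false;
    }

    boolean isServingUsers() {
        return !adminOnly;
    }
}
```

A test that needs to simulate a kill would still have to use a real 
process stop; this only covers the cases where a clean shutdown of the 
user-facing services is enough.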

If an admin-only server has a running JGroups channel, the JGroups 
subsystem is broken. When you start in admin-only mode, the 
Stage.RUNTIME handlers for subsystems should not execute.
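
A toy model of that rule, just to show the shape of it. The enums and 
the boot method below are local stand-ins written for this example, not 
the AS7 controller classes, and the "jgroups" step names are invented: 
the model step always runs, while the runtime step (the one that would 
open a channel) is skipped in admin-only mode.

```java
import java.util.ArrayList;
import java.util.List;

class AdminOnlyStageDemo {
    // Local stand-ins for illustration only (assumption, not AS7 types).
    enum RunningMode { NORMAL, ADMIN_ONLY }
    enum Stage { MODEL, RUNTIME }

    // Returns the steps a hypothetical subsystem would execute on boot.
    static List<String> boot(RunningMode mode) {
        List<String> executed = new ArrayList<>();
        for (Stage stage : Stage.values()) {
            // In admin-only mode the runtime stage must not execute.
            if (stage == Stage.RUNTIME && mode == RunningMode.ADMIN_ONLY) {
                continue;
            }
            executed.add(stage == Stage.MODEL
                    ? "jgroups: record configuration"
                    : "jgroups: start channel");
        }
        return executed;
    }

    public static void main(String[] args) {
        System.out.println(boot(RunningMode.NORMAL));
        System.out.println(boot(RunningMode.ADMIN_ONLY));
    }
}
```

If "start channel" ever showed up in the admin-only list, that would be 
the subsystem bug described above.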

> Also, if JGroups detected failed
> group members using a standard OS ping mechanism (which it doesn't),
> then this could change JGroups view of failed vs non-failed hosts - that
> is, if admin mode did not also block pings.

If JGroups is using simple OS pings (which it isn't) to validate the 
existence of a working peer, then JGroups is badly broken. ;) And if it 
started doing that and it found the socket for the channel open on an 
admin-only peer, then the JGroups subsystem is broken.

Bottom line, I think this is a good thing to think about and keep in 
mind as the clustering testsuite grows. It's not a no-brainer thing to 
do, but it is one possible way to slow down the rapid increase in 
testsuite execution time that we are seeing. Execution time is a real 
issue that impacts quality, since a major quality factor is our ability 
to have all patches manually reviewed and tested before they are merged.

>
>
> On 03/08/2012 11:28 AM, Radoslav Husar wrote:
>> On 03/06/2012 06:34 PM, Kabir Khan wrote:
>>> Hi,
>>>
>>> I've had a vague thought. I'm not familiar with the clustering tests,
>>> but from a glance there seems to be a lot of stopping and starting of
>>> servers going on? These tests run very slowly, so if all this
>> Hi Kabir,
>>
>> Thanks for the thought. Yes, that is true, and it surely does
>> contribute to the slowness.
>>
>> The reason for all the stopping is that we need to use manual
>> containers (for simulating failover, state transfer, undeploy,
>> singleton, etc). The containers are set to manual mode and unmanaged
>> deployments.
>>
>> Unfortunately, this currently means that the containers are stopped
>> after *each* test class and then started again. ARQ does this. We would
>> also need to be able to order the test class executions. In most of
>> these cases (currently all), the running container could be re-used
>> (even though we shut it down in the middle of the test).
>>
>> For this we would probably need support in Arquillian (class order +
>> don't turn off container after class?). Aslak, WDYT?
>>
>>> stopping/starting is contributing to the slowness, it could be an
>>> idea to explore changing the container so that
>>>
>>> stop  -> executes a reload on the server so that it is reloaded in
>>>          admin-only mode
>>> start -> executes a reload on the server so that it is reloaded in
>>>          'normal' mode
>>>
>>> Radoslav, do you see any issues with this?
>> How would this actually work? Use auto containers and use stop and start
>> to simulate up and down without having to do any fix in ARQ? Hmm, that's
>> interesting.
>>
>> This would however prohibit use of stop by kill or custom "kill" and
>> probably use of different container confs as well.
>>
>> However, this seems to me too much of a workaround (for instance the
>> server's JVM never shuts down).
>>
>> Rado
>> _______________________________________________
>> jboss-as7-dev mailing list
>> jboss-as7-dev at lists.jboss.org
>> https://lists.jboss.org/mailman/listinfo/jboss-as7-dev


-- 
Brian Stansberry
Principal Software Engineer
JBoss by Red Hat

