[wildfly-dev] Management operation for legacy subsystem migration

Wed Apr 29 10:58:07 EDT 2015

On 4/29/15 9:00 AM, Jeff Mesnil wrote:
> With WildFly 9 and 10, we will have new subsystems that will replace some older subsystems (called legacy subsystems below).
>
> We have to deal with migrating these subsystems:
>
> * migrate from web (JBoss Web)     to undertow
> * migrate from messaging (HornetQ) to messaging-activemq (with Apache ActiveMQ Artemis)
> * migrate from jacorb              to iiop-openjdk
>
> These 3 tasks are about providing a management operation to perform one-time migration (i.e. the migration is an operation performed by the server on its management model).
>
> I have started to look at this from the messaging perspective.
>
> To constrain this task, I have added some requirements:
>
> 1. the legacy subsystem must be an empty shell and have no runtime
> => in WildFly 10, /subsystem=messaging only exposes its management model; there is no runtime (the HornetQ server library is not included)

I don't see why this needs to be a requirement. In the jacorb case, it 
will be violated, as the jacorb subsystem can run as a kind of 
"compatibility subsystem", delegating to the iiop-openjdk runtime 
services as long as the config doesn't specify any settings that can't 
be translated.

> 2. the server must be in admin-only mode during migration
> => the server is not serving any client during migration.
> => the migration deals only with the server management model by creating the model for the new subsystem based on the legacy subsystem's model

+1 for server

It would be very nice to say the same for the DC, but it's worth 
discussing. Requiring the DC to be in admin-only creates a fair bit of 
usability downside. The whole domain reacts when the DC goes away. Once 
we have automatic failover of the DC, it may result in the failover DC 
taking over, introducing other problems. (Ken Wills -- this is one to 
think about in general re: DC failover.)

It's possible we could allow a migration to happen on a profile where 
there aren't any servers running. The basic domain rollout behavior 
would allow this without any change. If the HCs allowed the migrate op 
to run, and then it gets rolled out to any affected servers, the servers 
will reject the op (as they aren't admin-only.) So the op will be rolled 
back. (Except in an extreme corner case where the user specifies a 
rollout plan that says all servers can fail.)
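
To make that rollout behavior concrete, here is a rough toy model in 
Python. It is not WildFly code: the Server class, rollout_migrate() and 
the max_failed_servers parameter are names invented for illustration. 
The only things taken from the paragraph above are that a server that 
is not admin-only rejects the op, and that a rejection rolls everything 
back unless the rollout plan tolerates the failures.

```python
from dataclasses import dataclass


@dataclass
class Server:
    """Toy stand-in for a managed server (hypothetical, not a WildFly class)."""
    name: str
    profile: str
    admin_only: bool = False
    migrated: bool = False

    def apply_migrate(self) -> bool:
        # A server that is not admin-only rejects the migrate op.
        if not self.admin_only:
            return False
        self.migrated = True
        return True


def rollout_migrate(profile_model, servers, profile, max_failed_servers=0):
    """Apply the migrate op to a profile, then roll it out to its servers.

    Returns True if the change is kept, False if it is rolled back.
    max_failed_servers is an assumed stand-in for a rollout plan setting.
    """
    affected = [s for s in servers if s.profile == profile]
    snapshot = dict(profile_model)
    profile_model["migrated"] = True  # tentative change to the domain model

    applied = [s for s in affected if s.apply_migrate()]
    failed = len(affected) - len(applied)
    if failed > max_failed_servers:
        # Too many rejections: undo the change everywhere.
        profile_model.clear()
        profile_model.update(snapshot)
        for s in applied:
            s.migrated = False
        return False
    return True
```

With no servers in the profile, nothing can reject the op, so the change 
is kept. With one running server it is rolled back, unless the caller 
passes max_failed_servers=1, which is the corner case mentioned above.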

> 3. Data are not moved during this migration operation
> => moving messages from HornetQ to ActiveMQ destinations is not performed during this management migration.
> => we already have processes (such as using JMS bridges) to move messages from one messaging provider to another

+1

>
> Having these three requirements simplifies the migration task and sounds reasonable.
> Do you foresee any issues with having them?
>
> Given these requirements, the legacy subsystem would need to expose a :migrate operation (at the root of the subsystem) to perform the actual migration of the management model.
>
> Its pseudo code would be something like:
>
> * check the server is in admin-only mode
> defined any child resource)
> * :describe the legacy subsystem model
> * transform the legacy subsystem description to the new subsystem
> => if everything is successful
>    * create a composite operation to add the new messaging-activemq extension and all the transformed :add operations
>    * report the composite operation outcome to the user
> => else
>    * report the error(s) to the user
>

There needs to be validation as to whether the extension is already 
present; if so skip the add.

This is an edge case on a server but is fairly likely on a DC, where the 
extension may be used in profile-new, and now the user wants to migrate 
the subsystem in profile-old.

Also, we're going to need to change how the "composite" op works, as 
including an extension=foo:add in the same composite as steps that touch 
its subsystems won't work. There's a JIRA for this, but JIRA doesn't 
seem to be responding at the moment.

There's a workaround for that issue if necessary; just have the migrate 
op add a step that deals with the extension add and then a next step 
with the composite.
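
A minimal sketch of how the pseudo code plus the two points above might 
fit together, written in Python with plain dicts standing in for 
management operations. Everything here is hypothetical: 
plan_migration(), toy_translate(), the "legacy-only-attr" attribute and 
the dict layout are made up for illustration and are not the controller 
API. It only shows the shape of the logic: collect translation errors, 
skip the extension add if the extension is already present, and keep the 
extension add as a step separate from the composite.

```python
def toy_translate(op):
    """Toy translation of one legacy :add op (assumed rules, for illustration)."""
    unsupported = {"legacy-only-attr"}  # hypothetical attribute with no equivalent
    params = dict(op.get("params", {}))
    bad = unsupported & set(params)
    if bad:
        raise ValueError(f"no equivalent for {sorted(bad)} at {op['address']}")
    # Point the address at the new subsystem, leave the rest untouched.
    address = [
        ("subsystem", "messaging-activemq") if key == "subsystem" else (key, value)
        for key, value in op["address"]
    ]
    return {"operation": "add", "address": address, "params": params}


def plan_migration(extensions, legacy_add_ops, new_extension, translate):
    """Build the ordered steps for a migration, or report every error found."""
    errors = []
    new_ops = []
    for op in legacy_add_ops:
        try:
            new_ops.append(translate(op))
        except ValueError as exc:
            errors.append(str(exc))
    if errors:
        return {"outcome": "failed", "errors": errors, "steps": []}

    steps = []
    if new_extension not in extensions:
        # Step 1: the extension add, kept outside the composite.
        steps.append({"operation": "add",
                      "address": [("extension", new_extension)]})
    # Next step: one composite holding all the translated :add operations.
    steps.append({"operation": "composite", "address": [], "steps": new_ops})
    return {"outcome": "success", "errors": [], "steps": steps}
```

On a DC where profile-new already uses the extension, the returned plan 
holds only the composite; on a fresh server it holds the extension add 
followed by the composite.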

> It is possible that the legacy subsystem can not be fully migrated (e.g. if it defines an attribute that has no equivalent on the new subsystem). In that case, the :migrate operation reports the error(s) to the user.
> The user can then change the legacy subsystem model to remove the problematic resource/attributes and invoke :migrate again
>
> For the messaging subsystem, I expect that it will not be possible to fully migrate the replication configuration of the legacy subsystem to the new subsystem (the configuration has significantly changed between HornetQ and ActiveMQ, some configuration will be incompatible).
> In that case, I'd expect the user to migrate to the new messaging-activemq subsystem by discarding the legacy subsystem's replication configuration, invoke :migrate and then configure replication for the new subsystem.
>
> In my proof of concept, the :migrate operation has a dry-run boolean attribute. If set to true, the operation will not run the composite operation. It will instead return to the user the list of operations that will be executed when the :migrate operation is actually performed.
>

The return value should be the same regardless of the value of that 
attribute. Ops have a single return value description.
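
As a sketch of what I mean, in Python with invented names (migrate(), 
the execute callback and the "steps" key are assumptions, not an 
existing API): the flag only decides whether the steps are executed, 
while the shape of the result stays the same.

```python
def migrate(planned_steps, execute, dry_run=False):
    """Run the planned steps unless dry_run is set.

    The returned structure is identical in both modes: the list of steps.
    """
    if not dry_run:
        for step in planned_steps:
            execute(step)
    return {"steps": list(planned_steps)}
```

A caller can then compare the dry-run result with the real one without 
handling two different reply formats.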

We need to be careful about RBAC. This basically amounts to one op that 
then convinces the server to do a whole bunch of other stuff.

A simple thing there is to just mark the op as sensitive. That's 
conservative but inelegant.

I don't think that's necessary though. The return value from this op is 
a bunch of steps. That provides a lot of data the user may not be 
authorized to see. But, to provide that data, you're going to do a lot 
of reads, and the access control layer will reject the reads if the user 
is not authorized. So the user shouldn't be able to use this to see data 
they shouldn't.

As for writes, if the user isn't authorized to do the writes, then they 
will fail.
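
A toy illustration of the read side of that argument, in Python. The 
permission layout, checked_read() and describe_for_migration() are 
invented for this sketch and say nothing about how the real access 
control layer is implemented. The point is only that the step list is 
built from reads made on the caller's behalf, so a denied read stops the 
op before any data is returned.

```python
class Unauthorized(Exception):
    """Raised by the toy access-control check."""


def checked_read(permissions, model, address):
    # Every read goes through the access check first.
    if address not in permissions.get("read", set()):
        raise Unauthorized(f"read denied: {address}")
    return model[address]


def describe_for_migration(permissions, model, addresses):
    """Build the migration steps; any denied read aborts the whole op."""
    return [
        {"operation": "add",
         "address": address,
         "params": checked_read(permissions, model, address)}
        for address in addresses
    ]
```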

> I have talked to Tomek, who is in charge of the iiop migration, and he has an additional requirement to emulate the legacy jacorb subsystem with the new iiop-openjdk subsystem. I do not have this requirement for the messaging subsystem, so I have not given it much thought...
> Same goes for the web -> undertow migration.
>

Oh, I should have read the whole thing before starting to reply!

> It's also important to note that this operation to migrate the management model of a legacy subsystem to a new one is only one step of the whole migration story.
> For messaging, the workflow to upgrade a WFLY 9 server to WFLY 10 is made of several other steps (and I may have forgotten some):
>
> * install the new server
> * copy the old configuration (with the legacy messaging subsystem)
> * start the new server in admin-only mode
> * invoke /subsystem=messaging:migrate
>    => rinse and repeat by tweaking the legacy subsystem until the migration is successful
> * if migration of data can be done offline, do it now (the server is in admin-only mode, so it's ok)
> * reload the server to return to running mode with the new messaging subsystem
> * if the migration of data must be done online, it can be done now
>    (e.g. create a new JMS bridge from the old running WFLY9/messaging server to this new WFLY10/messaging-activemq server)
> * if everything is fine, invoke /subsystem=messaging:remove to remove the legacy subsystem model.
>
> Any comments, criticism, feedback?
>

Sounds good. We need to think in general about the domain case.

-- 
Brian Stansberry
Senior Principal Software Engineer
JBoss by Red Hat