We are basically talking about schema change management. Just as application code may
change from one release to another, so may the schema. This includes things like adding
or removing tables and/or columns, as well as possibly moving or transforming data from
one set of tables to another. Liquibase is a common, popular tool for schema change
management with relational databases. RHQ used a home-grown tool that was very similar in
concept. The question is what tooling we have for Cassandra.
On Jan 5, 2016, at 1:15 PM, John Doyle <jdoyle(a)redhat.com> wrote:
Can we back up and describe what we're migrating from and to, and the event that
triggered the migration? Maybe that's implicit in the question for everyone else, but not
for me.
thx
~jd
On Mon, Jan 4, 2016 at 10:35 AM, John Sanda <jsanda(a)redhat.com> wrote:
On Jan 4, 2016, at 5:43 AM, Juraci Paixão Kröhling <jpkroehling(a)redhat.com> wrote:
>
> Team,
>
> What's the recommended approach for handling data migrations? Is there a
> library similar to Liquibase?
>
> - Juca.
>
Liquibase is designed specifically for relational databases. When RHQ started moving to
Cassandra, I started working on a patch for Liquibase to add support for Cassandra. After
some discussion on the Liquibase dev list, I eventually abandoned the effort, both because
of the amount of change involved and because it became clear that Liquibase, being very
RDBMS-centric, was not a good fit. We decided to implement our own solution in RHQ to
address our immediate needs. It has been a while since I last looked at what other
solutions might be out there. I have come across something for Rails applications, and I
think someone may have tried to add Cassandra support to Flyway.
There are several things that need to be taken into consideration. I will briefly discuss
some of them now.
* Should the migrations be done at installation/deployment time or at runtime?
This is probably the most important consideration because, in large part, everything else
stems from it. Some changes, like adding/removing a column or adding/removing a table, are
fast and efficient in Cassandra, so I think it is acceptable to do these types of changes
at deployment time. Other changes, like adding data or moving/transforming data, could be
long-running operations. Generally speaking, these changes should be done at runtime, even
though that increases application code complexity.
* How should migrations be implemented?
With an RDBMS, we can easily manipulate, transform, and move data with SQL. That is not
the case with CQL; we have to resort to writing code on top of the driver to make the
changes. In some situations a better approach might be to generate new SSTables and stream
them into Cassandra. For larger data migrations this is likely to be faster, since it
bypasses the CQL layer entirely. Ultimately, I think both approaches need to be available
as options.
* Where should migration metadata be stored?
We need to keep track of the migrations that have been applied. There might be migrations
that are specific to a particular environment, e.g., dev vs. prod. Since we are trying to
avoid additional data stores, I think it makes sense to store migration metadata in
Cassandra. Maybe we could have a migrations keyspace that tracks the migrations for each of
the Hawkular keyspaces.
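To make the three considerations above a bit more concrete, here is a minimal Java sketch
of migration bookkeeping under a hypothetical design: each migration carries a phase
(deployment time for fast schema changes, runtime for long-running data changes) and an
optional set of environments, and applied migrations are recorded per keyspace. The names
MigrationTracker, Phase, Migration, and the hawkular_migrations.applied table are
illustrative assumptions, not an existing API. An in-memory map stands in for the Cassandra
table, so no driver calls are shown.

```java
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical sketch: tracks which migrations have been applied, per keyspace. */
public class MigrationTracker {

    /** DEPLOYMENT for fast DDL (add/drop table or column); RUNTIME for long data jobs. */
    public enum Phase { DEPLOYMENT, RUNTIME }

    /** One migration; an empty environment set means it applies to every environment. */
    public record Migration(String keyspace, int version, String description,
                            Phase phase, Set<String> environments) {
        boolean appliesTo(String env) {
            return environments.isEmpty() || environments.contains(env);
        }
    }

    /**
     * Hypothetical CQL for the tracking table in a dedicated migrations keyspace.
     * Not executed here; shown only to suggest how the metadata might be laid out.
     */
    public static final String TRACKING_TABLE_CQL =
        "CREATE TABLE IF NOT EXISTS hawkular_migrations.applied ("
        + " keyspace_name text, version int, description text, applied_at timestamp,"
        + " PRIMARY KEY (keyspace_name, version))";

    private final String environment;
    // In-memory stand-in for the tracking table: keyspace -> (version -> description).
    private final Map<String, TreeMap<Integer, String>> applied = new LinkedHashMap<>();

    public MigrationTracker(String environment) {
        this.environment = environment;
    }

    public boolean isApplied(Migration m) {
        return applied.getOrDefault(m.keyspace(), new TreeMap<>()).containsKey(m.version());
    }

    /** Pending migrations for one phase in this environment, ordered by keyspace, then version. */
    public List<Migration> pending(List<Migration> all, Phase phase) {
        return all.stream()
            .filter(m -> m.phase() == phase)
            .filter(m -> m.appliesTo(environment))
            .filter(m -> !isApplied(m))
            .sorted(Comparator.comparing(Migration::keyspace)
                              .thenComparingInt(Migration::version))
            .toList();
    }

    /** Records a migration as applied; a real tracker would INSERT into the table above. */
    public void markApplied(Migration m) {
        applied.computeIfAbsent(m.keyspace(), k -> new TreeMap<>())
               .put(m.version(), m.description());
    }
}
```

In this sketch a deployment tool would call pending(all, Phase.DEPLOYMENT) before the
application starts, while the application itself would work through the RUNTIME migrations
in the background. Using keyspace_name as the partition key and version as the clustering
column keeps each keyspace's migration history in a single partition, ordered by version.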
_______________________________________________
hawkular-dev mailing list
hawkular-dev(a)lists.jboss.org
https://lists.jboss.org/mailman/listinfo/hawkular-dev