[hibernate-dev] [OGM] storing the column names in the entity keys for K/V stores

Thu Nov 27 03:13:47 EST 2014

2014-11-26 17:28 GMT+01:00 Emmanuel Bernard <emmanuel at hibernate.org>:

>
> > On 26 Nov 2014, at 15:21, Gunnar Morling <gunnar at hibernate.org> wrote:
> >
> > -11-26 12:42 GMT+01:00 Sanne Grinovero <sanne at hibernate.org>:
> > It looks like you're aiming at a "pure" mapping into primitives for
> > the datagrid.
> >
> > So it looks very beautiful and tempting to go for a model such as
> >  > cache.put( "identifier name", ...)
> > but it seems quite dangerous to me for the same reason that you store
> > (conceptually):
> >   {"firstname", "lastname" }, { "Emmanuel", "Bernard" }
> > rather than storing:
> >   { "Emmanuel", "Bernard" }
> >
> > Obviously the second one looks more natural in the storage, but you're
> > not really sure what these tokens were supposed to represent in case
> > someone decides to refactor the model.
> > I understand that it's now quite safe to remove the "tablename" in the
> > per-cache-table model, as entries would still be isolated: that was
> > the goal, and it also matches exactly the model proven by RDBMSs.
> > But there are implications in terms of flexibility and schema
> > evolution if we remove the "column names", and, generally speaking,
> > they are our only way of validating what an entry was supposed to model.
> >
> > Yes, evolution is a very strong argument indeed for sticking to the
> > current approach. Without the column names (or some other form of
> > descriptor as suggested below) we will not be able to recognize the version
> > of a given key so we cannot apply any "migrations" to it, either upon
> > loading or via some sort of batch run.
>
> Let me challenge that a bit, even if I understand that there is a potential
> problem. Type and id are the invariable part of the data you put in a
> datastore.
> So data migration / morphing happens on the *value* much more than
> on the key itself.
> You would still be able to apply migrations in that case.
>

True, the need for evolution will be higher for the values, but can we
really completely rule it out for keys in stores without a fixed schema? It
seems to be a restriction we'd impose, whereas a user would otherwise be
free to, e.g., add a column to the key.
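To make the key-evolution point concrete, here is a minimal Java sketch, assuming keys are persisted as column-name/value maps (roughly the current approach). KeyMigrationSketch, CURRENT_KEY_COLUMNS, isOutdated() and migrate() are hypothetical names, not Hibernate OGM API, and the later-added "tenant" key column is made up for illustration. It shows a loader detecting and upgrading a key written before a column was added, something a bare positional array like { 42 } could not support.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: entity keys stored as column-name -> value maps, so every
// stored key describes its own layout. Names are illustrative, not OGM API.
public class KeyMigrationSketch {

    // Key columns of the current (hypothetical) mapping; "tenant" was added later.
    static final List<String> CURRENT_KEY_COLUMNS = List.of("id", "tenant");

    // True if the stored key was written with a different set of key columns.
    static boolean isOutdated(Map<String, Object> storedKey) {
        return !storedKey.keySet().equals(Set.copyOf(CURRENT_KEY_COLUMNS));
    }

    // Rewrites a stored key into the current layout, filling missing columns from defaults.
    // This only works because the column names travel with the values: a positional
    // array such as { 42 } would not say which column the 42 belongs to.
    static Map<String, Object> migrate(Map<String, Object> storedKey, Map<String, Object> defaults) {
        Map<String, Object> upgraded = new LinkedHashMap<>();
        for (String column : CURRENT_KEY_COLUMNS) {
            if (storedKey.containsKey(column)) {
                upgraded.put(column, storedKey.get(column));
            } else if (defaults.containsKey(column)) {
                upgraded.put(column, defaults.get(column));
            } else {
                throw new IllegalStateException("No value or default for key column " + column);
            }
        }
        return upgraded;
    }

    public static void main(String[] args) {
        Map<String, Object> oldKey = Map.of("id", 42L); // written before "tenant" existed
        System.out.println(isOutdated(oldKey)); // prints "true"
        Map<String, Object> newKey = migrate(oldKey, Map.of("tenant", "default"));
        System.out.println(newKey); // prints "{id=42, tenant=default}"
        System.out.println(isOutdated(newKey)); // prints "false"
    }
}
```

The same check could run lazily on load or in a batch job; either way it relies on each stored key carrying its column names or some other descriptor.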

> >
> > Speaking of which: just as we don't normally store the "tablename" in a
> > column of a table in an RDBMS, we don't really store its column names
> > either. So an alternative solution which more closely matches the proven
> > RDBMS model would be to store the schema representation of the table in
> > the Cache:
> >
> > personsCache.put( SchemaGenerationId{1}, { ORDERED_ARRAY_STRATEGY,
> > "firstname", "lastname" } );
> >
> > Then you would need to store entries linking them to a specific
> > Schema, such as { "Emmanuel", "Bernard", SchemaGenerationId{1} }.
> >
> > Such a SchemaGenerationId would be a cheap singleton (one per
> > "table"), and could be stored as efficiently as two integers (one for
> > the Marshaller id and one int for the schema generation id).
> >
> > ORDERED_ARRAY_STRATEGY could be an Enum, giving you some flexibility
> > among your proposals. With the current model I'd stick to the Map, as
> > it is the only option safe enough, but with a schema definition like
> > the one described above I'd definitely want to use the ordered sequence
> > (array?) as it's far more efficient at all levels.
> > A benefit is that I suspect that you could then transactionally evolve
> > the schema, and it wouldn't be too hard for us to provide a tool to
> > perform an "online schema migration".
> >
> > That's an interesting idea. Or how about a separate KeyDescriptor cache
> > which holds an entry for each key type? Mixing the key definition and
> > the records using it within one cache seems a bit odd to me.
>
> It is interesting. But are we in the database business?
> If we are interested in this approach, maybe we should create a side
> project that offers a schema layer atop the most common K/V stores?
>

It's a grey area. It would basically be a way to describe the "schema" of
each single record in a more efficient manner. It would not be a schema
description per table/cache.
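A rough Java sketch of the per-record schema descriptor discussed in this thread, assuming a separate registry keyed by a small generation id (in the spirit of the KeyDescriptor cache idea) and the ordered-array layout. StoredRecord, DescriptorRegistry, write() and read() are hypothetical names; an in-memory HashMap stands in for the cache, and the record syntax needs Java 16 or later.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: entries hold only positional values plus a generation id;
// column names live once in a registry standing in for a KeyDescriptor cache.
public class SchemaDescriptorSketch {

    // What would actually be stored per entry: a generation id and ordered values.
    record StoredRecord(int schemaGenerationId, Object[] values) {}

    // One entry per schema generation: generation id -> ordered column names.
    static final class DescriptorRegistry {
        private final Map<Integer, List<String>> generations = new HashMap<>();
        private int nextId = 1;

        int register(List<String> columnNames) {
            int id = nextId++;
            generations.put(id, List.copyOf(columnNames));
            return id;
        }

        List<String> columns(int generationId) {
            List<String> columns = generations.get(generationId);
            if (columns == null) {
                throw new IllegalArgumentException("Unknown schema generation " + generationId);
            }
            return columns;
        }
    }

    // Drops the column names, keeping values in the order defined by the generation.
    static StoredRecord write(DescriptorRegistry registry, int generationId, Map<String, Object> row) {
        List<String> columns = registry.columns(generationId);
        Object[] values = new Object[columns.size()];
        for (int i = 0; i < values.length; i++) {
            values[i] = row.get(columns.get(i));
        }
        return new StoredRecord(generationId, values);
    }

    // Reattaches column names using the generation id stored with the entry.
    static Map<String, Object> read(DescriptorRegistry registry, StoredRecord stored) {
        List<String> columns = registry.columns(stored.schemaGenerationId());
        Map<String, Object> row = new LinkedHashMap<>();
        for (int i = 0; i < columns.size(); i++) {
            row.put(columns.get(i), stored.values()[i]);
        }
        return row;
    }

    public static void main(String[] args) {
        DescriptorRegistry registry = new DescriptorRegistry();
        int gen1 = registry.register(List.of("firstname", "lastname"));
        StoredRecord emmanuel = write(registry, gen1, Map.of("firstname", "Emmanuel", "lastname", "Bernard"));
        System.out.println(read(registry, emmanuel)); // prints "{firstname=Emmanuel, lastname=Bernard}"

        // Evolving the schema registers a new generation; entries written under gen1 still resolve.
        registry.register(List.of("firstname", "lastname", "email"));
        System.out.println(read(registry, emmanuel)); // still "{firstname=Emmanuel, lastname=Bernard}"
    }
}
```

The trade-off is the one raised above: each entry shrinks to its values plus one int, but every read and write now depends on the descriptor registry being available and consistent.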

I guess that's one of the general issues of K/V stores, which don't know
much about the data; a document store at least knows the syntactical
structure and could store field names via references to a shared constant
pool rather than persisting them within each document.
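The constant-pool idea can be sketched in Java as a shared pool that interns each field name once and lets documents refer to it by index. FieldNamePool, encode() and decode() are hypothetical names, not the API of any particular document store, and the pool here is a plain in-memory structure with no thread safety or persistence.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: field names stored once in a shared pool, documents store indexes.
public class FieldNamePoolSketch {

    // Shared pool mapping each distinct field name to a stable index (not thread-safe).
    static final class FieldNamePool {
        private final Map<String, Integer> idsByName = new HashMap<>();
        private final List<String> namesById = new ArrayList<>();

        int intern(String name) {
            return idsByName.computeIfAbsent(name, n -> {
                namesById.add(n);
                return namesById.size() - 1;
            });
        }

        String name(int id) {
            return namesById.get(id);
        }
    }

    // Replaces each field name with its pool index.
    static Map<Integer, Object> encode(FieldNamePool pool, Map<String, Object> document) {
        Map<Integer, Object> encoded = new LinkedHashMap<>();
        document.forEach((field, value) -> encoded.put(pool.intern(field), value));
        return encoded;
    }

    // Resolves pool indexes back to field names.
    static Map<String, Object> decode(FieldNamePool pool, Map<Integer, Object> encoded) {
        Map<String, Object> document = new LinkedHashMap<>();
        encoded.forEach((id, value) -> document.put(pool.name(id), value));
        return document;
    }

    public static void main(String[] args) {
        FieldNamePool pool = new FieldNamePool();
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("firstname", "Emmanuel");
        doc.put("lastname", "Bernard");
        Map<Integer, Object> encoded = encode(pool, doc);
        System.out.println(encoded); // prints "{0=Emmanuel, 1=Bernard}"
        System.out.println(decode(pool, encoded)); // prints "{firstname=Emmanuel, lastname=Bernard}"
    }
}
```

Unlike the per-table schema generation idea, this keeps each document self-describing in structure while paying for each field name only once in the pool.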

> _______________________________________________
> hibernate-dev mailing list
> hibernate-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/hibernate-dev
>