[hibernate-dev] [OGM] storing the column names in the entity keys for K/V stores

Gunnar Morling gunnar at hibernate.org
Wed Nov 26 09:21:58 EST 2014


2014-11-26 12:42 GMT+01:00 Sanne Grinovero <sanne at hibernate.org>:

> It looks like you're aiming at a "pure" mapping into primitives for
> the datagrid.
>
> So it looks very beautiful and tempting to go for a model such as
>   cache.put( "identifier name", ...)
> but it seems quite dangerous to me for the same reason that you store
> (conceptually):
>   {"firstname", "lastname" }, { "Emmanuel", "Bernard" }
> rather than storing:
>   { "Emmanuel", "Bernard" }
>
> Obviously the second one looks more natural in the storage, but you're
> not really sure what these tokens were supposed to represent in case
> someone decides to refactor the model.
> I understand that it's now quite safe to remove the "tablename" in the
> per-cache-table model, as entries would still be isolated: that was
> the goal, and it also exactly matches the proven RDBMS
> model.
> But there are implications in terms of flexibility and schema
> evolution if we remove the "column names" and generally speaking it's
> our only way of validating what an entry was supposed to model.
>

Yes, evolution is a very strong argument indeed for sticking to the current
approach. Without the column names (or some other form of descriptor as
suggested below) we will not be able to recognize the version of a given
key so we cannot apply any "migrations" to it, either upon loading or via
some sort of batch run.
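
To make the "upon loading" case a bit more concrete, here is a rough sketch.
It is purely illustrative: DescribedKey, the two column layouts and the
split-on-space rule are assumptions made up for this example, nothing of it
exists in OGM today. The point is only that the column names carried by the
key are what let us tell an old layout from the current one.

```java
import java.util.Arrays;

// Hypothetical sketch; none of these types exist in Hibernate OGM.
final class KeyMigrationSketch {

    // Assumed "old" and "current" key layouts of a Person entity.
    static final String[] V1_COLUMNS = { "name" };
    static final String[] V2_COLUMNS = { "firstname", "lastname" };

    // A key that carries its column names next to its column values.
    static final class DescribedKey {
        final String[] columnNames;
        final Object[] columnValues;

        DescribedKey(String[] columnNames, Object[] columnValues) {
            this.columnNames = columnNames;
            this.columnValues = columnValues;
        }
    }

    // Upgrade upon loading: the column names identify the layout version.
    static DescribedKey migrate(DescribedKey key) {
        if (Arrays.equals(key.columnNames, V2_COLUMNS)) {
            return key; // already current
        }
        if (Arrays.equals(key.columnNames, V1_COLUMNS)) {
            // Assumed migration rule: split "first last" on the first space.
            String[] parts = ((String) key.columnValues[0]).split(" ", 2);
            String last = parts.length > 1 ? parts[1] : "";
            return new DescribedKey(V2_COLUMNS, new Object[] { parts[0], last });
        }
        throw new IllegalStateException(
                "Unknown key layout: " + Arrays.toString(key.columnNames));
    }

    public static void main(String[] args) {
        DescribedKey old = new DescribedKey(V1_COLUMNS, new Object[] { "Emmanuel Bernard" });
        DescribedKey migrated = migrate(old);
        System.out.println(Arrays.toString(migrated.columnNames)
                + " = " + Arrays.toString(migrated.columnValues));
    }
}
```

With a bare { "Emmanuel Bernard" } key there would be nothing to branch on in
migrate(), which is exactly the problem.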


>
> Speaking of which: just as we don't normally store the "tablename" in a
> column of a table in an RDBMS, we don't really store its column names
> either. So an alternative solution which more closely matches the proven RDBMS
> model would be to store the schema representation of the table in the
> Cache:
>
> personsCache.put( SchemaGenerationId{1}, { ORDERED_ARRAY_STRATEGY,
> "firstname", "lastname" } );
>
> then you would need to store entries linking them to a specific
> Schema, such as { "Emmanuel", "Bernard", SchemaGenerationId{1} }.
>
> such a SchemaGenerationId would be a cheap singleton (one per
> "table"), and could be stored as efficiently as two integers (one for
> the Marshaller id and one int for the schema generation id).
>
> ORDERED_ARRAY_STRATEGY could be an Enum, and give you some flexibility
> among your proposals.  With the current model I'd stick to the Map as
> they are the only one safe enough, but with a schema definition like
> the above description I'd definitely want to use the ordered sequence
> (array?) as it's far more efficient at all levels.
> A benefit is that I suspect that you could then transactionally evolve
> the schema, and it wouldn't be too hard for us to provide a tool to
> perform an "online schema migration".
>

That's an interesting idea. Or having a separate KeyDescriptor cache which
holds an entry for each key type? Mixing the key definition and records
using it within one cache seems a bit odd to me.
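
Roughly what I have in mind, as a sketch only: the two "caches" are plain
java.util maps standing in for real caches, and CompactKey, the
"<table>#<generation>" descriptor key and describe() are names invented for
this mail, not existing OGM or Infinispan API. Records keep compact ordered
values plus a generation number; the names live in a separate descriptor
cache and are only re-attached when someone needs them (e.g. an inspection
tool or a migration).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch; plain maps stand in for the actual caches.
final class KeyDescriptorSketch {

    // Compact key: ordered column values plus the schema generation
    // they were written with. No column names stored here.
    static final class CompactKey {
        final int schemaGeneration;
        final Object[] columnValues;

        CompactKey(int schemaGeneration, Object... columnValues) {
            this.schemaGeneration = schemaGeneration;
            this.columnValues = columnValues;
        }
    }

    // Stand-in for a dedicated descriptor cache:
    // "<table>#<generation>" -> ordered column names.
    static final Map<String, String[]> keyDescriptorCache = new ConcurrentHashMap<>();

    static void registerDescriptor(String table, int generation, String... columnNames) {
        keyDescriptorCache.put(table + "#" + generation, columnNames);
    }

    // Re-attach the column names to a compact key, preserving column order.
    static Map<String, Object> describe(String table, CompactKey key) {
        String[] names = keyDescriptorCache.get(table + "#" + key.schemaGeneration);
        if (names == null || names.length != key.columnValues.length) {
            throw new IllegalStateException(
                    "No matching descriptor for " + table + "#" + key.schemaGeneration);
        }
        Map<String, Object> described = new LinkedHashMap<>();
        for (int i = 0; i < names.length; i++) {
            described.put(names[i], key.columnValues[i]);
        }
        return described;
    }

    public static void main(String[] args) {
        registerDescriptor("Person", 1, "firstname", "lastname");
        CompactKey key = new CompactKey(1, "Emmanuel", "Bernard");
        System.out.println(describe("Person", key));
    }
}
```

Whether the descriptors sit in their own cache or in the per-table cache
only changes where registerDescriptor() writes to; the lookup is the same.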

> You make a great point about making it easier to run native queries.
> Is that a new goal we have? It seems we have to define the goals we
> want, as the proper data abstraction goal seems to clash with it.
> I'd rather make a custom Query walker which understands how we store
> things in Infinispan, and keep the safety of our more verbose and less
> efficient storage model. For example an inspection tool connected to
> the grid could choose to not show the "SchemaId" tokens, but use them
> to be able to render the entry in some human understandable way, like
> by adding the column names on a table.
>
> Some more notes:
>  For HashMap there is a specialized Marshaller already. HashMaps are
> horrific to instantiate at runtime though, in terms of memory, and
> also not as efficient as arrays in terms of CPU of course.
>  We didn't mention javax.persistence.IdClass but I assume the same applies.
>
> Sanne
>
> On 25 November 2014 at 13:30, Emmanuel Bernard <emmanuel at hibernate.org>
> wrote:
> > Hi,
> >
> > With OGM-452 behind us which brings one cache per “table”, we now have
> > another decision in front of us.
> >
> > Should we use a synthetic key for the cache key (say a
> > PersistentEntityKey class containing the array of column names and the
> > array of column values)?
> > Or should we use the natural object key?
> >
> > == Natural entity key
> >
> > In the latter, things get complicated quickly, let me explain:
> >
> > === Simple case
> >
> > For simple cases, the id is a simple property and the fit is very
> > natural
> >
> > [source]
> > --
> > @Entity
> > class User {
> >     @Id String name;
> >     ...
> > }
> >
> > //corresponds to
> > cache.put(name, mapRepresentingUser);
> > --
> >
> > === Embedded id
> >
> > If the identifier is an embedded id, you have several choices that all
> > have drawbacks.
> >
> > 1. use the embedded id class as key `cache.put( new Name("Emmanuel", "Bernard"), mapRepresentingUser );`
> > 2. use an array of property values `cache.put( new Object[] {"Emmanuel", "Bernard"}, mapRepresentingUser );`
> > 3. use a Map<String,Object> corresponding to the array `cache.put( new HashMap<String,Object>( {{ "firstname" -> "Emmanuel", "lastname" -> "Bernard" }} ), mapRepresentingUser );`
> > 4. use a synthetic key `cache.put( new PersistentEntityKey( new String[] {"firstname", "lastname" }, new String[] { "Emmanuel", "Bernard" } ), mapRepresentingUser );`
> >
> > In 1, the problem is that we lose the proper data type abstraction
> > between the object model and the data stored. `Name` is a user class.
> >
> > In 2, I think the model is somewhat acceptable but a bit arbitrary.
> >
> > In 3, I suspect the map is pretty horrific to serialize - that could be
> > solved by an externalizer. But more importantly the order of the id
> > columns is lost - even though it might be recoverable with
> > EntityKeyMetadata?
> >
> > In 4, we expose the person querying the grid to our OGM specific type.
> > Aside from this, it is essentially like 3.
> >
> > === Entity key approach
> >
> > I really like the idea of the simple case being mapped directly; it makes
> > for *the* natural mapping one would have chosen. But as I explained, it
> > does not scale.
> > In the composite id case, I don't really know what to choose between 2, 3
> > and 4.
> >
> > So, should we go for the simple case if we can? Or favor consistency
> > between the simple and complex case?
> > And which of the complex cases do we favor?
> >
> > == Association
> >
> > In the case of associations, it becomes a bit trickier because the
> > "simple case" where the association key is made of a single column is
> > quite uncommon. Association keys are one of these combinations:
> >
> > * the fk to the owning entity + the index or key of the List or Map
> > * the fk to the owning entity + the fk to the target entity (Set)
> > * the fk to the owning entity + the list of columns of the simple or
> >   embedded type (Set)
> > * the fk to the owning entity + the surrogate id of the Bag
> > * all columns in case of a non id backed bag
> >
> > All that to say that we are most of the time in the complex case of
> > EntityKey with one of the 4 choices.
> >
> > Any thoughts and preferences?
> >
> > Emmanuel
> > _______________________________________________
> > hibernate-dev mailing list
> > hibernate-dev at lists.jboss.org
> > https://lists.jboss.org/mailman/listinfo/hibernate-dev
>

