Re: [hibernate-dev] [OGM] storing the column names in the entity keys for K/V stores

Wednesday, 26 November 2014

...
 On 26 Nov 2014, at 15:21, Gunnar Morling <gunnar(a)hibernate.org> wrote:

 -11-26 12:42 GMT+01:00 Sanne Grinovero <sanne(a)hibernate.org>:
 It looks like you're aiming at a "pure" mapping into primitives for
 the datagrid.

 So it looks very beautiful and tempting to go for a model such as
  > cache.put( "identifier name", ...)
 but it seems quite dangerous to me for the same reason that you store
 (conceptually):
   { "firstname", "lastname" }, { "Emmanuel", "Bernard" }
 rather than storing:
   { "Emmanuel", "Bernard" }

 Obviously the second one looks more natural in the storage, but you're
 not really sure what these tokens were supposed to represent in case
 someone decides to refactor the model.
 I understand that it's now quite safe to remove the "tablename" in the
 per-cache-table model, as entries would still be isolated: that was
 the goal, and it also exactly matches the proven RDBMS model.
 But there are implications in terms of flexibility and schema
 evolution if we remove the "column names" and generally speaking it's
 our only way of validating what an entry was supposed to model.

 Yes, evolution is a very strong argument indeed for sticking to the current approach.
Without the column names (or some other form of descriptor as suggested below) we will not
be able to recognize the version of a given key, so we cannot apply any
"migrations" to it, either upon loading or via some sort of batch run.

Let me challenge that a bit, even if I understand that there is a potential problem. Type
and id are the invariant part of the data you put in a datastore.
So data migration / morphing happens on the *value* much more than on the key
itself.
You would be able to apply migrations in that case.
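
To make the point above concrete, here is a minimal sketch of a migration that only touches values. It assumes keys reduced to entity type and id, and values stored as maps of column name to column value. `EntityKey`, `renameColumn` and the plain `Map` standing in for the cache are hypothetical names for illustration, not OGM or datagrid API.

```java
import java.util.HashMap;
import java.util.Map;

public class ValueMigrationSketch {

    /** Hypothetical key: only the entity type and the id, no column names. */
    record EntityKey(String type, Object id) {}

    /**
     * Renames a column inside every stored value.
     * The keys are never read or rewritten, so entries stay addressable.
     */
    static void renameColumn(Map<EntityKey, Map<String, Object>> cache,
                             String oldName, String newName) {
        for (Map<String, Object> value : cache.values()) {
            if (value.containsKey(oldName)) {
                value.put(newName, value.remove(oldName));
            }
        }
    }

    public static void main(String[] args) {
        // A plain HashMap stands in for the datagrid cache (assumption).
        Map<EntityKey, Map<String, Object>> persons = new HashMap<>();
        Map<String, Object> value = new HashMap<>();
        value.put("firstname", "Emmanuel");
        value.put("lastname", "Bernard");
        persons.put(new EntityKey("Person", 1L), value);

        renameColumn(persons, "lastname", "surname");
        System.out.println(persons.get(new EntityKey("Person", 1L)));
    }
}
```

This does not cover the case where the id column itself is renamed or split; that is where a key without column names could no longer be interpreted on its own.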

...

 Speaking of which: just as we don't normally store the "tablename" in a
 column of a table in an RDBMS, we don't store its column names either.
 So an alternative solution which more closely matches the proven RDBMS
 model would be to store the schema representation of the table in the
 Cache:

 personsCache.put( SchemaGenerationId{1}, { ORDERED_ARRAY_STRATEGY,
 "firstname", "lastname" } );

 then you would need to store entries linking them to a specific
 Schema, such as { "Emmanuel", "Bernard", SchemaGenerationId{1} }.

 Such a SchemaGenerationId would be a cheap singleton (one per
 "table"), and could be stored as efficiently as two integers (one for
 the Marshaller id and one int for the schema generation id).

 ORDERED_ARRAY_STRATEGY could be an Enum, and give you some flexibility
 among your proposals.  With the current model I'd stick to Maps, as
 they are the only option that is safe enough, but with a schema
 definition like the one described above I'd definitely want to use the
 ordered sequence (array?) as it's far more efficient at all levels.
 A benefit is that I suspect that you could then transactionally evolve
 the schema, and it wouldn't be too hard for us to provide a tool to
 perform an "online schema migration".
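
The schema-descriptor idea above could be sketched roughly as follows. This is only an illustration under assumptions: `Schema`, `Row`, `decode` and the record shapes are hypothetical, plain maps stand in for the cache, and the descriptor lives in its own map here purely to keep the Java types simple.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SchemaDescriptorSketch {

    /** Hypothetical id of one generation of a table's schema. */
    record SchemaGenerationId(int generation) {}

    enum Strategy { ORDERED_ARRAY_STRATEGY }

    /** Hypothetical descriptor: storage strategy plus ordered column names. */
    record Schema(Strategy strategy, List<String> columns) {}

    /** Hypothetical entry: ordered values plus the schema generation they follow. */
    record Row(List<Object> values, SchemaGenerationId schema) {}

    /** Resolves the positional values of a row back to named columns. */
    static Map<String, Object> decode(Map<SchemaGenerationId, Schema> schemas, Row row) {
        Schema schema = schemas.get(row.schema());
        if (schema == null) {
            throw new IllegalStateException("Unknown schema generation: " + row.schema());
        }
        if (schema.columns().size() != row.values().size()) {
            throw new IllegalStateException("Row does not match its schema");
        }
        Map<String, Object> named = new LinkedHashMap<>();
        for (int i = 0; i < schema.columns().size(); i++) {
            named.put(schema.columns().get(i), row.values().get(i));
        }
        return named;
    }

    public static void main(String[] args) {
        Map<SchemaGenerationId, Schema> schemas = new HashMap<>();
        SchemaGenerationId gen1 = new SchemaGenerationId(1);
        schemas.put(gen1, new Schema(Strategy.ORDERED_ARRAY_STRATEGY,
                List.of("firstname", "lastname")));

        Row row = new Row(List.of("Emmanuel", "Bernard"), gen1);
        System.out.println(decode(schemas, row));
    }
}
```

A schema change would then add a new generation to the descriptor map, and a migration tool could rewrite rows that still point at the old generation.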

 That's an interesting idea. Or having a separate KeyDescriptor cache which holds an
entry for each key type? Mixing the key definition and the records using it within one cache
seems a bit odd to me.

It is interesting. But are we in the database business?
If we are interested in this approach, maybe we should create a side project that offers
a schema atop the most common K/V stores?
