2014-11-26 17:28 GMT+01:00 Emmanuel Bernard <emmanuel(a)hibernate.org>:
> On 26 Nov 2014, at 15:21, Gunnar Morling <gunnar(a)hibernate.org> wrote:
>
> 2014-11-26 12:42 GMT+01:00 Sanne Grinovero <sanne(a)hibernate.org>:
> It looks like you're aiming at a "pure" mapping into primitives for
> the datagrid.
>
> So it looks very beautiful and tempting to go for a model such as
> > cache.put( "identifier name", ...)
> but it seems quite dangerous to me for the same reason that you store
> (conceptually):
> { "firstname", "lastname" }, { "Emmanuel", "Bernard" }
> rather than storing:
> { "Emmanuel", "Bernard" }
>
> Obviously the second one looks more natural in the storage, but you're
> not really sure what these tokens were supposed to represent in case
> someone decides to refactor the model.
> I understand that it's now quite safe to remove the "tablename" in the
> per-cache-table model, as entries would still be isolated: that was
> the goal, and it also exactly matches the model proven by RDBMSs.
> But there are implications in terms of flexibility and schema
> evolution if we remove the "column names" and generally speaking it's
> our only way of validating what an entry was supposed to model.
>
> Yes, evolution is a very strong argument indeed for sticking to the
> current approach. Without the column names (or some other form of
> descriptor as suggested below) we will not be able to recognize the version
> of a given key, so we cannot apply any "migrations" to it, either upon
> loading or via some sort of batch run.
Let me challenge that a bit even if I understand that there is a potential
problem. Type and id are the invariant part of the data you put in a
datastore.
So the data migration / morphing does happen on the *value* much more than
on the key itself.
You would be able to apply migrations in that case.
True, the need for evolution will be higher for the values, but can we
really completely rule it out for keys in stores without a fixed schema? It
seems to be a restriction we'd apply, whereas a user otherwise would be
free to e.g. add a column to the key.
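To make the key-evolution concern concrete, here is a minimal sketch of a "migration upon loading" that relies on the column names stored in the key. It assumes keys are kept as maps from column name to value; the class name, the "tenant" column and its default value are hypothetical and only serve the illustration, they are not part of Hibernate OGM.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration; none of these names come from Hibernate OGM.
final class KeyMigrationSketch {

    // Column names of the old and the new key layout (assumed for this example).
    static final Set<String> V1_COLUMNS = Set.of("id");
    static final Set<String> V2_COLUMNS = Set.of("id", "tenant");

    // Upgrades a key to the v2 layout if its column names identify it as v1.
    static Map<String, Object> migrateKey(Map<String, Object> key) {
        if (key.keySet().equals(V2_COLUMNS)) {
            return key; // already in the current layout
        }
        if (key.keySet().equals(V1_COLUMNS)) {
            Map<String, Object> upgraded = new LinkedHashMap<>(key);
            upgraded.put("tenant", "default"); // assumed default for the added column
            return upgraded;
        }
        throw new IllegalArgumentException("Unrecognized key layout: " + key.keySet());
    }

    public static void main(String[] args) {
        Map<String, Object> oldKey = new LinkedHashMap<>();
        oldKey.put("id", 42L);
        System.out.println(migrateKey(oldKey));
    }
}
```

Without the column names in the stored key, the check on `key.keySet()` has nothing to work with, which is the restriction discussed above.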
>
> Speaking of which, just as we don't normally store the "tablename" in a
> column of a table in an RDBMS, we don't really store its column names either.
> So an alternative solution which more closely matches the proven RDBMS
> model would be to store the schema representation of the table in the
> Cache:
>
> personsCache.put( SchemaGenerationId{1}, { ORDERED_ARRAY_STRATEGY,
> "firstname", "lastname" } );
>
> Then you would need to store entries linking them to a specific
> Schema, such as { "Emmanuel", "Bernard", SchemaGenerationId{1} }.
>
> such a SchemaGenerationId would be a cheap singleton (one per
> "table"), and could be stored as efficiently as two integers (one for
> the Marshaller id and one int for the schema generation id).
>
> ORDERED_ARRAY_STRATEGY could be an Enum, and give you some flexibility
> among your proposals. With the current model I'd stick to Maps, as
> they are the only option that is safe enough, but with a schema definition like
> the above description I'd definitely want to use the ordered sequence
> (array?) as it's far more efficient at all levels.
> A benefit is that I suspect that you could then transactionally evolve
> the schema, and it wouldn't be too hard for us to provide a tool to
> perform an "online schema migration".
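The proposal above could be sketched roughly as follows. This is only an illustration under stated assumptions: a plain HashMap stands in for the per-table cache, the classes Schema and Row and the methods decode and migrate are hypothetical names, and the second schema generation with a "nickname" column is invented to show what a migration step might look like.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; a plain HashMap stands in for the per-table cache.
final class SchemaGenerationSketch {

    enum Strategy { ORDERED_ARRAY_STRATEGY }

    // Cache key of one schema generation; value-based equality so lookups work.
    static final class SchemaGenerationId {
        final int generation;
        SchemaGenerationId(int generation) { this.generation = generation; }
        @Override public boolean equals(Object o) {
            return o instanceof SchemaGenerationId
                    && ((SchemaGenerationId) o).generation == generation;
        }
        @Override public int hashCode() { return Integer.hashCode(generation); }
    }

    // Schema entry: the storage strategy plus the ordered column names.
    static final class Schema {
        final Strategy strategy;
        final List<String> columns;
        Schema(Strategy strategy, List<String> columns) {
            this.strategy = strategy;
            this.columns = columns;
        }
    }

    // Data entry: positional values plus the schema generation they follow.
    static final class Row {
        final SchemaGenerationId schemaId;
        final Object[] values;
        Row(SchemaGenerationId schemaId, Object... values) {
            this.schemaId = schemaId;
            this.values = values;
        }
    }

    final Map<Object, Object> personsCache = new HashMap<>();

    // Resolves positional values back to column names via the stored schema.
    Map<String, Object> decode(Row row) {
        Schema schema = (Schema) personsCache.get(row.schemaId);
        Map<String, Object> named = new LinkedHashMap<>();
        for (int i = 0; i < schema.columns.size(); i++) {
            named.put(schema.columns.get(i), row.values[i]);
        }
        return named;
    }

    // Re-encodes a row for another schema generation; unknown columns become null.
    Row migrate(Row row, SchemaGenerationId targetId) {
        Schema target = (Schema) personsCache.get(targetId);
        Map<String, Object> named = decode(row);
        Object[] values = new Object[target.columns.size()];
        for (int i = 0; i < values.length; i++) {
            values[i] = named.get(target.columns.get(i));
        }
        return new Row(targetId, values);
    }

    public static void main(String[] args) {
        SchemaGenerationSketch s = new SchemaGenerationSketch();
        SchemaGenerationId gen1 = new SchemaGenerationId(1);
        SchemaGenerationId gen2 = new SchemaGenerationId(2);
        s.personsCache.put(gen1, new Schema(Strategy.ORDERED_ARRAY_STRATEGY,
                List.of("firstname", "lastname")));
        // Assumed second generation: reordered columns plus a new one.
        s.personsCache.put(gen2, new Schema(Strategy.ORDERED_ARRAY_STRATEGY,
                List.of("lastname", "firstname", "nickname")));

        Row row = new Row(gen1, "Emmanuel", "Bernard");
        System.out.println(s.decode(row));
        System.out.println(s.decode(s.migrate(row, gen2)));
    }
}
```

Each row carries only its values and a small schema reference, while the column names live once per generation in the schema entry. Whether the schema entries share a cache with the rows or sit in a separate one is left open here.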
>
> That's an interesting idea. Or how about having a separate KeyDescriptor
> cache which holds an entry for each key type? Mixing the key definition and
> records using it within one cache seems a bit odd to me.
It is interesting. But are we in the database business?
If we are interested in this approach, maybe we should create a side
project that offers schema atop the most common k/v?
It's a grey area. It'd basically be a way to describe the "schema" for each
single record in a more efficient manner. It'd not be a schema description
per table/cache.
I guess that's one of the general issues of K/V stores, which don't know
much about the data. A document store at least knows the syntactic
structure and could store field names via references to a shared constant
pool rather than persisting them within each document.
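As a rough illustration of the constant-pool idea, the sketch below replaces field names with small integer ids when a document is stored and resolves them again on load. It is not how any particular document store works internally; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a shared constant pool for field names.
final class FieldNamePoolSketch {

    private final Map<String, Integer> idsByName = new HashMap<>();
    private final List<String> namesById = new ArrayList<>();

    // Returns the pool id for a field name, registering the name on first use.
    int intern(String name) {
        Integer id = idsByName.get(name);
        if (id == null) {
            id = namesById.size();
            namesById.add(name);
            idsByName.put(name, id);
        }
        return id;
    }

    // Stored form: field names replaced by their pool ids.
    Map<Integer, Object> encode(Map<String, Object> document) {
        Map<Integer, Object> stored = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : document.entrySet()) {
            stored.put(intern(e.getKey()), e.getValue());
        }
        return stored;
    }

    // Loaded form: pool ids resolved back to field names.
    Map<String, Object> decode(Map<Integer, Object> stored) {
        Map<String, Object> document = new LinkedHashMap<>();
        for (Map.Entry<Integer, Object> e : stored.entrySet()) {
            document.put(namesById.get(e.getKey()), e.getValue());
        }
        return document;
    }

    public static void main(String[] args) {
        FieldNamePoolSketch pool = new FieldNamePoolSketch();
        Map<String, Object> person = new LinkedHashMap<>();
        person.put("firstname", "Emmanuel");
        person.put("lastname", "Bernard");

        Map<Integer, Object> stored = pool.encode(person);
        System.out.println(stored);
        System.out.println(pool.decode(stored));
    }
}
```

The names are persisted once in the pool, and every document that uses them only carries the ids.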
_______________________________________________
hibernate-dev mailing list
hibernate-dev(a)lists.jboss.org
https://lists.jboss.org/mailman/listinfo/hibernate-dev