[hibernate-dev] [OGM] Thoughts for the Infinispan / Hot Rod dialect

Wed Jul 29 06:29:51 EDT 2015

On 29 July 2015 at 11:12, Gunnar Morling <gunnar at hibernate.org> wrote:
>
>
> 2015-07-28 16:07 GMT+02:00 Sanne Grinovero <sanne at hibernate.org>:
>>
>> Hi all,
>> with Infinispan in embedded mode we used AtomicMaps and
>> FineGrainedAtomicMaps as an alternative way to map attributes and
>> relations.
>>
>> In particular the relations are interesting because in the SQL world
>> one would run a query on junction tables, whereas on Infinispan embedded
>> queries would only be an option on Hibernate Search / Infinispan Query
>> annotated attributes. The AtomicMaps also allow us to load only
>> the relevant section of data (like on an RDBMS).
>> The difference between the two kinds of AtomicMaps is in the locking
>> level, each similar to the same kind of locking we'd normally have.
>>
>> On Hot Rod, AtomicMaps are not available so we opened (a long time
>> ago) a feature request to implement them for Hot Rod - at least Java
>> clients. Still, we don't have transactions in this case either so the
>> locking benefits are also unavailable.
>>
>> I think that in the case of Hot Rod clients we should not use
>> AtomicMaps, but rather resort to a protobuf schema generation, and
>> essentially use the more traditional "query on jointables" approach.
>
>
> The alternative would be to use RemoteCache directly and store the tuple
> representations of entities/associations, right? But then, IIUC, queries
> would not work?

Queries would not work, but also you'd not have relations.
Finally, you'd also not be storing attributes of a single entity but
you'd be essentially serializing the entity in a blob whose encoding
is highly dependent on the OGM version, the Infinispan version, etc.
Storing "as is" in a RemoteCache would introduce several drawbacks
which we've been able to avoid so far.

>
>>
>> Hot Rod nowadays supports queries, and they can be indexed or non
>> indexed so we could enable indexing on the ad-hoc tables we build for
>> relations mapping, have the user "opt in" for additional indexes, and
>> still allow some level of querying for the fields which have not been
>> indexed; of course w/o join operations.
>>
>> We can generate an appropriate schema and upload it to Hot Rod
>> automatically; that sounds like a great usability improvement for all
>> Java clients dealing with Infinispan/remote, as its schema adds quite
>> some stuff to the learning curve.
>> Still, this automatic generation is a new and challenging field; some
>> notes:
>>  - protobuf schemas are generational -> more effective if you can
>> generate the new one based on the existing one
>>  - there's a Java encoder by Adrian here:
>> https://github.com/infinispan/protostream
>>  - Typically one would need to assign a stable sequence id to each field
>>  - previous points will likely want us to dump the output resource
>> somewhere, maybe even persist on Infinispan?
>
>
> That, or one does it via a build step (e.g. through an annotation processor)
> so the user manages the schemas as part of their application?

That's similar to Adrian's approach: he built a helper to map Java
pojos to Hot Rod.
It would make annotations mandatory on each field though, as you need
for example to allocate an id to each attribute and keep it stable.
We might want to reuse that, but AFAIK it's not fully fit for ORM's use
case, for example no relations other than embedding.
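
To make the "stable id per attribute" point concrete, here is a rough
sketch of how a generated schema could carry field numbers over from one
generation to the next. FieldNumberAllocator and nextGeneration are
hypothetical names, not part of protostream or OGM, and the maps stand in
for whatever we would actually persist between generations.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not a protostream or OGM API.
final class FieldNumberAllocator {

    /**
     * Computes the field numbers for the next schema generation.
     * Fields already known keep their number; new fields get numbers above
     * the highest one ever used. Removed fields stay in the result so that
     * their numbers are never handed out again.
     */
    static Map<String, Integer> nextGeneration(Map<String, Integer> previous,
                                               List<String> currentFields) {
        Map<String, Integer> result = new LinkedHashMap<>(previous);
        int next = 1;
        for (int used : previous.values()) {
            next = Math.max(next, used + 1);
        }
        for (String field : currentFields) {
            if (!result.containsKey(field)) {
                result.put(field, next++);
            }
        }
        return result;
    }
}
```

The catch is that the previous mapping has to be stored somewhere,
which is why dumping the generated resource (or persisting it on
Infinispan) comes up.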

>>
>>
>> On a very different topic, we also typically require from a
>> GridDialect implementor to provide sequence generation capability. I
>> don't see a solution for that over Hot Rod, it doesn't currently have
>> any safe incremental id, but when I asked today I was told that
>> Infinispan 8 might have it; Tristan pointed out you can upload a
>> script and have it run on the server, which in turn has access to the
>> transactions API so this should be possible but doesn't look too easy.
>
>
> Wouldn't a table-like strategy work? I.e. a sequence field which the
> application itself manages? It's not perfect but it's what we use for other
> stores without native sequences.

I really don't see how you can guarantee that two application
instances don't get the same number; you'd need either transactions,
locks or atomic operations, and you have none of these.
In fact, I was planning to ask you about the javadoc comment I've
found on GridDialect.supportsSequences(): I doubt that's a reusable
approach beyond the specific semantics of Neo4j?

Needless to say, not giving a *strong* guarantee of uniqueness on the
primary key would be a critical issue.
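
To illustrate what I mean by needing an atomic operation, here is a
small sketch. It uses an in-memory ConcurrentMap purely as a stand-in
for the store; it is NOT the Hot Rod client API, and SequenceSketch and
its methods are made-up names. The first method is the plain
"application manages the sequence field" approach, the second is what
it would take to make it safe.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// In-memory stand-in for a key/value store; NOT the Hot Rod client API.
final class SequenceSketch {

    private final ConcurrentMap<String, Long> store = new ConcurrentHashMap<>();

    /** Unsafe: read and write are separate steps, so two callers may read the same value. */
    long nextValueRacy(String sequence) {
        Long current = store.get(sequence);
        long next = (current == null ? 0L : current) + 1;
        store.put(sequence, next);
        return next;
    }

    /** Safe: retries until a conditional write succeeds, so each number is handed out once. */
    long nextValueAtomic(String sequence) {
        while (true) {
            Long current = store.get(sequence);
            if (current == null) {
                // Only one caller can create the initial entry.
                if (store.putIfAbsent(sequence, 1L) == null) {
                    return 1L;
                }
            } else if (store.replace(sequence, current, current + 1)) {
                // Succeeds only if nobody changed the value since we read it.
                return current + 1;
            }
        }
    }
}
```

Without a conditional write of that kind (or a lock, or a TX) on the
remote side, only the racy variant is left.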

>
>> Finally, we'll need to use the distributed remote iterator for
>> GridDialect#forEachTuple.
>>
>> So my conclusion is that to support Hot Rod we'd be better off making
>> a completely different GridDialect than the one for Infinispan
>> embedded, as I can hardly see any shared code.
>
>
> +1
>
>> Would you agree to try basing the approach on a brand new dialect and
>> on protobuf schema generation?
>
>
> All the protobuf business sounds a bit scary. Is this the way ISPN remote is
> used in practice? If so, I guess there is no way around it. What about the
> REST interface btw.? When using that, we may be able to share code with the
> CouchDB (and soon Redis) backends.

It's an interesting idea as we could represent all entities as JSON
documents, but you have no query capabilities over REST.
Mapping many-to-many relations becomes tricky as you also don't have
atomic operations. I guess we could have data loss on a concurrent
write of junction tables? Or would you map all relations as nested
keys in the same document of the owning entity?

>
>>
>> In terms of features, we can implement
>> them all except:
>>  - transactions & locks
>>  - join queries
>
>
> Sounds good, although the lack of TX hurts. It again may lead to situations
> where parts of a flush cycle have been applied when an error occurs,
> leaving the user with the task of cleaning up the potential mess.

Right, but you have no TX in any option based on Hot Rod. I'm trying to
brainstorm what our best mapping option is, *given* you have no TX &
co.
That might change in the future, but it has been "in the planning" for 5
years so we'd better assume it's not going to happen.

-- Sanne