[hibernate-dev] Contributing to OGM / Cassandra

Fri Aug 15 06:24:44 EDT 2014

Hi John,

First off, sorry again for the late response.

2014-08-07 10:10 GMT+02:00 John Worrell <jlesinge at gmail.com>:

> Hi Emmanuel et al.,
>
> My apologies for the long radio silence. I've taken a look at the code-base
> on Jon Halliday's repo, and have set up a nick on freenode - #jlesinge.
>
> On the time-series question I was wondering how you envisaged the data
> stored: I tend to think of a single row under a primary key with an
> object-instance per column. Now what we have typically done (generally the
> data has been immutable) is to store the data serialized as a blob (JSON or
> XML), but I understand you do not favour this approach. With this sort of
> model I imagine the collection is then all the objects stored in the row,
> and the challenge is to page through the objects in the row.
>

I cannot really comment on the time-series question, I'll leave that to
Emmanuel.

You're right though that data should not be stored as BLOBs or any other
"non-natural" representation. Querying and interaction with other
applications using the same store would then be a problem.

>
> An approach we have often taken is to create multiple copies of data in
> different (obviously works well only for immutable objects) or better to
> create a table of keys to a main table where in either approach the
> row-keys are effectively a foreign-key and there is a column per object
> associated through the foreign-key.

Could you maybe give an example of what this would look like?

> Another approach though might be to use
> a column with type list (or set, or map) to contain keys to the associated
> objects - this would be a little like the extensions Oracle have for
> mapping 1-* associations, though with the caveat that a column of
> collection type may only contain 64k elements. I wondered if some thought
> had been given to this strategy (which I must admit I have not yet used
> myself).
>

A very good question, unfortunately my knowledge of data modeling with
Cassandra is still a bit limited.

Storing "foreign keys" in collection columns seems like a good idea. It's
somewhat similar to the "in entity" mode we have for MongoDB. Do list
columns support null values? I think we'd need that for ordered collections
containing nulls. Another question is how to deal with compound map keys.

For the document stores (MongoDB, CouchDB) we offer an alternative
"association document" mode which persists association information not
within the referencing entity but within separate entity documents,
circumventing similar issues with the max size of documents. IIUC, that's
somewhat similar to the first mode you describe. It might make sense to
support both modes in a similar fashion for Cassandra, configurable per
association.
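To make the two modes a bit more concrete, here is a rough sketch in plain
Java. It is not OGM or Cassandra driver code: in-memory maps and lists stand
in for tables, every class and method name is made up for illustration, and
the size cap simply mirrors the "64k elements" figure quoted above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: in-memory structures stand in for Cassandra tables.
class AssociationModesSketch {

    // Stand-in for the "64k elements" cap on collection columns mentioned in the thread.
    static final int MAX_COLLECTION_SIZE = 64 * 1024;

    // Mode 1 ("in entity"): each owner row holds a list column with the keys of its targets.
    static class InEntityStore {
        private final Map<String, List<String>> ownerRows = new HashMap<>();

        void link(String ownerKey, String targetKey) {
            List<String> keys = ownerRows.computeIfAbsent(ownerKey, k -> new ArrayList<>());
            if (keys.size() >= MAX_COLLECTION_SIZE) {
                throw new IllegalStateException("collection column is full for " + ownerKey);
            }
            keys.add(targetKey);
        }

        List<String> targetsOf(String ownerKey) {
            return ownerRows.getOrDefault(ownerKey, Collections.emptyList());
        }
    }

    // Mode 2 (separate association table): one row per (owner, target) link,
    // so the number of links per owner is not bounded by a collection column.
    static class AssociationTableStore {
        private final List<String[]> linkRows = new ArrayList<>();

        void link(String ownerKey, String targetKey) {
            linkRows.add(new String[] { ownerKey, targetKey });
        }

        List<String> targetsOf(String ownerKey) {
            List<String> result = new ArrayList<>();
            for (String[] row : linkRows) {
                if (row[0].equals(ownerKey)) {
                    result.add(row[1]);
                }
            }
            return result;
        }
    }

    public static void main(String[] args) {
        InEntityStore inEntity = new InEntityStore();
        inEntity.link("order-1", "item-1");
        inEntity.link("order-1", "item-2");

        AssociationTableStore separate = new AssociationTableStore();
        separate.link("order-1", "item-1");
        separate.link("order-1", "item-2");

        System.out.println(inEntity.targetsOf("order-1"));
        System.out.println(separate.targetsOf("order-1"));
    }
}
```

Both stores answer the same question; the difference is only where the link
information lives, which is why making the choice configurable per
association seems attractive.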

Out of interest, how are associations handled in the branch created by
Jonathan?

As for de-normalization, some thought has gone into it; it's planned
for the 4.2 release at this point.

> It seems very likely that different mapping strategies should be
> specifiable, but then I have still to understand how these might fit with
> treiid.
>

What is "treiid"? Do you mean Teiid (http://teiid.jboss.org/)?

+1 for making different strategies configurable where it makes sense.
That's what we do for other stores as well. You might want to have a look
at the AssociationStorage option. Currently that's specific to document
stores, but it might make sense to further generify it.

>
> Can I ask about assumptions: is it fair to assume that for Cassandra, OGM
> will target only CQL 3 (which means Cassandra 2 or maybe 1.2)? This would
> certainly make life simpler.
>

Yes, I think that's fair to assume. We still can add support for earlier
versions later on, should there be the need for it.

>
> An issue I don't see addressed is the choice of consistency-level (read or
> write) and I wondered if there was a plan for this? Assumptions can be made
> on a per table basis, but, certainly for ad hoc queries, it is important
> I think to have the flexibility to specify on a per-query basis.
>

Configuring it on a per-table basis seems sensible. You can have a look at
how we do it for MongoDB (read preference, write concern) [1]. There is a
generic option mechanism which allows adding store-specific options and lets
the user configure them via annotations or API, globally, per entity or per
property.

Specifying options per query is still an open issue. We plan to support
options specific to one Session [2] which will override the otherwise
defined settings, but per-operation options are a different matter. The main
challenge is that the existing APIs (createQuery() etc.) don't accept any
additional context, so we need to find a way to establish an option
context that is valid for a single operation.
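As a rough illustration of how such layered settings could be resolved for a
Cassandra consistency level, here is a minimal sketch in plain Java. It does
not use the actual OGM option API: all class and method names are
hypothetical, and the precedence order (session override, then property,
then entity, then global) is an assumption made for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of layered option resolution; not the OGM option API.
class OptionResolutionSketch {

    // A few of Cassandra's consistency levels, enough for the example.
    enum ConsistencyLevel { ONE, QUORUM, ALL }

    static class OptionScopes {
        private ConsistencyLevel global = ConsistencyLevel.ONE;
        private final Map<String, ConsistencyLevel> perEntity = new HashMap<>();
        private final Map<String, ConsistencyLevel> perProperty = new HashMap<>();

        void setGlobal(ConsistencyLevel level) {
            global = level;
        }

        void setForEntity(String entity, ConsistencyLevel level) {
            perEntity.put(entity, level);
        }

        void setForProperty(String entity, String property, ConsistencyLevel level) {
            perProperty.put(entity + "." + property, level);
        }

        // Assumed precedence: session override > property > entity > global.
        // Pass null as sessionOverride when the session defines nothing.
        ConsistencyLevel resolve(String entity, String property, ConsistencyLevel sessionOverride) {
            if (sessionOverride != null) {
                return sessionOverride;
            }
            ConsistencyLevel byProperty = perProperty.get(entity + "." + property);
            if (byProperty != null) {
                return byProperty;
            }
            return perEntity.getOrDefault(entity, global);
        }
    }

    public static void main(String[] args) {
        OptionScopes scopes = new OptionScopes();
        scopes.setForEntity("Order", ConsistencyLevel.QUORUM);
        scopes.setForProperty("Order", "items", ConsistencyLevel.ALL);

        System.out.println(scopes.resolve("Customer", "name", null));
        System.out.println(scopes.resolve("Order", "total", null));
        System.out.println(scopes.resolve("Order", "items", null));
        System.out.println(scopes.resolve("Order", "items", ConsistencyLevel.ONE));
    }
}
```

A per-operation setting would be one more layer on top of the session
override; the open question remains how to pass it in, given that
createQuery() and friends take no such context.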

> Those are my thoughts so far... I'll see about doing a POC of some of what
> I have described above
>

Awesome. Looking forward to it very much.

If you want to discuss anything specific in the code, just let us know. If
you like, you also can send an "early review" pull request as the basis for
discussion of the general approach.

Do you have any branch newer than the original one from Jonathan already on
GitHub? I could then take a look to familiarize myself with the current
state.

> Cheers,
>
> John
>

Many thanks for your help and with best regards,

--Gunnar

[1]
https://docs.jboss.org/hibernate/ogm/4.1/reference/en-US/html_single/#_configuring_mongodb
[2] https://hibernate.atlassian.net/browse/OGM-343

>
>
> On Mon, Jul 21, 2014 at 4:48 PM, John Worrell <jlesinge at gmail.com> wrote:
>
> > Hi Emmanuel,
> >
> > I'll take a look at what is there, and I'll get up and running on IRC.
> >
> > I'll particularly look at the time-series issue - non-trivial I think.
> >
> > Cheers,
> >
> > John
> >
> >
> > On Mon, Jul 21, 2014 at 1:06 PM, Emmanuel Bernard <emmanuel at hibernate.org> wrote:
> >
> >> Hi John,
> >>
> >> I thought I had replied to you on Friday but apparently the email never
> >> went through :/
> >>
> >> That is good news :)
> >> Jonathan worked on a Cassandra prototype but had to drop due to other
> >> duties. He pushed everything at
> >> https://github.com/jhalliday/hibernate-ogm/tree/jonathan_cassandra
> >>
> >> Have a look at what he has done and come ask any question to Gunnar,
> >> Davide or me. There are a bunch of moving pieces. We are mostly on
> >> freenode’s #hibernate-dev ( you need a freenode login
> >> http://freenode.net/faq.shtml#nicksetup ). If you are allergic to IRC,
> >> let me know and we will find alternatives.
> >>
> >> The most interesting challenge will be to see how we can map time series
> >> into a collection and make sure we let the user decide how much he
> >> wants to load.
> >>
> >> Emmanuel
> >>
> >> On 16 Jul 2014, at 13:17, John Worrell <jlesinge at gmail.com> wrote:
> >>
> >> > Hi,
> >> >
> >> > I'm interested in contributing to the Cassandra module of Hibernate-OGM -
> >> > what would be the best way to go about this?
> >> >
> >> > Thanks,
> >> >
> >> > John
> >> > _______________________________________________
> >> > hibernate-dev mailing list
> >> > hibernate-dev at lists.jboss.org
> >> > https://lists.jboss.org/mailman/listinfo/hibernate-dev
> >>
> >>
> >