[hibernate-dev] Contributing to OGM / Cassandra

Tue Sep 9 05:08:25 EDT 2014

Hi Gunnar,

Many thanks for the reply - I'll pull down the latest master... I assume it
is merged back to the Jon Halliday fork, otherwise I'll need to mess about a
bit. I also had some issues with getting connected to C*, which is
understandable, but also with having to add <class> tags for the Dog / Breed
classes in the persistence.xml file. I'm not sure whether that is intended to
be needed.

Cheers,

John

On Tue, Sep 9, 2014 at 9:59 AM, Gunnar Morling <gunnar at hibernate.org> wrote:

> Hi John,
>
> 2014-09-09 10:33 GMT+02:00 John Worrell <jlesinge at gmail.com>:
>
>> Hi Emmanuel & Gunnar,
>>
>> Many thanks for your detailed responses - and nice to chat with Gunnar a
>> week or so back. Again I have to apologise for radio silence - my day job
>> suddenly ate all my waking functional time - so progress has been very
>> slow.
>>
>
> No worries, we are very glad about your help.
>
>> I'm getting deeper into the code now, and starting a POC... which is
>> leading me to some more detailed questions. Basically, what I am doing is
>> to run the examples and to look at things that seem to be missing, and to
>> understand the data that is being passed around in the various options
>> classes, so I can make a more informed implementation.
>>
>
> Sounds very reasonable. I can also recommend taking a look at the MongoDB
> dialect and the persistent representations it creates in the datastore, as
> they can comfortably be browsed, e.g. using the mongo command line client.
> That's how I got to understand many things of the interaction between
> engine and dialects.
>
> If you have any ideas where the dialect SPI documentation can be improved
> to facilitate an easier understanding of how pieces work together, let me
> know.
>
>> The key question in my mind at the moment is that of the relationship
>> between the base Hibernate Dialect class and the GridDialect interface
>
>
> OGM has its own pseudo implementation of ORM's Dialect contract,
> OgmDialect, but this should hardly ever play a role during OGM development.
> OGM's main contract towards dialects is GridDialect.
>
> The reason for exposing GridDialect on the pseudo OgmDialect is that it is
> our backdoor to make GridDialect available to
> PersistentNoSqlIdentifierGenerator implementations. At the moment there is
> no more straightforward way to inject the GridDialect, due to some
> limitations in the way we integrate with the ORM engine.
>
>
>> - I looked at the OgmTableGenerator, which is attempting to access a CF /
>> table that is not yet created - I figured I should understand what was
>> happening here, and make appropriate extensions / fixes first. So, I am
>> currently fighting my way through generating the sequence tables, and
>> wondering why OgmSequenceGenerator wraps OgmTableGenerator.
>>
>
> Just to be sure, are you looking at the latest master? There have been
> some changes around these generator classes recently, they are in a much
> cleaner state than they used to be.
>
> The reason for the wrapping is that when using the SEQUENCE strategy in
> cases where the store actually does not natively support sequences, we fall
> back to TABLE. Currently we only support a "native" SEQUENCE strategy for
> Neo4j, which allows sequences to be mapped as nodes in a reasonable manner,
> whereas all the other dialects use the table fallback.
> GridDialect#supportsSequences() is evaluated to find out whether the
> delegation needs to be done or not.
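
A minimal sketch of the fallback described above, not the actual OGM code. Only the idea of checking supportsSequences() comes from the thread; SketchDialect, IdGenerator, TableGenerator, SequenceGenerator and nextNativeValue() are hypothetical names, and the "table" is just an in-memory map standing in for a row in the datastore.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: none of these types are the real OGM classes.
final class SequenceFallbackSketch {

    /** Stand-in for the part of the dialect contract that matters here. */
    interface SketchDialect {
        boolean supportsSequences();

        /** Only called when supportsSequences() returns true. */
        long nextNativeValue(String sequenceName);
    }

    interface IdGenerator {
        long nextId();
    }

    /** Emulates a sequence with one counter row per key, as a TABLE strategy would. */
    static final class TableGenerator implements IdGenerator {
        private final Map<String, AtomicLong> rows;
        private final String key;

        TableGenerator(Map<String, AtomicLong> rows, String key) {
            this.rows = rows;
            this.key = key;
        }

        @Override
        public long nextId() {
            // Create the counter row on first use, then increment it.
            return rows.computeIfAbsent(key, k -> new AtomicLong()).incrementAndGet();
        }
    }

    /** Uses the native sequence when the store has one, otherwise delegates to the table. */
    static final class SequenceGenerator implements IdGenerator {
        private final SketchDialect dialect;
        private final String sequenceName;
        private final IdGenerator tableFallback;

        SequenceGenerator(SketchDialect dialect, String sequenceName, IdGenerator tableFallback) {
            this.dialect = dialect;
            this.sequenceName = sequenceName;
            this.tableFallback = tableFallback;
        }

        @Override
        public long nextId() {
            return dialect.supportsSequences()
                    ? dialect.nextNativeValue(sequenceName)
                    : tableFallback.nextId();
        }
    }

    /** Convenience factory wiring a sequence generator to an in-memory fallback table. */
    static IdGenerator forSequence(SketchDialect dialect, String sequenceName) {
        Map<String, AtomicLong> rows = new ConcurrentHashMap<>();
        return new SequenceGenerator(dialect, sequenceName, new TableGenerator(rows, sequenceName));
    }
}
```

With a dialect whose supportsSequences() returns false, every call to nextId() ends up in the table generator, which is the situation described for all dialects other than Neo4j.
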
>
> You also could take a look at Neo4jSequenceGenerator which creates the
> sequence nodes in the datastore based on the registered
> PersistentNoSqlIdentifierGenerators. Note that this checks via instanceof
> for OgmSequenceGenerator/OgmTableGenerator atm. As we don't want to expose
> these types on the dialect SPI, I'm looking into ways for allowing the
> distinction of the two in a more abstract way, mainly based on
> IdSourceKeyMetadata.
>
> Hope that helps, I'll be very happy to answer any follow-up questions.
> Thanks again for your help with the Cassandra dialect, I'm looking forward
> to this dialect very much!
>
>
>>
>> Cheers,
>>
>> John
>>
>
> --Gunnar
>
>
>>
>>
>> On Fri, Aug 22, 2014 at 5:25 PM, Emmanuel Bernard <emmanuel at hibernate.org> wrote:
>>
>> > On Thu 2014-08-07  9:10, John Worrell wrote:
>> > > Hi Emmanuel et al.,
>> > >
>> > > My apologies for the long radio silence. I've taken a look at the
>> > > code-base on Jon Halliday's repo, and have set up a nick on freenode -
>> > > #jlesinge.
>> >
>> > No worries I was on holidays.
>> > And your email was one of the lucky few that I had to delay as it required
>> > thinking ;)
>> >
>> > >
>> > > On the time-series question I was wondering how you envisaged the data
>> > > stored: I tend to think of a single row under a primary key with an
>> > > object-instance per column. Now what we have typically done (generally
>> > > the data has been immutable) is to store the data serialized as a blob
>> > > (JSON or XML), but I understand you do not favour this approach. With
>> > > this sort of model I imagine the collection is then all the objects
>> > > stored in the row, and the challenge is to page through the objects in
>> > > the row.
>> >
>> > Actually it is one of the valid strategies.
>> > If I understand you well, you want to create:
>> >
>> > - one row per time series generating object (say a thermometer)
>> > - the column names of that row would be a timestamp of time at bay
>> > - the value would be a JSON structure containing the data at bay for
>> >   that specific time.
>> >
>> > That is one of the valid approaches. But I think we need to support
>> > several:
>> >
>> > - simple column if the data is literally a single element (temperature)
>> > - JSON structure for more complex data per time event
>> > - key pointing to the detailed data somewhere else in the cluster
>> >
>> > The last one would be done in two phases: you load all the keys you are
>> > interested in that match your time range, and then do a multiget of sorts
>> > to load the data.
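
A minimal sketch of that two-phase load, using in-memory maps instead of a real Cassandra driver. The class and method names are hypothetical: a NavigableMap stands in for the wide row (timestamp column name to detail key) and a plain Map stands in for the place where the detailed data lives.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;

// Hypothetical sketch: in-memory stand-ins, no real datastore access.
final class TimeSeriesTwoPhaseSketch {

    /**
     * @param row         timestamp -> key of the detailed record (the "wide row")
     * @param detailStore key -> detailed record stored elsewhere
     * @param from        start of the time range, inclusive
     * @param to          end of the time range, inclusive
     */
    static List<String> loadRange(NavigableMap<Long, String> row,
                                  Map<String, String> detailStore,
                                  long from, long to) {
        // Phase 1: collect only the keys whose timestamps fall in the range.
        List<String> keys = new ArrayList<>(row.subMap(from, true, to, true).values());

        // Phase 2: the "multiget", one lookup per key in timestamp order.
        List<String> details = new ArrayList<>(keys.size());
        for (String key : keys) {
            String detail = detailStore.get(key);
            if (detail != null) { // skip keys whose detail record is missing
                details.add(detail);
            }
        }
        return details;
    }
}
```

Because phase 1 returns keys only, the caller controls how much detailed data is actually loaded, for example by narrowing the range before phase 2 runs.
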
>> >
>> > It seems DataStax tends to recommend the first or second option
>> > (denormalization FTW).
>> >
>> > I don't know but there is also the notion of super column which is a
>> > grouping of columns that might also address our composite problem
>> > assuming they can be used for dynamic column families.
>> >
>> > http://www.datastax.com/dev/blog/advanced-time-series-with-cassandra
>> >
>> > http://planetcassandra.org/blog/post/getting-started-with-time-series-data-modeling/
>> > http://www.datastax.com/docs/1.0/ddl/column_family
>> >
>> > >
>> > > An approach we have often taken is to create multiple copies of data in
>> > > different (obviously works well only for immutable objects) or better to
>> >
>> > Yes, that is a feature that I would like OGM to automate for the user.
>> > The user declaratively defines the denormalization approaches they want
>> > and the engine does the persistence.
>> > Next the query engine uses that knowledge to find the best path (or the
>> > only possible path in the case of Cassandra :) )
>> >
>> > > create a table of keys to a main table where in either approach the
>> > > row-keys are effectively a foreign-key and there is a column per object
>> > > associated through the foreign-key. Another approach though might be to
>> > > use a column with type list (or set, or map) to contain keys to the
>> > > associated objects - this would be a little like the extensions Oracle
>> > > have for mapping 1-* associations, though with the caveat that a column
>> > > of collection type may only contain 64k elements. I wondered if some
>> > > thought had been given to this strategy (which I must admit I have not
>> > > yet used myself).
>> >
>> > I am not aware of that approach.
>> >
>> > >
>> > > It seems very likely that different mapping strategies should be
>> > > specifiable, but then I have still to understand how these might fit
>> > > with Teiid.
>> >
>> > Forget Teiid for now. We will likely start with the HQL->Walker and do
>> > our own proto query engine before layering Teiid.
>> >
>> > >
>> > > Can I ask about assumptions: is it fair to assume that for Cassandra,
>> > > OGM will target only CQL 3 (which means Cassandra 2 or maybe 1.2)? This
>> > > would certainly make life simpler.
>> >
>> > Yes that's fine.
>> >
>> > >
>> > > An issue I don't see addressed is the choice of consistency-level (read
>> > > or write) and I wondered if there was a plan for this? Assumptions can
>> > > be made on a per table basis, but, certainly for ad hoc queries, it is
>> > > important I think to have the flexibility to specify on a per-query
>> > > basis.
>> >
>> > That's planned. We have an option system that allows for entity /
>> > property overriding of a global setting. While not implemented yet, we
>> > will also have the ability to override settings per session / query.
>> > That was the plan all along.
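
A minimal sketch of that override order, assuming the most specific non-null setting wins. This is not the OGM option system: the class, the method and the ConsistencyLevel enum here are hypothetical, and the session / query level is the part described above as planned rather than implemented.

```java
// Hypothetical sketch: not the OGM option API.
final class OptionOverrideSketch {

    /** Illustrative subset of consistency levels. */
    enum ConsistencyLevel { ONE, QUORUM, ALL }

    /**
     * Returns the most specific setting that is present.
     * A null argument means "not configured at this level".
     */
    static <T> T resolve(T global, T entity, T property, T sessionOrQuery) {
        if (sessionOrQuery != null) {
            return sessionOrQuery; // narrowest scope wins
        }
        if (property != null) {
            return property;
        }
        if (entity != null) {
            return entity;
        }
        return global; // fall back to the global default
    }
}
```

For example, a global default of QUORUM with an entity-level ONE resolves to ONE, unless a single ad hoc query asks for ALL.
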
>> >
>> > >
>> > > Those are my thoughts so far... I'll see about doing a POC of some of
>> > > what I have described above.
>> >
>> > Thanks :)
>> >
>> > >
>> > > Cheers,
>> > >
>> > > John
>> > >
>> > >
>> > > On Mon, Jul 21, 2014 at 4:48 PM, John Worrell <jlesinge at gmail.com> wrote:
>> > >
>> > > > Hi Emmanuel,
>> > > >
>> > > > I'll take a look at what is there, and I'll get up and running on IRC.
>> > > >
>> > > > I'll particularly look at the time-series issue - non-trivial I think.
>> > > >
>> > > > Cheers,
>> > > >
>> > > > John
>> > > >
>> > > >
>> > > > On Mon, Jul 21, 2014 at 1:06 PM, Emmanuel Bernard <emmanuel at hibernate.org> wrote:
>> > > >
>> > > >> Hi John,
>> > > >>
>> > > >> I thought I had replied to you on Friday but apparently the email
>> > > >> never went through :/
>> > > >>
>> > > >> That is good news :)
>> > > >> Jonathan worked on a Cassandra prototype but had to drop it due to
>> > > >> other duties. He pushed everything at
>> > > >> https://github.com/jhalliday/hibernate-ogm/tree/jonathan_cassandra
>> > > >>
>> > > >> Have a look at what he has done and come and ask Gunnar, Davide or me
>> > > >> any questions. There are a bunch of moving pieces. We are mostly on
>> > > >> freenode’s #hibernate-dev (you need a freenode login:
>> > > >> http://freenode.net/faq.shtml#nicksetup ). If you are allergic to IRC,
>> > > >> let me know and we will find alternatives.
>> > > >>
>> > > >> The most interesting challenge will be to see how we can map time
>> > > >> series into a collection and make sure we let the user decide how
>> > > >> much he wants to load.
>> > > >>
>> > > >> Emmanuel
>> > > >>
>> > > >> On 16 Jul 2014, at 13:17, John Worrell <jlesinge at gmail.com> wrote:
>> > > >>
>> > > >> > Hi,
>> > > >> >
>> > > >> > I'm interested in contributing to the Cassandra module of
>> > > >> > Hibernate-OGM - what would be the best way to go about this?
>> > > >> >
>> > > >> > Thanks,
>> > > >> >
>> > > >> > John
>> > > >> > _______________________________________________
>> > > >> > hibernate-dev mailing list
>> > > >> > hibernate-dev at lists.jboss.org
>> > > >> > https://lists.jboss.org/mailman/listinfo/hibernate-dev
>> > > >>
>> > > >>
>> > > >
>> >
>>
>
>