[hibernate-dev] Contributing to OGM / Cassandra

John Worrell jlesinge at gmail.com
Thu Sep 11 04:56:53 EDT 2014


Hi Gunnar (& Emmanuel),

Thanks again for the info. Chugging on slowly when I get the time.

Sequences are an interesting problem: C* has no built-in functionality
for creating them. The standard approach of keeping a sequence table would
seem to hobble the "write fast" behaviour that C* users know and love.
The alternatives to a C* table then bring us face to face with the
problem of generating ids on multiple nodes (I assume here that C* is
being used in a distributed environment). We use a home-grown
implementation of Twitter Snowflake for this purpose.
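
To make the idea concrete, here is a minimal sketch of a Snowflake-style
generator. It is not our production code and not anything from OGM: the
class name, the 41/10/12 bit split, and the custom epoch are all
assumptions chosen for illustration. Each id packs a millisecond
timestamp, a node id, and a per-millisecond counter into one long, so
nodes can generate ids without talking to each other or to C*.

```java
/**
 * Illustrative Snowflake-style id generator (hypothetical class, not part of OGM).
 * Layout of the 64-bit id, high to low bits:
 * 41 bits of milliseconds since a custom epoch, 10 bits of node id,
 * 12 bits of per-millisecond sequence.
 */
final class SnowflakeIdGenerator {

    // Assumed custom epoch: 2014-01-01T00:00:00Z. Any fixed past instant works.
    private static final long EPOCH_MILLIS = 1388534400000L;
    private static final int NODE_BITS = 10;
    private static final int SEQUENCE_BITS = 12;
    private static final long MAX_NODE_ID = (1L << NODE_BITS) - 1;       // 1023
    private static final long SEQUENCE_MASK = (1L << SEQUENCE_BITS) - 1; // 4095

    private final long nodeId;
    private long lastTimestamp = -1L;
    private long sequence = 0L;

    public SnowflakeIdGenerator(long nodeId) {
        if (nodeId < 0 || nodeId > MAX_NODE_ID) {
            throw new IllegalArgumentException("nodeId must be in [0, " + MAX_NODE_ID + "]");
        }
        this.nodeId = nodeId;
    }

    /** Returns the next id; ids from one generator are strictly increasing. */
    public synchronized long nextId() {
        long now = System.currentTimeMillis();
        if (now < lastTimestamp) {
            // Refuse to hand out ids that could collide with earlier ones.
            throw new IllegalStateException(
                    "Clock moved backwards by " + (lastTimestamp - now) + " ms");
        }
        if (now == lastTimestamp) {
            sequence = (sequence + 1) & SEQUENCE_MASK;
            if (sequence == 0) {
                // Sequence exhausted for this millisecond: wait for the next one.
                while (now <= lastTimestamp) {
                    now = System.currentTimeMillis();
                }
            }
        } else {
            sequence = 0L;
        }
        lastTimestamp = now;
        return ((now - EPOCH_MILLIS) << (NODE_BITS + SEQUENCE_BITS))
                | (nodeId << SEQUENCE_BITS)
                | sequence;
    }

    /** Extracts the node id that was embedded in an id by nextId(). */
    public static long nodeIdOf(long id) {
        return (id >>> SEQUENCE_BITS) & MAX_NODE_ID;
    }
}
```

Each node would create one generator with its own node id, e.g.
new SnowflakeIdGenerator(5), and call nextId() per insert. The catch is
that node ids must be unique across the cluster and clocks must not jump
backwards, which is why the sketch throws rather than risk a duplicate.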

Cheers,

John

On Tue, Sep 9, 2014 at 1:06 PM, Gunnar Morling <gunnar at hibernate.org> wrote:

> Hi,
>
> 2014-09-09 12:55 GMT+02:00 John Worrell <jlesinge at gmail.com>:
>
>> Hi Gunnar,
>>
>> Wrt the <class> tags - partly it is an issue with Eclipse JPA which
>> complains if the <class> tags are absent, but I think it *may* actually not
>> make any difference to the examples - the real issue lies with the code not
>> picking up the sequences to generate properly, and as you point out that
>> may now be fixed in the latest master.
>>
>
> To provide some more details, it's a dialect-specific implementation of
> the SchemaDefiner contract which is in charge of the schema initialization.
> The specific implementation type is to be returned from
> DatastoreProvider#getSchemaDefinerType(). The SchemaDefiner is invoked by
> the engine after session factory initialization (eventually it will only be
> invoked if so required by the "hbm2ddl.auto" setting).
>
> That contract is still experimental at this time; we need to flesh it out
> in more detail, also based on feedback about what's needed for Cassandra (as
> it is the first store with a fixed schema).
>
> Does Cassandra have any counterpart to physical sequences as e.g. in
> Oracle? If not (and it can not be emulated in a meaningful way as we do for
> Neo4j), GridDialect#supportsSequences() would have to return false, and the
> table-based strategy needs to be implemented.
>
>> I'll look at a rebase.
>>
>> Thanks,
>>
>> John
>>
>
> Hth,
>
> --Gunnar
>
>
>> On Tue, Sep 9, 2014 at 10:36 AM, Gunnar Morling <gunnar at hibernate.org>
>> wrote:
>>
>>> Hi,
>>>
>>> 2014-09-09 11:08 GMT+02:00 John Worrell <jlesinge at gmail.com>:
>>>
>>>> Hi Gunnar,
>>>>
>>>> Many thanks for the reply - I'll yank down the master... assume it is
>>>> merged back to the Jon Halliday fork otherwise I'll need to mess about a
>>>> bit.
>>>>
>>>
>>> Not sure when Jon's branch was updated for the last time.
>>>
>>> Probably you need to rebase your local branch onto the latest master
>>> from upstream (we prefer to work with rebases rather than merge commits).
>>> There have been some changes around GridDialect recently, mainly
>>> about query execution and id generation. Nothing dramatic, though.
>>>
>>>
>>>> Also had some issues with getting connected to C*, understandable, but
>>>> also wrt adding <class> tags for the Dog / Breed classes in the
>>>> persistence.xml file. Not sure whether that is intended to be needed.
>>>>
>>>
>>> You mean the classes from the "Getting Started" example, right? The
>>> <class> tags should not be required, the example runs as is e.g. on
>>> Infinispan. What happens if you don't add those?
>>>
>>>> Cheers,
>>>>
>>>> John
>>>>
>>>
>>> --Gunnar
>>>
>>>> On Tue, Sep 9, 2014 at 9:59 AM, Gunnar Morling <gunnar at hibernate.org>
>>>> wrote:
>>>>
>>>>> Hi John,
>>>>>
>>>>> 2014-09-09 10:33 GMT+02:00 John Worrell <jlesinge at gmail.com>:
>>>>>
>>>>>> Hi Emmanuel & Gunnar,
>>>>>>
>>>>>> Many thanks for your detailed responses - and nice to chat with
>>>>>> Gunnar a week or so back. Again I have to apologise for radio
>>>>>> silence - my day job suddenly ate all my waking functional time -
>>>>>> so progress has been very slow.
>>>>>>
>>>>>
>>>>> No worries, we are very glad about your help.
>>>>>
>>>>>> I'm getting deeper into the code now, and starting a POC... which is
>>>>>> leading me to some more detailed questions. Basically, what I am
>>>>>> doing is to run the examples, to look at things that seem to be
>>>>>> missing, and to understand the data that is being passed around in
>>>>>> the various options classes, so I can make a more informed
>>>>>> implementation.
>>>>>>
>>>>>
>>>>> Sounds very reasonable. I also can recommend to take a look at the
>>>>> MongoDB dialect and the persistent representations it creates in the
>>>>> datastore as it can comfortably be browsed e.g. using the mongo command
>>>>> line client. That's how I got to understand many things of the interaction
>>>>> between engine and dialects.
>>>>>
>>>>> If you have any ideas where the dialect SPI documentation can be
>>>>> improved to facilitate an easier understanding of how pieces work together,
>>>>> let me know.
>>>>>
>>>>>> The key question in my mind at the moment is that of the relationship
>>>>>> between the base Hibernate Dialect class and the GridDialect interface
>>>>>
>>>>>
>>>>> OGM has its own pseudo implementation of ORM's Dialect contract,
>>>>> OgmDialect, but this should hardly ever play a role during OGM development.
>>>>> OGM's main contract towards dialects is GridDialect.
>>>>>
>>>>> The reason for exposing GridDialect on the pseudo OgmDialect is that
>>>>> it is our backdoor to make GridDialect available to
>>>>> PersistentNoSqlIdentifierGenerator implementations. Atm. there is no way to
>>>>> inject the GridDialect in a more straight-forward way due to some
>>>>> limitations in the way we integrate with the ORM engine.
>>>>>
>>>>>
>>>>>> - I look at the OgmTableGenerator, which is attempting to access a
>>>>>> CF / table that is not yet created - I figured I'd understand what
>>>>>> was happening here and make appropriate extensions / fixes first.
>>>>>> So, currently fighting my way through generating the sequence
>>>>>> tables, and wondering why OgmSequenceGenerator wraps
>>>>>> OgmTableGenerator.
>>>>>>
>>>>>
>>>>> Just to be sure, are you looking at the latest master? There have been
>>>>> some changes around these generator classes recently, they are in a much
>>>>> cleaner state than they used to be.
>>>>>
>>>>> The reason for the wrapping is that when using the SEQUENCE strategy
>>>>> in cases where the store does not natively support sequences, we
>>>>> fall back to TABLE. Currently we only support a "native" SEQUENCE strategy
>>>>> for Neo4j, which allows mapping sequences as nodes in a reasonable manner,
>>>>> whereas all the other dialects use the table fallback.
>>>>> GridDialect#supportsSequences() is evaluated to find out whether the
>>>>> delegation needs to be done or not.
>>>>>
>>>>> You also could take a look at Neo4jSequenceGenerator which creates the
>>>>> sequence nodes in the datastore based on the registered
>>>>> PersistentNoSqlIdentifierGenerators. Note that this checks via instanceof
>>>>> for OgmSequenceGenerator/OgmTableGenerator atm. As we don't want to expose
>>>>> these types on the dialect SPI, I'm looking into ways for allowing the
>>>>> distinction of the two in a more abstract way, mainly based on
>>>>> IdSourceKeyMetadata.
>>>>>
>>>>> Hope that helps, I'll be very happy to answer any follow-up questions.
>>>>> Thanks again for your help with the Cassandra dialect, I'm looking forward
>>>>> to this dialect very much!
>>>>>
>>>>>
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> John
>>>>>>
>>>>>
>>>>> --Gunnar
>>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Aug 22, 2014 at 5:25 PM, Emmanuel Bernard <
>>>>>> emmanuel at hibernate.org>
>>>>>> wrote:
>>>>>>
>>>>>> > On Thu 2014-08-07  9:10, John Worrell wrote:
>>>>>> > > Hi Emmanuel et al.,
>>>>>> > >
>>>>>> > > My apologies for the long radio silence. I've taken a look at the
>>>>>> > > code-base on Jon Halliday's repo, and have set up a nick on
>>>>>> > > freenode - #jlesinge.
>>>>>> >
>>>>>> > No worries I was on holidays.
>>>>>> > And your email was one of the few lucky ones that I had to delay
>>>>>> > as it required thinking ;)
>>>>>> >
>>>>>> > >
>>>>>> > > On the time-series question I was wondering how you envisaged the
>>>>>> > > data stored: I tend to think of a single row under a primary key
>>>>>> > > with an object-instance per column. Now what we have typically
>>>>>> > > done (generally the data has been immutable) is to store the data
>>>>>> > > serialized as a blob (JSON or XML), but I understand you do not
>>>>>> > > favour this approach. With this sort of model I imagine the
>>>>>> > > collection is then all the objects stored in the row, and the
>>>>>> > > challenge is to page through the objects in the row.
>>>>>> >
>>>>>> > Actually it is one of the valid strategies.
>>>>>> > If I understand you well, you want to create:
>>>>>> >
>>>>>> > - one row per time series generating object (say a thermometer)
>>>>>> > - the column names of that row would be a timestamp of time at bay
>>>>>> > - the value would be a JSON structure containing the data at bay for
>>>>>> >   that specific time.
>>>>>> >
>>>>>> > That is one of the valid approaches. But I think we need to support
>>>>>> > several:
>>>>>> >
>>>>>> > - simple column if the data is literally a single element
>>>>>> >   (temperature)
>>>>>> > - JSON structure for more complex data per time event
>>>>>> > - key pointing to the detailed data somewhere else in the cluster
>>>>>> >
>>>>>> > The last would be done in two phases: you load all the keys
>>>>>> > matching the time range you are interested in, and then do a
>>>>>> > multiget of sorts to load the data.
>>>>>> >
>>>>>> > It seems datastax tends to recommend 1 or 2 (denormalization FTW).
>>>>>> >
>>>>>> > I don't know but there is also the notion of super column which is a
>>>>>> > grouping of columns that might also address our composite problem
>>>>>> > assuming they can be used for dynamic column families.
>>>>>> >
>>>>>> >
>>>>>> > http://www.datastax.com/dev/blog/advanced-time-series-with-cassandra
>>>>>> > http://planetcassandra.org/blog/post/getting-started-with-time-series-data-modeling/
>>>>>> > http://www.datastax.com/docs/1.0/ddl/column_family
>>>>>> >
>>>>>> > >
>>>>>> > > An approach we have often taken is to create multiple copies of
>>>>>> > > data in different (obviously works well only for immutable
>>>>>> > > objects) or better to
>>>>>> >
>>>>>> > Yes, that is a feature that I would like OGM to automate for the
>>>>>> > user. The user declaratively defines the denormalization approaches
>>>>>> > he wants and the engine does the persistence.
>>>>>> > Next the query engine uses that knowledge to find the best path (or
>>>>>> > the only possible path in the case of Cassandra :) )
>>>>>> >
>>>>>> > > create a table of keys to a main table where in either approach
>>>>>> > > the row-keys are effectively a foreign-key and there is a column
>>>>>> > > per object associated through the foreign-key. Another approach
>>>>>> > > though might be to use a column with type list (or set, or map) to
>>>>>> > > contain keys to the associated objects - this would be a little
>>>>>> > > like the extensions Oracle have for mapping 1-* associations,
>>>>>> > > though with the caveat that a column of collection type may only
>>>>>> > > contain 64k elements. I wondered if some thought had been given to
>>>>>> > > this strategy (which I must admit I have not yet used myself).
>>>>>> >
>>>>>> > I am not aware of that approach.
>>>>>> >
>>>>>> > >
>>>>>> > > It seems very likely that different mapping strategies should be
>>>>>> > > specifiable, but then I have still to understand how these might
>>>>>> > > fit with Teiid.
>>>>>> >
>>>>>> > Forget Teiid for now. We will likely start with the HQL->Walker and
>>>>>> > do our own proto query engine before layering Teiid.
>>>>>> >
>>>>>> > >
>>>>>> > > Can I ask about assumptions: is it fair to assume that for
>>>>>> > > Cassandra, OGM will target only CQL 3 (which means Cassandra 2 or
>>>>>> > > maybe 1.2)? This would certainly make life simpler.
>>>>>> >
>>>>>> > Yes that's fine.
>>>>>> >
>>>>>> > >
>>>>>> > > An issue I don't see addressed is the choice of consistency-level
>>>>>> > > (read or write) and I wondered if there was a plan for this?
>>>>>> > > Assumptions can be made on a per-table basis, but, certainly for
>>>>>> > > ad hoc queries, it is important, I think, to have the flexibility
>>>>>> > > to specify on a per-query basis.
>>>>>> >
>>>>>> > That's planned. We have an option system that allows for entity /
>>>>>> > property overriding of a global setting. While not yet implemented,
>>>>>> > we will also have the ability to override settings per session /
>>>>>> > query. That was the plan all along.
>>>>>> >
>>>>>> > >
>>>>>> > > Those are my thoughts so far... I'll see about doing a POC of
>>>>>> > > some of what I have described above.
>>>>>> >
>>>>>> > Thanks :)
>>>>>> >
>>>>>> > >
>>>>>> > > Cheers,
>>>>>> > >
>>>>>> > > John
>>>>>> > >
>>>>>> > >
>>>>>> > > On Mon, Jul 21, 2014 at 4:48 PM, John Worrell <jlesinge at gmail.com>
>>>>>> > > wrote:
>>>>>> > >
>>>>>> > > > Hi Emmanuel,
>>>>>> > > >
>>>>>> > > > I'll take a look at what is there, and I'll get up and running
>>>>>> > > > on IRC.
>>>>>> > > >
>>>>>> > > > I'll particularly look at the time-series issue - non-trivial I
>>>>>> > > > think.
>>>>>> > > >
>>>>>> > > > Cheers,
>>>>>> > > >
>>>>>> > > > John
>>>>>> > > >
>>>>>> > > >
>>>>>> > > > On Mon, Jul 21, 2014 at 1:06 PM, Emmanuel Bernard
>>>>>> > > > <emmanuel at hibernate.org> wrote:
>>>>>> > > >
>>>>>> > > >> Hi John,
>>>>>> > > >>
>>>>>> > > >> I thought I had replied to you on Friday but apparently the
>>>>>> > > >> email never went through :/
>>>>>> > > >>
>>>>>> > > >> That is good news :)
>>>>>> > > >> Jonathan worked on a Cassandra prototype but had to drop it due
>>>>>> > > >> to other duties. He pushed everything at
>>>>>> > > >> https://github.com/jhalliday/hibernate-ogm/tree/jonathan_cassandra
>>>>>> > > >>
>>>>>> > > >> Have a look at what he has done and come ask any question to
>>>>>> > > >> Gunnar, Davide or me. There are a bunch of moving pieces. We are
>>>>>> > > >> mostly on freenode’s #hibernate-dev ( you need a freenode login
>>>>>> > > >> http://freenode.net/faq.shtml#nicksetup ). If you are allergic
>>>>>> > > >> to IRC, let me know and we will find alternatives.
>>>>>> > > >>
>>>>>> > > >> The most interesting challenge will be to see how we can map
>>>>>> > > >> time series into a collection and make sure we let the user
>>>>>> > > >> decide how much he wants to load.
>>>>>> > > >>
>>>>>> > > >> Emmanuel
>>>>>> > > >>
>>>>>> > > >> On 16 Jul 2014, at 13:17, John Worrell <jlesinge at gmail.com>
>>>>>> > > >> wrote:
>>>>>> > > >>
>>>>>> > > >> > Hi,
>>>>>> > > >> >
>>>>>> > > >> > I'm interested in contributing to the Cassandra module of
>>>>>> > > >> > Hibernate-OGM - what would be the best way to go about this?
>>>>>> > > >> >
>>>>>> > > >> > Thanks,
>>>>>> > > >> >
>>>>>> > > >> > John
>>>>>> > > >> > _______________________________________________
>>>>>> > > >> > hibernate-dev mailing list
>>>>>> > > >> > hibernate-dev at lists.jboss.org
>>>>>> > > >> > https://lists.jboss.org/mailman/listinfo/hibernate-dev
>>>>>> > > >>
>>>>>> > > >>
>>>>>> > > >
>>>>>> >
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>

