Re: [hibernate-dev] Contributing to OGM / Cassandra

Tuesday, 9 September 2014

Hi John,

2014-09-09 10:33 GMT+02:00 John Worrell <jlesinge(a)gmail.com>:

 Hi Emmanuel & Gunnar,

 Many thanks for your detailed responses - and nice to chat with Gunnar a
 week or so back. Again I have to apologise for radio silence - my day job
 suddenly ate all my waking functional time - so progress has been very
 slow.

No worries, we are very glad about your help.

 I'm getting deeper into the code now, and starting a POC... which is
 leading me to some more detailed questions. Basically, what I am doing is
 to run the examples and to look at things that seem to be missing, and to
 understand the data that is being passed around in the various options
 classes, so I can make a more informed implementation

Sounds very reasonable. I can also recommend taking a look at the MongoDB
dialect and the persistent representations it creates in the datastore, as
they can comfortably be browsed, e.g. using the mongo command line client.
That's how I got to understand many things about the interaction between
engine and dialects.

If you have any ideas where the dialect SPI documentation can be improved
to facilitate an easier understanding of how pieces work together, let me
know.

 The key question in my mind at the moment is that of the relationship
 between the base Hibernate Dialect class and the GridDialect interface

OGM has its own pseudo implementation of ORM's Dialect contract,
OgmDialect, but this should hardly ever play a role during OGM development.
OGM's main contract towards dialects is GridDialect.

The reason for exposing GridDialect on the pseudo OgmDialect is that it is
our backdoor to make GridDialect available to
PersistentNoSqlIdentifierGenerator implementations. At the moment there is
no way to inject the GridDialect in a more straightforward way due to some
limitations in the way we integrate with the ORM engine.

 - I
 look at the OgmTableGenerator which is attempting to access a CF / table
 that is not yet created - I figured I should understand what was happening
 here, and make appropriate extensions / fixes first. So, currently fighting
 my way through generating the sequence tables, and wondering why
 OgmSequenceGenerator wraps OgmTableGenerator.

Just to be sure, are you looking at the latest master? There have been some
changes around these generator classes recently, they are in a much cleaner
state than they used to be.

The reason for the wrapping is that when using the SEQUENCE strategy in
cases where the store does not natively support sequences, we fall back to
TABLE. Currently we only support a "native" SEQUENCE strategy for Neo4j,
which allows sequences to be mapped as nodes in a reasonable manner,
whereas all the other dialects use the table fallback.
GridDialect#supportsSequences() is evaluated to find out whether the
delegation needs to be done or not.
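
To make the delegation idea concrete, here is a minimal sketch. It is not
the actual OGM code: all type names below (SketchDialect, IdGenerator,
TableBackedGenerator, NativeSequenceGenerator, SequenceGeneratorSketch) are
made up for illustration, and only the supportsSequences() check mirrors
what is described above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the dialect contract; only supportsSequences()
// corresponds to a method named in this thread.
interface SketchDialect {
    boolean supportsSequences();
}

// Hypothetical generator contract used only by this sketch.
interface IdGenerator {
    long next(String sequenceName);
}

// Emulates a sequence by keeping one counter "row" per sequence name.
class TableBackedGenerator implements IdGenerator {
    private final Map<String, Long> rows = new HashMap<>();

    @Override
    public long next(String sequenceName) {
        // Inserts 1 on first use, otherwise increments the stored value.
        return rows.merge(sequenceName, 1L, Long::sum);
    }
}

// Stands in for a store with native sequences. It starts at an offset
// purely so the two code paths are easy to tell apart.
class NativeSequenceGenerator implements IdGenerator {
    private long current = 1000;

    @Override
    public long next(String sequenceName) {
        return ++current;
    }
}

// Chooses the delegate once, based on what the dialect reports.
class SequenceGeneratorSketch implements IdGenerator {
    private final IdGenerator delegate;

    SequenceGeneratorSketch(SketchDialect dialect) {
        this.delegate = dialect.supportsSequences()
                ? new NativeSequenceGenerator()
                : new TableBackedGenerator();
    }

    @Override
    public long next(String sequenceName) {
        return delegate.next(sequenceName);
    }
}
```

With a dialect that reports no sequence support, e.g.
new SequenceGeneratorSketch(() -> false), every call ends up in the
table-backed path, which is the situation for all dialects other than Neo4j
as described above.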

You could also take a look at Neo4jSequenceGenerator which creates the
sequence nodes in the datastore based on the registered
PersistentNoSqlIdentifierGenerators. Note that at the moment this checks via
instanceof for OgmSequenceGenerator/OgmTableGenerator. As we don't want to
expose these types on the dialect SPI, I'm looking into ways to allow the
two to be distinguished in a more abstract way, mainly based on
IdSourceKeyMetadata.

Hope that helps, I'll be very happy to answer any follow-up questions.
Thanks again for your help with the Cassandra dialect, I'm looking forward
to this dialect very much!


 Cheers,

 John

--Gunnar


 On Fri, Aug 22, 2014 at 5:25 PM, Emmanuel Bernard <emmanuel(a)hibernate.org>
 wrote:

 > On Thu 2014-08-07  9:10, John Worrell wrote:
 > > Hi Emmanuel et al.,
 > >
 > > My apologies for the long radio silence. I've taken a look at the code-base
 > > on Jon Halliday's repo, and have set up a nick on freenode - #jlesinge.
 >
 > No worries, I was on holidays.
 > And your email was one of the few lucky ones that I had to delay as it
 > required thinking ;)
 >
 > >
 > > On the time-series question I was wondering how you envisaged the data
 > > stored: I tend to think of a single row under a primary key with an
 > > object-instance per column. Now what we have typically done (generally the
 > > data has been immutable) is to store the data serialized as a blob (JSON or
 > > XML), but I understand you do not favour this approach. With this sort of
 > > model I imagine the collection is then all the objects stored in the row,
 > > and the challenge is to page through the objects in the row.
 >
 > Actually it is one of the valid strategies.
 > If I understand you well, you want to create:
 >
 > - one row per time series generating object (say a thermometer)
 > - the column names of that row would be the timestamps in question
 > - the value would be a JSON structure containing the data in question for
 >   that specific time.
 >
 > That is one of the valid approaches. But I think we need to support
 > several:
 >
 > - simple column if the data is literally a single element (temperature)
 > - JSON structure for more complex data per time event
 > - key pointing to the detailed data somewhere else in the cluster
 >
 > The last one would be done in two phases: you load all the keys you are
 > interested in that match your time range, and then do a multiget of sorts
 > to load the data.
 >
 > It seems DataStax tends to recommend 1 or 2 (denormalization FTW).
 >
 > I don't know but there is also the notion of super column which is a
 > grouping of columns that might also address our composite problem
 > assuming they can be used for dynamic column families.
 >
 > http://www.datastax.com/dev/blog/advanced-time-series-with-cassandra
 >
 > http://planetcassandra.org/blog/post/getting-started-with-time-series-dat...
 > http://www.datastax.com/docs/1.0/ddl/column_family
 >
 > >
 > > An approach we have often taken is to create multiple copies of data in
 > > different (obviously works well only for immutable objects) or better to
 >
 > Yes, that is a feature that I would like OGM to automate for the user.
 > The user declaratively defines the denormalization approaches he wants and
 > the engine does the persistence.
 > Next the query engine uses that knowledge to find the best path (or only
 > possible path in the case of Cassandra :) )
 >
 > > create a table of keys to a main table where in either approach the
 > > row-keys are effectively a foreign-key and there is a column per object
 > > associated through the foreign-key. Another approach though might be to use
 > > a column with type list (or set, or map) to contain keys to the associated
 > > objects - this would be a little like the extensions Oracle have for
 > > mapping 1-* associations, though with the caveat that a column of
 > > collection type may only contain 64k elements. I wondered if some thought
 > > had been given to this strategy (which I must admit I have not yet used
 > > myself).
 >
 > I am not aware of that approach.
 >
 > >
 > > It seems very likely that different mapping strategies should be
 > > specifiable, but then I have still to understand how these might fit with
 > > Teiid.
 >
 > Forget Teiid for now. We will likely start with the HQL->Walker and do
 > our own proto query engine before layering Teiid.
 >
 > >
 > > Can I ask about assumptions: is it fair to assume that for Cassandra, OGM
 > > will target only CQL 3 (which means Cassandra 2 or maybe 1.2)? This would
 > > certainly make life simpler.
 >
 > Yes that's fine.
 >
 > >
 > > An issue I don't see addressed is the choice of consistency-level (read or
 > > write) and I wondered if there was a plan for this? Assumptions can be made
 > > on a per table basis, but, certainly for ad hoc queries, it is important
 > > I think to have the flexibility to specify on a per-query basis.
 >
 > That's planned. We have an option system that allows for entity /
 > property overriding of a global setting. While not implemented yet, we
 > will also have the ability to override settings per session / query.
 > That was the plan all along.
 >
 > >
 > > Those are my thoughts so far... I'll see about doing a POC of some of what
 > > I have described above
 >
 > Thanks :)
 >
 > >
 > > Cheers,
 > >
 > > John
 > >
 > >
 > > On Mon, Jul 21, 2014 at 4:48 PM, John Worrell <jlesinge(a)gmail.com> wrote:
 > >
 > > > Hi Emmanuel,
 > > >
 > > > I'll take a look at what is there, and I'll get up and running on IRC.
 > > >
 > > > I'll particularly look at the time-series issue - non-trivial I think.
 > > >
 > > > Cheers,
 > > >
 > > > John
 > > >
 > > >
 > > > On Mon, Jul 21, 2014 at 1:06 PM, Emmanuel Bernard <emmanuel(a)hibernate.org>
 > > > wrote:
 > > >
 > > >> Hi John,
 > > >>
 > > >> I thought I had replied to you on Friday but apparently the email never
 > > >> went through :/
 > > >>
 > > >> That is good news :)
 > > >> Jonathan worked on a Cassandra prototype but had to drop due to other
 > > >> duties. He pushed everything at
 > > >> https://github.com/jhalliday/hibernate-ogm/tree/jonathan_cassandra
 > > >>
 > > >> Have a look at what he has done and come ask any question to Gunnar,
 > > >> Davide or me. There are a bunch of moving pieces. We are mostly on
 > > >> freenode’s #hibernate-dev ( you need a freenode login
 > > >> http://freenode.net/faq.shtml#nicksetup ). If you are allergic to IRC,
 > > >> let me know and we will find alternatives.
 > > >>
 > > >> The most interesting challenge will be to see how we can map time series
 > > >> into a collection and make sure we let the user decide how much he wants
 > > >> to load.
 > > >>
 > > >> Emmanuel
 > > >>
 > > >> On 16 Jul 2014, at 13:17, John Worrell <jlesinge(a)gmail.com> wrote:
 > > >>
 > > >> > Hi,
 > > >> >
 > > >> > I'm interested in contributing to the Cassandra module of Hibernate-OGM -
 > > >> > what would be the best way to go about this?
 > > >> >
 > > >> > Thanks,
 > > >> >
 > > >> > John
 > > >> > _______________________________________________
 > > >> > hibernate-dev mailing list
 > > >> > hibernate-dev(a)lists.jboss.org
 > > >> > https://lists.jboss.org/mailman/listinfo/hibernate-dev
 > > >>
 > > >>
 > > >
 >
