Contributing to OGM / Cassandra

John Worrell

Wednesday, 16 July 2014

1:17 p.m.

Hi, I'm interested in contributing to the Cassandra module of Hibernate-OGM - what would be the best way to go about this? Thanks, John

Emmanuel Bernard

Monday, 21 July

7:06 a.m.

Hi John,

I thought I had replied to you on Friday but apparently the email never went through :/

That is good news :) Jonathan worked on a Cassandra prototype but had to drop it due to other duties. He pushed everything to https://github.com/jhalliday/hibernate-ogm/tree/jonathan_cassandra

Have a look at what he has done and come ask any questions to Gunnar, Davide or me. There are a bunch of moving pieces. We are mostly on freenode’s #hibernate-dev (you need a freenode login: http://freenode.net/faq.shtml#nicksetup). If you are allergic to IRC, let me know and we will find alternatives.

The most interesting challenge will be to see how we can map time series into a collection and make sure we let the user decide how much he wants to load.

Emmanuel

On 16 Jul 2014, at 13:17, John Worrell <jlesinge(a)gmail.com> wrote:

...


John Worrell

10:48 a.m.

...

John Worrell

Thursday, 7 August

3:10 a.m.

Hi Emmanuel et al.,

My apologies for the long radio silence. I've taken a look at the code-base on Jon Halliday's repo, and have set up a nick on freenode - #jlesinge.

On the time-series question I was wondering how you envisaged the data being stored: I tend to think of a single row under a primary key with an object-instance per column. Now what we have typically done (generally the data has been immutable) is to store the data serialized as a blob (JSON or XML), but I understand you do not favour this approach. With this sort of model I imagine the collection is then all the objects stored in the row, and the challenge is to page through the objects in the row.

An approach we have often taken is to create multiple copies of data in different (obviously works well only for immutable objects) or better to create a table of keys to a main table, where in either approach the row-keys are effectively a foreign-key and there is a column per object associated through the foreign-key.

Another approach though might be to use a column with type list (or set, or map) to contain keys to the associated objects - this would be a little like the extensions Oracle have for mapping 1-* associations, though with the caveat that a column of collection type may only contain 64k elements. I wondered if some thought had been given to this strategy (which I must admit I have not yet used myself).

It seems very likely that different mapping strategies should be specifiable, but then I have still to understand how these might fit with treiid.

Can I ask about assumptions: is it fair to assume that for Cassandra, OGM will target only CQL 3 (which means Cassandra 2 or maybe 1.2)? This would certainly make life simpler.

An issue I don't see addressed is the choice of consistency-level (read or write) and I wondered if there was a plan for this? Assumptions can be made on a per-table basis, but, certainly for ad hoc queries, it is important, I think, to have the flexibility to specify on a per-query basis.

Those are my thoughts so far... I'll see about doing a POC of some of what I have described above.

Cheers, John

On Mon, Jul 21, 2014 at 4:48 PM, John Worrell <jlesinge(a)gmail.com> wrote:

...

Hi Emmanuel, I'll take a look at what is there, and I'll get up and running on IRC. I'll particularly look at the time-series issue - non-trivial I think. Cheers, John

Gunnar Morling

Friday, 15 August

5:24 a.m.

Hi John, First off, sorry again for the late response. 2014-08-07 10:10 GMT+02:00 John Worrell <jlesinge(a)gmail.com>:

...

I cannot really comment on the time-series question, I'll leave that to Emmanuel. You're right though that data should not be stored as BLOBs or any other "non-natural" representation. Querying and interaction with other applications using the same store would then be a problem.

...

An approach we have often taken is to create multiple copies of data in different (obviously works well only for immutable objects) or better to create a table of keys to a main table where in either approach the row-keys are effectively a foreign-key and there is column per object associated through the foreign-key.

Could you maybe give an example of how this would look?

...

Another approach though might be to use a column with type list (or set, or map) to contain keys to the associated objects - this would be a little like the extensions Oracle have for mapping 1-* associations, though with the caveat that a column of collection type may only contain 64k elements. I wondered if some thought had been given to this strategy (which I must admit I have not yet used myself).

A very good question; unfortunately my knowledge of data modeling with Cassandra is still a bit limited. Storing "foreign keys" in collection columns seems like a good idea. It's somewhat similar to the "in entity" mode we have for MongoDB. Do list columns support null values? I think we'd need that for ordered collections containing nulls. Another question is how to deal with compound map keys.

For the document stores (MongoDB, CouchDB) we offer an alternative "association document" mode which persists association information not within the referencing entity but within separate entity documents, circumventing similar issues with the max size of documents. IIUC, that's somewhat similar to the first mode you describe. It might make sense to support both modes in a similar fashion for Cassandra, configurable per association. Out of interest, how are associations handled in the branch created by Jonathan?

As for de-normalization, some thought has been given to it; it's planned for the 4.2 release at this point.

...

It seems very likely that different mapping strategies should be specifiable, but then I have still to understand how these might fit with treiid.

What is "treiid"? Do you mean Teiid (http://teiid.jboss.org/)? +1 for making different strategies configurable where it makes sense. That's what we do for other stores as well. You might want to have a look at the AssociationStorage option. Currently that's specific to document stores, but it might make sense to further generify it.
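To make the two association modes discussed above concrete, here is a minimal in-memory sketch in Java. It is not OGM or Cassandra code: the class `AssociationStorageSketch`, the `Mode` enum and the `link`/`targets` methods are all hypothetical names, and the element limit simply takes John's "64k" figure literally, which is an assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory sketch; none of these names are OGM or Cassandra API.
class AssociationStorageSketch {

    enum Mode { IN_ENTITY, SEPARATE_TABLE }

    // "64k" taken literally from the thread; the exact figure is an assumption.
    static final int MAX_COLLECTION_ELEMENTS = 64 * 1024;

    // IN_ENTITY: the owning row carries a list column with the keys of the associated rows.
    private final Map<String, List<String>> ownerKeyColumn = new HashMap<>();

    // SEPARATE_TABLE: a dedicated "table of keys", one row per owner.
    private final Map<String, List<String>> associationTable = new HashMap<>();

    private Map<String, List<String>> storeFor(Mode mode) {
        return mode == Mode.IN_ENTITY ? ownerKeyColumn : associationTable;
    }

    // Records that ownerId references targetId, using the chosen storage mode.
    void link(Mode mode, String ownerId, String targetId) {
        List<String> keys = storeFor(mode).computeIfAbsent(ownerId, id -> new ArrayList<>());
        if (mode == Mode.IN_ENTITY && keys.size() >= MAX_COLLECTION_ELEMENTS) {
            throw new IllegalStateException("collection column is full for " + ownerId);
        }
        keys.add(targetId);
    }

    // Returns the keys referenced by ownerId under the chosen storage mode.
    List<String> targets(Mode mode, String ownerId) {
        List<String> keys = storeFor(mode).getOrDefault(ownerId, new ArrayList<>());
        return Collections.unmodifiableList(keys);
    }

    public static void main(String[] args) {
        AssociationStorageSketch store = new AssociationStorageSketch();
        store.link(Mode.IN_ENTITY, "breed-1", "dog-1");
        store.link(Mode.IN_ENTITY, "breed-1", "dog-2");
        store.link(Mode.SEPARATE_TABLE, "breed-2", "dog-3");
        System.out.println(store.targets(Mode.IN_ENTITY, "breed-1"));
        System.out.println(store.targets(Mode.SEPARATE_TABLE, "breed-2"));
    }
}
```

Choosing the mode per call mirrors the "configurable per association" idea; a real dialect would presumably read it from mapping metadata instead.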

...

Can I ask about assumptions: is it fair to assume that for Cassandra, OGM will target only CQL 3 (which means Cassandra 2 or maybe 1.2)? This would certainly make life simpler.

Yes, I think that's fair to assume. We still can add support for earlier versions later on, should there be the need for it.

...

An issue I don't see addressed is the choice of consistency-level (read or write) and I wondered if there was a plan for this? Assumptions can be made on a per-table basis, but, certainly for ad hoc queries, it is important, I think, to have the flexibility to specify on a per-query basis.

Configuring it on a per-table basis seems sensible. You can have a look at how we do it for MongoDB (read preference, write concern) [1]. There is a generic option mechanism which allows adding store-specific options and lets the user configure them via annotations or API: globally, per entity or per property. Specifying options per query is still an open issue. We plan to support options specific to one Session [2] which will override the otherwise defined settings, but per operation is something different again. The main challenge is that the existing APIs (createQuery() etc.) don't accept any additional context, so we need to find a way to establish an option context valid for one operation somehow.
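The override order described here (global, then per entity, then per property, with a planned per-session override on top) can be sketched as a simple "most specific setting wins" lookup. This is only an illustration: `OptionResolutionSketch`, `ConsistencyLevel` and the method names are hypothetical and are not the OGM option API, and the session-level override models something the thread says is planned rather than implemented.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of "most specific setting wins"; not the OGM option API.
class OptionResolutionSketch {

    enum ConsistencyLevel { ONE, QUORUM, ALL }

    private final ConsistencyLevel globalLevel;
    private final Map<String, ConsistencyLevel> perEntity = new HashMap<>();
    private final Map<String, ConsistencyLevel> perProperty = new HashMap<>();

    OptionResolutionSketch(ConsistencyLevel globalLevel) {
        this.globalLevel = globalLevel;
    }

    void forEntity(String entity, ConsistencyLevel level) {
        perEntity.put(entity, level);
    }

    void forProperty(String entity, String property, ConsistencyLevel level) {
        perProperty.put(entity + "#" + property, level);
    }

    // Resolution without a session override: property, then entity, then global.
    ConsistencyLevel resolve(String entity, String property) {
        ConsistencyLevel level = perProperty.get(entity + "#" + property);
        if (level != null) {
            return level;
        }
        return perEntity.getOrDefault(entity, globalLevel);
    }

    // A session-level setting (planned in the thread, assumed here) overrides everything else.
    ConsistencyLevel resolve(String entity, String property, ConsistencyLevel sessionOverride) {
        return sessionOverride != null ? sessionOverride : resolve(entity, property);
    }

    public static void main(String[] args) {
        OptionResolutionSketch options = new OptionResolutionSketch(ConsistencyLevel.ONE);
        options.forEntity("Dog", ConsistencyLevel.QUORUM);
        options.forProperty("Dog", "breed", ConsistencyLevel.ALL);
        System.out.println(options.resolve("Breed", "name"));
        System.out.println(options.resolve("Dog", "name"));
        System.out.println(options.resolve("Dog", "breed"));
        System.out.println(options.resolve("Dog", "breed", ConsistencyLevel.ONE));
    }
}
```

A per-operation context, the open issue Gunnar mentions, would add one more level above the session override; how to pass it through createQuery() is exactly what remains unsolved.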

...

Those are my thoughts so far... I'll see about doing a POC of some of what I have described above

Awesome. Looking forward to it very much. If you want to discuss anything specific in the code, just let us know. If you like, you can also send an "early review" pull request as the basis for discussion of the general approach. Do you have any branch newer than the original one from Jonathan already on GitHub? I could then take a look to acquaint myself with the current state.

...

Cheers, John

Many thanks for your help and with best regards, --Gunnar [1] https://docs.jboss.org/hibernate/ogm/4.1/reference/en-US/html_single/#_co... [2] https://hibernate.atlassian.net/browse/OGM-343


Emmanuel Bernard

Friday, 22 August

11:25 a.m.

On Thu 2014-08-07 9:10, John Worrell wrote:

...

Hi Emmanuel et al., My apologies for the long radio silence. I've taken a look at the code-base on Jon Halliday's repo, and have set up a nick on freenode - #jlesinge.

No worries, I was on holidays. And your email was one of the lucky few that I had to delay, as it required thinking ;)

...

On the time-series question I was wondering how you envisaged the data being stored: I tend to think of a single row under a primary key with an object-instance per column. Now what we have typically done (generally the data has been immutable) is to store the data serialized as a blob (JSON or XML), but I understand you do not favour this approach. With this sort of model I imagine the collection is then all the objects stored in the row, and the challenge is to page through the objects in the row.

Actually it is one of the valid strategies. If I understand you well, you want to create:

- one row per time-series-generating object (say a thermometer)
- the column names of that row would be the timestamp in question
- the value would be a JSON structure containing the data for that specific time.

That is one of the valid approaches. But I think we need to support several:

- simple column if the data is literally a single element (temperature)
- JSON structure for more complex data per time event
- key pointing to the detailed data somewhere else in the cluster

The last one would be done in two phases: you load all the keys you are interested in matching your time range and then do a multiget of sorts to load the data.

It seems DataStax tends to recommend 1 or 2 (denormalization FTW).

I don't know, but there is also the notion of super column, which is a grouping of columns that might also address our composite problem, assuming they can be used for dynamic column families.

http://www.datastax.com/dev/blog/advanced-time-series-with-cassandra
http://planetcassandra.org/blog/post/getting-started-with-time-series-dat...
http://www.datastax.com/docs/1.0/ddl/column_family
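The "one row per source, one column per timestamp" layout, and the idea of letting the user decide how much to load, can be modelled in plain Java with a sorted map per row. This is a rough in-memory sketch only: `TimeSeriesSketch`, `record` and `slice` are hypothetical names, and nothing here touches Cassandra or OGM.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical in-memory model of the "wide row" layout; not OGM or Cassandra API.
class TimeSeriesSketch {

    // One "row" per source (e.g. a thermometer); columns are keyed and sorted by timestamp.
    private final Map<String, NavigableMap<Instant, String>> rows = new TreeMap<>();

    // Stores one value (a plain element or a JSON string) under the given timestamp.
    void record(String sourceId, Instant at, String value) {
        rows.computeIfAbsent(sourceId, id -> new TreeMap<>()).put(at, value);
    }

    // Returns at most 'limit' values with from <= timestamp < to, oldest first.
    List<String> slice(String sourceId, Instant from, Instant to, int limit) {
        NavigableMap<Instant, String> row = rows.getOrDefault(sourceId, new TreeMap<>());
        List<String> page = new ArrayList<>();
        for (String value : row.subMap(from, true, to, false).values()) {
            if (page.size() == limit) {
                break;
            }
            page.add(value);
        }
        return page;
    }

    public static void main(String[] args) {
        TimeSeriesSketch store = new TimeSeriesSketch();
        Instant t0 = Instant.parse("2014-08-22T00:00:00Z");
        for (int i = 0; i < 5; i++) {
            store.record("thermometer-1", t0.plusSeconds(60L * i), "{\"celsius\": " + (20 + i) + "}");
        }
        // Load a bounded page from a time range instead of the whole row.
        System.out.println(store.slice("thermometer-1", t0.plusSeconds(60), t0.plusSeconds(600), 2));
    }
}
```

For the third strategy (keys pointing elsewhere), the values in the row would be keys rather than data, and the page returned by `slice` would feed the second, multiget-style phase.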

...

An approach we have often taken is to create multiple copies of data in different (obviously works well only for immutable objects) or better to

Yes, that is a feature that I would like OGM to automate for the user. The user declaratively defines the denormalization approaches he wants and the engine does the persistence. Next, the query engine uses that knowledge to find the best path (or the only possible path in the case of Cassandra :) )

...

create a table of keys to a main table where in either approach the row-keys are effectively a foreign-key and there is a column per object associated through the foreign-key. Another approach though might be to use a column with type list (or set, or map) to contain keys to the associated objects - this would be a little like the extensions Oracle have for mapping 1-* associations, though with the caveat that a column of collection type may only contain 64k elements. I wondered if some thought had been given to this strategy (which I must admit I have not yet used myself).

I am not aware of that approach.

...

It seems very likely that different mapping strategies should be specifiable, but then I have still to understand how these might fit with treiid.

Forget Teiid for now. We will likely start with the HQL->Walker and do our own proto query engine before layering Teiid.

...

Can I ask about assumptions: is it fair to assume that for Cassandra, OGM will target only CQL 3 (which means Cassandra 2 or maybe 1.2)? This would certainly make life simpler.

Yes that's fine.

...

That's planned. We have an option system that allows a global setting to be overridden per entity / property. While not implemented yet, we will also have the ability to override settings per session / query. That was the plan all along.

...

Those are my thoughts so far... I'll see about doing a POC of some of what I have described above

Thanks :)


John Worrell

Tuesday, 9 September

3:33 a.m.

Hi Emmanuel & Gunnar,

Many thanks for your detailed responses - and nice to chat with Gunnar a week or so back. Again I have to apologise for radio silence - my day job suddenly ate all my waking functional time - so progress has been very slow.

I'm getting deeper into the code now, and starting a POC... which is leading me to some more detailed questions. Basically, what I am doing is to run the examples and to look at things that seem to be missing, and to understand the data that is being passed around in the various options classes, so I can make a more informed implementation.

The key question in my mind at the moment is that of the relationship between the base Hibernate Dialect class and the GridDialect interface - I look at the OgmTableGenerator which is attempting to access a CF / table that is not yet created - I figured I should understand what was happening here, and make appropriate extensions / fixes first.

So, currently fighting my way through generating the sequence tables, and wondering why OgmSequenceGenerator wraps OgmTableGenerator.

Cheers, John

On Fri, Aug 22, 2014 at 5:25 PM, Emmanuel Bernard <emmanuel(a)hibernate.org> wrote:

...


Gunnar Morling

3:59 a.m.

Hi John, 2014-09-09 10:33 GMT+02:00 John Worrell <jlesinge(a)gmail.com>:

...

No worries, we are very glad about your help.

...

I'm getting deeper into the code now, and starting a POC... which is leading me to some more detailed questions. Basically, what I am doing is to run the examples and to look at things that seem to be missing, and to understand the data that is being passed around in the various options classes, so I can make a more informed implementation.

Sounds very reasonable. I can also recommend taking a look at the MongoDB dialect and the persistent representations it creates in the datastore, as it can comfortably be browsed e.g. using the mongo command line client. That's how I got to understand many things about the interaction between engine and dialects. If you have any ideas where the dialect SPI documentation can be improved to facilitate an easier understanding of how the pieces work together, let me know.

...

The key question in my mind at the moment is that of the relationship between the base Hibernate Dialect class and the GridDialect interface

OGM has its own pseudo implementation of ORM's Dialect contract, OgmDialect, but this should hardly ever play a role during OGM development. OGM's main contract towards dialects is GridDialect. The reason for exposing GridDialect on the pseudo OgmDialect is that it is our backdoor to make GridDialect available to PersistentNoSqlIdentifierGenerator implementations. Atm. there is no way to inject the GridDialect in a more straight-forward way due to some limitations in the way we integrate with the ORM engine.

...

- I look at the OgmTableGenerator which is attempting to access a CF / table that is not yet created - I figured I should understand what was happening here, and make appropriate extensions / fixes first. So, currently fighting my way through generating the sequence tables, and wondering why OgmSequenceGenerator wraps OgmTableGenerator.

Just to be sure, are you looking at the latest master? There have been some changes around these generator classes recently; they are in a much cleaner state than they used to be.

The reason for the wrapping is that when using the SEQUENCE strategy in cases where the store does not natively support sequences, we fall back to TABLE. Currently we only support a "native" SEQUENCE strategy for Neo4j, which allows mapping sequences as nodes in a reasonable manner, whereas all the other dialects use the table fallback. GridDialect#supportsSequences() is evaluated to find out whether the delegation needs to be done or not.

You could also take a look at Neo4jSequenceGenerator, which creates the sequence nodes in the datastore based on the registered PersistentNoSqlIdentifierGenerators. Note that this checks via instanceof for OgmSequenceGenerator/OgmTableGenerator atm. As we don't want to expose these types on the dialect SPI, I'm looking into ways of allowing the distinction of the two in a more abstract way, mainly based on IdSourceKeyMetadata.

Hope that helps, I'll be very happy to answer any follow-up questions. Thanks again for your help with the Cassandra dialect, I'm looking forward to this dialect very much!
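The wrapping Gunnar describes amounts to a delegation decided by a capability check. The sketch below shows that pattern under loudly stated assumptions: the `Dialect` and `IdGenerator` interfaces and both generator classes are simplified, hypothetical stand-ins, and only the method name `supportsSequences()` is taken from the thread. The real OgmSequenceGenerator and OgmTableGenerator have different signatures.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-ins for the real classes; only supportsSequences() is named in the thread.
class SequenceFallbackSketch {

    // Simplified dialect contract: just the capability check and a native sequence call.
    interface Dialect {
        boolean supportsSequences();
        long nextNativeSequenceValue(String sequenceName);
    }

    interface IdGenerator {
        long next();
    }

    // TABLE strategy: the counter lives in an ordinary key/value "table".
    static final class TableGenerator implements IdGenerator {
        private final Map<String, AtomicLong> idTable;
        private final String rowKey;

        TableGenerator(Map<String, AtomicLong> idTable, String rowKey) {
            this.idTable = idTable;
            this.rowKey = rowKey;
        }

        @Override
        public long next() {
            return idTable.computeIfAbsent(rowKey, key -> new AtomicLong()).incrementAndGet();
        }
    }

    // SEQUENCE strategy: uses the store's sequences if available, otherwise wraps a table generator.
    static final class SequenceGenerator implements IdGenerator {
        private final IdGenerator delegate;

        SequenceGenerator(Dialect dialect, String sequenceName, Map<String, AtomicLong> idTable) {
            if (dialect.supportsSequences()) {
                delegate = () -> dialect.nextNativeSequenceValue(sequenceName);
            } else {
                delegate = new TableGenerator(idTable, sequenceName);
            }
        }

        @Override
        public long next() {
            return delegate.next();
        }
    }

    public static void main(String[] args) {
        Dialect noSequences = new Dialect() {
            @Override
            public boolean supportsSequences() {
                return false;
            }

            @Override
            public long nextNativeSequenceValue(String sequenceName) {
                throw new UnsupportedOperationException(sequenceName);
            }
        };
        Map<String, AtomicLong> idTable = new HashMap<>();
        IdGenerator generator = new SequenceGenerator(noSequences, "dog_seq", idTable);
        System.out.println(generator.next());
        System.out.println(generator.next());
        System.out.println(idTable.keySet());
    }
}
```

In this sketch the fallback needs its backing row to exist before the first read, which is the kind of "table not yet created" problem John ran into; here `computeIfAbsent` creates it lazily.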

...

Cheers, John

--Gunnar

...

On Fri, Aug 22, 2014 at 5:25 PM, Emmanuel Bernard <emmanuel(a)hibernate.org> wrote: > On Thu 2014-08-07 9:10, John Worrell wrote: > > Hi Emmanuel et al., > > > > My apologies for the log radio silence. I've taken a look at the > code-base > > on Jon Halliday's repo, and have set up a nick on freenode - #jlesinge. > > No worries I was on holidays. > And you email was the few lucky ones that I had to delay as it required > thinking ;) > > > > > On the time-series question I was wondering how you envisaged the data > > stored: I tend to think of a single row under an primary key with an > > object-instance per column. Now what we have typically done (generally > the > > data has been immutable) is to store the data serialized as a blob (JSON > or > > XML), but I understand you do not favour this approach. With this sort of > > model I imagine the collection is then all the objects stored in the row, > > and the challenge is to page through the objects in the row. > > Actually it is one of the valid strategies. > If I understand you well, you want to create: > > - one row per time series generating object (say a thermometer) > - the column names of that row would be a timestamp of time at bay > - the value would be a JSON structure containing the data at bay for > that specific time. > > That is one of the valid approach. But I think we need to support > several: > > - simple column if the data is literally a single element (temperature) > - JSON structure for more complex data per time event > - key pointing to the detailed data somewhere else in the cluster > > The latest would be done in two phases, you load all the keys you are > interested in matching your time range and then do a multiget of sort to > load the data. > > It seems datastax tends to recommend 1 or 2 (denormalization FTW). 
> > I don't know but there is also the notion of super column which is a > grouping of columns that might also address our composite problem > assuming they can be used for dynamic column families. > > http://www.datastax.com/dev/blog/advanced-time-series-with-cassandra > > http://planetcassandra.org/blog/post/getting-started-with-time-series-dat... > http://www.datastax.com/docs/1.0/ddl/column_family > > > > > An approach we have often taken is to create multiple copies of data in > > different (obviously works well only for immutable objects) or better to > > Yes, that is a feature that I would like OGM to automate for the user. > It declaratively defines the denormalization approaches he wants and the > engine does the persistence. > Next the query engine uses that knowledge to find the best path (or only > possible path in the case of Cassandra :) ) > > > create a table of keys to a main table where in either approach the > > row-keys are effectively a foreign-key and there is column per object > > associated through the foreign-key. Another approach though might be to > use > > a column with type list (or set, or map) to contain keys to the > associated > > objects - this would be a little like the extensions Oracle have for > > mapping 1-* associations, though with the caveat that a column of > > collection type may only contain 64k elements. I wondered if some though > > had been given to this strategy (which I must admit I have not yet used > > myself). > > I am not aware of that approach. > > > > > It seems very likely that different mapping strategies should be > > specifiable, but then I have still to understand how these might fit with > > treiid. > > Forget Teiid for now. We will likely start with the HQL->Walker and do > our own proto query engine before layering Teiid. > > > > > Can I ask about assumptions: is it fair to assume that for Cassandra, OGM > > will target only CQL 3 (which means Cassandra 2 or maybe 1.2)? 
This would > > certainly make life simpler. > > Yes that's fine. > > > > > An issue I don't see addressed is the choice of consistency-level (read > or > > write) and I wondered if there was a plan for this? Assumptions can be > made > > on a per table basis, but, certainly for ad hoc queries, it is important > > think to have the flexibility to specify on a per-query basis. > > That's planned. We have an option system that allow for entity / > property overriding of a global setting. While not implemented, we will > also have the ability to override setting per session / query. > That was the plan all along. > > > > > Those are my thoughts so far... I'll see about doing a POC of some of > what > > I have described above > > Thanks :) > > > > > Cheers, > > > > John
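The option system mentioned in the reply above (a global setting that entities and properties can override, with per-session and per-query overrides planned) can be pictured as a simple "most specific value wins" lookup. The following is only a minimal sketch under assumptions: the class, enum and method names are hypothetical and are not OGM API, and the precedence order shown (query, then session, then entity) is a guess at the intent rather than something stated in the thread.

```java
public class ConsistencyOptionSketch {

    // Hypothetical enum; the values mirror common Cassandra consistency levels.
    enum Consistency { ONE, QUORUM, ALL }

    // Returns the first non-null override, scanning from most to least specific,
    // and falls back to the global default when no override is set.
    static Consistency resolve(Consistency globalDefault, Consistency... mostSpecificFirst) {
        for (Consistency candidate : mostSpecificFirst) {
            if (candidate != null) {
                return candidate;
            }
        }
        return globalDefault;
    }

    public static void main(String[] args) {
        Consistency perQuery = null;              // nothing set on the query
        Consistency perSession = null;            // nothing set on the session
        Consistency perEntity = Consistency.ONE;  // assumed entity-level override

        // The entity-level override is the most specific value available here.
        System.out.println(resolve(Consistency.QUORUM, perQuery, perSession, perEntity));

        // A per-query value, when present, takes precedence over everything else.
        System.out.println(resolve(Consistency.QUORUM, Consistency.ALL, perSession, perEntity));
    }
}
```

Keeping the lookup as an ordered scan means a new scope could be added later without touching the callers, which seems to match the "global setting plus overrides" description.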

John Worrell

4:08 a.m.

Hi Gunnar, Many thanks for the reply - I'll yank down the master... I assume it is merged back to the Jon Halliday fork, otherwise I'll need to mess about a bit. I also had some issues with getting connected to C*, which is understandable, but also wrt adding <class> tags for the Dog / Breed classes in the persistence.xml file. Not sure whether that is intended to be needed. Cheers, John On Tue, Sep 9, 2014 at 9:59 AM, Gunnar Morling <gunnar(a)hibernate.org> wrote:

...

Hi John,

2014-09-09 10:33 GMT+02:00 John Worrell <jlesinge(a)gmail.com>:

> Hi Emmanuel & Gunnar,
>
> Many thanks for your detailed responses - and nice to chat with Gunnar a week or so back. Again I have to apologise for radio silence - my day job suddenly ate all my waking functional time - so progress has been very slow.

No worries, we are very glad about your help.

> I'm getting deeper into the code now, and starting a POC... which is leading me to some more detailed questions. Basically, what I am doing is to run the examples and to look at things that seem to be missing, and to understand the data that is being passed around in the various options classes, so I can make a more informed implementation

Sounds very reasonable. I can also recommend taking a look at the MongoDB dialect and the persistent representations it creates in the datastore, as they can comfortably be browsed, e.g. using the mongo command line client. That's how I got to understand many things about the interaction between engine and dialects.

If you have any ideas where the dialect SPI documentation can be improved to make it easier to understand how the pieces work together, let me know.

> The key question in my mind at the moment is that of the relationship between the base Hibernate Dialect class and the GridDialect interface

OGM has its own pseudo implementation of ORM's Dialect contract, OgmDialect, but this should hardly ever play a role during OGM development. OGM's main contract towards dialects is GridDialect.

The reason for exposing GridDialect on the pseudo OgmDialect is that it is our backdoor to make GridDialect available to PersistentNoSqlIdentifierGenerator implementations. Atm. there is no way to inject the GridDialect in a more straightforward way due to some limitations in the way we integrate with the ORM engine.

> - I look at the OgmTableGenerator which is attempting to access a CF / table that is not yet created - I figured I understand what was happening here, and make appropriate extensions / fixes first. So, currently fighting my way through generating the sequence tables, and wondering why OgmSequenceGenerator wraps OgmTableGenerator.

Just to be sure, are you looking at the latest master? There have been some changes around these generator classes recently, they are in a much cleaner state than they used to be.

The reason for the wrapping is that when using the SEQUENCE strategy in cases where the store does not natively support sequences, we fall back to TABLE. Currently we only support a "native" SEQUENCE strategy for Neo4j, which allows mapping sequences as nodes in a reasonable manner, whereas all the other dialects use the table fallback. GridDialect#supportsSequences() is evaluated to find out whether the delegation needs to be done or not.

You could also take a look at Neo4jSequenceGenerator, which creates the sequence nodes in the datastore based on the registered PersistentNoSqlIdentifierGenerators. Note that this checks via instanceof for OgmSequenceGenerator/OgmTableGenerator atm. As we don't want to expose these types on the dialect SPI, I'm looking into ways of allowing the distinction of the two in a more abstract way, mainly based on IdSourceKeyMetadata.

Hope that helps, I'll be very happy to answer any follow-up questions. Thanks again for your help with the Cassandra dialect, I'm looking forward to this dialect very much!

> Cheers,
>
> John

--Gunnar
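The SEQUENCE-to-TABLE fallback described in the reply above can be illustrated with a small delegation sketch. This is not the OGM source: every type below is a reduced, hypothetical stand-in, and only the idea of checking supportsSequences() before choosing a delegate is taken from the thread.

```java
public class SequenceFallbackSketch {

    // Hypothetical stand-in for the dialect contract; only the capability
    // check mentioned in the thread is modeled.
    interface Dialect {
        boolean supportsSequences();
    }

    // Hypothetical minimal generator contract, used for illustration only.
    interface IdGenerator {
        long next();
        String strategy();
    }

    // Stand-in for a table-based generator: keeps a counter in memory, where a
    // real one would read and update a row in the datastore.
    static final class TableGenerator implements IdGenerator {
        private long value;
        public long next() { return ++value; }
        public String strategy() { return "TABLE"; }
    }

    // Stand-in for a store with native sequence support.
    static final class NativeSequenceGenerator implements IdGenerator {
        private long value;
        public long next() { return ++value; }
        public String strategy() { return "SEQUENCE"; }
    }

    // Wraps whichever generator the dialect can support: the native one if the
    // store has sequences, otherwise the table-based fallback.
    static final class SequenceGenerator implements IdGenerator {
        private final IdGenerator delegate;

        SequenceGenerator(Dialect dialect) {
            this.delegate = dialect.supportsSequences()
                    ? new NativeSequenceGenerator()
                    : new TableGenerator();
        }

        public long next() { return delegate.next(); }
        public String strategy() { return delegate.strategy(); }
    }

    public static void main(String[] args) {
        Dialect withoutSequences = () -> false; // e.g. a store with no native sequences
        Dialect withSequences = () -> true;

        System.out.println(new SequenceGenerator(withoutSequences).strategy());
        System.out.println(new SequenceGenerator(withSequences).strategy());
    }
}
```

Under this reading, a Cassandra dialect that reports no sequence support would only need the table-based path to work for both generation strategies.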

Gunnar Morling

4:36 a.m.

Hi, 2014-09-09 11:08 GMT+02:00 John Worrell <jlesinge(a)gmail.com>:

...

Hi Gunnar, Many thanks for the reply - I'll yank down the master... assume it is merged back to the Jon Halliday fork otherwise I'll need to mess about a bit.

Not sure when Jon's branch was last updated. You probably need to rebase your local branch onto the latest master from upstream (we prefer to work with rebases rather than merge commits). There have been some changes around GridDialect recently, mainly about query execution and id generation. Nothing dramatic, though.

...

Also had some issues with getting connected to C*, understandable, but also wrt adding <class> tags for the Dog / Breed classes in the persistence.xml file. not sure whether that is intended to be needed.

You mean the classes from the "Getting Started" example, right? The <class> tags should not be required, the example runs as is e.g. on Infinispan. What happens if you don't add those? Cheers,

...

John

--Gunnar On Tue, Sep 9, 2014 at 9:59 AM, Gunnar Morling <gunnar(a)hibernate.org> wrote:

...


John Worrell

5:55 a.m.

Hi Gunnar, Wrt the <class> tags - partly it is an issue with Eclipse JPA which complains if the <class> tags are absent, but I think it *may* actually not make any difference to the examples - the real issue lies with the code not picking up the sequences to generate properly, and as you point out that may now be fixed in the latest master. I'll look at a rebase. Thanks, John On Tue, Sep 9, 2014 at 10:36 AM, Gunnar Morling <gunnar(a)hibernate.org> wrote:

...

>>> > > >> >>> > > >> Emmanuel >>> > > >> >>> > > >> On 16 Jul 2014, at 13:17, John Worrell <jlesinge(a)gmail.com> >>> wrote: >>> > > >> >>> > > >> > Hi, >>> > > >> > >>> > > >> > I'm interested in contributing to the Cassandra module of >>> > Hibernate-OGM >>> > > >> - >>> > > >> > what would be the baest way to go about this? >>> > > >> > >>> > > >> > Thanks, >>> > > >> > >>> > > >> > John >>> > > >> > _______________________________________________ >>> > > >> > hibernate-dev mailing list >>> > > >> > hibernate-dev(a)lists.jboss.org >>> > > >> > https://lists.jboss.org/mailman/listinfo/hibernate-dev >>> > > >> >>> > > >> >>> > > > >>> > >>> _______________________________________________ >>> hibernate-dev mailing list >>> hibernate-dev(a)lists.jboss.org >>> https://lists.jboss.org/mailman/listinfo/hibernate-dev >>> >> >> >

Gunnar Morling

7:06 a.m.

Hi,

2014-09-09 12:55 GMT+02:00 John Worrell <jlesinge(a)gmail.com>:

...

To provide some more details: schema initialization is handled by a dialect-specific implementation of the SchemaDefiner contract. The specific implementation type is to be returned from DatastoreProvider#getSchemaDefinerType(). The SchemaDefiner is invoked by the engine after session factory initialization (eventually it will only be invoked if required by the "hbm2ddl.auto" setting). That contract is still experimental at this time; we need to flesh it out in more detail, also based on feedback about what's needed for Cassandra (as it is the first store with a fixed schema).

Does Cassandra have any counterpart to physical sequences, as e.g. in Oracle? If not (and if they cannot be emulated in a meaningful way, as we do for Neo4j), GridDialect#supportsSequences() would have to return false, and the table-based strategy needs to be implemented.

> I'll look at a rebase.
>
> Thanks,
>
> John

Hth,

--Gunnar
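The delegation described in this thread (use native sequences when the store has them, otherwise fall back to a table-style counter) can be sketched roughly as follows. This is only an illustration under stated assumptions: `SequenceCapableDialect`, `TableFallbackGenerator`, `DelegatingSequenceGenerator` and `NoSequenceDialect` are hypothetical names invented for the sketch and are not the OGM SPI; only the idea of a `supportsSequences()` check comes from the discussion above. The "table" is an in-memory map standing in for a datastore table.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the small part of a dialect contract discussed in the thread. */
interface SequenceCapableDialect {
    boolean supportsSequences();

    /** Only expected to be called when supportsSequences() returns true. */
    long nextNativeValue(String sequenceName);
}

/** Table-style fallback: one counter "row" per sequence name, kept in memory for this sketch. */
final class TableFallbackGenerator {
    private final Map<String, Long> counters = new HashMap<>();

    synchronized long next(String sequenceName) {
        // Creates the counter at 1 on first use, otherwise increments it by 1.
        return counters.merge(sequenceName, 1L, Long::sum);
    }
}

/** Delegates to native sequences if the dialect supports them, else to the table fallback. */
final class DelegatingSequenceGenerator {
    private final SequenceCapableDialect dialect;
    private final TableFallbackGenerator fallback = new TableFallbackGenerator();

    DelegatingSequenceGenerator(SequenceCapableDialect dialect) {
        this.dialect = dialect;
    }

    long next(String sequenceName) {
        return dialect.supportsSequences()
                ? dialect.nextNativeValue(sequenceName)
                : fallback.next(sequenceName);
    }
}

/** Hypothetical dialect for a store without native sequences. */
final class NoSequenceDialect implements SequenceCapableDialect {
    @Override
    public boolean supportsSequences() {
        return false;
    }

    @Override
    public long nextNativeValue(String sequenceName) {
        throw new UnsupportedOperationException("no native sequences in this store");
    }
}

final class SequenceFallbackDemo {
    public static void main(String[] args) {
        DelegatingSequenceGenerator generator =
                new DelegatingSequenceGenerator(new NoSequenceDialect());
        System.out.println(generator.next("dog_seq"));
        System.out.println(generator.next("dog_seq"));
        System.out.println(generator.next("breed_seq"));
    }
}
```

In a real dialect the fallback counter would live in the datastore rather than in a map, which is where the write-path cost of a table-based strategy comes from.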

...


John Worrell

Thursday, 11 September Thu, 11 Sep

3:56 a.m.

Hi Gunnar (& Emmanuel),

Thanks again for the info. Chugging on slowly when I get the time.

The sequences are an interesting problem: C* does not supply built-in functionality to create sequences. The standard approach of creating a sequence table would seem to hobble the "write fast" behaviour that C* users know and love. Alternatives to using a C* table to generate sequences then bring us face to face with the problem of generating IDs on multiple nodes (I assume here that C* is being used in a distributed environment) - we use a home-grown implementation of Twitter Snowflake for this purpose.

Cheers,

John

On Tue, Sep 9, 2014 at 1:06 PM, Gunnar Morling <gunnar(a)hibernate.org> wrote:

...
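For readers unfamiliar with the Snowflake idea mentioned above, here is a minimal sketch of a Snowflake-style generator. It is not John's home-grown implementation, which is not shown in this thread. It assumes the commonly described 64-bit layout (41 bits of millisecond timestamp, 10 bits of node id, 12 bits of per-millisecond sequence); the class name `SnowflakeIdGenerator`, the helper `nodeOf` and the custom epoch value are choices made for this example only.

```java
/** Snowflake-style ID generator: timestamp | node id | per-millisecond sequence. */
final class SnowflakeIdGenerator {
    // Arbitrary custom epoch in milliseconds, chosen for this sketch only.
    private static final long EPOCH = 1400000000000L;
    private static final int NODE_BITS = 10;
    private static final int SEQUENCE_BITS = 12;
    private static final long MAX_NODE = (1L << NODE_BITS) - 1;
    private static final long MAX_SEQUENCE = (1L << SEQUENCE_BITS) - 1;

    private final long nodeId;
    private long lastTimestamp = -1L;
    private long sequence = 0L;

    SnowflakeIdGenerator(long nodeId) {
        if (nodeId < 0 || nodeId > MAX_NODE) {
            throw new IllegalArgumentException("nodeId must be between 0 and " + MAX_NODE);
        }
        this.nodeId = nodeId;
    }

    synchronized long nextId() {
        long now = System.currentTimeMillis();
        if (now < lastTimestamp) {
            // Refuse to generate IDs rather than risk duplicates.
            throw new IllegalStateException("clock moved backwards");
        }
        if (now == lastTimestamp) {
            sequence = (sequence + 1) & MAX_SEQUENCE;
            if (sequence == 0) {
                // Sequence exhausted for this millisecond: wait for the next one.
                while ((now = System.currentTimeMillis()) <= lastTimestamp) {
                    Thread.onSpinWait();
                }
            }
        } else {
            sequence = 0L;
        }
        lastTimestamp = now;
        return ((now - EPOCH) << (NODE_BITS + SEQUENCE_BITS))
                | (nodeId << SEQUENCE_BITS)
                | sequence;
    }

    /** Extracts the node id encoded in an ID produced by this class. */
    static long nodeOf(long id) {
        return (id >> SEQUENCE_BITS) & MAX_NODE;
    }
}

final class SnowflakeDemo {
    public static void main(String[] args) {
        SnowflakeIdGenerator generator = new SnowflakeIdGenerator(7);
        long id = generator.nextId();
        System.out.println(id + " was generated on node " + SnowflakeIdGenerator.nodeOf(id));
    }
}
```

Each node generates IDs without coordinating with the others, so no shared counter table sits on the write path. The trade-off is that every node needs a distinct node id and a reasonably well-behaved clock.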


hibernate-dev@lists.jboss.org

12 comments

3 participants

Emmanuel Bernard
Gunnar Morling
John Worrell
