Hi John,
First off, sorry again for the late response.
2014-08-07 10:10 GMT+02:00 John Worrell <jlesinge(a)gmail.com>:
Hi Emmanuel et al.,
My apologies for the long radio silence. I've taken a look at the codebase
on Jon Halliday's repo, and have set up a nick on freenode: #jlesinge.
On the time-series question I was wondering how you envisaged the data
stored: I tend to think of a single row under a primary key with an
object-instance per column. Now what we have typically done (generally the
data has been immutable) is to store the data serialized as a blob (JSON or
XML), but I understand you do not favour this approach. With this sort of
model I imagine the collection is then all the objects stored in the row,
and the challenge is to page through the objects in the row.
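A minimal sketch of the paging problem described above, assuming one object per column in a single wide row ordered by a timestamp key. WideRow is a hypothetical in-memory class (a TreeMap standing in for the sorted columns of one Cassandra partition), not anything from the OGM code base. It pages by remembering the last key seen rather than by skipping an offset:

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/**
 * Hypothetical stand-in for one wide row: each column is keyed by a
 * timestamp and columns stay sorted, as they do within a Cassandra partition.
 */
class WideRow {
    private final NavigableMap<Long, String> columns = new TreeMap<>();

    void put(long timestamp, String value) {
        columns.put(timestamp, value);
    }

    /**
     * Returns at most pageSize columns whose key is strictly greater than
     * afterKey; pass null to start from the beginning of the row.
     */
    List<Map.Entry<Long, String>> page(Long afterKey, int pageSize) {
        NavigableMap<Long, String> remaining =
                (afterKey == null) ? columns : columns.tailMap(afterKey, false);
        List<Map.Entry<Long, String>> result = new ArrayList<>();
        for (Map.Entry<Long, String> column : remaining.entrySet()) {
            if (result.size() == pageSize) {
                break;
            }
            // Copy the entry so callers do not hold live views of the map.
            result.add(new AbstractMap.SimpleImmutableEntry<>(column));
        }
        return result;
    }
}
```

The caller passes the last key of one page as afterKey for the next. In CQL 3 the same idea would roughly be a predicate on the clustering column plus LIMIT, so each page costs about the page size regardless of how deep into the row it starts.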
I cannot really comment on the time-series question, I'll leave that to
Emmanuel.
You're right though that data should not be stored as BLOBs or any other
"non-natural" representation. Querying and interaction with other
applications using the same store would then be a problem.
An approach we have often taken is to create multiple copies of data in
different tables (this obviously works well only for immutable objects) or,
better, to create a table of keys to a main table. In either approach the
row keys are effectively a foreign key, and there is a column per object
associated through the foreign key.
Could you maybe give an example of what this would look like?
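One possible reading of the "table of keys" layout, sketched in memory for a hypothetical Customer-to-Order association. KeyTableSketch and its methods are assumed names for illustration, not Hibernate OGM classes; the comments show the CQL 3 tables the two maps stand in for:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

/** Hypothetical in-memory model of a main table plus a "table of keys". */
class KeyTableSketch {
    // Main table, roughly: CREATE TABLE orders (id text PRIMARY KEY, payload text);
    private final Map<String, String> orders = new HashMap<>();

    // Key table, roughly:
    // CREATE TABLE orders_by_customer (customer_id text, order_id text,
    //                                  PRIMARY KEY (customer_id, order_id));
    // The row key (customer_id) acts as the foreign key, and each associated
    // order contributes one clustered column (order_id), kept sorted.
    private final Map<String, SortedSet<String>> ordersByCustomer = new HashMap<>();

    void addOrder(String customerId, String orderId, String payload) {
        orders.put(orderId, payload);
        ordersByCustomer.computeIfAbsent(customerId, id -> new TreeSet<>()).add(orderId);
    }

    /** Reads the key row for the customer, then resolves each key in the main table. */
    List<String> ordersOf(String customerId) {
        List<String> result = new ArrayList<>();
        for (String orderId : ordersByCustomer.getOrDefault(customerId, Collections.emptySortedSet())) {
            result.add(orders.get(orderId));
        }
        return result;
    }
}
```

Reading an association then costs one read of the key row plus one lookup per key in the main table; the denormalized "multiple copies" variant would avoid the second step at the price of duplicated data.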
Another approach though might be to use
a column with type list (or set, or map) to contain keys to the associated
objects - this would be a little like the extensions Oracle have for
mapping 1-* associations, though with the caveat that a column of
collection type may only contain 64k elements. I wondered if some thought
had been given to this strategy (which I must admit I have not yet used
myself).
A very good question, unfortunately my knowledge of data modeling with
Cassandra is still a bit limited.
Storing "foreign keys" in collection columns seems like a good idea. It's
somewhat similar to the "in entity" mode we have for MongoDB. Do list
columns support null values? I think we'd need that for ordered collections
containing nulls. Another question is how to deal with compound map keys.
For the document stores (MongoDB, CouchDB) we offer an alternative
"association document" mode which persists association information not
within the referencing entity but within separate entity documents,
circumventing similar issues with the max size of documents. IIUC, that's
somewhat similar to the first mode you describe. It might make sense to
support both modes in a similar fashion for Cassandra, configurable per
association.
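To illustrate how a per-association choice between the two modes might interact with the 64k limit on collection columns, here is a hedged sketch. AssociationMode and AssociationModes are hypothetical names (they are not the existing AssociationStorage option), and 65,535 is assumed as the concrete value behind the "64k" figure:

```java
/** The two storage layouts discussed in the thread (hypothetical names). */
enum AssociationMode {
    /** Associated keys held in a list/set/map column of the owning row. */
    IN_ENTITY_COLLECTION,
    /** Associated keys held as rows of a separate table keyed by the owner's id. */
    SEPARATE_TABLE
}

final class AssociationModes {
    /** Assumed collection-column limit (the "64k elements" from the thread). */
    static final int MAX_COLLECTION_ELEMENTS = 65_535;

    private AssociationModes() {
    }

    /**
     * Picks the mode for one association: a per-association setting (may be
     * null) overrides the global default, and the collection-column mode is
     * rejected if the association may grow past the assumed limit.
     */
    static AssociationMode resolve(AssociationMode perAssociation,
                                   AssociationMode globalDefault,
                                   long maxExpectedSize) {
        AssociationMode mode = (perAssociation != null) ? perAssociation : globalDefault;
        if (mode == AssociationMode.IN_ENTITY_COLLECTION
                && maxExpectedSize > MAX_COLLECTION_ELEMENTS) {
            throw new IllegalArgumentException("Association may exceed "
                    + MAX_COLLECTION_ELEMENTS + " elements; use SEPARATE_TABLE instead");
        }
        return mode;
    }
}
```

Failing fast, rather than silently switching layouts, is a deliberate choice here: a silent fallback would change the physical schema depending on data volume.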
Out of interest, how are associations handled in the branch created by
Jonathan?
Regarding de-normalization, some thought has been given to it; it's planned
for the 4.2 release at this point.
It seems very likely that different mapping strategies should be
specifiable, but then I still have to understand how these might fit
with treiid.
What is "treiid"? Do you mean Teiid (http://teiid.jboss.org/)?
+1 for making different strategies configurable where it makes sense.
That's what we do for other stores as well. You might want to have a look
at the AssociationStorage option. Currently that's specific to document
stores, but it might make sense to further generify it.
Can I ask about assumptions: is it fair to assume that for Cassandra, OGM
will target only CQL 3 (which means Cassandra 2 or maybe 1.2)? This would
certainly make life simpler.
Yes, I think that's fair to assume. We still can add support for earlier
versions later on, should there be the need for it.
An issue I don't see addressed is the choice of consistency level (read or
write), and I wondered if there was a plan for this. Assumptions can be made
on a per-table basis but, certainly for ad hoc queries, I think it is
important to have the flexibility to specify it on a per-query basis.
Configuring it on a per-table basis seems sensible. You can have a look at
how we do it for MongoDB (read preference, write concern) [1]. There is a
generic option mechanism which allows adding store-specific options and lets
the user configure them via annotations or API, globally, per entity or per
property.
Specifying options per query is still an open issue. We plan to support
options specific to one Session [2] which will override the otherwise
defined settings, but per-operation options are a different matter. The main
challenge is that the existing APIs (createQuery() etc.) don't accept any
additional context, so we need to find some way to establish an option
context that is valid for a single operation.
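A minimal sketch of the layered lookup this implies, including the per-query scope that is still open. Scope, Consistency and ConsistencyOptions are hypothetical names; Consistency lists a few real Cassandra consistency levels, but none of this is Hibernate OGM's option API. The most specific scope that has a value wins:

```java
import java.util.EnumMap;
import java.util.Map;

/** A few of Cassandra's consistency levels (subset, for illustration). */
enum Consistency { ONE, LOCAL_QUORUM, QUORUM, ALL }

/** Hypothetical option scopes, declared from most to least specific. */
enum Scope { QUERY, SESSION, ENTITY, GLOBAL }

/** Consistency settings in effect for one operation on one entity type. */
final class ConsistencyOptions {
    private final Map<Scope, Consistency> settings = new EnumMap<>(Scope.class);

    ConsistencyOptions set(Scope scope, Consistency level) {
        settings.put(scope, level);
        return this;
    }

    /** Returns the level from the most specific scope that has one, else the fallback. */
    Consistency resolve(Consistency fallback) {
        // Enum values() come back in declaration order, i.e. QUERY first.
        for (Scope scope : Scope.values()) {
            Consistency level = settings.get(scope);
            if (level != null) {
                return level;
            }
        }
        return fallback;
    }
}
```

The hard part named above is not this lookup but getting a QUERY-scoped value to it at all, since createQuery() and friends offer no parameter to carry it.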
Those are my thoughts so far... I'll see about doing a POC of some of what
I have described above.
Awesome. Looking forward to it very much.
If you want to discuss anything specific in the code, just let us know. If
you like, you also can send an "early review" pull request as the basis for
discussion of the general approach.
Do you have any branch newer than the original one from Jonathan already on
GitHub? I could then take a look to make myself acquainted with the current
state.
Cheers,
John
Many thanks for your help and with best regards,
--Gunnar
[1]
https://docs.jboss.org/hibernate/ogm/4.1/reference/en-US/html_single/#_co...
[2]
https://hibernate.atlassian.net/browse/OGM-343
On Mon, Jul 21, 2014 at 4:48 PM, John Worrell <jlesinge(a)gmail.com> wrote:
> Hi Emmanuel,
>
> I'll take a look at what is there, and I'll get up and running on IRC.
>
> I'll particularly look at the time-series issue - non-trivial I think.
>
> Cheers,
>
> John
>
>
> On Mon, Jul 21, 2014 at 1:06 PM, Emmanuel Bernard <emmanuel(a)hibernate.org>
> wrote:
>
>> Hi John,
>>
>> I thought I had replied to you on Friday but apparently the email never
>> went through :/
>>
>> That is good news :)
>> Jonathan worked on a Cassandra prototype but had to drop it due to other
>> duties. He pushed everything to
>> https://github.com/jhalliday/hibernate-ogm/tree/jonathan_cassandra
>>
>> Have a look at what he has done and come and ask Gunnar, Davide or me
>> any questions. There are a bunch of moving pieces. We are mostly on
>> freenode’s #hibernate-dev (you need a freenode login:
>> http://freenode.net/faq.shtml#nicksetup). If you are allergic to IRC,
>> let me know and we will find alternatives.
>>
>> The most interesting challenge will be to see how we can map time series
>> into a collection and make sure we let the user decide how much he wants
>> to load.
>>
>> Emmanuel
>>
>> On 16 Jul 2014, at 13:17, John Worrell <jlesinge(a)gmail.com> wrote:
>>
>> > Hi,
>> >
>> > I'm interested in contributing to the Cassandra module of Hibernate-OGM -
>> > what would be the best way to go about this?
>> >
>> > Thanks,
>> >
>> > John
>> > _______________________________________________
>> > hibernate-dev mailing list
>> > hibernate-dev(a)lists.jboss.org
>> > https://lists.jboss.org/mailman/listinfo/hibernate-dev
>>
>>
>