[hibernate-dev] [OGM] Mapping of component types in document stores

Tue Jul 12 07:35:16 EDT 2016

Hi,

For the sake of completeness, here is the mapping obtained with Morphia:

    {
        "_id" : ObjectId("5784ca2612d0226cb309666d"),
        "className" : "TestEntity",
        "embeddeds" : [
            { "singleProperty" : "value1" },
            { "singleProperty" : "value2" }
        ],
        "embedded" : { "singleProperty" : "value" },
        "collectionOfStrings" : [ "string1", "string2" ]
    }

Morphia basically follows the POLA (principle of least astonishment): the
MongoDB mapping mirrors the Java mapping.
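
To illustrate what "the document follows the Java mapping" means, here is a
minimal sketch in plain Java. It does not use Morphia at all: the class
shapes are only inferred from the field names in the document, and the
toDocument() helpers are hypothetical stand-ins for whatever the mapper
really does. The point is simply that a single-property embeddable still
produces a sub-document carrying its field name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MorphiaShapeSketch {

    // Hypothetical embeddable with a single property.
    static class Embedded {
        String singleProperty;

        Embedded(String singleProperty) {
            this.singleProperty = singleProperty;
        }

        Map<String, Object> toDocument() {
            Map<String, Object> doc = new LinkedHashMap<>();
            // The field name is kept even though there is only one property.
            doc.put("singleProperty", singleProperty);
            return doc;
        }
    }

    // Hypothetical entity; the field names come from the document, the rest is assumed.
    static class TestEntity {
        List<Embedded> embeddeds = new ArrayList<>();
        Embedded embedded;
        List<String> collectionOfStrings = new ArrayList<>();

        Map<String, Object> toDocument() {
            Map<String, Object> doc = new LinkedHashMap<>();
            List<Map<String, Object>> subDocuments = new ArrayList<>();
            for (Embedded e : embeddeds) {
                subDocuments.add(e.toDocument());
            }
            doc.put("embeddeds", subDocuments);
            doc.put("embedded", embedded.toDocument());
            // A collection of plain strings stays an array of bare values.
            doc.put("collectionOfStrings", collectionOfStrings);
            return doc;
        }
    }

    public static void main(String[] args) {
        TestEntity entity = new TestEntity();
        entity.embeddeds.add(new Embedded("value1"));
        entity.embeddeds.add(new Embedded("value2"));
        entity.embedded = new Embedded("value");
        entity.collectionOfStrings.add("string1");
        entity.collectionOfStrings.add("string2");
        System.out.println(entity.toDocument());
    }
}
```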

Btw, to be complete, here are the reasons why I would like to change
it (I agree we have to maintain compatibility with older databases
but, like Sanne, I think it should be the new default):
1/ POLA: I would expect my datastore mapping to follow my Java mapping.
2/ Related to 1/: I wouldn't expect to have to migrate my data when I
simply add a property to an existing embeddable.
3/ It would remove special cases in our code, especially the special
cases present in the dialects.
4/ I don't think we are completely consistent with this behavior.
Typically, if I take StoryGame from our tests and remove all the
properties but one from OptionalStoryBranch, I end up with the
following:
- in the datastore: "chaoticBranches" : [ "[VENDETTA] assassinate the
leader of the party", "[ARTIFACT] Search for the evil artifact" ].
This is what we expect: there is only one property, so we remove the
property level.
- in the native query generated from our JPQL query "FROM StoryGame story
JOIN story.chaoticBranches c WHERE c.evilText = '[ARTIFACT] Search for
the evil artifact'": where={ "chaoticBranches.evilText" : "[ARTIFACT]
Search for the evil artifact" }
-> So our JPQL queries don't work if we only have one property in the
embeddable, because the query still targets the "evilText" field that
the stored array elements no longer have. We might also want to special
case this but I really don't think it's a good idea.
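
To make point 4/ concrete, here is a small self-contained sketch in plain
Java (no MongoDB driver, no OGM code). The matchesPath helper is
hypothetical and only roughly imitates how a dotted path such as
"chaoticBranches.evilText" is resolved against the elements of an array;
the two documents reuse the values from the StoryGame example above.

```java
import java.util.List;
import java.util.Map;

public class DottedPathSketch {

    // Hypothetical helper: returns true if any element of the array stored
    // under `field` is a sub-document whose `property` equals `expected`.
    // Bare values (e.g. plain strings) have no properties and never match.
    static boolean matchesPath(Map<String, Object> doc, String field, String property, Object expected) {
        Object value = doc.get(field);
        if (!(value instanceof List<?>)) {
            return false;
        }
        for (Object element : (List<?>) value) {
            if (element instanceof Map<?, ?> && expected.equals(((Map<?, ?>) element).get(property))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String vendetta = "[VENDETTA] assassinate the leader of the party";
        String artifact = "[ARTIFACT] Search for the evil artifact";

        // Layout with the property level removed (single-property case).
        Map<String, Object> flattened = Map.of(
                "chaoticBranches", List.of(vendetta, artifact));

        // Layout that always keeps the field name.
        Map<String, Object> nested = Map.of(
                "chaoticBranches", List.of(
                        Map.of("evilText", vendetta),
                        Map.of("evilText", artifact)));

        System.out.println(matchesPath(flattened, "chaoticBranches", "evilText", artifact));
        System.out.println(matchesPath(nested, "chaoticBranches", "evilText", artifact));
    }
}
```

Under these assumptions the first call prints false and the second prints
true: the same filter only finds the value when the field name is stored.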

While this discussion might seem to come out of the blue, it's in fact
related to OGM-893 and another special casing we do. See my comment
here:
https://hibernate.atlassian.net/browse/OGM-893?focusedCommentId=79245&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-79245
The mapping changes when we add a @Column with a name to a property of
an embedded used as a collection element.

-- 
Guillaume

On Tue, Jul 12, 2016 at 12:18 PM, Sanne Grinovero <sanne at hibernate.org> wrote:
> On 12 July 2016 at 11:13, Gunnar Morling <gunnar at hibernate.org> wrote:
>>> I'd be concerned about schema evolution:
>>
>> Yes, that's the main argument; as said, I can see that.
>>
>>> I'd see more value in making this the default, and have a "higher
>>> level" configuration property which is like "read like OGM 5.0 used to
>>> store it".
>>
>> I wouldn't like changing such default in a 5.x release. For 6, ok, why not,
>> if you all think that's better.
>
> ok
>
>>
>>> Even better, we'd provide tooling which migrates an existing database.
>>
>> Sure, migration support is on the roadmap ;)
>>
>>
>> 2016-07-12 11:06 GMT+01:00 Sanne Grinovero <sanne at hibernate.org>:
>>>
>>> On 12 July 2016 at 10:55, Gunnar Morling <gunnar at hibernate.org> wrote:
>>> > Hi,
>>> >
>>> > We had an interesting discussion on how to map element collections of
>>> > component types with a single column to document stores such as MongoDB.
>>> >
>>> > E.g. assume we have
>>> >
>>> >     @Entity
>>> >     public class Person {
>>> >
>>> >         public String name;
>>> >
>>> >         @ElementCollection
>>> >         public List<Status> statusHistory;
>>> >     }
>>> >
>>> >     @Embeddable
>>> >     public class Status {
>>> >         public String name;
>>> >     }
>>> >
>>> >
>>> > Currently, that's mapped to documents like this:
>>> >
>>> >     {
>>> >         "name" : "Bob",
>>> >         "statusHistory" : [
>>> >             "great",
>>> >             "mediocre",
>>> >             "splendid"
>>> >         ]
>>> >     }
>>>
>>> "great", "mediocre", etc.. are values of the `name` property?
>>>
>>> >
>>> > I.e. if the component type has a single column, we omit the field name
>>> > in the persistent structure. Whereas if there are multiple columns, it's
>>> > added so we can properly read back such documents:
>>> >
>>> >
>>> >     {
>>> >         "name" : "Bob",
>>> >         "statusHistory" : [
>>> >             { "name" : "great", "date" : "22.06.2016" },
>>> >             { "name" : "mediocre", "date" : "15.05.2016" },
>>> >             { "name" : "splendid", "date" : "12.04.2016" }
>>> >         ]
>>> >     }
>>> >
>>> > The question now is, should we also create such array of sub-documents,
>>> > each containing the field name, in the case where there only is a single
>>> > column. As far as I remember, the current structure has been chosen for
>>> > the sake of efficiency but also simplicity (why deal with sub-documents
>>> > if there only is a single field?).
>>> >
>>> > Guillaume is questioning the sanity of that, arguing that mapping this
>>> > as an element collection of a component type rather than string should
>>> > mandate the persistent structure to always contain the field name.
>>>
>>> I agree, but maybe for other reasons.
>>> I'd be concerned about schema evolution: if I add a new attribute to
>>> the `Status` class, say a "long timestampOfChange" for the sake of the
>>> example, as a developer I might want to consider this a nullable value
>>> as I'm aware that my existing database didn't define this property so
>>> far.
>>>
>>> I wouldn't be happy to see failures on loading existing stored values
>>> for Status#name : such mapping choices have to be very consistent.
>>>
>>> >
>>> > We cannot change the default as we are committed to the MongoDB format,
>>> > but if there is agreement that it's useful, we could add an option to
>>> > enable this mapping.
>>>
>>> So many mapping options :-/
>>>
>>> I'd see more value in making this the default, and have a "higher
>>> level" configuration property which is like "read like OGM 5.0 used to
>>> store it".
>>> Even better, we'd provide tooling which migrates an existing database.
>>>
>>> >
>>> > I kind of see how this format simplifies migration (in case another
>>> > field is added after a while), but personally I still like the more
>>> > compact looks of the current approach. Having an option for it works
>>> > for me.
>>> >
>>> > Any thoughts?
>>> >
>>> > --Gunnar
>>> > _______________________________________________
>>> > hibernate-dev mailing list
>>> > hibernate-dev at lists.jboss.org
>>> > https://lists.jboss.org/mailman/listinfo/hibernate-dev