[hibernate-dev] [OGM] Mapping of component types in document stores

Tue Jul 19 13:14:54 EDT 2016

Assuming we switch to the new mapping approach by default in 6,
is there a way via mapping to use the old approach? Or would that
require some new annotation?

I understand the schema evolution arguments, but frankly, in a plain
MongoDB document I would have skipped the single property name. It's
forced upon me by Java here (at least in the Enum case).
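
To make the compatibility question concrete, here is a minimal sketch
(plain Java, no OGM or MongoDB driver API involved) of a reader that
accepts both shapes of a statusHistory element from Gunnar's example:
the current bare value and the proposed sub-document. The names
StatusHistoryReader, readElement and readAll are hypothetical, made up
for this illustration; this is not how the OGM dialects are implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Illustration only: tolerant reading of both persisted shapes. */
public class StatusHistoryReader {

    /** Stand-in for the single-property Status embeddable from the thread. */
    public static final class Status {
        public final String name;

        public Status(String name) {
            this.name = name;
        }
    }

    /**
     * Accepts either the compact form ("great") or the sub-document
     * form ({ "name" : "great" }) and returns a Status in both cases.
     */
    public static Status readElement(Object element) {
        if (element instanceof Map) {
            // Sub-document form: the field name is present in the store.
            Object name = ((Map<?, ?>) element).get("name");
            return new Status(name == null ? null : name.toString());
        }
        // Compact form: the bare value is the single property.
        return new Status(element == null ? null : element.toString());
    }

    /** Reads a whole array, which may mix both forms during a migration. */
    public static List<Status> readAll(List<?> rawArray) {
        List<Status> result = new ArrayList<>();
        for (Object element : rawArray) {
            result.add(readElement(element));
        }
        return result;
    }
}
```

Reading is the easy half; an option or annotation would still have to
decide which of the two shapes gets written.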

On Tue 2016-07-12 13:35, Guillaume Smet wrote:
> Hi,
> 
> For the sake of completeness, here is the mapping obtained with Morphia:
> { "_id" : ObjectId("5784ca2612d0226cb309666d"), "className" :
> "TestEntity", "embeddeds" : [ { "singleProperty" : "value1" }, {
> "singleProperty" : "value2" } ], "embedded" : { "singleProperty" :
> "value" }, "collectionOfStrings" : [ "string1", "string2" ] }
> They basically follow the POLA (principle of least astonishment): the
> MongoDB mapping mirrors the Java mapping.
> 
> Btw, to be complete, here are the reasons why I would like to change
> it (I agree we have to maintain compatibility with older databases
> but, as Sanne, I think it should be the new default):
> 1/ POLA: I would expect my datastore mapping to follow my Java mapping
> 2/ related to 1/: I wouldn't expect to have to migrate my data when I
> simply add a property to an existing embeddable
> 3/ remove special cases in our code, especially special cases present
> in the dialects
> 4/ I don't think we are completely consistent with this behavior.
> Typically, if I take StoryGame from our tests and remove all the
> properties but one from OptionalStoryBranch, I end up with the
> following:
> - in the datastore: "chaoticBranches" : [ "[VENDETTA] assassinate the
> leader of the party", "[ARTIFACT] Search for the evil artifact" ] -
> this is what we expect, only one property, we remove the property
> level
> - in the native query generated by our JPA query "FROM StoryGame story
> JOIN story.chaoticBranches c WHERE c.evilText = '[ARTIFACT] Search for
> the evil artifact'": where={ "chaoticBranches.evilText" : "[ARTIFACT]
> Search for the evil artifact"}
> -> so our JPQL queries don't work if we only have one property in the
> embedded. We might also want to special case this but I really don't
> think it's a good idea.
> 
> While this discussion might seem to come out of the blue, it's in fact
> related to OGM-893 and another special casing we do. See my comment
> here: https://hibernate.atlassian.net/browse/OGM-893?focusedCommentId=79245&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-79245
> The mapping changes when we add a @Column with a name for a
> property of an embedded in a collection element.
> 
> -- 
> Guillaume
> 
> On Tue, Jul 12, 2016 at 12:18 PM, Sanne Grinovero <sanne at hibernate.org> wrote:
> > On 12 July 2016 at 11:13, Gunnar Morling <gunnar at hibernate.org> wrote:
> >>> I'd be concerned about schema evolution:
> >>
> >> Yes, that's the main argument; as said, I can see that.
> >>
> >>> I'd see more value in making this the default, and have a "higher
> >>> level" configuration property which is like "read like OGM 5.0 used to
> >>> store it".
> >>
> >> I wouldn't like changing such default in a 5.x release. For 6, ok, why not,
> >> if you all think that's better.
> >
> > ok
> >
> >>
> >>> Even better, we'd provide tooling which migrates an existing database.
> >>
> >> Sure, migration support is on the roadmap ;)
> >>
> >>
> >>
> >>
> >>
> >> 2016-07-12 11:06 GMT+01:00 Sanne Grinovero <sanne at hibernate.org>:
> >>>
> >>> On 12 July 2016 at 10:55, Gunnar Morling <gunnar at hibernate.org> wrote:
> >>> > Hi,
> >>> >
> >>> > We had an interesting discussion on how to map element collections of
> >>> > component types with a single column to document stores such as MongoDB.
> >>> >
> >>> > E.g. assume we have
> >>> >
> >>> >     @Entity
> >>> >     public class Person {
> >>> >
> >>> >         public String name;
> >>> >
> >>> >         @ElementCollection
> >>> >         public List<Status> statusHistory;
> >>> >     }
> >>> >
> >>> >     @Embeddable
> >>> >     public class Status {
> >>> >         public String name;
> >>> >     }
> >>> >
> >>> >
> >>> > Currently, that's mapped to documents like this:
> >>> >
> >>> >     {
> >>> >         "name"  : "Bob",
> >>> >         "statusHistory" : [
> >>> >             "great",
> >>> >             "mediocre",
> >>> >             "splendid"
> >>> >         ]
> >>> >     }
> >>>
> >>> "great", "mediocre", etc. are values of the `name` property?
> >>>
> >>> >
> >>> > I.e. if the component type has a single column, we omit the field name
> >>> > in the persistent structure. Whereas if there are multiple columns, it's
> >>> > added so we can properly read back such documents:
> >>> >
> >>> >
> >>> >     {
> >>> >         "name"  : "Bob",
> >>> >         "statusHistory" : [
> >>> >             { "name" : "great", "date" : "22.06.2016" },
> >>> >             { "name" : "mediocre", "date" : "15.05.2016" },
> >>> >             { "name" : "splendid", "date" : "12.04.2016" }
> >>> >         ]
> >>> >     }
> >>> >
> >>> > The question now is: should we also create such an array of
> >>> > sub-documents, each containing the field name, in the case where there
> >>> > is only a single column? As far as I remember, the current structure
> >>> > has been chosen for the sake of efficiency but also simplicity (why
> >>> > deal with sub-documents if there is only a single field?).
> >>> >
> >>> > Guillaume is questioning the sanity of that, arguing that mapping this
> >>> > as an element collection of a component type rather than string should
> >>> > mandate the persistent structure to always contain the field name.
> >>>
> >>> I agree, but maybe for other reasons.
> >>> I'd be concerned about schema evolution: if I add a new attribute to
> >>> the `Status` class, say a "long timestampOfChange" for the sake of the
> >>> example,
> >>> as a developer I might want to consider this a nullable value as I'm
> >>> aware that my existing database didn't define this property so far.
> >>>
> >>> I wouldn't be happy to see failures on loading existing stored values
> >>> for Status#name: such mapping choices have to be very consistent.
> >>>
> >>> >
> >>> > We cannot change the default as we are committed to the MongoDB format,
> >>> > but if there is agreement that it's useful, we could add an option to
> >>> > enable this mapping.
> >>>
> >>> So many mapping options :-/
> >>>
> >>> I'd see more value in making this the default, and have a "higher
> >>> level" configuration property which is like "read like OGM 5.0 used to
> >>> store it".
> >>> Even better, we'd provide tooling which migrates an existing database.
> >>>
> >>> >
> >>> > I kind of see how this format simplifies migration (in case another
> >>> > field is added after a while), but personally I still like the more
> >>> > compact looks of the current approach. Having an option for it works
> >>> > for me.
> >>> >
> >>> > Any thoughts?
> >>> >
> >>> > --Gunnar
> >>> > _______________________________________________
> >>> > hibernate-dev mailing list
> >>> > hibernate-dev at lists.jboss.org
> >>> > https://lists.jboss.org/mailman/listinfo/hibernate-dev
> >>
> >>