From pmuir at redhat.com Wed Apr 2 12:05:13 2014
From: pmuir at redhat.com (Pete Muir)
Date: Wed, 2 Apr 2014 17:05:13 +0100
Subject: [searchisko-dev] How to represent this as a query
Message-ID: 

Hi,

I have this piece of JSON, which maps to a contributor profile on the DCP:

{:displayName=>"Pete Muir", :accounts=>[{"username"=>"pmuir", "domain"=>"vimeo.com"}]}

How do I represent this as a query? I'm not sure how best to represent the nested structure...

Thanks

Pete

From lvlcek at redhat.com Thu Apr 3 09:31:15 2014
From: lvlcek at redhat.com (Lukas Vlcek)
Date: Thu, 3 Apr 2014 09:31:15 -0400 (EDT)
Subject: [searchisko-dev] How to represent this as a query
In-Reply-To: 
References: 
Message-ID: <901157169.2358415.1396531875208.JavaMail.zimbra@redhat.com>

Hi Pete,

You can use:

accounts.username:pmuir AND accounts.domain:"vimeo.com"

but there are some gotchas.

Imagine the following document:

{
  displayName: "Pete Muir",
  accounts: [
    {username: "pmuir", domain: "vimeo.com"},
    {username: "petemuir", domain: "youtube.com"}
  ]
}

When this is indexed, it is transformed into the following document (at the Lucene level):

{
  displayName: "Pete Muir",
  accounts.username: ["pmuir", "petemuir"],
  accounts.domain: ["vimeo.com", "youtube.com"]
}

Both 'accounts.username' and 'accounts.domain' are plain field names that happen to contain a dot; at the Lucene level there is no notion of "nestedness". At this point the connection between username and domain is lost: we no longer know which username belongs to which domain.

This means that the query:

accounts.username:petemuir AND accounts.domain:"vimeo.com"

would still match the same document, even though there is no vimeo.com account with the username petemuir.

So the question is whether this is a problem for you now. I can imagine that you might be fine with how it works for now; it depends on exactly what you need to retrieve from the DCP.
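The flattening described above, and why the petemuir/vimeo.com query still matches, can be illustrated with a small Ruby sketch. The helpers `flatten_accounts`, `flattened_match?` and `nested_match?` are hypothetical: they only model the behaviour described in this thread and are not Searchisko, Lucene, or Elasticsearch code.

```ruby
# Hypothetical helpers modelling the behaviour described in this thread.
# Not Searchisko, Lucene, or Elasticsearch code.

# Flatten the accounts array into one value list per dotted field name,
# mirroring the Lucene-level document shown above.
def flatten_accounts(doc)
  flat = Hash.new { |h, k| h[k] = [] }
  doc[:accounts].each do |account|
    account.each { |field, value| flat["accounts.#{field}"] << value }
  end
  flat
end

# Flattened AND query: each clause is checked against the whole value list,
# so the two clauses may be satisfied by different accounts.
def flattened_match?(doc, username, domain)
  flat = flatten_accounts(doc)
  flat["accounts.username"].include?(username) &&
    flat["accounts.domain"].include?(domain)
end

# Per-account ("nested"-style) AND query: both clauses must hold for the
# same account.
def nested_match?(doc, username, domain)
  doc[:accounts].any? { |a| a["username"] == username && a["domain"] == domain }
end

# Same key style as Pete's snippet: symbol keys on top, string keys per account.
doc = {
  displayName: "Pete Muir",
  accounts: [
    { "username" => "pmuir", "domain" => "vimeo.com" },
    { "username" => "petemuir", "domain" => "youtube.com" }
  ]
}

p flatten_accounts(doc)
p flattened_match?(doc, "petemuir", "vimeo.com") # => true (false positive)
p nested_match?(doc, "petemuir", "vimeo.com")    # => false
p nested_match?(doc, "pmuir", "vimeo.com")       # => true
```

The flattened check succeeds only because the two clauses are evaluated against independent value lists; `nested_match?` evaluates both clauses per account, which is the behaviour a nested query is meant to provide.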
However, going forward we can expose a query that honours the "nestedness", because Elasticsearch supports it [1].

Let me know if you have more questions.

Regards,
Lukas Vlcek
jboss.org Development Team

[1] http://www.elasticsearch.org/guide/en/elasticsearch/reference/0.90/mapping-nested-type.html#mapping-nested-type

From pmuir at redhat.com Thu Apr 3 09:33:45 2014
From: pmuir at redhat.com (Pete Muir)
Date: Thu, 3 Apr 2014 14:33:45 +0100
Subject: [searchisko-dev] How to represent this as a query
In-Reply-To: <901157169.2358415.1396531875208.JavaMail.zimbra@redhat.com>
References: <901157169.2358415.1396531875208.JavaMail.zimbra@redhat.com>
Message-ID: <2C5E6588-F8A5-4296-8960-D684B35F45B5@redhat.com>

This should be fine for now, but it's going to produce bugs before too long.

From lvlcek at redhat.com Thu Apr 3 09:41:22 2014
From: lvlcek at redhat.com (Lukas Vlcek)
Date: Thu, 3 Apr 2014 09:41:22 -0400 (EDT)
Subject: [searchisko-dev] How to represent this as a query
In-Reply-To: <2C5E6588-F8A5-4296-8960-D684B35F45B5@redhat.com>
References: <901157169.2358415.1396531875208.JavaMail.zimbra@redhat.com> <2C5E6588-F8A5-4296-8960-D684B35F45B5@redhat.com>
Message-ID: <208652147.2361796.1396532482957.JavaMail.zimbra@redhat.com>

Sure, it is not good advice in general. However, as long as we can assume the accounts data in contributor profiles is not too messy, it should work reasonably well for now. This should be improved going forward: we are looking at options for letting users run more specific queries, but that will get priority later.
Regards,
Lukas Vlcek
jboss.org Development Team
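Lukas's first reply notes that Elasticsearch can honour the "nestedness" [1]. The following is a hedged Ruby sketch of what that could look like, assuming the Elasticsearch 0.90-era `nested` mapping type and `nested` query DSL; it is not a query Searchisko exposed at the time of this thread. The document type name `contributor` and the helper `nested_account_query` are hypothetical, and the script only builds and prints JSON request bodies without contacting a cluster.

```ruby
require "json"

# Hypothetical mapping for a "contributor" document type (type name assumed).
# Declaring "accounts" as nested keeps each account's fields together.
mapping = {
  "contributor" => {
    "properties" => {
      "displayName" => { "type" => "string" },
      "accounts" => {
        "type" => "nested",
        "properties" => {
          "username" => { "type" => "string" },
          "domain" => { "type" => "string" }
        }
      }
    }
  }
}

# Hypothetical helper: builds a nested query body in which both clauses
# must match inside the same element of the "accounts" array.
def nested_account_query(username, domain)
  {
    "query" => {
      "nested" => {
        "path" => "accounts",
        "query" => {
          "bool" => {
            "must" => [
              { "match" => { "accounts.username" => username } },
              { "match" => { "accounts.domain" => domain } }
            ]
          }
        }
      }
    }
  }
end

puts JSON.pretty_generate(mapping)
puts JSON.pretty_generate(nested_account_query("petemuir", "vimeo.com"))
```

With `accounts` mapped as `nested`, Elasticsearch indexes each account as its own hidden document, so both `must` clauses have to match the same account and the petemuir/vimeo.com combination from the earlier example would no longer match.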