<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0">
<base href="https://hibernate.atlassian.net">
<title>Message Title</title>
</head>
<body class="jira" style="color: #333333; font-family: Arial, sans-serif; font-size: 14px; line-height: 1.429">
<table id="background-table" cellpadding="0" cellspacing="0" width="100%" style="border-collapse: collapse; mso-table-lspace: 0; mso-table-rspace: 0; background-color: #f5f5f5; border-collapse: collapse; mso-table-lspace: 0; mso-table-rspace: 0" bgcolor="#f5f5f5">
<!-- header here -->
<tbody>
<tr>
<td id="header-pattern-container" style="padding: 0; border-collapse: collapse; padding: 10px 20px">
<table id="header-pattern" cellspacing="0" cellpadding="0" border="0" style="border-collapse: collapse; mso-table-lspace: 0; mso-table-rspace: 0">
<tbody>
<tr>
<td id="header-avatar-image-container" valign="top" style="padding: 0; border-collapse: collapse; vertical-align: top; width: 32px; padding-right: 8px" width="32"> <img id="header-avatar-image" class="image_fix" src="https://secure.gravatar.com/avatar/3673815784047b7e0673677a0bc7dde0?d=mm&s=48" height="32" width="32" border="0" style="border-radius: 3px; vertical-align: top"> </td>
<td id="header-text-container" valign="middle" style="padding: 0; border-collapse: collapse; vertical-align: middle; font-family: Arial, sans-serif; font-size: 14px; line-height: 20px; mso-line-height-rule: exactly; mso-text-raise: 1px"> <a class="user-hover" rel="yrodiere" id="email_yrodiere" href="https://hibernate.atlassian.net/secure/ViewProfile.jspa?name=yrodiere" style="color:#6c797f;; color: #3b73af; text-decoration: none">Yoann Rodière</a> <strong>commented</strong> on <a href="https://hibernate.atlassian.net/browse/HSEARCH-1927" style="color: #3b73af; text-decoration: none"><img src="cid:jira-generated-image-avatar-6a11ef78-47bf-46b0-88fd-b8691d790f5f" height="16" width="16" border="0" align="absmiddle" alt="Bug"> HSEARCH-1927</a> </td>
</tr>
</tbody>
</table> </td>
</tr>
<tr>
<td id="email-content-container" style="padding: 0; border-collapse: collapse; padding: 0 20px">
<table id="email-content-table" cellspacing="0" cellpadding="0" border="0" width="100%" style="border-collapse: collapse; mso-table-lspace: 0; mso-table-rspace: 0; border-spacing: 0; border-collapse: separate">
<tbody>
<tr>
<!-- there needs to be content in the cell for it to render in some clients -->
<td class="email-content-rounded-top mobile-expand" style="padding: 0; border-collapse: collapse; color: #ffffff; padding: 0 15px 0 16px; height: 15px; background-color: #ffffff; border-left: 1px solid #cccccc; border-top: 1px solid #cccccc; border-right: 1px solid #cccccc; border-bottom: 0; border-top-right-radius: 5px; border-top-left-radius: 5px; height: 10px; line-height: 10px; padding: 0 15px 0 16px; mso-line-height-rule: exactly" height="10" bgcolor="#ffffff"> </td>
</tr>
<tr>
<td class="email-content-main mobile-expand " style="padding: 0; border-collapse: collapse; border-left: 1px solid #cccccc; border-right: 1px solid #cccccc; border-top: 0; border-bottom: 0; padding: 0 15px 0 16px; background-color: #ffffff" bgcolor="#ffffff">
<table class="page-title-pattern" cellspacing="0" cellpadding="0" border="0" width="100%" style="border-collapse: collapse; mso-table-lspace: 0; mso-table-rspace: 0">
<tbody>
<tr>
<td style="vertical-align: top;; padding: 0; border-collapse: collapse; padding-right: 5px; font-size: 20px; line-height: 30px; mso-line-height-rule: exactly" class="page-title-pattern-header-container"> <span class="page-title-pattern-header" style="font-family: Arial, sans-serif; padding: 0; font-size: 20px; line-height: 30px; mso-text-raise: 2px; mso-line-height-rule: exactly; vertical-align: middle"> <a href="https://hibernate.atlassian.net/browse/HSEARCH-1927" style="color: #3b73af; text-decoration: none">Re: Range faceting on multiple numeric values does not work</a> </span> </td>
</tr>
</tbody>
</table> </td>
</tr>
<tr>
<td id="text-paragraph-pattern-top" class="email-content-main mobile-expand comment-top-pattern" style="padding: 0; border-collapse: collapse; border-left: 1px solid #cccccc; border-right: 1px solid #cccccc; border-top: 0; border-bottom: 0; padding: 0 15px 0 16px; background-color: #ffffff; border-bottom: none; padding-bottom: 0" bgcolor="#ffffff">
<table class="text-paragraph-pattern" cellspacing="0" cellpadding="0" border="0" width="100%" style="border-collapse: collapse; mso-table-lspace: 0; mso-table-rspace: 0; font-family: Arial, sans-serif; font-size: 14px; line-height: 20px; mso-line-height-rule: exactly; mso-text-raise: 2px">
<tbody>
<tr>
<td class="text-paragraph-pattern-container mobile-resize-text " style="padding: 0; border-collapse: collapse; padding: 0 0 10px"> <p style="margin: 10px 0 0; margin-top: 0">I dug a bit into the Elasticsearch implementation, and it seems they indeed use <tt>SortedNumericDocValuesField</tt> when indexing:</p>
<div class="code panel" style="border-width: 1px;; border: 1px solid #cccccc; background: #f5f5f5; font-size: 12px; line-height: 1.333; font-family: monospace; border: 1px solid #cccccc; -moz-border-radius: 3px; border-radius: 3px; margin: 9px 0">
<div class="codeHeader panelHeader" style="border-bottom-width: 1px;; border-bottom: 1px solid #cccccc; padding: 9px 12px">
<b>NumberFieldMapper.java, line 707, commit 27496d6b925d8900b3357ff5672cbafa5ef2b154</b>
</div>
<div class="codeContent panelContent" style="padding: 9px 12px">
<pre class="code-java" style="margin: 10px 0 0; margin-top: 0; max-height: 30em; overflow: auto; white-space: pre-wrap; word-wrap: normal">
@Override
<span class="code-keyword" style="color: #000091">public</span> List&lt;Field&gt; createFields(<span class="code-object" style="color: #910091">String</span> name, <span class="code-object" style="color: #910091">Number</span> value,
<span class="code-object" style="color: #910091">boolean</span> indexed, <span class="code-object" style="color: #910091">boolean</span> docValued, <span class="code-object" style="color: #910091">boolean</span> stored) {
List&lt;Field&gt; fields = <span class="code-keyword" style="color: #000091">new</span> ArrayList&lt;&gt;();
<span class="code-keyword" style="color: #000091">if</span> (indexed) {
fields.add(<span class="code-keyword" style="color: #000091">new</span> LongPoint(name, value.longValue()));
}
<span class="code-keyword" style="color: #000091">if</span> (docValued) {
fields.add(<span class="code-keyword" style="color: #000091">new</span> SortedNumericDocValuesField(name, value.longValue()));
}
<span class="code-keyword" style="color: #000091">if</span> (stored) {
fields.add(<span class="code-keyword" style="color: #000091">new</span> StoredField(name, value.longValue()));
}
<span class="code-keyword" style="color: #000091">return</span> fields;
}
</pre>
</div>
</div> <p style="margin: 10px 0 0">On the querying side (range aggregations), the implementation seems to be fully custom (they don't seem to use any Lucene feature for this):</p>
<div class="code panel" style="border-width: 1px;; border: 1px solid #cccccc; background: #f5f5f5; font-size: 12px; line-height: 1.333; font-family: monospace; border: 1px solid #cccccc; -moz-border-radius: 3px; border-radius: 3px; margin: 9px 0">
<div class="codeHeader panelHeader" style="border-bottom-width: 1px;; border-bottom: 1px solid #cccccc; padding: 9px 12px">
<b>RangeAggregator.java, line 242, commit 27496d6b925d8900b3357ff5672cbafa5ef2b154</b>
</div>
<div class="codeContent panelContent" style="padding: 9px 12px">
<pre class="code-java" style="margin: 10px 0 0; margin-top: 0; max-height: 30em; overflow: auto; white-space: pre-wrap; word-wrap: normal">
@Override
<span class="code-keyword" style="color: #000091">public</span> LeafBucketCollector getLeafCollector(LeafReaderContext ctx,
<span class="code-keyword" style="color: #000091">final</span> LeafBucketCollector sub) <span class="code-keyword" style="color: #000091">throws</span> IOException {
<span class="code-keyword" style="color: #000091">final</span> SortedNumericDoubleValues values = valuesSource.doubleValues(ctx);
<span class="code-keyword" style="color: #000091">return</span> <span class="code-keyword" style="color: #000091">new</span> LeafBucketCollectorBase(sub, values) {
@Override
<span class="code-keyword" style="color: #000091">public</span> void collect(<span class="code-object" style="color: #910091">int</span> doc, <span class="code-object" style="color: #910091">long</span> bucket) <span class="code-keyword" style="color: #000091">throws</span> IOException {
values.setDocument(doc);
<span class="code-keyword" style="color: #000091">final</span> <span class="code-object" style="color: #910091">int</span> valuesCount = values.count();
<span class="code-keyword" style="color: #000091">for</span> (<span class="code-object" style="color: #910091">int</span> i = 0, lo = 0; i < valuesCount; ++i) {
<span class="code-keyword" style="color: #000091">final</span> <span class="code-object" style="color: #910091">double</span> value = values.valueAt(i);
lo = collect(doc, value, bucket, lo);
}
}
<span class="code-keyword" style="color: #000091">private</span> <span class="code-object" style="color: #910091">int</span> collect(<span class="code-object" style="color: #910091">int</span> doc, <span class="code-object" style="color: #910091">double</span> value, <span class="code-object" style="color: #910091">long</span> owningBucketOrdinal, <span class="code-object" style="color: #910091">int</span> lowBound) <span class="code-keyword" style="color: #000091">throws</span> IOException {
<span class="code-object" style="color: #910091">int</span> lo = lowBound, hi = ranges.length - 1; <span class="code-comment" style="color: #808080">// all candidates are between these indexes
</span> <span class="code-object" style="color: #910091">int</span> mid = (lo + hi) >>> 1;
<span class="code-keyword" style="color: #000091">while</span> (lo <= hi) {
<span class="code-keyword" style="color: #000091">if</span> (value < ranges[mid].from) {
hi = mid - 1;
} <span class="code-keyword" style="color: #000091">else</span> <span class="code-keyword" style="color: #000091">if</span> (value >= maxTo[mid]) {
lo = mid + 1;
} <span class="code-keyword" style="color: #000091">else</span> {
<span class="code-keyword" style="color: #000091">break</span>;
}
mid = (lo + hi) >>> 1;
}
<span class="code-keyword" style="color: #000091">if</span> (lo > hi) <span class="code-keyword" style="color: #000091">return</span> lo; <span class="code-comment" style="color: #808080">// no potential candidate
</span>
<span class="code-comment" style="color: #808080">// binary search the lower bound
</span> <span class="code-object" style="color: #910091">int</span> startLo = lo, startHi = mid;
<span class="code-keyword" style="color: #000091">while</span> (startLo <= startHi) {
<span class="code-keyword" style="color: #000091">final</span> <span class="code-object" style="color: #910091">int</span> startMid = (startLo + startHi) >>> 1;
<span class="code-keyword" style="color: #000091">if</span> (value >= maxTo[startMid]) {
startLo = startMid + 1;
} <span class="code-keyword" style="color: #000091">else</span> {
startHi = startMid - 1;
}
}
<span class="code-comment" style="color: #808080">// binary search the upper bound
</span> <span class="code-object" style="color: #910091">int</span> endLo = mid, endHi = hi;
<span class="code-keyword" style="color: #000091">while</span> (endLo <= endHi) {
<span class="code-keyword" style="color: #000091">final</span> <span class="code-object" style="color: #910091">int</span> endMid = (endLo + endHi) >>> 1;
<span class="code-keyword" style="color: #000091">if</span> (value < ranges[endMid].from) {
endHi = endMid - 1;
} <span class="code-keyword" style="color: #000091">else</span> {
endLo = endMid + 1;
}
}
<span class="code-keyword" style="color: #000091">assert</span> startLo == lowBound || value >= maxTo[startLo - 1];
<span class="code-keyword" style="color: #000091">assert</span> endHi == ranges.length - 1 || value < ranges[endHi + 1].from;
<span class="code-keyword" style="color: #000091">for</span> (<span class="code-object" style="color: #910091">int</span> i = startLo; i <= endHi; ++i) {
<span class="code-keyword" style="color: #000091">if</span> (ranges[i].matches(value)) {
collectBucket(sub, doc, subBucketOrdinal(owningBucketOrdinal, i));
}
}
<span class="code-keyword" style="color: #000091">return</span> endHi + 1;
}
};
}
</pre>
</div>
</div> <p style="margin: 10px 0 0">Solr, on the other hand, seems to use <tt>SortedSetDocValues</tt> for numeric multi-valued fields, storing each numeric value as a <tt>BytesRef</tt>.<br> I only checked the faceting query part, which looks like this:</p>
<div class="code panel" style="border-width: 1px;; border: 1px solid #cccccc; background: #f5f5f5; font-size: 12px; line-height: 1.333; font-family: monospace; border: 1px solid #cccccc; -moz-border-radius: 3px; border-radius: 3px; margin: 9px 0">
<div class="codeHeader panelHeader" style="border-bottom-width: 1px;; border-bottom: 1px solid #cccccc; padding: 9px 12px">
<b>IntervalFacets.java, line 176, commit 93562da610bf8756351be7720c69872bc1cea727</b>
</div>
<div class="codeContent panelContent" style="padding: 9px 12px">
<pre class="code-java" style="margin: 10px 0 0; margin-top: 0; max-height: 30em; overflow: auto; white-space: pre-wrap; word-wrap: normal">
<span class="code-keyword" style="color: #000091">private</span> void doCount() <span class="code-keyword" style="color: #000091">throws</span> IOException {
<span class="code-keyword" style="color: #000091">if</span> (schemaField.getType().getNumericType() != <span class="code-keyword" style="color: #000091">null</span> && !schemaField.multiValued()) {
getCountNumeric();
} <span class="code-keyword" style="color: #000091">else</span> {
getCountString();
}
}
</pre>
</div>
</div>
<div class="code panel" style="border-width: 1px;; border: 1px solid #cccccc; background: #f5f5f5; font-size: 12px; line-height: 1.333; font-family: monospace; border: 1px solid #cccccc; -moz-border-radius: 3px; border-radius: 3px; margin: 9px 0">
<div class="codeHeader panelHeader" style="border-bottom-width: 1px;; border-bottom: 1px solid #cccccc; padding: 9px 12px">
<b>IntervalFacets.java, line 246, commit 93562da610bf8756351be7720c69872bc1cea727</b>
</div>
<div class="codeContent panelContent" style="padding: 9px 12px">
<pre class="code-java" style="margin: 10px 0 0; margin-top: 0; max-height: 30em; overflow: auto; white-space: pre-wrap; word-wrap: normal">
<span class="code-keyword" style="color: #000091">private</span> void getCountString() <span class="code-keyword" style="color: #000091">throws</span> IOException {
Filter filter = docs.getTopFilter();
List&lt;LeafReaderContext&gt; leaves = searcher.getTopReaderContext().leaves();
<span class="code-keyword" style="color: #000091">for</span> (<span class="code-object" style="color: #910091">int</span> subIndex = 0; subIndex < leaves.size(); subIndex++) {
LeafReaderContext leaf = leaves.get(subIndex);
DocIdSet dis = filter.getDocIdSet(leaf, <span class="code-keyword" style="color: #000091">null</span>); <span class="code-comment" style="color: #808080">// solr docsets already exclude any deleted docs
</span> <span class="code-keyword" style="color: #000091">if</span> (dis == <span class="code-keyword" style="color: #000091">null</span>) {
<span class="code-keyword" style="color: #000091">continue</span>;
}
DocIdSetIterator disi = dis.iterator();
<span class="code-keyword" style="color: #000091">if</span> (disi != <span class="code-keyword" style="color: #000091">null</span>) {
<span class="code-keyword" style="color: #000091">if</span> (schemaField.multiValued()) {
SortedSetDocValues sub = leaf.reader().getSortedSetDocValues(schemaField.getName());
<span class="code-keyword" style="color: #000091">if</span> (sub == <span class="code-keyword" style="color: #000091">null</span>) {
<span class="code-keyword" style="color: #000091">continue</span>;
}
<span class="code-keyword" style="color: #000091">final</span> SortedDocValues singleton = DocValues.unwrapSingleton(sub);
<span class="code-keyword" style="color: #000091">if</span> (singleton != <span class="code-keyword" style="color: #000091">null</span>) {
<span class="code-comment" style="color: #808080">// some codecs may optimize SORTED_SET storage <span class="code-keyword" style="color: #000091; color: #808080">for</span> single-valued fields
</span> accumIntervalsSingle(singleton, disi, dis.bits());
} <span class="code-keyword" style="color: #000091">else</span> {
accumIntervalsMulti(sub, disi, dis.bits());
}
} <span class="code-keyword" style="color: #000091">else</span> {
SortedDocValues sub = leaf.reader().getSortedDocValues(schemaField.getName());
<span class="code-keyword" style="color: #000091">if</span> (sub == <span class="code-keyword" style="color: #000091">null</span>) {
<span class="code-keyword" style="color: #000091">continue</span>;
}
accumIntervalsSingle(sub, disi, dis.bits());
}
}
}
}
</pre>
</div>
</div> <p style="margin: 10px 0 0">See how multi-valued fields are handled by <tt>getCountString()</tt>.<br> The full code is in <tt>IntervalFacets.java</tt>: it's a bit dense, but it really seems to be designed to work with numeric values.</p> <p style="margin: 10px 0 0">All of this suggests that Lucene itself cannot do what we want without some additional work.<br> Judging by how little attention the Lucene ticket (<a href="https://issues.apache.org/jira/browse/LUCENE-7044" class="external-link" rel="nofollow" style="color: #3b73af; text-decoration: none">https://issues.apache.org/jira/browse/LUCENE-7044</a>) has received, I'd say we'll have to do it ourselves.</p> </td>
</tr>
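<tr>
<td class="text-paragraph-pattern-container mobile-resize-text " style="padding: 0; border-collapse: collapse; padding: 0 0 10px"> <p style="margin: 10px 0 0; margin-top: 0">For illustration, the kind of custom counting we would need can be sketched in plain Java, without any Lucene dependency. This is only a rough sketch under assumptions: the class <tt>MultiValuedRangeFacetSketch</tt> and the method <tt>countDocsPerRange</tt> are hypothetical names, the per-document arrays stand in for what a <tt>SortedNumericDocValues</tt>-like source would expose, and each range is assumed to include its lower bound and exclude its upper bound. Each document is counted at most once per range, even when several of its values fall into it.</p>
<div class="code panel" style="border-width: 1px;; border: 1px solid #cccccc; background: #f5f5f5; font-size: 12px; line-height: 1.333; font-family: monospace; border: 1px solid #cccccc; -moz-border-radius: 3px; border-radius: 3px; margin: 9px 0">
<div class="codeHeader panelHeader" style="border-bottom-width: 1px;; border-bottom: 1px solid #cccccc; padding: 9px 12px">
<b>MultiValuedRangeFacetSketch.java (hypothetical, for illustration only)</b>
</div>
<div class="codeContent panelContent" style="padding: 9px 12px">
<pre class="code-java" style="margin: 10px 0 0; margin-top: 0; max-height: 30em; overflow: auto; white-space: pre-wrap; word-wrap: normal">
import java.util.Arrays;

// Hypothetical sketch: not part of Lucene, Elasticsearch, Solr or Hibernate Search.
public class MultiValuedRangeFacetSketch {

    // docValues[d] holds all numeric values of document d (possibly none).
    // Range r covers from[r] (inclusive) up to to[r] (exclusive); ranges may overlap.
    public static long[] countDocsPerRange(long[][] docValues, long[] from, long[] to) {
        long[] counts = new long[from.length];
        for (long[] values : docValues) {
            for (int r = 0; r &lt; from.length; r++) {
                for (long value : values) {
                    if (value >= from[r] &amp;&amp; value &lt; to[r]) {
                        counts[r]++; // one matching value is enough for this document
                        break;       // do not count the same document twice for this range
                    }
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        long[][] docValues = {
            { 5, 25 },  // doc 0
            { 12, 15 }, // doc 1: two values in the same range, counted once
            { },        // doc 2: no value for the field
            { 3, 14 }   // doc 3
        };
        long[] from = { 0, 10, 20 };
        long[] to = { 10, 20, 30 };
        // prints [2, 2, 1]
        System.out.println(Arrays.toString(countDocsPerRange(docValues, from, to)));
    }
}
</pre>
</div>
</div> <p style="margin: 10px 0 0">A real implementation would presumably iterate per index segment and could binary-search sorted ranges, as the Elasticsearch code quoted above does, rather than use this nested loop.</p> </td>
</tr>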
</tbody>
</table> </td>
</tr>
<tr>
<td class="email-content-main mobile-expand " style="padding: 0; border-collapse: collapse; border-left: 1px solid #cccccc; border-right: 1px solid #cccccc; border-top: 0; border-bottom: 0; padding: 0 15px 0 16px; background-color: #ffffff" bgcolor="#ffffff"> <script type="application/ld+json">
{
"@context": "http://schema.org",
"@type": "EmailMessage",
"description": "View Issue",
"potentialAction": {
"@type": "ViewAction",
"target": "https://hibernate.atlassian.net/browse/HSEARCH-1927?inbox=true&focusedCommentId=88210&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-88210",
"name": "View Comment"
},
"publisher": {
"@type": "Organization",
"name": "Atlassian",
"url": "https://www.atlassian.com"
}
}
</script>
</td>
</tr>
<!-- there needs to be content in the cell for it to render in some clients -->
<tr>
<td class="email-content-rounded-bottom mobile-expand" style="padding: 0; border-collapse: collapse; color: #ffffff; padding: 0 15px 0 16px; height: 5px; line-height: 5px; background-color: #ffffff; border-top: 0; border-left: 1px solid #cccccc; border-bottom: 1px solid #cccccc; border-right: 1px solid #cccccc; border-bottom-right-radius: 5px; border-bottom-left-radius: 5px; mso-line-height-rule: exactly" height="5" bgcolor="#ffffff"> </td>
</tr>
</tbody>
</table> </td>
</tr>
<tr>
<td id="footer-pattern" style="padding: 0; border-collapse: collapse; padding: 12px 20px">
<table id="footer-pattern-container" cellspacing="0" cellpadding="0" border="0" style="border-collapse: collapse; mso-table-lspace: 0; mso-table-rspace: 0">
<tbody>
<tr>
<td id="footer-pattern-text" class="mobile-resize-text" width="100%" style="padding: 0; border-collapse: collapse; color: #999999; font-size: 12px; line-height: 18px; font-family: Arial, sans-serif; mso-line-height-rule: exactly; mso-text-raise: 2px"> This message was sent by Atlassian JIRA <span id="footer-build-information">(v1000.656.2#100024-<span title="71a5b157ecd6fc52905681569063805810158d50" data-commit-id="71a5b157ecd6fc52905681569063805810158d50}">sha1:71a5b15</span>)</span> </td>
<td id="footer-pattern-logo-desktop-container" valign="top" style="padding: 0; border-collapse: collapse; padding-left: 20px; vertical-align: top">
<table style="border-collapse: collapse; mso-table-lspace: 0; mso-table-rspace: 0">
<tbody>
<tr>
<td id="footer-pattern-logo-desktop-padding" style="padding: 0; border-collapse: collapse; padding-top: 3px"> <img id="footer-pattern-logo-desktop" src="cid:jira-generated-image-static-footer-desktop-logo-e77af9ad-cc85-45bd-a0b4-d8a2d35d93dd" alt="Atlassian logo" title="Atlassian logo" width="169" height="36" class="image_fix"> </td>
</tr>
</tbody>
</table> </td>
</tr>
</tbody>
</table> </td>
</tr>
</tbody>
</table>
</body>
</html>