[OGM] Shutdown embedded MongoDB instance
by Guillaume SCHEIBEL
Hi guys,
I'm working on OGM-303. After running the test suite, when the embedded
MongoDB instance is shut down, I get this:
*WARNING: sendShutdown /127.0.0.1:27018*
*java.net.SocketException: Connection reset*
Is this normal behavior?
Guillaume
10 years, 10 months
Java 8 is coming - testing time is running out!
by Sanne Grinovero
We promised the OpenJDK team that we would test things: I would highly
appreciate a little love being directed at:
http://ci.hibernate.org/view/JDK8/
The JDK version installed on CI is the final candidate release,
released on 7th of February 2014.
As previously discussed, the Animal Sniffer plugin is not compatible
so it needs to be disabled. All other failures need a bit of
investigation!
Cheers,
Sanne
10 years, 10 months
[Search] DisjunctionMaxQuery and MoreLikeThis
by Emmanuel Bernard
I have been thinking about our initial idea to use DisjunctionMaxQuery
(aka DisMax) with MoreLikeThis instead of the Boolean query we have
today.
## Definition and landscape
Given a set of subqueries under a SHOULD clause, DisMax scores a
matching document with the score of its highest-scoring subquery rather
than adding up the scores of all of them.
A concrete use case is as follows. If the query is "albino elephant",
this ensures that "albino" matching one field and "elephant" matching
another gets a higher score than "albino" matching both fields.
Each term (albinos and elephant) has a DisMax query where the subqueries
are a term query for each targeted field. Then both DisMax queries are
joined with a regular boolean query.
In pseudo HSearch query DSL it would look like:
    .bool()
        .should(
            .dismax()
                .should(
                    .keyword().onField("title").matching("Albinos")
                )
                .should(
                    .keyword().onField("description").matching("Albinos")
                )
        )
        .should(
            .dismax()
                .should(
                    .keyword().onField("title").matching("Elephant")
                )
                .should(
                    .keyword().onField("description").matching("Elephant")
                )
        )
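To make the scoring difference concrete, here is a minimal sketch in plain Java. It does not use Lucene; it only mimics the combination rule (best subquery score, plus a tie breaker times the other scores) on made-up per-field scores. The class name, method names and score values are all hypothetical, and real Lucene scoring involves additional factors that are ignored here.

```java
import java.util.Arrays;

public class DisMaxScoring {

    // Boolean SHOULD-style combination: add up the scores of all subqueries.
    public static double sumScore(double[] subScores) {
        return Arrays.stream(subScores).sum();
    }

    // DisMax-style combination: best subquery score, plus tieBreaker
    // times the scores of the remaining subqueries.
    public static double disMaxScore(double[] subScores, double tieBreaker) {
        double max = Arrays.stream(subScores).max().orElse(0.0);
        double sum = Arrays.stream(subScores).sum();
        return max + tieBreaker * (sum - max);
    }

    public static void main(String[] args) {
        // Hypothetical scores per field {title, description}; 0.0 means no match.
        // Document A: "albino" in title, "elephant" in description.
        double[] aAlbino = {1.0, 0.0};
        double[] aElephant = {0.0, 1.0};
        // Document B: "albino" in both fields, no "elephant" anywhere.
        double[] bAlbino = {1.0, 1.0};
        double[] bElephant = {0.0, 0.0};

        double aBool = sumScore(aAlbino) + sumScore(aElephant);
        double bBool = sumScore(bAlbino) + sumScore(bElephant);
        double aDisMax = disMaxScore(aAlbino, 0.0) + disMaxScore(aElephant, 0.0);
        double bDisMax = disMaxScore(bAlbino, 0.0) + disMaxScore(bElephant, 0.0);

        System.out.println("plain boolean:   A=" + aBool + " B=" + bBool);
        System.out.println("dismax per term: A=" + aDisMax + " B=" + bDisMax);
    }
}
```

With these made-up numbers the plain boolean combination ties both documents, while the per-term DisMax combination ranks document A (both words matched) above document B (one word matched twice).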
## More Like This (aka MLT)
Our more like this algorithm does the following.
- look for the term vectors of a document i
- for each field f contained in document i (or a subset)
  - find the most popular terms in the field f of document i
  - build a boolean query with the most popular terms on field f
- combine these boolean queries per field into a bigger boolean query
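The steps above can be sketched in plain Java as follows. This is only an illustration, not the Hibernate Search implementation: term vectors are modelled as simple term-to-frequency maps, "most popular" is approximated by raw frequency, and the query is rendered as a string instead of Lucene query objects. All class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PerFieldMoreLikeThis {

    // Pick the maxTerms most frequent terms of one field's term vector.
    // Ties are broken alphabetically so the result is deterministic.
    public static List<String> topTerms(Map<String, Integer> termVector, int maxTerms) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(termVector.entrySet());
        entries.sort((a, b) -> {
            int byFrequency = Integer.compare(b.getValue(), a.getValue());
            return byFrequency != 0 ? byFrequency : a.getKey().compareTo(b.getKey());
        });
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(maxTerms, entries.size()); i++) {
            result.add(entries.get(i).getKey());
        }
        return result;
    }

    // Build one group of SHOULD clauses per field, then join the groups
    // into a bigger query. Each group only contains terms of its own field.
    public static String buildQuery(Map<String, Map<String, Integer>> termVectorsByField,
                                    int maxTermsPerField) {
        List<String> fieldClauses = new ArrayList<>();
        for (Map.Entry<String, Map<String, Integer>> field : termVectorsByField.entrySet()) {
            List<String> termClauses = new ArrayList<>();
            for (String term : topTerms(field.getValue(), maxTermsPerField)) {
                termClauses.add(field.getKey() + ":" + term);
            }
            fieldClauses.add("(" + String.join(" ", termClauses) + ")");
        }
        return String.join(" ", fieldClauses);
    }

    public static void main(String[] args) {
        // Hypothetical term vectors of document i, one per field.
        Map<String, Integer> title = new LinkedHashMap<>();
        title.put("albino", 2);
        title.put("elephant", 1);
        Map<String, Integer> description = new LinkedHashMap<>();
        description.put("savanna", 3);
        description.put("elephant", 2);
        description.put("rare", 1);

        Map<String, Map<String, Integer>> document = new LinkedHashMap<>();
        document.put("title", title);
        document.put("description", description);

        System.out.println(buildQuery(document, 2));
    }
}
```

Note that the terms selected for one field are never searched in another field, which is the property discussed in the next section.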
The original Lucene more like this algorithm is a bit different in the
sense that it does not look for popular terms *per field* but rather
looks for the all-star popular terms of document i as a whole and then
builds a boolean query with these most popular terms for each field.
## More Like This and DisMax
With our MLT approach, terms between fields are not necessarily
shared. In fact they are only looked for if they belong to the field f
of document i in the first place.
I don't see how DisMax would be of any use for us as we don't have a
common set of terms that we look for across several fields. At least not
to solve the now famous albinos elephant problem.
We could use DisMax for the final top boolean query. The effect would be
that documents are scored up to the highest lookalike-factor of their
best field as opposed to the cumulative lookalike-ness of each field.
Is that desirable? It does not look like it. I would naturally use boost
factors between fields to express their respective importance but still
want to find matching documents across all fields.
Thoughts?
## DisMax and our current keyword matching
It would make some sense I think to offer DisMax for our current keyword
matching queries.
.keyword().onFields("title", "description").matching("Albinos Elephant")
In this case **and assuming the same analyzer for both fields**, we
could use DisMax to essentially do
    .bool()
        .should(
            .dismax()
                .should( keyword().onField("title").matching("Albinos") )
                .should( keyword().onField("description").matching("Albinos") )
        )
        .should(
            .dismax()
                .should( keyword().onField("title").matching("Elephant") )
                .should( keyword().onField("description").matching("Elephant") )
        )
I am not sure what we would call that effect.
- .favorMultipleKeywordMatching()
- .decreaseCrossFieldKeywordImportanceBy(90%) //this number is 1 - DisMax tieBreakMultiplier for the curious ; 100% is what I have described above
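To illustrate the relation between the percentage in the second proposal and the DisMax tie breaker, here is a small plain-Java sketch. It assumes the percentage maps to the tie breaker as 1 minus the percentage, as stated above; the class, the method names and the per-field scores are hypothetical, and nothing here is an existing Hibernate Search or Lucene API.

```java
public class CrossFieldImportance {

    // Convert "decrease cross-field importance by X%" into a DisMax
    // tie breaker, i.e. tieBreaker = 1 - X/100.
    public static double tieBreakerFor(double decreasePercent) {
        if (decreasePercent < 0 || decreasePercent > 100) {
            throw new IllegalArgumentException("percentage must be between 0 and 100");
        }
        return (100.0 - decreasePercent) / 100.0;
    }

    // Score of one keyword that matches two fields, combined DisMax-style:
    // the better field counts fully, the other one is scaled by the tie breaker.
    public static double keywordScore(double fieldScoreA, double fieldScoreB, double decreasePercent) {
        double max = Math.max(fieldScoreA, fieldScoreB);
        double min = Math.min(fieldScoreA, fieldScoreB);
        return max + tieBreakerFor(decreasePercent) * min;
    }

    public static void main(String[] args) {
        // Hypothetical case: the same keyword scores 1.0 in both title and description.
        for (double percent : new double[] {0, 90, 100}) {
            System.out.println(percent + "% -> tie breaker " + tieBreakerFor(percent)
                    + ", keyword score " + keywordScore(1.0, 1.0, percent));
        }
    }
}
```

At 100% only the best field counts (pure DisMax, as in the query above); at 0% the scores of both fields are added, which behaves like the plain boolean query.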
## DisMax as top level DSL feature
Should we add .dismax() like we did bool()?
I am hard pressed to find a use case.
Emmanuel
10 years, 10 months
WELD-1606
by Abhijit Sarkar
Hi,
Does anyone want to take a look at WELD-1606 and help me out? It seems
pretty similar to WELD-1498, a fix for which is targeted for 2.1.0.Beta2.
Regards,
Abhijit Sarkar
10 years, 10 months