Re: [hibernate-dev] [infinispan-dev] Continuous Query Caching
by Manik Surtani
On 29 Sep 2009, at 10:19, Mircea Markus wrote:
>
> On Sep 29, 2009, at 12:08 PM, Manik Surtani wrote:
>
>>
>> On 29 Sep 2009, at 09:57, Mircea Markus wrote:
>>
>>> Hi,
>>>
>>> Again, this is a feature from Coherence[1].
>>>
>>> Basic idea is to execute a query against the cache, and hold the
>>> result object. This result object will always have up to date
>>> query result; this means that whenever something is modified in
>>> the cache the result itself is updated. Advantage: if one performs
>>> the same query very often (e.g. several times every millisecond)
>>> the response will be fast and the system will not be overloaded.
>>
>> Is it really faster? Surely all you save is the construction of
>> the various query objects, but the query itself would have to be re-
>> run every time. Or does it attach a listener to the cache and
>> check whether any new additions/removals should be used to update
>> the result set?
> this is the way it works. It is a sort of a near-cache, just that
> instead of being invalidated it is updated whenever the cache is
> updated. The documentation also suggests that they are using
> listeners.
>> I don't see how that could be much faster though.
> I think it might be if you are running *the same query* tons of
> times. Basically you don't do a map-reduce on all the nodes, but
> rather on every insertion (especially if the number of insertions is
> relatively small compared to the number of times the same query is
> run) you update (if necessary) the cached query result.
Hmm. It would be pretty use-case-specific. It's hard to see how this
_generally_ performs better, since you need to make sure you are aware
of all changes happening all over the cluster to keep this result set
up to date (REPL-style scalability bottleneck!)
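To make the trade-off concrete, here is a minimal single-JVM sketch of the listener-maintained result set discussed above. All names (ObservableCache, EntryListener, ContinuousResult) are hypothetical and are neither the Coherence nor the Infinispan API; in a cluster, every node holding such a result would have to receive every change event, which is the scalability concern raised above.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.BiPredicate;

// Hypothetical sketch: a query result kept up to date by cache listeners.
class ContinuousQuerySketch {

    /** Hypothetical callback interface, not a real Infinispan/Coherence type. */
    interface EntryListener<K, V> {
        void onPut(K key, V value);
        void onRemove(K key);
    }

    /** Toy local cache that notifies listeners after every change. */
    static class ObservableCache<K, V> {
        private final Map<K, V> data = new ConcurrentHashMap<>();
        private final List<EntryListener<K, V>> listeners = new CopyOnWriteArrayList<>();

        void addListener(EntryListener<K, V> listener) { listeners.add(listener); }

        void put(K key, V value) {
            data.put(key, value);
            for (EntryListener<K, V> l : listeners) l.onPut(key, value);
        }

        void remove(K key) {
            if (data.remove(key) != null) {
                for (EntryListener<K, V> l : listeners) l.onRemove(key);
            }
        }

        Map<K, V> snapshot() { return Collections.unmodifiableMap(data); }
    }

    /** Runs the filter once over the cache, then applies it only to changed entries. */
    static class ContinuousResult<K, V> implements EntryListener<K, V> {
        private final BiPredicate<K, V> filter;
        private final Map<K, V> matches = new ConcurrentHashMap<>();

        ContinuousResult(ObservableCache<K, V> cache, BiPredicate<K, V> filter) {
            this.filter = filter;
            cache.addListener(this);               // subscribe to future changes
            cache.snapshot().forEach(this::onPut); // initial full evaluation
        }

        @Override
        public void onPut(K key, V value) {
            if (filter.test(key, value)) {
                matches.put(key, value);
            } else {
                matches.remove(key); // an updated entry may stop matching
            }
        }

        @Override
        public void onRemove(K key) { matches.remove(key); }

        /** Reading the result costs no query execution at all. */
        Map<K, V> entries() { return Collections.unmodifiableMap(matches); }
    }

    public static void main(String[] args) {
        ObservableCache<String, Integer> cache = new ObservableCache<>();
        cache.put("t1", 5);
        ContinuousResult<String, Integer> large = new ContinuousResult<>(cache, (k, v) -> v > 10);
        cache.put("t2", 50);
        System.out.println(large.entries());
    }
}
```

Reads become map lookups, but each write now costs one filter evaluation per registered result, so the approach only pays off when reads of the same query far outnumber writes.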
Cheers
--
Manik Surtani
manik(a)jboss.org
Lead, Infinispan
Lead, JBoss Cache
http://www.infinispan.org
http://www.jbosscache.org
15 years, 2 months
Re: [hibernate-dev] [infinispan-dev] Continuous Query Caching
by Manik Surtani
On 29 Sep 2009, at 09:57, Mircea Markus wrote:
> Hi,
>
> Again, this is a feature from Coherence[1].
>
> Basic idea is to execute a query against the cache, and hold the
> result object. This result object will always have up to date query
> result; this means that whenever something is modified in the cache
> the result itself is updated. Advantage: if one performs the same
> query very often (e.g. several times every millisecond) the response
> will be fast and the system will not be overloaded.
Is it really faster? Surely all you save is the construction of the
various query objects, but the query itself would have to be re-run
every time. Or does it attach a listener to the cache and check
whether any new additions/removals should be used to update the result
set? I don't see how that could be much faster though.
Adding Hibernate-dev in cc so that the Hibernate Search guys can
comment too.
> E.g.
> Filter filter = new AndFilter(new EqualsFilter("getTrader", traderid),
>                               new EqualsFilter("getStatus", Status.OPEN));
> ContinuousQueryCache cacheOpenTrades = new ContinuousQueryCache(cache, filter);
>
> Iterator iter = cacheOpenTrades.entrySet().iterator(); // *this entrySet call will be instant!*
>
> For a full list of scenarios in which this can be used, take a look at
> [1].
> Shall we consider adding something similar?
>
> Cheers,
> Mircea
>
>
> [1] http://download.oracle.com/docs/cd/E14526_01/coh.350/e14509/continuousque...
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev(a)lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev
--
Manik Surtani
manik(a)jboss.org
Lead, Infinispan
Lead, JBoss Cache
http://www.infinispan.org
http://www.jbosscache.org
15 years, 2 months
[HSearch] Query DSL step 1
by Emmanuel Bernard
There are several steps to the query DSL:
1. implement the initial ideas and see what problems we face and how
well that fits
2. add analyzers into the mix to transparently use the right one
3. add parameters that use the conversion bridge (not sure how well
that could fly, but an interesting idea)
4. build up the stack of operators integrated into the DSL
5. string based QL using this API (not convinced yet but why not).
Navin will start working on 1 and, if things go well, 2 (we will already
have a fantastic tool if we do just that).
Here are my notes based on the initial idea + the feedback received.
A few remarks:
- it asks the analyzer so that we correctly apply the analyzer on
terms
- it has a few query factory methods
- it contains a few orthogonal operations
- I am not quite satisfied with how boolean is handled, any idea?
Design remarks:
- should we use interfaces or plain implementations? I would start
with plain implementations to make things easier
- let's put it in org.hibernate.search.query.dsl for now
Examples
SealedQueryBuilder qb = searchFactory.withEntityAnalyzer(Address.class);
Query luceneQuery = qb.must()
    .add(
        qb.should()
            .add( qb.term("city", "Atlanta").boostedTo(4).createQuery() )
            .add( qb.term("address1", "Peachtree").fuzzy().createQuery() )
    )
    .add(
        qb.range("movingDate").from("200604").to("201201").exclusive().createQuery()
    )
    .createQuery();
Analyzer choice
queryBuilder.withAnalyzer(Analyzer)
queryBuilder.withEntityAnalyzer(Class<?>)
queryBuilder.basedOnEntityAnalyzer(Class<?>)
.overridesForField(String field, Analyzer)
.overridesForField(String field, Analyzer)
.build() //sucky name
returns a SealedQueryBuilder //sucky name
SealedQueryBuilder contains the factory methods
Factory methods
Hosted on SealedQueryBuilder
//Alternative
.term().on(String field).matches(String text)
.on(String field).matches(String text)
.term(String field, String text) //define a new query
.term(String field, String text) //define a new query
.ignoreAnalyzer() //ignore the analyzer, optional
.fuzzy() //API prevent wildcard calls, optional
.threshold() //optional
.prefixLength() //optional
.term(String field, String value)
.wildcard() //API prevent fuzzy calls, optional
//range query
.rangeQuery(String field)
.from(String text)
.to(String text)
.exclusive() //optional
.constantScore() //optional, due to constantScoreRangeQuery but in practice inherited from the common operations
//match all docs
.all()
//phrase query
.phrase(String field)
.ignoreAnalyzer() //ignore the analyzer, optional
.addWord(String text) //at least one
.addWord(String text)
.sentence(String text) //do we need that?
.slop() //optional
//search multiple fields for same value
.searchInMultipleFields()
.onField(String field)
.boostedTo(float) //optional
.ignoreAnalyzer() //optional
.onField(String field)
.forWords(String) //do we need that?
.forWord(String)
Boolean operations
SealedQueryBuilder contains the boolean methods
.must()
.add( qb.from().to() )
.add( ... )
.must().not()
.should()
Works on all queries
.boostedTo()
.constantScore()
.filter(Filter) //filter the current query
.scoreMultipliedByField(field) //FieldScoreQuery + FunctionQuery?? //Not backed
.createQuery()
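
As a rough illustration of how the chained style and the "API prevents wildcard calls after fuzzy()" idea could fit together, here is a small stand-alone sketch. It is not the proposed Hibernate Search implementation: MiniQueryDsl and all of its classes are hypothetical, and instead of building Lucene Query objects it renders a string in Lucene's classic query-parser syntax so that it runs without any dependency.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a fluent query builder; renders query-parser syntax.
final class MiniQueryDsl {

    /** Anything that can render itself as a query string. */
    interface Clause { String render(); }

    static TermStep term(String field, String text) { return new TermStep(field, text); }
    static RangeStep range(String field) { return new RangeStep(field); }
    static BoolStep must() { return new BoolStep("+"); }   // every clause required
    static BoolStep should() { return new BoolStep(""); }  // clauses optional

    static final class TermStep implements Clause {
        final String field;
        final String text;
        Float boost;

        TermStep(String field, String text) { this.field = field; this.text = text; }

        TermStep boostedTo(float value) { this.boost = value; return this; }

        // Returning a narrower type means wildcard-style calls are simply
        // not available once fuzzy() has been chosen.
        FuzzyStep fuzzy() { return new FuzzyStep(this); }

        String boostSuffix() { return boost == null ? "" : "^" + boost; }

        @Override
        public String render() { return field + ":" + text + boostSuffix(); }
    }

    static final class FuzzyStep implements Clause {
        private final TermStep term;

        FuzzyStep(TermStep term) { this.term = term; }

        @Override
        public String render() { return term.field + ":" + term.text + "~" + term.boostSuffix(); }
    }

    static final class RangeStep implements Clause {
        private final String field;
        private String from;
        private String to;
        private boolean exclusive;

        RangeStep(String field) { this.field = field; }

        RangeStep from(String value) { this.from = value; return this; }
        RangeStep to(String value) { this.to = value; return this; }
        RangeStep exclusive() { this.exclusive = true; return this; }

        @Override
        public String render() {
            return field + ":" + (exclusive ? "{" : "[") + from + " TO " + to + (exclusive ? "}" : "]");
        }
    }

    static final class BoolStep implements Clause {
        private final String prefix;
        private final List<Clause> clauses = new ArrayList<>();

        BoolStep(String prefix) { this.prefix = prefix; }

        BoolStep add(Clause clause) { clauses.add(clause); return this; }

        @Override
        public String render() {
            StringBuilder sb = new StringBuilder();
            for (Clause c : clauses) {
                if (sb.length() > 0) sb.append(' ');
                String s = c.render();
                sb.append(prefix).append(c instanceof BoolStep ? "(" + s + ")" : s);
            }
            return sb.toString();
        }
    }

    public static void main(String[] args) {
        String q = must()
            .add(should()
                .add(term("city", "Atlanta").boostedTo(4))
                .add(term("address1", "Peachtree").fuzzy()))
            .add(range("movingDate").from("200604").to("201201").exclusive())
            .render();
        System.out.println(q);
    }
}
```

The sketch leaves out analyzers entirely; in the proposal above, the SealedQueryBuilder would pass each term through the chosen analyzer before the query is built.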
Todo
Span*Queries
MultiPhraseQuery - needs to fill up all accepted terms
FieldScoreQuery
ValueSourceQuery
FuzzyLikeThis
MoreLikeThis
15 years, 2 months
Re: [hibernate-dev] [infinispan-dev] [HSearch] DSL for Lucene queries (was: Re: Query module new
by Manik Surtani
On 27 Sep 2009, at 11:26, Sanne Grinovero wrote:
> The next version of Lucene will provide helpers and tools to make it easy
> to create your own QueryParser, so that everyone can make his own
> based on business-specific needs
> (Everybody can already, but it is not as easy).
> So I would avoid reinventing that, and focus on a good API.
>
> To make this API very cool IMHO it should integrate with Hibernate
> Search to exploit all knowledge about object mapping, declarative
> Analyzer and Filter definitions, and so on...
> We concluded in another thread that to make best use of this
> information we need to give the type of what you're searching for as a
> parameter, so that from the specific
> mapping the analyzers, fields and fieldbridges can be matched.
> Should be fine, it means we can provide typesafe results?
Are you suggesting that this is made to be specific to Hibernate
Search, rather than more generic, for Lucene? As far as Infinispan is
concerned I couldn't care either way since we just depend on HS.
Cheers
--
Manik Surtani
manik(a)jboss.org
Lead, Infinispan
Lead, JBoss Cache
http://www.infinispan.org
http://www.jbosscache.org
15 years, 2 months
Re: [hibernate-dev] [infinispan-dev] Feedback on Infinispan patch
by Łukasz Moreń
You can try to increase the TURNS_NUM (I've tried with 1000) and THREADS_NUM
(200) fields in InfinispanDirectoryTest to make it more probable. The same
problem also appears in InfinispanDirectoryProviderTest.
An example stacktrace is:
21:22:44,441 ERROR InfinispanDirectoryTest:142 - Error
java.io.IOException: File [ segments_nl ] for index [ indexName ] was not found
    at org.hibernate.search.store.infinispan.InfinispanIndexIO$InfinispanIndexInput.<init>(InfinispanIndexIO.java:79)
    at org.hibernate.search.store.infinispan.InfinispanDirectory.openInput(InfinispanDirectory.java:201)
    at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:214)
    at org.apache.lucene.index.DirectoryIndexReader$1.doBody(DirectoryIndexReader.java:95)
    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:653)
    at org.apache.lucene.index.DirectoryIndexReader.open(DirectoryIndexReader.java:115)
    at org.apache.lucene.index.IndexReader.open(IndexReader.java:316)
    at org.apache.lucene.index.IndexReader.open(IndexReader.java:227)
    at org.apache.lucene.search.IndexSearcher.<init>(IndexSearcher.java:55)
    at org.hibernate.search.test.directoryProvider.infinispan.CacheTestSupport.doReadOperation(CacheTestSupport.java:106)
    at org.hibernate.search.test.directoryProvider.infinispan.InfinispanDirectoryTest$InfinispanDirectoryThread.run(InfinispanDirectoryTest.java:130)
Cheers,
Lukasz
2009/9/27 Sanne Grinovero <sanne.grinovero(a)gmail.com>
> Hi Łukasz,
> I'm unable to reproduce the problem, you said it happens randomly:
> I've tried several times
> and I'm not getting errors. Do you know something I could do to make it
> happen?
> Could you share a stacktrace?
>
> Anyway if you are confident it's about the segments getting lost when
> they are still being read,
> you could introduce a per-segment counter of usage; like it starts at
> value 1 to mark the segment
> as "most current", gets a +1 vote at each reader opening it, -1
> closing, and -1 deleting.
> Each decrement method should check for the value reaching 0 to really
> delete it,
> and this counting method would be easy to add inside the Directory.
> When opening a new indexReader, you
> 1) get the SegmentsInfo
> 2) increment all counters (eager-lock, verify>0 or retry : set changed
> counters back and get a new SegmentsInfo-->1)
> 3) get the needed segments
>
> Getting a counter should be much faster than getting a segment in case
> the data is downloaded
> from another node, so we can use a different key while still relating
> to the segment.
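
One possible reading of the per-segment usage counter described in the quoted message, as a single-JVM sketch. The class and method names (SegmentUsageCounters, acquire, release, markDeleted) are hypothetical; in the real directory the counters would presumably live in Infinispan under their own keys and be updated under an eager lock, which this sketch replaces with compare-and-set.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: a segment is physically removed only when its usage
// count drops to zero, so a reader that acquired it never loses it mid-read.
final class SegmentUsageCounters {
    private final Map<String, AtomicInteger> counters = new ConcurrentHashMap<>();
    private final Map<String, byte[]> segments = new ConcurrentHashMap<>();

    /** A new segment starts at 1: the vote that marks it as "most current". */
    void create(String name, byte[] data) {
        segments.put(name, data);
        counters.put(name, new AtomicInteger(1));
    }

    /**
     * Reader open: +1, but only while the counter is still above zero.
     * Returns false if the segment is already gone; the caller should then
     * release what it holds, fetch a fresh segment list and retry.
     */
    boolean acquire(String name) {
        AtomicInteger counter = counters.get(name);
        if (counter == null) return false;
        while (true) {
            int current = counter.get();
            if (current <= 0) return false;
            if (counter.compareAndSet(current, current + 1)) return true;
        }
    }

    /** Reader close: -1. */
    void release(String name) { decrement(name); }

    /** Writer delete: -1, dropping the initial "most current" vote. */
    void markDeleted(String name) { decrement(name); }

    private void decrement(String name) {
        AtomicInteger counter = counters.get(name);
        if (counter != null && counter.decrementAndGet() == 0) {
            counters.remove(name);
            segments.remove(name); // the real deletion happens only here
        }
    }

    boolean exists(String name) { return segments.containsKey(name); }
}
```

With this scheme a writer that deletes a segment while a reader still holds it only drops the count from 2 to 1; the data disappears when the reader closes.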
>
> Sanne
>
> 2009/9/23 Łukasz Moreń <lukasz.moren(a)gmail.com>:
> > I agree that the Infinispan case is not much different from RamDirectory. The
> > major difference is that in RD (also FileDirectory) changes are not batched
> > like in ID. If I do not wrap changes in InfinispanDirectory (simply remove
> > tx.begin() from the obtain() method and tx.commit() from release() in
> > InfinispanLock), and immediately commit every change made by IW, it works
> > well. However, it makes indexing much slower, because of frequent
> > replication to other nodes.
> > Sanne it's good remark that IW commit is kind of flush.
> >
> > I've attached a patch with InfinispanDirectory; the failing test is
> > testDirectoryWithMultipleThreads in the InfinispanDirectoryTest class. It
> > fails randomly. I think the problem is that the Infinispan commit on
> > lockRelease() in org.apache.lucene.index.IndexWriter (line 1658) comes after
> > the IW commit() (line 1654).
> >
> >> Is it because, the IndexWriter only clean files if no indexReaders are
> >> reading them (how would that be detected)?
> >
> > It can happen if the IndexWriter cleans a file and an IndexReader tries to
> > access that cleaned file.
> >
> > 2009/9/23 Sanne Grinovero <sanne.grinovero(a)gmail.com>
> >>
> >> I agree It should work the same way; The IndexWriter cleans files
> >> whenever it likes to, it doesn't try to detect readers, and this
> >> shouldn't have any effect on the working of readers.
> >> The IndexReader opens the "SegmentsInfo" first, and immediately
> >> after** gets a reference to the segments listed in this SegmentsInfo.
> >> No IndexWriter will ever change an existing segment, only add new
> >> files or eventually delete old ones (segments merge,optimize).
> >> The deletion of segments is the interesting subject: when using Files
> >> it uses "delete at last close", which works because the IR needing it
> >> have it opened already**; when using the RAMDirectory they have a
> >> reference preventing garbage collection.
> >>
> >> ( the two "**" are assuming the same event occurred correctly,
> >> otherwise an exception is thrown at opening)
> >>
> >> When using Infinispan it shouldn't be much different than the
> >> RAMDirectory? so even if the needed segment is deleted, the IR holds a
> >> reference to the Java object locally since it was opened.
> >>
> >> Łukasz, do you have some failing test?
> >>
> >> Sanne
> >>
> >> 2009/9/23 Emmanuel Bernard <emmanuel(a)hibernate.org>:
> >> > Conceptually I don't understand why it does work in a pure file system
> >> > directory (i.e. the IndexReader can go and process queries while the
> >> > IndexWriter goes about its business) and not when using Infinispan.
> >> > Is it because the IndexWriter only cleans files if no IndexReaders are
> >> > reading them (how would that be detected)?
> >> > On 22 Sep 09, at 20:46, Łukasz Moreń wrote:
> >> >
> >> > I need to provide the same lifecycle for the IndexWriter as for the
> >> > Infinispan tx: when the IW is created the tx is started, when the IW is
> >> > committed the tx is committed. This ensures that the IndexReader doesn't
> >> > read old data from the directory.
> >> > The Infinispan transaction can be started when the IW acquires the lock,
> >> > but committing it on IW lock release, as is done so far, causes a problem:
> >> >
> >> > index writer close {
> >> >     index writer commit(); //changes are visible for IndexReaders
> >> >
> >> >     //Index reader starts reading here, i.e. tries to access file "A"
> >> >
> >> >     index writer lockRelease(); //changes in Infinispan directory are
> >> >         //committed, file "A" was removed, IndexReader cannot find it and crashes
> >> > }
> >> >
> >> > I think the Infinispan tx has to be committed just before the IW commit,
> >> > and the problem is where to put that in the code.
> >> >
> >> > On 22 September 2009 at 18:24, Emmanuel Bernard
> >> > <emmanuel(a)hibernate.org> wrote:
> >> >>
> >> >> Can you explain in more detail what is going on?
> >> >> Aside from that, Workspace has been Sanne's baby lately so he will be
> >> >> the best to see what design will work in HSearch. That being said, I
> >> >> don't like the idea of subclassing / overriding very much. In my
> >> >> experience, it has led to more bad and unmaintainable code than
> >> >> anything else.
> >> >> On 22 Sep 09, at 02:16, Łukasz Moreń wrote:
> >> >>
> >> >> Hi,
> >> >>
> >> >> Thanks for the explanation.
> >> >> Maybe it is better that I concentrate on the first release and postpone
> >> >> distributed writing.
> >> >>
> >> >> There is already a LockStrategy that uses Infinispan. Using it, I was
> >> >> wrapping the changes made by the IndexWriter in an Infinispan
> >> >> transaction for performance reasons:
> >> >> on lock obtain the transaction was started, on lock release the
> >> >> transaction was committed. However, committing the Ispn transaction on
> >> >> lock release is not a good idea, since the IndexWriter calls index
> >> >> commit before the lock is released (and the Ispn transaction is
> >> >> committed).
> >> >> I was thinking of overriding the Workspace class and its getIndexWriter
> >> >> (start Infinispan tx) and commitIndexWriter (commit tx) methods to wrap
> >> >> the IndexWriter lifecycle, but this needs a few other changes. Any
> >> >> other ideas?
> >> >>
> >> >> Cheers,
> >> >> Lukasz
> >> >>
> >> >> 2009/9/21 Sanne Grinovero <sanne.grinovero(a)gmail.com>
> >> >>>
> >> >>> Hi Łukasz,
> >> >>> you have rightful concerns: the way the IndexWriter tries to
> >> >>> acquire the lock will bring some trouble. As far as I remember we
> >> >>> decided to avoid multiple writer nodes in this first release for
> >> >>> these reasons
> >> >>> (that's written in your docs?)
> >> >>>
> >> >>> Actually it shouldn't be very hard to do, as the LockStrategy is
> >> >>> pluggable (see changes from HSEARCH-345)
> >> >>> and you could implement one delegating to an Infinispan eager lock
> on
> >> >>> some key,
> >> >>> like the default LockStrategy takes a file lock in the index
> >> >>> directory.
> >> >>>
> >> >>> Maybe it's simpler to support this distributed writing instead of
> >> >>> sending the queue to some single
> >> >>> (elected) node? Would be cool, as the Document Analysis effort would
> >> >>> be distributed,
> >> >>> but I have no idea if this would be more or less efficient than a
> >> >>> single node writing; it could
> >> >>> bring some huge data transfers along the wire during segments
> merging
> >> >>> (basically fetching
> >> >>> the whole index data at each node performing a segment merge); maybe
> >> >>> you'll need to
> >> >>> play with IndexWriter settings (
> >> >>>
> >> >>>
> >> >>>
> http://docs.jboss.org/hibernate/stable/search/reference/en/html_single/#l...
> >> >>> )
> >> >>> probably need to find the sweet spot for "merge_factor".
> >> >>> I just saw now that MergePolicy is now re-implementable, but I hope
> >> >>> that won't be needed.
> >> >>>
> >> >>> Sanne
> >> >>>
> >> >>> 2009/9/21 Łukasz Moreń <lukasz.moren(a)gmail.com>:
> >> >>> > Hi,
> >> >>> >
> >> >>> > I'm wondering if it is reasonable to have multiple threads/nodes
> >> >>> > that modify indexes in a Lucene Directory based on Infinispan?
> >> >>> > Let's assume that two nodes try to update the index at the same
> >> >>> > time. The first one creates an IndexWriter and obtains the
> >> >>> > write lock. There is a high probability that the second node throws
> >> >>> > LockObtainFailedException (as only one IndexWriter is allowed on a
> >> >>> > single index) and the index is not modified. How is that? Should
> >> >>> > there always be only one node that makes changes in
> >> >>> > the index?
> >> >>> >
> >> >>> > Cheers,
> >> >>> > Lukasz
> >> >>> >
> >> >>> > On 15 September 2009 at 01:39, Łukasz Moreń
> >> >>> > <lukasz.moren(a)gmail.com> wrote:
> >> >>> >>
> >> >>> >> Hi,
> >> >>> >>
> >> >>> >> Using JMeter, I wanted to check that the Infinispan dir does not
> >> >>> >> crash under heavy load in "real" use and to compare performance
> >> >>> >> with none/other directories.
> >> >>> >> However, a problem appeared when multiple IndexWriters try to
> >> >>> >> modify the index (test InfinispanDirectoryTest): random deadlocks
> >> >>> >> and Lucene exceptions.
> >> >>> >> The IndexWriter tries to access files in the index that were
> >> >>> >> removed before. I'm looking into it, but don't have a good idea
> >> >>> >> yet.
> >> >>> >>
> >> >>> >> Concerning the last part, I think a similar thing is done in
> >> >>> >> InfinispanDirectoryProviderTest. Many threads are making changes
> >> >>> >> and searching (not checking if the db is in sync with the index).
> >> >>> >> When the threads finish their work, I check with a Lucene query
> >> >>> >> whether the index contains as many results as expected. Maybe you
> >> >>> >> meant something else?
> >> >>> >> It would be good to run each node in a different VM.
> >> >>> >>
> >> >>> >>> Great ! Looking forward to it. What state are things in at the
> >> >>> >>> moment
> >> >>> >>> if I want to play around with it ?
> >> >>> >>
> >> >>> >> It should work with one master (updates the index) and one or
> >> >>> >> many slave nodes (send changes to the master). I tried with one
> >> >>> >> master and one slave (both with the jms and jgroups backends) and
> >> >>> >> it worked ok. It still fails if multiple nodes want
> >> >>> >> to modify the index.
> >> >>> >>
> >> >>> >> I've attached patch with current version.
> >> >>> >>
> >> >>> >> Cheers,
> >> >>> >> Łukasz
> >> >>> >>
> >> >>> >> 2009/9/13 Michael Neale <michael.neale(a)gmail.com>
> >> >>> >>>
> >> >>> >>> Great ! Looking forward to it. What state are things in at the
> >> >>> >>> moment
> >> >>> >>> if I want to play around with it ?
> >> >>> >>>
> >> >>> >>> Sent from my phone.
> >> >>> >>>
> >> >>> >>> On 13/09/2009, at 7:26 PM, Sanne Grinovero
> >> >>> >>> <sanne.grinovero(a)gmail.com>
> >> >>> >>> wrote:
> >> >>> >>>
> >> >>> >>> > 2009/9/12 Michael Neale <michael.neale(a)gmail.com>:
> >> >>> >>> >> That does sounds pretty cool. Would be nice if the lucene
> >> >>> >>> >> indexes
> >> >>> >>> >> could scale along with how people will want to use
> infinispan.
> >> >>> >>> >> Probably worth playing with.
> >> >>> >>> >
> >> >>> >>> > Sure, this is the goal of Łukasz's work; We know compass has
> >> >>> >>> > some good Directories, but we're building our own as one based
> >> >>> >>> > on Infinispan is not yet available.
> >> >>> >>> >
> >> >>> >>> >>
> >> >>> >>> >> Sent from my phone.
> >> >>> >>> >>
> >> >>> >>> >> On 13/09/2009, at 8:37 AM, Jeff Ramsdale
> >> >>> >>> >> <jeff.ramsdale(a)gmail.com>
> >> >>> >>> >> wrote:
> >> >>> >>> >>
> >> >>> >>> >>> I'm afraid I haven't followed the Infinispan-Lucene
> >> >>> >>> >>> implementation closely, but have you looked at the Compass
> >> >>> >>> >>> Project? (http://www.compass-project.org/overview.html)
> >> >>> >>> >>> It provides a simplified interface to Lucene (optional) as
> >> >>> >>> >>> well as Directory implementations built on Terracotta,
> >> >>> >>> >>> Gigaspaces and Coherence. The latter, in particular, might
> >> >>> >>> >>> be a useful guide for the Infinispan implementation. I
> >> >>> >>> >>> believe it's mature enough to have solved many of the most
> >> >>> >>> >>> difficult problems of implementing Directory on a
> >> >>> >>> >>> distributed Map.
> >> >>> >>> >>>
> >> >>> >>> >>> If someone has any experience with Compass (particularly
> >> >>> >>> >>> its Directory implementations) I'd be interested in hearing
> >> >>> >>> >>> about it...
> >> >>> >>> >>> It's Apache 2.0 licensed, btw.
> >> >>> >>> >>>
> >> >>> >>> >>> -jeff
> >> >>> >
> >> >>> >
> >> >>
> >> >>
> >> >
> >> >
> >> >
> >
> >
>
15 years, 3 months
[infinispan-dev] [HSearch] DSL for Lucene queries (was: Re: Query module new
by Navin Surtani
In case this email didn't go to the full dev-list. I got it as a
separate thread so forwarding on.
Begin forwarded message:
> From: johng.sst(a)gmail.com
> Date: 25 September 2009 16:00:53 BST
> To: Navin Surtani <nsurtani(a)redhat.com>
> Subject: Re: Re: [hibernate-dev] [infinispan-dev] [HSearch] DSL for
> Lucene queries (was: Re: Query module new
>
> All,
>
> I think Hardy's original push back came from the first pass' use of
> the decorator pattern to try to come up with a DSL. That really
> isn't much better than knowing the API. The alternate is to come up
> with a more natural language implementation but that leads to
> parsers, lexers, etc... I'm not saying it's not worth while but it
> may be a lot of work.
>
> John Griffin
>
> On Sep 25, 2009 8:12am, Navin Surtani <nsurtani(a)redhat.com> wrote:
> > Just wanted to get this topic re-started again.
> >
> > Essentially what I think this project/DSL/module/thingy-bob is thought
> > to become: -
> >
> > A simple package where a user can build Lucene queries without having
> > to know too much about Lucene itself. If I'm headed down the wrong
> > thought path then just thwack me.
> >
> > On 26 Aug 2009, at 21:08, Hardy Ferentschik wrote:
> >
> > > On Wed, 2009-08-26 at 13:39 +0200, Emmanuel Bernard wrote:
> > >> I've been thinking about a DSL to build Lucene queries in the last
> > >> day.
> > >> What do you think of this proposal?
> > >
> > > What do you really gain compared to native Lucene queries?
> >
> > What's gained I believe is the fact that people can build complex
> > lucene queries easier. Currently, it's a bit clunky imo so if we
> > provide a cleaner way to build them it can prove beneficial to any
> > lucene user (myself included for querying on Infinispan).
> >
> > Any other thoughts?
> >
> > > If your API achieves exactly the same as what's possible with Lucene
> > > it is just a 'useless' wrapper.
> > >
> > > A wrapper around native Lucene queries would make sense if it could
> > > somehow use some of the Hibernate Search specific meta data. As an
> > > extreme example one could generate some meta classes a la JPA2. This
> > > way one could ensure that you can get help with which field names are
> > > available.
> > >
> > > --Hardy
> > >
> >
> > Navin Surtani
> > Intern Infinispan
> > Intern JBoss Cache Searchable
> >
> > _______________________________________________
> > hibernate-dev mailing list
> > hibernate-dev(a)lists.jboss.org
> > https://lists.jboss.org/mailman/listinfo/hibernate-dev
> >
Navin Surtani
Intern Infinispan
Intern JBoss Cache Searchable
15 years, 3 months
[HSearch] DSL for Lucene queries (was: Re: [infinispan-dev] Query module new API and configurations)
by Emmanuel Bernard
I've been thinking about a DSL to build Lucene queries in the last day.
What do you think of this proposal?
A few remarks:
- it asks the analyzer so that we correctly apply the analyzer on
terms
- it has a few query factory methods
- it contains a few orthogonal operations
- I am not quite satisfied with how boolean is handled, any idea?
Examples
SealedQueryBuilder qb = searchFactory.withEntityAnalyzer(Address.class);
Query luceneQuery =
    qb.must(Occurs.MUST)
        .add(
            qb.boolean(Occurs.Should)
                .add( qb.term("city", "Atlanta").boostedTo(4).createQuery() )
                .add( qb.term("address1", "Peachtree").fuzzy().createQuery() )
        )
        .add(
            qb.from("movingDate", "200604").to("201201").exclusive().createQuery()
        )
        .createQuery();
Analyzer choice
queryBuilder.withAnalyzer(Analyzer)
queryBuilder.withEntityAnalyzer(Class<?>)
queryBuilder.basedOnEntityAnalyzer(Class<?>)
.overridesForField(String field, Analyzer)
.overridesForField(String field, Analyzer)
.build() //sucky name
returns a SealedQueryBuilder //sucky name
SealedQueryBuilder contains the factory methods
Factory methods
Hosted on SealedQueryBuilder
.term(String field, String text) //define a new query
.term(String field, String text) //define a new query
.ignoreAnalyzer() //ignore the analyzer, optional
.fuzzy() //API prevent wildcard calls, optional
.threshold() //optional
.prefixLength() //optional
.term(String field, String value)
.wildcard() //API prevent fuzzy calls, optional
//range query
.from(String field, String text)
.exclusive() //optional
.to(String text)
.exclusive() //optional
.constantScore() //optional, due to constantScoreRangeQuery but in practice inherited from the common operations
//match all docs
.all()
//phrase query
.phrase(String field)
.ignoreAnalyzer() //ignore the analyzer, optional
.addWord(String text) //at least one
.addWord(String text)
.sentence(String text) //do we need that?
.slop() //optional
//search multiple fields for same value
.searchInMultipleFields()
.onField(String field)
.boostedTo(float) //optional
.ignoreAnalyzer() //optional
.onField(String field)
.forWords(String) //do we need that?
.forWord(String)
Boolean operations
SealedQueryBuilder contains the boolean methods
.boolean(Occurs occurs)
.add( qb.from().to() )
.add( ... )
Works on all queries
.boostedTo()
.constantScore()
.filter(Filter) //filter the current query
.scoreMultipliedByField(field) //FieldScoreQuery + FunctionQuery?? //Not backed
.createQuery()
Todo
Span*Queries
MultiPhraseQuery - needs to fill up all accepted terms
FieldScoreQuery
ValueSourceQuery
FuzzyLikeThis
MoreLikeThis
On 25 Aug 09, at 16:43, Manik Surtani wrote:
>
> On 25 Aug 2009, at 13:34, Emmanuel Bernard wrote:
>
>>
>> On 25 Aug 09, at 14:27, Manik Surtani wrote:
>>
>>> A DSL would work, but I'd rather not define our own language here.
>>> Which is why I asked for a standard. Perhaps something based on
>>> SQL/
>>> JPA-QL? Or are you thinking DSL specific to Lucene - which could
>>> be used by any/all of {Lucene, Hibernate Search, Infinispan}? In
>>> which case the DSL should ideally be a Lucene project.
>>
>> Yes I was thinking about a DSL used for Hibernate Search and maybe
>> all
>> of Lucene if the HS integration benefits offer no value towards
>> simplicity (but I think i can offer value).
>
>
> Ok, this should be interesting. Let's chat about this some more - have
> you drafted any thoughts around this DSL somewhere?
15 years, 3 months