[hibernate-dev] RE: HSearch: Using sharding and avoiding query on multiple shards

John Griffin jgriffin at thebluezone.net
Fri Aug 1 00:22:47 EDT 2008


Performance is ALWAYS an issue. 

But am I correct in assuming that the performance increase would be
approximately inversely proportional to the number of shards queried? 

My point is that the amount of work required to implement this should be
worth what we think the expected gain will be.

John G.
http://thediningphilosopher.blogspot.com

 
-----Original Message-----
From: hibernate-dev-bounces at lists.jboss.org
[mailto:hibernate-dev-bounces at lists.jboss.org] On Behalf Of
hibernate-dev-request at lists.jboss.org
Sent: Thursday, July 31, 2008 10:00 AM
To: hibernate-dev at lists.jboss.org
Subject: hibernate-dev Digest, Vol 25, Issue 28



Today's Topics:

   1. HSearch: Using sharding and avoiding query on multiple shards
      (Emmanuel Bernard)
   2. Re: HSearch: Using sharding and avoiding query on multiple
      shards (Sanne Grinovero)


----------------------------------------------------------------------

Message: 1
Date: Wed, 30 Jul 2008 20:36:13 -0400
From: Emmanuel Bernard <emmanuel at hibernate.org>
Subject: [hibernate-dev] HSearch: Using sharding and avoiding query on
	multiple shards
To: hibernate-dev at lists.jboss.org, Aaron Walker
	<a.walker at base2services.com>
Message-ID: <165DCC1F-BE42-49BA-8537-C8C41F834677 at hibernate.org>
Content-Type: text/plain; charset="us-ascii"

Today, in Hibernate Search, a query is applied on all shards. We use a
MultiReader to wrap them together.
In some sharding scenarios, it makes sense to apply the query on a
single shard or a subset of the shards.
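Restricting a query to one shard is only correct when documents were routed to shards by the same key at indexing time. The plain-Java sketch below illustrates that symmetry under stated assumptions: shard names stand in for DirectoryProviders, DistributorShardRouting and its methods are hypothetical names (not Hibernate Search API), and the distributor-id-to-shard mapping (id - 1) is borrowed from the providers[restrictedShard-1] convention in the proposal.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a sharding strategy; shard names play the role of
// DirectoryProviders. Assumption: shard index = distributorId - 1.
class DistributorShardRouting {
    private final String[] shards;

    DistributorShardRouting(String... shards) {
        this.shards = shards;
    }

    // Indexing side: each document goes to exactly one shard, chosen from its key.
    String shardForAddition(int distributorId) {
        return shards[distributorId - 1];
    }

    // Query side: if the query pins distributor.id, only that shard can hold
    // matches; otherwise every shard must be searched (today's behaviour).
    List<String> shardsForQuery(Integer pinnedDistributorId) {
        List<String> result = new ArrayList<>();
        if (pinnedDistributorId == null) {
            for (String s : shards) {
                result.add(s);
            }
        } else {
            result.add(shardForAddition(pinnedDistributorId));
        }
        return result;
    }
}
```

Because both directions use the same key-to-shard function, a query pinned to distributor 2 can skip the other shards without losing hits.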

We could add the following API to IndexShardingStrategy:

public DirectoryProvider<?>[] getDirectoryProvidersForQuery(org.apache.lucene.search.Query query);

The query could be analyzed by the sharding strategy to detect boolean
queries that constrain its sharding criteria, for example:

//query building
BooleanQuery bQuery = new BooleanQuery();
bQuery.add( regularQuery, Occur.MUST );
//distributor.id 2 only occurs in shard 1
bQuery.add( new TermQuery( new Term( "distributor.id", "2" ) ), Occur.MUST );

public DirectoryProvider<?>[] getDirectoryProvidersForQuery(org.apache.lucene.search.Query query) {
   // only a top-level BooleanQuery is inspected; anything else hits all shards
   if ( !( query instanceof BooleanQuery ) ) return providers;
   BooleanClause[] clauses = BooleanQuery.class.cast( query ).getClauses();
   Integer restrictedShard = null;
   boolean isAllMust = true;
   for ( BooleanClause clause : clauses ) {
     // a SHOULD or MUST_NOT clause means other shards might still match
     if ( clause.getOccur() != Occur.MUST ) { isAllMust = false; break; }
     if ( clause.getQuery() instanceof TermQuery ) {
       Term term = TermQuery.class.cast( clause.getQuery() ).getTerm();
       if ( term.field().equals( "distributor.id" ) ) {
         restrictedShard = Integer.parseInt( term.text() );
       }
     }
   }
   if ( isAllMust && restrictedShard != null ) {
     return new DirectoryProvider<?>[] { providers[restrictedShard - 1] };
   }
   return providers;
}


That's very flexible, but quite hard to implement correctly, especially
since the query tree structure might not be trivial.
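To make the difficulty concrete, here is a minimal sketch of a recursive analysis over a nested query tree. It uses a hypothetical mini query model (TermQ, BoolQ, ShardPruner are illustrative names, not Lucene classes), so it only approximates Lucene's boolean semantics: MUST clauses intersect the candidate shards, a pure-SHOULD query unions them, and anything it does not understand (other fields, other query types, MUST_NOT) conservatively keeps every shard. As in the proposal, distributor.id = k is assumed to live only in shard k - 1.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical mini query tree, standing in for Lucene's Query classes.
abstract class Q {}

class TermQ extends Q {
    final String field, text;
    TermQ(String field, String text) { this.field = field; this.text = text; }
}

class BoolQ extends Q {
    final List<Q> must = new ArrayList<>();
    final List<Q> should = new ArrayList<>();
    BoolQ must(Q q) { must.add(q); return this; }
    BoolQ should(Q q) { should.add(q); return this; }
}

class ShardPruner {
    // Returns the indexes of the shards that may contain matches.
    static BitSet candidates(Q q, int shardCount) {
        BitSet all = new BitSet();
        all.set(0, shardCount);
        if (q instanceof TermQ) {
            TermQ t = (TermQ) q;
            if (!t.field.equals("distributor.id")) return all;
            BitSet one = new BitSet();
            one.set(Integer.parseInt(t.text) - 1); // assumed id -> shard mapping
            return one;
        }
        if (!(q instanceof BoolQ)) return all; // unknown query type: keep every shard
        BoolQ b = (BoolQ) q;
        BitSet result = (BitSet) all.clone();
        // every MUST clause has to match, so intersect their candidates
        for (Q c : b.must) result.and(candidates(c, shardCount));
        // SHOULD clauses only restrict a pure disjunction: union their candidates
        if (b.must.isEmpty() && !b.should.isEmpty()) {
            BitSet union = new BitSet();
            for (Q c : b.should) union.or(candidates(c, shardCount));
            result.and(union);
        }
        return result;
    }
}
```

Returning too many shards is always safe (only slower), while returning too few silently drops hits, which is why every unrecognized case falls back to all shards.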

The alternative strategy is to have the following API on
IndexShardingStrategy:

public DirectoryProvider<?>[] getDirectoryProvidersForQuery(Object hint);

and a corresponding fullTextQuery.setShardHint(Object);

A query could "know" it targets shard 2 and pass that information to the
strategy through a hint value both sides agree on:

fullTextQuery.setShardHint("Sony");

public DirectoryProvider<?>[] getDirectoryProvidersForQuery(Object hint) {
   // the hint is opaque to Hibernate Search; the strategy decides what it means
   if ( "Sony".equals( hint ) ) {
     return new DirectoryProvider<?>[] { providers[2] };
   }
   else {
     return providers;
   }
}
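A hint-based strategy could keep the hint-to-shard mapping in configuration rather than hard-coding "Sony". A minimal sketch, assuming string shard names stand in for DirectoryProviders; HintShardingStrategy and map(...) are hypothetical names, not Hibernate Search API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical hint-based strategy: shard names stand in for DirectoryProviders,
// and the hint-to-shard mapping is registered up front instead of hard-coded.
class HintShardingStrategy {
    private final String[] providers;
    private final Map<Object, int[]> shardsByHint = new HashMap<>();

    HintShardingStrategy(String... providers) {
        this.providers = providers;
    }

    // Registers which shard indexes a given hint targets.
    HintShardingStrategy map(Object hint, int... shardIndexes) {
        shardsByHint.put(hint, shardIndexes);
        return this;
    }

    String[] getDirectoryProvidersForQuery(Object hint) {
        int[] indexes = hint == null ? null : shardsByHint.get(hint);
        // missing or unknown hint: fall back to searching every shard
        if (indexes == null) return providers.clone();
        String[] result = new String[indexes.length];
        for (int i = 0; i < indexes.length; i++) {
            result[i] = providers[indexes[i]];
        }
        return result;
    }
}
```

For example, `new HintShardingStrategy("shard0", "shard1", "shard2").map("Sony", 2)` would send a query hinted with "Sony" to shard2 only, while an unhinted query still covers all three shards.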

WDYT? How useful would that be?
--
Emmanuel Bernard
http://in.relation.to/Bloggers/Emmanuel | http://blog.emmanuelbernard.com | http://twitter.com/emmanuelbernard
Hibernate Search in Action (http://is.gd/Dl1)


------------------------------

Message: 2
Date: Thu, 31 Jul 2008 15:22:52 +0200
From: "Sanne Grinovero" <sanne.grinovero at gmail.com>
Subject: Re: [hibernate-dev] HSearch: Using sharding and avoiding
	query on multiple shards
To: "Emmanuel Bernard" <emmanuel at hibernate.org>
Cc: hibernate-dev at lists.jboss.org, Aaron Walker
	<a.walker at base2services.com>
Message-ID:
	<50e5f6250807310622y7075f6c3n9becfccce152ee54 at mail.gmail.com>
Content-Type: text/plain; charset=UTF-8

Hello,
the feature is awesome, and I know of several real-world cases where it
would have been useful and would have performed better.

About the API, wouldn't it make more sense to have it look like a filter?
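One possible reading of a filter-like API, sketched in plain Java: the query enables a named filter with parameters, and the sharding strategy inspects the enabled filters it recognizes. ShardFilter, FilterAwareStrategy, the "distributor" filter name and the "id" parameter are all hypothetical, and the id - 1 shard mapping is assumed from Emmanuel's example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical named, parameterised filter enabled on a query.
class ShardFilter {
    final String name;
    final Map<String, Object> parameters = new HashMap<>();

    ShardFilter(String name) { this.name = name; }

    ShardFilter setParameter(String key, Object value) {
        parameters.put(key, value);
        return this;
    }
}

// Hypothetical strategy that narrows shards based on the filters it understands.
class FilterAwareStrategy {
    private final String[] providers;

    FilterAwareStrategy(String... providers) { this.providers = providers; }

    // Assumption: a "distributor" filter with an Integer "id" parameter pins the
    // query to shard id - 1; any other filter leaves the shard set untouched.
    List<String> getDirectoryProvidersForQuery(List<ShardFilter> enabledFilters) {
        for (ShardFilter f : enabledFilters) {
            Object id = f.parameters.get("id");
            if (f.name.equals("distributor") && id instanceof Integer) {
                List<String> one = new ArrayList<>();
                one.add(providers[(Integer) id - 1]);
                return one;
            }
        }
        List<String> all = new ArrayList<>();
        for (String p : providers) all.add(p);
        return all;
    }
}
```

Compared with a bare Object hint, the selection criterion here is named and its parameters are explicit, so a strategy can ignore filters it does not recognize; whether such a filter should also restrict documents inside the chosen shard is left open.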

regards,
Sanne



------------------------------
