[infinispan-issues] [JBoss JIRA] (ISPN-5452) Query Execution using Hibernate Search slow for large volume data

Sat May 9 07:09:46 EDT 2015

     [ https://issues.jboss.org/browse/ISPN-5452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sanne Grinovero updated ISPN-5452:
----------------------------------
    Steps to Reproduce: 
1. Create a batch job that creates 240 million entries in Infinispan, with all entries indexed.

2. Execute the following query through the Hot Rod remote client:
{code:java}String queryString= "SELECT ACCOUNT_ID,ID,PRODUCT_TYPE,FIRST_CALL,ID,IMEI from com.subex.spark.common.distributedcaching.data.Subscriber where PHONE_NUMBER='" +prefix+ i+"' AND STATUS in (1,2) and SUBSCRIBER_TYPE = 0";  
QueryFactory qf = Search.getQueryFactory(distcacheclient.getRemoteCache());
RemoteQuery remoteQuery = new RemoteQuery(qf,(RemoteCacheImpl)distcacheclient.getRemoteCache(), distcacheclient.getSerializationContext(), queryString, 0, (int)numOfRecords);
List<Subscriber> list = remoteQuery.list();{code}

3. The following query was executed in embedded mode:
{code:java}String queryString= "SELECT ACCOUNT_ID,ID,PRODUCT_TYPE,FIRST_CALL,ID,IMEI  from com.subex.spark.common.distributedcaching.data.Subscriber where " + "PHONE_NUMBER='" +prefix+ i+"' AND STATUS in (1,2) and SUBSCRIBER_TYPE = 0"; 
QueryFactory qf = org.infinispan.query.Search.getQueryFactory(ispnCacheServer.getCache("SUBSCRIBER"));
Query query = qf.from("com.subex.spark.common.distributedcaching.data.Subscriber").setProjection("ACCOUNT_ID","ID","PRODUCT_TYPE","FIRST_CALL","ID","IMEI").
			having("PHONE_NUMBER").eq(prefix+ i).and().having("STATUS").in(1,2).and().having("SUBSCRIBER_TYPE").eq(0).toBuilder().build();

List<Subscriber> list = query.list();
rs = query.list().iterator();{code}

4. Hibernate Search Query 

{code}QueryContextBuilder queryBuilder = searchSession.getSearchFactory().buildQueryBuilder();
Query query = queryBuilder.forEntity(Subscriber.class).get().keyword().onField("ID").matching(Long.parseLong(prefix+i)).createQuery();

org.hibernate.Query hibernateQuery=searchSession.createFullTextQuery(query, Subscriber.class).setProjection("ACCOUNT_ID","ACCOUNT_NAME","CONNECTION_TYPE","CONTACT_PHONE_NUMBER","CURRENT_BALANCE","CUST_ALERT_CONTACT_NUMBER","CUST_ALERT_EMAIL_ID","DATE_OF_BIRTH","DEALER_NAME","DS_NAME","FILE_NAME","FIRST_CALL","GROUPS","PVN","HOME_PHONE_NUMBER","ID","ID_NUMBER","IMEI","IMSI","IS_UPDATE","MCN1","MCN2","MODIFIED_DATE","NETWORK_ID","NOTIFICATION_GROUPS","OFFICE_PHONE_NUMBER","OPTIONAL_FIELD_1","OPTIONAL_FIELD_10","OPTIONAL_FIELD_11","OPTIONAL_FIELD_12","OPTIONAL_FIELD_13","OPTIONAL_FIELD_14","OPTIONAL_FIELD_15","OPTIONAL_FIELD_2","OPTIONAL_FIELD_3","OPTIONAL_FIELD_4","OPTIONAL_FIELD_5","OPTIONAL_FIELD_6","OPTIONAL_FIELD_7","OPTIONAL_FIELD_8","OPTIONAL_FIELD_9","PHONE_NUMBER","PRODUCT_TYPE","QOS","SERVICES","SERVICE_NUMBER_TYPE","SSID","STATUS","SUBSCRIBER_DOA","SUBSCRIBER_TYPE","SUBSCRIBER_UID","SUBSCRIBER_UID_DOA"); {code}
The projection above lists every field in the cache; all of them are indexed.

  was:
1. Create a batch job that creates 240 million entries in Infinispan, with all entries indexed.

2. Execute the following query through the Hot Rod remote client:
{code:java}String queryString= "SELECT ACCOUNT_ID,ID,PRODUCT_TYPE,FIRST_CALL,ID,IMEI from com.subex.spark.common.distributedcaching.data.Subscriber where PHONE_NUMBER='" +prefix+ i+"' AND STATUS in (1,2) and SUBSCRIBER_TYPE = 0";  
QueryFactory qf = Search.getQueryFactory(distcacheclient.getRemoteCache());
RemoteQuery remoteQuery = new RemoteQuery(qf,(RemoteCacheImpl)distcacheclient.getRemoteCache(), distcacheclient.getSerializationContext(), queryString, 0, (int)numOfRecords);
List<Subscriber> list = remoteQuery.list();{code}

3. The following query was executed in embedded mode:
{code:java}String queryString= "SELECT ACCOUNT_ID,ID,PRODUCT_TYPE,FIRST_CALL,ID,IMEI  from com.subex.spark.common.distributedcaching.data.Subscriber where " + "PHONE_NUMBER='" +prefix+ i+"' AND STATUS in (1,2) and SUBSCRIBER_TYPE = 0"; 
QueryFactory qf = org.infinispan.query.Search.getQueryFactory(ispnCacheServer.getCache("SUBSCRIBER"));
Query query = qf.from("com.subex.spark.common.distributedcaching.data.Subscriber").setProjection("ACCOUNT_ID","ID","PRODUCT_TYPE","FIRST_CALL","ID","IMEI").
			having("PHONE_NUMBER").eq(prefix+ i).and().having("STATUS").in(1,2).and().having("SUBSCRIBER_TYPE").eq(0).toBuilder().build();

List<Subscriber> list = query.list();
rs = query.list().iterator();{code}

4. Hibernate Search Query 

{code}QueryContextBuilder queryBuilder = searchSession.getSearchFactory().buildQueryBuilder();
Query query = queryBuilder.forEntity(Subscriber.class).get().keyword().onField("ID").matching(Long.parseLong(prefix+i)).createQuery();{code}

5.
{code}org.hibernate.Query hibernateQuery=searchSession.createFullTextQuery(query, Subscriber.class).setProjection("ACCOUNT_ID","ACCOUNT_NAME","CONNECTION_TYPE","CONTACT_PHONE_NUMBER","CURRENT_BALANCE","CUST_ALERT_CONTACT_NUMBER","CUST_ALERT_EMAIL_ID","DATE_OF_BIRTH","DEALER_NAME","DS_NAME","FILE_NAME","FIRST_CALL","GROUPS","PVN","HOME_PHONE_NUMBER","ID","ID_NUMBER","IMEI","IMSI","IS_UPDATE","MCN1","MCN2","MODIFIED_DATE","NETWORK_ID","NOTIFICATION_GROUPS","OFFICE_PHONE_NUMBER","OPTIONAL_FIELD_1","OPTIONAL_FIELD_10","OPTIONAL_FIELD_11","OPTIONAL_FIELD_12","OPTIONAL_FIELD_13","OPTIONAL_FIELD_14","OPTIONAL_FIELD_15","OPTIONAL_FIELD_2","OPTIONAL_FIELD_3","OPTIONAL_FIELD_4","OPTIONAL_FIELD_5","OPTIONAL_FIELD_6","OPTIONAL_FIELD_7","OPTIONAL_FIELD_8","OPTIONAL_FIELD_9","PHONE_NUMBER","PRODUCT_TYPE","QOS","SERVICES","SERVICE_NUMBER_TYPE","SSID","STATUS","SUBSCRIBER_DOA","SUBSCRIBER_TYPE","SUBSCRIBER_UID","SUBSCRIBER_UID_DOA"); {code}
The projection above lists every field in the cache; all of them are indexed.

> Query Execution using Hibernate Search slow for large volume data
> -----------------------------------------------------------------
>
>                 Key: ISPN-5452
>                 URL: https://issues.jboss.org/browse/ISPN-5452
>             Project: Infinispan
>          Issue Type: Bug
>          Components: Configuration, Remote Querying
>    Affects Versions: 7.2.1.Final
>         Environment: Linux
>            Reporter: Prashant Thakur
>
> While benchmarking Infinispan, we found that querying is very slow compared with Hibernate Search in isolation.
> Setup: a single Infinispan node.
> Memory allocated: 230 GB. No GC was seen throughout the query operation.
> Total memory required after a full GC was 122 GB.
> Data set: 240 million records, each averaging 330 bytes.
> The system has 16 cores, and 40 worker threads were allocated on the server side.
> With a single client thread, throughput was 900 req/sec in remote mode and 3k req/sec in embedded mode; the same request with Hibernate Search in isolation gives a throughput of 14,000 req/sec.
> With 50 client threads, throughput was limited to 15k req/sec, while Hibernate Search gives 80k req/sec with 10 threads.
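
The req/sec figures quoted above depend on how the client load is generated and timed. As a rough illustration only, a multi-threaded throughput measurement could be sketched as below. This is not the reporter's benchmark: the class `ThroughputHarness`, its methods, and the `placeholderQuery` callback are hypothetical names, and the callback stands in for whichever query call (remote, embedded, or Hibernate Search) is being measured.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.IntConsumer;

/** Hypothetical harness for illustration; not the benchmark used in the report. */
public class ThroughputHarness {

    /**
     * Runs requestsPerThread calls of the given request on each of `threads`
     * workers and returns the number of requests that completed.
     */
    public static long run(int threads, int requestsPerThread, IntConsumer request)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicLong completed = new AtomicLong();
        for (int t = 0; t < threads; t++) {
            // Give each worker its own key range so workers do not repeat lookups.
            final int offset = t * requestsPerThread;
            pool.submit(() -> {
                for (int i = 0; i < requestsPerThread; i++) {
                    request.accept(offset + i);
                    completed.incrementAndGet();
                }
            });
        }
        pool.shutdown();
        if (!pool.awaitTermination(10, TimeUnit.MINUTES)) {
            throw new IllegalStateException("benchmark did not finish in time");
        }
        return completed.get();
    }

    /** Converts a request count and an elapsed time in nanoseconds to req/sec. */
    public static double requestsPerSecond(long requests, long elapsedNanos) {
        return requests * 1_000_000_000.0 / elapsedNanos;
    }

    public static void main(String[] args) throws InterruptedException {
        // Assumption: replace this no-op with the query under test.
        IntConsumer placeholderQuery = i -> { };
        long start = System.nanoTime();
        long done = run(10, 100_000, placeholderQuery);
        long elapsed = System.nanoTime() - start;
        System.out.printf("%d requests, %.0f req/sec%n",
                done, requestsPerSecond(done, elapsed));
    }
}
```

Running the same harness against each access path with identical thread counts would make the remote, embedded, and Hibernate Search numbers directly comparable; the figures in the report use 50 client threads for Infinispan but 10 for Hibernate Search.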

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)