[infinispan-issues] [JBoss JIRA] (ISPN-5452) Query Execution using Hibernate Search slow for large volume data

Saturday, 9 May 2015

     [ https://issues.jboss.org/browse/ISPN-5452?page=com.atlassian.jira.plugin.... ]

Sanne Grinovero updated ISPN-5452:
----------------------------------
    Steps to Reproduce: 
1. Create a batch job that creates 240 million entries in Infinispan, with all entries indexed.

2. Execute the following query through the Hot Rod remote client:
{code:java}// Query string split across concatenated literals so it compiles; the query text is unchanged.
String queryString = "SELECT ACCOUNT_ID,ID,PRODUCT_TYPE,FIRST_CALL,ID,IMEI"
        + " from com.subex.spark.common.distributedcaching.data.Subscriber"
        + " where PHONE_NUMBER='" + prefix + i + "'"
        + " AND STATUS in (1,2) and SUBSCRIBER_TYPE = 0";
QueryFactory qf = Search.getQueryFactory(distcacheclient.getRemoteCache());
RemoteQuery remoteQuery = new RemoteQuery(qf, (RemoteCacheImpl) distcacheclient.getRemoteCache(),
        distcacheclient.getSerializationContext(), queryString, 0, (int) numOfRecords);
// With a projection, each result is an Object[] row, not a Subscriber.
List<Object[]> list = remoteQuery.list();{code}

3. The following query was executed in embedded mode:
{code:java}// String form of the same query (not used below; the DSL query that follows is what runs).
String queryString = "SELECT ACCOUNT_ID,ID,PRODUCT_TYPE,FIRST_CALL,ID,IMEI"
        + " from com.subex.spark.common.distributedcaching.data.Subscriber"
        + " where PHONE_NUMBER='" + prefix + i + "'"
        + " AND STATUS in (1,2) and SUBSCRIBER_TYPE = 0";
QueryFactory qf =
        org.infinispan.query.Search.getQueryFactory(ispnCacheServer.getCache("SUBSCRIBER"));
Query query = qf.from("com.subex.spark.common.distributedcaching.data.Subscriber")
        .setProjection("ACCOUNT_ID", "ID", "PRODUCT_TYPE", "FIRST_CALL", "ID", "IMEI")
        .having("PHONE_NUMBER").eq(prefix + i)
        .and().having("STATUS").in(1, 2)
        .and().having("SUBSCRIBER_TYPE").eq(0)
        .toBuilder().build();

// Projected results are Object[] rows; run the query once and reuse the result list.
List<Object[]> list = query.list();
rs = list.iterator();{code}

4. Hibernate Search query:

{code:java}// QueryContextBuilder is from org.hibernate.search.query.dsl; Query is org.apache.lucene.search.Query.
QueryContextBuilder queryBuilder = searchSession.getSearchFactory().buildQueryBuilder();
// Single keyword match on the ID field.
Query query = queryBuilder.forEntity(Subscriber.class).get()
        .keyword().onField("ID").matching(Long.parseLong(prefix + i))
        .createQuery();

org.hibernate.Query hibernateQuery = searchSession.createFullTextQuery(query, Subscriber.class)
        .setProjection("ACCOUNT_ID", "ACCOUNT_NAME", "CONNECTION_TYPE", "CONTACT_PHONE_NUMBER",
                "CURRENT_BALANCE", "CUST_ALERT_CONTACT_NUMBER", "CUST_ALERT_EMAIL_ID", "DATE_OF_BIRTH",
                "DEALER_NAME", "DS_NAME", "FILE_NAME", "FIRST_CALL", "GROUPS", "PVN",
                "HOME_PHONE_NUMBER", "ID", "ID_NUMBER", "IMEI", "IMSI", "IS_UPDATE", "MCN1", "MCN2",
                "MODIFIED_DATE", "NETWORK_ID", "NOTIFICATION_GROUPS", "OFFICE_PHONE_NUMBER",
                "OPTIONAL_FIELD_1", "OPTIONAL_FIELD_10", "OPTIONAL_FIELD_11", "OPTIONAL_FIELD_12",
                "OPTIONAL_FIELD_13", "OPTIONAL_FIELD_14", "OPTIONAL_FIELD_15", "OPTIONAL_FIELD_2",
                "OPTIONAL_FIELD_3", "OPTIONAL_FIELD_4", "OPTIONAL_FIELD_5", "OPTIONAL_FIELD_6",
                "OPTIONAL_FIELD_7", "OPTIONAL_FIELD_8", "OPTIONAL_FIELD_9", "PHONE_NUMBER",
                "PRODUCT_TYPE", "QOS", "SERVICES", "SERVICE_NUMBER_TYPE", "SSID", "STATUS",
                "SUBSCRIBER_DOA", "SUBSCRIBER_TYPE", "SUBSCRIBER_UID", "SUBSCRIBER_UID_DOA");{code}
This projection lists every field in the cache; all of them are indexed.
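Step 1 does not show the batch job itself. A minimal sketch of one way to load entries in bulk, assuming the cache is written through the `java.util.Map` interface (Infinispan's `Cache` and `RemoteCache` both extend `ConcurrentMap`, so `putAll` is available). `BatchLoader`, the `Long` key type, the batch size, and `entryFactory` are hypothetical and not taken from the reporter's job.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongFunction;

/**
 * Hypothetical bulk loader: writes keys 0..total-1 into a Map-like store
 * in chunks of batchSize, issuing one putAll call per chunk.
 */
final class BatchLoader {

    private BatchLoader() {}

    /** Loads all entries and returns the number of putAll calls issued. */
    static <V> long load(Map<Long, V> store, long total, int batchSize,
                         LongFunction<V> entryFactory) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        long batches = 0;
        Map<Long, V> buffer = new HashMap<>(batchSize * 2);
        for (long key = 0; key < total; key++) {
            buffer.put(key, entryFactory.apply(key));
            if (buffer.size() == batchSize) {
                store.putAll(buffer);   // one bulk write per full batch
                buffer.clear();
                batches++;
            }
        }
        if (!buffer.isEmpty()) {
            store.putAll(buffer);       // flush the final, partial batch
            batches++;
        }
        return batches;
    }
}
```

For example, `BatchLoader.load(cache, 240_000_000L, 10_000, id -> newSubscriber(id))` would issue 24,000 `putAll` calls, where `cache` is typed as `Map<Long, Subscriber>` and `newSubscriber` is a hypothetical factory.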
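Both the remote query (step 2) and the embedded query (step 3) set a projection, so each result comes back as an `Object[]` row whose elements follow the projection order, rather than as a `Subscriber`. A minimal sketch of turning such rows into a typed view, assuming the projection lists each column once (ACCOUNT_ID, ID, PRODUCT_TYPE, FIRST_CALL, IMEI). `SubscriberView` and `fromRows` are hypothetical names, not Infinispan API, and the values stay `Object` because the report does not give their Java types.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical read-only view of one projected row. Column order is assumed
 * to be ACCOUNT_ID, ID, PRODUCT_TYPE, FIRST_CALL, IMEI (no duplicated ID).
 */
final class SubscriberView {
    final Object accountId;
    final Object id;
    final Object productType;
    final Object firstCall;
    final Object imei;

    private SubscriberView(Object[] row) {
        if (row.length != 5) {
            throw new IllegalArgumentException("expected 5 columns, got " + row.length);
        }
        this.accountId = row[0];
        this.id = row[1];
        this.productType = row[2];
        this.firstCall = row[3];
        this.imei = row[4];
    }

    /** Converts the Object[] rows returned by a projection query into views. */
    static List<SubscriberView> fromRows(List<Object[]> rows) {
        List<SubscriberView> views = new ArrayList<>(rows.size());
        for (Object[] row : rows) {
            views.add(new SubscriberView(row));
        }
        return views;
    }
}
```

Keeping the projection free of duplicate columns keeps the row layout and this index mapping in step with each other.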

...
 Query Execution using Hibernate Search slow for large volume data
 -----------------------------------------------------------------

                 Key: ISPN-5452
                 URL: https://issues.jboss.org/browse/ISPN-5452
             Project: Infinispan
          Issue Type: Bug
          Components: Configuration, Remote Querying
    Affects Versions: 7.2.1.Final
         Environment: Linux
            Reporter: Prashant Thakur

 While benchmarking Infinispan, we found that querying is very slow compared with Hibernate Search in isolation.
 Single Infinispan node.
 Memory allocated: 230 GB. No GC was observed during the query operations.
 Total memory required after a full GC was 122 GB.
 Setup: 240 million records, each with an average size of 330 bytes.
 The system has 16 cores, and 40 worker threads were allocated on the server side.
 With a single client thread, throughput was 900 req/sec in remote mode and 3k req/sec in embedded mode, while the same request with Hibernate Search in isolation gives 14,000 req/sec.
 With 50 client threads, throughput was limited to 15k req/sec, while Hibernate Search gives 80k req/sec with 10 threads.
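The reported figures can be cross-checked with a little arithmetic. The sketch below (class and method names are hypothetical) derives the raw payload size, ignoring index and per-entry overhead, the ratio of Hibernate Search throughput to each Infinispan mode, and the throughput per client thread. It performs no measurement.

```java
/**
 * Hypothetical helpers for deriving figures from the numbers in this report.
 * Pure arithmetic on the reported values.
 */
final class ReportedFigures {

    private ReportedFigures() {}

    /** Raw payload in GB (10^9 bytes), ignoring index and per-entry overhead. */
    static double rawPayloadGb(long entries, long avgBytes) {
        return entries * (double) avgBytes / 1e9;
    }

    /** How many times higher the baseline throughput is than the measured one. */
    static double slowdown(double baselineReqPerSec, double measuredReqPerSec) {
        return baselineReqPerSec / measuredReqPerSec;
    }

    /** Total throughput divided evenly across client threads. */
    static double perThread(double reqPerSec, int threads) {
        return reqPerSec / threads;
    }
}
```

With the reported numbers, the raw payload is about 79.2 GB. Hibernate Search in isolation is about 15.6 times the single-thread remote rate (14,000 / 900) and about 4.7 times the embedded rate (14,000 / 3,000). Per client thread, 15k req/sec over 50 threads is 300 req/sec, against 8,000 req/sec per thread for Hibernate Search (80k over 10 threads).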

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
