[infinispan-issues] [JBoss JIRA] (ISPN-5452) Query Execution using Hibernate Search slow for large volume data
Sanne Grinovero (JIRA)
issues at jboss.org
Sat May 9 07:09:46 EDT 2015
[ https://issues.jboss.org/browse/ISPN-5452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sanne Grinovero updated ISPN-5452:
----------------------------------
Steps to Reproduce:
1. Create a batch job to create 240 million entries in Infinispan, with all entries indexed.
2. Execute the following query through the Hot Rod remote client:
{code:java}String queryString= "SELECT ACCOUNT_ID,ID,PRODUCT_TYPE,FIRST_CALL,ID,IMEI from com.subex.spark.common.distributedcaching.data.Subscriber where PHONE_NUMBER='" +prefix+ i+"' AND STATUS in (1,2) and SUBSCRIBER_TYPE = 0";
QueryFactory qf = Search.getQueryFactory(distcacheclient.getRemoteCache());
RemoteQuery remoteQuery = new RemoteQuery(qf,(RemoteCacheImpl)distcacheclient.getRemoteCache(), distcacheclient.getSerializationContext(), queryString, 0, (int)numOfRecords);
List<Subscriber> list = remoteQuery.list();{code}
3. Execute the following query in embedded mode:
{code:java}String queryString= "SELECT ACCOUNT_ID,ID,PRODUCT_TYPE,FIRST_CALL,ID,IMEI from com.subex.spark.common.distributedcaching.data.Subscriber where " + "PHONE_NUMBER='" +prefix+ i+"' AND STATUS in (1,2) and SUBSCRIBER_TYPE = 0";
QueryFactory qf = org.infinispan.query.Search.getQueryFactory(ispnCacheServer.getCache("SUBSCRIBER"));
Query query = qf.from("com.subex.spark.common.distributedcaching.data.Subscriber").setProjection("ACCOUNT_ID","ID","PRODUCT_TYPE","FIRST_CALL","ID","IMEI").
having("PHONE_NUMBER").eq(prefix+ i).and().having("STATUS").in(1,2).and().having("SUBSCRIBER_TYPE").eq(0).toBuilder().build();
List<Subscriber> list = query.list();
rs = query.list().iterator();{code}
4. Hibernate Search Query
{code}QueryContextBuilder queryBuilder = searchSession.getSearchFactory().buildQueryBuilder();
Query query = queryBuilder.forEntity(Subscriber.class).get().keyword().onField("ID").matching(Long.parseLong(prefix+i)).createQuery();
org.hibernate.Query hibernateQuery=searchSession.createFullTextQuery(query, Subscriber.class).setProjection("ACCOUNT_ID","ACCOUNT_NAME","CONNECTION_TYPE","CONTACT_PHONE_NUMBER","CURRENT_BALANCE","CUST_ALERT_CONTACT_NUMBER","CUST_ALERT_EMAIL_ID","DATE_OF_BIRTH","DEALER_NAME","DS_NAME","FILE_NAME","FIRST_CALL","GROUPS","PVN","HOME_PHONE_NUMBER","ID","ID_NUMBER","IMEI","IMSI","IS_UPDATE","MCN1","MCN2","MODIFIED_DATE","NETWORK_ID","NOTIFICATION_GROUPS","OFFICE_PHONE_NUMBER","OPTIONAL_FIELD_1","OPTIONAL_FIELD_10","OPTIONAL_FIELD_11","OPTIONAL_FIELD_12","OPTIONAL_FIELD_13","OPTIONAL_FIELD_14","OPTIONAL_FIELD_15","OPTIONAL_FIELD_2","OPTIONAL_FIELD_3","OPTIONAL_FIELD_4","OPTIONAL_FIELD_5","OPTIONAL_FIELD_6","OPTIONAL_FIELD_7","OPTIONAL_FIELD_8","OPTIONAL_FIELD_9","PHONE_NUMBER","PRODUCT_TYPE","QOS","SERVICES","SERVICE_NUMBER_TYPE","SSID","STATUS","SUBSCRIBER_DOA","SUBSCRIBER_TYPE","SUBSCRIBER_UID","SUBSCRIBER_UID_DOA");{code}
The projection above lists every field in the cache, all of which are indexed.
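The steps above do not show how throughput was measured. As a rough sketch only, a harness along the following lines could time any of the three query variants with a configurable number of client threads. The class name ThroughputHarness, the measure method, and the stand-in workload are hypothetical and not part of the reporter's code; the query call from the steps above would replace the lambda body.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.IntConsumer;

/** Hypothetical harness: aggregate requests per second for a query callback. */
public class ThroughputHarness {

    /**
     * Runs requestsPerThread calls of query on each of the given number of
     * client threads and returns the aggregate requests per second.
     */
    public static double measure(int threads, int requestsPerThread, IntConsumer query) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch start = new CountDownLatch(1);
        CountDownLatch done = new CountDownLatch(threads);
        AtomicLong completed = new AtomicLong();

        for (int t = 0; t < threads; t++) {
            final int offset = t * requestsPerThread;
            pool.execute(() -> {
                try {
                    start.await(); // line all threads up before timing starts
                    for (int i = 0; i < requestsPerThread; i++) {
                        query.accept(offset + i); // plays the role of "i" in prefix + i
                        completed.incrementAndGet();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    done.countDown();
                }
            });
        }

        long begin = System.nanoTime();
        start.countDown(); // release all client threads at once
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for clients", e);
        } finally {
            pool.shutdown();
        }
        long elapsedNanos = System.nanoTime() - begin;

        return completed.get() / (elapsedNanos / 1_000_000_000.0);
    }

    public static void main(String[] args) {
        // Stand-in workload (assumption); replace with the query.list() call under test.
        double reqPerSec = measure(4, 10_000, i -> Integer.toString(i).hashCode());
        System.out.printf("%.0f req/sec%n", reqPerSec);
    }
}
```

Running the same harness with 1, 10, and 50 threads against each variant would give figures that can be compared directly, since only the callback changes between runs.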
was:
1. Create a Batch job to Create 240 million entries in Infinispan with all entries Indexed
2. Execute the following query through hotrod remote client
{code:java}String queryString= "SELECT ACCOUNT_ID,ID,PRODUCT_TYPE,FIRST_CALL,ID,IMEI from com.subex.spark.common.distributedcaching.data.Subscriber where PHONE_NUMBER='" +prefix+ i+"' AND STATUS in (1,2) and SUBSCRIBER_TYPE = 0";
QueryFactory qf = Search.getQueryFactory(distcacheclient.getRemoteCache());
RemoteQuery remoteQuery = new RemoteQuery(qf,(RemoteCacheImpl)distcacheclient.getRemoteCache(), distcacheclient.getSerializationContext(), queryString, 0, (int)numOfRecords);
List<Subscriber> list = remoteQuery.list();{code}
3. Following Query was executed in Embedded mode
{code:java}String queryString= "SELECT ACCOUNT_ID,ID,PRODUCT_TYPE,FIRST_CALL,ID,IMEI from com.subex.spark.common.distributedcaching.data.Subscriber where " + "PHONE_NUMBER='" +prefix+ i+"' AND STATUS in (1,2) and SUBSCRIBER_TYPE = 0";
QueryFactory qf = org.infinispan.query.Search.getQueryFactory(ispnCacheServer.getCache("SUBSCRIBER"));
Query query = qf.from("com.subex.spark.common.distributedcaching.data.Subscriber").setProjection("ACCOUNT_ID","ID","PRODUCT_TYPE","FIRST_CALL","ID","IMEI").
having("PHONE_NUMBER").eq(prefix+ i).and().having("STATUS").in(1,2).and().having("SUBSCRIBER_TYPE").eq(0).toBuilder().build();
List<Subscriber> list = query.list();
rs = query.list().iterator();{code}
4. Hibernate Search Query
{code}QueryContextBuilder queryBuilder = searchSession.getSearchFactory().buildQueryBuilder();
Query query = queryBuilder.forEntity(Subscriber.class).get().keyword().onField("ID").matching(Long.parseLong(prefix+i)).createQuery();{code}
5.
{code}org.hibernate.Query hibernateQuery=searchSession.createFullTextQuery(query, Subscriber.class).setProjection("ACCOUNT_ID","ACCOUNT_NAME","CONNECTION_TYPE","CONTACT_PHONE_NUMBER","CURRENT_BALANCE","CUST_ALERT_CONTACT_NUMBER","CUST_ALERT_EMAIL_ID","DATE_OF_BIRTH","DEALER_NAME","DS_NAME","FILE_NAME","FIRST_CALL","GROUPS","PVN","HOME_PHONE_NUMBER","ID","ID_NUMBER","IMEI","IMSI","IS_UPDATE","MCN1","MCN2","MODIFIED_DATE","NETWORK_ID","NOTIFICATION_GROUPS","OFFICE_PHONE_NUMBER","OPTIONAL_FIELD_1","OPTIONAL_FIELD_10","OPTIONAL_FIELD_11","OPTIONAL_FIELD_12","OPTIONAL_FIELD_13","OPTIONAL_FIELD_14","OPTIONAL_FIELD_15","OPTIONAL_FIELD_2","OPTIONAL_FIELD_3","OPTIONAL_FIELD_4","OPTIONAL_FIELD_5","OPTIONAL_FIELD_6","OPTIONAL_FIELD_7","OPTIONAL_FIELD_8","OPTIONAL_FIELD_9","PHONE_NUMBER","PRODUCT_TYPE","QOS","SERVICES","SERVICE_NUMBER_TYPE","SSID","STATUS","SUBSCRIBER_DOA","SUBSCRIBER_TYPE","SUBSCRIBER_UID","SUBSCRIBER_UID_DOA"); {code}
This contains the list of all fields in the Cache all of which are indexed.
> Query Execution using Hibernate Search slow for large volume data
> -----------------------------------------------------------------
>
> Key: ISPN-5452
> URL: https://issues.jboss.org/browse/ISPN-5452
> Project: Infinispan
> Issue Type: Bug
> Components: Configuration, Remote Querying
> Affects Versions: 7.2.1.Final
> Environment: Linux
> Reporter: Prashant Thakur
>
> While benchmarking Infinispan, we found that querying is very slow compared with Hibernate Search in isolation.
> Setup: a single Infinispan node.
> Memory allocated: 230GB. No GC was seen throughout the query operation.
> Total memory required after a full GC was 122GB.
> Data set: 240 million records, each averaging 330 bytes.
> The system has 16 cores, and 40 worker threads were allocated on the server side.
> With a single client thread, throughput was 900 req/sec in remote mode and 3k req/sec in embedded mode; the same request with Hibernate Search in isolation gives a throughput of 14000 req/sec.
> With 50 client threads, throughput was limited to 15k req/sec, while Hibernate Search gives 80k req/sec with 10 threads.
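To put the reported figures side by side, the short sketch below derives the ratios implied by the description. The class and method names are hypothetical, sizes use decimal gigabytes (1 GB = 10^9 bytes), and the report does not say whether the 50-thread figure was taken in remote or embedded mode.

```java
/** Hypothetical helper: ratios derived from the numbers quoted in the description. */
public class ReportedNumbers {

    /** How many times larger a is than b. */
    public static double ratio(double a, double b) {
        return a / b;
    }

    /** Raw payload size in decimal gigabytes for the given entry count and average size. */
    public static double rawDataGb(long entries, long avgBytes) {
        return entries * avgBytes / 1e9;
    }

    public static void main(String[] args) {
        // Single client thread: Hibernate Search vs. remote and embedded Infinispan queries.
        System.out.printf("HS vs remote:   %.1fx%n", ratio(14_000, 900));
        System.out.printf("HS vs embedded: %.1fx%n", ratio(14_000, 3_000));

        // Multi-threaded runs, normalised per client thread.
        System.out.printf("Infinispan, 50 threads: %.0f req/sec per thread%n", ratio(15_000, 50));
        System.out.printf("HS, 10 threads:         %.0f req/sec per thread%n", ratio(80_000, 10));

        // Raw payload vs. memory in use after a full GC (122GB as reported).
        double rawGb = rawDataGb(240_000_000L, 330);
        System.out.printf("Raw data: %.1f GB, memory/raw: %.2fx%n", rawGb, ratio(122, rawGb));
    }
}
```

On these figures, Hibernate Search in isolation is roughly 15.6 times faster than the remote query and about 4.7 times faster than the embedded query for a single client thread. The raw payload of about 79.2 GB is well below the 122GB in use after a full GC.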