[infinispan-dev] Clustered queries and custom indexes

Monday, 15 December 2014

Hi,

there is some interesting research on similarity search at my university, driven by David
Novák (CC-ed). If anyone is interested, see [1][2][3].

In short: they have achieved similarity search on arbitrary data (images, songs, etc.) by
creating a custom index that stores a "similarity vector" for each
object in the database. This index can answer queries like "give me the images most
similar to this example". So why am I posting this here?

The architecture is designed on top of Infinispan, and they want to use Infinispan to speed
the search up. Basically, they would like to distribute the entries across the cluster, with
each node holding the similarity index of its own entries. When a query comes in, it would be
distributed to all the nodes, the custom search would be performed on each node's index,
and the results returned. This is approximately what Index.LOCAL and ClusteredQuery can
do.
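
To make the flow above concrete, here is a minimal sketch of the scatter/gather idea in plain Java. Everything in it is an assumption for illustration: the names (Node, Hit, clusteredSearch) are hypothetical and are not Infinispan API, the "nodes" are just in-memory objects, and a brute-force Euclidean scan stands in for the real similarity index.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these types are Infinispan API.
class ScatterGatherSketch {

    // A search hit: entry key and distance to the query (smaller = more similar).
    record Hit(String key, double distance) {}

    // Stands in for one cluster node holding an index over its own entries only.
    static final class Node {
        private final Map<String, double[]> localVectors;

        Node(Map<String, double[]> localVectors) {
            this.localVectors = localVectors;
        }

        // Local search: brute-force scan, a placeholder for the real similarity index.
        List<Hit> searchLocal(double[] query, int k) {
            List<Hit> hits = new ArrayList<>();
            for (Map.Entry<String, double[]> e : localVectors.entrySet()) {
                hits.add(new Hit(e.getKey(), euclidean(query, e.getValue())));
            }
            hits.sort(Comparator.comparingDouble(Hit::distance));
            return new ArrayList<>(hits.subList(0, Math.min(k, hits.size())));
        }
    }

    static double euclidean(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Send the query to every node, gather the local top-k lists, keep the global top-k.
    static List<Hit> clusteredSearch(List<Node> nodes, double[] query, int k) {
        List<Hit> merged = new ArrayList<>();
        for (Node node : nodes) { // sequential here; a real cluster would query nodes in parallel
            merged.addAll(node.searchLocal(query, k));
        }
        merged.sort(Comparator.comparingDouble(Hit::distance));
        return new ArrayList<>(merged.subList(0, Math.min(k, merged.size())));
    }

    public static void main(String[] args) {
        Node n1 = new Node(Map.of("a", new double[] {0, 0}, "b", new double[] {5, 5}));
        Node n2 = new Node(Map.of("c", new double[] {1, 0}, "d", new double[] {9, 9}));
        System.out.println(clusteredSearch(List.of(n1, n2), new double[] {0, 0}, 2));
    }
}
```

Each node returns at most k hits, so the merge step handles at most k times the number of nodes, and the global top-k is still exact because any entry in the global top-k is also in its own node's top-k.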

The difference is that the indexing and searching mechanism must be custom. So I wanted to
ask what you think about implementing such a feature in Infinispan. I was thinking
about extracting a general API for indexing/searching, so that, for example, our Lucene-based
search would become one implementation of it.
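
As a starting point for discussion, a general indexing/searching API might look roughly like the sketch below. It is purely hypothetical: the interface name LocalIndex, its methods, and the toy SubstringIndex are made up for illustration and do not correspond to any existing Infinispan or Lucene API. The idea is only that a full-text engine and a similarity index could both sit behind the same small contract.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical SPI sketch; these names are illustrative and not part of Infinispan.
class PluggableIndexSketch {

    // What a node-local index would need to offer, whatever engine is behind it.
    interface LocalIndex<K, V, Q> {
        void entryWritten(K key, V value); // called when a cache entry is stored or updated
        void entryRemoved(K key);          // called when a cache entry is removed
        List<K> search(Q query, int maxResults);
    }

    // Toy implementation: substring match over string values, results in key order.
    static final class SubstringIndex implements LocalIndex<String, String, String> {
        private final Map<String, String> values = new TreeMap<>();

        @Override
        public void entryWritten(String key, String value) {
            values.put(key, value);
        }

        @Override
        public void entryRemoved(String key) {
            values.remove(key);
        }

        @Override
        public List<String> search(String query, int maxResults) {
            List<String> keys = new ArrayList<>();
            for (Map.Entry<String, String> e : values.entrySet()) {
                if (keys.size() >= maxResults) {
                    break;
                }
                if (e.getValue().contains(query)) {
                    keys.add(e.getKey());
                }
            }
            return keys;
        }
    }

    public static void main(String[] args) {
        LocalIndex<String, String, String> index = new SubstringIndex();
        index.entryWritten("k1", "red apple");
        index.entryWritten("k2", "green pear");
        index.entryWritten("k3", "red car");
        System.out.println(index.search("red", 10));
    }
}
```

The query type Q is left generic on purpose: a text engine would take a query string or query object, while a similarity index would take an example object or its vector.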

I would be happy to take this on as a contribution, since I find it an extremely interesting
topic, and I would also like to write a diploma thesis based on it.
So here are some questions:
1) Is it doable?
2) Do we want this feature?
3) How to design it/where to start?

Any input is more than welcome :)

Cheers,
Jiri

[1] https://drive.google.com/file/d/0B4sztQSfpi3rRlJBQjJHMkR2LXc/view
[2] https://drive.google.com/file/d/0B4sztQSfpi3rU2p2MV9jRE9iTUk/view
[3] https://drive.google.com/file/d/0B4sztQSfpi3rZUpld24ydzJNclk/view