[infinispan-dev] Clustered queries and custom indexes

Jiri Holusa jholusa at redhat.com
Mon Dec 15 06:35:53 EST 2014


Hi,

there is some interesting research on similarity search at my university, led by David Novák (CC-ed). If anyone is interested, see [1][2][3].

In short: they achieved similarity search over arbitrary data (images, songs, etc.) by building a custom index that stores a "similarity vector" for each object in the database. This index can answer queries like "give me the images most similar to this example". So why am I posting this here?

The architecture is built on top of Infinispan, and they would like to use Infinispan's clustering to speed it up. Basically, they would like to distribute the entries across the cluster, with each node holding the similarity index for its own entries. When a query arrives, it would be sent to all the nodes, the custom search would run against each node's local index, and the results would be returned. This is roughly what Index.LOCAL and ClusteredQuery could do.
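To make the proposed query flow concrete, here is a minimal scatter-gather sketch in plain Java. It is not Infinispan code: `NodeShard`, `Hit` and `clusteredTopK` are hypothetical names, the "cluster" is simulated by in-memory shards queried on a thread pool, and a brute-force Euclidean k-nearest-neighbour scan stands in for the real similarity index.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;

public class ScatterGatherSketch {

    // One search hit: the entry's key and its distance to the query object.
    record Hit(String key, double distance) {}

    // Hypothetical stand-in for one node: holds only the entries this node owns.
    static final class NodeShard {
        private final Map<String, double[]> vectors = new HashMap<>();

        void put(String key, double[] vector) {
            vectors.put(key, vector);
        }

        // Node-local k-NN by brute-force Euclidean distance (a real similarity index would prune).
        List<Hit> localTopK(double[] query, int k) {
            return vectors.entrySet().stream()
                    .map(e -> new Hit(e.getKey(), distance(query, e.getValue())))
                    .sorted(Comparator.comparingDouble(Hit::distance))
                    .limit(k)
                    .collect(Collectors.toList());
        }
    }

    static double distance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Coordinator: scatter the query to every shard in parallel, gather the partial top-k lists, merge.
    static List<Hit> clusteredTopK(List<NodeShard> shards, double[] query, int k) {
        ExecutorService pool = Executors.newFixedThreadPool(shards.size());
        try {
            List<Future<List<Hit>>> partials = new ArrayList<>();
            for (NodeShard shard : shards) {
                partials.add(pool.submit(() -> shard.localTopK(query, k)));
            }
            List<Hit> merged = new ArrayList<>();
            for (Future<List<Hit>> partial : partials) {
                merged.addAll(partial.get());
            }
            merged.sort(Comparator.comparingDouble(Hit::distance));
            return new ArrayList<>(merged.subList(0, Math.min(k, merged.size())));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for nodes", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a node failed to answer the query", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        NodeShard a = new NodeShard();
        NodeShard b = new NodeShard();
        a.put("img1", new double[]{0, 0});
        a.put("img2", new double[]{5, 5});
        b.put("img3", new double[]{1, 0});
        b.put("img4", new double[]{9, 9});
        System.out.println(clusteredTopK(List.of(a, b), new double[]{0, 0}, 2));
        // prints [Hit[key=img1, distance=0.0], Hit[key=img3, distance=1.0]]
    }
}
```

Asking each node only for its own top k is enough for an exact global top k, because every globally best entry is also among the best k on the node that owns it. The sketch does assume each entry is indexed on exactly one node; with several owners per entry, the merge step would also have to drop duplicates.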

The difference is that the indexing and searching mechanisms must be custom. So I wanted to ask what you think about adding such a feature to Infinispan. I was thinking about extracting a general API for indexing/searching; our Lucene-based search would then become one implementation of it.
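One possible shape for such an extracted API, sketched under loose assumptions: a hypothetical node-local `LocalIndex` SPI that the cache would notify on writes and removals, plus a generic `clusteredSearch` that runs the same query on every node and reduces the partial results with a merger supplied by the index itself. None of these names exist in Infinispan. The toy `TermIndex` merely stands in for a Lucene-backed implementation, `SimilarityIndex` for the custom one, and the nodes are simulated inside a single JVM.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.function.BinaryOperator;

public class PluggableIndexSketch {

    // Hypothetical SPI: a node-local index with its own query type Q and result type R.
    interface LocalIndex<K, V, Q, R> {
        void onPut(K key, V value);   // entry stored on (or migrated to) this node
        void onRemove(K key);         // entry removed from (or migrated away from) this node
        R search(Q query);            // search over this node's entries only
        BinaryOperator<R> merger();   // how partial results from different nodes combine
    }

    // Toy term index standing in for a full-text (e.g. Lucene-backed) implementation.
    static final class TermIndex implements LocalIndex<String, String, String, SortedSet<String>> {
        private final Map<String, Set<String>> postings = new HashMap<>();

        public void onPut(String key, String text) {
            onRemove(key); // re-index on update
            for (String term : text.toLowerCase().split("\\s+")) {
                postings.computeIfAbsent(term, t -> new HashSet<>()).add(key);
            }
        }

        public void onRemove(String key) {
            postings.values().forEach(keys -> keys.remove(key));
        }

        public SortedSet<String> search(String term) {
            return new TreeSet<>(postings.getOrDefault(term.toLowerCase(), Set.of()));
        }

        public BinaryOperator<SortedSet<String>> merger() {
            return PluggableIndexSketch::union;
        }
    }

    // Range query for the similarity index: every object within `radius` of `example`.
    record RangeQuery(double[] example, double radius) {}

    // Toy similarity index: brute-force Euclidean range search over this node's vectors.
    static final class SimilarityIndex implements LocalIndex<String, double[], RangeQuery, SortedSet<String>> {
        private final Map<String, double[]> vectors = new HashMap<>();

        public void onPut(String key, double[] vector) { vectors.put(key, vector); }

        public void onRemove(String key) { vectors.remove(key); }

        public SortedSet<String> search(RangeQuery q) {
            SortedSet<String> hits = new TreeSet<>();
            vectors.forEach((key, v) -> {
                double sum = 0;
                for (int i = 0; i < v.length; i++) sum += Math.pow(v[i] - q.example()[i], 2);
                if (Math.sqrt(sum) <= q.radius()) hits.add(key);
            });
            return hits;
        }

        public BinaryOperator<SortedSet<String>> merger() {
            return PluggableIndexSketch::union;
        }
    }

    static SortedSet<String> union(SortedSet<String> a, SortedSet<String> b) {
        SortedSet<String> u = new TreeSet<>(a);
        u.addAll(b);
        return u;
    }

    // Generic "clustered query": same query on every node's index, reduced with that index's merger.
    static <Q, R> R clusteredSearch(List<? extends LocalIndex<?, ?, Q, R>> nodes, Q query) {
        return nodes.stream()
                .map(node -> node.search(query))
                .reduce(nodes.get(0).merger())
                .orElseThrow();
    }

    public static void main(String[] args) {
        TermIndex t1 = new TermIndex(), t2 = new TermIndex();
        t1.onPut("doc1", "red car");
        t2.onPut("doc2", "blue car");
        System.out.println(clusteredSearch(List.of(t1, t2), "car")); // [doc1, doc2]

        SimilarityIndex s1 = new SimilarityIndex(), s2 = new SimilarityIndex();
        s1.onPut("img1", new double[]{0, 0});
        s2.onPut("img2", new double[]{3, 4});
        System.out.println(clusteredSearch(List.of(s1, s2), new RangeQuery(new double[]{0, 0}, 5.0))); // [img1, img2]
    }
}
```

Letting each index supply its own merger is what would keep the clustered-query machinery generic: a full-text index might merge by score, a k-NN index by distance, and a range query by plain union as in this sketch.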

I would be happy to take this on as a contribution, since I find it an extremely interesting topic, and I could also turn it into my diploma thesis.
So here are some questions:
1) Is it doable?
2) Do we want this feature?
3) How to design it/where to start?

Any input is more than welcome :)

Cheers,
Jiri

[1] https://drive.google.com/file/d/0B4sztQSfpi3rRlJBQjJHMkR2LXc/view
[2] https://drive.google.com/file/d/0B4sztQSfpi3rU2p2MV9jRE9iTUk/view
[3] https://drive.google.com/file/d/0B4sztQSfpi3rZUpld24ydzJNclk/view
