Hi,
there is some interesting research on similarity search at my university, led by David
Novák (CC-ed). If anyone is interested, see [1][2][3].
In short: they have achieved similarity search on arbitrary data (images, songs, etc.) by
creating a custom index that stores a "similarity vector" for each
object in the database. This index can answer queries like "give me the most similar
images to this example". So why am I posting this here?
The architecture is designed on top of Infinispan, and they want to use it to speed up the
search. Basically, they would like to distribute the entries across the cluster, with each
node holding the similarity index for its own entries. Then, when a query arrives, it would
be distributed to all the nodes, a custom search would be performed on each node's index,
and the results would be returned. This is approximately what Index.LOCAL and
ClusteredQuery can do.
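
To make the query flow concrete, here is a minimal sketch of the scatter-gather step in
plain Java. It is only an illustration under my own assumptions: NodeIndex, Hit and
clusteredSearch are hypothetical names, not Infinispan APIs, the "nodes" are just objects
in one JVM, and the local search is a brute-force scan with Euclidean distance rather
than the actual index from the papers.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: none of these types are Infinispan APIs.
final class ScatterGatherSketch {

    // One search hit: the key of an entry and its distance to the query.
    static final class Hit {
        final String key;
        final double distance;

        Hit(String key, double distance) {
            this.key = key;
            this.distance = distance;
        }
    }

    // Stand-in for one node's local similarity index over its own entries.
    static final class NodeIndex {
        private final List<String> keys = new ArrayList<>();
        private final List<double[]> vectors = new ArrayList<>();

        void put(String key, double[] vector) {
            keys.add(key);
            vectors.add(vector);
        }

        // Brute-force local search; a real index would avoid the full scan.
        List<Hit> search(double[] query, int k) {
            List<Hit> hits = new ArrayList<>();
            for (int i = 0; i < keys.size(); i++) {
                hits.add(new Hit(keys.get(i), euclidean(query, vectors.get(i))));
            }
            hits.sort(Comparator.comparingDouble((Hit h) -> h.distance));
            return new ArrayList<>(hits.subList(0, Math.min(k, hits.size())));
        }
    }

    // Assumes both vectors have the same length.
    static double euclidean(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Send the query to every node, then merge the partial top-k lists
    // into one global top-k list ordered by distance.
    static List<Hit> clusteredSearch(List<NodeIndex> nodes, double[] query, int k) {
        List<Hit> merged = new ArrayList<>();
        for (NodeIndex node : nodes) {
            merged.addAll(node.search(query, k));
        }
        merged.sort(Comparator.comparingDouble((Hit h) -> h.distance));
        return new ArrayList<>(merged.subList(0, Math.min(k, merged.size())));
    }
}
```

Each node only needs to return its own k best hits, because no entry outside a node's
local top k can be part of the global top k.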
The difference is that the indexing and searching mechanism must be custom. So I wanted to
ask what you think about implementing such a feature in Infinispan. I was thinking
about somehow extracting a general API for indexing/searching, so that e.g. our Lucene
search would become one implementation of it.
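
As a rough idea of what such a general API might look like, here is a small Java sketch.
Everything in it is hypothetical: LocalIndex, KeywordIndex and VectorIndex are names I
made up for illustration and nothing like them exists in Infinispan today. KeywordIndex
is a toy that only stands in for the role a Lucene-backed implementation would play; it
does not use Lucene.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical SPI: K = key type, V = indexed value type, Q = query type.
interface LocalIndex<K, V, Q> {
    void index(K key, V value);          // called when an entry is stored
    void remove(K key);                  // called when an entry is removed
    List<K> search(Q query, int limit);  // at most 'limit' matching keys
}

// Toy keyword index: returns keys whose text contains the query string.
final class KeywordIndex implements LocalIndex<String, String, String> {
    private final Map<String, String> docs = new LinkedHashMap<>();

    @Override
    public void index(String key, String value) { docs.put(key, value); }

    @Override
    public void remove(String key) { docs.remove(key); }

    @Override
    public List<String> search(String query, int limit) {
        List<String> keys = new ArrayList<>();
        for (Map.Entry<String, String> e : docs.entrySet()) {
            if (keys.size() < limit && e.getValue().contains(query)) {
                keys.add(e.getKey());
            }
        }
        return keys;
    }
}

// Toy similarity index: ranks keys by squared Euclidean distance to the query.
final class VectorIndex implements LocalIndex<String, double[], double[]> {
    private final Map<String, double[]> vectors = new LinkedHashMap<>();

    @Override
    public void index(String key, double[] value) { vectors.put(key, value); }

    @Override
    public void remove(String key) { vectors.remove(key); }

    @Override
    public List<String> search(double[] query, int limit) {
        List<String> keys = new ArrayList<>(vectors.keySet());
        keys.sort(Comparator.comparingDouble(
                (String k) -> squaredDistance(query, vectors.get(k))));
        return new ArrayList<>(keys.subList(0, Math.min(limit, keys.size())));
    }

    // Assumes both vectors have the same length.
    private static double squaredDistance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }
}
```

The point of the sketch is only that the cache would talk to the LocalIndex interface,
so the similarity index and the existing full-text search could be swapped behind it.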
I would be happy to take this on as a contribution, since I find it an extremely
interesting topic, and I would also like to write my diploma thesis on it.
So here are some questions:
1) Is it doable?
2) Do we want this feature?
3) How to design it/where to start?
Any input is more than welcome :)
Cheers,
Jiri
[1] https://drive.google.com/file/d/0B4sztQSfpi3rRlJBQjJHMkR2LXc/view
[2] https://drive.google.com/file/d/0B4sztQSfpi3rU2p2MV9jRE9iTUk/view
[3] https://drive.google.com/file/d/0B4sztQSfpi3rZUpld24ydzJNclk/view