[infinispan-dev] Flexible indexing - an idea

Manik Surtani manik at jboss.org
Tue Sep 29 05:05:42 EDT 2009


On 29 Sep 2009, at 09:47, Sanne Grinovero wrote:

> IMHO even having just a couple (String mime-type, byte[] mp3)
> makes up a good POJO, and makes it easy to add more info
> you'll likely need in future.

Right, this is what I think as well.  So in the case of web caching,
you may have a byte[] and some metadata (K/V pairs), but what you
would cache is probably a custom object of your own, something like:

@Indexed
class MIMEObject {
	@Field String mimeType;
	byte[] content;
}

I guess the tricky bit would be representing an arbitrary-length  
dictionary of metadata as indexable fields?  Is this what you are  
getting at, Mic?
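
For what it's worth, one way this might work is to flatten the dictionary into dotted field names, one index field per leaf value. Below is a minimal sketch in plain Java. MetadataFlattener and the "meta" prefix are made-up names for illustration, not anything in Hibernate Search or Infinispan, and a real version would presumably sit inside a custom field bridge that adds each name/value pair to the Lucene document.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Hibernate Search or Infinispan.
final class MetadataFlattener {

    private MetadataFlattener() {}

    // Flattens nested metadata into "prefix.key.subkey" -> string value pairs.
    static Map<String, String> flatten(String prefix, Map<String, ?> metadata) {
        Map<String, String> fields = new LinkedHashMap<>();
        collect(prefix, metadata, fields);
        return fields;
    }

    private static void collect(String prefix, Map<String, ?> source, Map<String, String> out) {
        for (Map.Entry<String, ?> entry : source.entrySet()) {
            String name = prefix.isEmpty() ? entry.getKey() : prefix + "." + entry.getKey();
            Object value = entry.getValue();
            if (value instanceof Map) {
                // Nested dictionary (e.g. parsed JSON): recurse with the extended name.
                @SuppressWarnings("unchecked")
                Map<String, ?> nested = (Map<String, ?>) value;
                collect(name, nested, out);
            } else if (value != null) {
                // Leaf value: index its string form; null values are skipped.
                out.put(name, String.valueOf(value));
            }
        }
    }
}
```

The upside of dotted names is that the set of fields does not need to be known up front. The downside is the one Emmanuel raises further down: the queryable field names are no longer visible on the class.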

>
> 2009/9/29 Michael Neale <michael.neale at gmail.com>:
>> yes well I was thinking that.
>>
>> But in the case of web caching, for instance, you tend to have a
>> byte[] and then a mime type - and that is about it.
>> If it was uniform content, sure, could have pojos for everything.
>>
>> Another case, JSON - don't necessarily want to put that into pojos
>> (and in any case, a map of maps would be the closest thing - which
>> isn't really a pojo in that sense anyway). Say if one was building a
>> distributed database *cough* *cough* ;)
>>
>> Yes this would be totally transparent - it's only for certain things
>> (maybe let the user hook into it, but they shouldn't need to worry).
>>
>>
>>
>>
>> On Tue, Sep 29, 2009 at 6:08 PM, Emmanuel Bernard
>> <emmanuel at hibernate.org> wrote:
>>> Question.
>>> Why don't you create an MP3 file and populate it with your metadata and
>>> the byte[] before putting it in the cache?
>>> ie your app is responsible for POJOifying an MP3. Everything's a POJO
>>> or is lame these days.
>>>
>>> If I understand your idea though, I think it could have merit, to do
>>> this transformation internally in HSearch or JBoss Cache, ie to make.
>>> But it has some drawbacks:
>>>  - it must be 100% API transparent to the user, otherwise that's hacky
>>>  - by hiding the POJO aspect, you hide the fields a user can query.
>>>    The user has to read your doc or check this interception layer to
>>>    find out that MP3 has a bpm field
>>>
>>> Emmanuel
>>>
>>> On 29 sept. 09, at 09:50, Michael Neale wrote:
>>>
>>>> Hi All.
>>>>
>>>> I have been looking over the Infinispan query module by Navin.
>>>>
>>>> As this is built on Hibernate Search (and correct me if I'm wrong), the
>>>> indexing happens on POJO fields.
>>>> This is great for most cases, but for my ulterior motive (which
>>>> I will reveal in another email) I would like to deal with certain
>>>> object types differently. So let's take, for instance, a media file
>>>> like an MP3: if I was storing it in the cache, I would know when I go
>>>> to index it that I have an instance of something that has extra data I
>>>> would like to index (ie it's not really a POJO). At that point I can
>>>> extract whatever data out of the "rich" object (metadata, or whatnot)
>>>> and stick that in the Work object for HS to do its thing on (say, based
>>>> on known MIME types, as one instance).
>>>>
>>>> I have tried out something like this, by messing with the
>>>> QueryInterceptor (and the tests):
>>>>
>>>> So I would propose some mechanism to register with the QueryInterceptor
>>>> a surrogate class for indexing purposes (which will only take effect
>>>> when it gets asked). When it calls addToIndexes(value, key), then
>>>> if a surrogate is available it will create it and pass it to
>>>> searchFactory.getWorker().performWork(new Work(surrogate... etc...,
>>>> where the surrogate is created based on the value type (as well as its
>>>> contents), and thus searching will return what I want (as opposed to
>>>> nothing).
>>>>
>>>> Q1: Does this even make sense? Should I just be pushing a "surrogate"
>>>> type object into the cache in the first place (doesn't feel right,
>>>> changing what I would store for the purposes of indexing)?
>>>> Q2: Is there any way we can query heterogeneous caches, ie caches like
>>>> Cache<String, SomeParent> where there are many children of SomeParent
>>>> (so in a query we would declare we are only interested in specific
>>>> instance types)?
>>>>
>>>> Thoughts?
>>>>
>>>>
>>>> --
>>>> Michael D Neale
>>>> home: www.michaelneale.net
>>>> blog: michaelneale.blogspot.com
>>>> _______________________________________________
>>>> infinispan-dev mailing list
>>>> infinispan-dev at lists.jboss.org
>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev

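To make the surrogate proposal quoted above a bit more concrete, here is a rough sketch of the lookup step only, under my own assumptions. SurrogateRegistry, register and toIndexable are invented names, not existing Infinispan or Hibernate Search API, and the hook into the point where QueryInterceptor calls addToIndexes(value, key) is deliberately left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical registry for illustration; none of these names exist in Infinispan.
final class SurrogateRegistry {

    // Maps a cached value type to a function that builds its indexable surrogate.
    private final Map<Class<?>, Function<Object, Object>> factories = new LinkedHashMap<>();

    // Registers a surrogate factory for values of the given type (and its subtypes).
    <T> void register(Class<T> valueType, Function<? super T, ?> factory) {
        factories.put(valueType, value -> factory.apply(valueType.cast(value)));
    }

    // Returns the object to hand to the indexer: the surrogate built by the first
    // matching registration, or the value itself when nothing is registered for it.
    Object toIndexable(Object value) {
        for (Map.Entry<Class<?>, Function<Object, Object>> entry : factories.entrySet()) {
            if (entry.getKey().isInstance(value)) {
                return entry.getValue().apply(value);
            }
        }
        return value;
    }
}
```

The interceptor would then index registry.toIndexable(value) rather than value, so the object stored in the cache stays exactly what the user put there and only the index sees the surrogate.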
--
Manik Surtani
manik at jboss.org
Lead, Infinispan
Lead, JBoss Cache
http://www.infinispan.org
http://www.jbosscache.org






