[infinispan-dev] Flexible indexing - an idea

Michael Neale michael.neale at gmail.com
Tue Sep 29 06:43:29 EDT 2009


Yes, I think you may be right.

Well, there are really two cases, both related to taking in data from
outside (i.e. non-POJO):

1) In the REST interface, when adding data to Infinispan we currently
just grab it all as a byte[] and take the content type from the
header. So basically we store the byte[], the media/MIME type and
(I think) the date, all useful for serving it back up later on.
- In this case the form of the POJO isn't much help for indexing
(e.g. if the byte[] is JSON, it would be nice to know).
Of course, we could just have some POJOs to represent the media types
we care about, and then put those into the cache (with the requisite
info).

2) Heterogeneous data: lots of objects of different types that should
nevertheless look the same for indexing purposes.
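
For case 2, a rough sketch of what I mean by "appear the same": each
value type gets a small extractor that maps it onto one shared set of
field names. All names here (IndexViews, Mp3Track, JsonDoc, "kind",
"title") are made up for illustration; none of this is Infinispan or
Hibernate Search API.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Two unrelated value types that might share one cache (hypothetical).
final class Mp3Track {
    final String title;
    final int bpm;
    Mp3Track(String title, int bpm) { this.title = title; this.bpm = bpm; }
}

final class JsonDoc {
    final String name;
    final String rawJson;
    JsonDoc(String name, String rawJson) { this.name = name; this.rawJson = rawJson; }
}

// Maps each registered type to a function that produces the same flat field names.
final class IndexViews {
    private final Map<Class<?>, Function<Object, Map<String, String>>> extractors =
            new HashMap<>();

    <T> void register(Class<T> type, Function<T, Map<String, String>> extractor) {
        // The cast is safe because viewOf() looks extractors up by exact class.
        extractors.put(type, value -> extractor.apply(type.cast(value)));
    }

    // Returns the common view, or an empty map for unregistered types.
    // Lookup is by exact class, so subclasses need their own registration.
    Map<String, String> viewOf(Object value) {
        Function<Object, Map<String, String>> f = extractors.get(value.getClass());
        if (f == null) {
            return Collections.emptyMap();
        }
        return f.apply(value);
    }

    // Helper that builds the shared field layout.
    static Map<String, String> fields(String kind, String title) {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("kind", kind);
        m.put("title", title);
        return m;
    }
}
```

Registering Mp3Track and JsonDoc with extractors that both go through
fields(...) gives every entry the same "kind" and "title" keys, whatever
its concrete class.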


But I think you are right, this can be addressed via other means (and
the appropriate annotations).
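
For case 1, the "other means" could be as simple as a small wrapper
around what the REST interface stores. A minimal sketch, assuming a
made-up RestEntry class and plain Java only; the Hibernate Search
annotations would go on the class and on the media type field, and
are left out here so the snippet stands alone.

```java
import java.util.Date;
import java.util.Locale;

// Hypothetical wrapper for a REST-stored value: raw bytes, the media type
// taken from the Content-Type header, and a date.
final class RestEntry {
    private final byte[] data;
    private final String mediaType;   // the field one would want indexed
    private final Date lastModified;

    RestEntry(byte[] data, String contentTypeHeader, Date lastModified) {
        this.data = data.clone();
        this.mediaType = normalise(contentTypeHeader);
        this.lastModified = new Date(lastModified.getTime());
    }

    // Drops parameters such as "; charset=utf-8" and lower-cases the type.
    static String normalise(String header) {
        int semi = header.indexOf(';');
        String type = semi >= 0 ? header.substring(0, semi) : header;
        return type.trim().toLowerCase(Locale.ROOT);
    }

    String getMediaType() { return mediaType; }

    byte[] getData() { return data.clone(); }

    Date getLastModified() { return new Date(lastModified.getTime()); }

    // True for application/json and for "+json" structured-suffix types.
    boolean isJson() {
        return mediaType.equals("application/json") || mediaType.endsWith("+json");
    }
}
```

With the media type normalised up front, "is this byte[] JSON?" becomes
a question the entry can answer itself rather than something the
indexing layer has to guess.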



On Tue, Sep 29, 2009 at 7:05 PM, Manik Surtani <manik at jboss.org> wrote:
>
> On 29 Sep 2009, at 09:47, Sanne Grinovero wrote:
>
>> IMHO even having just a couple (String mime-type, byte[] mp3)
>> makes up a good POJO, and makes it easy to add more info
>> you'll likely need in future.
>
> Right, this is what I think as well.  So in the case of web caching,
> you may have a byte[] and some metadata (K/V pairs), but what you
> would cache is probably a custom object of yours- something like:
>
> @Indexed
> class MIMEObject {
>        @Field String mimeType;   // indexed by Hibernate Search
>        byte[] content;           // raw payload, not indexed
> }
>
> I guess the tricky bit would be representing an arbitrary-length
> dictionary of metadata as indexable fields?  Is this what you are
> getting at, Mic?
>
>>
>> 2009/9/29 Michael Neale <michael.neale at gmail.com>:
>>> yes well I was thinking that.
>>>
>>> But in the case of web caching, for instance, you tend to have a
>>> byte[] and then a mime type - and that is about it.
>>> If it was uniform content, sure, could have pojos for everything.
>>>
>>> Another case, JSON - don't necessarily want to put that into pojos
>>> (and in any case, a map of maps would be the closest thing - which
>>> isn't really a pojo in that sense anyway). Say if one was building a
>>> distributed database *cough* *cough* ;)
>>>
>>> Yes, this would be totally transparent - it's only for certain things
>>> (maybe let the user hook into it, but they shouldn't need to worry).
>>>
>>>
>>>
>>>
>>> On Tue, Sep 29, 2009 at 6:08 PM, Emmanuel Bernard
>>> <emmanuel at hibernate.org> wrote:
>>>> Question.
>>>> Why don't you create a MP3 file and populate it with your metadata
>>>> and
>>>> the byte[] before putting it in the cache?
>>>> ie your app is responsible for POJOifying a MP3. Everything's a POJO
>>>> or is lame these days.
>>>>
>>>> If I understand your idea though, I think it could have merit, to do
>>>> this transformation internally in HSearch or JBoss Cache, ie to
>>>> make.
>>>> But it has some drawbacks:
>>>>  - it must be 100% API transparent to the user otherwise that's
>>>> hacky
>>>>  - by hiding the POJO aspect, you hide the fields a user can query.
>>>> It has to read your doc or check this interception layer to find out
>>>> that MP3 has a bpm field
>>>>
>>>> Emmanuel
>>>>
>>>> On 29 sept. 09, at 09:50, Michael Neale wrote:
>>>>
>>>>> Hi All.
>>>>>
>>>>> I have been looking over the Infinispan query module by Navin.
>>>>>
>>>>> As this is built on Hibernate Search - (and correct me if wrong)
>>>>> the
>>>>> indexing happens on POJO fields.
>>>>> This is great for most of the cases, but for my ulterior motive
>>>>> (which
>>>>> I will reveal in another email) I would like to deal with certain
>>>>> object types differently. So lets for instance take a media file
>>>>> like
>>>>> MP3, if I was storing it in the cache - I would know when I go to
>>>>> index it that I have an instance of something that has extra data I
>>>>> would like to index (i.e. it's not really a POJO): at that point I can
>>>>> extract whatever data out of the "rich" object (meta data, or
>>>>> whatnot)
>>>>> and stick that in the Work object for HS to do its thing on (say
>>>>> based
>>>>> on known MIME types, as one instance).
>>>>>
>>>>> I have tried out something like this, by messing with the
>>>>> QueryInterceptor (and the tests):
>>>>>
>>>>> So I would propose some mechanism to register for the
>>>>> QueryInterceptor
>>>>> a surrogate class for indexing purposes (which will only take
>>>>> effect
>>>>> when it gets asked)  - so when it calls addToIndexes(value, key) -
>>>>> then if a surrogate is available it will create it, and pass
>>>>> it to
>>>>> searchFactory.getWorker().performWork(new Work(surrogate...
>>>>> etc... -
>>>>> where the surrogate is created based on the value type (as well
>>>>> as its
>>>>> contents) - and thus searching will return what I want (as
>>>>> opposed to
>>>>> nothing).
>>>>>
>>>>> Q1: Does this even make sense? Should I just be pushing a
>>>>> "surrogate"
>>>>> type object into the cache in the first place (doesn't feel right-
>>>>> changing what I would store for the purposes of indexing)?
>>>>> Q2: Is there any way we can query heterogeneous caches, i.e. caches
>>>>> like
>>>>> Cache<String, SomeParent> where there are many children of
>>>>> SomeParent.
>>>>> (so in a query we would declare we are only interested in specific
>>>>> instances types? )
>>>>>
>>>>> Thoughts?
>>>>>
>>>>>
>>>>> --
>>>>> Michael D Neale
>>>>> home: www.michaelneale.net
>>>>> blog: michaelneale.blogspot.com
>>>>> _______________________________________________
>>>>> infinispan-dev mailing list
>>>>> infinispan-dev at lists.jboss.org
>>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>>>
>>>>
>>>
>>>
>>>
>>>
>>
>
> --
> Manik Surtani
> manik at jboss.org
> Lead, Infinispan
> Lead, JBoss Cache
> http://www.infinispan.org
> http://www.jbosscache.org
>
>
>
>
>



-- 
Michael D Neale
home: www.michaelneale.net
blog: michaelneale.blogspot.com



