<div class="gmail_quote">On Thu, Jul 12, 2012 at 2:15 AM, Sanne Grinovero <span dir="ltr"><<a href="mailto:sanne@hibernate.org" target="_blank">sanne@hibernate.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="HOEnZb"><div class="h5">On 10 July 2012 12:48, Dan Berindei <<a href="mailto:dan.berindei@gmail.com">dan.berindei@gmail.com</a>> wrote:<br>
> On Tue, Jul 10, 2012 at 2:15 PM, Galder Zamarreño <<a href="mailto:galder@redhat.com">galder@redhat.com</a>> wrote:<br>
>><br>
>><br>
>> On Jul 9, 2012, at 9:52 AM, Mircea Markus wrote:<br>
>><br>
>> > On 06/07/2012 22:48, Sanne Grinovero wrote:<br>
>> >> On 6 July 2012 15:06, Galder Zamarreño <<a href="mailto:galder@redhat.com">galder@redhat.com</a>> wrote:<br>
>> >>> On Jun 26, 2012, at 6:13 PM, Sanne Grinovero wrote:<br>
>> >>><br>
>> >>>> Imagine I have a value object which needs to be stored in Infinispan:<br>
>> >>>><br>
>> >>>> class Person {<br>
>> >>>> final String nationality = ...<br>
>> >>>> final String fullName = ...<br>
>> >>>> [constructor]<br>
>> >>>> }<br>
>> >>>><br>
>> >>>> And now let's assume that - as you could expect - most Person<br>
>> >>>> instances have the same value for the nationality String, but a<br>
>> >>>> different name.<br>
>> >>>><br>
>> >>>> I want to define a custom Externalizer for my type, but the current<br>
>> >>>> Externalizer API doesn't allow it to refer to some common application<br>
>> >>>> context, which might be extremely useful to deserialize this Person<br>
>> >>>> instance:<br>
>> >>>><br>
>> >>>> we could avoid filling the memory of my Grid by having multiple<br>
>> >>>> copies<br>
>> >>>> of the nationality String repeated all over, when a String [1] could<br>
>> >>>> be reused.<br>
>> >>>><br>
>> >>>> Would it be a good idea to have the Externalizer instances have an<br>
>> >>>> initialization phase receiving a ComponentRegistry, so I could look<br>
>> >>>> up<br>
>> >>>> some custom service to de-duplicate or otherwise optimize my<br>
>> >>>> in-memory<br>
>> >>>> data representation?<br>
>> >>>> Personally I'd prefer to receive it injected via the constructor so<br>
>> >>>> that I could use a final field when my custom Externalizer is<br>
>> >>>> constructed.<br>
>> >>>><br>
>> >>>> This is OGM related.<br>
>> >>> ^ Makes sense, but only solves one part of the problem.<br>
>> >>><br>
>> >>> String is probably a bad example here [as you already said, due to 1],<br>
>> >>> but a better example is if you have a Nationality class with country name,<br>
>> >>> timezone…etc in it.<br>
>> >>><br>
>> >>> My point is, your suggestion works for nodes to which data is<br>
>> >>> replicated, but in the original node where you've created 100 Person<br>
>> >>> instances for Spanish nationality, you'd still potentially have 100<br>
>> >>> instances.<br>
>> >>><br>
>> >>> Did you have anything in mind for this?<br>
>> >> That's where the ComponentRegistry's role kicks in: the user<br>
>> >> application created these object instances before storing them in the<br>
>> >> original node, and if it is a bit cleverly designed it will have<br>
>> >> something like a Map of immutable Nationality instances, so that every<br>
>> >> time it needs Spanish it looks up the same instance.<br>
>> >><br>
>> >> Consequently the custom externalizer implementation needs access to<br>
>> >> the same service instance as used by the application, so that it can<br>
>> >> make use of the same pool rather than having to create its own pool<br>
>> >> instance: the essence of my proposal is really to have the user<br>
>> >> application and the Externalizer framework to share the same Factory.<br>
>> >><br>
>> >>> Btw, not sure about the need of ComponentRegistry here. IMO, this kind<br>
>> >>> of feature should work for Hot Rod clients too, where Externalizers might be<br>
>> >>> used in the future, and where there's no ComponentRegistry (unless it's a<br>
>> >>> RemoteCacheStore...)<br>
>> >> It doesn't need to be literally a ComponentRegistry interface<br>
>> >> implementation, just anything which allows the Externalizer to be<br>
>> >> initialized using some externally provided service as in the above<br>
>> >> example.<br>
>> >><br>
>> >> This optimisation should have no functional impact but just an<br>
>> >> optionally implementable trick which saves some memory.. so if we can<br>
>> >> think of a way to do the same for Hot Rod that's very cool but doesn't<br>
>> >> necessarily have to use the same components and (internal) interfaces.<br>
>> >><br>
>> >> I'm thinking of this as a similar "optionality" as we have when<br>
>> >> choosing between Serializable vs. custom Externalizers : people can<br>
>> >> plug one in if they know what they're doing (like these instances<br>
>> >> should definitely be immutable) but everything just works fine if you<br>
>> >> don't.<br>
>> >> I'm not really sure if there is a wide range of applications, nor have<br>
>> >> any idea of the amount of memory it could save in practice... just an<br>
>> >> idea I wanted to sketch.<br>
>> > I think this might be quite useful; the flyweight pattern[1] was<br>
>> > created to solve exactly this kind of *existing* problem.<br>
>> > Just as a note, there is a simple, not necessarily nice, workaround for<br>
>> > this: make the object pool statically accessible (or even better Enums).<br>
>><br>
>> It's wise to avoid static object pools, cos they can lead to classloader<br>
>> leak issues. Enums might be better...<br>
>><br>
><br>
> Sanne already mentioned in another email that OGM doesn't know the actual<br>
> data type at compile time, so switching to an enum is definitely not an<br>
> option.<br>
<br>
</div></div>+1, thanks.<br>
<div class="im"><br>
> Although it might work well enough when you know the fields ahead of time, a<br>
> single static cache does seem a bit simplistic for the general case. I think<br>
> in general you'd want a cache per field, e.g. so that you can give up on<br>
> caching once there are too many different values for that field.<br>
<br>
</div>Not sure what you mean by fields. I'm not intending to specify how<br>
such a component would need to be designed, what I'd like is to be<br>
able to access my application-provided services from a custom<br>
Externalizer implementation. I would then be able to do something<br>
clever, but leaving clever details to what is most suited for the<br>
application, so I don't think Infinispan should try to enforce any logic,<br>
just expose the integration points.<br>
<br></blockquote><div><br>I meant a regular Java field, since that's what the Externalizer deals with. But what I had in mind was a generic Externalizer for user-supplied classes (registered at runtime), so the externalizer would need to get the field metadata from a central registry and, based on the current conditions, decide whether to cache the deserialized value or not.<br>
<br>I think we all agree that Infinispan should not be concerned about how exactly this will be implemented. The discussion seems to be around whether there really is a need for such a smarter externalizer.<br><br> </div>
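<div>To make the "cache per field" idea concrete, here is a minimal sketch of a pool that hands back one shared instance per distinct value and gives up once a field has too many distinct values. <code>FieldValuePool</code>, <code>canonicalize</code> and <code>maxDistinct</code> are hypothetical names made up for illustration, not part of any Infinispan API, and the sketch handles only String values for brevity.<br><br></div>
<pre>
```java
import java.util.Arrays;

/** Hypothetical per-field value pool; NOT part of the Infinispan API. */
final class FieldValuePool {
    private final String[] table;   // open-addressing slots, null means empty
    private final int maxDistinct;  // distinct values tolerated before giving up
    private int size;
    private boolean disabled;

    FieldValuePool(int maxDistinct) {
        if (maxDistinct < 0) {
            throw new IllegalArgumentException("maxDistinct must not be negative");
        }
        this.maxDistinct = maxDistinct;
        // Twice the capacity keeps at least half the slots empty,
        // so the probing loop below always reaches an empty slot.
        this.table = new String[Math.max(4, maxDistinct * 2)];
    }

    /** Returns the pooled instance equal to value, or value itself once pooling is off. */
    synchronized String canonicalize(String value) {
        if (disabled || value == null) {
            return value;
        }
        int i = (value.hashCode() & 0x7fffffff) % table.length;
        while (table[i] != null) {
            if (table[i].equals(value)) {
                return table[i];    // reuse the instance seen earlier
            }
            i = (i + 1) % table.length;
        }
        if (size >= maxDistinct) {
            // Too many distinct values: this field is not worth caching.
            disabled = true;
            Arrays.fill(table, null);
            return value;
        }
        table[i] = value;
        size++;
        return value;
    }

    synchronized boolean isDisabled() {
        return disabled;
    }
}
```
</pre>
<div>A generic externalizer could keep one such pool per field and pass each deserialized value through <code>canonicalize</code>; equality is unaffected, only the number of retained instances changes.<br><br></div>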
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
To talk specifics, I wouldn't do this per user-type fields: as you say<br>
I might have too many, my "cache" would need complex eviction logic.<br>
But I know that some specific fields are all very likely the same;<br>
think of "table name" for example, when storing the field "to which<br>
table name this entry is related to", as column names and relation<br>
roles.. so not the values, but still a good boost and likely more than<br>
halving the memory overhead as for each entry we have more meta-data<br>
stuff than actual user values.<br>
<br></blockquote><div><br>The table name example isn't convincing enough; I think String.intern() would actually be a great fit here, as you don't really need eviction :)<br><br><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
>> > [1] <a href="http://en.wikipedia.org/wiki/Flyweight_pattern" target="_blank">http://en.wikipedia.org/wiki/Flyweight_pattern</a><br>
<br>
Exactly.<br>
<div class="im HOEnZb"><br>
>> >> I suspect it might allow me to do some cool things with both OGM and<br>
>> >> Lucene Directory, as you can rehydrate complex object graphs from<br>
>> >> different cache entries, reassembling them with direct references...<br>
>> >> dreaming?<br>
>> >><br>
>> >>>> Cheers,<br>
>> >>>> Sanne<br>
>> >>>><br>
>> >>>><br>
>> >>>> 1 - or any immutable object: I'm using String as an example so let's<br>
>> >>>> forget about the static String pool optimizations the JVM might<br>
>> >>>> enable..<br>
><br></div></blockquote></div>