On Thu, Jul 12, 2012 at 2:15 AM, Sanne Grinovero <sanne(a)hibernate.org> wrote:
On 10 July 2012 12:48, Dan Berindei <dan.berindei(a)gmail.com> wrote:
> On Tue, Jul 10, 2012 at 2:15 PM, Galder Zamarreño <galder(a)redhat.com> wrote:
>>
>>
>> On Jul 9, 2012, at 9:52 AM, Mircea Markus wrote:
>>
>> > On 06/07/2012 22:48, Sanne Grinovero wrote:
>> >> On 6 July 2012 15:06, Galder Zamarreño <galder(a)redhat.com> wrote:
>> >>> On Jun 26, 2012, at 6:13 PM, Sanne Grinovero wrote:
>> >>>
>> >>>> Imagine I have a value object which needs to be stored in Infinispan:
>> >>>>
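>> >>>> class Person {
>> >>>>     final String nationality;
>> >>>>     final String fullName;
>> >>>>
>> >>>>     Person(String nationality, String fullName) {
>> >>>>         this.nationality = nationality;
>> >>>>         this.fullName = fullName;
>> >>>>     }
>> >>>> }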
>> >>>>
>> >>>> And now let's assume that, as you might expect, most Person
>> >>>> instances have the same value for the nationality String, but a
>> >>>> different name.
>> >>>>
>> >>>> I want to define a custom Externalizer for my type, but the current
>> >>>> Externalizer API doesn't allow referring to some common application
>> >>>> context, which might be extremely useful to deserialize this Person
>> >>>> instance:
>> >>>>
>> >>>> we could avoid filling the memory of my Grid with multiple copies
>> >>>> of the nationality String repeated all over, when a single String [1]
>> >>>> could be reused.
>> >>>>
>> >>>> Would it be a good idea to have the Externalizer instances go through
>> >>>> an initialization phase receiving a ComponentRegistry, so I could look
>> >>>> up some custom service to de-duplicate or otherwise optimize my
>> >>>> in-memory data representation?
>> >>>> Personally I'd prefer to receive it injected via the constructor so
>> >>>> that I could use a final field when my custom Externalizer is
>> >>>> constructed.
>> >>>>
>> >>>> This is OGM related.
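A minimal sketch of the proposal in Java. `ValueExternalizer`, `Interner` and `PersonExternalizer` are hypothetical names: the interface only mirrors the write/read shape of an Infinispan Externalizer and is not its real API, and the interner stands in for whatever de-duplication service the application owns. The service arrives through the constructor and is kept in a final field.

```java
import java.io.*;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical stand-in for an externalizer contract (NOT the real Infinispan API):
// it only mirrors the write/read shape discussed in the thread.
interface ValueExternalizer<T> {
    void writeObject(ObjectOutput output, T object) throws IOException;

    T readObject(ObjectInput input) throws IOException;
}

// Hypothetical application-provided service that canonicalizes equal values.
final class Interner<T> {
    private final ConcurrentMap<T, T> pool = new ConcurrentHashMap<>();

    T intern(T value) {
        T existing = pool.putIfAbsent(value, value);
        return existing == null ? value : existing;
    }
}

final class Person {
    final String nationality;
    final String fullName;

    Person(String nationality, String fullName) {
        this.nationality = nationality;
        this.fullName = fullName;
    }
}

// The service is injected via the constructor, so it can live in a final field.
final class PersonExternalizer implements ValueExternalizer<Person> {
    private final Interner<String> nationalities;

    PersonExternalizer(Interner<String> nationalities) {
        this.nationalities = nationalities;
    }

    @Override
    public void writeObject(ObjectOutput output, Person person) throws IOException {
        output.writeUTF(person.nationality);
        output.writeUTF(person.fullName);
    }

    @Override
    public Person readObject(ObjectInput input) throws IOException {
        // De-duplicate on read: equal nationalities resolve to one shared instance.
        String nationality = nationalities.intern(input.readUTF());
        return new Person(nationality, input.readUTF());
    }
}
```

Reading two entries that both carry "Spanish" would then yield Person objects sharing one nationality String, while the write side stays unchanged.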
>> >>> ^ Makes sense, but only solves one part of the problem.
>> >>>
>> >>> String is probably a bad example here [as you already said, due to 1],
>> >>> but a better example would be a Nationality class with country name,
>> >>> timezone, etc. in it.
>> >>>
>> >>> My point is, your suggestion works for nodes to which data is
>> >>> replicated, but on the original node where you've created 100 Person
>> >>> instances for Spanish nationality, you'd still potentially have 100
>> >>> Nationality instances.
>> >>>
>> >>> Did you have anything in mind for this?
>> >> That's where the ComponentRegistry's role kicks in: the user
>> >> application created these object instances before storing them in the
>> >> original node, and if it is designed a bit cleverly it will have
>> >> something like a Map of immutable Nationality instances, so that every
>> >> time it needs Spanish it looks up the same instance.
>> >>
>> >> Consequently, the custom externalizer implementation needs access to
>> >> the same service instance as used by the application, so that it can
>> >> make use of the same pool rather than having to create its own pool
>> >> instance: the essence of my proposal is really to have the user
>> >> application and the Externalizer framework share the same Factory.
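The shared-factory idea could be sketched like this, using Galder's Nationality example. `NationalityRegistry`, `Resident` and `NationalityReader` are hypothetical names; the point is only that the application (creating entries on the originating node) and the externalizer (reading them elsewhere) hold the very same registry, so both sides resolve "Spain" to one instance.

```java
import java.io.*;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Immutable value type, as in Galder's example (fields are illustrative).
final class Nationality {
    final String countryName;
    final String timezone;

    Nationality(String countryName, String timezone) {
        this.countryName = countryName;
        this.timezone = timezone;
    }
}

// Hypothetical application-level factory: one canonical instance per country.
// Both the application and its custom externalizer would hold this same object.
final class NationalityRegistry {
    private final ConcurrentMap<String, Nationality> byCountry = new ConcurrentHashMap<>();

    // Keyed by country name only: the first timezone seen for a country wins
    // (a simplification made for this sketch).
    Nationality of(String countryName, String timezone) {
        return byCountry.computeIfAbsent(countryName, name -> new Nationality(name, timezone));
    }

    int size() {
        return byCountry.size();
    }
}

final class Resident {
    final Nationality nationality;
    final String fullName;

    Resident(Nationality nationality, String fullName) {
        this.nationality = nationality;
        this.fullName = fullName;
    }
}

// Read side of a hypothetical externalizer, holding the application's registry.
final class NationalityReader {
    private final NationalityRegistry registry;

    NationalityReader(NationalityRegistry registry) {
        this.registry = registry;
    }

    Nationality read(DataInput in) throws IOException {
        String countryName = in.readUTF();
        String timezone = in.readUTF();
        return registry.of(countryName, timezone);
    }
}
```

Creating 100 Spanish residents through `registry.of(...)` leaves a single Nationality object on the originating node, and the reader reuses that same object when deserializing.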
>> >>
>> >>> Btw, not sure about the need for a ComponentRegistry here. IMO, this
>> >>> kind of feature should work for Hot Rod clients too, where
>> >>> Externalizers might be used in the future, and where there's no
>> >>> ComponentRegistry (unless it's a RemoteCacheStore...)
>> >> It doesn't need to be literally a ComponentRegistry interface
>> >> implementation, just anything which allows the Externalizer to be
>> >> initialized using some externally provided service as in the above
>> >> example.
>> >>
>> >> This optimisation should have no functional impact; it's just an
>> >> optionally implementable trick which saves some memory, so if we can
>> >> think of a way to do the same for Hot Rod that's very cool, but it
>> >> doesn't necessarily have to use the same components and (internal)
>> >> interfaces.
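Since it need not literally be a ComponentRegistry, here is a sketch of a lookup abstraction that an embedded node and a Hot Rod client could both provide. `ServiceLookup`, `MapServiceLookup`, `ExternalizerRegistration`, `CountryPool` and `CountryExternalizer` are all hypothetical; an embedded node might back the lookup with its component registry, a client with a plain map.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical lookup abstraction: anything that can hand out application services.
interface ServiceLookup {
    <T> T lookup(Class<T> type);
}

// Simple map-backed implementation, usable where no component registry exists.
final class MapServiceLookup implements ServiceLookup {
    private final Map<Class<?>, Object> services = new HashMap<>();

    <T> MapServiceLookup register(Class<T> type, T service) {
        services.put(type, service);
        return this;
    }

    @Override
    public <T> T lookup(Class<T> type) {
        Object service = services.get(type);
        if (service == null) {
            throw new IllegalStateException("No service registered for " + type.getName());
        }
        return type.cast(service);
    }
}

// Hypothetical registration: the framework calls the factory once with its lookup,
// so the externalizer can still keep the service in a final field.
final class ExternalizerRegistration<E> {
    private final Function<ServiceLookup, E> factory;

    ExternalizerRegistration(Function<ServiceLookup, E> factory) {
        this.factory = factory;
    }

    E create(ServiceLookup lookup) {
        return factory.apply(lookup);
    }
}

// Placeholder for an application service (e.g. a Nationality pool).
final class CountryPool {
}

final class CountryExternalizer {
    final CountryPool pool;

    CountryExternalizer(CountryPool pool) {
        this.pool = pool;
    }
}
```

The externalizer only depends on `ServiceLookup`, so the same class could be registered on a node or in a client; a missing service fails fast at creation time rather than during deserialization.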
>> >>
>> >> I'm thinking of this as the same kind of "optionality" we have when
>> >> choosing between Serializable and custom Externalizers: people can
>> >> plug one in if they know what they're doing (e.g. these instances
>> >> should definitely be immutable), but everything just works fine if
>> >> they don't.
>> >> I'm not really sure if there is a wide range of applications, nor do I
>> >> have any idea of the amount of memory it could save in practice... just
>> >> an idea I wanted to sketch.
>> > I think this might be quite useful; the flyweight pattern[1] was
>> > created to solve exactly this kind of *existing* problem.
>> > Just as a note, there is a simple, not necessarily nice, workaround for
>> > this: make the object pool statically accessible (or, even better, use
>> > Enums).
>>
>> It's wise to avoid static object pools, cos they can lead to classloader
>> leak issues. Enums might be better...
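For a closed set of values known at compile time, the enum workaround might look like the following sketch (the `Country` constants are illustrative). Writing the constant's name and resolving it with `Enum.valueOf` on read always yields the one canonical constant, with no pool to manage; as Dan points out, though, this only helps when the types are known ahead of time.

```java
import java.io.*;

// Illustrative closed set of values: each constant is a single canonical instance.
enum Country {
    SPAIN("Europe/Madrid"),
    ITALY("Europe/Rome");

    final String timezone;

    Country(String timezone) {
        this.timezone = timezone;
    }

    // Write only the constant's name...
    static void write(DataOutput out, Country country) throws IOException {
        out.writeUTF(country.name());
    }

    // ...and resolve it back to the canonical constant on read.
    static Country read(DataInput in) throws IOException {
        return Country.valueOf(in.readUTF());
    }
}
```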
>>
>
> Sanne already mentioned in another email that OGM doesn't know the actual
> data type at compile time, so switching to an enum is definitely not an
> option.
+1, thanks.
> Although it might work well enough when you know the fields ahead of time,
> a single static cache does seem a bit simplistic for the general case. I
> think in general you'd want a cache per field, e.g. so that you can give up
> on caching once there are too many different values for that field.
Not sure what you mean by fields. I'm not intending to specify how
such a component would need to be designed; what I'd like is to be
able to access my application-provided services from a custom
Externalizer implementation. I would then be able to do something
clever, leaving the clever details to whatever is most suited to the
application, so I don't think Infinispan should try to enforce any logic,
just expose the integration points.
I meant a regular Java field, since that's what the Externalizer deals
with. But what I had in mind was a generic Externalizer for user-supplied
classes (registered at runtime), so the externalizer would need to get the
field metadata from a central registry and, based on the current
conditions, decide whether to cache the deserialized value or not.
I think we all agree that Infinispan should not be concerned about how
exactly this will be implemented. The discussion seems to be around whether
there really is a need for such a smarter externalizer.
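The per-field cache could be sketched as a bounded interner per field: values are canonicalized until the field has shown `maxDistinct` different values, after which new values are passed through uncached. `FieldInterner`, `FieldInterners` and the threshold are hypothetical, and the field key (e.g. "Person.nationality") would come from whatever central metadata registry the generic externalizer uses.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical per-field interner: caches values until the field proves to have
// too many distinct values, after which new values are returned uncached.
final class FieldInterner<T> {
    private final ConcurrentMap<T, T> canonical = new ConcurrentHashMap<>();
    private final int maxDistinct;

    FieldInterner(int maxDistinct) {
        this.maxDistinct = maxDistinct;
    }

    T intern(T value) {
        T existing = canonical.get(value);
        if (existing != null) {
            return existing;
        }
        // The bound is approximate under concurrent use (check-then-insert).
        if (canonical.size() >= maxDistinct) {
            // Too many distinct values: caching this field no longer pays off.
            return value;
        }
        existing = canonical.putIfAbsent(value, value);
        return existing == null ? value : existing;
    }

    int cachedCount() {
        return canonical.size();
    }
}

// One interner per field, looked up by a field key such as "Person.nationality".
final class FieldInterners {
    private final ConcurrentMap<String, FieldInterner<Object>> byField = new ConcurrentHashMap<>();
    private final int maxDistinct;

    FieldInterners(int maxDistinct) {
        this.maxDistinct = maxDistinct;
    }

    FieldInterner<Object> forField(String fieldKey) {
        return byField.computeIfAbsent(fieldKey, key -> new FieldInterner<>(maxDistinct));
    }
}
```

Giving up instead of evicting keeps the logic trivial: low-cardinality fields such as a nationality stay fully shared, while high-cardinality ones stop growing the cache.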
To talk specifics, I wouldn't do this per user-type field: as you say,
I might have too many, and my "cache" would need complex eviction logic.
But I know that some specific fields are all very likely the same;
think of "table name", for example, when storing the field "which
table this entry is related to", as well as column names and relation
roles... so not the values, but still a good boost and likely more than
halving the memory overhead, as for each entry we have more metadata
than actual user values.
The table name example isn't convincing enough; I think String.intern()
would actually be a great fit here, as you don't really need eviction :)
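The `String.intern()` suggestion applied to a table-name field, as a minimal sketch; `TableNameCodec` is a hypothetical helper, and the field is assumed to be written with `writeUTF`. `intern()` returns the JVM's canonical instance for equal strings, so table names read from different entries end up sharing one String without any eviction logic of our own.

```java
import java.io.*;

// Hypothetical helper for metadata fields such as a table name.
final class TableNameCodec {
    private TableNameCodec() {
    }

    static void write(DataOutput out, String tableName) throws IOException {
        out.writeUTF(tableName);
    }

    static String read(DataInput in) throws IOException {
        // intern() returns the canonical String for this value, so equal
        // table names read from different entries share one instance.
        return in.readUTF().intern();
    }
}
```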
> > [1] http://en.wikipedia.org/wiki/Flyweight_pattern
Exactly.
>> >> I suspect it might allow me to do some cool things with both OGM and
>> >> Lucene Directory, as you can rehydrate complex object graphs from
>> >> different cache entries, reassembling them with direct references...
>> >> dreaming?
>> >>
>> >>>> Cheers,
>> >>>> Sanne
>> >>>>
>> >>>>
>> >>>> 1 - or any immutable object: I'm using String as an example, so
>> >>>> let's forget about the static String pool optimizations the JVM
>> >>>> might enable.
>