[infinispan-dev] Providing a context for object de-serialization

Fri Jul 13 04:07:31 EDT 2012

On Thu, Jul 12, 2012 at 2:15 AM, Sanne Grinovero <sanne at hibernate.org> wrote:

> On 10 July 2012 12:48, Dan Berindei <dan.berindei at gmail.com> wrote:
> > On Tue, Jul 10, 2012 at 2:15 PM, Galder Zamarreño <galder at redhat.com> wrote:
> >>
> >>
> >> On Jul 9, 2012, at 9:52 AM, Mircea Markus wrote:
> >>
> >> > On 06/07/2012 22:48, Sanne Grinovero wrote:
> >> >> On 6 July 2012 15:06, Galder Zamarreño <galder at redhat.com> wrote:
> >> >>> On Jun 26, 2012, at 6:13 PM, Sanne Grinovero wrote:
> >> >>>
> >> >>>> Imagine I have a value object which needs to be stored in
> >> >>>> Infinispan:
> >> >>>>
> >> >>>> class Person {
> >> >>>>   final String nationality = ...
> >> >>>>   final String fullName = ...
> >> >>>> [constructor]
> >> >>>> }
> >> >>>>
> >> >>>> And now let's assume that - as you could expect - most Person
> >> >>>> instances have the same value for the nationality String, but a
> >> >>>> different name.
> >> >>>>
> >> >>>> I want to define a custom Externalizer for my type, but the current
> >> >>>> Externalizer API doesn't allow to refer to some common application
> >> >>>> context, which might be extremely useful to deserialize this Person
> >> >>>> instance:
> >> >>>>
> >> >>>> we could avoid filling the memory of my Grid by having multiple
> >> >>>> copies of the nationality String repeated all over, when a String [1]
> >> >>>> could be reused.
> >> >>>>
> >> >>>> Would it be a good idea to have the Externalizer instances have an
> >> >>>> initialization phase receiving a ComponentRegistry, so I could look
> >> >>>> up
> >> >>>> some custom service to de-duplicate or otherwise optimize my
> >> >>>> in-memory
> >> >>>> data representation?
> >> >>>> Personally I'd prefer to receive it injected via the constructor so
> >> >>>> that I could use a final field when my custom Externalizer is
> >> >>>> constructed.
> >> >>>>
> >> >>>> This is OGM related.
> >> >>> ^ Makes sense, but only solves one part of the problem.
> >> >>>
> >> >>> String is probably a bad example here [as you already said, due to
> >> >>> 1], but a better example is if you have a Nationality class with
> >> >>> country name, timezone, etc. in it.
> >> >>>
> >> >>> My point is, your suggestion works for nodes to which data is
> >> >>> replicated, but in the original node where you've created 100 Person
> >> >>> instances for Spanish nationality, you'd still potentially have 100
> >> >>> instances.
> >> >>>
> >> >>> Did you have anything in mind for this?
> >> >> That's where the ComponentRegistry's role kicks in: the user
> >> >> application created these object instances before storing them in the
> >> >> original node, and if it is a bit cleverly designed it will have
> >> >> something like a Map of immutable Nationality instances, so that every
> >> >> time it needs Spanish it looks up the same instance.
> >> >>
> >> >> Consequently the custom externalizer implementation needs access to
> >> >> the same service instance as used by the application, so that it can
> >> >> make use of the same pool rather than having to create its own pool
> >> >> instance: the essence of my proposal is really to have the user
> >> >> application and the Externalizer framework share the same Factory.
> >> >>
> >> >>> Btw, not sure about the need of ComponentRegistry here. IMO, this kind
> >> >>> of feature should work for Hot Rod clients too, where Externalizers
> >> >>> might be used in the future, and where there's no ComponentRegistry
> >> >>> (unless it's a RemoteCacheStore...)
> >> >> It doesn't need to be literally a ComponentRegistry interface
> >> >> implementation, just anything which allows the Externalizer to be
> >> >> initialized using some externally provided service as in the above
> >> >> example.
> >> >>
> >> >> This optimisation should have no functional impact: it is just an
> >> >> optionally implementable trick which saves some memory. So if we can
> >> >> think of a way to do the same for Hot Rod that's very cool, but it
> >> >> doesn't necessarily have to use the same components and (internal)
> >> >> interfaces.
> >> >>
> >> >> I'm thinking of this as a similar "optionality" as we have when
> >> >> choosing between Serializable vs. custom Externalizers : people can
> >> >> plug one in if they know what they're doing (like these instances
> >> >> should definitely be immutable) but everything just works fine if you
> >> >> don't.
> >> >> I'm not really sure if there is a wide range of applications, nor have
> >> >> any idea of the amount of memory it could save in practice... just an
> >> >> idea I wanted to sketch.
> >> > I think this might be quite useful; the flyweight pattern[1] was
> >> > created to solve exactly this kind of *existing* problem.
> >> > Just as a note, there is a simple, not necessarily nice, workaround for
> >> > this: make the object pool statically accessible (or even better,
> >> > Enums).
> >>
> >> It's wise to avoid static object pools, cos they can lead to classloader
> >> leak issues. Enums might be better...
> >>
> >
> > Sanne already mentioned in another email that OGM doesn't know the actual
> > data type at compile time, so switching to an enum is definitely not an
> > option.
>
> +1, thanks.
>
> > Although it might work well enough when you know the fields ahead of time,
> > a single static cache does seem a bit simplistic for the general case. I
> > think in general you'd want a cache per field, e.g. so that you can give up
> > on caching once there are too many different values for that field.
>
> Not sure what you mean by fields. I'm not intending to specify how
> such a component would need to be designed, what I'd like is to be
> able to access my application-provided services from a custom
> Externalizer implementation. I would then be able to do something
> clever, but leaving clever details to what is most suited for the
> application, so I don't think Infinispan should try enforce any logic,
> just expose the integration points.
>
>
I meant a regular Java field, since that's what the Externalizer deals
with. But what I had in mind was a generic Externalizer for user-supplied
classes (registered at runtime): the externalizer would need to get the
field metadata from a central registry and, based on the current
conditions, decide whether to cache the deserialized value or not.
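
To make the "cache per field" idea a bit more concrete, here is a rough
sketch of what such a per-field cache might look like. FieldValueCache,
canonicalize and the size limit are names I'm making up for illustration;
nothing like this exists in Infinispan, and a real version would need more
thought about concurrency and eviction:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, one instance per (class, field) pair.
// It hands back a shared instance for equal values, and stops
// caching once the field turns out to have too many distinct values.
final class FieldValueCache<V> {
    private final int maxDistinctValues;
    private final Map<V, V> canonical = new HashMap<V, V>();
    private boolean givenUp;

    FieldValueCache(int maxDistinctValues) {
        this.maxDistinctValues = maxDistinctValues;
    }

    // Returns the previously seen equal instance if there is one,
    // otherwise remembers and returns the given value.
    synchronized V canonicalize(V value) {
        if (value == null || givenUp) {
            return value;
        }
        V existing = canonical.get(value);
        if (existing != null) {
            return existing;
        }
        canonical.put(value, value);
        if (canonical.size() > maxDistinctValues) {
            // Too many different values: caching is not paying off.
            givenUp = true;
            canonical.clear();
        }
        return value;
    }

    synchronized boolean hasGivenUp() {
        return givenUp;
    }
}
```

The generic externalizer would then call canonicalize() on each field value
right after reading it, assuming the values are immutable and implement
equals/hashCode properly.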

I think we all agree that Infinispan should not be concerned about how
exactly this will be implemented. The discussion seems to be around whether
there really is a need for such a smarter externalizer.

> To talk specifics, I wouldn't do this per user-type fields: as you say
> I might have too many, my "cache" would need complex eviction logic.
> But I know that some specific fields are all very likely the same;
> think of "table name" for example, when storing the field "which
> table this entry is related to", as column names and relation
> roles.. so not the values, but still a good boost and likely more than
> halving the memory overhead as for each entry we have more meta-data
> stuff than actual user values.
>
>
The table name example isn't convincing enough; I think String.intern()
would actually be a great fit here as you don't really need eviction :)
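
Something along these lines, as a sketch only. EntryKey and
EntryKeySerializer are hypothetical names, and I'm using plain java.io
methods with the same writeObject/readObject shape as an externalizer so
the example doesn't depend on any Infinispan interface:

```java
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectOutput;

// Hypothetical value type: many entries share the same table name.
final class EntryKey {
    final String tableName;
    final String id;

    EntryKey(String tableName, String id) {
        this.tableName = tableName;
        this.id = id;
    }
}

// Hypothetical serializer, shaped like an externalizer.
final class EntryKeySerializer {
    static void writeObject(ObjectOutput out, EntryKey key) throws IOException {
        out.writeUTF(key.tableName);
        out.writeUTF(key.id);
    }

    static EntryKey readObject(ObjectInput in) throws IOException {
        // readUTF() creates a fresh String every time; intern() swaps it
        // for the JVM-wide canonical copy, so equal table names end up
        // sharing one instance.
        String tableName = in.readUTF().intern();
        // The id is different for every entry, so it is not interned.
        String id = in.readUTF();
        return new EntryKey(tableName, id);
    }
}
```

This only helps for Strings, of course, and only for fields with a small
set of possible values.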

> >> > [1] http://en.wikipedia.org/wiki/Flyweight_pattern
>
> Exactly.
>
> >> >> I suspect it might allow me to do some cool things with both OGM and
> >> >> Lucene Directory, as you can rehydrate complex object graphs from
> >> >> different cache entries, reassembling them with direct references...
> >> >> dreaming?
> >> >>
> >> >>>> Cheers,
> >> >>>> Sanne
> >> >>>>
> >> >>>>
> >> >>>> 1 - or any immutable object: I'm using String as an example so let's
> >> >>>> forget about the static String pool optimizations the JVM might
> >> >>>> enable..
> >
>