[infinispan-dev] Compressing Marshaller Wrapper

Galder Zamarreno galder at redhat.com
Fri Feb 26 07:35:23 EST 2010


On Fri, 26 Feb 2010 12:36:57 +0100, philippe van dyck <pvdyck at gmail.com>  
wrote:

> Thanks for the reentrant scenario, Galder.
>
> https://jira.jboss.org/jira/browse/ISPN-357 is now closed.
>
> If the Marshaller is used for something other than storing cache entries,
> I don't think it is a good idea to implement compression at this level.

To clarify, the marshaller is used for, well, marshalling (and
unmarshalling) :) and marshalling is needed in the following use cases:
- Marshal/unmarshal objects to wire format for sending them to other
nodes in the cluster.
- Marshal/unmarshal objects to wire format for storing them in cache
stores.
- Marshal/unmarshal objects to wire format for storing them as byte[] in
the cache. This enables lazy deserialization.
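To make the third use case concrete, here is a minimal sketch of lazy deserialization, assuming plain java.io serialization as a stand-in for Infinispan's marshaller. The class LazyValue is hypothetical and is not Infinispan's MarshalledValue; it only shows the idea of keeping the byte[] form around and deserializing it on first access.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical holder illustrating lazy deserialization: the value is kept
// in serialized byte[] form and only turned back into an object the first
// time someone asks for it. Not Infinispan's MarshalledValue.
final class LazyValue {
    private final byte[] raw;   // serialized form, as held in the cache
    private Object instance;    // deserialized form, created on demand
    private boolean decoded;    // true once raw has been deserialized

    LazyValue(Serializable value) {
        this.raw = serialize(value);
    }

    // The raw form can be forwarded or stored without deserializing.
    byte[] rawBytes() {
        return raw.clone();
    }

    synchronized boolean isDecoded() {
        return decoded;
    }

    // Deserialization cost is paid only on first access.
    synchronized Object get() {
        if (!decoded) {
            instance = deserialize(raw);
            decoded = true;
        }
        return instance;
    }

    static byte[] serialize(Serializable value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static Object deserialize(byte[] raw) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(raw))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A node that only relays or stores the entry can work with rawBytes() and never pay for get().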

Now, we use the same marshaller instance for all 3 use cases, which
partly explains why the API may not be as easy to use at first glance.
Some of the methods are oriented towards reading from streams, such
as file streams, whereas others simply transform everything to byte[]. As
Manik said, this is a bit of legacy API coming from the JBC days. I do
remember looking at it and wondering whether it could be simplified
somehow, but I didn't look into it too much since it's mostly an internal
API. This is something that might make sense doing at some point. I don't
think it's urgent though.

>
> Compression is CPU-intensive, and it may be a good idea to "prepare"
> entries in memory (with a low-priority thread), e.g. by adding a
> "compressed" flag to a cache entry.
> This way, they are ready for storage or transfer: they consume less
> memory, but they cost much more to use (decompression time).
>
> In fact, it is a very old tradeoff, and IMO if compression is to be
> integrated into Infinispan, it belongs at a higher level -- and that's
> another discussion.
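The "compressed flag" idea above could look roughly like this minimal sketch, assuming java.util.zip's Deflater/Inflater. CompressibleEntry and its methods are hypothetical, not part of Infinispan's cache entry API: a low-priority background task would call compress(), and every later read pays the decompression cost in value().

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical cache entry that a background task can compress in place.
// The flag tells readers whether they must inflate the data before use.
final class CompressibleEntry {
    private byte[] data;
    private boolean compressed;

    CompressibleEntry(byte[] data) {
        this.data = data.clone();
    }

    synchronized boolean isCompressed() {
        return compressed;
    }

    synchronized int storedSize() {
        return data.length;
    }

    // Meant to run on a low-priority thread: trades CPU now for less memory.
    synchronized void compress() {
        if (compressed) {
            return;
        }
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        try {
            deflater.setInput(data);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!deflater.finished()) {
                int n = deflater.deflate(buf);
                out.write(buf, 0, n);
            }
            data = out.toByteArray();
            compressed = true;
        } finally {
            deflater.end();
        }
    }

    // Every read of a compressed entry pays the decompression cost.
    synchronized byte[] value() {
        if (!compressed) {
            return data.clone();
        }
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(data);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalStateException("truncated or invalid data");
                }
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalStateException(e);
        } finally {
            inflater.end();
        }
    }
}
```

The tradeoff phil describes is visible in the two methods: compress() shrinks storedSize(), while each value() call inflates the data again.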
>
> From my point of view, S3 entries are now compressed and cost less to
> transfer and store, which was my initial goal.
>
> cheers,
>
> phil
>
>
>
> On 26 Feb 2010, at 11:16, Galder Zamarreno wrote:
>
>> On Thu, 25 Feb 2010 12:02:34 +0100, philippe van dyck <pvdyck at gmail.com>
>> wrote:
>>
>>> Hi All,
>>>
>>> Currently, I compress all data before sending it to the cache. Once
>>> compressed, I save 95% of the size of the JSON-ized qi4j objects.
>>>
>>> I did some profiling during the load tests and compression is taking
>>> roughly 80% of the CPU time.
>>> So I would like to compress only the data sent to the store, not the
>>> data kept in memory.
>>>
>>> Looks like the Marshaller is my friend here, and I plan to write a
>>> compressing wrapper around it.
>>>
>>> Now, when I look at it, I see two ways to wrap the compression process.
>>>
>>> One way is with the ObjectInput / ObjectOutput but I am bothered by the
>>> reentrant flag.
>>
>> As a side note, the reentrant flag is used to signal to the marshaller
>> whether several ObjectOutput/ObjectInput instances are open without a close, i.e.
>> --
>> marshaller.startObjectOutput(x, false)
>> marshaller.startObjectOutput(x, true) -> is reentrant, so mark it as such
>> --
>> marshaller.startObjectOutput(x, false)
>> marshaller.finishObjectOutput()
>> marshaller.startObjectOutput(x, false) -> not reentrant
>> marshaller.finishObjectOutput()
>> --
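The two call sequences above can be modelled with a per-thread nesting counter: a start call is reentrant exactly when an earlier start on the same thread has not yet been matched by a finish. This is a minimal sketch; ReentrancyTracker is a hypothetical helper, not an Infinispan class (in the sequences above the caller passes the flag explicitly).

```java
// Hypothetical helper that derives the 'reentrant' flag from how many
// start calls on the current thread are still waiting for their finish.
final class ReentrancyTracker {
    // One counter per thread; int[] avoids boxing on every update.
    private final ThreadLocal<int[]> depth = ThreadLocal.withInitial(() -> new int[1]);

    // Returns true if another start on this thread is still open.
    boolean start() {
        int[] d = depth.get();
        boolean reentrant = d[0] > 0;
        d[0]++;
        return reentrant;
    }

    void finish() {
        int[] d = depth.get();
        if (d[0] == 0) {
            throw new IllegalStateException("finish() without matching start()");
        }
        d[0]--;
    }
}
```

A nested start() returns true, like the second call in the first sequence; a start() after the previous finish() returns false, like the second sequence.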
>>
>> Why do we use this? To enable marshaller implementations to return a
>> different ObjectOutput if the call is reentrant. If you look at
>> org.infinispan.marshall.jboss.JBossMarshaller, you'll see that the
>> ObjectOutput (or org.jboss.marshalling.Marshaller) is a ThreadLocal, but
>> JBossMarshaller does not allow the same
>> org.jboss.marshalling.Marshaller to be opened twice. So, by using the
>> reentrant flag, we can make sure that the 2nd time that startObjectOutput
>> is called, a different one is provided.
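A minimal sketch of that strategy, assuming a hypothetical ReusableOutput class as a stand-in for org.jboss.marshalling.Marshaller; ThreadLocalOutputs, start and finish are also hypothetical names, and the real JBossMarshaller differs in detail. Non-reentrant calls reuse the thread's cached instance, while a reentrant call gets a fresh one because the cached instance is still open.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical stand-in for a heavyweight, non-reentrant marshaller:
// it can be reused for sequential calls but must not be opened twice.
final class ReusableOutput {
    final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private boolean open;

    void open() {
        if (open) {
            throw new IllegalStateException("already open");
        }
        buffer.reset();
        open = true;
    }

    void close() {
        open = false;
    }
}

// Per-thread cache: non-reentrant calls reuse the thread's instance,
// reentrant calls get a new one since the cached one is still in use.
final class ThreadLocalOutputs {
    private final ThreadLocal<ReusableOutput> cached =
            ThreadLocal.withInitial(ReusableOutput::new);

    ReusableOutput start(boolean reentrant) {
        ReusableOutput out = reentrant ? new ReusableOutput() : cached.get();
        out.open();
        return out;
    }

    void finish(ReusableOutput out) {
        out.close();
    }
}
```

Because each thread has its own cached instance, no pool or locking is needed for the common, non-reentrant case.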
>>
>> For an example of reentrancy, see the javadoc:
>>
>>     * <p>On the other hand, when a call is reentrant, i.e.
>>     * startObjectOutput/startObjectOutput(reentrant)...finishObjectOutput/finishObjectOutput,
>>     * the Marshaller implementation might treat it differently. An example
>>     * of reentrancy would be marshalling of {@link MarshalledValue}.
>>     * When sending or storing a MarshalledValue, a call to
>>     * startObjectOutput() would occur so that the stream is open and,
>>     * following that, a 2nd call could occur so that MarshalledValue's raw
>>     * byte array version is calculated and sent across.
>>     * This enables lazy deserialization on the receiver side, which is a
>>     * performance gain. The Marshaller implementation could decide
>>     * that it needs a separate ObjectOutput or similar for the 2nd call,
>>     * since its aim is only to get the raw byte array version
>>     * and then close it once finished.</p>
>>
>> The second, reentrant call is the one that creates the MarshalledValue
>> form of the in-memory data. The first call would be the stream opened to
>> send the put, get or whichever operation you're sending around.
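A minimal sketch of that nesting, assuming plain java.io serialization: the outer stream carries the operation and, while it is still open, a second and separate stream produces the value's raw byte[], which is then written into the outer stream as a length-prefixed blob. The class NestedMarshalling, its methods and the wire layout are all hypothetical, not Infinispan's actual command format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

final class NestedMarshalling {
    // Outer call: the stream opened to send a (hypothetical) put operation.
    static byte[] writePut(String key, Serializable value) {
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        try (ObjectOutputStream outer = new ObjectOutputStream(wire)) {
            outer.writeUTF("put");
            outer.writeUTF(key);
            // Nested (reentrant) call while 'outer' is still open: a separate
            // stream whose only job is to produce the value's raw byte[] form.
            byte[] raw = toRaw(value);
            outer.writeInt(raw.length);
            outer.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return wire.toByteArray();
    }

    static byte[] toRaw(Serializable value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream inner = new ObjectOutputStream(bytes)) {
            inner.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Receiver side: reads the operation but keeps the value as raw bytes,
    // so deserialization can be deferred until the value is actually used.
    static byte[] readRawValue(byte[] wire) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(wire))) {
            in.readUTF(); // operation name
            in.readUTF(); // key
            byte[] raw = new byte[in.readInt()];
            in.readFully(raw);
            return raw;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The inner stream needs its own output instance because the outer one is still open, which is exactly the case the reentrant flag lets a marshaller handle.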
>>
>> As a side note, using a ThreadLocal is a much cleaner solution than having
>> to maintain a pool of org.jboss.marshalling.Marshaller instances.
>>
>> Hope this clarifies further what the reentrant stuff does.
>>
>> Cheers,
>>
>>> The other is the ByteBuffer approach: no concurrency problem there, but
>>> it looks like more work.
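A minimal sketch of the byte[]-level option, assuming a simplified, hypothetical ByteMarshaller interface with two methods instead of Infinispan's actual Marshaller API. The CompressingMarshaller decorator gzips whatever its delegate produces, so only the serialized form is compressed and in-memory objects are left untouched.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical, simplified marshaller contract; not Infinispan's interface.
interface ByteMarshaller {
    byte[] toBytes(Object o);
    Object fromBytes(byte[] bytes);
}

// Decorator: compresses the delegate's output and decompresses its input.
final class CompressingMarshaller implements ByteMarshaller {
    private final ByteMarshaller delegate;

    CompressingMarshaller(ByteMarshaller delegate) {
        this.delegate = delegate;
    }

    @Override
    public byte[] toBytes(Object o) {
        byte[] plain = delegate.toBytes(o);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Closing the GZIP stream writes the trailer before we read 'out'.
        try (GZIPOutputStream gzip = new GZIPOutputStream(out)) {
            gzip.write(plain);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    @Override
    public Object fromBytes(byte[] bytes) {
        try (InputStream gzip = new GZIPInputStream(new ByteArrayInputStream(bytes))) {
            return delegate.fromBytes(gzip.readAllBytes());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Note that, since the same marshaller instance serves the wire, cache store and lazy-deserialization paths, a wrapper at this level would compress all three, not just the store traffic.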
>>>
>>> WDYT ?
>>>
>>> Cheers,
>>>
>>> phil
>>> _______________________________________________
>>> infinispan-dev mailing list
>>> infinispan-dev at lists.jboss.org
>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>
>>
>> --
>> Galder Zamarreño
>> Sr. Software Engineer
>> Infinispan, JBoss Cache
>>
>
>


-- 
Galder Zamarreño
Sr. Software Engineer
Infinispan, JBoss Cache



