[infinispan-dev] Compressing Marshaller Wrapper

philippe van dyck pvdyck at gmail.com
Fri Feb 26 06:36:57 EST 2010


Thanks for the reentrant scenario Galder.
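
As a rough illustration of the reentrant scenario quoted below, here is a minimal sketch. SketchMarshaller and Stream are hypothetical stand-ins, not the Infinispan or JBoss Marshalling types, and the first argument of the quoted startObjectOutput(x, ...) calls is left out. It only shows the idea: a non-reentrant call reuses the per-thread stream, while a reentrant call gets a separate one.

```java
public class ReentrantSketch {
    /** Hypothetical stand-in for a stream that must not be opened twice. */
    static final class Stream {
        boolean open;
    }

    /** Hypothetical marshaller; NOT the Infinispan Marshaller interface. */
    static final class SketchMarshaller {
        // One reusable stream per thread, instead of a pool of instances.
        private final ThreadLocal<Stream> perThread = ThreadLocal.withInitial(Stream::new);

        Stream startObjectOutput(boolean reentrant) {
            if (reentrant) {
                // A stream is already open on this thread: hand out a fresh one.
                Stream fresh = new Stream();
                fresh.open = true;
                return fresh;
            }
            Stream shared = perThread.get();
            if (shared.open) {
                throw new IllegalStateException("stream already open; call with reentrant=true");
            }
            shared.open = true;
            return shared;
        }

        void finishObjectOutput(Stream stream) {
            stream.open = false;
        }
    }

    public static void main(String[] args) {
        SketchMarshaller marshaller = new SketchMarshaller();
        Stream outer = marshaller.startObjectOutput(false); // e.g. the put/get command
        Stream inner = marshaller.startObjectOutput(true);  // e.g. the MarshalledValue bytes
        System.out.println("separate streams: " + (outer != inner));
        marshaller.finishObjectOutput(inner);
        marshaller.finishObjectOutput(outer);
    }
}
```

A compressing wrapper built on the ObjectOutput path would presumably have to pass the flag through unchanged, so that the wrapped marshaller can still make this decision.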

https://jira.jboss.org/jira/browse/ISPN-357 is now closed.

If the Marshaller is used for something other than storing cache entries, I don't think it is a good idea to implement compression at this level.

Compression is CPU-intensive, so it may be a good idea to "prepare" entries in memory with a low-priority thread, for example by adding a "compressed" flag to a cache entry.
That way they are ready for storage or transfer and consume less memory, but they cost much more to use (decompression time).
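
A minimal sketch of the "compressed flag" idea, assuming entries are held as byte arrays. The Entry class and its method names are hypothetical and not part of Infinispan; GZIP from java.util.zip is used only as an example codec.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class CompressedEntrySketch {
    static byte[] gzip(byte[] raw) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(raw);
        }
        return bos.toByteArray();
    }

    static byte[] gunzip(byte[] packed) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(packed))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = gz.read(buf)) != -1) {
                bos.write(buf, 0, n);
            }
        }
        return bos.toByteArray();
    }

    /** Hypothetical cache entry carrying a "compressed" flag. */
    static final class Entry {
        private byte[] bytes;
        private boolean compressed;

        Entry(byte[] raw) {
            this.bytes = raw;
        }

        /** Compresses in place; safe to call from a background thread. */
        synchronized void compress() throws IOException {
            if (!compressed) {
                bytes = gzip(bytes);
                compressed = true;
            }
        }

        synchronized boolean isCompressed() {
            return compressed;
        }

        /** Bytes as currently held; ready for storage or transfer once compressed. */
        synchronized byte[] storedForm() {
            return bytes;
        }

        /** Usable value; pays the decompression cost if the entry was prepared. */
        synchronized byte[] value() throws IOException {
            return compressed ? gunzip(bytes) : bytes;
        }
    }

    public static void main(String[] args) throws Exception {
        Entry entry = new Entry(new byte[10_000]); // all zeros, so highly compressible
        Thread preparer = new Thread(() -> {
            try {
                entry.compress();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        preparer.setPriority(Thread.MIN_PRIORITY); // the "low priority thread"
        preparer.start();
        preparer.join();
        System.out.println("held: " + entry.storedForm().length
                + " bytes, usable: " + entry.value().length + " bytes");
    }
}
```

The tradeoff shows up in value(): every read of a prepared entry decompresses again, unless the caller caches the result.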

In fact, it is a very old tradeoff and IMO, if compression is to be integrated into Infinispan, it should be at a higher level, and that is another discussion.

From my point of view, S3 entries are now compressed and cost less to transfer and store, which was my initial goal.
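
For illustration only, compressing on the way to the store while leaving in-memory data untouched could look like the decorator below. ByteStore, MapStore and CompressingStore are hypothetical names, not Infinispan or S3 cache store types, and this is not the change made for ISPN-357.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

public class CompressingStoreSketch {
    /** Hypothetical byte-oriented store; stands in for a remote store such as S3. */
    interface ByteStore {
        void put(String key, byte[] value) throws IOException;
        byte[] get(String key) throws IOException;
    }

    /** In-memory stand-in for the real backing store. */
    static final class MapStore implements ByteStore {
        final Map<String, byte[]> data = new HashMap<>();

        public void put(String key, byte[] value) {
            data.put(key, value);
        }

        public byte[] get(String key) {
            return data.get(key);
        }
    }

    /** Compresses on write and decompresses on read; callers only see raw bytes. */
    static final class CompressingStore implements ByteStore {
        private final ByteStore delegate;

        CompressingStore(ByteStore delegate) {
            this.delegate = delegate;
        }

        public void put(String key, byte[] value) throws IOException {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (DeflaterOutputStream out = new DeflaterOutputStream(bos)) {
                out.write(value);
            }
            delegate.put(key, bos.toByteArray());
        }

        public byte[] get(String key) throws IOException {
            byte[] packed = delegate.get(key);
            if (packed == null) {
                return null; // missing key: nothing to decompress
            }
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (InflaterInputStream in = new InflaterInputStream(new ByteArrayInputStream(packed))) {
                byte[] buf = new byte[4096];
                int n;
                while ((n = in.read(buf)) != -1) {
                    bos.write(buf, 0, n);
                }
            }
            return bos.toByteArray();
        }
    }

    public static void main(String[] args) throws IOException {
        MapStore backing = new MapStore();
        ByteStore store = new CompressingStore(backing);
        byte[] raw = new byte[10_000]; // repetitive payload, compresses well
        store.put("k", raw);
        System.out.println("raw " + raw.length + " bytes, stored "
                + backing.data.get("k").length + " bytes");
    }
}
```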

cheers,

phil



On 26 Feb 2010, at 11:16, Galder Zamarreno wrote:

> On Thu, 25 Feb 2010 12:02:34 +0100, philippe van dyck <pvdyck at gmail.com>  
> wrote:
> 
>> Hi All,
>> 
>> Currently, I compress all data before sending it to the cache. Once  
>> compressed, the JSON-serialized qi4j objects are about 95% smaller.
>> 
>> I did some profiling during the load tests and compression is taking  
>> roughly 80% of the cpu time.
>> So I would like to compress only the data sent to the store, not in  
>> memory.
>> 
>> Looks like the Marshaller is my friend here, and I plan to write a  
>> compressing wrapper around it.
>> 
>> Now, when I look at it, I see two ways to wrap the compression process.
>> 
>> One way is with the ObjectInput / ObjectOutput but I am bothered by the  
>> reentrant flag.
> 
> As a side note, the reentrant flag is used to signal to the marshaller  
> whether several ObjectOutput/ObjectInput are open without a close, i.e.
> --
> marshaller.startObjectOutput(x, false)
> marshaller.startObjectOutput(x, true) -> is reentrant, so mark it as such
> --
> marshaller.startObjectOutput(x, false)
> marshaller.finishObjectOutput()
> marshaller.startObjectOutput(x, false) -> not reentrant
> marshaller.finishObjectOutput()
> --
> 
> Why do we use this? To enable marshaller implementations to return a  
> different ObjectOutput if the call is reentrant. If you look at  
> org.infinispan.marshall.jboss.JBossMarshaller you see that the  
> ObjectOutput (or org.jboss.marshalling.Marshaller) is a ThreadLocal, but  
> JBossMarshaller does not allow for the same  
> org.jboss.marshalling.Marshaller to be opened twice. So, by using the  
> reentrant flag, we can make sure that the 2nd time that startObjectOutput  
> is called, a different one is provided.
> 
> For an example of reentrancy, see the javadoc:
> 
>     * <p>On the other hand, when a call is reentrant, i.e.
>     * startObjectOutput/startObjectOutput(reentrant)...finishObjectOutput/finishObjectOutput,
>     * the Marshaller implementation might treat it differently. An example
>     * of reentrancy would be marshalling of {@link MarshalledValue}.
>     * When sending or storing a MarshalledValue, a call to
>     * startObjectOutput() would occur so that the stream is open and
>     * following, a 2nd call could occur so that MarshalledValue's raw byte
>     * array version is calculated and sent across.
>     * This enables lazy deserialization on the receiver side, which is a
>     * performance gain. The Marshaller implementation could decide
>     * that it needs a separate ObjectOutput or similar for the 2nd call
>     * since its aim is only to get the raw byte array version
>     * and the close finish with it.</p>
> 
> The second reentrant call is the one to create the MarshalledValue form of  
> the in memory data. The first call would be the stream opened to send the  
> put or get or whichever op you're sending around.
> 
> As a side note, using ThreadLocal is a much cleaner solution than having to  
> maintain a pool of org.jboss.marshalling.Marshaller instances.
> 
> Hope this clarifies further what the reentrant stuff does.
> 
> Cheers,
> 
>> The other is the ByteBuffer stuff, no concurrency problem here, but it  
>> looks like more work.
>> 
>> WDYT ?
>> 
>> Cheers,
>> 
>> phil
>> _______________________________________________
>> infinispan-dev mailing list
>> infinispan-dev at lists.jboss.org
>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
> 
> 
> -- 
> Galder Zamarreño
> Sr. Software Engineer
> Infinispan, JBoss Cache
> 
