[infinispan-dev] Compressing Marshaller Wrapper

Fri Feb 26 05:16:33 EST 2010

On Thu, 25 Feb 2010 12:02:34 +0100, philippe van dyck <pvdyck at gmail.com>  
wrote:

> Hi All,
>
> Currently, I compress all data before sending it to the cache. Once
> compressed, the JSONized qi4j objects are 95% smaller.
>
> I did some profiling during the load tests, and compression is taking
> roughly 80% of the CPU time.
> So I would like to compress only the data sent to the store, not the
> data held in memory.
>
> Looks like the Marshaller is my friend here, and I plan to write a  
> compressing wrapper around it.
>
> Now, when I look at it, I see two ways to wrap the compression process.
>
> One way is with the ObjectInput / ObjectOutput but I am bothered by the  
> reentrant flag.

As a side note, the reentrant flag signals to the marshaller whether
several ObjectOutput/ObjectInput instances are open without a close, i.e.
--
marshaller.startObjectOutput(x, false)
marshaller.startObjectOutput(x, true) -> is reentrant, so mark it as such
--
marshaller.startObjectOutput(x, false)
marshaller.finishObjectOutput()
marshaller.startObjectOutput(x, false) -> not reentrant
marshaller.finishObjectOutput()
--

Why do we use this? To enable marshaller implementations to return a
different ObjectOutput if the call is reentrant. If you look at
org.infinispan.marshall.jboss.JBossMarshaller, you'll see that the
ObjectOutput (or org.jboss.marshalling.Marshaller) is a ThreadLocal, but
JBossMarshaller does not allow the same org.jboss.marshalling.Marshaller
to be opened twice. So, by using the reentrant flag, we can make sure
that the second time startObjectOutput is called before the first stream
is finished, a different one is provided.
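
To make the flag's purpose concrete, here is a minimal, self-contained
sketch of that pattern, assuming a writer that, like
org.jboss.marshalling.Marshaller, can be reused but not opened twice at
the same time. ReusableWriter, Marshaller and their methods are
hypothetical stand-ins, not the Infinispan or JBoss Marshalling APIs:
non-reentrant calls reuse one writer per thread via a ThreadLocal, while
a reentrant call gets a fresh writer of its own.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class ReentrancyDemo {

    /** Hypothetical stand-in for a reusable marshaller: only one open stream at a time. */
    static final class ReusableWriter {
        private ByteArrayOutputStream target; // null when not started

        void start(ByteArrayOutputStream out) {
            if (target != null) {
                throw new IllegalStateException("writer is already open");
            }
            target = out;
        }

        void write(String s) {
            byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
            target.write(bytes, 0, bytes.length);
        }

        void finish() {
            target = null;
        }
    }

    /** Hypothetical marshaller: outermost call reuses a per-thread writer, a reentrant call gets a new one. */
    static final class Marshaller {
        private final ThreadLocal<ReusableWriter> perThread = ThreadLocal.withInitial(ReusableWriter::new);

        ReusableWriter startObjectOutput(ByteArrayOutputStream out, boolean isReentrant) {
            ReusableWriter writer = isReentrant ? new ReusableWriter() : perThread.get();
            writer.start(out);
            return writer;
        }

        void finishObjectOutput(ReusableWriter writer) {
            writer.finish();
        }
    }

    public static void main(String[] args) {
        Marshaller marshaller = new Marshaller();
        ByteArrayOutputStream command = new ByteArrayOutputStream();
        ByteArrayOutputStream rawValue = new ByteArrayOutputStream();

        // Outer call: opens the stream for the put command
        ReusableWriter outer = marshaller.startObjectOutput(command, false);
        // Nested call while the outer stream is still open: flagged reentrant,
        // otherwise start() on the shared per-thread writer would throw
        ReusableWriter inner = marshaller.startObjectOutput(rawValue, true);
        inner.write("value-bytes");
        marshaller.finishObjectOutput(inner);

        outer.write("put:");
        outer.write(new String(rawValue.toByteArray(), StandardCharsets.UTF_8));
        marshaller.finishObjectOutput(outer);

        // prints "put:value-bytes"
        System.out.println(new String(command.toByteArray(), StandardCharsets.UTF_8));
    }
}
```

Keeping the writer in a ThreadLocal means the common, non-nested path never
allocates or synchronizes; only the rarer reentrant path pays for a new
instance.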

For an example of reentrancy, see the javadoc:

     * <p>On the other hand, when a call is reentrant, i.e. startObjectOutput/startObjectOutput(reentrant)...finishObjectOutput/finishObjectOutput,
     * the Marshaller implementation might treat it differently. An example of reentrancy would be marshalling of {@link MarshalledValue}.
     * When sending or storing a MarshalledValue, a call to startObjectOutput() would occur so that the stream is open and,
     * following that, a 2nd call could occur so that MarshalledValue's raw byte array version is calculated and sent across.
     * This enables lazy deserialization on the receiver side, which is a performance gain. The Marshaller implementation could decide
     * that it needs a separate ObjectOutput or similar for the 2nd call, since its aim is only to get the raw byte array version
     * and then finish with it.</p>

The second, reentrant call is the one that creates the MarshalledValue
form of the in-memory data. The first call opens the stream used to send
the put, get, or whichever operation you're sending around.
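
The lazy-deserialization idea behind MarshalledValue can be sketched with
plain Java serialization. LazyValue below is a hypothetical class, not
Infinispan's MarshalledValue: it caches the raw byte form on the sending
side and only deserializes on the receiving side when the value is
actually read. The short-lived ObjectOutputStream inside raw() plays the
role of the 2nd, reentrant call, since it exists only to produce the byte
array and is closed straight away.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.Objects;

/** Hypothetical sketch of the MarshalledValue idea; not Infinispan's class. */
public final class LazyValue {
    private Object instance; // deserialized form, null on the receiver until read
    private byte[] raw;      // serialized form, null until first computed

    public LazyValue(Object instance) {
        this.instance = Objects.requireNonNull(instance);
    }

    private LazyValue(byte[] raw) {
        this.raw = Objects.requireNonNull(raw);
    }

    /** Receiver side: keep the bytes as they arrived, without deserializing. */
    public static LazyValue fromRaw(byte[] raw) {
        return new LazyValue(raw);
    }

    /** Sender side: compute the raw byte form once, in its own stream, and cache it. */
    public byte[] raw() throws IOException {
        if (raw == null) {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(instance);
            }
            raw = bytes.toByteArray();
        }
        return raw;
    }

    /** Deserialize lazily, only when the application actually reads the value. */
    public Object get() throws IOException, ClassNotFoundException {
        if (instance == null) {
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(raw))) {
                instance = in.readObject();
            }
        }
        return instance;
    }

    public boolean isDeserialized() {
        return instance != null;
    }
}
```

A node that merely forwards or stores the value never needs to call get(),
so it never pays for deserialization.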

As a side note, using a ThreadLocal is a much cleaner solution than
having to maintain a pool of org.jboss.marshalling.Marshaller instances.

Hope this clarifies further what the reentrant stuff does.
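
For the byte-array (ByteBuffer) route you mention, one option is a plain
decorator that compresses whatever the wrapped marshaller produces, which
sidesteps reentrancy entirely because it never touches the
ObjectOutput/ObjectInput calls. This is only a sketch under assumptions:
BytesMarshaller is a simplified, hypothetical contract rather than
Infinispan's Marshaller interface, gzip from java.util.zip stands in for
whatever codec you pick, and wiring it in so that only the store path
(not the in-memory data) goes through it is not shown.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class CompressingMarshallerSketch {

    /** Hypothetical byte-array contract; a simplified stand-in, not Infinispan's Marshaller. */
    interface BytesMarshaller {
        byte[] objectToBytes(Object o) throws IOException;
        Object objectFromBytes(byte[] bytes) throws IOException, ClassNotFoundException;
    }

    /** Plain Java serialization, used only as the delegate being wrapped. */
    static final class JavaSerializationMarshaller implements BytesMarshaller {
        public byte[] objectToBytes(Object o) throws IOException {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(o);
            }
            return bytes.toByteArray();
        }

        public Object objectFromBytes(byte[] bytes) throws IOException, ClassNotFoundException {
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
                return in.readObject();
            }
        }
    }

    /** Decorator: gzips the delegate's output and gunzips before handing bytes back to it. */
    static final class CompressingMarshaller implements BytesMarshaller {
        private final BytesMarshaller delegate;

        CompressingMarshaller(BytesMarshaller delegate) {
            this.delegate = delegate;
        }

        public byte[] objectToBytes(Object o) throws IOException {
            byte[] plain = delegate.objectToBytes(o);
            ByteArrayOutputStream compressed = new ByteArrayOutputStream();
            // Closing the gzip stream writes its trailer, so read the bytes after the try block
            try (GZIPOutputStream gzip = new GZIPOutputStream(compressed)) {
                gzip.write(plain);
            }
            return compressed.toByteArray();
        }

        public Object objectFromBytes(byte[] bytes) throws IOException, ClassNotFoundException {
            byte[] plain;
            try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(bytes))) {
                plain = gzip.readAllBytes(); // Java 9+
            }
            return delegate.objectFromBytes(plain);
        }
    }

    public static void main(String[] args) throws Exception {
        BytesMarshaller plain = new JavaSerializationMarshaller();
        BytesMarshaller compressing = new CompressingMarshaller(plain);
        String json = "{\"name\":\"qi4j\",\"tags\":[\"a\",\"b\"]}".repeat(100); // Java 11+
        byte[] stored = compressing.objectToBytes(json);
        System.out.println(plain.objectToBytes(json).length + " bytes -> " + stored.length + " bytes");
        System.out.println(json.equals(compressing.objectFromBytes(stored)));
    }
}
```

Because the delegate is unaware of compression, the same wrapper works in
front of any byte-array marshaller, and the CPU cost is paid only on the
path that uses the wrapped instance.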

Cheers,

> The other is the ByteBuffer route: no concurrency problem there, but it
> looks like more work.
>
> WDYT ?
>
> Cheers,
>
> phil

-- 
Galder Zamarreño
Sr. Software Engineer
Infinispan, JBoss Cache