Question about replication with 5.2.0.CR1

Friday, 11 January 2013

I'm trying to debug some problems that ModeShape is having in clustered situations
when using 5.2.0.CR1. I don't have a standalone test case, but hopefully I can explain
what I'm doing.

I'm working with a replicated cache and two processes. The cache configuration (see
attached) uses a cache store (with fetchPersistentState=true) and eviction (though the
value of 'maxEntries' is high enough that neither process hits it). 

The first process starts up fine and both ModeShape and Infinispan work fine. After some
period of time (~20 seconds), I start up the second process. It joins the cluster, and
receives the initial state transfer from the first process. I can see from the logs that
an entry with a particular key has been transferred and is complete. The second process
(as part of the ModeShape initialization code) attempts to look up the entry with this
particular key, but it doesn't find it. At this point, ModeShape starts mis-behaving
because this particular entry is critical in knowing if ModeShape needs to initialize the
repository content (by creating several hundred entries). Upon finding no such node, it
attempts to recreate it and a hundred other entries. Some succeed, but others fail because
existing entries are found when they weren't expected to be found. I've replicated
this problem on two different machines with different operating systems.

We're using explicit locks for writes, and we're using a cache with
SKIP_REMOTE_LOOKUP and DELTA_WRITE flags when writing, but no particular flags when
reading. (See below for why we're using these flags.)

My understanding is that, once the initial state transfer completes, the second
process' cache store should contain all of the transferred entries, and any attempt to
look up an entry by key will obviously check local memory and, if not found, will consult
the cache store.

I've attached the log file for this second process. Here are some of the key points in
the file:

1) Starting with line 39, the log shows that the cache is started, joins the existing
cluster, and waits for the initial state transfer. This is follows by lots of lines
showing the details of the state transfer.

2) On line 159, one of the state transfer DEBUG lines shows a PutKeyValueCommand with the
entry of interest, with key "cb80206317f1e7jcr:system" and who's value looks
as expected:

OOB-1,Machine1-27258 2013-01-11 11:31:26,819 TRACE statetransfer.StateTransferInterceptor
- handleTopologyAffectedCommand for command
PutKeyValueCommand{key=cb80206317f1e7jcr:system, value=SchematicEntryLiteral{
"metadata" : { "id" : "cb80206317f1e7jcr:system" ,
"contentType" : "application/json" } , "content" : {
"key" : "cb80206317f1e7jcr:system" , "parent" : [
"cb80206317f1e7/" , "cb80206cd556c0/" ] , "properties" : {
"http://www.jcp.org/jcr/1.0" : { "primaryType" : { "$name" :
"mode:system" } } } , "children" : [ { "key" :
"cb80206317f1e7jcr:nodeTypes" , "name" : "jcr:nodeTypes" } ,
{ "key" : "cb80206317f1e7jcr:versionStorage" , "name" :
"jcr:versionStorage" } , { "key" :
"cb80206317f1e7mode:namespaces" , "name" : "mode:namespaces"
} , { "key" : "cb80206317f1e7mode:locks" , "name" :
"mode:locks" } , { "key" :
"cb80206317f1e7mode:synchronizedInitialization" , "name" :
"mode:repository" } ] , "childrenInfo" : { "count" : 5 } }
}, flags=[CACHE_MODE_LOCAL, SKIP_REMOTE_LOOKUP, PUT_FOR_STATE_TRANSFER,
SKIP_SHARED_CACHE_STORE, SKIP_OWNERSHIP_CHECK, IGNORE_RETURN_VALUES, SKIP_XSITE_BACKUP],
putIfAbsent=false, lifespanMillis=-1, maxIdleTimeMillis=-1, successful=true}

3) On line 673, the log shows that state transfer was completed.

4) On line 700, ModeShape calls "cache.get(…)" with the
"cb80206317f1e7/" key, and this entry is successfully found:

com.pb.spring.Main.main() 2013-01-11 11:31:27,667 TRACE
statetransfer.StateTransferInterceptor - handleTopologyAffectedCommand for command
GetKeyValueCommand {key=cb80206317f1e7/, flags=null}

5) On line 704, ModeShape calls "cache.get(…)" with the
"cb80206317f1e7jcr:system" key, but this entry is not found:

com.pb.spring.Main.main() 2013-01-11 11:31:27,687 TRACE
statetransfer.StateTransferInterceptor - handleTopologyAffectedCommand for command
GetKeyValueCommand {key=cb80206317f1e7jcr:system, flags=null}

I don't understand why this happens. I understand why the cache doesn't find it in
memory, but why doesn't it consult the cache store? Am I missing a specific flag?

Here's a bit of background. Our SchematicEntryLiteral values are DeltaAware, and
we've patterned the code after AtomicHashMap to get the MVCC-style behavior, but our
SchematicEntryLiteral values contain JSON-like documents. All of our code can be seen at
[1]. Notice our SchematicEntryProxy, and the two Delta implementations. We're using
explicit locks for writes, and we're using a cache with SKIP_REMOTE_LOOKUP and
DELTA_WRITE flags when writing, but no particular flags when reading.

BTW, ModeShape creates two cache containers. One of them (created from the
"spectrum-workspace-cache.xml" configuration file) is used only for local
in-memory caches that use eviction and no persistence; this cache is not really of
concern.

Thanks in advance!

Randall Hauch

[1]
https://github.com/rhauch/modeshape/tree/1e0db548891a8355651654d0880553b2...

2026

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009