[infinispan-dev] Question about replication with 5.2.0.CR1

Sanne Grinovero sanne at infinispan.org
Tue Jan 15 10:29:26 EST 2013


Thanks. Watching it as it would affect the Lucene Directory as well (!)

On 15 January 2013 13:45, Randall Hauch <rhauch at redhat.com> wrote:
> I logged this as https://issues.jboss.org/browse/ISPN-2712
>
> On Jan 15, 2013, at 7:44 AM, Randall Hauch <rhauch at redhat.com> wrote:
>
> I would tend to say that it does not behave like it is passivating some of
> the data, because some of the entries are simply missing in the second
> process after startup completes. In other words, we call "get" with a key
> that was clearly transferred, but we get null back.
>
> My understanding of passivation is that the entry will be *either* in-memory
> or persisted in the cache store (but not both). It appears in our tests that
> while most entries were added to the cache store (of the second process),
> some do not exist in the cache store *or* in-memory.
>
> On Jan 14, 2013, at 4:19 PM, Ray Tsang <saturnism at gmail.com> wrote:
>
> Does it behave like what's described here?
> https://docs.jboss.org/author/display/ISPN/Cache+Loaders+and+Stores#CacheLoadersandStores-CachePassivation
>
> Thanks,
>
> On Mon, Jan 14, 2013 at 6:53 AM, Randall Hauch <rhauch at redhat.com> wrote:
>>
>> We're using MODE-1745 to track this problem (from ModeShape's
>> perspective): https://issues.jboss.org/browse/MODE-1745
>>
>> Here's an update.
>>
>> After some more extensive debugging sessions, found the following in the
>> scenario when the 2nd cluster nodes is started only after the first
>> clustered node has completed initialization:
>>
>>         • state transfer seems to be sending all the data across to node 2
>> from node 1 - in total 293 PutKeyValueCommand
>>         • when the systemNode == null check is performed on node 2, the
>> cache only has 222 entries and there is nothing persisted in the file cache
>> store
>>
>> This seems to be a case of nodes being evicted from node 2, without being
>> persisted on the underlying cache store. Disabling eviction via the
>> configuration file make this scenario pass. With the latter set to NONE, all
>> the 293 entries received by node 2 are placed in the data container. With
>> the eviction set to
>>
>>         <eviction strategy="LIRS" maxEntries="1000"/>
>>
>> only 222 entries (out of 293 total) are placed in the cache. This
>> indicates to Horia and I that there's a possible bug
>> aroundorg.infinispan.container.DefaultDataContainer#put
>> /org.infinispan.util.concurrent.BoundedConcurrentHashMap (the actual runtime
>> instance) and eviction.
>>
>> Running with:
>>
>>         <eviction strategy="LRU" maxEntries="10000"/>
>>
>> produces the same problem, so ATM we're able to run successfully **ONLY**
>> with eviction disabled.
>>
>> Can someone familiar with Infinispan's internals please take a look at
>> this to see if we're correct?
>>
>>
>> On Jan 11, 2013, at 12:03 PM, Randall Hauch <rhauch at redhat.com> wrote:
>>
>> > I'm trying to debug some problems that ModeShape is having in clustered
>> > situations when using 5.2.0.CR1. I don't have a standalone test case, but
>> > hopefully I can explain what I'm doing.
>> >
>> > I'm working with a replicated cache and two processes. The cache
>> > configuration (see attached) uses a cache store (with
>> > fetchPersistentState=true) and eviction (though the value of 'maxEntries' is
>> > high enough that neither process hits it).
>> >
>> > The first process starts up fine and both ModeShape and Infinispan work
>> > fine. After some period of time (~20 seconds), I start up the second
>> > process. It joins the cluster, and receives the initial state transfer from
>> > the first process. I can see from the logs that an entry with a particular
>> > key has been transferred and is complete. The second process (as part of the
>> > ModeShape initialization code) attempts to look up the entry with this
>> > particular key, but it doesn't find it. At this point, ModeShape starts
>> > mis-behaving because this particular entry is critical in knowing if
>> > ModeShape needs to initialize the repository content (by creating several
>> > hundred entries). Upon finding no such node, it attempts to recreate it and
>> > a hundred other entries. Some succeed, but others fail because existing
>> > entries are found when they weren't expected to be found. I've replicated
>> > this problem on two different machines with different operating systems.
>> >
>> > We're using explicit locks for writes, and we're using a cache with
>> > SKIP_REMOTE_LOOKUP and DELTA_WRITE flags when writing, but no particular
>> > flags when reading. (See below for why we're using these flags.)
>> >
>> > My understanding is that, once the initial state transfer completes, the
>> > second process' cache store should contain all of the transferred entries,
>> > and any attempt to look up an entry by key will obviously check local memory
>> > and, if not found, will consult the cache store.
>> >
>> > I've attached the log file for this second process. Here are some of the
>> > key points in the file:
>> >
>> > 1) Starting with line 39, the log shows that the cache is started, joins
>> > the existing cluster, and waits for the initial state transfer. This is
>> > follows by lots of lines showing the details of the state transfer.
>> >
>> > 2) On line 159, one of the state transfer DEBUG lines shows a
>> > PutKeyValueCommand with the entry of interest, with key
>> > "cb80206317f1e7jcr:system" and who's value looks as expected:
>> >
>> > OOB-1,Machine1-27258 2013-01-11 11:31:26,819 TRACE
>> > statetransfer.StateTransferInterceptor - handleTopologyAffectedCommand for
>> > command PutKeyValueCommand{key=cb80206317f1e7jcr:system,
>> > value=SchematicEntryLiteral{ "metadata" : { "id" :
>> > "cb80206317f1e7jcr:system" , "contentType" : "application/json" } ,
>> > "content" : { "key" : "cb80206317f1e7jcr:system" , "parent" : [
>> > "cb80206317f1e7/" , "cb80206cd556c0/" ] , "properties" : {
>> > "http://www.jcp.org/jcr/1.0" : { "primaryType" : { "$name" : "mode:system" }
>> > } } , "children" : [ { "key" : "cb80206317f1e7jcr:nodeTypes" , "name" :
>> > "jcr:nodeTypes" } , { "key" : "cb80206317f1e7jcr:versionStorage" , "name" :
>> > "jcr:versionStorage" } , { "key" : "cb80206317f1e7mode:namespaces" , "name"
>> > : "mode:namespaces" } , { "key" : "cb80206317f1e7mode:locks" , "name" :
>> > "mode:locks" } , { "key" : "cb80206317f1e7mode:synchronizedInitialization" ,
>> > "name" : "mode:repository" } ] , "childrenInfo" : { "count" : 5 } } },
>> > flags=[CACHE_MODE_LOCAL, SKIP_REMOTE_LOOKUP, PUT_FOR_STATE_TRANSFER,
>> > SKIP_SHARED_CACHE_STORE, SKIP_OWNERSHIP_CHECK, IGNORE_RETURN_VALUES,
>> > SKIP_XSITE_BACKUP], putIfAbsent=false, lifespanMillis=-1,
>> > maxIdleTimeMillis=-1, successful=true}
>> >
>> > 3) On line 673, the log shows that state transfer was completed.
>> >
>> > 4) On line 700, ModeShape calls "cache.get(…)" with the
>> > "cb80206317f1e7/" key, and this entry is successfully found:
>> >
>> > com.pb.spring.Main.main() 2013-01-11 11:31:27,667 TRACE
>> > statetransfer.StateTransferInterceptor - handleTopologyAffectedCommand for
>> > command GetKeyValueCommand {key=cb80206317f1e7/, flags=null}
>> >
>> > 5) On line 704, ModeShape calls "cache.get(…)" with the
>> > "cb80206317f1e7jcr:system" key, but this entry is not found:
>> >
>> > com.pb.spring.Main.main() 2013-01-11 11:31:27,687 TRACE
>> > statetransfer.StateTransferInterceptor - handleTopologyAffectedCommand for
>> > command GetKeyValueCommand {key=cb80206317f1e7jcr:system, flags=null}
>> >
>> > I don't understand why this happens. I understand why the cache doesn't
>> > find it in memory, but why doesn't it consult the cache store? Am I missing
>> > a specific flag?
>> >
>> > Here's a bit of background. Our SchematicEntryLiteral values are
>> > DeltaAware, and we've patterned the code after AtomicHashMap to get the
>> > MVCC-style behavior, but our SchematicEntryLiteral values contain JSON-like
>> > documents. All of our code can be seen at [1]. Notice our
>> > SchematicEntryProxy, and the two Delta implementations. We're using explicit
>> > locks for writes, and we're using a cache with SKIP_REMOTE_LOOKUP and
>> > DELTA_WRITE flags when writing, but no particular flags when reading.
>> >
>> > BTW, ModeShape creates two cache containers. One of them (created from
>> > the "spectrum-workspace-cache.xml" configuration file) is used only for
>> > local in-memory caches that use eviction and no persistence; this cache is
>> > not really of concern.
>> >
>> > Thanks in advance!
>> >
>> > Randall Hauch
>> >
>> > [1]
>> > https://github.com/rhauch/modeshape/tree/1e0db548891a8355651654d0880553b2635b358f/modeshape-schematic/src/main/java/org/infinispan/schematic/internal
>> >
>> >
>> > <main2_1357921501.log><spectrum-repository-infinispan.xml>_______________________________________________
>> > infinispan-dev mailing list
>> > infinispan-dev at lists.jboss.org
>> > https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>
>>
>> _______________________________________________
>> infinispan-dev mailing list
>> infinispan-dev at lists.jboss.org
>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
>
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
>
>
>
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev



More information about the infinispan-dev mailing list