Scratch what I said abt the Bdbje CL - it gets it's knickers in a
twist as well with enough CPUs/cores - the 4-cpu lab servers got the
BDBJE engine in a deadlock with concurrent reads, writes and
removes. Will have to apply the same StripedLock access check to
this as well.
On 19 Jun 2007, at 12:32, Bela Ban wrote:
OK, yes, having remote puts and CCL'induced gets not block on the
same lock is certainly a much better solution.
What will also help is JGroups 2.5 *should* we still encounter a
deadlock.
BTW: I assume using the Multiplexer here is out of the question ?
You could use a separate stack in the Multiplexer (e.g. "udp-repl-
sync") which is only used by the CCL, and has no FC in it, and
everyone else uses the async stack.
But then again, there was a reason we didn't configure the
Multiplexer to be the defaut in 2.4...
Manik Surtani wrote:
>
> On 19 Jun 2007, at 09:04, Bela Ban wrote:
>
>> Is this going to solve JBCACHE-1103 ? I mean, +1 for the lock
>> refactoring, sounds like it needs to be done anyway. But, because
>> you have to be thread safe at one point, aren't you deferring the
>> issue to a later stage ? At one point you have to acquire a lock
>> to prevent concurrent loading, and I'm afraid we'll have the same
>> issue.
>>
>> But, wait, if you defer lock acquisition till after the
>> synchronous cluster method call, you should be fine. Is this what
>> you have in mind ?
>
> There are 2 locks here that need to be done. One lock is on the
> Node, via the lock interceptor, which happens *after* the cluster
> method call. This is not the problem.
>
> We also had a level of sync in the cache loader interceptor to
> prevent the same Fqn being loaded twice from the CL (or a
> simultaneous load and store, etc). This bit was the problem,
> since this sync had to happen *before* any loading was done by the
> CL.
>
> Regarding the other CL impls, some of them already were threadsafe
> (BDBJE, Jdbm) and some weren't (JDBC, File). I created a
> StripedLock class which uses a set of ReentrantReadWriteLocks
> applied to the Fqn in question, and the JDBC and File CLs now use
> this to achieve thread safety (more efficient than synchronized
> blocks and better mem usage than one lock per Fqn - usual benefits
> of lock striping).
>
> You're right that the improvements in locking here are not quite
> central to solving 1103, but somewhat related.
>
> The real solution to 1103 is in the CCL - remotely originating put
> ()'s shouldn't need a load (and hence not need to wait on the Fqn
> being loaded) *only if* we're talking about the CCL, since CCLs
> don't perform a corresponding store (they're read-only).
> Therefore this logic cannot be in the CLI as it doesn't apply to
> all CL impls.
>
>
>>
>> On the same topic: I looked into interrupting clustered method
>> calls in JGroups (
http://jira.jboss.com/jira/browse/JGRP-533),
>> and came to the conclusion that it isn't feasible. When a thread
>> is blocked on FC.down(), waiting for credits, then I can
>> interrupt it, but it will go right back in its loop waiting for
>> credits, therefore blocking again. Unless I change the loop
>> condition (running & waiting-for-credits), the thread will always
>> block if there aren't enough credits available.
>>
>> Thoughts ?
>>
>>
>> Manik Surtani wrote:
>>>
>>>
>>> Ok, I spent a bit of time investigating this, and the original
>>> sync blocks in the CacheLoaderInterceptor, superceded by the
>>> synchronization on an Fqn in subsequent versions are all hacks
>>> to get around the fact that some cache loader implementations
>>> themselves aren't thread safe.
>>>
>>> I wrote a test to check the thread safety of cache loader impls,
>>> and most of what we have fail. I patched the
>>> DummyInMemoryCacheLoader to be thread safe (a lot of
>>> synchronization within the impl) and removed the unnecessary
>>> locks in the interceptors and things work fine. I tried this
>>> with some of the few thread-safe cache loader impls we have
>>> (BDBJE, for example) and this worked well as well.
>>>
>>> So what I propose is this:
>>>
>>> 1) Patch CLI to NOT use the Fqn locks (in
>>> BaseCacheLoaderInterceptor)
>>> 2) Make sure we acquire locks (lock() call up the interceptor
>>> stack) *before* attempting to load node, but after creating temp
>>> node
>>> 3) Make sure cache loader impls are thread safe
>>> 4) Add thread safety test to CacheLoaderTestBase
>>> 5) Patch loaders that aren't threadsafe and fail test in (4) -
>>> FileCacheLoader, JDBCCacheLoader, etc.
>>>
>>> I'm going to do this on my local checkout of HEAD and run the
>>> regression tests overnight. If this works, I suggest
>>> backporting this to 1.4.x and regressing there.
>>>
>>> WDYT?
>>
>> --
>> Bela Ban
>> Lead JGroups / JBoss Clustering team
>> JBoss - a division of Red Hat
>
> --
> Manik Surtani
>
> Lead, JBoss Cache
> JBoss, a division of Red Hat
>
>
>
--
Bela Ban
Lead JGroups / JBoss Clustering team
JBoss - a division of Red Hat