]
Paul Ferraro updated WFLY-6877:
-------------------------------
Fix Version/s: 11.0.0.Alpha1
Fine session attributes (i.e. replication-granularity = ATTRIBUTE)
cause org.infinispan.util.concurrent.TimeoutException: ISPN000299: Unable to acquire lock
after 15 seconds
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Key: WFLY-6877
URL:
https://issues.jboss.org/browse/WFLY-6877
Project: WildFly
Issue Type: Bug
Components: Clustering
Affects Versions: 10.0.0.Final
Environment: Wildfly 10.0.0-Final
JDK 1.8.0_66-b17
Linux
Reporter: Sammy Chu
Assignee: Paul Ferraro
Priority: Critical
Fix For: 10.1.0.CR1, 11.0.0.Alpha1
Attachments: FineSessionAttributes.java
Our system using the following infinispan configuration for web session cache (i.e. an
async, non-transactional, batch, optimistic locking dustrubutable cache):
{code:xml}
<cache-container name="web" default-cache="dist"
module="org.wildfly.clustering.web.infinispan">
<transport lock-timeout="60000"/>
<distributed-cache name="dist" mode="ASYNC"
l1-lifespan="0" owners="2">
<locking isolation="REPEATABLE_READ"/>
<transaction locking="OPTIMISTIC" mode="BATCH"/>
<file-store/>
</distributed-cache>
</cache-container>
{code}
When we perform a load test for one of our business use case in our system, Wildfly
always throw the infamous locking for 15 seconds timeout exception:
{noformat}
2016-07-21 13:01:37,971 TRACE
[org.infinispan.util.concurrent.locks.impl.DefaultLockManager] (default task-33) Lock all
keys=[4V3S9Lw0pGPAK3wt8bYEt3BK06iimxk88geCH2Gl,
4V3S9Lw0pGPAK3wt8bYEt3BK06iimxk88geCH2Gl->YYY] for
owner=GlobalTransaction:<10.1.3.29>:492:local. timeout=15000 (MILLISECONDS)
2016-07-21 13:01:52,975 ERROR [org.infinispan.interceptors.InvocationContextInterceptor]
(default task-33) ISPN000136: Error executing command PrepareCommand, writing keys
[4V3S9Lw0pGPAK3wt8bYEt3BK06iimxk88geCH2Gl,
4V3S9Lw0pGPAK3wt8bYEt3BK06iimxk88geCH2Gl->YYY]:
org.infinispan.util.concurrent.TimeoutException: ISPN000299: Unable to acquire lock after
15 seconds for key 4V3S9Lw0pGPAK3wt8bYEt3BK06iimxk88geCH2Gl->YYY and requestor
GlobalTransaction:<10.1.3.29>:492:local. Lock is held by
GlobalTransaction:<10.1.3.29>:458:local
at
org.infinispan.util.concurrent.locks.impl.DefaultLockManager$KeyAwareExtendedLockPromise.lock(DefaultLockManager.java:236)
at
org.infinispan.util.concurrent.locks.impl.DefaultLockManager$CompositeLockPromise.lock(DefaultLockManager.java:305)
at
org.infinispan.interceptors.locking.AbstractLockingInterceptor.lockAllAndRecord(AbstractLockingInterceptor.java:199)
at
org.infinispan.interceptors.locking.AbstractTxLockingInterceptor.checkPendingAndLockAllKeys(AbstractTxLockingInterceptor.java:199)
at
org.infinispan.interceptors.locking.AbstractTxLockingInterceptor.lockAllOrRegisterBackupLock(AbstractTxLockingInterceptor.java:165)
at
org.infinispan.interceptors.locking.OptimisticLockingInterceptor.visitPrepareCommand(OptimisticLockingInterceptor.java:69)
at org.infinispan.commands.tx.PrepareCommand.acceptVisitor(PrepareCommand.java:173)
at
org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:99)
at
org.infinispan.interceptors.TxInterceptor.invokeNextInterceptorAndVerifyTransaction(TxInterceptor.java:157)
at
org.infinispan.interceptors.TxInterceptor.visitPrepareCommand(TxInterceptor.java:144)
at org.infinispan.commands.tx.PrepareCommand.acceptVisitor(PrepareCommand.java:173)
at
org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:99)
at
org.infinispan.interceptors.base.CommandInterceptor.handleDefault(CommandInterceptor.java:113)
at
org.infinispan.commands.AbstractVisitor.visitPrepareCommand(AbstractVisitor.java:112)
at
org.infinispan.statetransfer.TransactionSynchronizerInterceptor.visitPrepareCommand(TransactionSynchronizerInterceptor.java:42)
at org.infinispan.commands.tx.PrepareCommand.acceptVisitor(PrepareCommand.java:173)
at
org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:99)
at
org.infinispan.statetransfer.StateTransferInterceptor.handleTxCommand(StateTransferInterceptor.java:238)
at
org.infinispan.statetransfer.StateTransferInterceptor.visitPrepareCommand(StateTransferInterceptor.java:87)
at org.infinispan.commands.tx.PrepareCommand.acceptVisitor(PrepareCommand.java:173)
at
org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:99)
at
org.infinispan.interceptors.InvocationContextInterceptor.handleAll(InvocationContextInterceptor.java:107)
at
org.infinispan.interceptors.InvocationContextInterceptor.handleDefault(InvocationContextInterceptor.java:76)
at
org.infinispan.commands.AbstractVisitor.visitPrepareCommand(AbstractVisitor.java:112)
at org.infinispan.commands.tx.PrepareCommand.acceptVisitor(PrepareCommand.java:173)
at
org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:99)
at
org.infinispan.interceptors.base.CommandInterceptor.handleDefault(CommandInterceptor.java:113)
at
org.infinispan.commands.AbstractVisitor.visitPrepareCommand(AbstractVisitor.java:112)
at org.infinispan.commands.tx.PrepareCommand.acceptVisitor(PrepareCommand.java:173)
at org.infinispan.interceptors.InterceptorChain.invoke(InterceptorChain.java:336)
at
org.infinispan.transaction.impl.TransactionCoordinator.commit(TransactionCoordinator.java:157)
at
org.infinispan.transaction.xa.TransactionXaAdapter.commit(TransactionXaAdapter.java:114)
at
org.infinispan.transaction.tm.DummyTransaction.finishResource(DummyTransaction.java:389)
at
org.infinispan.transaction.tm.DummyTransaction.commitResources(DummyTransaction.java:435)
at org.infinispan.transaction.tm.DummyTransaction.runCommit(DummyTransaction.java:314)
at org.infinispan.transaction.tm.DummyTransaction.commit(DummyTransaction.java:105)
at
org.infinispan.transaction.tm.DummyBaseTransactionManager.commit(DummyBaseTransactionManager.java:73)
at
org.wildfly.clustering.ee.infinispan.ActiveTransactionBatch.close(ActiveTransactionBatch.java:48)
at
org.wildfly.clustering.web.undertow.session.DistributableSession.requestDone(DistributableSession.java:77)
at
io.undertow.servlet.spec.ServletContextImpl.updateSessionAccessTime(ServletContextImpl.java:768)
at
io.undertow.servlet.spec.HttpServletResponseImpl.responseDone(HttpServletResponseImpl.java:557)
at
io.undertow.servlet.handlers.ServletInitialHandler.handleFirstRequest(ServletInitialHandler.java:331)
at
io.undertow.servlet.handlers.ServletInitialHandler.dispatchRequest(ServletInitialHandler.java:263)
at
io.undertow.servlet.handlers.ServletInitialHandler.access$000(ServletInitialHandler.java:81)
at
io.undertow.servlet.handlers.ServletInitialHandler$1.handleRequest(ServletInitialHandler.java:174)
at io.undertow.server.Connectors.executeRootHandler(Connectors.java:202)
at io.undertow.server.HttpServerExchange$1.run(HttpServerExchange.java:793)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{noformat}
After tracing using the trace log, we found the time difference between a "Lock all
keys" log statement and the "Release locks for keys" log statement takes
for about 15~20 seconds, so if another concurrent request would like to lock the same key
within that interval, it will result in the timeout exception, example as below:
{noformat}
2016-07-21 13:01:37,958 TRACE
[org.infinispan.util.concurrent.locks.impl.DefaultLockManager] (default task-37) Lock all
keys=[4V3S9Lw0pGPAK3wt8bYEt3BK06iimxk88geCH2Gl->XXX,
4V3S9Lw0pGPAK3wt8bYEt3BK06iimxk88geCH2Gl->YYY,
4V3S9Lw0pGPAK3wt8bYEt3BK06iimxk88geCH2Gl->ZZZ,
4V3S9Lw0pGPAK3wt8bYEt3BK06iimxk88geCH2Gl] for
owner=GlobalTransaction:<10.1.3.29>:458:local. timeout=15000 (MILLISECONDS)
2016-07-21 13:01:53,364 TRACE
[org.infinispan.util.concurrent.locks.impl.DefaultLockManager] (default task-37) Release
locks for keys=[4V3S9Lw0pGPAK3wt8bYEt3BK06iimxk88geCH2Gl->XXX,
4V3S9Lw0pGPAK3wt8bYEt3BK06iimxk88geCH2Gl->YYY,
4V3S9Lw0pGPAK3wt8bYEt3BK06iimxk88geCH2Gl->ZZZ,
4V3S9Lw0pGPAK3wt8bYEt3BK06iimxk88geCH2Gl].
owner=GlobalTransaction:<10.1.3.29>:458:local
{noformat}
We think that is strange as those session keys are POJO without complex nested objects,
and we only perform read operation on those objects. Although we know objects without
marked *(a)org.wildfly.clustering.web.annotation.Immutable* will perform cache sync, we
believe that should not took for more than 1 second to commit the cache but it takes for
more than 15 seconds...
After future tracing, we found the following log message on another cluster node:
{noformat}
2016-07-21 13:01:53,410 TRACE [org.infinispan.commands.tx.PrepareCommand]
(Incoming-10,ee,10.1.3.30) {} Invoking remotely originated prepare: PrepareCommand
{modifications=[PutKeyValueCommand{key=4V3S9Lw0pGPAK3wt8bYEt3BK06iimxk88geCH2Gl->XXX,
value=[B@f9f09c2, flags=[IGNORE_RETURN_VALUES], putIfAbsent=false,
valueMatcher=MATCH_ALWAYS, metadata=EmbeddedExpirableMetadata{lifespan=-1, maxIdle=-1,
version=null}, successful=true},
PutKeyValueCommand{key=4V3S9Lw0pGPAK3wt8bYEt3BK06iimxk88geCH2Gl->YYY, value=[B@f72eca9,
flags=[IGNORE_RETURN_VALUES], putIfAbsent=false, valueMatcher=MATCH_ALWAYS,
metadata=EmbeddedExpirableMetadata{lifespan=-1, maxIdle=-1, version=null},
successful=true}, PutKeyValueCommand{key=4V3S9Lw0pGPAK3wt8bYEt3BK06iimxk88geCH2Gl->ZZZ,
value=[B@6f108f88, flags=[IGNORE_RETURN_VALUES], putIfAbsent=false,
valueMatcher=MATCH_ALWAYS, metadata=EmbeddedExpirableMetadata{lifespan=-1, maxIdle=-1,
version=null}, successful=true},
PutKeyValueCommand{key=4V3S9Lw0pGPAK3wt8bYEt3BK06iimxk88geCH2Gl,
value=org.wildfly.clustering.web.infinispan.session.SimpleSessionAccessMetaData@126bb5d4,
flags=[IGNORE_RETURN_VALUES], putIfAbsent=false, valueMatcher=MATCH_ALWAYS,
metadata=EmbeddedExpirableMetadata{lifespan=-1, maxIdle=-1, version=null},
successful=true}, PutKeyValueCommand{key=4V3S9Lw0pGPAK3wt8bYEt3BK06iimxk88geCH2Gl->XXX,
value=[B@1a574fb5, flags=[IGNORE_RETURN_VALUES], putIfAbsent=false,
valueMatcher=MATCH_ALWAYS, metadata=EmbeddedExpirableMetadata{lifespan=-1, maxIdle=-1,
version=null}, successful=true},
PutKeyValueCommand{key=4V3S9Lw0pGPAK3wt8bYEt3BK06iimxk88geCH2Gl->YYY, value=[B@3141bcb,
flags=[IGNORE_RETURN_VALUES], putIfAbsent=false, valueMatcher=MATCH_ALWAYS,
metadata=EmbeddedExpirableMetadata{lifespan=-1, maxIdle=-1, version=null},
successful=true},
PutKeyValueCommand{key=4V3S9Lw0pGPAK3wt8bYEt3BK06iimxk88geCH2Gl->ZZZ.menuName,
value=[B@ae3a302, flags=[FORCE_SYNCHRONOUS], putIfAbsent=false, valueMatcher=MATCH_ALWAYS,
metadata=EmbeddedExpirableMetadata{lifespan=-1, maxIdle=-1, version=null},
successful=true}, ...], onePhaseCommit=true, retried=false,
gtx=GlobalTransaction:<10.1.3.29>:458:remote, cacheName='app.war',
topologyId=2} with invocation context:
org.infinispan.context.impl.RemoteTxInvocationContext@3d21e5e9
{noformat}
In general, it synchronized XXX key {color:red}2{color} times, YYY key
{color:red}2272{color} times, ZZZ key {color:red}3{color} times!!!
The root cause should be this line:
https://github.com/wildfly/wildfly/blob/10.0.0.Final/clustering/web/infin...
where a new CacheEntryMutator.mutate() will trigger a new PutKeyValueCommand. For
consecutive get call on the same session key, it will trigger the PutKeyValueCommand
multiple times, but that is not necessary for transactional/ batched cache in general as
the MarshalledValue is lazily serialized, the actual value is pointing to the same
instance.
Attached the suggested change, which solved our issue. Basically use a Set to track the
mutated attribute names, to ensure the command is only issued once. Tested on
non-transactional, batched cache (non-transactional, non-batch cache may not work as I did
not think about that...)