[
https://issues.jboss.org/browse/ISPN-1602?page=com.atlassian.jira.plugin....
]
Dan Berindei commented on ISPN-1602:
------------------------------------
Erik, I've been looking in the logs for stale locks, but I couldn't find any key
that was never released. I did find a lot of errors acquiring locks, but nothing unusual.
Then I looked at the duration of cache view installations. Most took well below 1 second,
a few took ~ 10 seconds (because they had to wait for a LockControlCommand to time out),
and one took > 60 seconds:
{noformat}
2011-12-08 21:46:17,629 DEBUG [org.infinispan.cacheviews.CacheViewsManagerImpl]
(CacheViewInstaller-4,dht10-14621(site01)) Installing new view CacheView{viewId=7,
members=[dht10-14621(site01), dht12-11638(site03), dht14-60002(site02)]} for cache
serviceGroup
2011-12-08 21:47:36,938 DEBUG [org.infinispan.cacheviews.CacheViewsManagerImpl]
(CacheViewInstaller-4,dht10-14621(site01)) serviceGroup: Committing cache view 7
{noformat}
Most of that time seems to have been spent between node {{dht10-14621}} sending the {{COMMIT_VIEW}}
command and node {{dht14-60002}} executing the command:
{noformat}
server-dht10.log:2011-12-08 21:46:46,863 TRACE
[org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher]
(CacheViewInstaller-4,dht10-14621(site01)) Replication task sending
CacheViewControlCommand{cache=serviceGroup, type=COMMIT_VIEW, sender=dht10-14621(site01),
newViewId=7, newMembers=null, oldViewId=0, oldMembers=null} to addresses
[dht10-14621(site01), dht12-11638(site03), dht14-60002(site02)]
server-dht12.log:2011-12-08 21:46:46,858 TRACE
[org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher]
(OOB-71,erm-cluster-0a0a0e3c,dht12-11638(site03)) Attempting to execute command:
CacheViewControlCommand{cache=serviceGroup, type=COMMIT_VIEW, sender=dht10-14621(site01),
newViewId=7, newMembers=null, oldViewId=0, oldMembers=null} [sender=dht10-14621(site01)]
server-dht14.log:2011-12-08 21:47:36,932 TRACE
[org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher]
(OOB-118,erm-cluster-0a0a0e3c,dht14-60002(site02)) Attempting to execute command:
CacheViewControlCommand{cache=serviceGroup, type=COMMIT_VIEW, sender=dht10-14621(site01),
newViewId=7, newMembers=null, oldViewId=0, oldMembers=null} [sender=dht10-14621(site01)]
{noformat}
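The gaps can be checked by subtracting the timestamps quoted above. This is only a small sketch: the class and method names are made up for illustration, and because the timestamps come from different hosts, any clock skew between the nodes is included in the result.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of Infinispan or JGroups.
public class LogGapSketch {
    // Timestamp layout used in the log excerpts above, e.g. "2011-12-08 21:46:46,863".
    private static final DateTimeFormatter LOG_TS =
            DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss,SSS");

    // Milliseconds elapsed from one log timestamp to another (negative if 'to' is earlier).
    static long millisBetween(String from, String to) {
        LocalDateTime start = LocalDateTime.parse(from, LOG_TS);
        LocalDateTime end = LocalDateTime.parse(to, LOG_TS);
        return Duration.between(start, end).toMillis();
    }

    public static void main(String[] args) {
        // dht10 sends COMMIT_VIEW -> dht14 starts executing it
        System.out.println(millisBetween("2011-12-08 21:46:46,863", "2011-12-08 21:47:36,932")); // 50069
        // dht10 sends COMMIT_VIEW -> dht12 starts executing it (negative: clock skew between hosts)
        System.out.println(millisBetween("2011-12-08 21:46:46,863", "2011-12-08 21:46:46,858")); // -5
        // whole installation of view 7 on dht10
        System.out.println(millisBetween("2011-12-08 21:46:17,629", "2011-12-08 21:47:36,938")); // 79309
    }
}
```

So roughly 50 of the 79 seconds pass before {{dht14-60002}} even begins to process the command.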
I didn't see any JGroups warnings about dropped messages from {{dht10-14621}}, so
I'm guessing the OOB thread pool is full.
Make sure you don't have {{oob_thread_pool.queue_enabled="true"}} in your
config, or you'll only use {{oob_thread_pool.min_threads}} most of the time.
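The reason for the queue warning can be illustrated with a plain {{java.util.concurrent.ThreadPoolExecutor}}. This sketch assumes the JGroups OOB pool follows the same rules (I have not checked the JGroups source for this); the class name, pool sizes and queue capacity are invented for the example. An executor only starts threads beyond its core size once its queue rejects a task, so with a queue in front of it the pool stays at the minimum until the queue is full.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class, not JGroups code.
public class OobQueueSketch {

    // Submits 'tasks' blocking tasks and reports how many threads the pool started.
    static int poolSizeUnderLoad(BlockingQueue<Runnable> queue, int min, int max, int tasks)
            throws InterruptedException {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(min, max, 30, TimeUnit.SECONDS, queue);
        CountDownLatch release = new CountDownLatch(1);
        try {
            for (int i = 0; i < tasks; i++) {
                pool.execute(() -> {
                    try {
                        release.await(); // keep the worker busy, like a slow command
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            return pool.getPoolSize();
        } finally {
            release.countDown(); // let all tasks finish
            pool.shutdown();
            pool.awaitTermination(5, TimeUnit.SECONDS);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Queue in front of the pool: extra tasks wait in the queue, pool stays at min.
        System.out.println(poolSizeUnderLoad(new LinkedBlockingQueue<>(100), 2, 8, 8)); // 2
        // No queue (direct hand-off): the pool grows towards max.
        System.out.println(poolSizeUnderLoad(new SynchronousQueue<>(), 2, 8, 8));       // 8
    }
}
```

With the queue, six of the eight tasks sit waiting behind two busy threads, which would match a {{COMMIT_VIEW}} being delayed on the receiving node.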
Single view change causes stale locks
-------------------------------------
Key: ISPN-1602
URL: https://issues.jboss.org/browse/ISPN-1602
Project: Infinispan
Issue Type: Bug
Components: Core API
Affects Versions: 5.1.0.CR1
Reporter: Erik Salter
Assignee: Dan Berindei
Priority: Critical
Fix For: 5.1.0.CR2
During load testing of 5.1.0.CR1, we're encountering JGroups 3.x dropping views. We
know from ISPN-1581 that if the number of view changes is > 3, there could be a stale lock
on a failed commit. However, we're seeing stale locks occur on a single view change.
In the following logs, the affected cluster is erm-cluster-xxxx.
(We also don't know why JGroups 3.x is unstable. We suspected FLUSH and incorrect FD
settings, but we removed them, and we're still getting dropped messages.)
The trace logs (the issue occurs very early in them) are at:
http://dl.dropbox.com/u/50401510/5.1.0.CR1/dec08viewchange/dht10/server.l...
http://dl.dropbox.com/u/50401510/5.1.0.CR1/dec08viewchange/dht11/server.l...
http://dl.dropbox.com/u/50401510/5.1.0.CR1/dec08viewchange/dht12/server.l...
http://dl.dropbox.com/u/50401510/5.1.0.CR1/dec08viewchange/dht13/server.l...
http://dl.dropbox.com/u/50401510/5.1.0.CR1/dec08viewchange/dht14/server.l...
http://dl.dropbox.com/u/50401510/5.1.0.CR1/dec08viewchange/dht15/server.l...