[infinispan-issues] [JBoss JIRA] (ISPN-1602) Single view change causes stale locks

Dan Berindei (Commented) (JIRA) jira-events at lists.jboss.org
Mon Dec 12 04:58:10 EST 2011


    [ https://issues.jboss.org/browse/ISPN-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12649883#comment-12649883 ] 

Dan Berindei commented on ISPN-1602:
------------------------------------

Erik, I've been looking through the logs for stale locks, but I couldn't find any key that was never released. I did find a lot of errors acquiring locks, but nothing unusual.

Then I looked at the duration of cache view installations. Most took well below 1 second, a few took ~ 10 seconds (because they had to wait for a LockControlCommand to time out), and one took > 60 seconds:

{noformat}
2011-12-08 21:46:17,629 DEBUG [org.infinispan.cacheviews.CacheViewsManagerImpl] (CacheViewInstaller-4,dht10-14621(site01)) Installing new view CacheView{viewId=7, members=[dht10-14621(site01), dht12-11638(site03), dht14-60002(site02)]} for cache serviceGroup
2011-12-08 21:47:36,938 DEBUG [org.infinispan.cacheviews.CacheViewsManagerImpl] (CacheViewInstaller-4,dht10-14621(site01)) serviceGroup: Committing cache view 7
{noformat}

Most of that time seems to be spent between node {{dht10-14621}} sending the {{COMMIT_VIEW}} command and node {{dht14-60002}} executing the command:
{noformat}
server-dht10.log:2011-12-08 21:46:46,863 TRACE [org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher] (CacheViewInstaller-4,dht10-14621(site01)) Replication task sending CacheViewControlCommand{cache=serviceGroup, type=COMMIT_VIEW, sender=dht10-14621(site01), newViewId=7, newMembers=null, oldViewId=0, oldMembers=null} to addresses [dht10-14621(site01), dht12-11638(site03), dht14-60002(site02)]
server-dht12.log:2011-12-08 21:46:46,858 TRACE [org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher] (OOB-71,erm-cluster-0a0a0e3c,dht12-11638(site03)) Attempting to execute command: CacheViewControlCommand{cache=serviceGroup, type=COMMIT_VIEW, sender=dht10-14621(site01), newViewId=7, newMembers=null, oldViewId=0, oldMembers=null} [sender=dht10-14621(site01)]
server-dht14.log:2011-12-08 21:47:36,932 TRACE [org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher] (OOB-118,erm-cluster-0a0a0e3c,dht14-60002(site02)) Attempting to execute command: CacheViewControlCommand{cache=serviceGroup, type=COMMIT_VIEW, sender=dht10-14621(site01), newViewId=7, newMembers=null, oldViewId=0, oldMembers=null} [sender=dht10-14621(site01)]
{noformat}
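
For reference, the durations can be recomputed from the timestamps in the excerpts above. This is only a rough sketch: {{LogTimestamps}} and {{millisBetween}} are made-up names for illustration (not part of Infinispan or JGroups), and it assumes the timestamp layout shown in the logs. A difference between timestamps taken on two different nodes is only as accurate as the clock synchronization between those nodes.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper for illustration only; not part of Infinispan or JGroups.
class LogTimestamps {
    // Timestamp layout used in the log excerpts, e.g. "2011-12-08 21:46:17,629"
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    // Milliseconds elapsed between two log timestamps
    static long millisBetween(String start, String end) {
        return Duration.between(LocalDateTime.parse(start, FORMAT),
                                LocalDateTime.parse(end, FORMAT)).toMillis();
    }

    public static void main(String[] args) {
        // "Installing new view" -> "Committing cache view 7", both logged on dht10
        System.out.println(millisBetween("2011-12-08 21:46:17,629", "2011-12-08 21:47:36,938"));
        // COMMIT_VIEW sent by dht10 -> executed on dht14 (different clocks, so approximate)
        System.out.println(millisBetween("2011-12-08 21:46:46,863", "2011-12-08 21:47:36,932"));
    }
}
```

With the timestamps quoted above this gives 79309 ms for the whole installation of view 7, and 50069 ms between the send on {{dht10-14621}} and the execution on {{dht14-60002}}.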

I didn't see any JGroups warnings about dropped messages from {{dht10-14621}}, so I'm guessing the OOB thread pool is full.

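Make sure you don't have {{oob_thread_pool.queue_enabled="true"}} in your config. With the queue enabled, the pool only starts threads beyond {{oob_thread_pool.min_threads}} once the queue is full, so you'll only use {{oob_thread_pool.min_threads}} most of the time.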
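
To illustrate the thread pool point, here is a minimal sketch using a plain {{java.util.concurrent.ThreadPoolExecutor}}. It assumes the JGroups OOB pool behaves like a standard executor that is backed by a bounded queue when the queue is enabled and by a direct hand-off ({{SynchronousQueue}}) when it is disabled. The class name, pool sizes and queue capacity are made up for the example and are not taken from the reporter's configuration.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; it does not use any JGroups code.
class OobPoolSketch {
    // Submits `tasks` blocking tasks and returns how many threads the pool started for them.
    static int threadsUsed(BlockingQueue<Runnable> queue, int min, int max, int tasks) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(min, max, 30, TimeUnit.SECONDS, queue);
        CountDownLatch release = new CountDownLatch(1);
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> {
                try {
                    release.await(); // simulate a command that blocks its thread
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        int size = pool.getPoolSize(); // threads created while all tasks are still blocked
        release.countDown();
        pool.shutdown();
        try {
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return size;
    }

    public static void main(String[] args) {
        // Assumed equivalent of queue_enabled="true": extra tasks wait in the queue
        System.out.println("queue enabled:  " + threadsUsed(new LinkedBlockingQueue<>(100), 2, 8, 6));
        // Assumed equivalent of queue_enabled="false": each extra task gets a new thread, up to max
        System.out.println("queue disabled: " + threadsUsed(new SynchronousQueue<>(), 2, 8, 6));
    }
}
```

With 6 blocking tasks, a minimum of 2 threads and a maximum of 8, the queued pool stays at 2 threads and the other 4 tasks wait in the queue, while the hand-off pool grows to 6 threads. If the real OOB pool works the same way, a {{COMMIT_VIEW}} command could sit in the queue behind blocked commands even though the pool is far below its maximum size.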


                
> Single view change causes stale locks
> -------------------------------------
>
>                 Key: ISPN-1602
>                 URL: https://issues.jboss.org/browse/ISPN-1602
>             Project: Infinispan
>          Issue Type: Bug
>          Components: Core API
>    Affects Versions: 5.1.0.CR1
>            Reporter: Erik Salter
>            Assignee: Dan Berindei
>            Priority: Critical
>             Fix For: 5.1.0.CR2
>
>
> During load testing of 5.1.0.CR1, we're encountering JGroups 3.x dropping views.  We know due to ISPN-1581, if the number of view changes > 3, there could be a stale lock on a failed commit.  However, we're seeing stale locks occur on a single view change.
> In the following logs, the affected cluster is the erm-cluster-xxxx
> (We also don't know why JGroups 3.x is unstable.  We suspected FLUSH and incorrect FD settings, but we removed them, and we're still getting dropped messages)
> The trace logs (It isn't long at all before the issue occurs) are at:
> http://dl.dropbox.com/u/50401510/5.1.0.CR1/dec08viewchange/dht10/server.log.gz
> http://dl.dropbox.com/u/50401510/5.1.0.CR1/dec08viewchange/dht11/server.log.gz
> http://dl.dropbox.com/u/50401510/5.1.0.CR1/dec08viewchange/dht12/server.log.gz
> http://dl.dropbox.com/u/50401510/5.1.0.CR1/dec08viewchange/dht13/server.log.gz
> http://dl.dropbox.com/u/50401510/5.1.0.CR1/dec08viewchange/dht14/server.log.gz
> http://dl.dropbox.com/u/50401510/5.1.0.CR1/dec08viewchange/dht15/server.log.gz
