[infinispan-issues] [JBoss JIRA] (ISPN-1838) State transfer takes more than 10 minutes with only 10K entries.

Thu Mar 8 07:55:37 EST 2012

    [ https://issues.jboss.org/browse/ISPN-1838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12675104#comment-12675104 ] 

RH Bugzilla Integration commented on ISPN-1838:
-----------------------------------------------

Tristan Tarrant <ttarrant at redhat.com> changed the Status of [bug 786202|https://bugzilla.redhat.com/show_bug.cgi?id=786202] from VERIFIED to CLOSED
                
> State transfer takes more than 10 minutes with only 10K entries.
> ----------------------------------------------------------------
>
>                 Key: ISPN-1838
>                 URL: https://issues.jboss.org/browse/ISPN-1838
>             Project: Infinispan
>          Issue Type: Bug
>          Components: State transfer
>    Affects Versions: 5.1.1.CR1
>            Reporter: Michal Linhard
>            Assignee: Dan Berindei
>             Fix For: 5.1.2.FINAL
>
>         Attachments: apply_state.log, apply_state.txt, dan.xml, retransmissions.txt, uuperf-tcp.txt, uuperf-udp.txt, uuperf-unicast1.txt
>
>
> This could be categorized as a performance problem.
> It happened in resilience test run: http://hudson.qa.jboss.com/hudson/view/EDG6/view/EDG-REPORTS-RESILIENCE/job/edg-60-failover-dist-basic/23
> originally to verify ISPN-1826
> It was run with infinispan special build from Galder's branch (https://github.com/galderz/infinispan/tree/t_1826_5)
> http://hudson.qa.jboss.com/hudson/view/EDG6/view/EDG-QE/job/edg-60-build-infinispan-from-source/45/
> test starts 4 nodes, kills node2, starts node2 and sees what happens
> trace logging on server side was on. there were two runs
> 200 clients, 10K entries
> http://hudson.qa.jboss.com/hudson/view/EDG6/view/EDG-REPORTS-RESILIENCE/job/edg-60-failover-dist-basic/23
> 20 clients, 1K entries
> http://hudson.qa.jboss.com/hudson/view/EDG6/view/EDG-REPORTS-RESILIENCE/job/edg-60-failover-dist-basic/24
> in run 24 everyting looks nice:
> http://hudson.qa.jboss.com/hudson/view/EDG6/view/EDG-REPORTS-RESILIENCE/job/edg-60-failover-dist-basic/24/artifact/report/stats-throughput.png
> in run 23 the state transfer takes forever (more than 10 min)
> these important views are installed on coordinator (node03):
> {code}
> 2012-02-02 05:11:00,560 TRACE [BaseStateTransferManagerImpl] (transport-thread-1) Received new cache view: testCache CacheView{viewId=6, members=[edg-perf04-45788, edg-perf03-36944, edg-perf02-51026, edg-perf01-47003]}
> 2012-02-02 05:15:13,591 TRACE [BaseStateTransferManagerImpl] (transport-thread-9) Received new cache view: testCache CacheView{viewId=7, members=[edg-perf04-45788, edg-perf03-36944, edg-perf01-47003]}
> 2012-02-02 05:18:17,219 TRACE [BaseStateTransferManagerImpl] (transport-thread-1) Received new cache view: testCache CacheView{viewId=8, members=[edg-perf04-45788, edg-perf03-36944, edg-perf01-47003, edg-perf02-21799]}
> 2012-02-02 05:28:17,511 TRACE [BaseStateTransferManagerImpl] (transport-thread-22) Received new cache view: testCache CacheView{viewId=10, members=[edg-perf04-45788, edg-perf03-36944, edg-perf01-47003, edg-perf02-21799]}
> {code}
> viewId=8 is the one that takes 10 min to prepare and after that the prepare fails:
> {code}
> 2012-02-02 05:28:17,219 ERROR [CacheViewsManagerImpl] (CacheViewInstaller-9,edg-perf03-36944) ISPN000172: Failed to prepare view CacheView{viewId=8, members=[edg-perf04-45788, edg-perf03-36944, edg-perf01-47003, edg-perf02-21799]} for cache  testCache, ro..
> java.util.concurrent.TimeoutException
> 	at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:228)
> 	at java.util.concurrent.FutureTask.get(FutureTask.java:91)
> 	at org.infinispan.cacheviews.CacheViewsManagerImpl.clusterPrepareView(CacheViewsManagerImpl.java:319)
> 	at org.infinispan.cacheviews.CacheViewsManagerImpl.clusterInstallView(CacheViewsManagerImpl.java:250)
> 	at org.infinispan.cacheviews.CacheViewsManagerImpl$ViewInstallationTask.call(CacheViewsManagerImpl.java:877)
> 	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> {code}
> viewId=10 is a retry and that succeeds quite quickly but the test is already ending about that time.
> It might be worth looking at the tracelogs since they're already there...
> 10K entries and 200 clients isn't such a big load ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.jboss.org/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira