[infinispan-dev] Hanging testsuite / bug in TestNG?

Dan Berindei dan.berindei at gmail.com
Tue Jun 26 00:36:53 EDT 2012


On Mon, Jun 25, 2012 at 10:48 PM, Sanne Grinovero <sanne at infinispan.org> wrote:

> On 25 June 2012 20:06, Dan Berindei <dan.berindei at gmail.com> wrote:
> > On Mon, Jun 25, 2012 at 5:58 PM, Sanne Grinovero <sanne at infinispan.org>
> > wrote:
> >>
> >> Hi all,
> >> the testsuite on master is regularly hanging for me, and I get all
> >> threads waiting while this one is spinning on the put() operation.
> >>
> >> java.lang.Thread.State: RUNNABLE
> >>        at java.util.HashMap.put(HashMap.java:374)
> >>        at org.testng.SuiteRunner.runTest(SuiteRunner.java:320)
> >>        at org.testng.SuiteRunner.access$000(SuiteRunner.java:34)
> >>        at org.testng.SuiteRunner$SuiteWorker.run(SuiteRunner.java:351)
> >>        at org.testng.internal.thread.ThreadUtil$CountDownLatchedRunnable.run(ThreadUtil.java:147)
> >>        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> >>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> >>        at java.lang.Thread.run(Thread.java:662)
> >>
> >>
> >> Clearly the parallel testsuite is not synchronizing properly on the
> >> HashMap; this looks like a TestNG bug.
> >>
> >> So I've tried updating to the latest release, 6.5.2; according to the
> >> changelog this appears to be fixed, along with a very long list of
> >> other issues, not least
> >>
> >> "Fixed: Skipped tests were not always counted"
> >>
> >> which might address the strange numbers we had noticed on Jenkins.
> >>
> >
> > Excellent!
> >
> >
> >>
> >> But I'm still unable to run the full testsuite; what follows are the
> >> reasons from several different attempts:
> >>
> >>
> >> (testng-SyncBasicSingleLockOptimisticTest) Exiting because
> >> lock.singlelock.SyncBasicSingleLockOptimisticTest has NOT shut down
> >> all the cache managers it has started !!!!!!!
> >>
> >> (testng-SyncBasicSingleLockOptimisticTest) Exiting because
> >> lock.singlelock.SyncBasicSingleLockOptimisticTest has NOT shut down
> >> all the cache managers it has started !!!!!
> >>
> >> (testng-BasicSingleLockOptimisticTest) Exiting because
> >> lock.singlelock.BasicSingleLockOptimisticTest has NOT shut down all
> >> the cache managers it has started !!!!!!!
> >>
> >> (testng-BasicSingleLockOptimisticTest) Exiting because
> >> lock.singlelock.BasicSingleLockOptimisticTest has NOT shut down all
> >> the cache managers it has started !!!!!!!
> >>
> >> (testng-SyncBasicSingleLockOptimisticTest) Exiting because
> >> lock.singlelock.SyncBasicSingleLockOptimisticTest has NOT shut down
> >> all the cache managers it has started !!!!!!!
> >>
> >>
> >> Looking into the BasicSingleLockOptimisticTest I'm not seeing how this
> >> could be possible, so I'm wondering if this could be a problem related
> >> to the TestNG update? Any thoughts?
> >>
> >
> > Galder and I saw the same thing with another test in Jenkins:
> >
> > https://issues.jboss.org/browse/ISPN-2117?focusedCommentId=12702926&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12702926
> >
> > To reiterate here, there was/is another concurrency issue in the
> > maven-surefire-plugin that caused it to sometimes skip @AfterMethod
> > methods after a failure. Galder upgraded the plugin to 2.12 hoping it
> > would fix the problem, but if you're still seeing it I guess that
> > wasn't enough...
>
> Thanks Dan! Strange that this happens to me consistently on the same
> test... Do you know what is supposed to trigger this @AfterMethod
> failure? Is there a reference to the Surefire issue?
>
>
I'm not sure why you can reproduce it so reliably, but it's probably
related to the fact that this test always fails for you (and it doesn't
fail for us).

I didn't find any related issue in the surefire JIRA. If you can still
reproduce it with maven-surefire-report-plugin 2.12, I'd say create a new
issue yourself and paste the ConcurrentModificationException stack trace
from your logs (or give it to me and I'll create the issue).

I had a quick look at the 2.12 source code and it seems like the problem is
still there: TestSetRunListener.writeTestOutput can still modify the list
of lines in testStdOut after a failure (because our caches keep on running
and logging stuff), and in parallel TestSetRunListener.testFailed will call
TestSetRunListener.getAsString, which will iterate over testStdOut. Since
there is no synchronization in either writeTestOutput or in getAsString,
it's possible to get a ConcurrentModificationException there.
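
To make the race above concrete, here is a minimal sketch of the pattern. It
is not surefire's actual code: the class OutputBufferSketch, its field and its
method bodies are hypothetical stand-ins for testStdOut, writeTestOutput and
getAsString. The helper unsynchronizedInterleavingFails replays the bad
interleaving in a single thread so that it is deterministic, and the two
instance methods show one possible fix (a shared lock), which is an
assumption about how this could be addressed, not a statement of what the
plugin does. The lambda in main needs Java 8 or later.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

public class OutputBufferSketch {
    // Hypothetical stand-in for the testStdOut list of captured lines.
    private final List<String> lines = new ArrayList<String>();

    // Writer side (the role writeTestOutput plays in the description above).
    public void write(String line) {
        synchronized (lines) {
            lines.add(line);
        }
    }

    // Reader side (the role of getAsString): iterate under the same lock.
    public String getAsString() {
        StringBuilder sb = new StringBuilder();
        synchronized (lines) {
            for (String line : lines) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    // Replays the problematic interleaving without any locking:
    // the reader starts iterating, the writer appends, the reader continues.
    public static boolean unsynchronizedInterleavingFails() {
        List<String> unsafe = new ArrayList<String>();
        unsafe.add("first line");
        Iterator<String> reader = unsafe.iterator();
        unsafe.add("logged after the failure");
        try {
            reader.next(); // the fail-fast iterator detects the modification
            return false;
        } catch (ConcurrentModificationException expected) {
            return true;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("unsynchronized interleaving fails: "
                + unsynchronizedInterleavingFails());

        OutputBufferSketch buffer = new OutputBufferSketch();
        Thread writer = new Thread(() -> {
            for (int i = 0; i < 10000; i++) {
                buffer.write("line " + i);
            }
        });
        writer.start();
        // Reading while the writer is still appending is safe here,
        // because both sides take the same lock.
        while (writer.isAlive()) {
            buffer.getAsString();
        }
        writer.join();
        System.out.println("done, no ConcurrentModificationException");
    }
}
```

With real threads the unsynchronized version fails only intermittently, which
would be consistent with the problem showing up on some machines and not on
others.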


Cheers
Dan