[jboss-jira] [JBoss JIRA] (AG-145) Active waiting deadlock in StampedCopyOnWriteArrayList

Thu Aug 6 11:02:00 EDT 2020

     [ https://issues.redhat.com/browse/AG-145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Luis Barreiro resolved AG-145.
------------------------------
    Fix Version/s: 1.9
       Resolution: Done

Thanks [~rornaz] for reporting this elusive bug! Race conditions are always hard to detect.

> Active waiting deadlock in StampedCopyOnWriteArrayList
> ------------------------------------------------------
>
>                 Key: AG-145
>                 URL: https://issues.redhat.com/browse/AG-145
>             Project: Agroal
>          Issue Type: Bug
>    Affects Versions: 1.8
>            Reporter: Rene Böing
>            Assignee: Luis Barreiro
>            Priority: Critical
>             Fix For: 1.9
>
>         Attachments: image-2020-07-31-08-39-28-630.png, image-2020-07-31-08-40-41-968.png
>
>
> While using the Agroal connection pool, we discovered rare deadlocks that cause 100% CPU usage on some threads. These deadlocks occur in the StampedCopyOnWriteArrayList class when more than one thread tries to remove the same object.
>  
> A simple reproducer in JUnit (it hits the deadlock nearly every time on my machine):
>  
> {code:java}
> @Test
> public void testThis() {
>     ExecutorService service = Executors.newFixedThreadPool(10);
>     StampedCopyOnWriteArrayList<Object> list = new StampedCopyOnWriteArrayList<>(Object.class);
>     Object o = new Object();
>     list.add(new Object());
>     list.add(new Object());
>     list.add(new Object());
>     list.add(new Object());
>     list.add(o);
>     list.add(new Object());
>     List<Runnable> runnerList = new ArrayList<>(10);
>     List<Future<?>> futureList = new ArrayList<>(10);
>     for (int i = 0; i < 10; i++) {
>         runnerList.add(new Runnable() {
>             @Override
>             public void run() {
>                 list.remove(o);
>                 System.out.println("Removed success!");
>             }
>         });
>     }
>     for (Runnable r : runnerList) {
>         futureList.add(service.submit(r));
>     }
>     for (Future<?> r : futureList) {
>         try {
>             r.get(10000, TimeUnit.MILLISECONDS);
>         } catch (InterruptedException e) {
>             e.printStackTrace();
>         } catch (ExecutionException e) {
>             e.printStackTrace();
>         } catch (TimeoutException e) {
>             System.out.println("Seems like we have a deadlock!");
>         }
>     }
> }
> {code}
>  
> Originally, this deadlock seems to occur when Agroal tries to flush a connection because of the config parameter
> <property name="hibernate.agroal.maxLifetime_m">60</property>
> If another thread using this connection calls session.close at the same time, it is possible for the ConnectionPool class to be called twice for the same connection. The parameter goes through the following path:
> !image-2020-07-31-08-39-28-630.png!
>  
> The parallel session.close call does not find a checked_out connection and tries to flush it instead, so two threads end up in the deadlock situation:
> !image-2020-07-31-08-40-41-968.png!
>  
> Kind regards,
> Rene
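
The report does not show the internals of StampedCopyOnWriteArrayList, so the following is only a sketch of one plausible mechanism, assuming (not confirmed here) that remove() holds a read lock on a java.util.concurrent.locks.StampedLock and retries tryConvertToWriteLock until it succeeds. The class and method names (UpgradeSpinDemo, tryUpgrade) are hypothetical. It shows that the upgrade cannot succeed while a second read lock is held, so two removers that both hold read locks would retry indefinitely, which matches the 100% CPU symptom. The retry loop is bounded here so that the demo terminates.

```java
import java.util.concurrent.locks.StampedLock;

// Hypothetical demo class, not part of Agroal.
final class UpgradeSpinDemo {

    // Tries to upgrade a read stamp to a write stamp, giving up after maxAttempts.
    // Returns the write stamp, or 0 if the upgrade never succeeded.
    static long tryUpgrade(StampedLock lock, long readStamp, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            long writeStamp = lock.tryConvertToWriteLock(readStamp);
            if (writeStamp != 0L) {
                return writeStamp;
            }
            // An unbounded version of this loop would busy-wait here.
        }
        return 0L;
    }

    public static void main(String[] args) {
        StampedLock lock = new StampedLock();

        // StampedLock does not track owners, so a second read lock taken on
        // this thread stands in for a second thread inside remove().
        long first = lock.readLock();
        long second = lock.readLock();

        long failed = tryUpgrade(lock, first, 1_000);
        System.out.println("upgrade with two readers: " + (failed != 0L));

        // Once the other reader leaves, the same upgrade succeeds.
        lock.unlockRead(second);
        long write = tryUpgrade(lock, first, 1_000);
        System.out.println("upgrade with one reader: " + (write != 0L));
        lock.unlockWrite(write);
    }
}
```

If both threads only ever retry the upgrade and neither releases its read lock, neither can make progress, even though no thread is blocked in the usual sense.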

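For comparison, here is a minimal sketch of a copy-on-write list whose remove() cannot spin when several threads remove the same object. This is not the change that went into 1.9 (the notification does not describe it), and BlockingRemoveList and RemoveRaceDemo are hypothetical names. The sketch simply takes the write lock up front instead of upgrading from a read lock, so concurrent removers are serialized and exactly one of them finds the element.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.StampedLock;

// Hypothetical driver, modelled on the reproducer in the report.
final class RemoveRaceDemo {

    // Runs `threads` concurrent removals of one object and counts the successes.
    static int countSuccessfulRemovals(int threads) {
        BlockingRemoveList<Object> list = new BlockingRemoveList<>();
        Object target = new Object();
        list.add(new Object());
        list.add(target);
        list.add(new Object());

        ExecutorService service = Executors.newFixedThreadPool(threads);
        try {
            Callable<Boolean> task = () -> list.remove(target);
            List<Future<Boolean>> futures = new ArrayList<>();
            for (int i = 0; i < threads; i++) {
                futures.add(service.submit(task));
            }
            int removed = 0;
            for (Future<Boolean> future : futures) {
                // A timeout here would indicate a stuck remover.
                if (future.get(10, TimeUnit.SECONDS)) {
                    removed++;
                }
            }
            return removed;
        } catch (Exception e) {
            throw new IllegalStateException("removal did not complete", e);
        } finally {
            service.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("successful removals: " + countSuccessfulRemovals(10));
    }
}

// Hypothetical list: every mutation copies the array under the write lock.
final class BlockingRemoveList<T> {
    private final StampedLock lock = new StampedLock();
    private Object[] data = new Object[0];

    void add(T element) {
        long stamp = lock.writeLock();
        try {
            Object[] copy = Arrays.copyOf(data, data.length + 1);
            copy[data.length] = element;
            data = copy;
        } finally {
            lock.unlockWrite(stamp);
        }
    }

    // Removes by identity; returns false if the element is no longer present.
    boolean remove(Object element) {
        long stamp = lock.writeLock(); // blocks instead of retrying an upgrade
        try {
            Object[] current = data;
            for (int i = current.length - 1; i >= 0; i--) {
                if (current[i] == element) {
                    Object[] copy = new Object[current.length - 1];
                    System.arraycopy(current, 0, copy, 0, i);
                    System.arraycopy(current, i + 1, copy, i, current.length - i - 1);
                    data = copy;
                    return true;
                }
            }
            return false;
        } finally {
            lock.unlockWrite(stamp);
        }
    }
}
```

The trade-off is that every remove() blocks other writers for the duration of the scan, which a read-then-upgrade design presumably tries to avoid.
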