[infinispan-issues] [JBoss JIRA] (ISPN-8110) OffHeapBoundedDataContainer.ensureSize() busy loop takes too long

Dan Berindei (JIRA) issues at jboss.org
Thu Jul 27 07:58:00 EDT 2017


     [ https://issues.jboss.org/browse/ISPN-8110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dan Berindei updated ISPN-8110:
-------------------------------
    Attachment: OffHeapBoundedSingleNodeTest_ISPN-5476_parallel_xsite_20170726.log.gz


> OffHeapBoundedDataContainer.ensureSize() busy loop takes too long
> -----------------------------------------------------------------
>
>                 Key: ISPN-8110
>                 URL: https://issues.jboss.org/browse/ISPN-8110
>             Project: Infinispan
>          Issue Type: Task
>          Components: Core
>    Affects Versions: 9.1.0.Final
>            Reporter: Dan Berindei
>            Assignee: William Burns
>         Attachments: OffHeapBoundedSingleNodeTest_ISPN-5476_parallel_xsite_20170726.log.gz
>
>
> {{OffHeapBoundedDataContainer.ensureSize()}} needs to acquire the {{lruLock}} in order to check the container size, and then it needs to acquire the write lock of the entry pointed at by the LRU list head ({{firstAddress}}) in order to remove it.
> Currently it tries to acquire the entry write lock with {{locks.getLockFromHashCode(hashCode).writeLock().tryLock()}} while holding the {{lruLock}}. But the entry's read-write lock is actually shared by a bucket of entries, so another thread reading a completely different entry fail the {{tryLock}} and will force {{ensureSize()}} to unlock {{lruLock}} and try again.
> Even worse, a regular put will acquire the entry/bucket's write lock first, and then try to acquire the {{lruLock}} while holding the bucket lock. Because {{lruLock}} isn't fair, it's possible for 2 or more {{ensureSize}} threads to block a writer thread for a long time.
> {{OffHeapBoundedSingleNodeTest.testMultiThreaded}} shows a different pathological scenario, one that I don't have an explanation for yet. At least with trace logging enabled, the {{ensureSize()}} loop can repeat for 100+ milliseconds, but without any other thread logging anything:
> {noformat}
> 15:52:20,012 TRACE (ForkThread-3,Test:[]) [BoundedOffHeapDataContainer] Removing first LRU node at 140339935456768
> 15:52:20,012 TRACE (ForkThread-3,Test:[]) [UnsafeWrapper] Retrieved long value 140339734124528 from address 140339935456784
> 15:52:20,012 TRACE (ForkThread-3,Test:[]) [UnsafeWrapper] Wrote long value 0 to address 140339734124536
> 15:52:20,012 TRACE (ForkThread-3,Test:[]) [UnpooledOffHeapMemoryAllocator] Deallocating off heap memory at 140339935456768 with 28 bytes.  Total size: 8727
> 15:52:20,012 TRACE (ForkThread-3,Test:[]) [UnsafeWrapper] Copying memory of object null offset by 140340069670800 to [B at 38d5b85f offset by 16 with a total of 17 bytes
> 15:52:20,012 TRACE (ForkThread-5,Test:[]) [UnsafeWrapper] Retrieved int value -1075141583 from address 140339734124552
> 15:52:20,012 TRACE (ForkThread-3,Test:[]) [UnpooledOffHeapMemoryAllocator] Deallocating off heap memory at 140340069670784 with 57 bytes.  Total size: 8670
> 15:52:20,012 TRACE (ForkThread-5,Test:[]) [UnsafeWrapper] Retrieved long value 140339868355200 from address 140339734124528
> 15:52:20,012 TRACE (ForkThread-2,Test:[]) [UnsafeWrapper] Retrieved int value -1075141583 from address 140339734124552
> 15:52:20,012 TRACE (ForkThread-5,Test:[]) [BoundedOffHeapDataContainer] Removing entry: 140339868355200 due to eviction due to size 102 being larger than maximum of 100
> 15:52:20,012 TRACE (ForkThread-2,Test:[]) [UnsafeWrapper] Retrieved int value -1075141583 from address 140339734124552
> 15:52:20,012 TRACE (ForkThread-5,Test:[]) [UnsafeWrapper] Copying memory of object null offset by 140339868355216 to [B at 5e23dca1 offset by 16 with a total of 17 bytes
> 15:52:20,012 TRACE (ForkThread-2,Test:[]) [UnsafeWrapper] Retrieved int value -1075141583 from address 140339734124552
> 15:52:20,012 TRACE (ForkThread-5,Test:[]) [UnsafeWrapper] Copying memory of object null offset by 140339868355233 to [B at 50382f4a offset by 16 with a total of 11 bytes
> 15:52:20,012 TRACE (ForkThread-2,Test:[]) [UnsafeWrapper] Retrieved int value -1075141583 from address 140339734124552
> 15:52:20,012 TRACE (ForkThread-5,Test:[]) [UnsafeWrapper] Retrieved long value 0 from address 140339868355208
> 15:52:20,012 TRACE (ForkThread-2,Test:[]) [UnsafeWrapper] Retrieved int value -1075141583 from address 140339734124552
> 15:52:20,012 TRACE (ForkThread-5,Test:[]) [UnsafeWrapper] Copying memory of object null offset by 140339868355216 to [B at 3b554f41 offset by 16 with a total of 17 bytes
> 15:52:20,012 TRACE (ForkThread-2,Test:[]) [UnsafeWrapper] Retrieved int value -1075141583 from address 140339734124552
> 15:52:20,012 TRACE (ForkThread-5,Test:[]) [UnsafeWrapper] Copying memory of object null offset by 140339868355233 to [B at 472da290 offset by 16 with a total of 11 bytes
> 15:52:20,012 TRACE (ForkThread-2,Test:[]) [UnsafeWrapper] Retrieved int value -1075141583 from address 140339734124552
> 15:52:20,012 TRACE (ForkThread-5,Test:[]) [UnsafeWrapper] Copying memory of object null offset by 140339868355244 to [B at 7aba800 offset by 16 with a total of 0 bytes
> 15:52:20,012 TRACE (ForkThread-2,Test:[]) [UnsafeWrapper] Retrieved int value -1075141583 from address 140339734124552
> 15:52:20,012 TRACE (ForkThread-5,Test:[]) [UnsafeWrapper] Copying memory of object null offset by 140339868355244 to [B at 167885c0 offset by 16 with a total of 13 bytes
> 15:52:20,012 TRACE (ForkThread-2,Test:[]) [UnsafeWrapper] Retrieved int value -1075141583 from address 140339734124552
> 15:52:20,012 TRACE (ForkThread-5,Test:[]) [UnsafeWrapper] Retrieved long value 140339734124528 from address 140339868355200
> 15:52:20,012 TRACE (ForkThread-2,Test:[]) [UnsafeWrapper] Retrieved int value -1075141583 from address 140339734124552
> 15:52:20,012 TRACE (ForkThread-2,Test:[]) [UnsafeWrapper] Retrieved int value -1075141583 from address 140339734124552
> 15:52:20,012 TRACE (ForkThread-2,Test:[]) [UnsafeWrapper] Retrieved int value -1075141583 from address 140339734124552
> ...
> 15:52:20,147 TRACE (ForkThread-3,Test:[]) [UnsafeWrapper] Retrieved int value -1075141583 from address 140339734124552
> 15:52:20,147 TRACE (ForkThread-3,Test:[]) [UnsafeWrapper] Retrieved int value -1075141583 from address 140339734124552
> 15:52:20,147 TRACE (ForkThread-3,Test:[]) [UnsafeWrapper] Retrieved int value -1075141583 from address 140339734124552
> 15:52:20,147 TRACE (ForkThread-5,Test:[]) [BoundedOffHeapDataContainer] Removing first LRU node at 140339734124528
> {noformat}
> Off-topic, it's a bit weird that {{UnsafeWrapper}} logs every address being read from/written to, but the data container doesn't log the keys and values that it reads/writes...
> I propose (after discussing this with Will) that {{ensureSize()}} should acquire the bucket lock after releasing the {{lruLock}}, so it can use {{lock()}} instead of {{tryLock()}}.



--
This message was sent by Atlassian JIRA
(v7.2.3#72005)


More information about the infinispan-issues mailing list