]
Dan Berindei updated ISPN-8110:
-------------------------------
Attachment: OffHeapBoundedSingleNodeTest_ISPN-5476_parallel_xsite_20170726.log.gz
OffHeapBoundedDataContainer.ensureSize() busy loop takes too long
-----------------------------------------------------------------
Key: ISPN-8110
URL:
https://issues.jboss.org/browse/ISPN-8110
Project: Infinispan
Issue Type: Task
Components: Core
Affects Versions: 9.1.0.Final
Reporter: Dan Berindei
Assignee: William Burns
Attachments:
OffHeapBoundedSingleNodeTest_ISPN-5476_parallel_xsite_20170726.log.gz
{{OffHeapBoundedDataContainer.ensureSize()}} needs to acquire the {{lruLock}} in order to
check the container size, and then it needs to acquire the write lock of the entry pointed
at by the LRU list head ({{firstAddress}}) in order to remove it.
Currently it tries to acquire the entry write lock with
{{locks.getLockFromHashCode(hashCode).writeLock().tryLock()}} while holding the
{{lruLock}}. But the entry's read-write lock is actually shared by a bucket of
entries, so another thread reading a completely different entry fail the {{tryLock}} and
will force {{ensureSize()}} to unlock {{lruLock}} and try again.
Even worse, a regular put will acquire the entry/bucket's write lock first, and then
try to acquire the {{lruLock}} while holding the bucket lock. Because {{lruLock}}
isn't fair, it's possible for 2 or more {{ensureSize}} threads to block a writer
thread for a long time.
{{OffHeapBoundedSingleNodeTest.testMultiThreaded}} shows a different pathological
scenario, one that I don't have an explanation for yet. At least with trace logging
enabled, the {{ensureSize()}} loop can repeat for 100+ milliseconds, but without any other
thread logging anything:
{noformat}
15:52:20,012 TRACE (ForkThread-3,Test:[]) [BoundedOffHeapDataContainer] Removing first
LRU node at 140339935456768
15:52:20,012 TRACE (ForkThread-3,Test:[]) [UnsafeWrapper] Retrieved long value
140339734124528 from address 140339935456784
15:52:20,012 TRACE (ForkThread-3,Test:[]) [UnsafeWrapper] Wrote long value 0 to address
140339734124536
15:52:20,012 TRACE (ForkThread-3,Test:[]) [UnpooledOffHeapMemoryAllocator] Deallocating
off heap memory at 140339935456768 with 28 bytes. Total size: 8727
15:52:20,012 TRACE (ForkThread-3,Test:[]) [UnsafeWrapper] Copying memory of object null
offset by 140340069670800 to [B@38d5b85f offset by 16 with a total of 17 bytes
15:52:20,012 TRACE (ForkThread-5,Test:[]) [UnsafeWrapper] Retrieved int value -1075141583
from address 140339734124552
15:52:20,012 TRACE (ForkThread-3,Test:[]) [UnpooledOffHeapMemoryAllocator] Deallocating
off heap memory at 140340069670784 with 57 bytes. Total size: 8670
15:52:20,012 TRACE (ForkThread-5,Test:[]) [UnsafeWrapper] Retrieved long value
140339868355200 from address 140339734124528
15:52:20,012 TRACE (ForkThread-2,Test:[]) [UnsafeWrapper] Retrieved int value -1075141583
from address 140339734124552
15:52:20,012 TRACE (ForkThread-5,Test:[]) [BoundedOffHeapDataContainer] Removing entry:
140339868355200 due to eviction due to size 102 being larger than maximum of 100
15:52:20,012 TRACE (ForkThread-2,Test:[]) [UnsafeWrapper] Retrieved int value -1075141583
from address 140339734124552
15:52:20,012 TRACE (ForkThread-5,Test:[]) [UnsafeWrapper] Copying memory of object null
offset by 140339868355216 to [B@5e23dca1 offset by 16 with a total of 17 bytes
15:52:20,012 TRACE (ForkThread-2,Test:[]) [UnsafeWrapper] Retrieved int value -1075141583
from address 140339734124552
15:52:20,012 TRACE (ForkThread-5,Test:[]) [UnsafeWrapper] Copying memory of object null
offset by 140339868355233 to [B@50382f4a offset by 16 with a total of 11 bytes
15:52:20,012 TRACE (ForkThread-2,Test:[]) [UnsafeWrapper] Retrieved int value -1075141583
from address 140339734124552
15:52:20,012 TRACE (ForkThread-5,Test:[]) [UnsafeWrapper] Retrieved long value 0 from
address 140339868355208
15:52:20,012 TRACE (ForkThread-2,Test:[]) [UnsafeWrapper] Retrieved int value -1075141583
from address 140339734124552
15:52:20,012 TRACE (ForkThread-5,Test:[]) [UnsafeWrapper] Copying memory of object null
offset by 140339868355216 to [B@3b554f41 offset by 16 with a total of 17 bytes
15:52:20,012 TRACE (ForkThread-2,Test:[]) [UnsafeWrapper] Retrieved int value -1075141583
from address 140339734124552
15:52:20,012 TRACE (ForkThread-5,Test:[]) [UnsafeWrapper] Copying memory of object null
offset by 140339868355233 to [B@472da290 offset by 16 with a total of 11 bytes
15:52:20,012 TRACE (ForkThread-2,Test:[]) [UnsafeWrapper] Retrieved int value -1075141583
from address 140339734124552
15:52:20,012 TRACE (ForkThread-5,Test:[]) [UnsafeWrapper] Copying memory of object null
offset by 140339868355244 to [B@7aba800 offset by 16 with a total of 0 bytes
15:52:20,012 TRACE (ForkThread-2,Test:[]) [UnsafeWrapper] Retrieved int value -1075141583
from address 140339734124552
15:52:20,012 TRACE (ForkThread-5,Test:[]) [UnsafeWrapper] Copying memory of object null
offset by 140339868355244 to [B@167885c0 offset by 16 with a total of 13 bytes
15:52:20,012 TRACE (ForkThread-2,Test:[]) [UnsafeWrapper] Retrieved int value -1075141583
from address 140339734124552
15:52:20,012 TRACE (ForkThread-5,Test:[]) [UnsafeWrapper] Retrieved long value
140339734124528 from address 140339868355200
15:52:20,012 TRACE (ForkThread-2,Test:[]) [UnsafeWrapper] Retrieved int value -1075141583
from address 140339734124552
15:52:20,012 TRACE (ForkThread-2,Test:[]) [UnsafeWrapper] Retrieved int value -1075141583
from address 140339734124552
15:52:20,012 TRACE (ForkThread-2,Test:[]) [UnsafeWrapper] Retrieved int value -1075141583
from address 140339734124552
...
15:52:20,147 TRACE (ForkThread-3,Test:[]) [UnsafeWrapper] Retrieved int value -1075141583
from address 140339734124552
15:52:20,147 TRACE (ForkThread-3,Test:[]) [UnsafeWrapper] Retrieved int value -1075141583
from address 140339734124552
15:52:20,147 TRACE (ForkThread-3,Test:[]) [UnsafeWrapper] Retrieved int value -1075141583
from address 140339734124552
15:52:20,147 TRACE (ForkThread-5,Test:[]) [BoundedOffHeapDataContainer] Removing first
LRU node at 140339734124528
{noformat}
Off-topic, it's a bit weird that {{UnsafeWrapper}} logs every address being read
from/written to, but the data container doesn't log the keys and values that it
reads/writes...
I propose (after discussing this with Will) that {{ensureSize()}} should acquire the
bucket lock after releasing the {{lruLock}}, so it can use {{lock()}} instead of
{{tryLock()}}.