[infinispan-issues] [JBoss JIRA] (ISPN-8110) OffHeapBoundedDataContainer.ensureSize() busy loop takes too long

Dan Berindei (JIRA) issues at jboss.org
Thu Jul 27 04:01:00 EDT 2017


Dan Berindei created ISPN-8110:
----------------------------------

             Summary: OffHeapBoundedDataContainer.ensureSize() busy loop takes too long
                 Key: ISPN-8110
                 URL: https://issues.jboss.org/browse/ISPN-8110
             Project: Infinispan
          Issue Type: Task
          Components: Core
    Affects Versions: 9.1.0.Final
            Reporter: Dan Berindei
            Assignee: William Burns


{{OffHeapBoundedDataContainer.ensureSize()}} needs to acquire the {{lruLock}} in order to check the container size, and then it needs to acquire the write lock of the entry pointed at by the LRU list head ({{firstAddress}}) in order to remove it.

Currently it tries to acquire the entry write lock with {{locks.getLockFromHashCode(hashCode).writeLock().tryLock()}} while holding the {{lruLock}}. But the entry's read-write lock is actually shared by a bucket of entries, so another thread reading a completely different entry fail the {{tryLock}} and will force {{ensureSize()}} to unlock {{lruLock}} and try again.

Even worse, a regular put will acquire the entry/bucket's write lock first, and then try to acquire the {{lruLock}} while holding the bucket lock. Because {{lruLock}} isn't fair, it's possible for 2 or more {{ensureSize}} threads to block a writer thread for a long time.

{{OffHeapBoundedSingleNodeTest.testMultiThreaded}} shows a different pathological scenario, one that I don't have an explanation for yet. At least with trace logging enabled, the {{ensureSize()}} loop can repeat for 100+ milliseconds, but without any other thread logging anything:

{noformat}
15:52:20,012 TRACE (ForkThread-3,Test:[]) [BoundedOffHeapDataContainer] Removing first LRU node at 140339935456768
15:52:20,012 TRACE (ForkThread-3,Test:[]) [UnsafeWrapper] Retrieved long value 140339734124528 from address 140339935456784
15:52:20,012 TRACE (ForkThread-3,Test:[]) [UnsafeWrapper] Wrote long value 0 to address 140339734124536
15:52:20,012 TRACE (ForkThread-3,Test:[]) [UnpooledOffHeapMemoryAllocator] Deallocating off heap memory at 140339935456768 with 28 bytes.  Total size: 8727
15:52:20,012 TRACE (ForkThread-3,Test:[]) [UnsafeWrapper] Copying memory of object null offset by 140340069670800 to [B at 38d5b85f offset by 16 with a total of 17 bytes
15:52:20,012 TRACE (ForkThread-5,Test:[]) [UnsafeWrapper] Retrieved int value -1075141583 from address 140339734124552
15:52:20,012 TRACE (ForkThread-3,Test:[]) [UnpooledOffHeapMemoryAllocator] Deallocating off heap memory at 140340069670784 with 57 bytes.  Total size: 8670
15:52:20,012 TRACE (ForkThread-5,Test:[]) [UnsafeWrapper] Retrieved long value 140339868355200 from address 140339734124528
15:52:20,012 TRACE (ForkThread-2,Test:[]) [UnsafeWrapper] Retrieved int value -1075141583 from address 140339734124552
15:52:20,012 TRACE (ForkThread-5,Test:[]) [BoundedOffHeapDataContainer] Removing entry: 140339868355200 due to eviction due to size 102 being larger than maximum of 100
15:52:20,012 TRACE (ForkThread-2,Test:[]) [UnsafeWrapper] Retrieved int value -1075141583 from address 140339734124552
15:52:20,012 TRACE (ForkThread-5,Test:[]) [UnsafeWrapper] Copying memory of object null offset by 140339868355216 to [B at 5e23dca1 offset by 16 with a total of 17 bytes
15:52:20,012 TRACE (ForkThread-2,Test:[]) [UnsafeWrapper] Retrieved int value -1075141583 from address 140339734124552
15:52:20,012 TRACE (ForkThread-5,Test:[]) [UnsafeWrapper] Copying memory of object null offset by 140339868355233 to [B at 50382f4a offset by 16 with a total of 11 bytes
15:52:20,012 TRACE (ForkThread-2,Test:[]) [UnsafeWrapper] Retrieved int value -1075141583 from address 140339734124552
15:52:20,012 TRACE (ForkThread-5,Test:[]) [UnsafeWrapper] Retrieved long value 0 from address 140339868355208
15:52:20,012 TRACE (ForkThread-2,Test:[]) [UnsafeWrapper] Retrieved int value -1075141583 from address 140339734124552
15:52:20,012 TRACE (ForkThread-5,Test:[]) [UnsafeWrapper] Copying memory of object null offset by 140339868355216 to [B at 3b554f41 offset by 16 with a total of 17 bytes
15:52:20,012 TRACE (ForkThread-2,Test:[]) [UnsafeWrapper] Retrieved int value -1075141583 from address 140339734124552
15:52:20,012 TRACE (ForkThread-5,Test:[]) [UnsafeWrapper] Copying memory of object null offset by 140339868355233 to [B at 472da290 offset by 16 with a total of 11 bytes
15:52:20,012 TRACE (ForkThread-2,Test:[]) [UnsafeWrapper] Retrieved int value -1075141583 from address 140339734124552
15:52:20,012 TRACE (ForkThread-5,Test:[]) [UnsafeWrapper] Copying memory of object null offset by 140339868355244 to [B at 7aba800 offset by 16 with a total of 0 bytes
15:52:20,012 TRACE (ForkThread-2,Test:[]) [UnsafeWrapper] Retrieved int value -1075141583 from address 140339734124552
15:52:20,012 TRACE (ForkThread-5,Test:[]) [UnsafeWrapper] Copying memory of object null offset by 140339868355244 to [B at 167885c0 offset by 16 with a total of 13 bytes
15:52:20,012 TRACE (ForkThread-2,Test:[]) [UnsafeWrapper] Retrieved int value -1075141583 from address 140339734124552
15:52:20,012 TRACE (ForkThread-5,Test:[]) [UnsafeWrapper] Retrieved long value 140339734124528 from address 140339868355200
15:52:20,012 TRACE (ForkThread-2,Test:[]) [UnsafeWrapper] Retrieved int value -1075141583 from address 140339734124552
15:52:20,012 TRACE (ForkThread-2,Test:[]) [UnsafeWrapper] Retrieved int value -1075141583 from address 140339734124552
15:52:20,012 TRACE (ForkThread-2,Test:[]) [UnsafeWrapper] Retrieved int value -1075141583 from address 140339734124552
...
15:52:20,147 TRACE (ForkThread-3,Test:[]) [UnsafeWrapper] Retrieved int value -1075141583 from address 140339734124552
15:52:20,147 TRACE (ForkThread-3,Test:[]) [UnsafeWrapper] Retrieved int value -1075141583 from address 140339734124552
15:52:20,147 TRACE (ForkThread-3,Test:[]) [UnsafeWrapper] Retrieved int value -1075141583 from address 140339734124552
15:52:20,147 TRACE (ForkThread-5,Test:[]) [BoundedOffHeapDataContainer] Removing first LRU node at 140339734124528
{noformat}

Off-topic, it's a bit weird that {{UnsafeWrapper}} logs every address being read from/written to, but the data container doesn't log the keys and values that it reads/writes...

I propose (after discussing this with Will) that {{ensureSize()}} should acquire the bucket lock after releasing the {{lruLock}}, so it can use {{lock()}} instead of {{tryLock()}}.



--
This message was sent by Atlassian JIRA
(v7.2.3#72005)


More information about the infinispan-issues mailing list