[infinispan-dev] PFER lock aquisitions sometimes lasting longer than 0ms

Galder Zamarreno galder at redhat.com
Fri Feb 19 06:42:08 EST 2010


As a FYI. We've detected that sometimes PFER lock acquisition succeed  
after > 0ms, which is odd. We're looking into this.

------- Forwarded message -------
From: "Galder Zamarreno" <galder at redhat.com>
To: "Galder Zamarreno" <galder at redhat.com>
Cc:
Subject: Re: Fwd: Re: Random ConcurrentWriteTest.testManyUsers() failure
Date: Fri, 19 Feb 2010 09:59:46 +0100

As a fyi as well, OwnableReentrantStripedLockContainer is in use here
hence the lock impl is OwnableReentrantLock.

On Fri, 19 Feb 2010 09:42:55 +0100, Galder Zamarreno <galder at redhat.com>
wrote:

> Hi guys,
>
> As a fyi, I've been looking at a random Infinispan 2LC testsuite failure  
> for a while and based on the logs below, I suspect there might be a bug  
> in PFER.
>
> In 03, Thread 5 tries to acquire a lock for PFER and successfully  
> acquires it in 12, 5ms after the acquisition attempt is logged. PFER is  
> 0 lock timeout, so that, in theory, should have failed.
>
> I'm looking into this.
>
> In case you wanna check the logs as well, http://drop.io/ypmes4v  
> contains the logs.
>
> Cheers,
>
> ------- Forwarded message -------
> From: "Galder Zamarreno" <galder at redhat.com>
> To: "Brian Stansberry" <brian.stansberry at redhat.com>
> Cc:
> Subject: Re: Random ConcurrentWriteTest.testManyUsers() failure
> Date: Thu, 18 Feb 2010 12:28:42 +0100
>
> Hi again,
>
> I added more logging. It appears that PutFromLoadValidator is not 100%
> fail proof. This is what is happening in hibernate.log attached (I'll use
> the absolute times that are shorter):
>
> Note: @ signs are followed by system identity hashcodes to differentiate
> instances.
>
> 01. 20780: Thread 5: PutFromLoadValidator: Checks whether put is valid  
> and
> it's valid since it's in pending puts
> 02. 20781: Thread 5: Invokes PFER for Customer.contacts#3 with
> CollectionCacheEntry[]
> 03. 20781: Thread 5: Attempting to lock
> MarshalledValue(instance=org.hibernate.test.cache.infinispan.functional.Customer.contacts#3.
> 04. 20784: Thread 3: PutFromLoadValidator: Record recent removal
> 05. 20784: Thread 3: Within GlobalTransaction:<eq-35632>:7276, invoked
> with command RemoveCommand for Customer.contacts#3
> 06. 20784: Thread 3: Attempting to lock
> MarshalledValue(instance=org.hibernate.test.cache.infinispan.functional.Customer.contacts#3.
> 07. 20784: Thread 3: Successfully acquired lock!
> 08. 20784: Thread 3: Executes RemoveCommand on Customer.contacts#3
> 09. 20785: Thread 3: At commit time, Customer.contacts#3 is updated with
> removed=true valid=false changed=true created=false ...
> 10. 20785: Thread 3: Attempting to unlock Customer.contacts#3
> 12. 20786: Thread 5: Successfully acquired lock!
> 13. 20786: Thread 5: Executes PFER and updates Customer.contacts#3 with
> MarshalledValue(instance=CollectionCacheEntry[]; cachedHashCode=0;
> serialized=false)@c63847
> 14. 20794: Thread 3: Within GlobalTransaction:<eq-35632>:7280, calls get
> on Customer.contacts#3
> 15. 20794: Thread 3: Fetches from container and returns
> MarshalledValue(instance=CollectionCacheEntry[]; cachedHashCode=0;
> serialized=false)@c63847
>
> So, as you can see, the PFER and remove calls interleaved in such way  
> that
> putFromLoad() is allowed to get in while there's a collection  
> modification
> going on. Lock attempts/releases look ok. Now, can we record a  
> forthcoming
> removal as a result of a collection update earlier than where is  
> happening
> in the log? Maybe an option would be to instead of recording a removal of
> the collection cache, record when an entity is updated? We could do a
> pre-invalidation, similar to what happens with update timestamps after an
> entity is modified. This would reduce the chances of a situation like the
> one above happening.
>
> WDYT?
>
> On Thu, 18 Feb 2010 09:51:53 +0100, Galder Zamarreno <galder at redhat.com>
> wrote:
>
>> Hmmmm, looks very similar actually, if not exactly the same scenario.  
>> If I look more closely in the logs, I see:
>>
>> ...
>> 12:22:12,152: Thread 3 insets new Contact#73 but JDBC connection is not  
>> flushed yet!!!
>> 12:22:12,152: Thread 3 calls put on entity cache for Contact#73
>> 12:22:12,164: Thread 3 calls remove on Customer.contacts#3 // so that  
>> removes collection
>> ...
>> 12:22:12,189: Thread 5 updates Customer.contacts#3 to  
>> CollectionCacheEntry[] (via a PFER call)
>> ...
>>
>> Now, if read correctly, the validator should be stopping Thread 5 from  
>> loading an empty collection, since the invalidation period is 10  
>> seconds.
>>
>> I'll add more logging around this to see if I can see why the validator  
>> is not working correctly cos I did port the validator and the tests  
>> back in Sept/Oct
>>
>> Thanks for the pointer!
>>
>> On Wed, 17 Feb 2010 17:47:42 +0100, Brian Stansberry  
>> <brian.stansberry at redhat.com> wrote:
>>
>>> Sorry; still haven't been able to really dig in on this. Have you  
>>> looked at  
>>> http://opensource.atlassian.com/projects/hibernate/browse/HHH-3817 ?
>>>
>>> On 02/17/2010 06:45 AM, Galder Zamarreno wrote:
>>>> Hi Brian,
>>>>
>>>> So, I've removed the assertions from the 2nd transaction, which is the
>>>> one that checks that the contact exists after addition, or the contact
>>>> does not exist after the removal. However, the test does still  
>>>> randomly
>>>> fail. Let me explain:
>>>>
>>>> 12:22:12,098: Thread 3 removed Contact#64 from Customer#3  
>>>> successfully.
>>>> Customer#3 has no Contacts
>>>> 12:22:12,144: Thread 3 starts process to add new Contact to Customer#3
>>>> 12:22:12,152: Thread 3 insets new Contact#73 but JDBC connection is  
>>>> not
>>>> flushed yet!!!
>>>> 12:22:12,157: Thread 5 starts reading all Customer's 1st contact
>>>> 12:22:12,160: Thread 5 checks the Customer.contacts#3 collection cache
>>>> and finds nothing.
>>>> 12:22:12,160: Thread 5 checks Contacts table in database and finds  
>>>> nothing
>>>> 12:22:12,165: Thread 3 flushes JDBC connection and Contact#73 is now
>>>> seen by the rest
>>>> 12:22:12,189: Thread 5 updates Customer.contacts#3 to
>>>> CollectionCacheEntry[] (via a PFER call)
>>>> 12:22:12,196: Thread 3 uses the Session to get Customer#3's contact  
>>>> but
>>>> gets nothing, since CollectionCacheEntry[] is returned by the  
>>>> collection
>>>> cache.
>>>> 12:22:12,262: Thread 3 tries to remove Customer#3's contact, but since
>>>> there's none, it fails.
>>>>
>>>> The culprit would appear to me to be line in 12:22:12,189. If it  
>>>> doesn't
>>>> find any contacts, it initialises to CollectionCacheEntry[], which is
>>>> different to not storing anything at all. If nothing had been stored,
>>>> 12:22:12,196 wouldn't have returned empty but instead would have gone  
>>>> to
>>>> the database and retrieved it from there. However, we, in the 2LC, we
>>>> don't control what's stored, this comes from whatever Hibernate passes
>>>> us, via CollectionLoadContext. Not sure how this can be fixed...
>>>> Anything that u can think of?
>>>>
>>>> Cheers,
>>>>
>>>> On Thu, 28 Jan 2010 18:34:24 +0100, Brian Stansberry
>>>> <brian.stansberry at redhat.com> wrote:
>>>>
>>>>> Have fun on holidays! Better ping me about this when you get back.
>>>>>
>>>>> On 01/28/2010 10:01 AM, Galder Zamarreno wrote:
>>>>>>
>>>>>>
>>>>>> On 01/28/2010 03:46 PM, Brian Stansberry wrote:
>>>>>>> Quick response; I'll look more later: if that scenario is what is
>>>>>>> happening, the test isn't correct. There's no reason Thread 2 *in a
>>>>>>> second tx* should have any expectations.
>>>>>>
>>>>>> Thanks for the quick reply. There's no rush at all on this. Remember
>>>>>> that I'll be away on PTO next week. Back on the 8th.
>>>>>>
>>>>>> FYI, the expectations are checked in the contactExists() method that
>>>>>> gets called after removing.
>>>>>>
>>>>>>>
>>>>>>> On 01/28/2010 06:46 AM, Galder Zamarreno wrote:
>>>>>>>> Hi Brian,
>>>>>>>>
>>>>>>>> I've been looking at a random failure of
>>>>>>>> ConcurrentWriteTest.testManyUsers() in the Infinispan cache
>>>>>>>> provider and
>>>>>>>> I wanted to pass my suspicions through you to see what you think.
>>>>>>>> There's no immediate rush on this.
>>>>>>>>
>>>>>>>> Some notes:
>>>>>>>> Customer.contacts#2 refers to Customer 2's contacts collection  
>>>>>>>> cache.
>>>>>>>>
>>>>>>>> Sometimes the test can fail because of this:
>>>>>>>>
>>>>>>>> 1. 12:21:40,434: Thread 1 calls readEveryonesFirstContact
>>>>>>>> 2. 12:21:40,434: Thread 1 starts tx1
>>>>>>>> 3. 12:21:40,454: Thread 1 reads Contact#302 from Customer#2 from  
>>>>>>>> db
>>>>>>>> 4. 12:21:40,483: Thread 2 calls removeContact for Customer#2
>>>>>>>> 5. 12:21:40,483: Thread 2 starts tx2
>>>>>>>> 6. 12:21:40,485: Thread 2 deletes Contact#302 from Customer#2 in  
>>>>>>>> db.
>>>>>>>> 4. 12:21:40,491: Thread 2 acquires lock for updating
>>>>>>>> Customer.contacts#2
>>>>>>>> collection cache.
>>>>>>>> 5. 12:21:40,492: Thread 2 deletes Contact#302 from  
>>>>>>>> Customer.contacts#2
>>>>>>>> collection cache.
>>>>>>>> 6. 12:21:40,492: Thread 2 releases lock
>>>>>>>> 7. 12:21:40,492: Thread 2 commits tx2
>>>>>>>> 7. 12:21:40,493: Thread 1 acquires lock for updating
>>>>>>>> Customer.contacts#2
>>>>>>>> 8. 12:21:40,493: Thread 1 updates Customer.contacts#2 collection  
>>>>>>>> cache
>>>>>>>> with Contact#302.
>>>>>>>> 9. 12:21:40,493: Thread 1 commits tx1.
>>>>>>>> 10. 12:21:40,499: Thread 2 starts a tx and checks whether any  
>>>>>>>> Contacts
>>>>>>>> exist for Customer#2.
>>>>>>>> 10. 12:21:40,542: Thread 2 fails because it expects not to be any
>>>>>>>> Contacts but finds one in the cache that Thread 1 had put.
>>>>>>>>
>>>>>>>> It looks to me that this is a failure of the test itself. The fact
>>>>>>>> that
>>>>>>>> the remove and check is done in different transactions allows  
>>>>>>>> another
>>>>>>>> thread to come in the middle and load some data into the cache.
>>>>>>>>
>>>>>>>> WDYT? Do you think I'm reading this correctly?
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>>
>>>>>>>> p.s. I've attached the full log in case you wanna inspect it.
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>
>


-- 
Galder Zamarreño
Sr. Software Engineer
Infinispan, JBoss Cache




More information about the infinispan-dev mailing list