Suppose you have two entities: Parent and Child. Each has some value-typed properties, and Parent has a one-to-many set of Children. The one-to-many and the Child entity itself are cached read-write in the second level cache.
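For concreteness, the setup above might be mapped like this (a sketch assuming JPA annotations; the entity and field names are illustrative):

```java
import jakarta.persistence.*;
import org.hibernate.annotations.Cache;
import org.hibernate.annotations.CacheConcurrencyStrategy;
import java.util.HashSet;
import java.util.Set;

@Entity
@Cache(usage = CacheConcurrencyStrategy.READ_WRITE)
public class Parent {
    @Id @GeneratedValue
    private Long id;

    private String name;

    // The one-to-many itself is cached read-write, in addition to Child.
    @OneToMany(mappedBy = "parent")
    @Cache(usage = CacheConcurrencyStrategy.READ_WRITE)
    private Set<Child> children = new HashSet<>();

    public Set<Child> getChildren() { return children; }
}

@Entity
@Cache(usage = CacheConcurrencyStrategy.READ_WRITE)
class Child {
    @Id @GeneratedValue
    private Long id;

    @ManyToOne
    private Parent parent;
}
```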
Now suppose you have two threads. One thread runs the following code in a tight loop:
- Open a new session and transaction.
- Load the parent.
- Call parent.getChildren().size().
- Roll back the transaction and close the session.
The other thread runs the following in a tight loop:
- Open a new session and transaction.
- Load the parent.
- Call parent.getChildren() to fetch the existing children.
- Remove them from the set and delete them.
- Add some new children to the set and persist them.
- Commit the transaction and close the session.
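The two loops can be sketched roughly as follows, assuming a standard Hibernate SessionFactory, a known parent id, and a hypothetical Child(Parent) constructor (all names are illustrative):

```java
import org.hibernate.Session;
import org.hibernate.SessionFactory;
import org.hibernate.Transaction;
import java.util.HashSet;

public class ParentChildLoops {

    // Thread 1: load the parent, touch the collection, roll back.
    void readerLoop(SessionFactory sf, Long parentId) {
        while (true) {
            try (Session session = sf.openSession()) {
                Transaction tx = session.beginTransaction();
                Parent parent = session.get(Parent.class, parentId);
                parent.getChildren().size(); // may throw EntityNotFoundException
                tx.rollback();
            }
        }
    }

    // Thread 2: delete the existing children, persist replacements, commit.
    void writerLoop(SessionFactory sf, Long parentId) {
        while (true) {
            try (Session session = sf.openSession()) {
                Transaction tx = session.beginTransaction();
                Parent parent = session.get(Parent.class, parentId);
                for (Child old : new HashSet<>(parent.getChildren())) {
                    parent.getChildren().remove(old);
                    session.remove(old);          // delete existing children
                }
                Child fresh = new Child(parent);  // hypothetical constructor
                parent.getChildren().add(fresh);
                session.persist(fresh);           // persist the new children
                tx.commit();
            }
        }
    }
}
```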
Given enough runs, I'm pretty sure the first thread's call to parent.getChildren().size() will throw an EntityNotFoundException. Why? Because collection initialization is a two-step operation when the collection is cached:
- First, the cached Child IDs are read from the cache.
- Second, each Child ID is resolved and the corresponding Child is loaded from the cache.
It's possible for the second thread to complete its entire body of work in between these steps, in which case the second step will fail to find a Child that the second thread just deleted, triggering an EntityNotFoundException.
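The interleaving can be replayed deterministically without Hibernate. The sketch below uses a map and a list as stand-ins for the entity and collection cache regions (all names are illustrative, not Hibernate internals); the writer runs to completion between the reader's two steps, so step two misses a deleted child:

```java
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;

public class TwoStepRace {
    // Stand-in "entity region": child ID -> child row.
    static final ConcurrentHashMap<Long, String> childById = new ConcurrentHashMap<>();
    // Stand-in "collection region": the parent's cached list of child IDs.
    static volatile List<Long> cachedChildIds = List.of();

    static String simulateRace() {
        childById.put(1L, "old child");
        cachedChildIds = List.of(1L);

        // Reader, step 1: read the cached child IDs.
        List<Long> ids = cachedChildIds;

        // Writer completes its whole transaction between the reader's steps:
        childById.remove(1L);            // delete the old child
        childById.put(2L, "new child");  // persist a replacement
        cachedChildIds = List.of(2L);

        // Reader, step 2: resolve each ID against the entity region.
        for (Long id : ids) {
            if (!childById.containsKey(id)) {
                return "child " + id + " not found"; // EntityNotFoundException analogue
            }
        }
        return "ok";
    }

    public static void main(String[] args) {
        System.out.println(simulateRace());
    }
}
```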
As far as I can tell, only the SERIALIZABLE transaction isolation level can prevent this, at a major cost to concurrent throughput; even REPEATABLE_READ would allow it. And the issue would go away if the collection weren't cached: the collection is lazy by default, so without caching Hibernate would issue a single "SELECT * FROM CHILDREN WHERE PARENT_ID=?" to the database and atomically fetch the entire collection. By caching the collection we lose that atomicity.
We have an application that has threads that behave like this, so we'd really like to see collection initialization become atomic.