[ http://jira.jboss.com/jira/browse/JBCACHE-679?page=comments#action_12345589 ]
Manik Surtani commented on JBCACHE-679:
---------------------------------------
Yegor,
Now that JBCACHE-785 has been fixed in Branch_JBossCache_1_4_0, could you try this again
with a build from that branch? Just trying to confirm where the problem is. Note that
the fix for JBCACHE-785 is not the same as the patch that Owen provided here.
Thanks,
Manik
Deadlock using transactional JBossCache with Hibernate
------------------------------------------------------
Key: JBCACHE-679
URL: http://jira.jboss.com/jira/browse/JBCACHE-679
Project: JBoss Cache
Issue Type: Bug
Security Level: Public(Everyone can see)
Components: Clustering, Replication
Affects Versions: 1.3.0.SP2
Environment: SUSE 10.1, kernel 2.6.15 SMP, JDK 1.5.0_06, PostgreSQL 8.0.1
Reporter: Yegor Yenikyeyev
Assigned To: Manik Surtani
Priority: Critical
Attachments: nloptc.zip
It seems we have discovered unpredictable Hibernate and/or TreeCache behavior after
upgrading JBossCache from 1.2.4SP2 to 1.3.0SP2. I can confirm that the same problem
appears with 1.4.0CR2. I do not think it is a Hibernate-only or TreeCache-only issue;
I think it is an integration issue or a misunderstanding of how TreeCache transaction
isolation is implemented.
Our application runs in a clustered environment and we use JBossCache as the L2 cache
for Hibernate 3.1.3 (I checked this with 3.2.0CR2 as well). Our JBossCache settings are
REPL_SYNC and READ_COMMITTED, and our target business object methods (f1 and f2) use
PROPAGATION_REQUIRED and PROPAGATION_REQUIRES_NEW respectively. Our JDBC driver is 3.0
compliant.
Our object hierarchy is: Occasion contains a link to Round, and Round contains a link to
Tournament. Round is NOT configured as a "lazy" field in the Occasion mapping because we
always need it initialized.
In short, here is what our application does:
(1) Transaction1: Call f1 (PROPAGATION_REQUIRED) on a business object, which causes
Occasion1 to be loaded via a cacheable query. Hibernate then initializes the
Occasion1.round field and loads Round1.
(2) Transaction1: Hibernate puts loaded Occasion1 and Round1 in L2 cache.
(3) Transaction1: TreeCache creates
com/companyname/Occasion/com.companyname.Occasion#1 region and obtains WriteLock (WL1)
(4) Transaction1: TreeCache creates com/companyname/Round/com.companyname.Round#1
region and obtains WriteLock (WL2)
(5) Transaction1: Do some business logic stuff
(6) Transaction1: We expect the current transaction to be long, and we want to change
the status of Occasion1 in the DB very quickly. At this point we need an exclusive lock
on the appropriate row in the DB table to change the status and commit it. To do this we
call f2 (PROPAGATION_REQUIRES_NEW), which is supposed to be a REALLY short transaction
that releases the lock on the DB row as fast as possible.
(7) Transaction2: Transaction1 is SUSPENDED at this point. We call HibernateTemplate (we
use Spring as well) to load Occasion1 for update with the LockMode.UPGRADE flag and get
an exclusive lock.
(8) Transaction2: Hibernate does NOT check for an instance of Occasion1 in the L2 cache
(I suppose because we explicitly want to lock it for update).
(9) Transaction2: Hibernate does check for an instance of Round1 in L2 cache and it
calls get() on TreeCache to obtain com/companyname/Round/com.companyname.Round#1
(10) Transaction2: At this point 1.3.0SP2 tries to obtain a ReadLock for
com/companyname/Round/com.companyname.Round#1 and cannot, because suspended Transaction1
holds a WL on that node!!! It cannot obtain a ReadLock for Round#1 at all!
(11) Transaction2: Stuck waiting for WL2 to be released in TreeCache, but it cannot be
released because Transaction1 is suspended and waiting for Transaction2 to finish.
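The sequence in steps (1)-(11) can be reproduced in miniature without Hibernate or
JBossCache. The following is only a sketch under stated assumptions: NodeLock is a
hypothetical stand-in for a cache node lock whose owner is a transaction rather than a
thread, the transaction ids "tx1" and "tx2" are plain strings, and readers are not
tracked. None of these names come from the TreeCache API. Because Transaction1 is
suspended on the same thread that runs Transaction2, nobody can release the write lock
while the read request waits, so the request can only end by timing out.

```java
import java.util.concurrent.TimeUnit;

final class SuspendedTxDeadlockSketch {

    /** Hypothetical node lock owned by a transaction id, not by a thread. */
    static final class NodeLock {
        private String writer; // transaction holding the write lock, or null

        synchronized boolean acquireWrite(String tx, long timeoutMs) {
            if (!awaitNoForeignWriter(tx, timeoutMs)) {
                return false;
            }
            writer = tx;
            return true;
        }

        // Readers are not tracked in this sketch; a read only waits out a foreign writer.
        synchronized boolean acquireRead(String tx, long timeoutMs) {
            return awaitNoForeignWriter(tx, timeoutMs);
        }

        synchronized void release(String tx) {
            if (tx.equals(writer)) {
                writer = null;
                notifyAll();
            }
        }

        // Must be called with the monitor held (only the synchronized methods call it).
        private boolean awaitNoForeignWriter(String tx, long timeoutMs) {
            long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
            try {
                while (writer != null && !writer.equals(tx)) {
                    long remainingMs =
                            TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
                    if (remainingMs <= 0) {
                        return false; // gave up: the writer never released the lock
                    }
                    wait(remainingMs);
                }
                return true;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    /** Returns true if Transaction2 obtained the read lock on Round#1. */
    static boolean tx2CanReadRound(long lockTimeoutMs) {
        NodeLock round1 = new NodeLock();
        // Step (4): Transaction1 caches Round1 and keeps the write lock until commit.
        round1.acquireWrite("tx1", lockTimeoutMs);
        // Steps (6)-(10): Transaction1 is suspended on this thread while
        // Transaction2 asks for a read lock on the same node.
        boolean acquired = round1.acquireRead("tx2", lockTimeoutMs);
        // Transaction1 resumes only after Transaction2 has returned.
        round1.release("tx1");
        return acquired;
    }

    public static void main(String[] args) {
        System.out.println("tx2 read lock acquired: " + tx2CanReadRound(100));
    }
}
```

Under these assumptions tx2CanReadRound returns false for any timeout: the read request
fails only when the timeout expires, never because the lock was released.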
Obviously this situation is ridiculous: a legal sequence of operations causes a deadlock
on TreeCache. We do not expect com/companyname/Round/com.companyname.Round#1 to be visible
in Transaction2 because we use READ_COMMITTED, but WL2 must not affect Transaction2 in
this way. If TreeCache prevents other transactions from reading
com/companyname/Round/com.companyname.Round#1, it must not tell them that the node
exists, so that READ_COMMITTED behavior stays consistent. As it stands, it simply
prevents everybody from using PROPAGATION_REQUIRES_NEW.
The described scenario works with 1.2.4SP2 without a problem. I seriously doubt that the
READ_COMMITTED strategy is really implemented in v1.2.4, but at least its behavior is
more consistent than that of v1.3.0. As far as I understand, this is a result of the
JBCACHE-218 bugfix.
We tried changing PROPAGATION_REQUIRES_NEW to PROPAGATION_NESTED to take advantage of
nested transactions. We assumed that com/companyname/Round/com.companyname.Round#1 would
be available in a Transaction2 nested inside Transaction1. But PROPAGATION_NESTED is not
supported by the current JBossTransaction implementation (see line 209 in TxManager.java
from 4.0.4.GA).
We could change the isolation level to READ_UNCOMMITTED, but that is simply not an option
in many other places in our application.
We could use a trick and load Occasion1 with Round1 in a separate Transaction0 before
starting Transaction1, but we HAVE to use an LRU eviction policy. That means we cannot
guarantee that eviction will not happen between Transaction0 and Transaction1. If it
does, we are in the same situation as described above.
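To make the eviction risk concrete, here is a minimal sketch that uses an access-ordered
LinkedHashMap as a stand-in for an LRU-evicted cache region. LruCache, the key names,
and the capacity are assumptions made for illustration; this is not the JBossCache
eviction policy implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class LruPreloadSketch {

    /** Minimal LRU map: access-ordered, drops the eldest entry once over capacity. */
    static final class LruCache<K, V> extends LinkedHashMap<K, V> {
        private static final long serialVersionUID = 1L;
        private final int capacity;

        LruCache(int capacity) {
            super(16, 0.75f, true); // true = iterate in access order
            this.capacity = capacity;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > capacity;
        }
    }

    /** Returns true if the preloaded Round#1 entry is still cached when Transaction1 starts. */
    static boolean preloadSurvives(int capacity, int unrelatedPuts) {
        LruCache<String, String> cache = new LruCache<>(capacity);
        // Transaction0: preload Round#1 and commit, leaving no write lock behind.
        cache.put("Round#1", "round data");
        // Unrelated traffic between Transaction0 and Transaction1.
        for (int i = 0; i < unrelatedPuts; i++) {
            cache.put("Other#" + i, "other data");
        }
        // Transaction1 would start here.
        return cache.containsKey("Round#1");
    }

    public static void main(String[] args) {
        System.out.println("light traffic: " + preloadSurvives(2, 1));
        System.out.println("heavy traffic: " + preloadSurvives(2, 2));
    }
}
```

Once the unrelated puts fill the region, the preloaded entry is the least recently used
one and is dropped, so Transaction1 has to load and lock Round1 again.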
Finally, we could stop using Transaction2, but our application is intended to handle a
large amount of traffic. Since Transaction1 takes up to 3 sec (compared to 50 ms for
Transaction2), we might get up to 700-1000 transactions queued waiting for the table row
lock to be released, and we just cannot allow this.
From what I see in the Hibernate TreeCache sources, I have no idea how to avoid the
situation described above. One of my developers told me that it is probably possible to
put data into the L2 cache on transaction commit, which would shorten the time a WL is
held and resolve the deadlock. Honestly, I am seriously concerned about how that would
apply to the existing Hibernate code. I think small issues, such as the performance cost
of loading the same object more than once during one transaction, can be resolved by
using the L1 cache or JDBC driver capabilities. But I guess there is plenty of work
needed to make this work for cacheable queries.
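The commit-time idea can be sketched in plain Java. This is an illustration under
assumptions, not how the Hibernate TreeCache provider works: SharedCache and Tx are
hypothetical classes, puts are buffered per transaction, and the shared cache is touched
only inside commit(), so the shared structure is locked only for the duration of the
flush.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

final class DeferredPutSketch {

    /** Hypothetical shared L2 cache that only ever holds committed values. */
    static final class SharedCache {
        private final Map<String, Object> committed = new HashMap<>();

        synchronized Object get(String key) {
            return committed.get(key);
        }

        synchronized void putAll(Map<String, Object> entries) {
            committed.putAll(entries);
        }
    }

    /** Hypothetical transaction that buffers its cache puts until commit. */
    static final class Tx {
        private final SharedCache cache;
        private final Map<String, Object> pending = new LinkedHashMap<>();

        Tx(SharedCache cache) {
            this.cache = cache;
        }

        void put(String key, Object value) {
            pending.put(key, value); // nothing shared is locked here
        }

        Object get(String key) {
            // A transaction sees its own pending values first, then committed ones.
            return pending.containsKey(key) ? pending.get(key) : cache.get(key);
        }

        void commit() {
            cache.putAll(pending); // the only point where shared state is locked
            pending.clear();
        }

        void rollback() {
            pending.clear();
        }
    }

    public static void main(String[] args) {
        SharedCache cache = new SharedCache();
        Tx tx1 = new Tx(cache);
        Tx tx2 = new Tx(cache);
        tx1.put("Round#1", "round data");
        System.out.println("tx2 before commit: " + tx2.get("Round#1"));
        tx1.commit();
        System.out.println("tx2 after commit: " + tx2.get("Round#1"));
    }
}
```

In this sketch Transaction2 never blocks on Transaction1: before the commit it sees no
entry for Round#1 and would fall back to the database, which matches what READ_COMMITTED
is expected to give. The sketch does not cover cacheable queries or replication.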
Another option I see is a trick: put the values for Round1 and Occasion1 into a new
region for Transaction2 when we know that Transaction1 is suspended and owns WLs on
various nodes. I really do not like this approach because it is not pure pessimistic
locking. But the issue described above is a worse price to pay for a "pure"
READ_COMMITTED strategy. In fact it is a showstopper, assuming there is no way to use
PROPAGATION_NESTED.