[JBoss JIRA] Commented: (JBCACHE-679) Deadlock using transactional JBossCache with Hibernate
by Yegor Yenikyeyev (JIRA)
[ http://jira.jboss.com/jira/browse/JBCACHE-679?page=comments#action_12344756 ]
Yegor Yenikyeyev commented on JBCACHE-679:
------------------------------------------
I would like to provide an update from today's test.
Basically, I'm not really sure that JBCACHE-679 and JBCACHE-785 are about the same problem. I applied the patch to the Hibernate 3.2 source and rebuilt it, put the updated jar into JBoss, and ran my original test.
The symptoms are exactly the same as before: WL2 is still in place at the moment when TX2 tries to obtain a write lock for that node.
Owen, could you please try the patch with the sample I have attached? I think it explains what I mean better.
Thank you very much!
> Deadlock using transactional JBossCache with Hibernate
> ------------------------------------------------------
>
> Key: JBCACHE-679
> URL: http://jira.jboss.com/jira/browse/JBCACHE-679
> Project: JBoss Cache
> Issue Type: Bug
> Security Level: Public(Everyone can see)
> Components: Clustering, Replication
> Affects Versions: 1.3.0.SP2
> Environment: SUSE 10.1, kernel 2.6.15 SPM, JDK 1.5.0_06, PostgreSQL 8.0.1
> Reporter: Yegor Yenikyeyev
> Assigned To: Manik Surtani
> Priority: Critical
> Attachments: nloptc.zip
>
>
> It seems we discovered unpredictable Hibernate and/or TreeCache behavior after upgrading JBossCache from 1.2.4SP2 to 1.3.0SP2. I can confirm that the same problem appears with 1.4.0CR2. I do not think it is a Hibernate-only or TreeCache-only issue; I think it is an integration issue or a misunderstanding of how TreeCache transaction isolation is implemented.
> Our application works in a clustered environment and we use JBossCache as the L2 cache for Hibernate 3.1.3 (I checked this with 3.2.0CR2 as well). Our settings for JBossCache are
> REPL_SYNC, READ_COMMITTED, and our target business object methods (f1 and f2) have PROPAGATION_REQUIRED and PROPAGATION_REQUIRES_NEW respectively. Our JDBC driver is 3.0 compliant.
> Our object hierarchy is: Occasion contains a link to Round, and Round contains a link to Tournament. Round is NOT configured as a "lazy" field in the Occasion mapping because we always need it initialized.
>
> Here, in short, is what we try to do in our application:
>
> (1) Transaction1: Call the f1 (PROPAGATION_REQUIRED) method of a business object, which causes Occasion1 to be loaded via a cacheable query. After that Hibernate initializes the Occasion1.round field and loads Round1.
> (2) Transaction1: Hibernate puts the loaded Occasion1 and Round1 in the L2 cache.
> (3) Transaction1: TreeCache creates the com/companyname/Occasion/com.companyname.Occasion#1 region and obtains a WriteLock (WL1)
> (4) Transaction1: TreeCache creates the com/companyname/Round/com.companyname.Round#1 region and obtains a WriteLock (WL2)
> (5) Transaction1: Do some business logic
> (6) Transaction1: We expect the current transaction to be long and we want to change the status of Occasion1 in the DB very quickly. At this point we need an exclusive lock on the appropriate row in the DB table to change the status and commit it. To do this we call f2 (PROPAGATION_REQUIRES_NEW), which is supposed to be a REALLY short transaction that releases the lock on the DB row as fast as possible.
>
> (7) Transaction2: Transaction1 is SUSPENDED at this point. We call HibernateTemplate (we use Spring as well) to load Occasion1 for update with the LockMode.UPGRADE flag and get an exclusive lock.
> (8) Transaction2: Hibernate does NOT check for an instance of Occasion1 in the L2 cache (I suppose because we obviously do want to lock it for update)
> (9) Transaction2: Hibernate does check for an instance of Round1 in the L2 cache and calls get() on TreeCache to obtain com/companyname/Round/com.companyname.Round#1
> (10) Transaction2: At this point 1.3.0SP2 tries to obtain a ReadLock for com/companyname/Round/com.companyname.Round#1 and it can't, because the suspended Transaction1 holds a WL on that node! It cannot obtain a ReadLock for Round#1 in any way.
> (11) Transaction2: Stuck waiting for WL2 to be released in TreeCache, but it can't be released because Transaction1 is suspended and waits for Transaction2 to finish.
>
> Obviously this situation is ridiculous: a legal sequence of operations causes a deadlock on TreeCache. We do not expect com/companyname/Round/com.companyname.Round#1 to be visible in Transaction2 because we use READ_COMMITTED, but WL2 must not affect Transaction2 in this way. If TreeCache prevents other transactions from reading com/companyname/Round/com.companyname.Round#1, it must not tell those transactions that the node exists, in order to keep READ_COMMITTED behavior consistent. For now it simply prevents everybody from using PROPAGATION_REQUIRES_NEW.
> The described scenario works with 1.2.4SP2 without a problem. I seriously doubt that the READ_COMMITTED strategy is really implemented in v1.2.4, but at least its behavior is more consistent compared to v1.3.0. As far as I understand, this is a result of the JBCACHE-218 bugfix.
>
> We tried to change PROPAGATION_REQUIRES_NEW to PROPAGATION_NESTED to take advantage of nested transactions. We assumed that com/companyname/Round/com.companyname.Round#1 would be available in a nested Transaction2 from Transaction1. But PROPAGATION_NESTED isn't supported by the current JBossTransaction implementation (see line 209 in TxManager.java from 4.0.4.GA).
> We could change the isolation level to READ_UNCOMMITTED, but that is simply impossible in many other places of our application.
> We could use a trick and load Occasion1 with Round1 in a separate Transaction0 before starting Transaction1, but we HAVE to use an LRU policy, so we cannot make sure that eviction won't happen between Transaction0 and Transaction1. If it happens, we are in the same situation as described above.
> Finally, we could stop using Transaction2, but our application is intended to handle a large amount of traffic, and since Transaction1 takes up to 3 sec (compared to 50 ms for Transaction2) we might get up to 700-1000 transactions queued waiting for the table row lock to be released, and we just can't allow this.
>
> From what I see in the Hibernate TreeCache sources, I have no idea how to avoid the situation described above. One of my developers told me that it is probably possible to put entries into the L2 cache on transaction commit, which would decrease the time WLs are held and resolve the deadlock. Honestly, I'm seriously concerned about how that applies to existing Hibernate. I think small issues, such as the performance cost of loading the same object more than once during one transaction, can be resolved by using the L1 cache or JDBC driver abilities. But I guess there is plenty of work to make this work for cacheable queries.
> Another option I see is to use a trick and put values for Round1 and Occasion1 into a new region for Transaction2 if we know that Transaction1 is suspended and owns WLs for various nodes. I really do not like this approach because it is in fact not pure pessimistic locking. But the issue described above is a worse price to pay for a "pure" READ_COMMITTED strategy. In fact, it is a showstopper, assuming there is no way to use PROPAGATION_NESTED.
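
For readers who want to see the shape of steps (1)-(11) in isolation, here is a minimal single-threaded sketch. It does not use JBossCache, Hibernate, or Spring: NodeLock, the "TX1"/"TX2" ids, and the method names are all hypothetical stand-ins, and the only assumption taken from the report is that a node lock belongs to a transaction rather than to a thread. The timeout exists only so the sketch returns instead of hanging.

```java
import java.util.concurrent.TimeUnit;

final class SuspendedTxDeadlockSketch {

    /** Hypothetical node lock owned by a transaction id rather than by a thread. */
    static final class NodeLock {
        private String writer; // transaction holding the write lock, or null

        synchronized void acquireWrite(String tx) {
            if (writer != null && !writer.equals(tx)) {
                throw new IllegalStateException("write lock held by " + writer);
            }
            writer = tx;
        }

        synchronized void releaseWrite(String tx) {
            if (tx.equals(writer)) {
                writer = null;
                notifyAll();
            }
        }

        /** Waits up to timeoutMillis for a read lock; false means the wait timed out. */
        synchronized boolean tryRead(String tx, long timeoutMillis) {
            long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
            try {
                // A reader is blocked only by a writer from a different transaction.
                while (writer != null && !writer.equals(tx)) {
                    long remainingMillis = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
                    if (remainingMillis <= 0) {
                        return false;
                    }
                    wait(remainingMillis);
                }
                return true;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    /** Replays the scenario on one thread; returns whether readerTx could read Round#1. */
    static boolean readRoundWhileTx1HoldsWriteLocks(String readerTx, long timeoutMillis) {
        NodeLock occasion1 = new NodeLock();
        NodeLock round1 = new NodeLock();
        // Steps (2)-(4): Transaction1 caches both entities and keeps WL1 and WL2 until it commits.
        occasion1.acquireWrite("TX1");
        round1.acquireWrite("TX1");
        // Steps (6)-(10): f2 runs on the same thread, so TX1 cannot commit until this call returns.
        boolean acquired = round1.tryRead(readerTx, timeoutMillis);
        // Control is back in Transaction1 only now; its commit releases the locks.
        round1.releaseWrite("TX1");
        occasion1.releaseWrite("TX1");
        return acquired;
    }

    public static void main(String[] args) {
        System.out.println("TX2 read lock on Round#1: " + readRoundWhileTx1HoldsWriteLocks("TX2", 200));
        System.out.println("TX1 read lock on Round#1: " + readRoundWhileTx1HoldsWriteLocks("TX1", 200));
    }
}
```

In this model the "TX2" read always times out, because the only code that could release WL2 runs after the read returns. The "TX1" read succeeds at once, which mirrors the expectation stated above that a nested transaction sharing Transaction1's identity would not be blocked.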
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://jira.jboss.com/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
[JBoss JIRA] Commented: (JBCACHE-679) Deadlock using transactional JBossCache with Hibernate
by Yegor Yenikyeyev (JIRA)
[ http://jira.jboss.com/jira/browse/JBCACHE-679?page=comments#action_12344748 ]
Yegor Yenikyeyev commented on JBCACHE-679:
------------------------------------------
Thanks for your feedback, Owen!
I'll give it a try today and let you know if it helps.
[JBoss JIRA] Updated: (BPEL-9) fault handling
by Alejandro Guizar (JIRA)
[ http://jira.jboss.com/jira/browse/BPEL-9?page=all ]
Alejandro Guizar updated BPEL-9:
--------------------------------
Description:
-Implement the state transitions for scope instances when faulted signals are received
-Execute fault handling logic
was:
-Implement the state transitions for scope instances when cancel or faulted signals are received
-Provide the fault catching logic
> fault handling
> --------------
>
> Key: BPEL-9
> URL: http://jira.jboss.com/jira/browse/BPEL-9
> Project: JBoss jBPM BPEL
> Issue Type: Sub-task
> Components: Engine
> Affects Versions: jBPM BPEL 1.0 alpha 4
> Reporter: Juan Cantu
> Assigned To: Alejandro Guizar
> Fix For: jBPM BPEL 1.1
>
>
> -Implement the state transitions for scope instances when faulted signals are received
> -Execute fault handling logic