[JBoss JIRA] Created: (JBMESSAGING-1159) Deadlocks with ClusterRoundRobinDistributor
by Travis Brown (JIRA)
Deadlocks with ClusterRoundRobinDistributor
-------------------------------------------
Key: JBMESSAGING-1159
URL: http://jira.jboss.com/jira/browse/JBMESSAGING-1159
Project: JBoss Messaging
Issue Type: Bug
Components: Messaging Core Distributed Support
Affects Versions: Stable branch
Environment: Platform AS 4.2.0.GA_CP01, JBM snapshot_stable 12-Nov-2007, Oracle 10g R2, RedHat 4, Sun Java 1.5.0_10-b03
Reporter: Travis Brown
Assigned To: Tim Fox
Attachments: threaddumps.zip
During volume testing with 2 nodes I now see deadlocks in the thread dumps (platform AS 4.2.0.GA_CP01, JBM snapshot_stable 12-Nov-2007). My message processing comes to a halt, but I doubt it's fully related to these 6 threads (dealing with OutOfMemory too). Full thread dumps will be attached.
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://jira.jboss.com/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
17 years, 1 month
[JBoss JIRA] Created: (JBCACHE-923) Acquired node lock never released after transaction timeout
by Jacek Halat (JIRA)
Acquired node lock never released after transaction timeout
-----------------------------------------------------------
Key: JBCACHE-923
URL: http://jira.jboss.com/jira/browse/JBCACHE-923
Project: JBoss Cache
Issue Type: Bug
Security Level: Public (Everyone can see)
Components: PojoCache
Affects Versions: 1.4.0.SP1
Environment: Windows XP, Weblogic 9.1
Reporter: Jacek Halat
Assigned To: Ben Wang
Priority: Critical
Under WebLogic 9.1 (and probably earlier versions too, not tested), in some situations locks acquired on nodes are never released. The only way to release the locks is to stop and restart the TreeCache.
On WebLogic, the rollback after a tx timeout is handled in a separate thread, and all registered transaction synchronization handlers are invoked in that separate thread. In TreeCache, information about the InvocationContext (GlobalTransaction, Transaction, etc.) is stored in a ThreadLocal. In this situation the SynchronizationHandler scrubs not the correct InvocationContext, which holds the information about the current rolled-back transaction, but a newly created, empty InvocationContext. Information in the TransactionTable is removed correctly, because the transaction mapping is not stored in a ThreadLocal variable.
This causes unexpected effects:
The information about the GlobalTransaction is not cleared. When the main thread catches the TransactionTimedOutException and then invokes put/get operations without an explicitly created transaction, TreeCache tries to acquire an RL/WL on the node with the caller set to the non-existent, already cleared GlobalTransaction. The lock is acquired but not added to the LockTable (and never released).
From now on, every access to the locked node fails, because the RL/WL can't be acquired within 10 secs. The only way to get the system working again is to stop and restart the TreeCache.
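The ThreadLocal effect described above can be reproduced in isolation. The following is only a minimal sketch, not TreeCache code: the class, the CONTEXT field and the string used as a transaction id are all hypothetical stand-ins. It shows that a cleanup running on a second thread sees an empty slot and leaves the caller's value untouched.

```java
import java.util.concurrent.atomic.AtomicReference;

class ThreadLocalContextDemo {
    // Hypothetical stand-in for the per-thread invocation context.
    static final ThreadLocal<String> CONTEXT = new ThreadLocal<>();

    /**
     * Sets a transaction id on the calling thread, runs a "cleanup" on a
     * second thread, and returns { value seen by the cleanup thread,
     * value still seen by the calling thread afterwards }.
     */
    static String[] cleanupOnOtherThread(String gtx) {
        CONTEXT.set(gtx);
        AtomicReference<String> seenByCleanup = new AtomicReference<>();
        Thread cleanup = new Thread(() -> {
            // A new thread starts with an empty slot, so this records "null".
            seenByCleanup.set(String.valueOf(CONTEXT.get()));
            // Clears only the cleanup thread's own (already empty) slot.
            CONTEXT.remove();
        });
        cleanup.start();
        try {
            cleanup.join();
            return new String[] { seenByCleanup.get(), CONTEXT.get() };
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            CONTEXT.remove(); // leave the calling thread clean for the next run
        }
    }

    public static void main(String[] args) {
        String[] r = cleanupOnOtherThread("GlobalTransaction:1");
        System.out.println("cleanup thread saw: " + r[0]);
        System.out.println("caller still sees:  " + r[1]);
    }
}
```

The caller keeps its stale transaction reference after the "rollback" thread has finished, which matches the stale caller seen in the log below.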
2007-01-04 10:13:42,092 DEBUG - calling aftercompletion for GlobalTransaction:<192.168.141.1:2100>:1
2007-01-04 10:13:42,092 DEBUG - Running rollback phase
2007-01-04 10:13:42,092 DEBUG - running rollback for GlobalTransaction:<192.168.141.1:2100>:1
2007-01-04 10:13:42,092 DEBUG - PessimisticLockInterceptor invoked for method rollback(GlobalTransaction:<192.168.141.1:2100>:1)
2007-01-04 10:13:42,092 DEBUG - called to rollback cache with GlobalTransaction=GlobalTransaction:<192.168.141.1:2100>:1
2007-01-04 10:13:42,092 DEBUG - _remove(GlobalTransaction:<192.168.141.1:2100>:1, "/NODEA/NODEB", systemProperty)
2007-01-04 10:13:42,092 DEBUG - _remove(GlobalTransaction:<192.168.141.1:2100>:1, "/NODEA/NODEB")
2007-01-04 10:13:42,092 DEBUG - removed child NODEB
2007-01-04 10:13:42,092 DEBUG - releasing lock for /NODEA/NODEB (<unlocked>)
2007-01-04 10:13:42,092 DEBUG - releasing lock for /NODEA (read owners=[GlobalTransaction:<192.168.141.1:2100>:1])
2007-01-04 10:13:42,092 DEBUG - removing local transaction Name=FooManager.foo,Xid=BEA1-00068143BD8A02E8BAA1(4771853),Status=Rolled back. [Reason=weblogic.transaction.internal.TimedOutException: Transaction timed out after 20 seconds
BEA1-00068143BD8A02E8BAA1],numRepliesOwedMe=0,numRepliesOwedOthers=0,seconds since begin=30,seconds left=20,activeThread=Thread[Timer-5,5,Pooled Threads],XAServerResourceInfo[dataSource]=(ServerResourceInfo[dataSource]=(state=rolledback,assigned=AdminServer),xar=dataSource,re-Registered = false),SCInfo[sample_domain+AdminServer]=(state=rolledback),properties=({weblogic.transaction.name=FooManager.foo}),local properties=({weblogic.jdbc.jta.dataSource=[ No XAConnection is attached to this TxInfo ]}),OwnerTransactionManager=ServerTM[ServerCoordinatorDescriptor=(CoordinatorURL=AdminServer+10.1.1.218:9002+sample_domain+admin+, XAResources={dataSource, WLStore_sample_domain__WLS_AdminServer},NonXAResources={})],CoordinatorURL=AdminServer+10.1.1.218:9002+sample_domain+admin+) and global transaction GlobalTransaction:<192.168.141.1:2100>:1
2007-01-04 10:13:42,092 DEBUG - bypassed locking as method rollback() doesn't require locking
2007-01-04 10:13:42,092 DEBUG - Suppressing invocation of method rollback(GlobalTransaction:<192.168.141.1:2100>:1) on cache.
2007-01-04 10:13:42,092 DEBUG - Attempting to release locks on current thread. Lock table is {}
2007-01-04 10:13:42,092 DEBUG - Finished local commit/rollback method for GlobalTransaction:<192.168.141.1:2100>:1
2007-01-04 10:13:42,092 DEBUG - Finished rollback phase
(...)
2007-01-04 10:14:12,302 DEBUG - Non-tx and non crud meth
2007-01-04 10:14:37,310 DEBUG - (192.168.141.1:2100) call on method [_get(/NODEC, NODE_VALUE, true)]
2007-01-04 10:14:37,310 DEBUG - PessimisticLockInterceptor invoked for method _get(/NODEC, NODE_VALUE, true)
2007-01-04 10:14:37,310 DEBUG - Attempting to lock node /NODEC for owner GlobalTransaction:<192.168.141.1:2100>:1
2007-01-04 10:14:37,310 DEBUG - acquiring RL: fqn=/NODEC, caller=GlobalTransaction:<192.168.141.1:2100>:1, lock=<unlocked>
2007-01-04 10:14:37,310 DEBUG - acquired RL: fqn=/NODEC, caller=GlobalTransaction:<192.168.141.1:2100>:1, lock=read owners=[GlobalTransaction:<192.168.141.1:2100>:1]
2007-01-04 10:14:37,310 ERROR - transaction entry not found for (gtx=GlobalTransaction:<192.168.141.1:2100>:1)
2007-01-04 10:14:37,310 DEBUG - Invoking method _get(/NODEC, NODE_VALUE, true) on cache.
2007-01-04 10:14:37,310 DEBUG - _get("/NODEC", NODE_VALUE, "true")
2007-01-04 10:14:37,310 DEBUG - Attempting to release locks on current thread. Lock table is {}
17 years, 1 month
[JBoss JIRA] Created: (JBCACHE-1210) Add deprecated log message when using timeToIdleSeconds rather than timeToLiveSeconds
by Galder Zamarreno (JIRA)
Add deprecated log message when using timeToIdleSeconds rather than timeToLiveSeconds
--------------------------------------------------------------------------------------
Key: JBCACHE-1210
URL: http://jira.jboss.com/jira/browse/JBCACHE-1210
Project: JBoss Cache
Issue Type: Task
Security Level: Public (Everyone can see)
Components: Eviction
Affects Versions: 2.1.0.BETA1, 1.4.1.SP5
Reporter: Galder Zamarreno
Assigned To: Manik Surtani
Priority: Minor
Back in JBossCache 1.1.x, evictions were configured with timeToIdleSeconds
rather than timeToLiveSeconds (which was introduced in JBC 1.2).
The Hibernate in Action book, which is 3 years old, comes with examples using
timeToIdleSeconds, which can mislead some customers into using this deprecated
parameter.
I suggest we add a deprecation message (INFO or WARN) if this parameter is
used, to inform customers that it is deprecated.
Potentially add something to the docs too?
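A rough sketch of the suggested check, assuming the eviction attributes are available as a simple name/value map. The class, method and warning text are hypothetical and not taken from the JBoss Cache code base. Warnings are collected in a list rather than sent to a real logger, and the old value is simply carried over, which is also an assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class EvictionAttributeCheck {
    // Collected instead of logged so the sketch has no logging dependency.
    static final List<String> WARNINGS = new ArrayList<>();

    /** Returns the configured value in seconds, or null if neither attribute is set. */
    static Integer resolveTimeToLiveSeconds(Map<String, String> attrs) {
        String ttl = attrs.get("timeToLiveSeconds");
        if (ttl != null) {
            return Integer.valueOf(ttl);
        }
        String idle = attrs.get("timeToIdleSeconds");
        if (idle != null) {
            // Deprecated name: accept the value, but tell the user about it.
            WARNINGS.add("timeToIdleSeconds is deprecated; use timeToLiveSeconds instead");
            return Integer.valueOf(idle);
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, String> attrs = new HashMap<>();
        attrs.put("timeToIdleSeconds", "4");
        System.out.println(resolveTimeToLiveSeconds(attrs));
        System.out.println(WARNINGS);
    }
}
```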
17 years, 1 month
[JBoss JIRA] Created: (JBCACHE-1211) Cache marshaller breaks when marshalling more than 32767 of "certain" objects in a single stream
by Manik Surtani (JIRA)
Cache marshaller breaks when marshalling more than 32767 of "certain" objects in a single stream
------------------------------------------------------------------------------------------------
Key: JBCACHE-1211
URL: http://jira.jboss.com/jira/browse/JBCACHE-1211
Project: JBoss Cache
Issue Type: Bug
Security Level: Public (Everyone can see)
Components: Replication
Affects Versions: 2.0.0.GA
Reporter: Manik Surtani
Assigned To: Manik Surtani
Priority: Critical
Fix For: 2.1.0.GA
This is a nasty one. What started life as an optimisation for certain types of objects in a marshalled stream (Fqn, GlobalTransaction, String and Serializable) has become a major limitation, in that a single stream can only hold up to 32767 different (not equal()) instances of such objects.
Basically the optimisation was, for example, instead of writing "hello" to a stream twice, just write it once and use a reference for all subsequent times. Unfortunately this reference was encoded as a short, hence the limitation of 32767.
Fixing this will definitely break wire compatibility with JBoss Cache 2.0.0, although JBC does allow backward compatibility by specifying replication version in your configuration, thanks to the VersionAwareMarshaller.
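To illustrate the 32767 ceiling, here is a small sketch of what happens when a reference counter is written to a stream as a short. It is not the marshaller's actual code, and the class and method names are made up for the example. It only relies on the standard DataOutputStream/DataInputStream behaviour.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class ShortReferenceOverflow {
    /** Writes refId as a 16-bit short and returns the value a reader gets back. */
    static int roundTripAsShort(int refId) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            out.writeShort(refId); // keeps only the low 16 bits
            out.flush();
            DataInputStream in = new DataInputStream(
                    new ByteArrayInputStream(buffer.toByteArray()));
            return in.readShort(); // interpreted as a signed 16-bit value
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTripAsShort(32767)); // 32767
        System.out.println(roundTripAsShort(32768)); // -32768
    }
}
```

The 32768th distinct reference no longer round-trips to the same id, which is why widening the encoding changes the wire format.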
17 years, 1 month
[JBoss JIRA] Created: (JBCACHE-1220) DataGravitatorInterceptor logs ERROR when alone in cluster
by Brian Stansberry (JIRA)
DataGravitatorInterceptor logs ERROR when alone in cluster
----------------------------------------------------------
Key: JBCACHE-1220
URL: http://jira.jboss.com/jira/browse/JBCACHE-1220
Project: JBoss Cache
Issue Type: Bug
Security Level: Public (Everyone can see)
Components: Clustering
Reporter: Brian Stansberry
Assigned To: Manik Surtani
Priority: Critical
Fix For: 1.4.1.SP7
Logging after making 1 request for a static page in a distributable webapp when the node was the only cluster member:
11:45:04,122 ERROR [DataGravitatorInterceptor] No replies to call _gravitateData; id:35(/JSESSION/localhost/ROOT/FC5F1D24DE0C89413E2D633331C6F030, true, false). Perhaps we're alone in the cluster?
11:45:04,128 ERROR [DataGravitatorInterceptor] No replies to call _gravitateData; id:35(/JSESSION/localhost/ROOT/FC5F1D24DE0C89413E2D633331C6F030, true, false). Perhaps we're alone in the cluster?
11:47:35,104 ERROR [DataGravitatorInterceptor] No replies to call _gravitateData; id:35(/JSESSION/localhost/ROOT/FC5F1D24DE0C89413E2D633331C6F030, true, false). Perhaps we're alone in the cluster?
11:47:35,107 ERROR [DataGravitatorInterceptor] No replies to call _gravitateData; id:35(/JSESSION/localhost/ROOT/FC5F1D24DE0C89413E2D633331C6F030, true, false). Perhaps we're alone in the cluster?
11:47:35,195 ERROR [DataGravitatorInterceptor] No replies to call _gravitateData; id:35(/JSESSION/localhost/ROOT/FC5F1D24DE0C89413E2D633331C6F030, true, false). Perhaps we're alone in the cluster?
11:47:35,198 ERROR [DataGravitatorInterceptor] No replies to call _gravitateData; id:35(/JSESSION/localhost/ROOT/FC5F1D24DE0C89413E2D633331C6F030, true, false). Perhaps we're alone in the cluster?
11:47:35,328 ERROR [DataGravitatorInterceptor] No replies to call _gravitateData; id:35(/JSESSION/localhost/ROOT/FC5F1D24DE0C89413E2D633331C6F030, true, false). Perhaps we're alone in the cluster?
11:47:35,331 ERROR [DataGravitatorInterceptor] No replies to call _gravitateData; id:35(/JSESSION/localhost/ROOT/FC5F1D24DE0C89413E2D633331C6F030, true, false). Perhaps we're alone in the cluster?
Problem is this in DataGravitatorInterceptor:
List resps = cache.callRemoteMethods(mbrs, dGrav, GroupRequest.GET_ALL, true, buddyManager.getBuddyCommunicationTimeout());
if (resps == null)
{
    log.error("No replies to call " + dGrav + ". Perhaps we're alone in the cluster?");
    return new Object[]{null, null};
}
TreeCache.callRemoteMethods *will* return null if "we're alone in the cluster".
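One possible direction, sketched here with hypothetical names and not as the actual fix, is to tell the expected "no other members" case apart from a genuine lack of replies and only treat the latter as an error. The choice of log levels below is an assumption.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class GravitationReplyCheck {
    enum Outcome { ALONE_IN_CLUSTER, NO_REPLIES, HAS_REPLIES }

    /** Classifies the result of a remote call given the other cluster members. */
    static Outcome classify(List<?> otherMembers, List<?> responses) {
        if (otherMembers == null || otherMembers.isEmpty()) {
            return Outcome.ALONE_IN_CLUSTER; // nobody to ask: expected, not an error
        }
        if (responses == null || responses.isEmpty()) {
            return Outcome.NO_REPLIES; // members exist but none answered
        }
        return Outcome.HAS_REPLIES;
    }

    /** Assumed mapping from outcome to log level, or null when nothing is logged. */
    static String logLevelFor(Outcome outcome) {
        switch (outcome) {
            case ALONE_IN_CLUSTER: return "DEBUG";
            case NO_REPLIES:       return "ERROR";
            default:               return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(logLevelFor(classify(Collections.emptyList(), null)));
        System.out.println(logLevelFor(classify(Arrays.asList("nodeB"), null)));
    }
}
```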
17 years, 1 month