[JBoss JIRA] (ISPN-1513) Enhance distributed executor framework to be more topology aware
by David B (Created) (JIRA)
Enhance distributed executor framework to be more topology aware
----------------------------------------------------------------
Key: ISPN-1513
URL: https://issues.jboss.org/browse/ISPN-1513
Project: Infinispan
Issue Type: Enhancement
Components: Distributed Cache
Affects Versions: 5.0.1.FINAL
Reporter: David B
Assignee: Manik Surtani
Priority: Minor
Our environment has two local Infinispan/JGroups clusters with a JGroups RELAY cluster to handle geographic failover. Our sites are geographically distant over a WAN. Currently DistributedExecutorService's submitEverywhere() sends Callables to every node in both local clusters. We would like additional methods on DistributedExecutorService to constrain submission of Callables to the same/local site.
Currently I have extended DefaultExecutorService with my own TopologyAwareExecutorService and added a submitSameSite() method using TopologyAwareAddress.isSameSite(). I did need to patch DistributedRunnableFuture in DefaultExecutorService, changing its visibility from private to protected.
This could be extended to also provide submitSameRack() and submitSameMachine(), though currently we don't have a use case for those.
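The site-constrained submission described above can be sketched without the Infinispan APIs. SiteAddress below is a hypothetical stand-in for TopologyAwareAddress, and sameSiteMembers() models the filtering a submitSameSite() would do before dispatching Callables; it is an illustration of the idea, not the actual DefaultExecutorService extension.

```java
import java.util.List;
import java.util.stream.Collectors;

// Minimal model of the idea behind submitSameSite(): filter the cluster
// membership down to addresses in the caller's site before submitting.
// "SiteAddress" is a hypothetical stand-in for Infinispan's TopologyAwareAddress.
public class SameSiteFilter {
    record SiteAddress(String node, String siteId) {
        boolean isSameSite(SiteAddress other) { return siteId.equals(other.siteId); }
    }

    // Keep only members co-located with "self"; a real submitSameSite() would
    // then dispatch the Callable to exactly these addresses.
    static List<SiteAddress> sameSiteMembers(SiteAddress self, List<SiteAddress> members) {
        return members.stream().filter(self::isSameSite).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        SiteAddress self = new SiteAddress("node-a", "LON");
        List<SiteAddress> all = List.of(
            new SiteAddress("node-a", "LON"),
            new SiteAddress("node-b", "LON"),
            new SiteAddress("node-c", "NYC"));
        System.out.println(sameSiteMembers(self, all).size()); // prints 2
    }
}
```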
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.jboss.org/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
11 years, 8 months
[JBoss JIRA] (ISPN-2347) SuspectedException while replaying remote commands will cause all subsequent txs to fail.
by Erik Salter (JIRA)
Erik Salter created ISPN-2347:
---------------------------------
Summary: SuspectedException while replaying remote commands will cause all subsequent txs to fail.
Key: ISPN-2347
URL: https://issues.jboss.org/browse/ISPN-2347
Project: Infinispan
Issue Type: Bug
Components: Cross-Site Replication
Affects Versions: 5.2.0.Alpha4
Reporter: Erik Salter
Assignee: Mircea Markus
If a SuspectedException (or any other exception) occurs while the BackupCacheUpdater (on the bridge end) is applying modifications in replayModifications, the tx is automatically rolled back.
However, the subsequent remote command will cause the TransactionManager to throw:
javax.transaction.NotSupportedException: BaseTransaction.checkTransactionState - ARJUNA016051: thread is already associated with a transaction!
After that, no remote transaction will succeed. If another site backs up to this site with a SYNC backup policy, no writes will succeed.
The trace log can be found at: https://dl.dropbox.com/u/50401510/5.2.0.ALPHA4/xsite_erm/10.30.12.158/ser... (Also note that visitCommitCommand has the wrong trace message, and we should really have a trace for visitRollbackCommand)
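A minimal model of the failure mode, assuming the usual JTA thread-association semantics (the ThreadLocal-based manager below is a toy stand-in, not the Arjuna implementation): once a failed replay leaves a transaction associated with the pooled thread, a plain begin() fails exactly as in the log above, while a defensive variant that first detaches the stale association succeeds.

```java
// Toy stand-in for a JTA TransactionManager keeping a per-thread association.
public class StaleTxDemo {
    static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    static void begin() {
        if (CURRENT.get() != null)
            throw new IllegalStateException("thread is already associated with a transaction!");
        CURRENT.set("tx-" + System.nanoTime());
    }

    // Defensive variant: detach whatever a failed replay left behind, then begin
    // (in real JTA this would be a tm.suspend() before tm.begin()).
    static void beginDetachingStale() {
        if (CURRENT.get() != null) CURRENT.remove();
        begin();
    }

    public static void main(String[] args) {
        begin(); // the replay starts a tx...
        // ...the tx is rolled back, but the thread association is never cleared.
        boolean plainBeginFailed = false;
        try { begin(); } catch (IllegalStateException e) { plainBeginFailed = true; }
        System.out.println(plainBeginFailed);      // true: the next command on this thread fails
        beginDetachingStale();
        System.out.println(CURRENT.get() != null); // true: the defensive begin succeeded
    }
}
```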
[JBoss JIRA] (ISPN-2316) Distributed deadlock in StateTransferInterceptor
by Radim Vansa (JIRA)
Radim Vansa created ISPN-2316:
---------------------------------
Summary: Distributed deadlock in StateTransferInterceptor
Key: ISPN-2316
URL: https://issues.jboss.org/browse/ISPN-2316
Project: Infinispan
Issue Type: Feature Request
Components: State transfer, Transactions
Affects Versions: 5.2.0.Alpha3
Reporter: Radim Vansa
Assignee: Mircea Markus
When using transactions, a distributed deadlock may occur when a node is joining under these circumstances:
1) the new node requests transactions using GET_TRANSACTIONS
2) the old node tries to commit a transaction, broadcasting a PrepareCommand - in StateTransferInterceptor it acquires transactionLock in shared mode
3) the GET_TRANSACTIONS request arrives on the new node, which waits for transactionLock (it requires it exclusively)
4) the transaction commit on the new node is somehow delayed (for one minute) in the interceptor chain between InvocationContextInterceptor and OptimisticLockingInterceptor (the only place that could wait appears to be StateTransferInterceptor's transactionLock (shared); however, I cannot find any trace that it is held by anyone)
After the transaction commit times out on the old node and releases the lock, the GET_TRANSACTIONS request may continue, but the state transfer itself can also time out unless its timeout is configured to be long enough.
Note that the transaction commit continues on the new node after the ST times out, until it is found invalid (rolled back).
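The shared-versus-exclusive interaction in steps 2 and 3 can be reproduced with a plain ReentrantReadWriteLock (a sketch of the lock semantics only, not Infinispan's actual state-transfer lock): while the shared holder (the in-flight PrepareCommand) keeps the lock, the exclusive acquisition (GET_TRANSACTIONS) cannot proceed.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class SharedVsExclusive {
    public static void main(String[] args) throws InterruptedException {
        ReentrantReadWriteLock transactionLock = new ReentrantReadWriteLock();

        transactionLock.readLock().lock();  // PrepareCommand: shared acquisition
        // GET_TRANSACTIONS needs the lock exclusively and must wait:
        boolean exclusiveAcquired =
            transactionLock.writeLock().tryLock(100, TimeUnit.MILLISECONDS);
        System.out.println(exclusiveAcquired);   // false while the shared lock is held

        transactionLock.readLock().unlock();     // PrepareCommand completes or times out
        exclusiveAcquired = transactionLock.writeLock().tryLock();
        System.out.println(exclusiveAcquired);   // true once the shared holder releases
    }
}
```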
[JBoss JIRA] (ISPN-2340) AsyncStore.remove() loads the bucket from the store, causing a performance problem
by Christophe Domas (JIRA)
Christophe Domas created ISPN-2340:
--------------------------------------
Summary: AsyncStore.remove() loads the bucket from the store, causing a performance problem
Key: ISPN-2340
URL: https://issues.jboss.org/browse/ISPN-2340
Project: Infinispan
Issue Type: Bug
Components: Loaders and Stores
Affects Versions: 5.2.0.Alpha4
Reporter: Christophe Domas
Assignee: Mircea Markus
I'm currently using infinispan-5.1.5 as a local persistent cache (like a persistent map) with no eviction and write-behind for maximum performance.
I tried the 5.2.0.Alpha4 version and mean item processing time increased from 4 ms to 60 ms.
So I profiled the application and discovered that the AsyncStore.remove(Object) implementation differs between 5.1 and 5.2:
+5.1+:
{code}
@Override
public boolean remove(Object key) {
   enqueue(new Remove(key));
   return true;
}
{code}
+5.2+:
{code}
@Override
public boolean remove(Object key) {
   try {
      InternalCacheEntry load = load(key);
      if (load != null) {
         enqueue(new Remove(key));
         return true;
      }
      return false;
   } catch (CacheLoaderException e) {
      throw new CacheException("Could not load key/value entries from cacheloader", e);
   }
}
{code}
So each time I remove an entry from my cache, I pay the cost of a bucket load from the store:
{code}
Thread [pool-9-thread-1] (Suspended)
ExtendedRiverUnmarshaller(RiverUnmarshaller).doReadObject(boolean) line: 209
ExtendedRiverUnmarshaller(AbstractObjectInput).readObject() line: 37
JBossMarshaller(AbstractJBossMarshaller).objectFromObjectStream(ObjectInput) line: 163
VersionAwareMarshaller.objectFromObjectStream(ObjectInput) line: 190
CacheMarshaller(AbstractDelegatingMarshaller).objectFromObjectStream(ObjectInput) line: 79
FileCacheStore.objectFromInputStreamInReentrantMode(InputStream) line: 433
FileCacheStore.loadBucket(File) line: 306
FileCacheStore.loadBucket(Integer) line: 280
FileCacheStore(BucketBasedCacheStore).loadLockSafe(Object, Integer) line: 60
FileCacheStore(BucketBasedCacheStore).loadLockSafe(Object, Object) line: 49
FileCacheStore(LockSupportCacheStore<L>).load(Object) line: 142
AsyncStore(AbstractDelegatingStore).load(Object) line: 120
AsyncStore.remove(Object) line: 163
CacheStoreInterceptor.visitRemoveCommand(InvocationContext, RemoveCommand) line: 221
RemoveCommand.acceptVisitor(InvocationContext, Visitor) line: 72
CacheLoaderInterceptor(CommandInterceptor).invokeNextInterceptor(InvocationContext, VisitableCommand) line: 118
CacheLoaderInterceptor.visitRemoveCommand(InvocationContext, RemoveCommand) line: 138
RemoveCommand.acceptVisitor(InvocationContext, Visitor) line: 72
EntryWrappingInterceptor(CommandInterceptor).invokeNextInterceptor(InvocationContext, VisitableCommand) line: 118
EntryWrappingInterceptor.invokeNextAndApplyChanges(InvocationContext, FlagAffectedCommand) line: 211
EntryWrappingInterceptor.visitRemoveCommand(InvocationContext, RemoveCommand) line: 161
RemoveCommand.acceptVisitor(InvocationContext, Visitor) line: 72
NonTransactionalLockingInterceptor(CommandInterceptor).invokeNextInterceptor(InvocationContext, VisitableCommand) line: 118
NonTransactionalLockingInterceptor.visitRemoveCommand(InvocationContext, RemoveCommand) line: 104
RemoveCommand.acceptVisitor(InvocationContext, Visitor) line: 72
NotificationInterceptor(CommandInterceptor).invokeNextInterceptor(InvocationContext, VisitableCommand) line: 118
NotificationInterceptor(CommandInterceptor).handleDefault(InvocationContext, VisitableCommand) line: 132
NotificationInterceptor(AbstractVisitor).visitRemoveCommand(InvocationContext, RemoveCommand) line: 68
RemoveCommand.acceptVisitor(InvocationContext, Visitor) line: 72
IsMarshallableInterceptor(CommandInterceptor).invokeNextInterceptor(InvocationContext, VisitableCommand) line: 118
IsMarshallableInterceptor(CommandInterceptor).handleDefault(InvocationContext, VisitableCommand) line: 132
IsMarshallableInterceptor(AbstractVisitor).visitRemoveCommand(InvocationContext, RemoveCommand) line: 68
IsMarshallableInterceptor.visitRemoveCommand(InvocationContext, RemoveCommand) line: 125
RemoveCommand.acceptVisitor(InvocationContext, Visitor) line: 72
InvocationContextInterceptor(CommandInterceptor).invokeNextInterceptor(InvocationContext, VisitableCommand) line: 118
InvocationContextInterceptor.handleAll(InvocationContext, VisitableCommand) line: 129
InvocationContextInterceptor.handleDefault(InvocationContext, VisitableCommand) line: 93
InvocationContextInterceptor(AbstractVisitor).visitRemoveCommand(InvocationContext, RemoveCommand) line: 68
RemoveCommand.acceptVisitor(InvocationContext, Visitor) line: 72
InterceptorChain.invoke(InvocationContext, VisitableCommand) line: 347
CacheImpl<K,V>.executeCommandAndCommitIfNeeded(InvocationContext, VisitableCommand) line: 999
CacheImpl<K,V>.remove(Object, EnumSet<Flag>, ClassLoader) line: 290
DecoratedCache<K,V>.remove(Object) line: 324
CacheBidiMap<K,V>(AbstractDualBidiMap<K,V>).remove(Object) line: 199
OrderCache.removePendingByClientId(String) line: 168
OrderEventCacheManager(EventCacheManager).receiveAck(OrderId) line: 166
OrderEventCacheManager(EventCacheManager).cacheEventNotification(IamapiEvent, OrderId) line: 101
OrderMarketFixService.onOrderEvent(OrderEvent) line: 88
OrderEventDispatcher.fireEvent(OrderEvent) line: 52
OrderEventDispatcher.run() line: 67
ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) line: 1110
ThreadPoolExecutor$Worker.run() line: 603
Thread.run() line: 722
{code}
My config file is:
{code}
<default>
   <loaders passivation="false" preload="true" shared="false">
      <loader class="org.infinispan.loaders.file.FileCacheStore">
         <properties>
            <property name="location" value="cache" />
            <property name="fsyncMode" value="perWrite" />
         </properties>
         <async enabled="true" />
      </loader>
   </loaders>
   <expiration reaperEnabled="false" />
</default>
{code}
(I use perWrite for fsyncMode because we hit the Linux open-files ulimit.)
This may not strictly be considered a bug, but this performance problem is a show-stopper for us to upgrade to 5.2.
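To make the difference concrete, here is a toy model of the two remove() paths (backingStore, writeBehindQueue and the load counter are illustrative, not Infinispan internals): the 5.1-style path only enqueues, while the 5.2-style path pays a synchronous store load on the caller's thread for every removal.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

public class AsyncRemoveSketch {
    static final Map<String, String> backingStore = new ConcurrentHashMap<>();
    static final ConcurrentLinkedQueue<String> writeBehindQueue = new ConcurrentLinkedQueue<>();
    static int storeLoads = 0;

    static String load(String key) { storeLoads++; return backingStore.get(key); }

    // 5.1 behaviour: O(1), no store access on the caller's thread.
    static boolean removeV51(String key) {
        writeBehindQueue.add(key);
        return true;
    }

    // 5.2 behaviour: a synchronous load precedes every enqueue.
    static boolean removeV52(String key) {
        if (load(key) != null) { writeBehindQueue.add(key); return true; }
        return false;
    }

    public static void main(String[] args) {
        backingStore.put("k1", "v1");
        removeV51("k1");
        System.out.println(storeLoads);  // 0: no load on the fast path
        removeV52("k1");
        System.out.println(storeLoads);  // 1: every remove now hits the store
    }
}
```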
Best regards,
Christophe
[JBoss JIRA] (ISPN-1827) De-couple cache view installation and state transfer (consistent hash installation)
by Dan Berindei (JIRA)
Dan Berindei created ISPN-1827:
----------------------------------
Summary: De-couple cache view installation and state transfer (consistent hash installation)
Key: ISPN-1827
URL: https://issues.jboss.org/browse/ISPN-1827
Project: Infinispan
Issue Type: Task
Components: State transfer
Affects Versions: 5.1.0.FINAL
Reporter: Dan Berindei
Assignee: Dan Berindei
Fix For: 5.2.0.FINAL
We need to know the primary owner of a key in order to do pretty much anything, and every member of a cache view should compute the same primary owner at all times. So we need a 2PC cache view installation immediately after any leave to ensure that every node determines the primary owner in the same way - we can't coalesce leaves.
However, it's highly desirable to coalesce state transfers caused by a node leaving - perhaps because we are shutting down half of the cluster to do an upgrade. So we should separate the state transfer from the cache view installation, and each one should have its own 2PC process.
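The requirement in the first paragraph can be illustrated with a toy ownership function (hash modulo member count; not Infinispan's actual consistent hash): if two nodes hold different membership lists because a leave has not been installed everywhere, they name different primary owners for the same key.

```java
import java.util.List;

public class PrimaryOwnerDemo {
    // Toy ownership function: just enough to show that ownership
    // depends on the installed membership list.
    static String primaryOwner(Object key, List<String> members) {
        return members.get(Math.floorMod(key.hashCode(), members.size()));
    }

    public static void main(String[] args) {
        List<String> viewOnNodeA = List.of("A", "B", "C"); // B's leave not yet installed here
        List<String> viewOnNodeB = List.of("A", "C");      // ...but already installed here
        Object key = 1;                                    // Integer.hashCode() == 1
        System.out.println(primaryOwner(key, viewOnNodeA)); // B
        System.out.println(primaryOwner(key, viewOnNodeB)); // C
    }
}
```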