[JBoss JIRA] Updated: (ISPN-902) Data consistency across rehashing
by Erik Salter (JIRA)
[ https://issues.jboss.org/browse/ISPN-902?page=com.atlassian.jira.plugin.s... ]
Erik Salter updated ISPN-902:
-----------------------------
Attachment: cacheTest.zip
> Data consistency across rehashing
> ---------------------------------
>
> Key: ISPN-902
> URL: https://issues.jboss.org/browse/ISPN-902
> Project: Infinispan
> Issue Type: Bug
> Reporter: Erik Salter
> Assignee: Manik Surtani
> Priority: Critical
> Attachments: cacheTest.zip
>
>
> There are two scenarios we're seeing on rehashing, both of which are critical.
> 1. When a node leaves a running cluster, we see an inordinate number of timeout errors, such as the one below. As a result, the cluster ends up losing data.
> org.infinispan.util.concurrent.TimeoutException: Timed out waiting for valid responses!
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.invokeRemotely(JGroupsTransport.java:417)
> at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:101)
> at org.infinispan.distribution.DistributionManagerImpl.retrieveFromRemoteSource(DistributionManagerImpl.java:341)
> at org.infinispan.interceptors.DistributionInterceptor.realRemoteGet(DistributionInterceptor.java:143)
> at org.infinispan.interceptors.DistributionInterceptor.remoteGetAndStoreInL1(DistributionInterceptor.java:131)
> 06:07:44,097 WARN [GMS] cms-node-20192: merge leader did not get data from all partition coordinators [cms-node-20192, mydht1-18445], merge is cancelled
> at org.infinispan.commands.read.GetKeyValueCommand.acceptVisitor(GetKeyValueCommand.java:59)
> 2. Joining a node into a running cluster causes transactional failures on the other nodes. Most of the time, depending on the load, a node can take upwards of 8 minutes to join.
> I've attached a unit test that can reproduce these issues.
--
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
13 years, 11 months
[JBoss JIRA] Created: (ISPN-879) Stale locks held when transactions are aborted by a transaction manager
by Manik Surtani (JIRA)
Stale locks held when transactions are aborted by a transaction manager
-----------------------------------------------------------------------
Key: ISPN-879
URL: https://issues.jboss.org/browse/ISPN-879
Project: Infinispan
Issue Type: Bug
Components: Locking and Concurrency, Transactions
Affects Versions: 4.2.0.Final
Reporter: Manik Surtani
Assignee: Manik Surtani
Priority: Blocker
Fix For: 4.2.1.Final
This is an issue with the way the LockManager attempts to acquire locks. The crux of the problem is that when the LM waits on a lock that is already held, while within the context of a transaction, the transaction manager may decide to abort that transaction. This could be for various reasons, including a transaction timing out. Typically this would involve the transaction manager calling XAResource.end() followed by XAResource.rollback(). Unfortunately, this will almost always happen on a separate maintenance thread, and when the transaction is cleaned up, the user's original thread - still waiting for a lock - is not notified. But the transaction is rolled back and as far as the transaction manager is concerned, all is well.
Subsequently, when the lock is released, the original thread waiting on the lock acquires the lock, and continues. However this transaction never completes since it has already "rolled back", leaving a stale lock in the system.
The solution is to check, every time a lock is acquired, whether the caller is in a transactional context and, if so, whether that transaction is still valid; if it is not, release the lock and throw an exception.
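The check described above can be sketched roughly as follows. This is a minimal illustration and not the actual LockManager code: the Tx class, its isValid() method, and the acquire() helper are hypothetical names, and a plain ReentrantLock stands in for Infinispan's per-key locks.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantLock;

class StaleLockSketch {

    /** Hypothetical stand-in for a transaction handle; not an Infinispan type. */
    static final class Tx {
        private final AtomicBoolean rolledBack = new AtomicBoolean(false);

        /** Would be triggered by the transaction manager, typically on another thread. */
        void rollback() { rolledBack.set(true); }

        boolean isValid() { return !rolledBack.get(); }
    }

    /**
     * Tries to acquire the lock, then re-validates the transaction.
     * Returns true if the lock is held and the transaction (if any) is still valid,
     * false if the lock could not be acquired in time or the thread was interrupted.
     * Throws IllegalStateException if the transaction was aborted while waiting;
     * in that case the lock is released first so it cannot become stale.
     */
    static boolean acquire(ReentrantLock lock, Tx tx, long timeoutMillis) {
        try {
            if (!lock.tryLock(timeoutMillis, TimeUnit.MILLISECONDS)) {
                return false;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return false;
        }
        // tx == null means the caller is not in a transactional context.
        if (tx != null && !tx.isValid()) {
            lock.unlock();
            throw new IllegalStateException("Transaction was rolled back while waiting for the lock");
        }
        return true;
    }
}
```

The important part is the ordering: validation happens after the lock is obtained, because the rollback may have occurred at any point during the wait.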
13 years, 11 months
[JBoss JIRA] Created: (ISPN-860) Rehashing into a running cluster causes lock timeouts and lock cleanup errors
by Erik Salter (JIRA)
Rehashing into a running cluster causes lock timeouts and lock cleanup errors
-----------------------------------------------------------------------------
Key: ISPN-860
URL: https://issues.jboss.org/browse/ISPN-860
Project: Infinispan
Issue Type: Bug
Reporter: Erik Salter
Assignee: Manik Surtani
Attachments: multinode-rehash.zip
We are seeing some severe issues when a new node joins a cluster that is running transactions. Specifically, when a new node is added to the system, some transactions running against the existing nodes fail because locks are never released. There are also many lock timeouts.
All of our caches are in DIST mode. The number of owners is 3. We are also making liberal use of the new "eagerLockSingleNode" flag.
The attached test case illustrates the lock timeout problem.
13 years, 11 months
[JBoss JIRA] Commented: (ISPN-186) Smart L1 cache invalidation
by Manik Surtani (JIRA)
[ https://issues.jboss.org/browse/ISPN-186?page=com.atlassian.jira.plugin.s... ]
Manik Surtani commented on ISPN-186:
------------------------------------
Yeah, a threshold may be set to do this I suppose.
> Smart L1 cache invalidation
> ---------------------------
>
> Key: ISPN-186
> URL: https://issues.jboss.org/browse/ISPN-186
> Project: Infinispan
> Issue Type: Feature Request
> Components: Distributed Cache
> Reporter: Manik Surtani
> Assignee: Pete Muir
> Labels: l1
> Fix For: 5.0.0.BETA1, 5.0.0.Final
>
>
> Need to build a mechanism in which L1 invalidation is NOT multicast, but instead is unicast _if necessary_ to specific nodes that may have cached a given entry. This can be detected by maintaining a list of nodes who have requested a key via a remote get, but this would need to be relayed by all data owners.
> Benefits would be performance by removing unnecessary invalidation where this is not needed, and by reducing noise in network stacks of most nodes.
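The bookkeeping described in the issue could look roughly like the sketch below. It is only an illustration of the idea: the class and method names are hypothetical, node addresses are plain strings, and it ignores the relaying between data owners that the issue says would also be needed.

```java
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class L1RequestorsSketch {
    // key -> nodes that fetched the key via a remote get and may hold it in L1
    private final ConcurrentMap<String, Set<String>> requestors = new ConcurrentHashMap<>();

    /** Called on a data owner when it serves a remote get for the given key. */
    void recordRemoteGet(String key, String requestingNode) {
        requestors.computeIfAbsent(key, k -> ConcurrentHashMap.newKeySet()).add(requestingNode);
    }

    /**
     * Called when the key is modified. Returns the nodes that need an
     * invalidation message and forgets them, since their L1 copies will be dropped.
     * An empty result means no invalidation has to be sent at all.
     */
    Set<String> invalidationTargets(String key) {
        Set<String> nodes = requestors.remove(key);
        return nodes == null ? Collections.<String>emptySet() : nodes;
    }
}
```

A real implementation would also have to handle a remote get racing with an invalidation for the same key, which this sketch does not attempt.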
13 years, 11 months
[JBoss JIRA] Commented: (ISPN-186) Smart L1 cache invalidation
by Bela Ban (JIRA)
[ https://issues.jboss.org/browse/ISPN-186?page=com.atlassian.jira.plugin.s... ]
Bela Ban commented on ISPN-186:
-------------------------------
Additional logic that may make this better:
- If the transport is multicast capable and we have more than N/3 (or whatever subset is picked) members to send the invalidation to --> use a multicast
- Else use several unicasts
- Note that daisychaining might be helpful in sending 'multicasts' in TCP stacks
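The selection logic in the list above might be sketched as follows. The names are hypothetical, and the N/3 threshold is simply the example value from the comment, not a tuned or agreed-upon number.

```java
class InvalidationStrategySketch {
    enum Strategy { NONE, UNICAST, MULTICAST }

    /**
     * Picks how to deliver an L1 invalidation.
     *
     * @param targets          number of nodes that may hold the entry in L1
     * @param clusterSize      total number of cluster members (N)
     * @param multicastCapable whether the transport supports multicast
     */
    static Strategy choose(int targets, int clusterSize, boolean multicastCapable) {
        if (targets <= 0) {
            return Strategy.NONE; // nobody cached the entry, nothing to send
        }
        // "more than N/3 members" written without integer division: targets * 3 > N
        if (multicastCapable && (long) targets * 3 > clusterSize) {
            return Strategy.MULTICAST;
        }
        return Strategy.UNICAST;
    }
}
```

For a 9-node cluster this picks multicast from 4 targets upwards on a multicast-capable transport and falls back to unicasts otherwise.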
13 years, 11 months
[JBoss JIRA] Created: (ISPN-845) Memory leaks for 2 phase commit readonly transaction
by Changgeng Li (JIRA)
Memory leaks for 2 phase commit readonly transaction
----------------------------------------------------
Key: ISPN-845
URL: https://issues.jboss.org/browse/ISPN-845
Project: Infinispan
Issue Type: Bug
Components: Transactions
Affects Versions: 4.2.0.Final
Reporter: Changgeng Li
Assignee: Manik Surtani
Priority: Blocker
If a method is declared read-only with '@Transactional(readOnly=true)', the JBoss transaction manager is in use, and two caches are accessed inside the method, memory is leaked after the method is invoked.
The leak happens in TransactionTable instances, in the localTransactions and xid2LocalTx maps. It seems the commit phase is skipped, so the cleanup method that removes entries from these two maps is never called.
My tests show that if the transaction enlists only one cache there is no problem, but I haven't tested whether using DummyTransactionManagerLookup resolves this.
This bug breaks our system, where readOnly DAO methods are heavily used with Infinispan as the Hibernate second-level cache.
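One way to picture the leak and a possible remedy is the sketch below. It assumes the transaction manager skips the commit call for a participant that votes read-only during prepare, which the XA protocol permits; whether that is exactly what happens here is not confirmed by this report. The class is a hypothetical stand-in, not Infinispan's TransactionTable, and Xids are simplified to strings.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class ReadOnlyCleanupSketch {
    // Same values as the constants in javax.transaction.xa.XAResource.
    static final int XA_OK = 0;
    static final int XA_RDONLY = 3;

    // xid -> whether the transaction performed only reads
    private final Map<String, Boolean> localTransactions = new ConcurrentHashMap<>();

    void start(String xid, boolean readOnly) {
        localTransactions.put(xid, readOnly);
    }

    int prepare(String xid) {
        Boolean readOnly = localTransactions.get(xid);
        if (readOnly == null) {
            throw new IllegalStateException("Unknown transaction: " + xid);
        }
        if (readOnly) {
            // No commit call may follow a read-only vote, so clean up now.
            localTransactions.remove(xid);
            return XA_RDONLY;
        }
        return XA_OK;
    }

    void commit(String xid) {
        localTransactions.remove(xid);
    }

    int trackedTransactions() {
        return localTransactions.size();
    }
}
```

With cleanup only in commit(), every read-only transaction that never receives a commit call would leave an entry behind, which matches the growth described in the report.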
13 years, 11 months
[JBoss JIRA] Assigned: (ISPN-887) XAResource implementation improvements (TransactionalXAResource)
by Mircea Markus (JIRA)
[ https://issues.jboss.org/browse/ISPN-887?page=com.atlassian.jira.plugin.s... ]
Mircea Markus reassigned ISPN-887:
----------------------------------
Assignee: Mircea Markus (was: Manik Surtani)
> XAResource implementation improvements (TransactionalXAResource)
> ----------------------------------------------------------------
>
> Key: ISPN-887
> URL: https://issues.jboss.org/browse/ISPN-887
> Project: Infinispan
> Issue Type: Feature Request
> Components: Transactions
> Affects Versions: 4.2.0.Final
> Reporter: Mircea Markus
> Assignee: Mircea Markus
> Fix For: 5.0.0.Final
>
>
> Reviewing the code with Jonathan Halliday has brought up the following points (see TODOs below):
> package org.infinispan.transaction.xa;
> import org.infinispan.commands.CommandsFactory;
> import org.infinispan.commands.tx.CommitCommand;
> import org.infinispan.commands.tx.PrepareCommand;
> import org.infinispan.commands.tx.RollbackCommand;
> import org.infinispan.config.Configuration;
> import org.infinispan.context.InvocationContextContainer;
> import org.infinispan.context.impl.LocalTxInvocationContext;
> import org.infinispan.interceptors.InterceptorChain;
> import org.infinispan.util.logging.Log;
> import org.infinispan.util.logging.LogFactory;
> import javax.transaction.xa.XAException;
> import javax.transaction.xa.XAResource;
> import javax.transaction.xa.Xid;
> /**
> * This acts both as a local {@link org.infinispan.transaction.xa.CacheTransaction} and implementor of an {@link
> * javax.transaction.xa.XAResource} that will be called by tx manager on various tx stages.
> *
> * @author Mircea.Markus(a)jboss.com
> * @since 4.0
> */
> public class TransactionXaAdapter implements XAResource {
> private static final Log log = LogFactory.getLog(TransactionXaAdapter.class);
> private static boolean trace = log.isTraceEnabled();
> //todo - comment why timeout is not used
> // - it is useful only if TM and client are in separate processes and TM might fail. this is because a client might tm.begin and then the TM (running separate process) crashes
> // - in this scenario the TM won't ever call XAResource.rollback, so these resources would be held there forever
> // - not affecting us as in all scenarios TM&XAResource are collocated
> private int txTimeout;
> private final InvocationContextContainer icc;
> private final InterceptorChain invoker;
> private final CommandsFactory commandsFactory;
> private final Configuration configuration;
> private final TransactionTable txTable;
> /**
> * XAResource is associated with a transaction between enlistment (XAResource.start()) and XAResource.end(). It's only the
> * boundary methods (prepare, commit, rollback) that need to be "stateless".
> * Refer to section 3.4.4 from JTA spec v.1.1
> */
> private final LocalTransaction localTransaction;
> public TransactionXaAdapter(LocalTransaction localTransaction, TransactionTable txTable, CommandsFactory commandsFactory,
> Configuration configuration, InterceptorChain invoker, InvocationContextContainer icc) {
> this.localTransaction = localTransaction;
> this.txTable = txTable;
> this.commandsFactory = commandsFactory;
> this.configuration = configuration;
> this.invoker = invoker;
> this.icc = icc;
> }
> /**
> * This can be called for any transaction object. See Section 3.4.6 (Resource Sharing) from JTA spec v1.1.
> */
> public int prepare(Xid xid) throws XAException {
> //todo if I throw an exception here then I should also cleanup resources as .rollback might never be called!!
> LocalTransaction localTransaction = getLocalTransactionAndValidate(xid);
> //todo - same as last comment
> validateNotMarkedForRollback(localTransaction);
> if (configuration.isOnePhaseCommit()) {
> if (trace) log.trace("Received prepare for tx: {0}. Skipping call as 1PC will be used.", xid);
> return XA_OK;
> }
> PrepareCommand prepareCommand = commandsFactory.buildPrepareCommand(localTransaction.getGlobalTransaction(), localTransaction.getModifications(), configuration.isOnePhaseCommit());
> if (trace) log.trace("Sending prepare command through the chain: " + prepareCommand);
> LocalTxInvocationContext ctx = icc.createTxInvocationContext();
> ctx.setLocalTransaction(localTransaction);
> try {
> invoker.invoke(ctx, prepareCommand);
> if (localTransaction.isReadOnly()) {
> if (trace) log.trace("Readonly transaction: " + localTransaction.getGlobalTransaction());
> // force a cleanup to release any objects held. Some TMs don't call commit if it is a READ ONLY tx. See ISPN-845
> commit(xid, false);
> return XA_RDONLY;
> } else {
> return XA_OK;
> }
> } catch (Throwable e) {
> // todo if I throw this exception make sure that all locks are 100% cleaned up, as TM won't do any rollback call on it.
> // todo - handle this! -> if only a node fails to ack tx prepare, and that node is still part of the cluster, it needs to be sync with tx state.
> // one way of doing this is by pushing the tx state to that node until one of two happens: a) node ack or b) node is shunned from the cluster
> log.error("Error while processing PrepareCommand", e);
> throw new XAException(XAException.XAER_RMERR);
> }
> }
> /**
> * Same comment as for {@link #prepare(javax.transaction.xa.Xid)} applies for commit.
> */
> public void commit(Xid xid, boolean isOnePhase) throws XAException {
> LocalTransaction localTransaction = getLocalTransactionAndValidate(xid);
> if (trace) log.trace("committing transaction {0}", localTransaction.getGlobalTransaction());
> try {
> LocalTxInvocationContext ctx = icc.createTxInvocationContext();
> ctx.setLocalTransaction(localTransaction);
> // todo this needs to be split in two:
> // - configuration.isOnePhaseCommit() this is not "as important", as the user ack that it doesn't "really" need consistency
> // - on the other case ("isOnePhase"==true) make sure that this method either commits successfully or it fails and cleans up logs eventually
> if (configuration.isOnePhaseCommit() || isOnePhase) {
> validateNotMarkedForRollback(localTransaction);
> if (trace) log.trace("Doing an 1PC prepare call on the interceptor chain");
> PrepareCommand command = commandsFactory.buildPrepareCommand(localTransaction.getGlobalTransaction(), localTransaction.getModifications(), true);
> try {
> invoker.invoke(ctx, command);
> } catch (Throwable e) {
> log.error("Error while processing 1PC PrepareCommand", e);
> throw new XAException(XAException.XAER_RMERR);
> }
> } else {
> CommitCommand commitCommand = commandsFactory.buildCommitCommand(localTransaction.getGlobalTransaction());
> try {
> invoker.invoke(ctx, commitCommand);
> } catch (Throwable e) {
> log.error("Error while processing CommitCommand", e);
> throw new XAException(XAException.XAER_RMERR);
> }
> }
> } finally {
> cleanup(localTransaction);
> }
> }
> /**
> * Same comment as for {@link #prepare(javax.transaction.xa.Xid)} applies for rollback.
> */
> public void rollback(Xid xid) throws XAException {
> rollbackImpl(xid, commandsFactory, icc, invoker, txTable);
> }
> public static void rollbackImpl(Xid xid, CommandsFactory commandsFactory, InvocationContextContainer icc, InterceptorChain invoker, TransactionTable txTable) throws XAException {
> LocalTransaction localTransaction = txTable.getLocalTransaction(xid);
> if (localTransaction == null) {
> if (trace) log.trace("no tx found for {0}", xid);
> throw new XAException(XAException.XAER_NOTA);
> }
> if (trace) log.trace("rollback transaction {0} ", localTransaction.getGlobalTransaction());
> RollbackCommand rollbackCommand = commandsFactory.buildRollbackCommand(localTransaction.getGlobalTransaction());
> LocalTxInvocationContext ctx = icc.createTxInvocationContext();
> ctx.setLocalTransaction(localTransaction);
> try {
> invoker.invoke(ctx, rollbackCommand);
> } catch (Throwable e) {
> log.error("Exception while rollback", e);
> throw new XAException(XAException.XA_HEURHAZ);
> } finally {
> cleanupImpl(localTransaction, txTable, icc);
> }
> }
> private LocalTransaction getLocalTransactionAndValidate(Xid xid) throws XAException {
> LocalTransaction localTransaction1 = txTable.getLocalTransaction(xid);
> if (localTransaction1 == null) {
> log.error("This should not happen when XAResource and TM are in the same process! No tx found for {0}", xid);
> throw new XAException(XAException.XAER_NOTA);
> }
> return localTransaction1;
> }
> public void start(Xid xid, int i) throws XAException {
> localTransaction.setXid(xid);
> txTable.addLocalTransactionMapping(localTransaction);
> if (trace) log.trace("start called on tx " + this.localTransaction.getGlobalTransaction());
> }
> public void end(Xid xid, int i) throws XAException {
> if (trace) log.trace("end called on tx " + this.localTransaction.getGlobalTransaction());
> }
> public void forget(Xid xid) throws XAException {
> if (trace) log.trace("forget called");
> }
> public int getTransactionTimeout() throws XAException {
> if (trace) log.trace("getTransactionTimeout called");
> return txTimeout;
> }
> public boolean isSameRM(XAResource xaResource) throws XAException {
> if (!(xaResource instanceof TransactionXaAdapter)) {
> return false;
> }
> TransactionXaAdapter other = (TransactionXaAdapter) xaResource;
> return other.equals(this);
> }
> public Xid[] recover(int i) throws XAException {
> if (trace) log.trace("recover called: " + i);
> return null;
> }
> public boolean setTransactionTimeout(int i) throws XAException {
> this.txTimeout = i;
> return true;
> }
> @Override
> public boolean equals(Object o) {
> if (this == o) return true;
> if (!(o instanceof TransactionXaAdapter)) return false;
> TransactionXaAdapter that = (TransactionXaAdapter) o;
> return this.localTransaction.equals(that.localTransaction);
> }
> @Override
> public int hashCode() {
> return localTransaction.getGlobalTransaction().hashCode();
> }
> @Override
> public String toString() {
> return "TransactionXaAdapter{" +
> "localTransaction=" + localTransaction +
> '}';
> }
> private void validateNotMarkedForRollback(LocalTransaction localTransaction) throws XAException {
> if (localTransaction.isMarkedForRollback()) {
> if (trace) log.trace("Transaction already marked for rollback: {0}", localTransaction);
> throw new XAException(XAException.XA_RBROLLBACK);
> }
> }
> private void cleanup(LocalTransaction localTransaction) {
> TransactionXaAdapter.cleanupImpl(localTransaction, txTable, icc);
> }
> private static void cleanupImpl(LocalTransaction localTransaction, TransactionTable txTable, InvocationContextContainer icc) {
> txTable.removeLocalTransaction(localTransaction);
> icc.suspend();
> }
> }
13 years, 11 months
[JBoss JIRA] Created: (ISPN-882) Size and isEmpty returning wrong values within tx
by Galder Zamarreño (JIRA)
Size and isEmpty returning wrong values within tx
-------------------------------------------------
Key: ISPN-882
URL: https://issues.jboss.org/browse/ISPN-882
Project: Infinispan
Issue Type: Bug
Affects Versions: 5.0.0.ALPHA2, 4.2.1.CR1, 4.2.0.Final
Reporter: Galder Zamarreño
Assignee: Galder Zamarreño
Fix For: 4.2.1.Final, 5.0.0.BETA1
size() and isEmpty() return wrong values when called within a transaction. For example, the following test fails:
public void testSizeIsEmptyWithinTx(Method m) throws Exception {
   Cache<String, String> localCache = cacheManager.getCache(m.getName());
   tm().begin();
   assert localCache.size() == 0;
   assert localCache.isEmpty();
   localCache.put("k-" + m.getName(), "v-" + m.getName());
   assert localCache.size() == 0;
   assert localCache.isEmpty();
   tm().commit();
   assert localCache.size() == 1;
   assert !localCache.isEmpty();
}
This used to work in 4.0, see http://community.jboss.org/docs/DOC-14695
13 years, 11 months