[JBoss JIRA] (ISPN-2903) CLONE - Eviction causes lost AtomicMap entries
by Galder Zamarreño (JIRA)
[ https://issues.jboss.org/browse/ISPN-2903?page=com.atlassian.jira.plugin.... ]
Galder Zamarreño edited comment on ISPN-2903 at 3/8/13 8:42 AM:
----------------------------------------------------------------
I tried to modify the original test case I created for Infinispan to see if it failed but couldn't get it to fail.
I've tried running it on AS8 and it fails in the same place as it did before:
{code}org.jboss.as.test.clustering.cluster.AtomicMapTestCase.test(AtomicMapTestCase.java:107){code}
I'll try to add some more TRACE logging to see more closely how the contents of the AtomicHashMap change.
> CLONE - Eviction causes lost AtomicMap entries
> ----------------------------------------------
>
> Key: ISPN-2903
> URL: https://issues.jboss.org/browse/ISPN-2903
> Project: Infinispan
> Issue Type: Bug
> Components: Eviction
> Affects Versions: 5.2.3.Final
> Reporter: Paul Ferraro
> Assignee: Galder Zamarreño
> Priority: Critical
> Labels: jdg6
> Fix For: 5.2.4.Final, 5.3.0.Alpha1, 5.3.0.Final
>
> Attachments: AtomicMapServlet.java, AtomicMapTestCase.java, server.log, server.log
>
>
> Here's the scenario:
> Given 2 nodes with REPL_SYNC cache with passivating cache store (e.g. default web cache in AS7).
> 1. Create cache entry containing atomic map with 2 map entries on node1.
> 2. Passivate that cache entry on node2 via manual evict.
> 3. Modify 1 of the atomic map entries within the cache entry on node1.
> 4. Lookup atomic map on node2. It only contains 1 map entry - the map entry modified in step 3. The other map entry is lost.
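For illustration only, the four steps above can be modelled in plain Java. This is a toy model, not Infinispan code: the class `PassivationModel`, its methods, and the `activateFirst` flag are all hypothetical, and the assumption that a replicated delta is applied without first activating the passivated entry is just one way the reported symptom could arise. The thread does not confirm the actual cause.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of node2 with a passivating store. All names are hypothetical.
class PassivationModel {
    final Map<String, Map<String, String>> memory = new HashMap<>();
    final Map<String, Map<String, String>> store = new HashMap<>();

    // Step 1: the full entry arrives on this node.
    void putAll(String key, Map<String, String> value) {
        memory.put(key, new HashMap<>(value));
    }

    // Step 2: manual evict passivates the entry (moves it from memory to the store).
    void evict(String key) {
        Map<String, String> value = memory.remove(key);
        if (value != null) {
            store.put(key, value);
        }
    }

    // Step 3: a replicated delta for one map entry arrives.
    // activateFirst == false models the ASSUMED faulty path: the delta is
    // applied to a fresh empty map instead of the passivated one.
    void applyDelta(String key, String field, String newValue, boolean activateFirst) {
        if (activateFirst && !memory.containsKey(key) && store.containsKey(key)) {
            memory.put(key, store.remove(key));
        }
        memory.computeIfAbsent(key, k -> new HashMap<>()).put(field, newValue);
    }

    // Step 4: lookup prefers the in-memory copy over the store.
    Map<String, String> get(String key) {
        Map<String, String> value = memory.get(key);
        return value != null ? value : store.get(key);
    }

    // Runs steps 1 to 4 against a single modelled node and returns what step 4 sees.
    static Map<String, String> run(boolean activateFirst) {
        PassivationModel node2 = new PassivationModel();
        Map<String, String> initial = new HashMap<>();
        initial.put("a", "1");
        initial.put("b", "2");
        node2.putAll("entry", initial);
        node2.evict("entry");
        node2.applyDelta("entry", "a", "changed", activateFirst);
        return node2.get("entry");
    }

    public static void main(String[] args) {
        System.out.println("delta without activation: " + run(false));
        System.out.println("delta with activation:    " + run(true));
    }
}
```

In this model, `run(false)` returns a map holding only the modified entry, which matches the symptom in step 4, while `run(true)` keeps both entries.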
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
11 years, 10 months
[JBoss JIRA] (ISPN-2903) CLONE - Eviction causes lost AtomicMap entries
by Galder Zamarreño (JIRA)
[ https://issues.jboss.org/browse/ISPN-2903?page=com.atlassian.jira.plugin.... ]
Galder Zamarreño commented on ISPN-2903:
----------------------------------------
I'm a bit confused by this test. I can't find any test named AtomicMapTestCase in the AS source tree at https://github.com/jbossas/jboss-as, nor can I find a test that creates an atomic-map.war deployment (in case the name of the test has changed). Why is the test not included there in the first place? The reason I ask is that I'm confused by the test's HTTP parameters, `operation=put&key=1&value=a`, and the log messages:
{code}Invoked with command PutKeyValueCommand{key=a, value=AtomicHashMap...{code}
Unfortunately, the toString implementation of AtomicHashMap no longer shows its contents, because the previous toString implementation had concurrent iterator issues. I'm looking into whether I can trace changes to it via the atomic hash map delta logging instead.
Also, is the test still failing at the same line as it did before, or is it failing at a different line?
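As a rough illustration of the kind of iterator problem mentioned above, here is a plain-Java sketch using `java.util.HashMap`. It is not the AtomicHashMap code: `ToStringSketch`, `describeContents` and `describeSize` are hypothetical names, the `writer` callback stands in for a concurrent writer so that the failure is deterministic, and the size-only summary is an assumption about what a safe toString might print.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical sketch; not the AtomicHashMap implementation.
class ToStringSketch {

    // Walks the entries the way a content-printing toString() would.
    // 'writer' runs once after the first entry to simulate another thread
    // modifying the map while it is being iterated.
    static String describeContents(Map<String, String> map, Runnable writer) {
        StringBuilder sb = new StringBuilder("{");
        Iterator<Map.Entry<String, String>> it = map.entrySet().iterator();
        boolean first = true;
        while (it.hasNext()) {
            Map.Entry<String, String> e = it.next(); // fail-fast check happens here
            if (!first) {
                sb.append(", ");
            }
            sb.append(e.getKey()).append('=').append(e.getValue());
            if (first) {
                writer.run();
                first = false;
            }
        }
        return sb.append('}').toString();
    }

    // A summary that never iterates, so it cannot hit the fail-fast check.
    static String describeSize(Map<String, String> map) {
        return "Map{size=" + map.size() + "}";
    }

    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put("k1", "v1");
        map.put("k2", "v2");
        try {
            describeContents(map, () -> map.put("k3", "v3"));
        } catch (ConcurrentModificationException e) {
            System.out.println("content-printing toString failed: " + e);
        }
        System.out.println(describeSize(map));
    }
}
```

The trade-off is the one described in the comment: the size-only form is safe to call from a logging statement, but it no longer shows which entries the map holds.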
[JBoss JIRA] (ISPN-2847) Puts done by state transfer can fail with TimeoutException if lock cannot be acquired
by Adrian Nistor (JIRA)
[ https://issues.jboss.org/browse/ISPN-2847?page=com.atlassian.jira.plugin.... ]
Adrian Nistor commented on ISPN-2847:
-------------------------------------
The (now disabled) DistPessimisticTxOperationsDuringStateTransferTest and ReplPessimisticTxOperationsDuringStateTransferTest were added to show the problem.
> Puts done by state transfer can fail with TimeoutException if lock cannot be acquired
> -------------------------------------------------------------------------------------
>
> Key: ISPN-2847
> URL: https://issues.jboss.org/browse/ISPN-2847
> Project: Infinispan
> Issue Type: Bug
> Components: State transfer
> Affects Versions: 5.2.2.Final, 5.3.0.Final
> Reporter: Adrian Nistor
> Assignee: Adrian Nistor
> Labels: 5.2.x
> Fix For: 5.3.0.Final
>
>
> On pessimistic tx caches, state transfer can fail for individual keys if they are locked by another tx for too long and the timeout expires for the state transfer tx. The error is logged, but nothing is done to recover from it, so the value is lost.
> This can be reproduced easily by running OperationsDuringStateTransferTest configured with pessimistic tx.
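The failure mode described above can be sketched with plain `java.util.concurrent` locks. This is only a model under stated assumptions, not Infinispan's locking or state transfer code: `StateTransferLockSketch`, `applyState` and the single `keyLock` are hypothetical, and the later retry merely shows that the same put succeeds once the lock is free. It is not the fix chosen for this issue.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical model of a state transfer put competing with a user tx for a key lock.
class StateTransferLockSketch {
    final Map<String, String> data = new ConcurrentHashMap<>();
    final ReentrantLock keyLock = new ReentrantLock(); // one lock stands in for the per-key lock

    // Returns false when the lock cannot be acquired in time; the value is then dropped.
    boolean applyState(String key, String value, long timeoutMillis) {
        try {
            if (!keyLock.tryLock(timeoutMillis, TimeUnit.MILLISECONDS)) {
                return false; // models the logged TimeoutException with no recovery
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        try {
            data.put(key, value);
            return true;
        } finally {
            keyLock.unlock();
        }
    }

    // Returns { first attempt applied?, value missing after first attempt?, retry applied? }.
    static boolean[] demo() {
        StateTransferLockSketch node = new StateTransferLockSketch();
        CountDownLatch locked = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread userTx = new Thread(() -> {
            node.keyLock.lock(); // pessimistic tx holds the key lock "for too long"
            try {
                locked.countDown();
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                node.keyLock.unlock();
            }
        });
        userTx.start();
        try {
            locked.await();
            boolean first = node.applyState("k", "v", 50);
            boolean missing = !node.data.containsKey("k");
            release.countDown();
            userTx.join();
            boolean retry = node.applyState("k", "v", 50);
            return new boolean[] { first, missing, retry };
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] r = demo();
        System.out.println("first attempt applied: " + r[0]);
        System.out.println("value missing afterwards: " + r[1]);
        System.out.println("retry after unlock applied: " + r[2]);
    }
}
```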
[JBoss JIRA] (ISPN-2619) DistSyncCacheStoreNotSharedTest fails randomly
by Adrian Nistor (JIRA)
[ https://issues.jboss.org/browse/ISPN-2619?page=com.atlassian.jira.plugin.... ]
Adrian Nistor updated ISPN-2619:
--------------------------------
Status: Resolved (was: Pull Request Sent)
Resolution: Done
Integrated in master. Thanks!
> DistSyncCacheStoreNotSharedTest fails randomly
> ----------------------------------------------
>
> Key: ISPN-2619
> URL: https://issues.jboss.org/browse/ISPN-2619
> Project: Infinispan
> Issue Type: Bug
> Components: Test Suite
> Affects Versions: 5.2.0.Beta5
> Reporter: Dan Berindei
> Assignee: Galder Zamarreño
> Labels: testsuite_stability
> Fix For: 5.3.0.Final
>
>
> DistSyncCacheStoreNotSharedTest fails (pretty rarely) with this exception:
> {noformat}
> 09:29:51,069 ERROR (testng-DistSyncCacheStoreNotSharedTest:) [UnitTestTestNGListener] Test testAtomicPutIfAbsentFromNonOwner(org.infinispan.distribution.DistSyncCacheStoreNotSharedTest) failed.
> java.lang.AssertionError
> at org.infinispan.distribution.DistSyncCacheStoreNotSharedTest.testAtomicPutIfAbsentFromNonOwner(DistSyncCacheStoreNotSharedTest.java:293)
> {noformat}
> What happens is that at the end of the previous test there is a get() that goes remotely, and one of the ClusteredGetCommands (to node C) is delayed. The test framework then clears the cache store and the data container on node C, but the get command already had the entry in its context and will write it to the data container when it finishes executing, so that the data container is not empty when testAtomicPutIfAbsentFromNonOwner starts.
> {noformat}
> 09:29:51,033 TRACE (OOB-1,ISPN,NodeC-33167:) [CommandAwareRpcDispatcher] Attempting to execute command: ClusteredGetCommand{key=k2, flags=null} [sender=NodeB-52644]
> 09:29:51,062 TRACE (testng-DistSyncCacheStoreNotSharedTest:) [DummyInMemoryCacheStore] Clear store
> 09:29:51,062 DEBUG (testng-DistSyncCacheStoreNotSharedTest:) [TestingUtil] Cleaning data for cache 'dist' on a cache manager at address NodeC-33167
> 09:29:51,065 DEBUG (testng-DistSyncCacheStoreNotSharedTest:) [TestingUtil] removeInMemoryData(): dataContainerBefore == []
> 09:29:51,062 TRACE (OOB-1,ISPN,NodeC-33167:dist) [ReadCommittedEntry] Updating entry (key=k2 removed=false valid=true changed=true created=false value=value2]
> 09:29:51,065 TRACE (OOB-1,ISPN,NodeC-33167:dist) [EntryWrappingInterceptor] Committed entry ReadCommittedEntry(64d90a46){key=k2, value=value2, oldValue=null, isCreated=false, isChanged=false, isRemoved=false, isValid=true}
> 09:29:51,065 DEBUG (testng-DistSyncCacheStoreNotSharedTest:) [TestingUtil] removeInMemoryData(): dataContainerAfter == []
> {noformat}
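The interleaving described above can be replayed deterministically in plain Java. This sketch is an illustration only: `DelayedGetRaceSketch` and its two maps are hypothetical stand-ins for the cache store and data container on node C, the latches force the ordering seen in the log, and loading the entry from the store is an assumption about where the command's context entry came from.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

// Hypothetical replay of the "delayed get commits after cleanup" ordering.
class DelayedGetRaceSketch {

    // Returns the data container contents after cleanup and the delayed commit.
    static Map<String, String> run() {
        Map<String, String> cacheStore = new ConcurrentHashMap<>();
        Map<String, String> dataContainer = new ConcurrentHashMap<>();
        cacheStore.put("k2", "value2");

        CountDownLatch entryInContext = new CountDownLatch(1);
        CountDownLatch cleanupDone = new CountDownLatch(1);

        Thread delayedGet = new Thread(() -> {
            // The command already holds the entry in its context...
            String inContext = cacheStore.get("k2");
            entryInContext.countDown();
            try {
                cleanupDone.await(); // ...but is delayed until after the cleanup
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            if (inContext != null) {
                dataContainer.put("k2", inContext); // commit on completion
            }
        });
        delayedGet.start();

        try {
            entryInContext.await();
            // Test framework cleanup between two test methods.
            cacheStore.clear();
            dataContainer.clear();
            cleanupDone.countDown();
            delayedGet.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return dataContainer;
    }

    public static void main(String[] args) {
        System.out.println("data container after cleanup: " + run());
    }
}
```

Even though both maps were cleared, the container ends up holding k2 again, which is the state the next test method would start from.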
[JBoss JIRA] (ISPN-2619) DistSyncCacheStoreNotSharedTest fails randomly
by Adrian Nistor (JIRA)
[ https://issues.jboss.org/browse/ISPN-2619?page=com.atlassian.jira.plugin.... ]
Adrian Nistor commented on ISPN-2619:
-------------------------------------
[~galderz] Do we want this for 5.2.x too?
[JBoss JIRA] (ISPN-2903) CLONE - Eviction causes lost AtomicMap entries
by Galder Zamarreño (JIRA)
[ https://issues.jboss.org/browse/ISPN-2903?page=com.atlassian.jira.plugin.... ]
Work on ISPN-2903 started by Galder Zamarreño.
[JBoss JIRA] (ISPN-2904) Race condition in cache startup causes state transfer timeout
by Dennis Reed (JIRA)
[ https://issues.jboss.org/browse/ISPN-2904?page=com.atlassian.jira.plugin.... ]
Dennis Reed commented on ISPN-2904:
-----------------------------------
This is related to ISPN-1979 and its associated JIRAs, but the fixes for those addressed only the shutdown use case, not startup.
> Race condition in cache startup causes state transfer timeout
> -------------------------------------------------------------
>
> Key: ISPN-2904
> URL: https://issues.jboss.org/browse/ISPN-2904
> Project: Infinispan
> Issue Type: Bug
> Components: State transfer
> Affects Versions: 5.1.7.Final
> Reporter: Dennis Reed
> Assignee: Mircea Markus
>
> When starting multiple caches at the same time (as EAP domain mode deployment does), one cache can time out during state transfer and abort startup.
> This is caused by a race condition in which the master node accepts requests that it cannot yet process because it is still starting.
> Because of this, the other node's REQUEST_JOIN is ignored, and that node eventually times out.
> [node1]
> 10:47:23,390 TRACE [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (ServerService Thread Pool -- 65) dests=[master:server-two/web], command=CacheViewControlCommand{cache=repl, type=REQUEST_JOIN, sender=master:server-one/web, newViewId=0, newMembers=null, oldViewId=0, oldMembers=null}, mode=SYNCHRONOUS_IGNORE_LEAVERS, timeout=60000
> 10:47:23,396 TRACE [org.jgroups.protocols.TCP] (ServerService Thread Pool -- 65) sending msg to master:server-two/web, src=master:server-one/web, headers are RequestCorrelator: id=200, type=REQ, id=7, rsp_expected=true, RSVP: REQ(4), UNICAST2: DATA, seqno=27, TCP: [channel_name=web]
> ...
> 10:48:23,404 ERROR [org.jboss.msc.service.fail] (ServerService Thread Pool -- 65) MSC000001: Failed to start service jboss.infinispan.web.repl: org.jboss.msc.service.StartException in service jboss.infinispan.web.repl: org.infinispan.CacheException: Unable to invoke method public void org.infinispan.statetransfer.BaseStateTransferManagerImpl.waitForJoinToComplete() throws java.lang.InterruptedException on object of type ReplicatedStateTransferManagerImpl
> [node2]
> 10:47:23,352 TRACE [org.infinispan.factories.GlobalComponentRegistry] (MSC service thread 1-6) Registering component Component{instance=org.infinispan.marshall.jboss.ExternalizerTable@3f9c437d, name=org.infinispan.marshall.jboss.ExternalizerTable} under name org.infinispan.marshall.jboss.ExternalizerTable
> ...
> 10:47:23,397 TRACE [org.jgroups.protocols.TCP] (OOB-19,null) received [dst: master:server-two/web, src: master:server-one/web (4 headers), size=54 bytes, flags=OOB|DONT_BUNDLE|RSVP], headers are RequestCorrelator: id=200, type=REQ, id=7, rsp_expected=true, RSVP: REQ(4), UNICAST2: DATA, seqno=27, TCP: [channel_name=web]
> 10:47:23,398 TRACE [org.jgroups.blocks.RequestCorrelator] (OOB-19,null) calling (org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher) with request 7
> 10:47:23,398 TRACE [org.infinispan.marshall.jboss.ExternalizerTable] (OOB-19,null) Either the marshaller has stopped or hasn't started. Read externalizers are not properly populated: {}
> 10:47:23,398 TRACE [org.infinispan.marshall.jboss.ExternalizerTable] (OOB-19,null) Cache manager is shutting down and type (id=74) cannot be resolved (thread not interrupted)
> 10:47:23,400 TRACE [org.jgroups.blocks.RequestCorrelator] (OOB-19,null) sending rsp for 7 to master:server-one/web
> ...
> 10:47:23,522 TRACE [org.infinispan.factories.GlobalComponentRegistry] (ServerService Thread Pool -- 64) Invoking start method public void org.infinispan.marshall.jboss.ExternalizerTable.start() on component org.infinispan.marshall.jboss.ExternalizerTable
[JBoss JIRA] (ISPN-2904) Race condition in cache startup causes state transfer timeout
by Dennis Reed (JIRA)
Dennis Reed created ISPN-2904:
---------------------------------
Summary: Race condition in cache startup causes state transfer timeout
Key: ISPN-2904
URL: https://issues.jboss.org/browse/ISPN-2904
Project: Infinispan
Issue Type: Bug
Components: State transfer
Affects Versions: 5.1.7.Final
Reporter: Dennis Reed
Assignee: Mircea Markus
When starting multiple caches at the same time (as EAP domain mode deployment does), one cache can time out during state transfer and abort startup.
This is caused by a race condition in which the master node accepts requests that it cannot yet process because it is still starting.
Because of this, the other node's REQUEST_JOIN is ignored, and that node eventually times out.
[node1]
10:47:23,390 TRACE [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (ServerService Thread Pool -- 65) dests=[master:server-two/web], command=CacheViewControlCommand{cache=repl, type=REQUEST_JOIN, sender=master:server-one/web, newViewId=0, newMembers=null, oldViewId=0, oldMembers=null}, mode=SYNCHRONOUS_IGNORE_LEAVERS, timeout=60000
10:47:23,396 TRACE [org.jgroups.protocols.TCP] (ServerService Thread Pool -- 65) sending msg to master:server-two/web, src=master:server-one/web, headers are RequestCorrelator: id=200, type=REQ, id=7, rsp_expected=true, RSVP: REQ(4), UNICAST2: DATA, seqno=27, TCP: [channel_name=web]
...
10:48:23,404 ERROR [org.jboss.msc.service.fail] (ServerService Thread Pool -- 65) MSC000001: Failed to start service jboss.infinispan.web.repl: org.jboss.msc.service.StartException in service jboss.infinispan.web.repl: org.infinispan.CacheException: Unable to invoke method public void org.infinispan.statetransfer.BaseStateTransferManagerImpl.waitForJoinToComplete() throws java.lang.InterruptedException on object of type ReplicatedStateTransferManagerImpl
[node2]
10:47:23,352 TRACE [org.infinispan.factories.GlobalComponentRegistry] (MSC service thread 1-6) Registering component Component{instance=org.infinispan.marshall.jboss.ExternalizerTable@3f9c437d, name=org.infinispan.marshall.jboss.ExternalizerTable} under name org.infinispan.marshall.jboss.ExternalizerTable
...
10:47:23,397 TRACE [org.jgroups.protocols.TCP] (OOB-19,null) received [dst: master:server-two/web, src: master:server-one/web (4 headers), size=54 bytes, flags=OOB|DONT_BUNDLE|RSVP], headers are RequestCorrelator: id=200, type=REQ, id=7, rsp_expected=true, RSVP: REQ(4), UNICAST2: DATA, seqno=27, TCP: [channel_name=web]
10:47:23,398 TRACE [org.jgroups.blocks.RequestCorrelator] (OOB-19,null) calling (org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher) with request 7
10:47:23,398 TRACE [org.infinispan.marshall.jboss.ExternalizerTable] (OOB-19,null) Either the marshaller has stopped or hasn't started. Read externalizers are not properly populated: {}
10:47:23,398 TRACE [org.infinispan.marshall.jboss.ExternalizerTable] (OOB-19,null) Cache manager is shutting down and type (id=74) cannot be resolved (thread not interrupted)
10:47:23,400 TRACE [org.jgroups.blocks.RequestCorrelator] (OOB-19,null) sending rsp for 7 to master:server-one/web
...
10:47:23,522 TRACE [org.infinispan.factories.GlobalComponentRegistry] (ServerService Thread Pool -- 64) Invoking start method public void org.infinispan.marshall.jboss.ExternalizerTable.start() on component org.infinispan.marshall.jboss.ExternalizerTable
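To make the race easier to see, here is a small plain-Java model of it. Nothing here is Infinispan or JGroups code: `StartupRaceSketch`, `handleJoinUngated` and `handleJoinGated` are hypothetical names, and gating requests on a start latch is only one possible way to close the window, shown for contrast. It is not a statement of how this issue was or will be fixed.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of a node that receives a join request while still starting.
class StartupRaceSketch {
    private final CountDownLatch started = new CountDownLatch(1);
    final AtomicInteger joinsProcessed = new AtomicInteger();

    // Called when the node's components have finished starting.
    void start() {
        started.countDown();
    }

    // Models the reported behaviour: a request that arrives before start()
    // cannot be processed and is silently ignored.
    boolean handleJoinUngated(String sender) {
        if (started.getCount() > 0) {
            return false;
        }
        joinsProcessed.incrementAndGet();
        return true;
    }

    // Hypothetical alternative: hold the request until startup has finished,
    // giving up only after timeoutMillis.
    boolean handleJoinGated(String sender, long timeoutMillis) {
        try {
            if (!started.await(timeoutMillis, TimeUnit.MILLISECONDS)) {
                return false;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        joinsProcessed.incrementAndGet();
        return true;
    }

    // Returns { ungated request processed?, gated request processed? }.
    static boolean[] demo() {
        StartupRaceSketch node2 = new StartupRaceSketch();
        boolean ungated = node2.handleJoinUngated("server-one"); // arrives before start()
        Thread starter = new Thread(node2::start);
        starter.start();
        boolean gated = node2.handleJoinGated("server-one", 5000);
        try {
            starter.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new boolean[] { ungated, gated };
    }

    public static void main(String[] args) {
        boolean[] r = demo();
        System.out.println("ungated join processed: " + r[0]);
        System.out.println("gated join processed:   " + r[1]);
    }
}
```

In the ungated case the sender gets nothing useful back and can only wait for its own timeout, which corresponds to the 60-second gap between the 10:47:23 request and the 10:48:23 error in the node1 log.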