[JBoss JIRA] (ISPN-4565) ReplTotalOrderVersionedStateTransferTest.testStateTransfer random failures
by Pedro Ruivo (JIRA)
[ https://issues.jboss.org/browse/ISPN-4565?page=com.atlassian.jira.plugin.... ]
Work on ISPN-4565 started by Pedro Ruivo.
> ReplTotalOrderVersionedStateTransferTest.testStateTransfer random failures
> --------------------------------------------------------------------------
>
> Key: ISPN-4565
> URL: https://issues.jboss.org/browse/ISPN-4565
> Project: Infinispan
> Issue Type: Bug
> Security Level: Public (Everyone can see)
> Components: Core, State Transfer, Test Suite - Core
> Affects Versions: 7.0.0.Alpha5
> Reporter: Dan Berindei
> Assignee: Pedro Ruivo
> Priority: Blocker
> Labels: testsuite_stability
> Fix For: 7.0.0.Beta1
>
>
> A NullPointerException appears while processing the 2nd tx:
> {noformat}
> 04:27:12,078 DEBUG (remote-thread-ReplTotalOrderVersionedStateTransferTest-NodeB-p12450-t4:) [TotalOrderInterceptor] Exception while rollback transaction ReplTotalOrderVersionedStateTransferTest-NodeC-12055:56786
> java.lang.NullPointerException
> at org.infinispan.transaction.impl.WriteSkewHelper.performTotalOrderWriteSkewCheckAndReturnNewVersions(WriteSkewHelper.java:76)
> at org.infinispan.interceptors.locking.ClusteringDependentLogic$AbstractClusteringDependentLogic.totalOrderCreateNewVersionsAndCheckForWriteSkews(ClusteringDependentLogic.java:133)
> at org.infinispan.interceptors.locking.ClusteringDependentLogic$AbstractClusteringDependentLogic.createNewVersionsAndCheckForWriteSkews(ClusteringDependentLogic.java:93)
> at org.infinispan.interceptors.totalorder.TotalOrderVersionedEntryWrappingInterceptor.visitPrepareCommand(TotalOrderVersionedEntryWrappingInterceptor.java:62)
> at org.infinispan.commands.tx.PrepareCommand.acceptVisitor(PrepareCommand.java:124)
> at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:98)
> at org.infinispan.interceptors.NotificationInterceptor.visitPrepareCommand(NotificationInterceptor.java:36)
> at org.infinispan.commands.tx.PrepareCommand.acceptVisitor(PrepareCommand.java:124)
> at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:98)
> at org.infinispan.interceptors.TxInterceptor.invokeNextInterceptorAndVerifyTransaction(TxInterceptor.java:124)
> at org.infinispan.interceptors.TxInterceptor.visitPrepareCommand(TxInterceptor.java:111)
> at org.infinispan.interceptors.TxInterceptor.visitCommitCommand(TxInterceptor.java:184)
> at org.infinispan.commands.tx.CommitCommand.acceptVisitor(CommitCommand.java:32)
> at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:98)
> at org.infinispan.interceptors.totalorder.TotalOrderInterceptor.visitSecondPhaseCommand(TotalOrderInterceptor.java:148)
> at org.infinispan.interceptors.totalorder.TotalOrderInterceptor.visitCommitCommand(TotalOrderInterceptor.java:125)
> {noformat}
> (The error message is misleading: this is a commit, not a rollback.)
> The 1st tx still fails with a WriteSkewException, but then the test fails because the 2nd tx didn't update the value:
> {noformat}
> 04:27:12,286 ERROR (testng-ReplTotalOrderVersionedStateTransferTest:) [UnitTestTestNGListener] Test testStateTransfer(org.infinispan.tx.totalorder.statetransfer.ReplTotalOrderVersionedStateTransferTest) failed.
> java.lang.AssertionError: expected:<new world> but was:<world>
> at org.testng.AssertJUnit.fail(AssertJUnit.java:59)
> at org.testng.AssertJUnit.failNotEquals(AssertJUnit.java:364)
> at org.testng.AssertJUnit.assertEquals(AssertJUnit.java:80)
> at org.testng.AssertJUnit.assertEquals(AssertJUnit.java:88)
> at org.infinispan.container.versioning.VersionedReplStateTransferTest.testStateTransfer(VersionedReplStateTransferTest.java:89)
> {noformat}
> Full log here: http://ci.infinispan.org/viewLog.html?buildId=9816&buildTypeId=Infinispan...
--
This message was sent by Atlassian JIRA
(v6.2.6#6264)
11 years, 5 months
[JBoss JIRA] (ISPN-4575) Map/Reduce incorrect results with a non-shared non-tx intermediate cache
by Dan Berindei (JIRA)
Dan Berindei created ISPN-4575:
----------------------------------
Summary: Map/Reduce incorrect results with a non-shared non-tx intermediate cache
Key: ISPN-4575
URL: https://issues.jboss.org/browse/ISPN-4575
Project: Infinispan
Issue Type: Bug
Security Level: Public (Everyone can see)
Components: Core, Distributed Execution and Map/Reduce
Affects Versions: 7.0.0.Alpha5
Reporter: Dan Berindei
Assignee: Vladimir Blagojevic
Fix For: 7.0.0.Beta1
In a non-tx cache, if a command is started with topology id {{T}} and the distribution interceptor on the node it is replicated to already sees topology {{T+1}}, that interceptor throws an {{OutdatedTopologyException}}. The originator of the command then retries it with topology {{T+1}}.
When this happens with a {{PutKeyValueCommand(k, MapReduceManagerImpl.DeltaAwareList)}}, it can lead to duplicate intermediate values.
Say _A_ is the primary owner of {{k}} in {{T}}, _B_ is a backup owner in both {{T}} and {{T+1}}, and _C_ is the backup owner in {{T}} and the primary owner in {{T+1}} (i.e. _C_ just joined and a rebalance is in progress during {{T}}; see {{NonTxBackupOwnerBecomingPrimaryOwnerTest}}).
_A_ starts the {{PutKeyValueCommand}} and replicates it to _B_ and _C_. _C_ applies the command, but _B_ already has topology {{T+1}} and throws an {{OutdatedTopologyException}}. _A_ installs topology {{T+1}} and sends the command to _C_, the new primary owner; _C_ replicates it to _B_ and then applies it locally a second time.
This scenario can happen during an M/R task even without nodes joining or leaving. That's because {{CreateCacheCommand}} only calls {{getCache()}} on each member; it doesn't wait for the cache to have a certain number of members, or for state transfer to complete on all the members. The last member to join the intermediate cache is guaranteed to have topology {{T+1}}, but the others may still have topology {{T}} by the time the combine phase starts inserting values in the intermediate cache.
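To make the double application concrete, here is a toy Java model of the interleaving described above. Everything in it is hypothetical: {{Node}}, the integer topology ids and the append-only "merge" are stand-ins rather than Infinispan classes, and replication is modelled as plain method calls in a fixed order.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Toy model of the ISPN-4575 race (hypothetical, NOT Infinispan code).
 * Each node stores the intermediate values for a single key k; applying a
 * command appends its delta, much like merging a DeltaAwareList.
 */
public class DuplicateDeltaSketch {

    /** Stand-in for Infinispan's OutdatedTopologyException. */
    static class OutdatedTopologyException extends RuntimeException {}

    static class Node {
        int topologyId;
        final List<String> values = new ArrayList<>();

        Node(int topologyId) { this.topologyId = topologyId; }

        /** Rejects commands from an older topology, otherwise appends the delta. */
        void apply(int commandTopologyId, List<String> delta) {
            if (commandTopologyId < topologyId) {
                throw new OutdatedTopologyException();
            }
            values.addAll(delta);
        }
    }

    /** Replays the A/B/C interleaving from the issue and returns the values held by B and C. */
    public static Map<String, List<String>> run(List<String> delta) {
        final int t = 1;
        Node b = new Node(t + 1); // B has already installed T+1
        Node c = new Node(t);     // C is still on T

        // A (primary owner in T) replicates the put to B and C with topology T.
        c.apply(t, delta);        // C applies it
        boolean retry = false;
        try {
            b.apply(t, delta);    // B rejects it
        } catch (OutdatedTopologyException e) {
            retry = true;
        }

        if (retry) {
            // A installs T+1 and resends the command to C, now the primary owner.
            c.topologyId = t + 1;
            b.apply(t + 1, delta); // C replicates to B ...
            c.apply(t + 1, delta); // ... and applies the same delta a second time
        }

        Map<String, List<String>> result = new HashMap<>();
        result.put("B", b.values);
        result.put("C", c.values);
        return result;
    }
}
```

Calling {{run(Arrays.asList("v1"))}} leaves _B_ with one copy of the delta and _C_ with two, which is the kind of duplicate intermediate value described in this issue.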
I have seen the {{OutdatedTopologyException}} happen pretty often during the test suite, especially after I removed the duplicate {{invokeRemotely}} call in {{MapReduceTask.executeTaskInit()}}. Most of them were harmless, but there was one failure in CI: http://ci.infinispan.org/viewLog.html?buildId=9811&tab=buildResultsDiv&bu...
A short-term fix would be to wait for all the members to finish joining in {{CreateCacheCommand}}. Long-term, M/R tasks should be resilient to topology changes, so we should investigate making {{PutKeyValue(k, DeltaAwareList)}} handle an {{OutdatedTopologyException}}.
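One possible way to make the put tolerate a retry (an illustration only, not the approach this issue settles on) is to have the originator tag each delta with a unique id and let the owner skip ids it has already merged. The sketch assumes a hypothetical {{IdempotentIntermediateValue}} holder; none of these names exist in Infinispan.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.UUID;

/** Hypothetical sketch: dedupe retried deltas by an originator-assigned id. */
public class IdempotentDeltaSketch {

    /** Stand-in for the value stored under an intermediate key (NOT an Infinispan class). */
    static final class IdempotentIntermediateValue {
        private final List<String> values = new ArrayList<>();
        private final Set<UUID> appliedDeltaIds = new HashSet<>();

        /** Appends the delta only the first time its id is seen; returns whether it was applied. */
        synchronized boolean merge(UUID deltaId, List<String> delta) {
            if (!appliedDeltaIds.add(deltaId)) {
                return false; // same command resent after an OutdatedTopologyException
            }
            values.addAll(delta);
            return true;
        }

        synchronized List<String> snapshot() {
            return new ArrayList<>(values);
        }
    }

    /** Applies one delta, then replays it the way a topology-change retry would. */
    public static List<String> mergeWithRetry(List<String> delta) {
        IdempotentIntermediateValue holder = new IdempotentIntermediateValue();
        UUID deltaId = UUID.randomUUID(); // assigned once by the originator, reused on retry
        holder.merge(deltaId, delta);     // first attempt, topology T
        holder.merge(deltaId, delta);     // retry with topology T+1 is ignored
        return holder.snapshot();
    }
}
```

The set of applied ids grows without bound here; a real implementation would presumably need to discard the ids once the M/R task finishes, and to carry them along with the value during state transfer.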
[JBoss JIRA] (ISPN-4173) SuspectExceptions thrown during MapReduceTask while removing the intermediate cache
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-4173?page=com.atlassian.jira.plugin.... ]
Dan Berindei updated ISPN-4173:
-------------------------------
Assignee: Vladimir Blagojevic (was: Dan Berindei)
> SuspectExceptions thrown during MapReduceTask while removing the intermediate cache
> -----------------------------------------------------------------------------------
>
> Key: ISPN-4173
> URL: https://issues.jboss.org/browse/ISPN-4173
> Project: Infinispan
> Issue Type: Bug
> Security Level: Public (Everyone can see)
> Components: Distributed Execution and Map/Reduce
> Affects Versions: 6.0.1.Final, 7.0.0.Alpha2
> Reporter: Alan Field
> Assignee: Vladimir Blagojevic
>
> While running the Map/Reduce benchmark with multiple value sizes, I have been seeing this error in the logs from Infinispan 6 and 7:
> {noformat}
> 16:13:51,325 ERROR [org.radargun.stages.MapReduceStage] (pool-1-thread-1) executeMapReduceTask() returned an exception
> org.infinispan.commons.CacheException: Error removing cache
> at org.infinispan.manager.DefaultCacheManager.removeCache(DefaultCacheManager.java:471)
> at org.infinispan.distexec.mapreduce.MapReduceTask.execute(MapReduceTask.java:353)
> at org.infinispan.distexec.mapreduce.MapReduceTask.execute(MapReduceTask.java:634)
> at org.radargun.cachewrappers.InfinispanMapReduce.executeMapReduceTask(InfinispanMapReduce.java:91)
> at org.radargun.cachewrappers.Infinispan51Wrapper.executeMapReduceTask(Infinispan51Wrapper.java:198)
> at org.radargun.stages.MapReduceStage.executeMapReduceTask(MapReduceStage.java:212)
> at org.radargun.stages.MapReduceStage.executeOnSlave(MapReduceStage.java:164)
> at org.radargun.Slave$2.run(Slave.java:103)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:744)
> Caused by: org.infinispan.remoting.RemoteException: ISPN000217: Received exception from edg-perf06-46939, see cause for remote stack trace
> at org.infinispan.remoting.transport.AbstractTransport.checkResponse(AbstractTransport.java:41)
> at org.infinispan.remoting.transport.AbstractTransport.parseResponseAndAddToResponseList(AbstractTransport.java:66)
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.invokeRemotely(JGroupsTransport.java:547)
> at org.infinispan.manager.DefaultCacheManager.removeCache(DefaultCacheManager.java:463)
> ... 12 more
> Caused by: org.infinispan.commons.CacheException: Problems invoking command.
> at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.handle(CommandAwareRpcDispatcher.java:221)
> at org.jgroups.blocks.RequestCorrelator.handleRequest(RequestCorrelator.java:460)
> at org.jgroups.blocks.RequestCorrelator.receiveMessage(RequestCorrelator.java:377)
> at org.jgroups.blocks.RequestCorrelator.receive(RequestCorrelator.java:247)
> at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:665)
> at org.jgroups.JChannel.up(JChannel.java:708)
> at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:1015)
> at org.jgroups.protocols.RSVP.up(RSVP.java:187)
> at org.jgroups.protocols.FRAG2.up(FRAG2.java:165)
> at org.jgroups.protocols.FlowControl.up(FlowControl.java:370)
> at org.jgroups.protocols.FlowControl.up(FlowControl.java:381)
> at org.jgroups.protocols.pbcast.GMS.up(GMS.java:1010)
> at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:234)
> at org.jgroups.protocols.UNICAST3.up(UNICAST3.java:390)
> at org.jgroups.protocols.pbcast.NAKACK2.handleMessage(NAKACK2.java:774)
> at org.jgroups.protocols.pbcast.NAKACK2.up(NAKACK2.java:570)
> at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:147)
> at org.jgroups.protocols.FD_ALL.up(FD_ALL.java:184)
> at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:301)
> at org.jgroups.protocols.MERGE2.up(MERGE2.java:209)
> at org.jgroups.protocols.Discovery.up(Discovery.java:379)
> at org.jgroups.protocols.TP.passMessageUp(TP.java:1370)
> at org.jgroups.protocols.TP$MyHandler.run(TP.java:1556)
> ... 3 more
> Caused by: java.lang.NullPointerException
> at org.infinispan.commands.RemoteCommandsFactory.fromStream(RemoteCommandsFactory.java:195)
> at org.infinispan.marshall.exts.ReplicableCommandExternalizer.fromStream(ReplicableCommandExternalizer.java:106)
> at org.infinispan.marshall.exts.CacheRpcCommandExternalizer.readObject(CacheRpcCommandExternalizer.java:147)
> at org.infinispan.marshall.exts.CacheRpcCommandExternalizer.readObject(CacheRpcCommandExternalizer.java:59)
> at org.infinispan.marshall.core.ExternalizerTable$ExternalizerAdapter.readObject(ExternalizerTable.java:389)
> at org.infinispan.marshall.core.ExternalizerTable.readObject(ExternalizerTable.java:205)
> at org.infinispan.marshall.core.JBossMarshaller$ExternalizerTableProxy.readObject(JBossMarshaller.java:152)
> at org.jboss.marshalling.river.RiverUnmarshaller.doReadObject(RiverUnmarshaller.java:355)
> at org.jboss.marshalling.river.RiverUnmarshaller.doReadObject(RiverUnmarshaller.java:213)
> at org.jboss.marshalling.AbstractObjectInput.readObject(AbstractObjectInput.java:37)
> at org.infinispan.commons.marshall.jboss.AbstractJBossMarshaller.objectFromObjectStream(AbstractJBossMarshaller.java:136)
> at org.infinispan.marshall.core.VersionAwareMarshaller.objectFromByteBuffer(VersionAwareMarshaller.java:101)
> at org.infinispan.commons.marshall.AbstractDelegatingMarshaller.objectFromByteBuffer(AbstractDelegatingMarshaller.java:80)
> at org.infinispan.remoting.transport.jgroups.MarshallerAdapter.objectFromBuffer(MarshallerAdapter.java:28)
> at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.handle(CommandAwareRpcDispatcher.java:206)
> ... 25 more
> {noformat}
> These exceptions happen during the execution of the MapReduceTask. I have also seen SuspectExceptions in these logs. This could be related to shutting down the intermediate cache (https://issues.jboss.org/browse/ISPN-4144), so I will check again once that is addressed. If so, the fix for ISPN-4144 will need to be applied to both versions.
[JBoss JIRA] (ISPN-4570) Remove UFC from JGroups TCP configurations
by Pedro Ruivo (JIRA)
[ https://issues.jboss.org/browse/ISPN-4570?page=com.atlassian.jira.plugin.... ]
Pedro Ruivo updated ISPN-4570:
------------------------------
Status: Resolved (was: Pull Request Sent)
Resolution: Done
> Remove UFC from JGroups TCP configurations
> ------------------------------------------
>
> Key: ISPN-4570
> URL: https://issues.jboss.org/browse/ISPN-4570
> Project: Infinispan
> Issue Type: Task
> Security Level: Public (Everyone can see)
> Components: Configuration, Core, Server
> Affects Versions: 7.0.0.Alpha5
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Fix For: 7.0.0.Beta1
>
>
> The UFC protocol is not needed when the transport is TCP, and JGroups actually logs a message saying so on startup:
> {noformat}
> 12:22:37,426 INFO (testng-ReplicationExceptionTest:) [UFC] UFC is not needed (and can be removed) as we're running on a TCP transport
> {noformat}
[JBoss JIRA] (ISPN-4573) Unable to serialize List<LuceneWork>
by Sanne Grinovero (JIRA)
[ https://issues.jboss.org/browse/ISPN-4573?page=com.atlassian.jira.plugin.... ]
Sanne Grinovero updated ISPN-4573:
----------------------------------
Priority: Critical (was: Major)
> Unable to serialize List<LuceneWork>
> ------------------------------------
>
> Key: ISPN-4573
> URL: https://issues.jboss.org/browse/ISPN-4573
> Project: Infinispan
> Issue Type: Bug
> Security Level: Public (Everyone can see)
> Components: Embedded Querying
> Affects Versions: 7.0.0.Alpha5
> Reporter: Radim Vansa
> Assignee: Sanne Grinovero
> Priority: Critical
>
> When I try to fill a distributed cache with indexing enabled from multiple threads concurrently, I get:
> {code}
> org.hibernate.search.exception.SearchException: HSEARCH000083: Unable to serialize List<LuceneWork>
> at org.hibernate.search.indexes.serialization.impl.LuceneWorkSerializerImpl.toSerializedModel(LuceneWorkSerializerImpl.java:92)
> ...
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 6
> at java.util.ArrayList.add(ArrayList.java:412)
> at org.hibernate.search.indexes.serialization.avro.impl.AvroSerializer.addFieldWithStringData(AvroSerializer.java:258)
> at org.hibernate.search.indexes.serialization.impl.LuceneWorkSerializerImpl.buildDocument(LuceneWorkSerializerImpl.java:174)
> at org.hibernate.search.indexes.serialization.impl.LuceneWorkSerializerImpl.toSerializedModel(LuceneWorkSerializerImpl.java:80)
> {code}
> The nested exception differs between runs, but the root cause is the same: {{AvroSerializer}} is not thread-safe. {{AvroSerializerProvider}} always returns the same instance, and {{InfinispanIndexManager}} (or rather {{DirectoryBasedIndexManager}}) likewise always returns the same instance of {{LuceneWorkSerializerImpl}} wrapping the {{AvroSerializerProvider}}.
> This results in failed writes, corrupted data and other fun.
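A common remedy for a shared, non-thread-safe serializer is to stop sharing it. The Java sketch below is hypothetical: {{StatefulSerializer}} only mimics the mutable {{ArrayList}} state visible in the stack trace, and {{PerThreadSerializerProvider}} hands each thread its own instance via {{ThreadLocal}}. Creating a fresh serializer per call would work equally well if construction is cheap; this is not necessarily how Hibernate Search or Infinispan fixed it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch: give each thread its own non-thread-safe serializer. */
public class PerThreadSerializerSketch {

    /** Mimics a serializer with mutable per-document state (NOT Hibernate Search code). */
    static final class StatefulSerializer {
        private final List<String> fields = new ArrayList<>();

        String serialize(List<String> document) {
            fields.clear();
            for (String field : document) {
                fields.add(field); // races with other threads if this instance is shared
            }
            return String.join(",", fields);
        }
    }

    /** Hands out one serializer per thread instead of a single shared instance. */
    static final class PerThreadSerializerProvider {
        private final ThreadLocal<StatefulSerializer> perThread =
                ThreadLocal.withInitial(StatefulSerializer::new);

        StatefulSerializer get() {
            return perThread.get();
        }
    }

    /** Serializes one document from several threads and counts wrong outputs. */
    public static int countCorruptResults(int threads, int iterations) {
        PerThreadSerializerProvider provider = new PerThreadSerializerProvider();
        List<String> doc = Arrays.asList("id", "name", "title", "body", "author", "date");
        String expected = String.join(",", doc);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                futures.add(pool.submit(() -> {
                    int bad = 0;
                    for (int i = 0; i < iterations; i++) {
                        if (!expected.equals(provider.get().serialize(doc))) {
                            bad++;
                        }
                    }
                    return bad;
                }));
            }
            int total = 0;
            for (Future<Integer> f : futures) {
                total += f.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

With per-thread instances, {{countCorruptResults(8, 1000)}} returns 0; sharing a single {{StatefulSerializer}} across the same threads can instead produce mangled output or an {{ArrayIndexOutOfBoundsException}} like the one above.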
[JBoss JIRA] (ISPN-4573) Unable to serialize List<LuceneWork>
by Sanne Grinovero (JIRA)
[ https://issues.jboss.org/browse/ISPN-4573?page=com.atlassian.jira.plugin.... ]
Sanne Grinovero updated ISPN-4573:
----------------------------------
Fix Version/s: 7.0.0.Beta1
> Unable to serialize List<LuceneWork>
> ------------------------------------
>
> Key: ISPN-4573
> URL: https://issues.jboss.org/browse/ISPN-4573
> Project: Infinispan
> Issue Type: Bug
> Security Level: Public (Everyone can see)
> Components: Embedded Querying
> Affects Versions: 7.0.0.Alpha5
> Reporter: Radim Vansa
> Assignee: Sanne Grinovero
> Priority: Critical
> Fix For: 7.0.0.Beta1