[JBoss JIRA] (ISPN-3811) Initial ST leaves node as member without data after MERGE
by RH Bugzilla Integration (JIRA)
[ https://issues.jboss.org/browse/ISPN-3811?page=com.atlassian.jira.plugin.... ]
RH Bugzilla Integration commented on ISPN-3811:
-----------------------------------------------
Tristan Tarrant <ttarrant(a)redhat.com> changed the Status of [bug 1040046|https://bugzilla.redhat.com/show_bug.cgi?id=1040046] from NEW to ON_QA
> Initial ST leaves node as member without data after MERGE
> ---------------------------------------------------------
>
> Key: ISPN-3811
> URL: https://issues.jboss.org/browse/ISPN-3811
> Project: Infinispan
> Issue Type: Bug
> Components: State Transfer
> Affects Versions: 6.0.0.Final
> Reporter: Radim Vansa
> Assignee: Dan Berindei
> Priority: Blocker
> Labels: testsuite_stability
> Fix For: 7.0.0.Final
>
>
> Under certain circumstances, JGroups can issue a MERGE view when a node is joining the cache. The new node joins the cluster, and all nodes have the same cache topology (not containing the joiner yet).
> During the merge, the CHs are joined (through CHFactory.union) and, as all nodes report the same topology/hash, the resulting hash is identical. However, the joiner is added to the members list and can therefore finish the initial state transfer, although no data has been assigned to it.
> Later, the coordinator starts a rebalance and the node begins to receive some data, but the thread which started the cache manager (and should wait until the cluster becomes properly replicated through initial ST) is already released.
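
The condition described above can be illustrated with a small model. This is only a sketch under stated assumptions: `Topology`, `mergeWithJoiner` and `membershipOnlyCheck` are hypothetical names invented for the illustration and are not Infinispan classes or methods. It shows a joiner that passes a membership-only check while owning no segments.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, simplified model of a cache topology; these are NOT Infinispan classes.
final class MergeJoinSketch {

    static final class Topology {
        final List<String> members;
        final Map<Integer, List<String>> segmentOwners; // segment id -> owner list

        Topology(List<String> members, Map<Integer, List<String>> segmentOwners) {
            this.members = members;
            this.segmentOwners = segmentOwners;
        }

        // Counts the segments for which the node appears in the owner list.
        int segmentsOwnedBy(String node) {
            int count = 0;
            for (List<String> owners : segmentOwners.values()) {
                if (owners.contains(node)) {
                    count++;
                }
            }
            return count;
        }
    }

    // Stand-in for the merge step: all partitions report the same hash, so the
    // segment ownership is copied unchanged, but the joiner is added to the members.
    static Topology mergeWithJoiner(Topology current, String joiner) {
        List<String> members = new ArrayList<>(current.members);
        if (!members.contains(joiner)) {
            members.add(joiner);
        }
        return new Topology(members, new TreeMap<>(current.segmentOwners));
    }

    // A check based on membership alone, as the report describes.
    static boolean membershipOnlyCheck(Topology topology, String node) {
        return topology.members.contains(node);
    }

    // Two-node cluster (A, B) with two segments; C joins during the merge.
    static Topology example() {
        Map<Integer, List<String>> owners = new TreeMap<>();
        owners.put(0, Arrays.asList("A", "B"));
        owners.put(1, Arrays.asList("B", "A"));
        Topology before = new Topology(Arrays.asList("A", "B"), owners);
        return mergeWithJoiner(before, "C");
    }

    public static void main(String[] args) {
        Topology merged = example();
        System.out.println("C passes membership check: " + membershipOnlyCheck(merged, "C"));
        System.out.println("Segments owned by C: " + merged.segmentsOwnedBy("C"));
    }
}
```

In this model the joiner is a member but owns zero segments, which is the state in which the starting thread would be released too early.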
--
This message was sent by Atlassian JIRA
(v6.3.8#6338)
[JBoss JIRA] (ISPN-4851) Make SyncConsistentHashFactory the default CH factory
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-4851?page=com.atlassian.jira.plugin.... ]
Dan Berindei updated ISPN-4851:
-------------------------------
Status: Open (was: New)
> Make SyncConsistentHashFactory the default CH factory
> -----------------------------------------------------
>
> Key: ISPN-4851
> URL: https://issues.jboss.org/browse/ISPN-4851
> Project: Infinispan
> Issue Type: Feature Request
> Components: Configuration, Core
> Affects Versions: 7.0.0.CR1
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Fix For: 7.1.0.Alpha1
>
>
> With ISPN-4682 fixed, SyncConsistentHashFactory should be good enough to be the default. It still allows for more variation in the number of owned segments per node (+/-10% for owned segments and +/-20% for primary-owned segments), but that should be acceptable for most purposes.
> The major advantage of SCHF is that it depends only on the cache members and not on the order they joined. Users expect a key to map to the same node in all caches (as long as the caches have the same members).
> One downside of SCHF, especially for testing, is that the segment ownership differs between test runs (being based on the random address assigned to each node). However, most tests that depend on key ownership should use {{ControlledConsistentHashFactory}} anyway.
> We also need to verify that the number of segments moved by SCHF is comparable to the number of segments moved by DefaultConsistentHashFactory (ISPN-3729).
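
The difference between a hash that depends only on the members and one that depends on join order can be sketched as follows. Note the assumptions: this uses rendezvous (highest-score) hashing as a stand-in and is not the algorithm used by SyncConsistentHashFactory; `score`, `membersOnly` and `joinOrder` are hypothetical names, and the sketch assigns only a primary owner per segment.

```java
import java.util.Arrays;
import java.util.List;

// Illustration only: rendezvous hashing is swapped in for the real SCHF algorithm.
final class OrderIndependentHashSketch {

    // Deterministic mix of the node name and the segment id.
    static int score(String node, int segment) {
        int h = node.hashCode() * 31 + segment;
        h ^= (h >>> 16);
        h *= 0x85ebca6b;
        h ^= (h >>> 13);
        return h;
    }

    // Depends only on the set of members: each segment goes to the member with
    // the highest score, with ties broken by the smaller name.
    static String[] membersOnly(List<String> members, int numSegments) {
        String[] owners = new String[numSegments];
        for (int s = 0; s < numSegments; s++) {
            String best = null;
            for (String m : members) {
                if (best == null) {
                    best = m;
                    continue;
                }
                int cmp = Integer.compare(score(m, s), score(best, s));
                if (cmp > 0 || (cmp == 0 && m.compareTo(best) < 0)) {
                    best = m;
                }
            }
            owners[s] = best;
        }
        return owners;
    }

    // Depends on join order: segments are handed out round-robin over the list.
    static String[] joinOrder(List<String> members, int numSegments) {
        String[] owners = new String[numSegments];
        for (int s = 0; s < numSegments; s++) {
            owners[s] = members.get(s % members.size());
        }
        return owners;
    }

    public static void main(String[] args) {
        List<String> first = Arrays.asList("A", "B", "C");
        List<String> second = Arrays.asList("C", "A", "B"); // same members, different join order
        System.out.println("members-only assignment identical: "
                + Arrays.equals(membersOnly(first, 8), membersOnly(second, 8)));
        System.out.println("join-order assignment identical: "
                + Arrays.equals(joinOrder(first, 8), joinOrder(second, 8)));
    }
}
```

With the members-only assignment, two caches with the same members map a segment to the same node regardless of the order in which the nodes joined.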
--
[JBoss JIRA] (ISPN-4988) TopologyAwareDistAsyncFuncTest fails with SIGSEGV exception with Azul JDK
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-4988?page=com.atlassian.jira.plugin.... ]
Dan Berindei updated ISPN-4988:
-------------------------------
Status: Open (was: New)
> TopologyAwareDistAsyncFuncTest fails with SIGSEGV exception with Azul JDK
> -------------------------------------------------------------------------
>
> Key: ISPN-4988
> URL: https://issues.jboss.org/browse/ISPN-4988
> Project: Infinispan
> Issue Type: Bug
> Components: Test Suite - Core
> Affects Versions: 7.0.1.Final
> Reporter: Vitalii Chepeliuk
> Assignee: Dan Berindei
> Labels: testsuite_stability
>
> {noformat}
> Test suite progress: tests succeeded: 2602, failed: 0, skipped: 0.
> 2014-11-16 04:38:48,750 WARN [Configurator] (testng-TopologyAwareDistAsyncFuncTest) JGRP000014: TP.loopback has been deprecated: enabled by default
> 2014-11-16 04:38:48,750 WARN [Configurator] (testng-TopologyAwareDistAsyncFuncTest) JGRP000014: TP.physical_addr_max_fetch_attempts has been deprecated: will be ignored
> 2014-11-16 04:38:48,829 WARN [TCP] (testng-TopologyAwareDistAsyncFuncTest) JGRP000046: bundler_type=old has been removed; using sender-sends-with-timer
> 2014-11-16 04:38:48,945 WARN [Configurator] (testng-TopologyAwareDistAsyncFuncTest) JGRP000014: TP.loopback has been deprecated: enabled by default
> 2014-11-16 04:38:48,945 WARN [Configurator] (testng-TopologyAwareDistAsyncFuncTest) JGRP000014: TP.physical_addr_max_fetch_attempts has been deprecated: will be ignored
> 2014-11-16 04:38:49,023 WARN [TCP] (testng-TopologyAwareDistAsyncFuncTest) JGRP000046: bundler_type=old has been removed; using sender-sends-with-timer
> 2014-11-16 04:38:49,249 WARN [Configurator] (testng-TopologyAwareDistAsyncFuncTest) JGRP000014: TP.loopback has been deprecated: enabled by default
> 2014-11-16 04:38:49,249 WARN [Configurator] (testng-TopologyAwareDistAsyncFuncTest) JGRP000014: TP.physical_addr_max_fetch_attempts has been deprecated: will be ignored
> 2014-11-16 04:38:49,326 WARN [TCP] (testng-TopologyAwareDistAsyncFuncTest) JGRP000046: bundler_type=old has been removed; using sender-sends-with-timer
> Signum: [11] - Exiting due to unhandled SIGSEGV exception.
> 0: rip=0x0000000020d9eb48 @rip=[0xffffffffffffffff] (hotspot_os_backtrace_callback+40) [gcc frame, calls gcc]
> 1: rip=0x00007f526459da3f @rip=[0x0000440000bffb38] (os_backtrace+31) [gcc frame, calls gcc]
> 2: rip=0x0000000020d99ef5 @rip=[0x0000440000bffba8] (jvm_unexpected_exception_handler+165) [gcc frame, calls gcc]
> 3: rip=0x00007f526459cf52 @rip=[0x0000440000bffc58] (jvm_unexpected_exception_handler_wrapper+82) [gcc frame, calls gcc]
> 4: rip=0x0000000020838c30 @rip=[0x0000440000bffc78] (GPGC_GCManagerMark::process_mutator_stack(HeapRefBuffer*)+112) [gcc frame, calls gcc]
> 5: rip=0x0000000020876355 @rip=[0x0000440000bffcd8] (void GPGC_MarkAlgorithm::drain_stacks<GPGC_GCManagerOldStrong>(GPGC_GCManagerOldStrong*)+149) [gcc frame, calls gcc]
> 6: rip=0x000000002087748c @rip=[0x0000440000bffda8] (void GPGC_MarkAlgorithm::drain_and_steal_stacks<GPGC_GCManagerOldStrong>(GPGC_GCManagerOldStrong*)+28) [gcc frame, calls gcc]
> 7: rip=0x00000000209ba1ed @rip=[0x0000440000bffe68] (PGCTaskThread::run()+589) [gcc frame, calls gcc]
> 8: rip=0x00007f526459fc49 @rip=[0x0000440000bfff88] (alternate_stack_create+153) [gcc frame, calls gcc]
> #
> # An unexpected error has been detected by Java Runtime Environment:
> #
> # Segmentation fault (0xb) at pc=0x20838c30, pid=25165, tid=25171
> #
> # Java VM: Zing 64-Bit Tiered VM (1.7.0-zing_5.10.1.0-b9-product-azlinuxM-X86_64, mixed mode)
> # Problematic frame:
> # C [libjvm.so+0x438c30] GPGC_GCManagerMark::process_mutator_stack(HeapRefBuffer*)+0x70
> #
> # An error report file with more information is saved as:
> # /qa/hudson_workspace/workspace/jdg-63-ispn-testsuite-rhel-azul/c6098cff/infinispan/core/hs_err_pid25165.log
> 2014-11-16 04:38:49,693 WARN [Configurator] (testng-TopologyAwareDistAsyncFuncTest) JGRP000014: TP.loopback has been deprecated: enabled by default
> 2014-11-16 04:38:49,693 WARN [Configurator] (testng-TopologyAwareDistAsyncFuncTest) JGRP000014: TP.physical_addr_max_fetch_attempts has been deprecated: will be ignored
> #
> # If you would like to submit a bug report, please visit:
> # http://www.azulsystems.com/support/
> #
> {noformat}
> More info is available from the Jenkins jobs:
> https://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/jdg-63-ispn-testsuit...
> https://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/jdg-63-ispn-testsuit...
--
[JBoss JIRA] (ISPN-3421) Transaction is sometimes not applied on all owners if originator dies during commit
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-3421?page=com.atlassian.jira.plugin.... ]
Dan Berindei reassigned ISPN-3421:
----------------------------------
Assignee: Dan Berindei
> Transaction is sometimes not applied on all owners if originator dies during commit
> -----------------------------------------------------------------------------------
>
> Key: ISPN-3421
> URL: https://issues.jboss.org/browse/ISPN-3421
> Project: Infinispan
> Issue Type: Bug
> Components: State Transfer
> Affects Versions: 5.2.7.Final
> Reporter: Erik Salter
> Assignee: Dan Berindei
> Priority: Critical
>
> There's a hole in the state transfer mechanism that can occur when a node leaves the cluster while it was creating entries and was only able to replicate the data to some of the nodes.
> The problem occurs when the segment ownership of a node doesn't change after the rebalance. Since a node does not request state for keys of which it is already an owner, the cache could be left in a state where a key is resident on fewer than numOwners nodes. In addition, the nodes that hold the key could be any subset of the primary and backup owners.
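
One reading of the scenario above can be sketched in a few lines. The names (`segmentsToRequest`, `copies`, `copiesAfterRebalance`) and the three-node layout are assumptions made for the illustration, not Infinispan code: originator A writes key `k` to owner B, dies before reaching owner C, and the rebalance leaves C's ownership unchanged.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of the scenario; not Infinispan's state transfer code.
final class PartialCommitSketch {

    static final int NUM_OWNERS = 2;

    // A node only requests the segments it newly owns after the rebalance.
    static Set<Integer> segmentsToRequest(Set<Integer> ownedBefore, Set<Integer> ownedAfter) {
        Set<Integer> added = new TreeSet<>(ownedAfter);
        added.removeAll(ownedBefore);
        return added;
    }

    // Number of owners that actually hold the key.
    static int copies(String key, List<String> owners, Map<String, Set<String>> keysByNode) {
        int count = 0;
        for (String owner : owners) {
            Set<String> keys = keysByNode.get(owner);
            if (keys != null && keys.contains(key)) {
                count++;
            }
        }
        return count;
    }

    // Key "k" lives in segment 0, owned by B and C both before and after the rebalance.
    static int copiesAfterRebalance() {
        List<String> owners = Arrays.asList("B", "C");
        Map<String, Set<String>> keysByNode = new HashMap<>();
        keysByNode.put("B", new HashSet<>(Arrays.asList("k"))); // commit reached B
        keysByNode.put("C", new HashSet<>());                   // originator died before reaching C

        Set<Integer> ownedByCBefore = new TreeSet<>(Arrays.asList(0));
        Set<Integer> ownedByCAfter = new TreeSet<>(Arrays.asList(0));
        // C would only receive "k" if segment 0 were newly assigned to it.
        if (segmentsToRequest(ownedByCBefore, ownedByCAfter).contains(0)) {
            keysByNode.get("C").add("k");
        }
        return copies("k", owners, keysByNode);
    }

    public static void main(String[] args) {
        System.out.println("copies of k: " + copiesAfterRebalance() + ", numOwners: " + NUM_OWNERS);
    }
}
```

Because C requests nothing for a segment it already owns, the key stays on fewer than `NUM_OWNERS` nodes in this model.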
--
[JBoss JIRA] (ISPN-4908) Clustered cache with FileStore (shared=false) is inconsistent after restarting one node if entries are deleted during restart
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-4908?page=com.atlassian.jira.plugin.... ]
Dan Berindei updated ISPN-4908:
-------------------------------
Status: Open (was: New)
> Clustered cache with FileStore (shared=false) is inconsistent after restarting one node if entries are deleted during restart
> -----------------------------------------------------------------------------------------------------------------------------
>
> Key: ISPN-4908
> URL: https://issues.jboss.org/browse/ISPN-4908
> Project: Infinispan
> Issue Type: Bug
> Environment: Clustered REPL cache, preloaded, no eviction/expiration
> Reporter: Wolf-Dieter Fink
> Assignee: William Burns
>
> If a cache instance with a cache store is down and the cache is changed before the instance comes back and rejoins the cluster, the cache can become inconsistent.
> If entries are deleted during the downtime:
> - the FileStore with stale objects is loaded first if preload=true
> - the local entries are updated with new and changed objects from the cluster
> - entries removed from the cluster are not seen and therefore not deleted
> After the sync completes, this instance (and only this instance) will have stale objects.
> From a consistency and performance perspective, the FileStore should be pruned on cluster join by default in this case.
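
The steps above can be modelled with plain maps. This is a sketch under assumptions: `rejoin` and its `purgeStoreOnJoin` flag are hypothetical names for the illustration and do not correspond to an Infinispan API.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of a node rejoining with a non-shared local store.
final class StaleStoreSketch {

    // Returns the in-memory contents of the rejoining node after the sync.
    static Map<String, String> rejoin(Map<String, String> localStore,
                                      Map<String, String> clusterState,
                                      boolean purgeStoreOnJoin) {
        Map<String, String> memory = new TreeMap<>();
        if (!purgeStoreOnJoin) {
            memory.putAll(localStore); // preload: stale entries come back first
        }
        // State from the cluster carries new and changed entries only;
        // removals that happened during the downtime are never replayed.
        memory.putAll(clusterState);
        return memory;
    }

    public static void main(String[] args) {
        Map<String, String> localStore = new TreeMap<>();
        localStore.put("a", "1");
        localStore.put("b", "1"); // removed from the cluster while the node was down

        Map<String, String> clusterState = new TreeMap<>();
        clusterState.put("a", "2"); // changed during the downtime
        clusterState.put("c", "1"); // added during the downtime

        System.out.println("without purge: " + rejoin(localStore, clusterState, false));
        System.out.println("with purge:    " + rejoin(localStore, clusterState, true));
    }
}
```

Without the purge, the removed entry `b` survives on the rejoining node only; with the purge, the node ends up with exactly the cluster's state.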
--
[JBoss JIRA] (ISPN-3811) Initial ST leaves node as member without data after MERGE
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-3811?page=com.atlassian.jira.plugin.... ]
Dan Berindei resolved ISPN-3811.
--------------------------------
Assignee: Dan Berindei
Fix Version/s: 7.0.0.Final
Resolution: Done
I believe I have actually fixed this with the partition handling work: we no longer use {{CHF.union}} after a merge, and we have special precautions for ReplicatedCHF.
--
[JBoss JIRA] (ISPN-4016) Write operations in invalidation mode can fail with OutdatedTopologyException
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-4016?page=com.atlassian.jira.plugin.... ]
Dan Berindei updated ISPN-4016:
-------------------------------
Priority: Critical (was: Major)
> Write operations in invalidation mode can fail with OutdatedTopologyException
> -----------------------------------------------------------------------------
>
> Key: ISPN-4016
> URL: https://issues.jboss.org/browse/ISPN-4016
> Project: Infinispan
> Issue Type: Bug
> Components: State Transfer
> Affects Versions: 6.0.1.Final, 7.0.0.Alpha1, 7.0.0.Alpha2, 7.0.0.Alpha3
> Reporter: Dan Berindei
> Priority: Critical
> Labels: 630
> Fix For: 7.1.0.Alpha1
>
> Attachments: NonTxStateTransferInvalidationTest.log.zip
>
>
> I introduced this problem with the fix for ISPN-3873.
> Invalidation commands now have a topology id, so they can wait for the initial topology to be installed on a joiner. However, that means EntryWrappingInterceptor also checks the topology id, and if it has changed it will throw an OutdatedTopologyException. The exception is propagated all the way to the caller.
> OutdatedTopologyExceptions are not useful in invalidation mode, since the invalidation is always sent to the entire cluster. So EntryWrappingInterceptor should ignore the topology id in invalidation mode.
> {noformat}
> Tests run: 4052, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 313.505 sec <<< FAILURE!
> testInvalidationDuringStateTransfer(org.infinispan.statetransfer.NonTxStateTransferInvalidationTest) Time elapsed: 0.004 sec <<< FAILURE!
> java.util.concurrent.ExecutionException: org.infinispan.remoting.RemoteException: ISPN000217: Received exception from NonTxStateTransferInvalidationTest-NodeB-5833, see cause for remote stack trace
> at java.util.concurrent.FutureTask.report(FutureTask.java:122)
> at java.util.concurrent.FutureTask.get(FutureTask.java:202)
> at org.infinispan.commons.util.concurrent.NotifyingFutureImpl.get(NotifyingFutureImpl.java:84)
> at org.infinispan.statetransfer.NonTxStateTransferInvalidationTest.testInvalidationDuringStateTransfer(NonTxStateTransferInvalidationTest.java:115)
> Caused by: org.infinispan.remoting.RemoteException: ISPN000217: Received exception from NonTxStateTransferInvalidationTest-NodeB-5833, see cause for remote stack trace
> at org.infinispan.remoting.transport.AbstractTransport.checkResponse(AbstractTransport.java:41)
> at org.infinispan.remoting.transport.AbstractTransport.parseResponseAndAddToResponseList(AbstractTransport.java:66)
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.invokeRemotely(JGroupsTransport.java:547)
> at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:280)
> at org.infinispan.interceptors.InvalidationInterceptor.invalidateAcrossCluster(InvalidationInterceptor.java:227)
> at org.infinispan.interceptors.InvalidationInterceptor.handleInvalidate(InvalidationInterceptor.java:143)
> at org.infinispan.interceptors.InvalidationInterceptor.visitPutKeyValueCommand(InvalidationInterceptor.java:80)
> at org.infinispan.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:70)
> at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:98)
> at org.infinispan.interceptors.EntryWrappingInterceptor.invokeNextAndApplyChanges(EntryWrappingInterceptor.java:326)
> at org.infinispan.interceptors.EntryWrappingInterceptor.setSkipRemoteGetsAndInvokeNextForDataCommand(EntryWrappingInterceptor.java:407)
> at org.infinispan.interceptors.EntryWrappingInterceptor.visitPutKeyValueCommand(EntryWrappingInterceptor.java:164)
> at org.infinispan.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:70)
> at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:98)
> at org.infinispan.interceptors.locking.AbstractLockingInterceptor.visitPutKeyValueCommand(AbstractLockingInterceptor.java:68)
> at org.infinispan.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:70)
> at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:98)
> at org.infinispan.interceptors.base.CommandInterceptor.handleDefault(CommandInterceptor.java:112)
> at org.infinispan.commands.AbstractVisitor.visitPutKeyValueCommand(AbstractVisitor.java:32)
> at org.infinispan.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:70)
> at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:98)
> at org.infinispan.interceptors.CacheMgmtInterceptor.updateStoreStatistics(CacheMgmtInterceptor.java:148)
> at org.infinispan.interceptors.CacheMgmtInterceptor.visitPutKeyValueCommand(CacheMgmtInterceptor.java:134)
> at org.infinispan.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:70)
> at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:98)
> at org.infinispan.interceptors.InvocationContextInterceptor.handleAll(InvocationContextInterceptor.java:110)
> at org.infinispan.interceptors.InvocationContextInterceptor.handleDefault(InvocationContextInterceptor.java:73)
> at org.infinispan.commands.AbstractVisitor.visitPutKeyValueCommand(AbstractVisitor.java:32)
> at org.infinispan.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:70)
> at org.infinispan.interceptors.InterceptorChain.invoke(InterceptorChain.java:333)
> at org.infinispan.CacheImpl.executeCommandAndCommitIfNeeded(CacheImpl.java:1403)
> at org.infinispan.CacheImpl.putInternal(CacheImpl.java:881)
> at org.infinispan.CacheImpl.access$100(CacheImpl.java:106)
> at org.infinispan.CacheImpl$2.call(CacheImpl.java:1015)
> ... 4 more
> Caused by: org.infinispan.statetransfer.OutdatedTopologyException: Cache topology changed while the command was executing: expected 2, got 3
> at org.infinispan.interceptors.EntryWrappingInterceptor.invokeNextAndApplyChanges(EntryWrappingInterceptor.java:347)
> at org.infinispan.interceptors.EntryWrappingInterceptor.setSkipRemoteGetsAndInvokeNextForDataCommand(EntryWrappingInterceptor.java:407)
> at org.infinispan.interceptors.EntryWrappingInterceptor.visitInvalidateCommand(EntryWrappingInterceptor.java:139)
> at org.infinispan.commands.write.InvalidateCommand.acceptVisitor(InvalidateCommand.java:118)
> at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:98)
> at org.infinispan.interceptors.locking.AbstractLockingInterceptor.visitInvalidateCommand(AbstractLockingInterceptor.java:87)
> at org.infinispan.commands.write.InvalidateCommand.acceptVisitor(InvalidateCommand.java:118)
> at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:98)
> at org.infinispan.interceptors.base.CommandInterceptor.handleDefault(CommandInterceptor.java:112)
> at org.infinispan.commands.AbstractVisitor.visitInvalidateCommand(AbstractVisitor.java:111)
> at org.infinispan.commands.write.InvalidateCommand.acceptVisitor(InvalidateCommand.java:118)
> at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:98)
> at org.infinispan.interceptors.base.CommandInterceptor.handleDefault(CommandInterceptor.java:112)
> at org.infinispan.commands.AbstractVisitor.visitInvalidateCommand(AbstractVisitor.java:111)
> at org.infinispan.commands.write.InvalidateCommand.acceptVisitor(InvalidateCommand.java:118)
> at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:98)
> at org.infinispan.interceptors.InvocationContextInterceptor.handleAll(InvocationContextInterceptor.java:110)
> at org.infinispan.interceptors.InvocationContextInterceptor.handleDefault(InvocationContextInterceptor.java:73)
> at org.infinispan.commands.AbstractVisitor.visitInvalidateCommand(AbstractVisitor.java:111)
> at org.infinispan.commands.write.InvalidateCommand.acceptVisitor(InvalidateCommand.java:118)
> at org.infinispan.interceptors.InterceptorChain.invoke(InterceptorChain.java:333)
> at org.infinispan.commands.remote.BaseRpcInvokingCommand.processVisitableCommand(BaseRpcInvokingCommand.java:39)
> at org.infinispan.commands.remote.SingleRpcCommand.perform(SingleRpcCommand.java:48)
> at org.infinispan.remoting.InboundInvocationHandlerImpl.handleInternal(InboundInvocationHandlerImpl.java:95)
> at org.infinispan.remoting.InboundInvocationHandlerImpl.access$000(InboundInvocationHandlerImpl.java:50)
> at org.infinispan.remoting.InboundInvocationHandlerImpl$2.run(InboundInvocationHandlerImpl.java:178)
> ... 3 more
> Results :
> Failed tests: NonTxStateTransferInvalidationTest.testInvalidationDuringStateTransfer:115 » Execution
> {noformat}
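
The behaviour proposed in the description (ignore the topology id in invalidation mode) can be sketched as below. Everything here is an assumption for illustration: `Mode`, `checkTopology` and the use of `IllegalStateException` as a stand-in for OutdatedTopologyException are hypothetical, and this is not the code of EntryWrappingInterceptor.

```java
// Hypothetical sketch of a topology check that is skipped in invalidation mode.
final class TopologyCheckSketch {

    enum Mode { DISTRIBUTED, INVALIDATION }

    // Throws if the topology changed while the command was executing, except in
    // invalidation mode, where the command is sent to the entire cluster anyway.
    static void checkTopology(Mode mode, int commandTopologyId, int currentTopologyId) {
        if (mode == Mode.INVALIDATION) {
            return; // a newer topology does not change who receives the invalidation
        }
        if (commandTopologyId != currentTopologyId) {
            // IllegalStateException stands in for OutdatedTopologyException here.
            throw new IllegalStateException(
                    "Cache topology changed while the command was executing: expected "
                            + commandTopologyId + ", got " + currentTopologyId);
        }
    }

    public static void main(String[] args) {
        checkTopology(Mode.INVALIDATION, 2, 3); // passes silently
        try {
            checkTopology(Mode.DISTRIBUTED, 2, 3);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The topology ids 2 and 3 mirror the values in the trace above; in the other modes the check is kept so that the command can be retried against the new topology.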
--