[JBoss JIRA] (ISPN-4154) Cancelled segment transfer causes future entry transfer to be ignored
by RH Bugzilla Integration (JIRA)
[ https://issues.jboss.org/browse/ISPN-4154?page=com.atlassian.jira.plugin.... ]
RH Bugzilla Integration commented on ISPN-4154:
-----------------------------------------------
Dave Stahl <dstahl(a)redhat.com> changed the Status of [bug 1104045|https://bugzilla.redhat.com/show_bug.cgi?id=1104045] from VERIFIED to CLOSED
> Cancelled segment transfer causes future entry transfer to be ignored
> ---------------------------------------------------------------------
>
> Key: ISPN-4154
> URL: https://issues.jboss.org/browse/ISPN-4154
> Project: Infinispan
> Issue Type: Bug
> Components: State Transfer
> Affects Versions: 7.0.0.Alpha1
> Reporter: Radim Vansa
> Assignee: Dan Berindei
> Priority: Critical
> Fix For: 7.0.0.Alpha5
>
>
> Distributed transactional cache.
> 1) Coordinator is gracefully leaving the cluster, sends a REBALANCE_START with topologyId 14, ST begins.
> 2) Node receives chunk from segment X, writes entry K=V to the container.
> 3) New coordinator jumps in with CH_UPDATE topology 16
> 4) Node receives CANCEL_STATE_TRANSFER and cancels transfer of segment X, invalidating K. In CommitManager, this operation is tracked and DiscardPolicy is set to DISCARD_STATE_TRANSFER for key K.
> 5) New coordinator starts rebalance with topology 17
> 6) Node starts new ST for segment X
> 7) Node receives the X: K=V, but in CommitManager it finds out that the policy is set to DISCARD_STATE_TRANSFER and ignores this update.
> Result: entry value is lost on some node.
--
This message was sent by Atlassian JIRA
(v6.3.11#6341)
11 years, 2 months
[JBoss JIRA] (ISPN-4484) Outbound transfers can be cancelled by old CANCEL_STATE_TRANSFER command
by RH Bugzilla Integration (JIRA)
[ https://issues.jboss.org/browse/ISPN-4484?page=com.atlassian.jira.plugin.... ]
RH Bugzilla Integration commented on ISPN-4484:
-----------------------------------------------
Dave Stahl <dstahl(a)redhat.com> changed the Status of [bug 1116969|https://bugzilla.redhat.com/show_bug.cgi?id=1116969] from VERIFIED to CLOSED
> Outbound transfers can be cancelled by old CANCEL_STATE_TRANSFER command
> ------------------------------------------------------------------------
>
> Key: ISPN-4484
> URL: https://issues.jboss.org/browse/ISPN-4484
> Project: Infinispan
> Issue Type: Bug
> Components: Core, State Transfer
> Affects Versions: 6.0.2.Final
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Priority: Critical
> Fix For: 7.0.0.Alpha5
>
>
> This appeared during the 32-nodes elasticity test in the Hyperion environment.
> Just as apex947 left, it started a rebalance, which apex948 dutifully cancelled as it became the new coordinator. apex949 had already requested segments from apex959, so it sent a StateRequestCommand(CANCEL_STATE_TRANSFER) asynchronously to apex959. Then apex948 started a new rebalance, and apex949 asked apex959 for the same segments. When apex959 finally received the cancel request, it didn't check the topology id and it incorrectly cancelled the outbound transfer to apex949.
> The solution would be to verify the topology id in the CANCEL_STATE_TRANSFER command before cancelling the transfer. I also think we can avoid sending the cancel command completely in this case, and only send it as we are about to stop.
--
This message was sent by Atlassian JIRA
(v6.3.11#6341)
11 years, 2 months
[JBoss JIRA] (ISPN-4480) Messages sent to leavers can clog the JGroups bundler thread
by RH Bugzilla Integration (JIRA)
[ https://issues.jboss.org/browse/ISPN-4480?page=com.atlassian.jira.plugin.... ]
RH Bugzilla Integration commented on ISPN-4480:
-----------------------------------------------
Dave Stahl <dstahl(a)redhat.com> changed the Status of [bug 1104045|https://bugzilla.redhat.com/show_bug.cgi?id=1104045] from VERIFIED to CLOSED
> Messages sent to leavers can clog the JGroups bundler thread
> ------------------------------------------------------------
>
> Key: ISPN-4480
> URL: https://issues.jboss.org/browse/ISPN-4480
> Project: Infinispan
> Issue Type: Bug
> Components: Core
> Affects Versions: 6.0.2.Final
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Fix For: 7.0.0.Beta1
>
>
> In a stress test that repeatedly kills nodes while performing read/write operations, the TransferQueueBundler thread seems to spend a lot of time waiting for physical addresses:
> {noformat}
> 06:40:10,316 WARN [org.radargun.utils.Utils] (pool-5-thread-1) Stack for thread TransferQueueBundler,default,apex953-14666:
> java.lang.Thread.sleep(Native Method)
> org.jgroups.util.Util.sleep(Util.java:1504)
> org.jgroups.util.Util.sleepRandom(Util.java:1574)
> org.jgroups.protocols.TP.sendToSingleMember(TP.java:1685)
> org.jgroups.protocols.TP.doSend(TP.java:1670)
> org.jgroups.protocols.TP$TransferQueueBundler.sendBundledMessages(TP.java:2476)
> org.jgroups.protocols.TP$TransferQueueBundler.sendMessages(TP.java:2392)
> org.jgroups.protocols.TP$TransferQueueBundler.run(TP.java:2383)
> java.lang.Thread.run(Thread.java:744)
> {noformat}
> There are 2 bugs related to this already fixed in JGroups 3.5.0.Beta2+: JGRP-1814, JGRP-1815
> There is also a special case where the physical address could be removed from the cache too soon, exacerbating the effect of JGRP-1815: JGRP-1858
> We can work around the problem by changing the JGroups configuration:
> * TP.logical_addr_cache_expiration=86400000
> ** Only expire addresses after 1 day
> * TP.physical_addr_max_fetch_attempts=1
> ** Sleep for only 20ms waiting for the physical address (default 3 - 1500ms)
> * UNICAST3_conn_close_timeout=10000
> ** Drop the pending messages to leavers sooner
--
This message was sent by Atlassian JIRA
(v6.3.11#6341)
11 years, 2 months
[JBoss JIRA] (ISPN-4235) RHQ JMX does not support multiple cache managers properly
by RH Bugzilla Integration (JIRA)
[ https://issues.jboss.org/browse/ISPN-4235?page=com.atlassian.jira.plugin.... ]
RH Bugzilla Integration commented on ISPN-4235:
-----------------------------------------------
Dave Stahl <dstahl(a)redhat.com> changed the Status of [bug 1054165|https://bugzilla.redhat.com/show_bug.cgi?id=1054165] from VERIFIED to CLOSED
> RHQ JMX does not support multiple cache managers properly
> ---------------------------------------------------------
>
> Key: ISPN-4235
> URL: https://issues.jboss.org/browse/ISPN-4235
> Project: Infinispan
> Issue Type: Bug
> Components: JMX, reporting and management
> Affects Versions: 7.0.0.Alpha3
> Reporter: William Burns
> Assignee: William Burns
> Labels: 63gablocker
> Fix For: 7.0.0.Alpha5, 7.0.0.Final
>
>
> The JMX RHQ pluging currently queries all cache managers and lumps them together. Thus if you have multiple cache managers in separate domains they all show as 1 cache manager with all the caches for each underneath it. We need to separate each cache manager by domain in the plugin so that each can be administered separately.
> It appears the CacheManagerDiscovery lumps all the managers by type in createDiscoveredResources. We would then have to change the CacheManagerComponent to properly query the specific mbean.
--
This message was sent by Atlassian JIRA
(v6.3.11#6341)
11 years, 2 months
[JBoss JIRA] (ISPN-2895) org.infinispan.lucene.InfinispanDirectoryIOTest.testReadWholeFile fails randomly
by RH Bugzilla Integration (JIRA)
[ https://issues.jboss.org/browse/ISPN-2895?page=com.atlassian.jira.plugin.... ]
RH Bugzilla Integration commented on ISPN-2895:
-----------------------------------------------
Dave Stahl <dstahl(a)redhat.com> changed the Status of [bug 918448|https://bugzilla.redhat.com/show_bug.cgi?id=918448] from VERIFIED to CLOSED
> org.infinispan.lucene.InfinispanDirectoryIOTest.testReadWholeFile fails randomly
> --------------------------------------------------------------------------------
>
> Key: ISPN-2895
> URL: https://issues.jboss.org/browse/ISPN-2895
> Project: Infinispan
> Issue Type: Bug
> Components: Lucene Directory
> Affects Versions: 5.2.2.Final
> Reporter: Anna Manukyan
> Assignee: Sanne Grinovero
> Labels: 630, stable_embedded_query, testsuite_stability
> Fix For: 7.0.0.Alpha1, 7.0.0.Final
>
>
> The test fails randomly and the error message is:
> {code}
> java.lang.NullPointerException
> at org.infinispan.lucene.DirectoryIntegrityCheck.verifyDirectoryStructure(DirectoryIntegrityCheck.java:76)
> at org.infinispan.lucene.DirectoryIntegrityCheck.verifyDirectoryStructure(DirectoryIntegrityCheck.java:56)
> at org.infinispan.lucene.InfinispanDirectoryIOTest.verifyOnBuffer(InfinispanDirectoryIOTest.java:184)
> at org.infinispan.lucene.InfinispanDirectoryIOTest.testReadWholeFile(InfinispanDirectoryIOTest.java:153)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:601)
> at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:80)
> at org.testng.internal.Invoker.invokeMethod(Invoker.java:715)
> at org.testng.internal.Invoker.invokeTestMethod(Invoker.java:907)
> at org.testng.internal.Invoker.invokeTestMethods(Invoker.java:1237)
> at org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:127)
> at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:111)
> at org.testng.TestRunner.privateRun(TestRunner.java:767)
> at org.testng.TestRunner.run(TestRunner.java:617)
> at org.testng.SuiteRunner.runTest(SuiteRunner.java:334)
> at org.testng.SuiteRunner.access$000(SuiteRunner.java:37)
> at org.testng.SuiteRunner$SuiteWorker.run(SuiteRunner.java:368)
> at org.testng.internal.thread.ThreadUtil$2.call(ThreadUtil.java:64)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:722)
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.11#6341)
11 years, 2 months
[JBoss JIRA] (ISPN-4426) Transaction replayed but not committed
by RH Bugzilla Integration (JIRA)
[ https://issues.jboss.org/browse/ISPN-4426?page=com.atlassian.jira.plugin.... ]
RH Bugzilla Integration commented on ISPN-4426:
-----------------------------------------------
Dave Stahl <dstahl(a)redhat.com> changed the Status of [bug 1111644|https://bugzilla.redhat.com/show_bug.cgi?id=1111644] from VERIFIED to CLOSED
> Transaction replayed but not committed
> --------------------------------------
>
> Key: ISPN-4426
> URL: https://issues.jboss.org/browse/ISPN-4426
> Project: Infinispan
> Issue Type: Bug
> Components: State Transfer
> Affects Versions: 7.0.0.Alpha4
> Reporter: Radim Vansa
> Assignee: Dan Berindei
> Priority: Critical
> Labels: 63gablocker
> Fix For: 7.0.0.Alpha5
>
>
> Dist TX cache, node C is joining. In previous topology, entry is owned by A (primary) and B (backup). In new topology, primary ownership is transferred to C, B stays backup.
> 1. TX is prepared in old topology and is being committed, when topology changes
> 2. on C (the new owner), the TX info is received and later even the old entry
> 3. C receives the CommitCommand, therefore, it correctly replays the PrepareCommand.
> 4. When the entries are about to be committed, in TxInterceptor the transaction is found to be already completed as it has lower TxID.
> Result: the transaction is not being executed and stale data stay on the node (with my algortihm it eventually led to complete entry loss).
--
This message was sent by Atlassian JIRA
(v6.3.11#6341)
11 years, 2 months