[JBoss JIRA] (ISPN-4490) Members can miss the rebalance cancellation on coordinator change
by RH Bugzilla Integration (JIRA)
[ https://issues.jboss.org/browse/ISPN-4490?page=com.atlassian.jira.plugin.... ]
RH Bugzilla Integration commented on ISPN-4490:
-----------------------------------------------
Alan Field <afield(a)redhat.com> changed the Status of [bug 1104045|https://bugzilla.redhat.com/show_bug.cgi?id=1104045] from ON_QA to VERIFIED
> Members can miss the rebalance cancellation on coordinator change
> -----------------------------------------------------------------
>
> Key: ISPN-4490
> URL: https://issues.jboss.org/browse/ISPN-4490
> Project: Infinispan
> Issue Type: Bug
> Security Level: Public(Everyone can see)
> Components: Core, State Transfer
> Affects Versions: 7.0.0.Alpha4
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Fix For: 7.0.0.Alpha5
>
>
> The new coordinator sends first a CH_UPDATE command to cancel the existing rebalance, and then a REBALANCE_START command to start a new rebalance. But the CH_UPDATE command is sent asynchronously, so it's possible for some members to receive it after the REBALANCE_START command.
> If that happens, that node will assume that it will receive the segments it requested for the previous rebalance. But with the ISPN-4484 fix, the provider node cancels the outbound transfer tasks when receiving a CH_UPDATE without a pendingCH, so the state requestor will never receive its segments.
> Even without the ISPN-4484 fix this is a problem, although less obvious. Between the provider node receiving the CH_UPDATE and the REBALANCE_START commands, it won't have the requestor in its write CH, so the requestor can miss transactions.
--
This message was sent by Atlassian JIRA
(v6.2.6#6264)
10 years, 6 months
[JBoss JIRA] (ISPN-4480) Messages sent to leavers can clog the JGroups bundler thread
by RH Bugzilla Integration (JIRA)
[ https://issues.jboss.org/browse/ISPN-4480?page=com.atlassian.jira.plugin.... ]
RH Bugzilla Integration commented on ISPN-4480:
-----------------------------------------------
Alan Field <afield(a)redhat.com> changed the Status of [bug 1104045|https://bugzilla.redhat.com/show_bug.cgi?id=1104045] from ON_QA to VERIFIED
> Messages sent to leavers can clog the JGroups bundler thread
> ------------------------------------------------------------
>
> Key: ISPN-4480
> URL: https://issues.jboss.org/browse/ISPN-4480
> Project: Infinispan
> Issue Type: Bug
> Security Level: Public(Everyone can see)
> Components: Core
> Affects Versions: 6.0.2.Final
> Reporter: Dan Berindei
> Assignee: Dan Berindei
>
> In a stress test that repeatedly kills nodes while performing read/write operations, the TransferQueueBundler thread seems to spend a lot of time waiting for physical addresses:
> {noformat}
> 06:40:10,316 WARN [org.radargun.utils.Utils] (pool-5-thread-1) Stack for thread TransferQueueBundler,default,apex953-14666:
> java.lang.Thread.sleep(Native Method)
> org.jgroups.util.Util.sleep(Util.java:1504)
> org.jgroups.util.Util.sleepRandom(Util.java:1574)
> org.jgroups.protocols.TP.sendToSingleMember(TP.java:1685)
> org.jgroups.protocols.TP.doSend(TP.java:1670)
> org.jgroups.protocols.TP$TransferQueueBundler.sendBundledMessages(TP.java:2476)
> org.jgroups.protocols.TP$TransferQueueBundler.sendMessages(TP.java:2392)
> org.jgroups.protocols.TP$TransferQueueBundler.run(TP.java:2383)
> java.lang.Thread.run(Thread.java:744)
> {noformat}
> There are 2 bugs related to this already fixed in JGroups 3.5.0.Beta2+: JGRP-1814, JGRP-1815
> There is also a special case where the physical address could be removed from the cache too soon, exacerbating the effect of JGRP-1815: JGRP-1858
> We can work around the problem by changing the JGroups configuration:
> * TP.logical_addr_cache_expiration=86400000
> ** Only expire addresses after 1 day
> * TP.physical_addr_max_fetch_attempts=1
> ** Sleep for only 20ms waiting for the physical address (default 3 - 1500ms)
> * UNICAST3_conn_close_timeout=10000
> ** Drop the pending messages to leavers sooner
--
This message was sent by Atlassian JIRA
(v6.2.6#6264)
10 years, 6 months
[JBoss JIRA] (ISPN-4154) Cancelled segment transfer causes future entry transfer to be ignored
by RH Bugzilla Integration (JIRA)
[ https://issues.jboss.org/browse/ISPN-4154?page=com.atlassian.jira.plugin.... ]
RH Bugzilla Integration commented on ISPN-4154:
-----------------------------------------------
Alan Field <afield(a)redhat.com> changed the Status of [bug 1104045|https://bugzilla.redhat.com/show_bug.cgi?id=1104045] from ON_QA to VERIFIED
> Cancelled segment transfer causes future entry transfer to be ignored
> ---------------------------------------------------------------------
>
> Key: ISPN-4154
> URL: https://issues.jboss.org/browse/ISPN-4154
> Project: Infinispan
> Issue Type: Bug
> Security Level: Public(Everyone can see)
> Components: State Transfer
> Affects Versions: 7.0.0.Alpha1
> Reporter: Radim Vansa
> Assignee: Dan Berindei
> Priority: Critical
> Fix For: 7.0.0.Alpha5
>
>
> Distributed transactional cache.
> 1) Coordinator is gracefully leaving the cluster, sends a REBALANCE_START with topologyId 14, ST begins.
> 2) Node receives chunk from segment X, writes entry K=V to the container.
> 3) New coordinator jumps in with CH_UPDATE topology 16
> 4) Node receives CANCEL_STATE_TRANSFER and cancels transfer of segment X, invalidating K. In CommitManager, this operation is tracked and DiscardPolicy is set to DISCARD_STATE_TRANSFER for key K.
> 5) New coordinator starts rebalance with topology 17
> 6) Node starts new ST for segment X
> 7) Node receives the X: K=V, but in CommitManager it finds out that the policy is set to DISCARD_STATE_TRANSFER and ignores this update.
> Result: entry value is lost on some node.
--
This message was sent by Atlassian JIRA
(v6.2.6#6264)
10 years, 6 months
[JBoss JIRA] (ISPN-4490) Members can miss the rebalance cancellation on coordinator change
by RH Bugzilla Integration (JIRA)
[ https://issues.jboss.org/browse/ISPN-4490?page=com.atlassian.jira.plugin.... ]
RH Bugzilla Integration updated ISPN-4490:
------------------------------------------
Bugzilla References: https://bugzilla.redhat.com/show_bug.cgi?id=1117948, https://bugzilla.redhat.com/show_bug.cgi?id=1104045 (was: https://bugzilla.redhat.com/show_bug.cgi?id=1117948)
> Members can miss the rebalance cancellation on coordinator change
> -----------------------------------------------------------------
>
> Key: ISPN-4490
> URL: https://issues.jboss.org/browse/ISPN-4490
> Project: Infinispan
> Issue Type: Bug
> Security Level: Public(Everyone can see)
> Components: Core, State Transfer
> Affects Versions: 7.0.0.Alpha4
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Fix For: 7.0.0.Alpha5
>
>
> The new coordinator sends first a CH_UPDATE command to cancel the existing rebalance, and then a REBALANCE_START command to start a new rebalance. But the CH_UPDATE command is sent asynchronously, so it's possible for some members to receive it after the REBALANCE_START command.
> If that happens, that node will assume that it will receive the segments it requested for the previous rebalance. But with the ISPN-4484 fix, the provider node cancels the outbound transfer tasks when receiving a CH_UPDATE without a pendingCH, so the state requestor will never receive its segments.
> Even without the ISPN-4484 fix this is a problem, although less obvious. Between the provider node receiving the CH_UPDATE and the REBALANCE_START commands, it won't have the requestor in its write CH, so the requestor can miss transactions.
--
This message was sent by Atlassian JIRA
(v6.2.6#6264)
10 years, 6 months
[JBoss JIRA] (ISPN-4523) Test for HR PLAIN auth against LDAP over SSL
by Vojtech Juranek (JIRA)
Vojtech Juranek created ISPN-4523:
-------------------------------------
Summary: Test for HR PLAIN auth against LDAP over SSL
Key: ISPN-4523
URL: https://issues.jboss.org/browse/ISPN-4523
Project: Infinispan
Issue Type: Task
Security Level: Public (Everyone can see)
Components: Test Suite - Server
Reporter: Vojtech Juranek
Assignee: Vojtech Juranek
Implement integration test that would verify Hot Rod SASL PLAIN authentication against LDAP over SSL/TLS and thus verify a secure way how HR client can authenticate against LDAP.
--
This message was sent by Atlassian JIRA
(v6.2.6#6264)
10 years, 6 months
[JBoss JIRA] (ISPN-4522) Test for HR PLAIN auth against LDAP
by Vojtech Juranek (JIRA)
Vojtech Juranek created ISPN-4522:
-------------------------------------
Summary: Test for HR PLAIN auth against LDAP
Key: ISPN-4522
URL: https://issues.jboss.org/browse/ISPN-4522
Project: Infinispan
Issue Type: Task
Security Level: Public (Everyone can see)
Components: Test Suite - Server
Reporter: Vojtech Juranek
Assignee: Vojtech Juranek
Implement integration test that would verify Hot Rod SASL PLAIN authentication against LDAP.
--
This message was sent by Atlassian JIRA
(v6.2.6#6264)
10 years, 6 months
[JBoss JIRA] (ISPN-4512) CacheManagerTest.testCacheManagerRestartReusingConfigurations random failures
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-4512?page=com.atlassian.jira.plugin.... ]
Dan Berindei updated ISPN-4512:
-------------------------------
Assignee: William Burns (was: Dan Berindei)
> CacheManagerTest.testCacheManagerRestartReusingConfigurations random failures
> -----------------------------------------------------------------------------
>
> Key: ISPN-4512
> URL: https://issues.jboss.org/browse/ISPN-4512
> Project: Infinispan
> Issue Type: Bug
> Security Level: Public(Everyone can see)
> Components: Core, Test Suite - Core
> Affects Versions: 7.0.0.Alpha4
> Reporter: Dan Berindei
> Assignee: William Burns
> Priority: Blocker
> Labels: testsuite_stability
> Fix For: 7.0.0.Alpha5
>
> Attachments: CacheManagerTest_t_ISPN-4154_failing_elasticity_test_20140707.log.gz
>
>
> When a new cache manager is started with the same configuration, it uses the JGroupsTransport instance. In some rare cases, the JGroupsTransport keeps using the old marshaller, which doesn't work, and the cache fails to start:
> {noformat}
> 23:54:08,203 TRACE (testng-CacheManagerTest:___defaultcache) [JGroupsTransport] dests=[NodeB-24139], command=CacheTopologyControlCommand{cache=___defaultcache, type=JOIN, sender=NodeA-33664, joinInfo=CacheJoinInfo{consistentHashFactory=org.infinispan.distribution.ch.impl.ReplicatedConsistentHashFactory@b8c8791, hashFunction=MurmurHash3, numSegments=60, numOwners=2, timeout=240000, totalOrder=false, distributed=false}, topologyId=0, currentCH=null, pendingCH=null, throwable=null, viewId=3}, mode=SYNCHRONOUS, timeout=240000
> 23:54:08,207 DEBUG (testng-CacheManagerTest:___defaultcache) [VersionAwareMarshaller] Object is not serializable
> java.io.NotSerializableException: org.infinispan.topology.CacheTopologyControlCommand
> at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:890)
> at org.jboss.marshalling.AbstractObjectOutput.writeObject(AbstractObjectOutput.java:58)
> at org.jboss.marshalling.AbstractMarshaller.writeObject(AbstractMarshaller.java:111)
> at org.infinispan.commons.marshall.jboss.AbstractJBossMarshaller.objectToObjectStream(AbstractJBossMarshaller.java:73)
> at org.infinispan.marshall.core.VersionAwareMarshaller.objectToBuffer(VersionAwareMarshaller.java:77)
> at org.infinispan.commons.marshall.AbstractMarshaller.objectToBuffer(AbstractMarshaller.java:41)
> at org.infinispan.commons.marshall.AbstractDelegatingMarshaller.objectToBuffer(AbstractDelegatingMarshaller.java:85)
> at org.infinispan.remoting.transport.jgroups.MarshallerAdapter.objectToBuffer(MarshallerAdapter.java:23)
> at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.marshallCall(CommandAwareRpcDispatcher.java:335)
> at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.processSingleCall(CommandAwareRpcDispatcher.java:352)
> at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommand(CommandAwareRpcDispatcher.java:165)
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.invokeRemotely(JGroupsTransport.java:526)
> at org.infinispan.topology.LocalTopologyManagerImpl.executeOnCoordinator(LocalTopologyManagerImpl.java:290)
> at org.infinispan.topology.LocalTopologyManagerImpl.join(LocalTopologyManagerImpl.java:100)
> at org.infinispan.statetransfer.StateTransferManagerImpl.start(StateTransferManagerImpl.java:104)
> {noformat}
> The only test that does this is CacheManagerTest.testCacheManagerRestartReusingConfigurations.
--
This message was sent by Atlassian JIRA
(v6.2.6#6264)
10 years, 6 months