[JBoss JIRA] (ISPN-6275) Double invalidate of invalid Hot Rod connections
by Galder Zamarreño (JIRA)
[ https://issues.jboss.org/browse/ISPN-6275?page=com.atlassian.jira.plugin.... ]
Work on ISPN-6275 started by Galder Zamarreño.
----------------------------------------------
> Double invalidate of invalid Hot Rod connections
> ------------------------------------------------
>
> Key: ISPN-6275
> URL: https://issues.jboss.org/browse/ISPN-6275
> Project: Infinispan
> Issue Type: Bug
> Components: Remote Protocols
> Affects Versions: 6.0.2.Final
> Reporter: Dennis Reed
> Assignee: Galder Zamarreño
>
> When there's a problem with a Hot Rod operation, RetryOnFailureOperation invalidates the connection twice (once in a catch block, and once in a finally block).
> This causes the GenericKeyedObjectPool counts to get off, and anything relying on that count (such as the maxTotal configuration for the pool) to break.
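A minimal, hypothetical sketch of the anti-pattern described above (not the actual RetryOnFailureOperation code): invalidating the same connection in both the catch and the finally block decrements the pool's active-object count twice for a single failure, which is what throws off GenericKeyedObjectPool's accounting. The class and field names here are illustrative stand-ins.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class DoubleInvalidateSketch {
    // Stand-in for the pool's active-object accounting.
    static final AtomicInteger activeCount = new AtomicInteger(1);

    static void invalidate() {
        // Each call decrements the count, whether or not the
        // connection was already invalidated.
        activeCount.decrementAndGet();
    }

    static void buggyRetry() {
        boolean failed = false;
        try {
            throw new RuntimeException("simulated Hot Rod failure");
        } catch (RuntimeException e) {
            invalidate();     // first invalidation, in the catch block
            failed = true;
        } finally {
            if (failed) {
                invalidate(); // second invalidation of the same connection
            }
        }
    }

    public static void main(String[] args) {
        buggyRetry();
        // One checked-out connection, one failure, but the count dropped by 2.
        System.out.println("active count after failure: " + activeCount.get());
    }
}
```

The usual fix is to make invalidation happen exactly once per failure, e.g. invalidate only in the catch block and return the connection in the finally block only when it was not invalidated.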
--
This message was sent by Atlassian JIRA
(v6.4.11#64026)
[JBoss JIRA] (ISPN-6388) Spark integration - TimeoutException: Replication timeout on application execution
by Matej Čimbora (JIRA)
[ https://issues.jboss.org/browse/ISPN-6388?page=com.atlassian.jira.plugin.... ]
Matej Čimbora commented on ISPN-6388:
-------------------------------------
I looked into the issue some time ago, but couldn't finish it due to a context switch. DistributedCacheStream.rehashAwareIteration shows multiple stayLocal=false evaluations.
> Spark integration - TimeoutException: Replication timeout on application execution
> -----------------------------------------------------------------------------------
>
> Key: ISPN-6388
> URL: https://issues.jboss.org/browse/ISPN-6388
> Project: Infinispan
> Issue Type: Bug
> Components: Spark
> Affects Versions: 8.2.0.Final
> Reporter: Matej Čimbora
> Attachments: app_0.txt, driver.txt, server.txt
>
>
> The issue occurs sporadically while the application is executing (e.g. the WordCount example). To some degree it seems to be affected by the number of partitions used (i.e. the higher the count, the less likely the issue is to occur).
> Using an 8-node cluster (1 worker/1 ISPN server per physical node), connector v. 0.2.
> Attached: sample driver, server, and application logs.
[JBoss JIRA] (ISPN-6387) ISPN000197: Error updating cluster member list: org.infinispan.util.concurrent.TimeoutException: Replication timeout for X
by Radoslav Husar (JIRA)
[ https://issues.jboss.org/browse/ISPN-6387?page=com.atlassian.jira.plugin.... ]
Radoslav Husar commented on ISPN-6387:
--------------------------------------
Tried to backport https://github.com/infinispan/infinispan/pull/4133 but that did not help.
> ISPN000197: Error updating cluster member list: org.infinispan.util.concurrent.TimeoutException: Replication timeout for X
> --------------------------------------------------------------------------------------------------------------------------
>
> Key: ISPN-6387
> URL: https://issues.jboss.org/browse/ISPN-6387
> Project: Infinispan
> Issue Type: Bug
> Components: Core
> Affects Versions: 8.1.2.Final
> Reporter: Radoslav Husar
> Assignee: Radoslav Husar
>
> Booting WildFly with caches starting yields the following after 1 minute:
> The problematic call originates in Infinispan's org.infinispan.topology.ClusterTopologyManagerImpl#confirmMembersAvailable heartbeat command.
> {noformat}
> 00:20:51,646 TRACE [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (timeout-thread--p10-t1) Response: sender=node2, received=false, suspected=false
> 00:20:51,647 WARN [org.infinispan.topology.ClusterTopologyManagerImpl] (transport-thread--p13-t2) ISPN000197: Error updating cluster member list: org.infinispan.util.concurrent.TimeoutException: Replication timeout for node2
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.checkRsp(JGroupsTransport.java:765)
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.lambda$invokeRemotelyAsync$0(JGroupsTransport.java:599)
> at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602)
> at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
> at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
> at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962)
> at org.infinispan.remoting.transport.jgroups.SingleResponseFuture.call(SingleResponseFuture.java:46)
> at org.infinispan.remoting.transport.jgroups.SingleResponseFuture.call(SingleResponseFuture.java:17)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
[JBoss JIRA] (ISPN-6388) Spark integration - TimeoutException: Replication timeout on application execution
by Matej Čimbora (JIRA)
[ https://issues.jboss.org/browse/ISPN-6388?page=com.atlassian.jira.plugin.... ]
Matej Čimbora updated ISPN-6388:
--------------------------------
Description:
The issue occurs sporadically while the application is executing (e.g. the WordCount example). To some degree it seems to be affected by the number of partitions used (i.e. the higher the count, the less likely the issue is to occur).
Using an 8-node cluster (1 worker/1 ISPN server per physical node), connector v. 0.2.
Attached: sample driver, server, and application logs.
was:
The issue occurs sporadically while the application is executing (e.g. the WordCount example). To some degree it seems to be affected by the number of partitions used (i.e. the higher the count, the less likely the issue is to occur).
Using an 8-node cluster (1 worker/1 ISPN server per physical node).
Attached: sample driver, server, and application logs.
> Spark integration - TimeoutException: Replication timeout on application execution
> -----------------------------------------------------------------------------------
>
> Key: ISPN-6388
> URL: https://issues.jboss.org/browse/ISPN-6388
> Project: Infinispan
> Issue Type: Bug
> Components: Spark
> Affects Versions: 8.2.0.Final
> Reporter: Matej Čimbora
> Attachments: app_0.txt, driver.txt, server.txt
>
>
> The issue occurs sporadically while the application is executing (e.g. the WordCount example). To some degree it seems to be affected by the number of partitions used (i.e. the higher the count, the less likely the issue is to occur).
> Using an 8-node cluster (1 worker/1 ISPN server per physical node), connector v. 0.2.
> Attached: sample driver, server, and application logs.
[JBoss JIRA] (ISPN-6388) Spark integration - TimeoutException: Replication timeout on application execution
by Matej Čimbora (JIRA)
Matej Čimbora created ISPN-6388:
-----------------------------------
Summary: Spark integration - TimeoutException: Replication timeout on application execution
Key: ISPN-6388
URL: https://issues.jboss.org/browse/ISPN-6388
Project: Infinispan
Issue Type: Bug
Components: Spark
Affects Versions: 8.2.0.Final
Reporter: Matej Čimbora
Attachments: app_0.txt, driver.txt, server.txt
The issue occurs sporadically while the application is executing (e.g. the WordCount example). To some degree it seems to be affected by the number of partitions used (i.e. the higher the count, the less likely the issue is to occur).
Using an 8-node cluster (1 worker/1 ISPN server per physical node).
Attached: sample driver, server, and application logs.
[JBoss JIRA] (ISPN-6239) InitialClusterSizeTest.testInitialClusterSizeFail random failures
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-6239?page=com.atlassian.jira.plugin.... ]
Dan Berindei commented on ISPN-6239:
------------------------------------
While trying to reproduce the failure on my machine, I found another failure caused by a concurrency issue in {{TEST_PING}}:
{noformat}
12:44:36,043 TRACE (ForkThread-4,InitialClusterSizeTest:) [TEST_PING] Discoveries for DiscoveryKey{clusterName='ISPN', testName='org.infinispan.remoting.transport.InitialClusterSizeTest'} are : {}
12:44:36,043 TRACE (ForkThread-1,InitialClusterSizeTest:) [TEST_PING] Discoveries for DiscoveryKey{clusterName='ISPN', testName='org.infinispan.remoting.transport.InitialClusterSizeTest'} are : {}
12:44:36,043 TRACE (ForkThread-1,InitialClusterSizeTest:) [TEST_PING] Add discovery for NodeA-45697 to cache. The cache now contains: {NodeD-30921=TEST_PING@NodeD-30921, NodeA-45697=TEST_PING@NodeA-45697}
12:44:36,043 TRACE (ForkThread-4,InitialClusterSizeTest:) [TEST_PING] Add discovery for NodeD-30921 to cache. The cache now contains: {NodeD-30921=TEST_PING@NodeD-30921, NodeA-45697=TEST_PING@NodeA-45697}
12:44:36,043 TRACE (ForkThread-3,InitialClusterSizeTest:) [TEST_PING] Discoveries for DiscoveryKey{clusterName='ISPN', testName='org.infinispan.remoting.transport.InitialClusterSizeTest'} are : {NodeD-30921=TEST_PING@NodeD-30921, NodeA-45697=TEST_PING@NodeA-45697}
12:44:36,043 TRACE (ForkThread-3,InitialClusterSizeTest:) [TEST_PING] Add discovery for NodeC-59583 to cache. The cache now contains: {NodeD-30921=TEST_PING@NodeD-30921, NodeA-45697=TEST_PING@NodeA-45697, NodeC-59583=TEST_PING@NodeC-59583}
12:44:36,043 TRACE (ForkThread-2,InitialClusterSizeTest:) [TEST_PING] Discoveries for DiscoveryKey{clusterName='ISPN', testName='org.infinispan.remoting.transport.InitialClusterSizeTest'} are : {NodeD-30921=TEST_PING@NodeD-30921, NodeA-45697=TEST_PING@NodeA-45697, NodeC-59583=TEST_PING@NodeC-59583}
12:44:36,044 TRACE (ForkThread-2,InitialClusterSizeTest:) [TEST_PING] Add discovery for NodeB-6005 to cache. The cache now contains: {NodeD-30921=TEST_PING@NodeD-30921, NodeA-45697=TEST_PING@NodeA-45697, NodeB-6005=TEST_PING@NodeB-6005, NodeC-59583=TEST_PING@NodeC-59583}
12:44:36,044 TRACE (ForkThread-4,InitialClusterSizeTest:) [GMS] NodeD-30921: discovery took 2 ms, members: 1 rsps (0 coords) [done]
12:44:36,044 TRACE (ForkThread-4,InitialClusterSizeTest:) [GMS] NodeD-30921: could not determine coordinator from rsps 1 rsps (0 coords) [done]
12:44:36,045 TRACE (ForkThread-4,InitialClusterSizeTest:) [GMS] NodeD-30921: nodes to choose new coord from are: [NodeD-30921, NodeA-45697]
12:44:36,045 TRACE (ForkThread-4,InitialClusterSizeTest:) [GMS] NodeD-30921: I (NodeD-30921) am the first of the nodes, will become coordinator
12:44:36,045 TRACE (ForkThread-2,InitialClusterSizeTest:) [GMS] NodeB-6005: discovery took 3 ms, members: 3 rsps (0 coords) [done]
12:44:36,045 TRACE (ForkThread-2,InitialClusterSizeTest:) [GMS] NodeB-6005: could not determine coordinator from rsps 3 rsps (0 coords) [done]
12:44:36,045 TRACE (ForkThread-2,InitialClusterSizeTest:) [GMS] NodeB-6005: nodes to choose new coord from are: [NodeC-59583, NodeD-30921, NodeB-6005, NodeA-45697]
12:44:36,045 TRACE (ForkThread-2,InitialClusterSizeTest:) [GMS] NodeB-6005: I (NodeB-6005) am not the first of the nodes, waiting for another client to become coordinator
{noformat}
The cluster starts as 2 partitions with NodeB and NodeD as coordinators, and because the test doesn't use {{TransportFlags.withMerge()}}, the partitions will never merge.
> InitialClusterSizeTest.testInitialClusterSizeFail random failures
> -----------------------------------------------------------------
>
> Key: ISPN-6239
> URL: https://issues.jboss.org/browse/ISPN-6239
> Project: Infinispan
> Issue Type: Bug
> Components: Test Suite - Core
> Affects Versions: 8.2.0.Beta2
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Labels: testsuite_failure
> Fix For: 8.2.0.CR1, 8.2.0.Final
>
>
> The test starts 3 nodes concurrently, but configures Infinispan to wait for a cluster of 4 nodes, and expects the nodes to fail to start within {{initialClusterTimeout}} + 1 second.
> However, because of a bug in {{TEST_PING}}, the first 2 nodes see each other as coordinator and send a {{JOIN}} request to each other, and it takes 3 seconds to recover and start the cluster properly.
> The bug in {{TEST_PING}} is actually a hack introduced for {{ISPN-5106}}. The problem was that the first node (A) to start would install a view with itself as the single node, but the second node to start (B) would start immediately, and the discovery request from B would reach B's {{TEST_PING}} before it saw the view. That way, B could choose itself as the coordinator based on the order of A's and B's UUIDs, and the cluster would start as 2 partitions. Since most of our tests actually remove {{MERGE3}} from the protocol stack, the partitions would never merge and the test would fail with a timeout.
> I fixed this in {{TEST_PING}} by assuming that the sender of the first discovery response is a coordinator when there is a single response. This worked because all but a few tests start their managers sequentially; however, it sometimes introduces this 3-second delay when nodes start in parallel.
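To illustrate the discovery race described above, here is a deliberately simplified, sequentialized sketch (not the real TEST_PING implementation, and the node names are illustrative): two nodes snapshot the shared discovery map while it is still empty, each sees no existing members, and each elects itself coordinator, yielding two one-node partitions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DiscoveryRaceSketch {
    public static void main(String[] args) {
        // Shared TEST_PING-style discovery cache.
        Map<String, String> discoveries = new LinkedHashMap<>();
        List<String> coordinators = new ArrayList<>();

        // The interleaving from the trace above: both nodes read the map
        // while it is still empty ("Discoveries ... are : {}") ...
        int responsesSeenByA = discoveries.size(); // 0
        int responsesSeenByD = discoveries.size(); // 0

        // ... and only then register themselves ("Add discovery for ... to cache").
        discoveries.put("NodeA", "TEST_PING@NodeA");
        discoveries.put("NodeD", "TEST_PING@NodeD");

        // With no responses seen, each node falls back to electing itself.
        if (responsesSeenByA == 0) coordinators.add("NodeA");
        if (responsesSeenByD == 0) coordinators.add("NodeD");

        // Two coordinators means two partitions; without MERGE3 in the stack
        // (or TransportFlags.withMerge()) they never merge.
        System.out.println("coordinators: " + coordinators);
    }
}
```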