[JBoss JIRA] (ISPN-9546) Upgrade to JGroups 4.0.15.Final
by Dan Berindei (JIRA)
Dan Berindei created ISPN-9546:
----------------------------------
Summary: Upgrade to JGroups 4.0.15.Final
Key: ISPN-9546
URL: https://issues.jboss.org/browse/ISPN-9546
Project: Infinispan
Issue Type: Component Upgrade
Components: Build
Affects Versions: 9.4.0.CR3
Reporter: Dan Berindei
Assignee: Dan Berindei
Fix For: 9.4.0.Final
JGroups 4.0.15.Final fixes JGRP-2294 and most likely ISPN-9517
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
[JBoss JIRA] (ISPN-9544) Asymmetric ForkChannels break cache topology management
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-9544?page=com.atlassian.jira.plugin.... ]
Dan Berindei updated ISPN-9544:
-------------------------------
Summary: Asymmetric ForkChannels break cache topology management (was: Fork reuse breaks cache topology management)
> Asymmetric ForkChannels break cache topology management
> -------------------------------------------------------
>
> Key: ISPN-9544
> URL: https://issues.jboss.org/browse/ISPN-9544
> Project: Infinispan
> Issue Type: Bug
> Components: Server
> Affects Versions: 9.4.0.CR3
> Reporter: Dan Berindei
> Assignee: Dan Berindei
>
> Before {{FORK}} was introduced, {{ClusterTopologyManagerImpl}} and {{LocalTopologyManagerImpl}} assumed that the coordinator would always reply to other members' requests. After the introduction of {{FORK}} we added some hacks to work around the fact that the coordinator may not yet have a {{ForkChannel}} with our ID running, but we still expect the {{FORK}} setup to become symmetric after a reasonable amount of time.
> Stopping a {{ForkChannel}} and starting it again without restarting the underlying channel also doesn't work, because a {{FORK}} start/stop does not trigger a new view. When a node sends a request to the coordinator and receives a {{CacheNotFoundResponse}}, it assumes that it will also receive a new view, but if the {{CacheNotFoundResponse}} was a consequence of stopping a single {{DefaultCacheManager}}/{{ForkChannel}}, that view will never arrive.
> We don't restart individual cache managers in our tests, but the Spark connector test suite does, and it sometimes fails because of this:
> {noformat}
> 2018-09-26 21:18:03,035 INFO [org.infinispan.CLUSTER] (MSC service thread 1-4) ISPN000094: Received new cluster view for channel cluster: [server2|6] (3) [server2, server0, server1]
> 2018-09-26 21:18:05,778 TRACE [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-5) server1 sending request 37 to server2: CacheTopologyControlCommand{cache=org.infinispan.spark.suites.DistributedSuite, type=POLICY_GET_STATUS, sender=server1, joinInfo=null, topologyId=0, rebalanceId=0, currentCH=null, pendingCH=null, availabilityMode=null, phase=null, actualMembers=null, throwable=null, viewId=6}
> 2018-09-26 21:18:05,795 TRACE [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (jgroups-4,server1) server1 received response for request 37 from server2: CacheNotFoundResponse
> 2018-09-26 21:18:05,798 TRACE [org.infinispan.topology.LocalTopologyManagerImpl] (MSC service thread 1-5) Coordinator left the cluster while querying rebalancing status, retrying
> 2018-09-26 21:18:05,823 TRACE [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-5) server1 sending request 41 to server2: CacheTopologyControlCommand{cache=org.infinispan.spark.suites.DistributedSuite, type=POLICY_GET_STATUS, sender=server1, joinInfo=null, topologyId=0, rebalanceId=0, currentCH=null, pendingCH=null, availabilityMode=null, phase=null, actualMembers=null, throwable=null, viewId=6}
> 2018-09-26 21:18:05,841 TRACE [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (jgroups-19,server1) server1 received response for request 41 from server2: CacheNotFoundResponse
> 2018-09-26 21:18:05,846 TRACE [org.infinispan.topology.LocalTopologyManagerImpl] (MSC service thread 1-5) Coordinator left the cluster while querying rebalancing status, retrying
> 2018-09-26 21:18:05,871 TRACE [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-5) Waiting for transaction data for view 7, current view is 6
> 2018-09-26 21:19:05,779 ERROR [org.jboss.msc.service.fail] (MSC service thread 1-5) MSC000001: Failed to start service jboss.datagrid-infinispan.clustered."org.infinispan.spark.suites.DistributedSuite": org.jboss.msc.service.StartException in service jboss.datagrid-infinispan.clustered."org.infinispan.spark.suites.DistributedSuite": Failed to start service
> at org.jboss.msc.service.ServiceControllerImpl$StartTask.execute(ServiceControllerImpl.java:1728)
> at org.jboss.msc.service.ServiceControllerImpl$ControllerTask.run(ServiceControllerImpl.java:1556)
> at org.jboss.threads.ContextClassLoaderSavingRunnable.run(ContextClassLoaderSavingRunnable.java:35)
> at org.jboss.threads.EnhancedQueueExecutor.safeRun(EnhancedQueueExecutor.java:1985)
> at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.doRunTask(EnhancedQueueExecutor.java:1487)
> at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1364)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: org.infinispan.util.concurrent.TimeoutException: ISPN000451: Timed out waiting for view 7, current view is 6
> at org.infinispan.topology.LocalTopologyManagerImpl.waitForView(LocalTopologyManagerImpl.java:558)
> at org.infinispan.topology.LocalTopologyManagerImpl.executeOnCoordinatorRetry(LocalTopologyManagerImpl.java:598)
> at org.infinispan.topology.LocalTopologyManagerImpl.isCacheRebalancingEnabled(LocalTopologyManagerImpl.java:580)
> at org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete(StateTransferManagerImpl.java:233)
> at org.infinispan.cache.impl.CacheImpl.start(CacheImpl.java:1056)
> at org.infinispan.cache.impl.AbstractDelegatingCache.start(AbstractDelegatingCache.java:451)
> at org.infinispan.manager.DefaultCacheManager.wireAndStartCache(DefaultCacheManager.java:653)
> at org.infinispan.manager.DefaultCacheManager.createCache(DefaultCacheManager.java:598)
> at org.infinispan.manager.DefaultCacheManager.internalGetCache(DefaultCacheManager.java:481)
> at org.infinispan.manager.DefaultCacheManager.getCache(DefaultCacheManager.java:465)
> at org.infinispan.manager.impl.AbstractDelegatingEmbeddedCacheManager.getCache(AbstractDelegatingEmbeddedCacheManager.java:157)
> at org.infinispan.server.infinispan.SecurityActions.lambda$startCache$4(SecurityActions.java:122)
> at org.infinispan.security.Security.doPrivileged(Security.java:44)
> at org.infinispan.server.infinispan.SecurityActions.doPrivileged(SecurityActions.java:69)
> at org.infinispan.server.infinispan.SecurityActions.startCache(SecurityActions.java:126)
> at org.jboss.as.clustering.infinispan.subsystem.CacheService.start(CacheService.java:87)
> at org.jboss.msc.service.ServiceControllerImpl$StartTask.startService(ServiceControllerImpl.java:1736)
> at org.jboss.msc.service.ServiceControllerImpl$StartTask.execute(ServiceControllerImpl.java:1698)
> ... 6 more
> {noformat}
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
[JBoss JIRA] (ISPN-9545) Infinispan Could not read or write
by Mingjun Liu (JIRA)
Mingjun Liu created ISPN-9545:
---------------------------------
Summary: Infinispan Could not read or write
Key: ISPN-9545
URL: https://issues.jboss.org/browse/ISPN-9545
Project: Infinispan
Issue Type: Bug
Affects Versions: 9.1.4.Final
Reporter: Mingjun Liu
2 datacenters, each with 3 Keycloak nodes and 3 Infinispan nodes.
Versions:
Keycloak - 3.4.3.Final
Infinispan - 9.1.4
Observations:
1. Lots of messages from Keycloak about timeouts while reading/writing to Infinispan, including JGroups timeout messages.
2. After connecting to Infinispan via the ispn-cli.sh script and executing *container clustered*, the *ls* command returns no output, only an error message.
Recovery procedure:
Restart the whole Infinispan cluster in one DC, then in the other.
After the Infinispan restart, the Keycloak service comes back to normal.
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
[JBoss JIRA] (ISPN-9544) Fork reuse breaks cache topology management
by Dan Berindei (JIRA)
Dan Berindei created ISPN-9544:
----------------------------------
Summary: Fork reuse breaks cache topology management
Key: ISPN-9544
URL: https://issues.jboss.org/browse/ISPN-9544
Project: Infinispan
Issue Type: Bug
Components: Server
Affects Versions: 9.4.0.CR3
Reporter: Dan Berindei
Assignee: Dan Berindei
Before {{FORK}} was introduced, {{ClusterTopologyManagerImpl}} and {{LocalTopologyManagerImpl}} assumed that the coordinator would always reply to other members' requests. After the introduction of {{FORK}} we added some hacks to work around the fact that the coordinator may not yet have a {{ForkChannel}} with our ID running, but we still expect the {{FORK}} setup to become symmetric after a reasonable amount of time.
Stopping a {{ForkChannel}} and starting it again without restarting the underlying channel also doesn't work, because a {{FORK}} start/stop does not trigger a new view. When a node sends a request to the coordinator and receives a {{CacheNotFoundResponse}}, it assumes that it will also receive a new view, but if the {{CacheNotFoundResponse}} was a consequence of stopping a single {{DefaultCacheManager}}/{{ForkChannel}}, that view will never arrive.
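To make the lifecycle concrete, here is a minimal JGroups sketch (the stack config file name and the fork ids are illustrative, not Infinispan's actual setup) showing why closing a fork is invisible to the other members:
{code:java}
import org.jgroups.JChannel;
import org.jgroups.fork.ForkChannel;

// Assumes a stack configuration ("fork.xml" is a hypothetical file) that
// already contains the FORK protocol; the 3-arg ForkChannel constructor
// requires FORK to be present in the main channel's stack.
public class ForkChannelDemo {
    public static void main(String[] args) throws Exception {
        JChannel main = new JChannel("fork.xml");
        main.connect("cluster"); // joining/leaving here does install new views

        // "ispn" (fork stack id) and "topology" (fork channel id) are
        // illustrative names
        ForkChannel fork = new ForkChannel(main, "ispn", "topology");
        fork.connect("ignored"); // the cluster name is ignored by ForkChannel

        // ... exchange messages over the fork ...

        fork.close(); // tears down only this member's fork: the main channel
                      // stays connected, no new JGroups view is installed, and
                      // peers keep addressing a fork that no longer exists
        main.close();
    }
}
{code}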
We don't restart individual cache managers in our tests, but the Spark connector test suite does, and it sometimes fails because of this:
{noformat}
2018-09-26 21:18:03,035 INFO [org.infinispan.CLUSTER] (MSC service thread 1-4) ISPN000094: Received new cluster view for channel cluster: [server2|6] (3) [server2, server0, server1]
2018-09-26 21:18:05,778 TRACE [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-5) server1 sending request 37 to server2: CacheTopologyControlCommand{cache=org.infinispan.spark.suites.DistributedSuite, type=POLICY_GET_STATUS, sender=server1, joinInfo=null, topologyId=0, rebalanceId=0, currentCH=null, pendingCH=null, availabilityMode=null, phase=null, actualMembers=null, throwable=null, viewId=6}
2018-09-26 21:18:05,795 TRACE [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (jgroups-4,server1) server1 received response for request 37 from server2: CacheNotFoundResponse
2018-09-26 21:18:05,798 TRACE [org.infinispan.topology.LocalTopologyManagerImpl] (MSC service thread 1-5) Coordinator left the cluster while querying rebalancing status, retrying
2018-09-26 21:18:05,823 TRACE [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-5) server1 sending request 41 to server2: CacheTopologyControlCommand{cache=org.infinispan.spark.suites.DistributedSuite, type=POLICY_GET_STATUS, sender=server1, joinInfo=null, topologyId=0, rebalanceId=0, currentCH=null, pendingCH=null, availabilityMode=null, phase=null, actualMembers=null, throwable=null, viewId=6}
2018-09-26 21:18:05,841 TRACE [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (jgroups-19,server1) server1 received response for request 41 from server2: CacheNotFoundResponse
2018-09-26 21:18:05,846 TRACE [org.infinispan.topology.LocalTopologyManagerImpl] (MSC service thread 1-5) Coordinator left the cluster while querying rebalancing status, retrying
2018-09-26 21:18:05,871 TRACE [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-5) Waiting for transaction data for view 7, current view is 6
2018-09-26 21:19:05,779 ERROR [org.jboss.msc.service.fail] (MSC service thread 1-5) MSC000001: Failed to start service jboss.datagrid-infinispan.clustered."org.infinispan.spark.suites.DistributedSuite": org.jboss.msc.service.StartException in service jboss.datagrid-infinispan.clustered."org.infinispan.spark.suites.DistributedSuite": Failed to start service
at org.jboss.msc.service.ServiceControllerImpl$StartTask.execute(ServiceControllerImpl.java:1728)
at org.jboss.msc.service.ServiceControllerImpl$ControllerTask.run(ServiceControllerImpl.java:1556)
at org.jboss.threads.ContextClassLoaderSavingRunnable.run(ContextClassLoaderSavingRunnable.java:35)
at org.jboss.threads.EnhancedQueueExecutor.safeRun(EnhancedQueueExecutor.java:1985)
at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.doRunTask(EnhancedQueueExecutor.java:1487)
at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1364)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.infinispan.util.concurrent.TimeoutException: ISPN000451: Timed out waiting for view 7, current view is 6
at org.infinispan.topology.LocalTopologyManagerImpl.waitForView(LocalTopologyManagerImpl.java:558)
at org.infinispan.topology.LocalTopologyManagerImpl.executeOnCoordinatorRetry(LocalTopologyManagerImpl.java:598)
at org.infinispan.topology.LocalTopologyManagerImpl.isCacheRebalancingEnabled(LocalTopologyManagerImpl.java:580)
at org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete(StateTransferManagerImpl.java:233)
at org.infinispan.cache.impl.CacheImpl.start(CacheImpl.java:1056)
at org.infinispan.cache.impl.AbstractDelegatingCache.start(AbstractDelegatingCache.java:451)
at org.infinispan.manager.DefaultCacheManager.wireAndStartCache(DefaultCacheManager.java:653)
at org.infinispan.manager.DefaultCacheManager.createCache(DefaultCacheManager.java:598)
at org.infinispan.manager.DefaultCacheManager.internalGetCache(DefaultCacheManager.java:481)
at org.infinispan.manager.DefaultCacheManager.getCache(DefaultCacheManager.java:465)
at org.infinispan.manager.impl.AbstractDelegatingEmbeddedCacheManager.getCache(AbstractDelegatingEmbeddedCacheManager.java:157)
at org.infinispan.server.infinispan.SecurityActions.lambda$startCache$4(SecurityActions.java:122)
at org.infinispan.security.Security.doPrivileged(Security.java:44)
at org.infinispan.server.infinispan.SecurityActions.doPrivileged(SecurityActions.java:69)
at org.infinispan.server.infinispan.SecurityActions.startCache(SecurityActions.java:126)
at org.jboss.as.clustering.infinispan.subsystem.CacheService.start(CacheService.java:87)
at org.jboss.msc.service.ServiceControllerImpl$StartTask.startService(ServiceControllerImpl.java:1736)
at org.jboss.msc.service.ServiceControllerImpl$StartTask.execute(ServiceControllerImpl.java:1698)
... 6 more
{noformat}
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
[JBoss JIRA] (ISPN-9541) Module initialization is not thread-safe
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-9541?page=com.atlassian.jira.plugin.... ]
Dan Berindei updated ISPN-9541:
-------------------------------
Sprint: Sprint 9.4.0.Final
> Module initialization is not thread-safe
> ----------------------------------------
>
> Key: ISPN-9541
> URL: https://issues.jboss.org/browse/ISPN-9541
> Project: Infinispan
> Issue Type: Bug
> Components: Core, Server
> Affects Versions: 9.4.0.CR3
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Fix For: 9.4.0.Final
>
>
> In my ISPN-9127 fix I created a {{BasicComponentRegistry}} interface that represents a mostly-read-only collection of components. It has a {{replaceComponent()}} method and a {{rewire()}} method for testing purposes, but it turns out modules were also relying on the ability to replace existing components in order to work.
> Replacing global components is normally safe during {{ModuleLifecycle.cacheManagerStarting()}}, because none of the components have started yet, so when a component starts later we can still start its dependencies first. But because some modules start some global components there, e.g. by calling {{manager.getCache(name)}}, that assumption breaks.
> The {{infinispan-server-event-logger}} module is sneakier: it doesn't replace a component, instead it replaces the actual implementation of the event logger inside the {{EventLogManager}} component. Events that happen before the module's {{cacheManagerStarting()}} or after {{cacheManagerStopping()}} will be silently dropped from the persistent event log.
> I am investigating making the module a factory of factories. Instead of having a monolithic {{cacheManagerStarting()}} method, it could define a set of components that it can create, and a set of components that should be started before the cache manager is "running". We probably need a way to depend on other modules as well, maybe reusing the {{@Inject}} and {{@ComponentName}} annotations.
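A minimal sketch of the replace-and-rewire pattern the issue describes, assuming a {{replaceComponent(name, instance, manageLifecycle)}} signature; {{MyService}} and {{MyServiceImpl}} are hypothetical types, and the pattern is only safe while no component has started yet:
{code:java}
import org.infinispan.configuration.global.GlobalConfiguration;
import org.infinispan.factories.GlobalComponentRegistry;
import org.infinispan.factories.impl.BasicComponentRegistry;
import org.infinispan.lifecycle.ModuleLifecycle;

// Hypothetical component type and implementation, for illustration only
interface MyService {}
final class MyServiceImpl implements MyService {}

public class MyModuleLifecycle implements ModuleLifecycle {
    @Override
    public void cacheManagerStarting(GlobalComponentRegistry gcr,
                                     GlobalConfiguration globalConfiguration) {
        BasicComponentRegistry bcr = gcr.getComponent(BasicComponentRegistry.class);
        // Swap in our implementation; the exact signature is an assumption
        bcr.replaceComponent(MyService.class.getName(), new MyServiceImpl(), true);
        // Re-inject dependents so they see the replacement; this is racy if
        // any component has already started, which is the bug described above
        bcr.rewire();
    }
}
{code}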
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)