[JBoss JIRA] (ISPN-9544) Asymmetric ForkChannels break cache topology management
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-9544?page=com.atlassian.jira.plugin.... ]
Dan Berindei updated ISPN-9544:
-------------------------------
Summary: Asymmetric ForkChannels break cache topology management (was: Fork reuse breaks cache topology management)
> Asymmetric ForkChannels break cache topology management
> -------------------------------------------------------
>
> Key: ISPN-9544
> URL: https://issues.jboss.org/browse/ISPN-9544
> Project: Infinispan
> Issue Type: Bug
> Components: Server
> Affects Versions: 9.4.0.CR3
> Reporter: Dan Berindei
> Assignee: Dan Berindei
>
> Before {{FORK}} was introduced, {{ClusterTopologyManagerImpl}} and {{LocalTopologyManagerImpl}} assumed that the coordinator would always reply to other members' requests. After the introduction of {{FORK}} we added some hacks to work around the fact that the coordinator may not have a {{ForkChannel}} with our ID running **yet**, but we still expect the {{FORK}} setup to become symmetric after a reasonable amount of time.
> Stopping a {{FORK}} and starting it back without restarting the underlying channel also doesn't work, because a {{FORK}} start/stop does not trigger a new view. When a node sends a request to the coordinator and receives back a {{CacheNotFoundResponse}}, it assumes that it will also receive a new view, but if the {{CacheNotFoundResponse}} was a consequence of stopping a single {{DefaultCacheManager}}/{{ForkChannel}}, that view will never arrive.
> We don't restart individual cache managers in our tests, but the spark connector test suite does it, and it sometimes fails because of it:
> {noformat}
> 2018-09-26 21:18:03,035 INFO [org.infinispan.CLUSTER] (MSC service thread 1-4) ISPN000094: Received new cluster view for channel cluster: [server2|6] (3) [server2, server0, server1]
> 2018-09-26 21:18:05,778 TRACE [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-5) server1 sending request 37 to server2: CacheTopologyControlCommand{cache=org.infinispan.spark.suites.DistributedSuite, type=POLICY_GET_STATUS, sender=server1, joinInfo=null, topologyId=0, rebalanceId=0, currentCH=null, pendingCH=null, availabilityMode=null, phase=null, actualMembers=null, throwable=null, viewId=6}
> 2018-09-26 21:18:05,795 TRACE [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (jgroups-4,server1) server1 received response for request 37 from server2: CacheNotFoundResponse
> 2018-09-26 21:18:05,798 TRACE [org.infinispan.topology.LocalTopologyManagerImpl] (MSC service thread 1-5) Coordinator left the cluster while querying rebalancing status, retrying
> 2018-09-26 21:18:05,823 TRACE [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-5) server1 sending request 41 to server2: CacheTopologyControlCommand{cache=org.infinispan.spark.suites.DistributedSuite, type=POLICY_GET_STATUS, sender=server1, joinInfo=null, topologyId=0, rebalanceId=0, currentCH=null, pendingCH=null, availabilityMode=null, phase=null, actualMembers=null, throwable=null, viewId=6}
> 2018-09-26 21:18:05,841 TRACE [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (jgroups-19,server1) server1 received response for request 41 from server2: CacheNotFoundResponse
> 2018-09-26 21:18:05,846 TRACE [org.infinispan.topology.LocalTopologyManagerImpl] (MSC service thread 1-5) Coordinator left the cluster while querying rebalancing status, retrying
> 2018-09-26 21:18:05,871 TRACE [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-5) Waiting for transaction data for view 7, current view is 6
> 2018-09-26 21:19:05,779 ERROR [org.jboss.msc.service.fail] (MSC service thread 1-5) MSC000001: Failed to start service jboss.datagrid-infinispan.clustered."org.infinispan.spark.suites.DistributedSuite": org.jboss.msc.service.StartException in service jboss.datagrid-infinispan.clustered."org.infinispan.spark.suites.DistributedSuite": Failed to start service
> at org.jboss.msc.service.ServiceControllerImpl$StartTask.execute(ServiceControllerImpl.java:1728)
> at org.jboss.msc.service.ServiceControllerImpl$ControllerTask.run(ServiceControllerImpl.java:1556)
> at org.jboss.threads.ContextClassLoaderSavingRunnable.run(ContextClassLoaderSavingRunnable.java:35)
> at org.jboss.threads.EnhancedQueueExecutor.safeRun(EnhancedQueueExecutor.java:1985)
> at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.doRunTask(EnhancedQueueExecutor.java:1487)
> at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1364)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: org.infinispan.util.concurrent.TimeoutException: ISPN000451: Timed out waiting for view 7, current view is 6
> at org.infinispan.topology.LocalTopologyManagerImpl.waitForView(LocalTopologyManagerImpl.java:558)
> at org.infinispan.topology.LocalTopologyManagerImpl.executeOnCoordinatorRetry(LocalTopologyManagerImpl.java:598)
> at org.infinispan.topology.LocalTopologyManagerImpl.isCacheRebalancingEnabled(LocalTopologyManagerImpl.java:580)
> at org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete(StateTransferManagerImpl.java:233)
> at org.infinispan.cache.impl.CacheImpl.start(CacheImpl.java:1056)
> at org.infinispan.cache.impl.AbstractDelegatingCache.start(AbstractDelegatingCache.java:451)
> at org.infinispan.manager.DefaultCacheManager.wireAndStartCache(DefaultCacheManager.java:653)
> at org.infinispan.manager.DefaultCacheManager.createCache(DefaultCacheManager.java:598)
> at org.infinispan.manager.DefaultCacheManager.internalGetCache(DefaultCacheManager.java:481)
> at org.infinispan.manager.DefaultCacheManager.getCache(DefaultCacheManager.java:465)
> at org.infinispan.manager.impl.AbstractDelegatingEmbeddedCacheManager.getCache(AbstractDelegatingEmbeddedCacheManager.java:157)
> at org.infinispan.server.infinispan.SecurityActions.lambda$startCache$4(SecurityActions.java:122)
> at org.infinispan.security.Security.doPrivileged(Security.java:44)
> at org.infinispan.server.infinispan.SecurityActions.doPrivileged(SecurityActions.java:69)
> at org.infinispan.server.infinispan.SecurityActions.startCache(SecurityActions.java:126)
> at org.jboss.as.clustering.infinispan.subsystem.CacheService.start(CacheService.java:87)
> at org.jboss.msc.service.ServiceControllerImpl$StartTask.startService(ServiceControllerImpl.java:1736)
> at org.jboss.msc.service.ServiceControllerImpl$StartTask.execute(ServiceControllerImpl.java:1698)
> ... 6 more
> {noformat}
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
[JBoss JIRA] (ISPN-9545) Infinispan Could not read or write
by Mingjun Liu (JIRA)
Mingjun Liu created ISPN-9545:
---------------------------------
Summary: Infinispan Could not read or write
Key: ISPN-9545
URL: https://issues.jboss.org/browse/ISPN-9545
Project: Infinispan
Issue Type: Bug
Affects Versions: 9.1.4.Final
Reporter: Mingjun Liu
Setup: 2 datacenters, each with 3 Keycloak nodes and 3 Infinispan nodes.
Versions:
Keycloak - 3.4.3.Final
Infinispan - 9.1.4
Observations:
1. Keycloak logs many timeout messages when reading from or writing to Infinispan, including JGroups timeout messages.
2. After connecting to Infinispan with the ispn-cli.sh script and executing *container clustered*, the *ls* command returns no output, only an error message.
Recovery procedure:
Restart the whole Infinispan cluster in one DC, then in the other.
After the Infinispan restart, the Keycloak service returns to normal.
[JBoss JIRA] (ISPN-9544) Fork reuse breaks cache topology management
by Dan Berindei (JIRA)
Dan Berindei created ISPN-9544:
----------------------------------
Summary: Fork reuse breaks cache topology management
Key: ISPN-9544
URL: https://issues.jboss.org/browse/ISPN-9544
Project: Infinispan
Issue Type: Bug
Components: Server
Affects Versions: 9.4.0.CR3
Reporter: Dan Berindei
Assignee: Dan Berindei
Before {{FORK}} was introduced, {{ClusterTopologyManagerImpl}} and {{LocalTopologyManagerImpl}} assumed that the coordinator would always reply to other members' requests. After the introduction of {{FORK}} we added some hacks to work around the fact that the coordinator may not have a {{ForkChannel}} with our ID running **yet**, but we still expect the {{FORK}} setup to become symmetric after a reasonable amount of time.
Stopping a {{FORK}} and starting it back without restarting the underlying channel also doesn't work, because a {{FORK}} start/stop does not trigger a new view. When a node sends a request to the coordinator and receives back a {{CacheNotFoundResponse}}, it assumes that it will also receive a new view, but if the {{CacheNotFoundResponse}} was a consequence of stopping a single {{DefaultCacheManager}}/{{ForkChannel}}, that view will never arrive.
We don't restart individual cache managers in our tests, but the spark connector test suite does it, and it sometimes fails because of it:
{noformat}
2018-09-26 21:18:03,035 INFO [org.infinispan.CLUSTER] (MSC service thread 1-4) ISPN000094: Received new cluster view for channel cluster: [server2|6] (3) [server2, server0, server1]
2018-09-26 21:18:05,778 TRACE [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-5) server1 sending request 37 to server2: CacheTopologyControlCommand{cache=org.infinispan.spark.suites.DistributedSuite, type=POLICY_GET_STATUS, sender=server1, joinInfo=null, topologyId=0, rebalanceId=0, currentCH=null, pendingCH=null, availabilityMode=null, phase=null, actualMembers=null, throwable=null, viewId=6}
2018-09-26 21:18:05,795 TRACE [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (jgroups-4,server1) server1 received response for request 37 from server2: CacheNotFoundResponse
2018-09-26 21:18:05,798 TRACE [org.infinispan.topology.LocalTopologyManagerImpl] (MSC service thread 1-5) Coordinator left the cluster while querying rebalancing status, retrying
2018-09-26 21:18:05,823 TRACE [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-5) server1 sending request 41 to server2: CacheTopologyControlCommand{cache=org.infinispan.spark.suites.DistributedSuite, type=POLICY_GET_STATUS, sender=server1, joinInfo=null, topologyId=0, rebalanceId=0, currentCH=null, pendingCH=null, availabilityMode=null, phase=null, actualMembers=null, throwable=null, viewId=6}
2018-09-26 21:18:05,841 TRACE [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (jgroups-19,server1) server1 received response for request 41 from server2: CacheNotFoundResponse
2018-09-26 21:18:05,846 TRACE [org.infinispan.topology.LocalTopologyManagerImpl] (MSC service thread 1-5) Coordinator left the cluster while querying rebalancing status, retrying
2018-09-26 21:18:05,871 TRACE [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-5) Waiting for transaction data for view 7, current view is 6
2018-09-26 21:19:05,779 ERROR [org.jboss.msc.service.fail] (MSC service thread 1-5) MSC000001: Failed to start service jboss.datagrid-infinispan.clustered."org.infinispan.spark.suites.DistributedSuite": org.jboss.msc.service.StartException in service jboss.datagrid-infinispan.clustered."org.infinispan.spark.suites.DistributedSuite": Failed to start service
at org.jboss.msc.service.ServiceControllerImpl$StartTask.execute(ServiceControllerImpl.java:1728)
at org.jboss.msc.service.ServiceControllerImpl$ControllerTask.run(ServiceControllerImpl.java:1556)
at org.jboss.threads.ContextClassLoaderSavingRunnable.run(ContextClassLoaderSavingRunnable.java:35)
at org.jboss.threads.EnhancedQueueExecutor.safeRun(EnhancedQueueExecutor.java:1985)
at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.doRunTask(EnhancedQueueExecutor.java:1487)
at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1364)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.infinispan.util.concurrent.TimeoutException: ISPN000451: Timed out waiting for view 7, current view is 6
at org.infinispan.topology.LocalTopologyManagerImpl.waitForView(LocalTopologyManagerImpl.java:558)
at org.infinispan.topology.LocalTopologyManagerImpl.executeOnCoordinatorRetry(LocalTopologyManagerImpl.java:598)
at org.infinispan.topology.LocalTopologyManagerImpl.isCacheRebalancingEnabled(LocalTopologyManagerImpl.java:580)
at org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete(StateTransferManagerImpl.java:233)
at org.infinispan.cache.impl.CacheImpl.start(CacheImpl.java:1056)
at org.infinispan.cache.impl.AbstractDelegatingCache.start(AbstractDelegatingCache.java:451)
at org.infinispan.manager.DefaultCacheManager.wireAndStartCache(DefaultCacheManager.java:653)
at org.infinispan.manager.DefaultCacheManager.createCache(DefaultCacheManager.java:598)
at org.infinispan.manager.DefaultCacheManager.internalGetCache(DefaultCacheManager.java:481)
at org.infinispan.manager.DefaultCacheManager.getCache(DefaultCacheManager.java:465)
at org.infinispan.manager.impl.AbstractDelegatingEmbeddedCacheManager.getCache(AbstractDelegatingEmbeddedCacheManager.java:157)
at org.infinispan.server.infinispan.SecurityActions.lambda$startCache$4(SecurityActions.java:122)
at org.infinispan.security.Security.doPrivileged(Security.java:44)
at org.infinispan.server.infinispan.SecurityActions.doPrivileged(SecurityActions.java:69)
at org.infinispan.server.infinispan.SecurityActions.startCache(SecurityActions.java:126)
at org.jboss.as.clustering.infinispan.subsystem.CacheService.start(CacheService.java:87)
at org.jboss.msc.service.ServiceControllerImpl$StartTask.startService(ServiceControllerImpl.java:1736)
at org.jboss.msc.service.ServiceControllerImpl$StartTask.execute(ServiceControllerImpl.java:1698)
... 6 more
{noformat}
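The failure mode described above can be modelled in a few lines. This is only a simplified, hypothetical simulation, written in JavaScript for brevity even though Infinispan itself is Java. None of the names ({{askCoordinator}}, {{executeOnCoordinatorRetry}}, the {{ticks}} list) are Infinispan or JGroups APIs; they are assumptions made for illustration. After a {{CacheNotFoundResponse}} the caller waits for a newer view, and a fork restart alone never produces one.

```javascript
// Hypothetical, simplified model; these names are NOT Infinispan/JGroups APIs.

// The coordinator only answers while its fork is running.
function askCoordinator(coordinator) {
  return coordinator.forkRunning ? 'OK' : 'CacheNotFoundResponse';
}

// Sends the request and, after a CacheNotFoundResponse, waits for a view
// newer than the one the request was sent in before retrying.
// `ticks` is a list of callbacks, one per simulated time step, that may
// change the coordinator; running out of ticks models the timeout.
function executeOnCoordinatorRetry(coordinator, ticks) {
  var tick = 0;
  while (true) {
    var requestViewId = coordinator.viewId;
    if (askCoordinator(coordinator) === 'OK') {
      return 'OK';
    }
    // Assumption under test: "not found" means the coordinator left,
    // so a new view must be on its way.
    while (coordinator.viewId <= requestViewId) {
      if (tick >= ticks.length) {
        return 'TIMEOUT waiting for view ' + (requestViewId + 1);
      }
      ticks[tick++](coordinator);
    }
  }
}

function noop() {}
// Restarting only the fork does not install a new view.
function restartFork(c) { c.forkRunning = true; }
// Restarting the underlying channel installs a new view as well.
function restartChannel(c) { c.viewId += 1; c.forkRunning = true; }

console.log(executeOnCoordinatorRetry(
  { viewId: 6, forkRunning: false }, [noop, restartFork, noop]));
// prints "TIMEOUT waiting for view 7"
console.log(executeOnCoordinatorRetry(
  { viewId: 6, forkRunning: false }, [restartChannel]));
// prints "OK"
```

In the first run the fork comes back while the caller is still blocked waiting for view 7, which matches the shape of the log above: the request is never retried and the wait ends in a timeout.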
[JBoss JIRA] (ISPN-9541) Module initialization is not thread-safe
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-9541?page=com.atlassian.jira.plugin.... ]
Dan Berindei updated ISPN-9541:
-------------------------------
Sprint: Sprint 9.4.0.Final
> Module initialization is not thread-safe
> ----------------------------------------
>
> Key: ISPN-9541
> URL: https://issues.jboss.org/browse/ISPN-9541
> Project: Infinispan
> Issue Type: Bug
> Components: Core, Server
> Affects Versions: 9.4.0.CR3
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Fix For: 9.4.0.Final
>
>
> In my ISPN-9127 fix I created a {{BasicComponentRegistry}} interface that represents a mostly-read-only collection of components. It has a {{replaceComponent()}} method and a {{rewire()}} method for testing purposes, but it turns out modules were also relying on the ability to replace existing components in order to work.
> Replacing global components is normally safe during {{ModuleLifecycle.cacheManagerStarting()}}, because none of the components are started yet, so when a component starts later we can still start its dependencies first. But because some modules start some global components, e.g. by calling {{manager.getCache(name)}}, that assumption breaks.
> The {{infinispan-server-event-logger}} module is a bit more sneaky: it doesn't replace a component, instead it replaces the actual implementation of the event logger in the {{EventLogManager}} component. Events that happen before the module's {{cacheManagerStarting()}} or after {{cacheManagerStopping()}} will be silently dropped from the persistent event log.
> I am investigating making the module a factory of factories. Instead of having a monolithic {{cacheManagerStarting()}} method, it could define a set of components that it can create, and a set of components that should be started before the cache manager is "running". We probably need a way to depend on other modules as well, maybe reusing the {{@Inject}} and {{@ComponentName}} annotations.
[JBoss JIRA] (HRJS-79) Registering listener with undefined listenerId does nothing
by Anna Manukyan (JIRA)
Anna Manukyan created HRJS-79:
---------------------------------
Summary: Registering listener with undefined listenerId does nothing
Key: HRJS-79
URL: https://issues.jboss.org/browse/HRJS-79
Project: Infinispan Javascript client
Issue Type: Bug
Reporter: Anna Manukyan
I was trying the following scenario: passing a listenerId when adding new listeners.
I created 3 listeners, for the create, modify and remove events. When adding the listeners for the modify and remove events I passed a listenerId, but for the modify event the listenerId was a nonexistent string.
I was curious what would happen. Functionally nothing happened: the modify event is never fired, and in the logs I can see the following:
{code}
[2018-09-25T22:24:55.845] [DEBUG] client - Invoke addListener(msgId=464,event=create,listenerId=listener_465,opts=undefined) remotely
[2018-09-25T22:24:55.845] [TRACE] encoder - Encode operation with topology id 0
[2018-09-25T22:24:55.846] [TRACE] listener - Create listener emitter for connection 127.0.0.1:11222@conn_15 and listener with listenerId=listener_465
[2018-09-25T22:24:55.846] [TRACE] io_7 - Write buffer(msgId=464) to 127.0.0.1:11222@conn_15: A0D0031D2500000300010D00010D000C6C697374656E65725F343635000000000F
[2018-09-25T22:24:55.847] [TRACE] decoder - Read header(msgId=464): opCode=38, status=0, hasNewTopology=0
[2018-09-25T22:24:55.847] [TRACE] decoder - Call decode for request(msgId=464)
[2018-09-25T22:24:55.847] [TRACE] io_7_conn_15 - Complete success for request(msgId=464) with true
[2018-09-25T22:24:55.847] [TRACE] io_7_conn_15 - After decoding request(msgId=464), buffer size is 6, and offset 6
[2018-09-25T22:24:55.847] [DEBUG] client - Invoke addListener(msgId=466,event=modify,opts={"listenerId":"blblbl"}) locally
{color:red}[2018-09-25T22:24:55.848] [TRACE] listener - Create listener emitter for connection undefined and listener with listenerId=blblbl{color}
[2018-09-25T22:24:55.848] [DEBUG] client - Invoke addListener(msgId=467,event=remove,opts={"listenerId":"listener_465"}) locally
{code}
So the first line registers the "create" event listener remotely. For the "modify" and "remove" events the "addLocalListener()" function is called. I am worried about the line marked in red, where the connection is passed as {color:red}undefined{color}.
I found that the connection parameter is not passed to the protocol's addListener function when it is called from the addLocalListener() function, but I couldn't find a way to fix that.
Should it be that way?
I know this scenario is unlikely, but if a developer makes this mistake, he/she may spend a lot of time finding out why the event is not fired.
The code that generates this log is given here:
{code}
...code in the test...
it('fails when trying to attach to non-existent listener', function(done) {
  client.then(function (client) {
    var clientAddListenerCreate = client.addListener(
      'create', function(key) { console.log('[Event] Created key: ' + key); });
    var clientAddListeners = clientAddListenerCreate.then(
      function(listenerId) {
        // Multiple callbacks can be associated with a single client-side listener.
        // This is achieved by registering listeners with the same listener id
        // as shown in the example below.
        var clientAddListenerModify = client.addListener(
          'modify', function(key) { console.log('[Event] Modified key: ' + key); },
          {listenerId: 'blblbl'});
        var clientAddListenerRemove = client.addListener(
          'remove', function(key) { console.log('[Event] Removed key: ' + key); },
          {listenerId: listenerId});
        return Promise.all([clientAddListenerModify, clientAddListenerRemove]);
      });
    var clientCreate = clientAddListeners.then(
      function() { return client.putIfAbsent('eventful', 'v0'); });
    var clientModify = clientCreate.then(
      function() { return client.replace('eventful', 'v1'); });
    var clientRemove = clientModify.then(
      function() { return client.remove('eventful'); });
    var clientRemoveListener =
      Promise.all([clientAddListenerCreate, clientRemove]).then(
        function(values) {
          var listenerId = values[0];
          return client.removeListener(listenerId);
        });
  }).catch(t.failed(done)).finally(done);
});
........
{code}
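One possible way to surface this mistake instead of silently doing nothing is to reject a local registration whose listenerId is unknown. The following is only a minimal sketch under that assumption: {{createListenerRegistry}}, {{registerRemote}} and {{attachLocal}} are hypothetical names made up for illustration and are not part of the actual client code.

```javascript
// Hypothetical sketch; these names are NOT the real client's internals.
function createListenerRegistry() {
  // listenerId -> { connection, callbacks }
  var emitters = new Map();

  return {
    // Called once the server has acknowledged a remote addListener.
    registerRemote: function (listenerId, connection) {
      emitters.set(listenerId, { connection: connection, callbacks: [] });
    },

    // Attaches another callback to an already registered listener.
    // Throws for an unknown listenerId rather than creating an emitter
    // that has no connection behind it.
    attachLocal: function (listenerId, event, callback) {
      var emitter = emitters.get(listenerId);
      if (emitter === undefined) {
        throw new Error('No listener registered with listenerId=' + listenerId);
      }
      emitter.callbacks.push({ event: event, callback: callback });
      return emitter.connection;
    }
  };
}

var registry = createListenerRegistry();
registry.registerRemote('listener_465', '127.0.0.1:11222@conn_15');
registry.attachLocal('listener_465', 'remove', function (key) {});
try {
  registry.attachLocal('blblbl', 'modify', function (key) {});
} catch (e) {
  console.log(e.message); // prints "No listener registered with listenerId=blblbl"
}
```

In a promise-based API the same check could be turned into a rejected promise, so that the failure reaches the caller's {{catch}} handler.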
[JBoss JIRA] (HRJS-78) Adding listener with unknown type succeeds
by Anna Manukyan (JIRA)
Anna Manukyan created HRJS-78:
---------------------------------
Summary: Adding listener with unknown type succeeds
Key: HRJS-78
URL: https://issues.jboss.org/browse/HRJS-78
Project: Infinispan Javascript client
Issue Type: Bug
Reporter: Anna Manukyan
When adding a new listener with an unknown event type, nothing happens. From the logs I can see that the listener was added successfully. See the logs below:
{code}
[2018-09-25T22:24:55.824] [DEBUG] client - Invoke addListener(msgId=460,event=barev,listenerId=listener_461,opts=undefined) remotely
[2018-09-25T22:24:55.824] [TRACE] encoder - Encode operation with topology id 0
[2018-09-25T22:24:55.825] [TRACE] listener - Create listener emitter for connection 127.0.0.1:11222@conn_15 and listener with listenerId=listener_461
[2018-09-25T22:24:55.825] [TRACE] io_7 - Write buffer(msgId=460) to 127.0.0.1:11222@conn_15: A0CC031D2500000300010D00010D000C6C697374656E65725F343631000000000F
[2018-09-25T22:24:55.827] [TRACE] decoder - Read header(msgId=460): opCode=38, status=0, hasNewTopology=0
[2018-09-25T22:24:55.827] [TRACE] decoder - Call decode for request(msgId=460)
[2018-09-25T22:24:55.827] [TRACE] io_7_conn_15 - Complete success for request(msgId=460) with true
[2018-09-25T22:24:55.827] [TRACE] io_7_conn_15 - After decoding request(msgId=460), buffer size is 6, and offset 6
{code}
The code that produces these logs is given here:
{code}
... code from spec...
it('fails when wrong event name is passed', function(done) {
  client.then(t.on('barev', t.expectEvent('wrongNameCreate', done, true, 'value')))
    .then(t.assert(t.putIfAbsent('wrongNameCreate', 'value'), t.toBeTruthy))
    .catch(t.failed(done)).finally(done);
});
...
{code}
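A guard along the following lines could make the call fail early. This is a minimal sketch, not the client's actual code: {{validateEventName}} is a hypothetical helper, and the list of accepted names is an assumption based only on the event types mentioned in these reports, so it may be incomplete.

```javascript
// Hypothetical helper; the accepted names are an assumption taken from
// the event types mentioned in these reports and may be incomplete.
var KNOWN_EVENTS = ['create', 'modify', 'remove'];

// Returns the event name unchanged if it is known, throws otherwise.
function validateEventName(event) {
  if (KNOWN_EVENTS.indexOf(event) === -1) {
    throw new Error("Unknown event type '" + event +
      "', expected one of: " + KNOWN_EVENTS.join(', '));
  }
  return event;
}

try {
  validateEventName('barev');
} catch (e) {
  console.log(e.message);
  // prints "Unknown event type 'barev', expected one of: create, modify, remove"
}
```

Running such a check before the request is encoded would keep an unknown event type from being sent to the server as if it were valid.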