[JBoss JIRA] (WFLY-9855) [JDK9+] org.jboss.security.negotiation.spnego package is exported by two jars
by Darran Lofthouse (JIRA)
[ https://issues.jboss.org/browse/WFLY-9855?page=com.atlassian.jira.plugin.... ]
Darran Lofthouse reassigned WFLY-9855:
--------------------------------------
Assignee: (was: Darran Lofthouse)
> [JDK9+] org.jboss.security.negotiation.spnego package is exported by two jars
> -----------------------------------------------------------------------------
>
> Key: WFLY-9855
> URL: https://issues.jboss.org/browse/WFLY-9855
> Project: WildFly
> Issue Type: Bug
> Components: Security
> Reporter: Tomaz Cerar
> Priority: Critical
>
> Currently, if you have
> jboss-negotiation-spnego-3.0.4.Final.jar and jboss-negotiation-extras-3.0.4.Final.jar
> on your module path, the JVM complains because both jars export the package org.jboss.security.negotiation.spnego,
> which violates the JPMS module contract that only one module (jar) may provide a given package.
> Example error printed by the JVM:
> {noformat}
> Error: Modules jboss.negotiation.extras and jboss.negotiation.spnego export package org.jboss.security.negotiation.spnego to module wildfly.clustering.common
> {noformat}
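A quick way to spot this kind of clash ahead of time is to scan the jars for overlapping packages before putting them on the module path. A minimal illustrative sketch (not part of WildFly; the jar names from this report are simply the intended inputs):

```python
import zipfile
from collections import defaultdict

def packages_in_jar(jar_path):
    """Return the Java package names of the .class files inside a jar."""
    with zipfile.ZipFile(jar_path) as jar:
        return {
            name.rsplit("/", 1)[0].replace("/", ".")
            for name in jar.namelist()
            if name.endswith(".class") and "/" in name
        }

def duplicated_packages(jar_paths):
    """Map each package to the jars that contain it, keeping only clashes."""
    owners = defaultdict(list)
    for jar in jar_paths:
        for pkg in packages_in_jar(jar):
            owners[pkg].append(jar)
    return {pkg: jars for pkg, jars in owners.items() if len(jars) > 1}
```

Run over jboss-negotiation-spnego-3.0.4.Final.jar and jboss-negotiation-extras-3.0.4.Final.jar, this would be expected to flag org.jboss.security.negotiation.spnego as provided by both.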
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
7 years, 9 months
[JBoss JIRA] (WFCORE-3876) Composite operation on filesystem-realm blocks management operations
by Darran Lofthouse (JIRA)
[ https://issues.jboss.org/browse/WFCORE-3876?page=com.atlassian.jira.plugi... ]
Darran Lofthouse reassigned WFCORE-3876:
----------------------------------------
Assignee: (was: Darran Lofthouse)
> Composite operation on filesystem-realm blocks management operations
> --------------------------------------------------------------------
>
> Key: WFCORE-3876
> URL: https://issues.jboss.org/browse/WFCORE-3876
> Project: WildFly Core
> Issue Type: Bug
> Components: Security
> Reporter: Claudio Miranda
>
> There is a problem when adding an identity and an identity attribute to a filesystem-realm as a composite operation: the composite operation blocks, and while it runs, other management operations block as well. This problem only occurs in domain mode.
> The filesystem-realm=file1 was created with no errors.
> Add an identity and an identity attribute as a composite operation:
> {code}
> batch
> /host=master/server=server-three/subsystem=elytron/filesystem-realm=file1:add-identity(identity=user3)
> /host=master/server=server-three/subsystem=elytron/filesystem-realm=file1:add-identity-attribute(identity=user3,name=key1,value=[val1,val11])
> run-batch
> {code}
> The following composite operation also blocks in the same way (for an existing identity named user3):
> {code}
> batch
> /host=master/server=server-three/subsystem=elytron/filesystem-realm=file1:add-identity-attribute(identity=user3,name=key3,value=[val3,val33])
> /host=master/server=server-three/subsystem=elytron/filesystem-realm=file1:add-identity-attribute(identity=user3,name=key4,value=[val4,val44])
> run-batch
> {code}
> The following operations just block, waiting for the above operation to finish:
> {code}
> /host=master/server=server-three/subsystem=elytron/filesystem-realm=file1:read-identity(identity=other)
> {code}
> {code}
> /profile=full-ha/subsystem=elytron/filesystem-realm=file3:add(path=file3)
> {code}
> {code}
> /profile=full-ha/subsystem=elytron/properties-realm=props1:add(users-properties={path=application-users.properties,relative-to=jboss.domain.config.dir,digest-realm-name=ApplicationRealm})
> {code}
> It also blocks write-attribute operations on other subsystems:
> {code}
> /profile=full-ha/subsystem=io/worker=default:write-attribute(name=task-max-threads,value=100)
> /profile=full-ha/subsystem=datasources/data-source=ExampleDS:write-attribute(name=max-pool-size,value=12)
> {code}
> The last operation is reported as a non-progressing operation:
> {code}
> /host=master/core-service=management/service=management-operations:find-non-progressing-operation
> {
> "outcome" => "success",
> "result" => "500616352"
> }
> [domain@localhost:9990 /] /host=master/core-service=management/service=management-operations/active-operation=*:read-resource
> {
> "outcome" => "success",
> "result" => [
> {
> "address" => [
> ("host" => "master"),
> ("core-service" => "management"),
> ("service" => "management-operations"),
> ("active-operation" => "-886331830")
> ],
> "outcome" => "success",
> "result" => {
> "access-mechanism" => "NATIVE",
> "address" => [
> ("host" => "master"),
> ("core-service" => "management"),
> ("service" => "management-operations"),
> ("active-operation" => "*")
> ],
> "caller-thread" => "management-handler-thread - 15",
> "cancelled" => false,
> "domain-rollout" => false,
> "domain-uuid" => undefined,
> "exclusive-running-time" => -1L,
> "execution-status" => "executing",
> "operation" => "read-resource",
> "running-time" => 2341554L
> }
> },
> {
> "address" => [
> ("host" => "master"),
> ("core-service" => "management"),
> ("service" => "management-operations"),
> ("active-operation" => "-199839257")
> ],
> "outcome" => "success",
> "result" => {
> "access-mechanism" => "NATIVE",
> "address" => [],
> "caller-thread" => "management-handler-thread - 13",
> "cancelled" => false,
> "domain-rollout" => false,
> "domain-uuid" => undefined,
> "exclusive-running-time" => -1L,
> "execution-status" => "executing",
> "operation" => "composite",
> "running-time" => 37961295588L
> }
> },
> {
> "address" => [
> ("host" => "master"),
> ("core-service" => "management"),
> ("service" => "management-operations"),
> ("active-operation" => "500616352")
> ],
> "outcome" => "success",
> "result" => {
> "access-mechanism" => "NATIVE",
> "address" => [
> ("profile" => "full-ha"),
> ("subsystem" => "io"),
> ("worker" => "default")
> ],
> "caller-thread" => "management-handler-thread - 14",
> "cancelled" => false,
> "domain-rollout" => false,
> "domain-uuid" => "87a864ef-e287-4436-acd5-4842459dfc2e",
> "exclusive-running-time" => 33671893306L,
> "execution-status" => "executing",
> "operation" => "write-attribute",
> "running-time" => 33671782128L
> }
> }
> ]
> }
> {code}
> After the operation timeout, the second step of the composite operation fails with an error:
> {code}
> The batch failed with the following error (you are remaining in the batch editing mode to have a chance to correct the error):
> WFLYCTL0062: Composite operation failed and was rolled back. Steps that failed:
> Step: step-2
> Operation: /host=master/server=server-three/subsystem=elytron/filesystem-realm=file1:add-identity-attribute(identity=user3,name=key1,value=[val1,val11])
> Failure: WFLYCTL0409: Execution of operation 'add-identity-attribute' on remote process at address '[
> ("host" => "master"),
> ("server" => "server-three")
> ]' timed out after 305000 ms while awaiting initial response; remote process has been notified to terminate operation
> {code}
> I understand that the identity attribute could be added outside a composite operation, and that works, but the problem is the blocking effect it has on the other operations.
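> For reference, once the non-progressing operation id is known (as shown above), the blocked operation can usually be cancelled from the CLI. A sketch using the id from this report; verify the operation names against your WildFly version:
> {code}
> /host=master/core-service=management/service=management-operations:find-non-progressing-operation
> /host=master/core-service=management/service=management-operations/active-operation=500616352:cancel
> {code}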
[JBoss JIRA] (WFLY-9702) SSO Integration for Programmatic Authentication
by Darran Lofthouse (JIRA)
[ https://issues.jboss.org/browse/WFLY-9702?page=com.atlassian.jira.plugin.... ]
Darran Lofthouse reassigned WFLY-9702:
--------------------------------------
Assignee: (was: Darran Lofthouse)
> SSO Integration for Programmatic Authentication
> -----------------------------------------------
>
> Key: WFLY-9702
> URL: https://issues.jboss.org/browse/WFLY-9702
> Project: WildFly
> Issue Type: Feature Request
> Components: Clustering, Security, Web (Undertow)
> Reporter: Darran Lofthouse
> Priority: Critical
>
> At the moment the SSO integration only fully covers authentication mechanisms, as they can be wrapped; we need to revisit this for programmatic authentication.
> In this scenario we don't have either a wrapped mechanism or a CallbackHandler.
> A couple of options:
> * Can we get away with pushing in some form of IdentityCache factory that the mechanisms can obtain from the request? This may miss the additional notifications the SSO implementation depends on.
> * Can we also better support listening for the notifications without the need for wrappers? This could cover both mechanisms and programmatic authentication.
> * Instead, do we make the programmatic authenticator pluggable, i.e. push in an SSO-aware implementation? It can choose how to handle its own caching and does not need the notifications, as it is in control of that stage of the process.
[JBoss JIRA] (WFLY-9212) Cannot control what principal JAASIdentityManagerImpl uses for SecurityContextUtil#createSubjectInfo(Principal, Object, Subject)
by Darran Lofthouse (JIRA)
[ https://issues.jboss.org/browse/WFLY-9212?page=com.atlassian.jira.plugin.... ]
Darran Lofthouse reassigned WFLY-9212:
--------------------------------------
Assignee: (was: Darran Lofthouse)
> Cannot control what principal JAASIdentityManagerImpl uses for SecurityContextUtil#createSubjectInfo(Principal, Object, Subject)
> --------------------------------------------------------------------------------------------------------------------------
>
> Key: WFLY-9212
> URL: https://issues.jboss.org/browse/WFLY-9212
> Project: WildFly
> Issue Type: Feature Request
> Components: Security
> Affects Versions: 10.1.0.Final
> Reporter: Scott Stark
>
> I have a custom JWT-based auth method and JAAS login module that I'm testing with Swarm, which uses WildFly 10.1.0.Final. I am not able to install a custom Principal instance for retrieval via the JAX-RS javax.ws.rs.core.SecurityContext#getUserPrincipal() because of how the JAASIdentityManagerImpl#verifyCredential(final AccountImpl account, final Object credential) method populates the SecurityContext.SecurityContextUtil SubjectInfo.
> The following line:
> https://github.com/wildfly/wildfly/blob/c3332cec0c9bc5dc57899c2ae7ba26dd0...
> sc.getUtil().createSubjectInfo(incomingPrincipal, credential, subject);
> is using the incomingPrincipal, which is derived from the simple AccountImpl wrapping of the incoming String id. In my call path, this is the user id before any authentication has happened, as there is no originalPrincipal value on the AccountImpl; so it is not the best representation of the authenticated caller, and the wrapping Principal instance does not exist amongst the authenticated Subject#getPrincipals().
> I see this is due to a CallerPrincipal issue:
> https://issues.jboss.org/browse/WFLY-3626
> but one needs to be able to control what form of the authenticated user principal is returned by user-facing container getUserPrincipal()-type API calls.
> See the getPrincipalClass() unit test in this repo for an example of what is being tested:
> https://github.com/MicroProfileJWT/microprofile-jwt-auth-wfswarm/blob/f39...
> I can work around this by overriding the SubjectInfo immediately after the auth mechanism has called SecurityContext#authenticationComplete(...), using:
> {code}
> // Workaround: authenticated JWTPrincipal not being installed as the user principal
> org.jboss.security.SecurityContext jbSC = SecurityContextAssociation.getSecurityContext();
> Subject subject = jbSC.getUtil().getSubject();
> jbSC.getUtil().createSubjectInfo(jwtPrincipal, bearerToken, subject);
> {code}
> I have tried wrapping the Account passed into SecurityContext#authenticationComplete(...) and that has not worked so far, but I'll look again to see if there is something else I can override to achieve the desired behavior.
[JBoss JIRA] (WFLY-10042) Elytron tests fail intermittently
by Darran Lofthouse (JIRA)
[ https://issues.jboss.org/browse/WFLY-10042?page=com.atlassian.jira.plugin... ]
Darran Lofthouse reassigned WFLY-10042:
---------------------------------------
Assignee: (was: Darran Lofthouse)
> Elytron tests fail intermittently
> ---------------------------------
>
> Key: WFLY-10042
> URL: https://issues.jboss.org/browse/WFLY-10042
> Project: WildFly
> Issue Type: Bug
> Components: Security
> Reporter: Stuart Douglas
>
> The JMX MBean server service does not have correct dependencies set on the security domain, and as a result unregistering the Arquillian MBean can fail on reload.
> If this happens, all subsequent tests will fail as the Arquillian service will not start correctly.
> An example run is at: https://ci.wildfly.org/viewLog.html?buildId=89151&buildTypeId=WFPR&tab=bu...
> {code}
> 2018-02-12 09:31:55,112 INFO [org.wildfly.extension.undertow] (ServerService Thread Pool -- 31) WFLYUT0022: Unregistered web context: '/chained-principal-transformer-transform-transformed' from server 'default-server'
> 2018-02-12 09:31:55,118 INFO [org.jboss.as.server.deployment] (MSC service thread 1-1) WFLYSRV0028: Stopped deployment chained-principal-transformer-transform-transformed.war (runtime-name: chained-principal-transformer-transform-transformed.war) in 6ms
> 2018-02-12 09:31:55,129 INFO [org.jboss.as.repository] (management-handler-thread - 1) WFLYDR0002: Content removed from location /store/work/tc-work/9ccd5e119c4a65d0/testsuite/integration/elytron/target/wildfly/standalone/data/content/7b/30341090e956f73f2066f8e357380151a337e8/content
> 2018-02-12 09:31:55,129 INFO [org.jboss.as.server] (management-handler-thread - 1) WFLYSRV0009: Undeployed "chained-principal-transformer-transform-transformed.war" (runtime-name: "chained-principal-transformer-transform-transformed.war")
> 2018-02-12 09:31:55,563 ERROR [org.jboss.as.arquillian] (MSC service thread 1-8) Cannot stop Arquillian Test Runner: java.lang.IllegalStateException
> at org.jboss.msc.value.InjectedValue.getValue(InjectedValue.java:47)
> at org.jboss.as.controller.access.management.ManagementSecurityIdentitySupplier.get(ManagementSecurityIdentitySupplier.java:60)
> at org.jboss.as.controller.access.management.ManagementSecurityIdentitySupplier.get(ManagementSecurityIdentitySupplier.java:39)
> at org.jboss.as.jmx.PluggableMBeanServerImpl.log(PluggableMBeanServerImpl.java:1180)
> at org.jboss.as.jmx.MBeanServerAuditLogRecordFormatter.log(MBeanServerAuditLogRecordFormatter.java:331)
> at org.jboss.as.jmx.MBeanServerAuditLogRecordFormatter.isRegistered(MBeanServerAuditLogRecordFormatter.java:176)
> at org.jboss.as.jmx.PluggableMBeanServerImpl.isRegistered(PluggableMBeanServerImpl.java:784)
> at org.jboss.arquillian.protocol.jmx.JMXTestRunner.unregisterMBean(JMXTestRunner.java:109)
> at org.jboss.as.arquillian.service.ArquillianService.stop(ArquillianService.java:96)
> at org.jboss.msc.service.ServiceControllerImpl$StopTask.stopService(ServiceControllerImpl.java:1767)
> at org.jboss.msc.service.ServiceControllerImpl$StopTask.execute(ServiceControllerImpl.java:1740)
> at org.jboss.msc.service.ServiceControllerImpl$ControllerTask.run(ServiceControllerImpl.java:1527)
> at org.jboss.threads.EnhancedQueueExecutor.safeRun(EnhancedQueueExecutor.java:1979)
> at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.doRunTask(EnhancedQueueExecutor.java:1481)
> at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1360)
> at java.lang.Thread.run(Thread.java:748)
> {code}
[JBoss JIRA] (WFLY-10755) ISPN000208: No live owners found for segments
by tommaso borgato (JIRA)
[ https://issues.jboss.org/browse/WFLY-10755?page=com.atlassian.jira.plugin... ]
tommaso borgato closed WFLY-10755.
----------------------------------
Release Notes Text: the job ran on a segment with UDP problems; going to run the TCP stack counterpart
Resolution: Explained
> ISPN000208: No live owners found for segments
> ---------------------------------------------
>
> Key: WFLY-10755
> URL: https://issues.jboss.org/browse/WFLY-10755
> Project: WildFly
> Issue Type: Bug
> Components: Clustering
> Affects Versions: 14.0.0.CR1
> Reporter: tommaso borgato
> Assignee: Paul Ferraro
>
> The error {{*"ISPN000208: No live owners found for segments"*}} was observed in scenario [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4|https://jen...].
> The scenario is composed of a 4-node cluster configured with an invalidation cache backed by a PostgreSQL database:
> {noformat}
> <cache-container name="web" default-cache="repl" module="org.wildfly.clustering.web.infinispan">
> <transport lock-timeout="60000"/>
> <distributed-cache owners="2" name="dist">
> <locking isolation="REPEATABLE_READ"/>
> <transaction mode="BATCH"/>
> <file-store/>
> </distributed-cache>
> <replicated-cache name="repl">
> <locking isolation="REPEATABLE_READ"/>
> <transaction mode="BATCH"/>
> <file-store/>
> </replicated-cache>
> <invalidation-cache name="offload">
> <locking isolation="REPEATABLE_READ"/>
> <transaction mode="BATCH"/>
> <jdbc-store data-source="testDS" fetch-state="false" passivation="false" purge="false" shared="true" dialect="POSTGRES">
> <table prefix="s">
> <id-column name="id" type="VARCHAR(255)"/>
> <data-column name="datum" type="BYTEA"/>
> <timestamp-column name="version" type="BIGINT"/>
> </table>
> </jdbc-store>
> </invalidation-cache>
> </cache-container>
> {noformat}
> h2. First run [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4 run 20|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/EA...]
> The error was observed twice on node dev212.
> The first time, right after node dev214 left the cluster:
> {noformat}
> [JBossINF] [0m[0m09:08:34,196 INFO [org.infinispan.CLUSTER] (thread-22,ejb,dev212) ISPN000094: Received new cluster view for channel ejb: [dev212|8] (3) [dev212, dev213, dev215]
> [JBossINF] [0m[0m09:08:34,197 INFO [org.infinispan.CLUSTER] (thread-22,ejb,dev212) ISPN100001: Node dev214 left the cluster
> [JBossINF] [0m[33m09:08:34,362 WARN [org.infinispan.interceptors.impl.InvalidationInterceptor] (timeout-thread--p10-t1) ISPN000268: Unable to broadcast evicts as a part of the prepare phase. Rolling back.: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 33 from dev215
> [JBossINF] at org.infinispan.remoting.transport.impl.MultiTargetRequest.onTimeout(MultiTargetRequest.java:167)
> [JBossINF] at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:87)
> [JBossINF] at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:22)
> [JBossINF] at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> [JBossINF] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> [JBossINF] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> [JBossINF] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> [JBossINF] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> [JBossINF] at java.lang.Thread.run(Thread.java:748)
> [JBossINF]
> ...
> [JBossINF] [0m[31m09:08:52,772 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {4 7-9 12-13 30-31 37 49 59 76-77 88-89 92 118-120 156-157 196 205 235 251} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev214]
> {noformat}
> and the second time, right after node dev215 left the cluster:
> {noformat}
> [JBossINF] [0m[0m09:11:32,029 INFO [org.infinispan.CLUSTER] (thread-24,ejb,dev212) ISPN100000: Node dev214 joined the cluster
> [JBossINF] [0m[0m09:11:32,029 INFO [org.infinispan.CLUSTER] (thread-24,ejb,dev212) ISPN100001: Node dev213 left the cluster
> [JBossINF] [0m[0m09:11:32,029 INFO [org.infinispan.CLUSTER] (thread-24,ejb,dev212) ISPN100001: Node dev215 left the cluster
> [JBossINF] [0m[31m09:11:32,030 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-3 36 48 55-58 65 75 90 93 108-109 126 150 172 176-177 179-180 204 229-230} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215, dev214]
> [JBossINF] [0m[31m09:11:32,030 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-4 7-9 12-13 30-31 36-37 48-49 55-59 65 75-77 88-90 92-93 108-109 118-120 126 150 156-157 172 176-177 179-180 196 204-205 229-230 235 251} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215, dev214]
> [JBossINF] [0m[0m09:12:29,829 INFO [org.infinispan.CLUSTER] (thread-21,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev214|10] (4) [dev214, dev212, dev213, dev215], 2 subgroups: [dev212|8] (3) [dev212, dev213, dev215], [dev214|9] (2) [dev214, dev212]
> {noformat}
> bq. Please note that the log saying node dev213 left the cluster looks suspicious: node dev213 was started at 8:59:53 and then restarted at 9:12:29, so a log entry saying node dev213 left the cluster at 9:11:32 is a bit strange.
> This run already used modified jgroups time-outs:
> {noformat}
> <protocol type="FD_ALL">
> <property name="timeout">10000</property>
> <property name="interval">2000</property>
> <property name="timeout_check_interval">1000</property>
> </protocol>
> <protocol type="VERIFY_SUSPECT">
> <property name="timeout">1000</property>
> </protocol>
> {noformat}
> h2. Second run [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4_JJB run 18|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/EA...]
> The error was also observed in a [previous run|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/E...] where the *FD_ALL* and *VERIFY_SUSPECT* values were unmodified.
> h2. Third run [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4_JJB run 21|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/EA...]
> The error was also observed in a [run|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/E...] where the *FD_ALL* and *VERIFY_SUSPECT* values were set according to what this [JIRA|https://issues.jboss.org/browse/ISPN-9087] states the previous FD_ALL values were:
> {noformat}
> <FD_ALL timeout="60000"
> interval="15000"
> timeout_check_interval="5000"
> />
> {noformat}
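> For reference, timeout values like these can also be applied through the jgroups subsystem from the CLI. A sketch only; the resource layout and property names vary between WildFly versions, so treat this as illustrative:
> {code}
> /subsystem=jgroups/stack=udp/protocol=FD_ALL:write-attribute(name=properties, value={timeout=60000, interval=15000, timeout_check_interval=5000})
> {code}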
> In this run, the error is observed on node dev212:
> {noformat}
> [JBossINF] [0m[33m03:56:59,728 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev212) JGRP000032: dev212: no physical address for 2806f77e-ee15-45dc-283d-683a4828e878, dropping message
> [JBossINF] [0m[0m03:58:02,336 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
> [JBossINF] [0m[0m03:58:02,336 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster
> [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster
> [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster
> [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
> [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster
> [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster
> [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster
> [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
> [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster
> [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster
> [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster
> [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
> [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster
> [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster
> [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster
> [JBossINF] [0m[31m03:58:02,339 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-3 21-26 30 46 53-54 58-59 64 69 75 82-83 88 142 150 233 236} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215]
> [JBossINF] [0m[31m03:58:02,339 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-3 21-26 30 46 53-54 58-59 64 69 75 82-83 88 142 150 233 236} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215]
> [JBossINF] [0m[33m03:58:02,340 WARN [org.infinispan.statetransfer.InboundTransferTask] (stateTransferExecutor-thread--p20-t14) ISPN000210: Failed to request state of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar from node dev214, segments {47-48 65 87 102 157 163 187-188 190-191 221-223 228 232}: org.infinispan.remoting.transport.jgroups.SuspectException: ISPN000400: Node dev214 was suspected
> {noformat}
> but the logs on dev214 show the node wasn't down; it was just restarted and was logging the following:
> {noformat}
> [JBossINF] [0m[0m03:56:14,093 INFO [org.jboss.as.server] (Controller Boot Thread) WFLYSRV0212: Resuming server
> [JBossINF] [0m[0m03:56:14,095 INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0060: Http management interface listening on http://10.16.176.60:9990/management
> [JBossINF] [0m[0m03:56:14,095 INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0051: Admin console listening on http://10.16.176.60:9990
> [JBossINF] [0m[0m03:56:14,095 INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0025: WildFly Full 14.0.0.Beta2-SNAPSHOT (WildFly Core 6.0.0.Alpha4) started in 8533ms - Started 1156 of 1353 services (511 services are lazy, passive or on-demand)
> 2018/07/29 03:56:14:095 EDT [DEBUG][Thread-89] HOST dev220.mw.lab.eng.bos.redhat.com:rootProcess:test - JBossStartup, server started!
> [JBossINF] [0m[33m03:57:13,441 WARN [org.jgroups.protocols.pbcast.NAKACK2] (thread-8,ejb,dev214) JGRP000011: dev214: dropped message 43 from non-member dev213 (view=[dev214|0] (1) [dev214]) (received 17 identical messages from dev213 in the last 61714 ms)
> [JBossINF] [0m[33m03:57:15,289 WARN [org.jgroups.protocols.pbcast.NAKACK2] (thread-8,ejb,dev214) JGRP000011: dev214: dropped message 90 from non-member dev215 (view=[dev214|0] (1) [dev214]) (received 3 identical messages from dev215 in the last 61551 ms)
> [JBossINF] [0m[33m03:57:57,334 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message
> [JBossINF] [0m[33m03:57:59,339 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message
> [JBossINF] [0m[33m03:58:01,342 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message
> [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
> [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster
> [JBossINF] [0m[0m03:58:02,339 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster
> [JBossINF] [0m[0m03:58:02,340 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
> [JBossINF] [0m[0m03:58:02,340 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster
> [JBossINF] [0m[0m03:58:02,341 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster
> [JBossINF] [0m[0m03:58:02,341 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
> [JBossINF] [0m[0m03:58:02,341 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster
> [JBossINF] [0m[0m03:58:02,342 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster
> [JBossINF] [0m[0m03:58:02,342 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
> [JBossINF] [0m[0m03:58:02,343 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster
> [JBossINF] [0m[0m03:58:02,344 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster
> [JBossINF] [0m[33m03:58:03,345 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message
> [JBossINF] [0m[33m03:58:05,347 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message
> [JBossINF] [0m[33m03:58:07,350 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message
> ...
> {noformat}
> h2. Fourth run [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4_JJB run 23|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/EA...]
> We made an attempt with the following settings on the same segment (perf27-eap), the idea being to allow a longer time-out to verify suspected nodes:
> {noformat}
> <protocol type="FD_ALL">
> <property name="timeout">5000</property>
> <property name="interval">1000</property>
> <property name="timeout_check_interval">2000</property>
> </protocol>
> <protocol type="VERIFY_SUSPECT">
> <property name="timeout">5000</property>
> </protocol>
> {noformat}
> We didn't observe the error, but we observed the following {{*ERROR*}} on node {{*[dev215|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/EAP7-Clustering_JJB/view/clustering-db-session-tests/job/eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4_JJB/23/console-dev215/]*}}:
> {noformat}
> [JBossINF] [0m[31m13:42:40,794 ERROR [org.infinispan.CLUSTER] (transport-thread--p14-t5) ISPN000196: Failed to recover cluster state after the current node became the coordinator (or after merge): java.util.concurrent.ExecutionException: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 34 from dev214
> [JBossINF] at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
> [JBossINF] at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
> [JBossINF] at org.infinispan.util.concurrent.CompletableFutures.await(CompletableFutures.java:93)
> [JBossINF] at org.infinispan.topology.ClusterTopologyManagerImpl.executeOnClusterSync(ClusterTopologyManagerImpl.java:585)
> [JBossINF] at org.infinispan.topology.ClusterTopologyManagerImpl.recoverClusterStatus(ClusterTopologyManagerImpl.java:450)
> [JBossINF] at org.infinispan.topology.ClusterTopologyManagerImpl.becomeCoordinator(ClusterTopologyManagerImpl.java:334)
> [JBossINF] at org.infinispan.topology.ClusterTopologyManagerImpl.handleClusterView(ClusterTopologyManagerImpl.java:313)
> [JBossINF] at org.infinispan.topology.ClusterTopologyManagerImpl.access$500(ClusterTopologyManagerImpl.java:87)
> [JBossINF] at org.infinispan.topology.ClusterTopologyManagerImpl$ClusterViewListener.lambda$handleViewChange$0(ClusterTopologyManagerImpl.java:731)
> [JBossINF] at org.infinispan.executors.LimitedExecutor.runTasks(LimitedExecutor.java:175)
> [JBossINF] at org.infinispan.executors.LimitedExecutor.access$100(LimitedExecutor.java:37)
> [JBossINF] at org.infinispan.executors.LimitedExecutor$Runner.run(LimitedExecutor.java:227)
> [JBossINF] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> [JBossINF] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> [JBossINF] at org.wildfly.clustering.service.concurrent.ClassLoaderThreadFactory.lambda$newThread$0(ClassLoaderThreadFactory.java:47)
> [JBossINF] at java.lang.Thread.run(Thread.java:748)
> [JBossINF] Caused by: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 34 from dev214
> [JBossINF] at org.infinispan.remoting.transport.impl.MultiTargetRequest.onTimeout(MultiTargetRequest.java:167)
> [JBossINF] at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:87)
> [JBossINF] at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:22)
> [JBossINF] at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> [JBossINF] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> [JBossINF] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> [JBossINF] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> [JBossINF] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> [JBossINF] ... 1 more
> [JBossINF]
> [JBossINF] [0m[31m13:42:40,798 FATAL [org.infinispan.CLUSTER] (transport-thread--p16-t24) ISPN100004: After merge (or coordinator change), the coordinator failed to recover cluster. Cluster members are [dev215, dev214].
> [JBossINF] [0m[31m13:42:40,798 FATAL [org.infinispan.CLUSTER] (transport-thread--p13-t5) ISPN100004: After merge (or coordinator change), the coordinator failed to recover cluster. Cluster members are [dev215, dev214].
> [JBossINF] [0m[31m13:42:40,798 FATAL [org.infinispan.CLUSTER] (transport-thread--p15-t2) ISPN100004: After merge (or coordinator change), the coordinator failed to recover cluster. Cluster members are [dev215, dev214].
> [JBossINF] [0m[31m13:42:40,798 FATAL [org.infinispan.CLUSTER] (transport-thread--p14-t5) ISPN100004: After merge (or coordinator change), the coordinator failed to recover cluster. Cluster members are [dev215, dev214].
> {noformat}