]
James Perkins reassigned WFCORE-5275:
-------------------------------------
Assignee: James Perkins
Socket handler is holding server to start
-----------------------------------------
Key: WFCORE-5275
URL:
https://issues.redhat.com/browse/WFCORE-5275
Project: WildFly Core
Issue Type: Bug
Components: Logging
Reporter: James Perkins
Assignee: James Perkins
Priority: Major
CU is on EAP 7.3 on OCP and has configured the socket-handler as below:
<async-handler name="async-logstash-handler">
<queue-length
value="${env.LHT_LOGGING_BUFFER_SIZE:1048576}"/>
<overflow-action value="discard"/>
<subhandlers>
<handler name="logstash-handler"/>
</subhandlers>
</async-handler>
<console-handler name="CONSOLE">
<formatter>
<named-formatter name="COLOR-PATTERN"/>
</formatter>
</console-handler>
<socket-handler name="logstash-handler" block-on-reconnect="true"
outbound-socket-binding-ref="logstash">
<named-formatter name="logstash-json-formatter"/>
</socket-handler>
The formatter:
<formatter name="logstash-json-formatter">
<json-formatter>
<exception-output-type value="formatted"/>
<key-overrides host-name="host"
logger-name="category" message="log-message"
thread-id="thread" thread-name="thread-name"
timestamp="@timestamp"/>
<meta-data>
<property name="@version" value="1"/>
<property name="lhtappid"
value="2038"/>
<property name="namespace"
value="lht-platform-sample-applications-nonprod"/>
<property name="stage" value="dev"/>
</meta-data>
</json-formatter>
</formatter>
The socket binding :
<outbound-socket-binding name="logstash">
<remote-destination
host="logstash.lhtlogging-nonprod.svc.cluster.local" port="8080"/>
</outbound-socket-binding>
When based on network policy on OCP the traffic on
logstash.lhtlogging-nonprod.svc.cluster.local at 80080 will be enabled. This configuration
works well, however if we disable the traffic the server never get started and throws the
following error:
09:47:37,297 INFO [org.jboss.as.connector.deployment] (MSC service thread 1-3)
WFLYJCA0002: Bound JCA ConnectionFactory [java:/JmsXA]
09:47:37,297 INFO [org.jboss.as.connector.deployment] (MSC service thread 1-2)
WFLYJCA0118: Binding connection factory named java:/JmsXA to alias
java:jboss/DefaultJMSConnectionFactory
09:49:06,466 ERROR [org.jboss.as.controller.management-operation] (Controller Boot
Thread) WFLYCTL0348: Timeout after [90] seconds waiting for service container stability.
Operation will roll back. Step that first updated the service container was 'add'
at address '[
("core-service" => "management"),
("management-interface" => "http-interface")
]'
Failed to invoke setter setOverflowAction with value null
.java.lang.reflect.InvocationTargetException
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at
org.jboss.logmanager.config.AbstractPropertyConfiguration$1.rollback(AbstractPropertyConfiguration.java:244)
at
org.jboss.logmanager.config.LogContextConfigurationImpl.doForget(LogContextConfigurationImpl.java:355)
at
org.jboss.logmanager.config.LogContextConfigurationImpl.forget(LogContextConfigurationImpl.java:319)
at
org.jboss.as.logging@10.1.17.Final-redhat-00001//org.jboss.as.logging.logmanager.ConfigurationPersistence.forget(ConfigurationPersistence.java:341)
at
org.jboss.as.logging@10.1.17.Final-redhat-00001//org.jboss.as.logging.logmanager.ConfigurationPersistence.rollback(ConfigurationPersistence.java:349)
at
org.jboss.as.logging@10.1.17.Final-redhat-00001//org.jboss.as.logging.LoggingOperations$CommitOperationStepHandler$1.handleResult(LoggingOperations.java:123)
at
org.jboss.as.controller@10.1.17.Final-redhat-00001//org.jboss.as.controller.AbstractOperationContext$Step.invokeResultHandler(AbstractOperationContext.java:1533)
at
org.jboss.as.controller@10.1.17.Final-redhat-00001//org.jboss.as.controller.AbstractOperationContext$Step.handleResult(AbstractOperationContext.java:1515)
at
org.jboss.as.controller@10.1.17.Final-redhat-00001//org.jboss.as.controller.AbstractOperationContext$Step.finalizeInternal(AbstractOperationContext.java:1472)
at
org.jboss.as.controller@10.1.17.Final-redhat-00001//org.jboss.as.controller.AbstractOperationContext$Step.finalizeStep(AbstractOperationContext.java:1455)
at
org.jboss.as.controller@10.1.17.Final-redhat-00001//org.jboss.as.controller.AbstractOperationContext$Step.access$400(AbstractOperationContext.java:1319)
at
org.jboss.as.controller@10.1.17.Final-redhat-00001//org.jboss.as.controller.AbstractOperationContext.executeResultHandlerPhase(AbstractOperationContext.java:876)
at
org.jboss.as.controller@10.1.17.Final-redhat-00001//org.jboss.as.controller.AbstractOperationContext.processStages(AbstractOperationContext.java:726)
at
org.jboss.as.controller@10.1.17.Final-redhat-00001//org.jboss.as.controller.AbstractOperationContext.executeOperation(AbstractOperationContext.java:467)
at
org.jboss.as.controller@10.1.17.Final-redhat-00001//org.jboss.as.controller.OperationContextImpl.executeOperation(OperationContextImpl.java:1413)
at
org.jboss.as.controller@10.1.17.Final-redhat-00001//org.jboss.as.controller.ModelControllerImpl.boot(ModelControllerImpl.java:527)
at
org.jboss.as.controller@10.1.17.Final-redhat-00001//org.jboss.as.controller.AbstractControllerService.boot(AbstractControllerService.java:481)
at
org.jboss.as.controller@10.1.17.Final-redhat-00001//org.jboss.as.controller.AbstractControllerService.boot(AbstractControllerService.java:443)
at
org.jboss.as.server@10.1.17.Final-redhat-00001//org.jboss.as.server.ServerService.boot(ServerService.java:437)
at
org.jboss.as.server@10.1.17.Final-redhat-00001//org.jboss.as.server.ServerService.boot(ServerService.java:396)
at
org.jboss.as.controller@10.1.17.Final-redhat-00001//org.jboss.as.controller.AbstractControllerService$1.run(AbstractControllerService.java:383)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.lang.NullPointerException: overflowAction is null
at org.jboss.logmanager.handlers.AsyncHandler.setOverflowAction(AsyncHandler.java:117)
... 26 more
When check the thread dumps from the same time, we find that there is a long running
thread, which is also mentioned in earlier update:
"MSC service thread 1-2" #18 prio=5 os_prio=0 cpu=616.95ms elapsed=120.44s
tid=0x000055790ece0000 nid=0x78b runnable [0x00007f37a1ee8000]
java.lang.Thread.State: RUNNABLE
at java.net.PlainSocketImpl.socketConnect(java.base(a)11.0.9/Native Method)
at
java.net.AbstractPlainSocketImpl.doConnect(java.base@11.0.9/AbstractPlainSocketImpl.java:399)
- locked <0x00000000d64c8470> (a java.net.SocksSocketImpl)
at
java.net.AbstractPlainSocketImpl.connectToAddress(java.base@11.0.9/AbstractPlainSocketImpl.java:242)
at
java.net.AbstractPlainSocketImpl.connect(java.base@11.0.9/AbstractPlainSocketImpl.java:224)
at java.net.SocksSocketImpl.connect(java.base@11.0.9/SocksSocketImpl.java:403)
at java.net.Socket.connect(java.base@11.0.9/Socket.java:609)
at java.net.Socket.connect(java.base@11.0.9/Socket.java:558)
at
org.jboss.as.network.OutboundSocketBinding.connect(OutboundSocketBinding.java:125)
at
org.jboss.as.logging.handlers.SocketHandlerResourceDefinition$WildFlyClientSocketFactory.createSocket(SocketHandlerResourceDefinition.java:441)
at
org.jboss.logmanager.handlers.TcpOutputStream.<init>(TcpOutputStream.java:165)
at
org.jboss.logmanager.handlers.SocketHandler.createOutputStream(SocketHandler.java:453)
at org.jboss.logmanager.handlers.SocketHandler.initialize(SocketHandler.java:422)
at org.jboss.logmanager.handlers.SocketHandler.doPublish(SocketHandler.java:211)
- locked <0x00000000f49b7a10> (a java.lang.Object)
at org.jboss.logmanager.ExtHandler.publish(ExtHandler.java:77)
at
org.jboss.logmanager.handlers.DelayedHandler.publishToChildren(DelayedHandler.java:213)
at org.jboss.logmanager.handlers.DelayedHandler.activate(DelayedHandler.java:205)
- locked <0x00000000f49b55f0> (a org.jboss.logmanager.handlers.DelayedHandler)
at org.jboss.logmanager.handlers.DelayedHandler.setHandlers(DelayedHandler.java:124)
at
org.jboss.as.logging.handlers.SocketHandlerResourceDefinition$1$2$1.start(SocketHandlerResourceDefinition.java:252)
- locked <0x00000000f49b55a0> (a
org.jboss.as.logging.handlers.SocketHandlerResourceDefinition$1$2$1)
at
org.jboss.msc.service.ServiceControllerImpl$StartTask.startService(ServiceControllerImpl.java:1739)
at
org.jboss.msc.service.ServiceControllerImpl$StartTask.execute(ServiceControllerImpl.java:1701)
at
org.jboss.msc.service.ServiceControllerImpl$ControllerTask.run(ServiceControllerImpl.java:1559)
at
org.jboss.threads.ContextClassLoaderSavingRunnable.run(ContextClassLoaderSavingRunnable.java:35)
at org.jboss.threads.EnhancedQueueExecutor.safeRun(EnhancedQueueExecutor.java:1982)
at
org.jboss.threads.EnhancedQueueExecutor$ThreadBody.doRunTask(EnhancedQueueExecutor.java:1486)
at
org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1363)
at java.lang.Thread.run(java.base@11.0.9/Thread.java:834)
Which at times blocks the boot controll thread:
"Controller Boot Thread" #23 prio=5 os_prio=0 cpu=564.32ms elapsed=118.63s
tid=0x000055790e6e4800 nid=0x7a2 waiting for monitor entry [0x00007f37a13f2000]
java.lang.Thread.State: BLOCKED (on object monitor)
at org.jboss.logmanager.handlers.DelayedHandler.close(DelayedHandler.java:82)
- waiting to lock <0x00000000f49b55f0> (a
org.jboss.logmanager.handlers.DelayedHandler)
at
org.jboss.logmanager.config.HandlerConfigurationImpl$9.rollback(HandlerConfigurationImpl.java:413)
at
org.jboss.logmanager.config.LogContextConfigurationImpl.doForget(LogContextConfigurationImpl.java:355)
at
org.jboss.logmanager.config.LogContextConfigurationImpl.forget(LogContextConfigurationImpl.java:319)
at
org.jboss.as.logging.logmanager.ConfigurationPersistence.forget(ConfigurationPersistence.java:341)
- locked <0x00000000f511e1d8> (a java.lang.Object)
at
org.jboss.as.logging.logmanager.ConfigurationPersistence.rollback(ConfigurationPersistence.java:349)
at
org.jboss.as.logging.LoggingOperations$CommitOperationStepHandler$1.handleResult(LoggingOperations.java:123)
at
org.jboss.as.controller.AbstractOperationContext$Step.invokeResultHandler(AbstractOperationContext.java:1533)
at
org.jboss.as.controller.AbstractOperationContext$Step.handleResult(AbstractOperationContext.java:1515)
at
org.jboss.as.controller.AbstractOperationContext$Step.finalizeInternal(AbstractOperationContext.java:1472)
at
org.jboss.as.controller.AbstractOperationContext$Step.finalizeStep(AbstractOperationContext.java:1455)
at
org.jboss.as.controller.AbstractOperationContext$Step.access$400(AbstractOperationContext.java:1319)
at
org.jboss.as.controller.AbstractOperationContext.executeResultHandlerPhase(AbstractOperationContext.java:876)
at
org.jboss.as.controller.AbstractOperationContext.processStages(AbstractOperationContext.java:726)
at
org.jboss.as.controller.AbstractOperationContext.executeOperation(AbstractOperationContext.java:467)
at
org.jboss.as.controller.OperationContextImpl.executeOperation(OperationContextImpl.java:1413)
at org.jboss.as.controller.ModelControllerImpl.boot(ModelControllerImpl.java:527)
at
org.jboss.as.controller.AbstractControllerService.boot(AbstractControllerService.java:481)
at
org.jboss.as.controller.AbstractControllerService.boot(AbstractControllerService.java:443)
at org.jboss.as.server.ServerService.boot(ServerService.java:437)
at org.jboss.as.server.ServerService.boot(ServerService.java:396)
at
org.jboss.as.controller.AbstractControllerService$1.run(AbstractControllerService.java:383)
at java.lang.Thread.run(java.base@11.0.9/Thread.java:834)~~~
I don't find any issues with the configuration. I see in the code the socket is
trying to connect without and timeout and the same is missing from the configuration.
Either the server where the socket is listening should be in start condition to avoid the
failure from boot or we should avoid the server from a failure to boot if no processes is
running on the socket in socket-handler.
The issue is rarely reproducible on local machine.