[JBoss JIRA] (WFLY-10755) ISPN000208: No live owners found for segments
by tommaso borgato (JIRA)
[ https://issues.jboss.org/browse/WFLY-10755?page=com.atlassian.jira.plugin... ]
tommaso borgato closed WFLY-10755.
----------------------------------
Release Notes Text: the job run on a segment with UDP problems; going to run the tcp stack counterpart
Resolution: Explained
> ISPN000208: No live owners found for segments
> ---------------------------------------------
>
> Key: WFLY-10755
> URL: https://issues.jboss.org/browse/WFLY-10755
> Project: WildFly
> Issue Type: Bug
> Components: Clustering
> Affects Versions: 14.0.0.CR1
> Reporter: tommaso borgato
> Assignee: Paul Ferraro
>
> The error {{*"ISPN000208: No live owners found for segments"*}} was observed in scenario [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4|https://jen...].
> The scenario is composed of a 4 nodes cluster configured with an invalidation cache backed by a PostreSQL database:
> {noformat}
> <cache-container name="web" default-cache="repl" module="org.wildfly.clustering.web.infinispan">
> <transport lock-timeout="60000"/>
> <distributed-cache owners="2" name="dist">
> <locking isolation="REPEATABLE_READ"/>
> <transaction mode="BATCH"/>
> <file-store/>
> </distributed-cache>
> <replicated-cache name="repl">
> <locking isolation="REPEATABLE_READ"/>
> <transaction mode="BATCH"/>
> <file-store/>
> </replicated-cache>
> <invalidation-cache name="offload">
> <locking isolation="REPEATABLE_READ"/>
> <transaction mode="BATCH"/>
> <jdbc-store data-source="testDS" fetch-state="false" passivation="false" purge="false" shared="true" dialect="POSTGRES">
> <table prefix="s">
> <id-column name="id" type="VARCHAR(255)"/>
> <data-column name="datum" type="BYTEA"/>
> <timestamp-column name="version" type="BIGINT"/>
> </table>
> </jdbc-store>
> </invalidation-cache>
> </cache-container>
> {noformat}
> h2. First run [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4 run 20|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/EA...]
> The error was observed 2 times on node dev212;
> The first time, right after Node dev214 left the cluster:
> {noformat}
> [JBossINF] [0m[0m09:08:34,196 INFO [org.infinispan.CLUSTER] (thread-22,ejb,dev212) ISPN000094: Received new cluster view for channel ejb: [dev212|8] (3) [dev212, dev213, dev215]
> [JBossINF] [0m[0m09:08:34,197 INFO [org.infinispan.CLUSTER] (thread-22,ejb,dev212) ISPN100001: Node dev214 left the cluster
> [JBossINF] [0m[33m09:08:34,362 WARN [org.infinispan.interceptors.impl.InvalidationInterceptor] (timeout-thread--p10-t1) ISPN000268: Unable to broadcast evicts as a part of the prepare phase. Rolling back.: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 33 from dev215
> [JBossINF] at org.infinispan.remoting.transport.impl.MultiTargetRequest.onTimeout(MultiTargetRequest.java:167)
> [JBossINF] at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:87)
> [JBossINF] at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:22)
> [JBossINF] at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> [JBossINF] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> [JBossINF] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> [JBossINF] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> [JBossINF] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> [JBossINF] at java.lang.Thread.run(Thread.java:748)
> [JBossINF]
> ...
> [JBossINF] [0m[31m09:08:52,772 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {4 7-9 12-13 30-31 37 49 59 76-77 88-89 92 118-120 156-157 196 205 235 251} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev214]
> {noformat}
> and, the second time, right after Node dev215 left the cluster:
> {noformat}
> [JBossINF] [0m[0m09:11:32,029 INFO [org.infinispan.CLUSTER] (thread-24,ejb,dev212) ISPN100000: Node dev214 joined the cluster
> [JBossINF] [0m[0m09:11:32,029 INFO [org.infinispan.CLUSTER] (thread-24,ejb,dev212) ISPN100001: Node dev213 left the cluster
> [JBossINF] [0m[0m09:11:32,029 INFO [org.infinispan.CLUSTER] (thread-24,ejb,dev212) ISPN100001: Node dev215 left the cluster
> [JBossINF] [0m[31m09:11:32,030 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-3 36 48 55-58 65 75 90 93 108-109 126 150 172 176-177 179-180 204 229-230} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215, dev214]
> [JBossINF] [0m[31m09:11:32,030 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-4 7-9 12-13 30-31 36-37 48-49 55-59 65 75-77 88-90 92-93 108-109 118-120 126 150 156-157 172 176-177 179-180 196 204-205 229-230 235 251} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215, dev214]
> [JBossINF] [0m[0m09:12:29,829 INFO [org.infinispan.CLUSTER] (thread-21,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev214|10] (4) [dev214, dev212, dev213, dev215], 2 subgroups: [dev212|8] (3) [dev212, dev213, dev215], [dev214|9] (2) [dev214, dev212]
> {noformat}
> bq. please note that the log saying that node dev213 left the cluster, look suspicious: node dev213 was started at 8:59:53 and then restarted at 9:12:29, so the log saying node dev213 left the cluster at 9:11:32 is a bit strange
> This run already used modified jgroups time-outs:
> {noformat}
> <protocol type="FD_ALL">
> <property name="timeout">10000</property>
> <property name="interval">2000</property>
> <property name="timeout_check_interval">1000</property>
> </protocol>
> <protocol type="VERIFY_SUSPECT">
> <property name="timeout">1000</property>
> </protocol>
> {noformat}
> h2. Second run [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4_JJB run 18|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/EA...]
> The error was observed also in a [previous run|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/E...] where those *FD_ALL* and *VERIFY_SUSPECT* values were unmodified.
> h2. Third run [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4_JJB run 21|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/EA...]
> The error was observed also in [run|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/E...] where those *FD_ALL* and *VERIFY_SUSPECT* values were set accordingly to what this [JIRA|https://issues.jboss.org/browse/ISPN-9087] states the previous values for FD_ALL were:
> {noformat}
> <FD_ALL timeout="60000"
> interval="15000"
> timeout_check_interval="5000"
> />
> {noformat}
> In this run, the error is observed on node dev212:
> {noformat}
> [JBossINF] [0m[33m03:56:59,728 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev212) JGRP000032: dev212: no physical address for 2806f77e-ee15-45dc-283d-683a4828e878, dropping message
> [JBossINF] [0m[0m03:58:02,336 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
> [JBossINF] [0m[0m03:58:02,336 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster
> [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster
> [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster
> [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
> [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster
> [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster
> [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster
> [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
> [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster
> [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster
> [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster
> [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
> [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100000: Node dev214 joined the cluster
> [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev215 left the cluster
> [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-30,ejb,dev212) ISPN100001: Node dev214 left the cluster
> [JBossINF] [0m[31m03:58:02,339 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-3 21-26 30 46 53-54 58-59 64 69 75 82-83 88 142 150 233 236} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215]
> [JBossINF] [0m[31m03:58:02,339 ERROR [org.infinispan.statetransfer.StateConsumerImpl] (transport-thread--p15-t15) ISPN000208: No live owners found for segments {2-3 21-26 30 46 53-54 58-59 64 69 75 82-83 88 142 150 233 236} of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar. Excluded owners: [dev215]
> [JBossINF] [0m[33m03:58:02,340 WARN [org.infinispan.statetransfer.InboundTransferTask] (stateTransferExecutor-thread--p20-t14) ISPN000210: Failed to request state of cache clusterbench-ee7.ear/clusterbench-ee7-ejb.jar from node dev214, segments {47-48 65 87 102 157 163 187-188 190-191 221-223 228 232}: org.infinispan.remoting.transport.jgroups.SuspectException: ISPN000400: Node dev214 was suspected
> {noformat}
> but the logs on dev214 show the node wasn't down; it was just restarted and was logging the following:
> {noformat}
> [JBossINF] [0m[0m03:56:14,093 INFO [org.jboss.as.server] (Controller Boot Thread) WFLYSRV0212: Resuming server
> [JBossINF] [0m[0m03:56:14,095 INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0060: Http management interface listening on http://10.16.176.60:9990/management
> [JBossINF] [0m[0m03:56:14,095 INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0051: Admin console listening on http://10.16.176.60:9990
> [JBossINF] [0m[0m03:56:14,095 INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0025: WildFly Full 14.0.0.Beta2-SNAPSHOT (WildFly Core 6.0.0.Alpha4) started in 8533ms - Started 1156 of 1353 services (511 services are lazy, passive or on-demand)
> 2018/07/29 03:56:14:095 EDT [DEBUG][Thread-89] HOST dev220.mw.lab.eng.bos.redhat.com:rootProcess:test - JBossStartup, server started!
> [JBossINF] [0m[33m03:57:13,441 WARN [org.jgroups.protocols.pbcast.NAKACK2] (thread-8,ejb,dev214) JGRP000011: dev214: dropped message 43 from non-member dev213 (view=[dev214|0] (1) [dev214]) (received 17 identical messages from dev213 in the last 61714 ms)
> [JBossINF] [0m[33m03:57:15,289 WARN [org.jgroups.protocols.pbcast.NAKACK2] (thread-8,ejb,dev214) JGRP000011: dev214: dropped message 90 from non-member dev215 (view=[dev214|0] (1) [dev214]) (received 3 identical messages from dev215 in the last 61551 ms)
> [JBossINF] [0m[33m03:57:57,334 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message
> [JBossINF] [0m[33m03:57:59,339 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message
> [JBossINF] [0m[33m03:58:01,342 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message
> [JBossINF] [0m[0m03:58:02,337 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
> [JBossINF] [0m[0m03:58:02,338 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster
> [JBossINF] [0m[0m03:58:02,339 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster
> [JBossINF] [0m[0m03:58:02,340 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
> [JBossINF] [0m[0m03:58:02,340 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster
> [JBossINF] [0m[0m03:58:02,341 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster
> [JBossINF] [0m[0m03:58:02,341 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
> [JBossINF] [0m[0m03:58:02,341 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster
> [JBossINF] [0m[0m03:58:02,342 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster
> [JBossINF] [0m[0m03:58:02,342 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN000093: Received new, MERGED cluster view for channel ejb: MergeView::[dev212|10] (3) [dev212, dev213, dev214], 1 subgroups: [dev214|0] (1) [dev214]
> [JBossINF] [0m[0m03:58:02,343 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev212 joined the cluster
> [JBossINF] [0m[0m03:58:02,344 INFO [org.infinispan.CLUSTER] (thread-13,ejb,dev214) ISPN100000: Node dev213 joined the cluster
> [JBossINF] [0m[33m03:58:03,345 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message
> [JBossINF] [0m[33m03:58:05,347 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message
> [JBossINF] [0m[33m03:58:07,350 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ejb,dev214) JGRP000032: dev214: no physical address for 710670e7-7bb0-9e01-743e-abad40b595ec, dropping message
> ...
> {noformat}
> h2. Fourth run [eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4_JJB run 23|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/EA...]
> We made an attempt with the following setting on the same segment (perf27-eap) (the idea being allowing some longer time-out to verify suspected nodes):
> {noformat}
> <protocol type="FD_ALL">
> <property name="timeout">5000</property>
> <property name="interval">1000</property>
> <property name="timeout_check_interval">2000</property>
> </protocol>
> <protocol type="VERIFY_SUSPECT">
> <property name="timeout">5000</property>
> </protocol>
> {noformat}
> we didn't observe the error, but we observed the following {{*ERROR*}} on node {{*[dev215|https://jenkins.hosts.mwqe.eng.bos.redhat.com/hudson/view/EAP7/view/EAP7-Clustering_JJB/view/clustering-db-session-tests/job/eap-7x-db-failover-db-session-shutdown-repl-sync-postgres-9-4_JJB/23/console-dev215/]*}}:
> {noformat}
> [JBossINF] [0m[31m13:42:40,794 ERROR [org.infinispan.CLUSTER] (transport-thread--p14-t5) ISPN000196: Failed to recover cluster state after the current node became the coordinator (or after merge): java.util.concurrent.ExecutionException: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 34 from dev214
> [JBossINF] at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
> [JBossINF] at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
> [JBossINF] at org.infinispan.util.concurrent.CompletableFutures.await(CompletableFutures.java:93)
> [JBossINF] at org.infinispan.topology.ClusterTopologyManagerImpl.executeOnClusterSync(ClusterTopologyManagerImpl.java:585)
> [JBossINF] at org.infinispan.topology.ClusterTopologyManagerImpl.recoverClusterStatus(ClusterTopologyManagerImpl.java:450)
> [JBossINF] at org.infinispan.topology.ClusterTopologyManagerImpl.becomeCoordinator(ClusterTopologyManagerImpl.java:334)
> [JBossINF] at org.infinispan.topology.ClusterTopologyManagerImpl.handleClusterView(ClusterTopologyManagerImpl.java:313)
> [JBossINF] at org.infinispan.topology.ClusterTopologyManagerImpl.access$500(ClusterTopologyManagerImpl.java:87)
> [JBossINF] at org.infinispan.topology.ClusterTopologyManagerImpl$ClusterViewListener.lambda$handleViewChange$0(ClusterTopologyManagerImpl.java:731)
> [JBossINF] at org.infinispan.executors.LimitedExecutor.runTasks(LimitedExecutor.java:175)
> [JBossINF] at org.infinispan.executors.LimitedExecutor.access$100(LimitedExecutor.java:37)
> [JBossINF] at org.infinispan.executors.LimitedExecutor$Runner.run(LimitedExecutor.java:227)
> [JBossINF] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> [JBossINF] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> [JBossINF] at org.wildfly.clustering.service.concurrent.ClassLoaderThreadFactory.lambda$newThread$0(ClassLoaderThreadFactory.java:47)
> [JBossINF] at java.lang.Thread.run(Thread.java:748)
> [JBossINF] Caused by: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 34 from dev214
> [JBossINF] at org.infinispan.remoting.transport.impl.MultiTargetRequest.onTimeout(MultiTargetRequest.java:167)
> [JBossINF] at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:87)
> [JBossINF] at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:22)
> [JBossINF] at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> [JBossINF] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> [JBossINF] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> [JBossINF] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> [JBossINF] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> [JBossINF] ... 1 more
> [JBossINF]
> [JBossINF] [0m[31m13:42:40,798 FATAL [org.infinispan.CLUSTER] (transport-thread--p16-t24) ISPN100004: After merge (or coordinator change), the coordinator failed to recover cluster. Cluster members are [dev215, dev214].
> [JBossINF] [0m[31m13:42:40,798 FATAL [org.infinispan.CLUSTER] (transport-thread--p13-t5) ISPN100004: After merge (or coordinator change), the coordinator failed to recover cluster. Cluster members are [dev215, dev214].
> [JBossINF] [0m[31m13:42:40,798 FATAL [org.infinispan.CLUSTER] (transport-thread--p15-t2) ISPN100004: After merge (or coordinator change), the coordinator failed to recover cluster. Cluster members are [dev215, dev214].
> [JBossINF] [0m[31m13:42:40,798 FATAL [org.infinispan.CLUSTER] (transport-thread--p14-t5) ISPN100004: After merge (or coordinator change), the coordinator failed to recover cluster. Cluster members are [dev215, dev214].
> {noformat}
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
5 years, 4 months
[JBoss JIRA] (WFWIP-75) it's not possible to enable Hibernate Bytecode Transformer using j-d-s.xml
by Scott Marlow (JIRA)
[ https://issues.jboss.org/browse/WFWIP-75?page=com.atlassian.jira.plugin.s... ]
Scott Marlow commented on WFWIP-75:
-----------------------------------
I should be able to merge the support fro WFWIP-75 after adding a org.jboss.as.hibernate.Transformed marker interface that means that an application class has been transformed, which we will use to avoid transforming the same class more than once.
> it's not possible to enable Hibernate Bytecode Transformer using j-d-s.xml
> --------------------------------------------------------------------------
>
> Key: WFWIP-75
> URL: https://issues.jboss.org/browse/WFWIP-75
> Project: WildFly WIP
> Issue Type: Bug
> Components: JPA
> Reporter: Martin Simka
> Assignee: Scott Marlow
> Priority: Blocker
>
> based on testing with https://github.com/scottmarlow/wildfly/tree/WFLY-10520_pr_hack
> it's not possible to enable transformer for application in {{jboss-deployment-structure.xml}}.
> there are probably more issues
> # application has to have dependency on module {{org.hibernate.bytecodetransfomer}} (which is ok, users must edit j-d-s.xml anyway) / or {{org.hibernate.bytecodetransfomer}} must be global module
> # transformer has private constructor so {{ClassFileTransformerProcessor}} cannot instantiate it
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
5 years, 4 months
[JBoss JIRA] (WFWIP-75) it's not possible to enable Hibernate Bytecode Transformer using j-d-s.xml
by Scott Marlow (JIRA)
[ https://issues.jboss.org/browse/WFWIP-75?page=com.atlassian.jira.plugin.s... ]
Scott Marlow commented on WFWIP-75:
-----------------------------------
Applications shouldn't need to add a dependency on org.hibernate.bytecodetransfomer, as we will remove that module in the future.
With my other local changes, the following j-d-s.xml (moved to a separate file) seems to work:
{code}
<jboss-deployment-structure>
<deployment>
<transformers>
<transformer class="org.jboss.as.hibernate.Hibernate51CompatibilityTransformer"/>
</transformers>
<dependencies>
<module name="com.h2database.h2" />
<module name="org.slf4j"/>
</dependencies>
</deployment>
<sub-deployment name="beans.jar">
<transformers>
<transformer class="org.jboss.as.hibernate.Hibernate51CompatibilityTransformer"/>
</transformers>
</sub-deployment>
<sub-deployment name="main.war">
<transformers>
<transformer class="org.jboss.as.hibernate.Hibernate51CompatibilityTransformer"/>
</transformers>
</sub-deployment>
</jboss-deployment-structure>
{code}
> it's not possible to enable Hibernate Bytecode Transformer using j-d-s.xml
> --------------------------------------------------------------------------
>
> Key: WFWIP-75
> URL: https://issues.jboss.org/browse/WFWIP-75
> Project: WildFly WIP
> Issue Type: Bug
> Components: JPA
> Reporter: Martin Simka
> Assignee: Scott Marlow
> Priority: Blocker
>
> based on testing with https://github.com/scottmarlow/wildfly/tree/WFLY-10520_pr_hack
> it's not possible to enable transformer for application in {{jboss-deployment-structure.xml}}.
> there are probably more issues
> # application has to have dependency on module {{org.hibernate.bytecodetransfomer}} (which is ok, users must edit j-d-s.xml anyway) / or {{org.hibernate.bytecodetransfomer}} must be global module
> # transformer has private constructor so {{ClassFileTransformerProcessor}} cannot instantiate it
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
5 years, 4 months
[JBoss JIRA] (WFWIP-75) it's to possible to enable Hibernate Bytecode Transformer using j-d-s.xml
by Martin Simka (JIRA)
Martin Simka created WFWIP-75:
---------------------------------
Summary: it's to possible to enable Hibernate Bytecode Transformer using j-d-s.xml
Key: WFWIP-75
URL: https://issues.jboss.org/browse/WFWIP-75
Project: WildFly WIP
Issue Type: Bug
Components: JPA
Reporter: Martin Simka
Assignee: Scott Marlow
Priority: Blocker
based on testing with https://github.com/scottmarlow/wildfly/tree/WFLY-10520_pr_hack
it's not possible to enable transformer for application in {{jboss-deployment-structure.xml}}.
there are probably more issues
# application has to have dependency on module {{org.hibernate.bytecodetransfomer}} (which is ok, users must edit j-d-s.xml anyway) / or {{org.hibernate.bytecodetransfomer}} must be global module
# transformer has private constructor so {{ClassFileTransformerProcessor}} cannot instantiate it
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
5 years, 4 months
[JBoss JIRA] (WFWIP-56) IndexOutOfBoundsException in JDBC HA scenario
by Martyn Taylor (JIRA)
[ https://issues.jboss.org/browse/WFWIP-56?page=com.atlassian.jira.plugin.s... ]
Martyn Taylor reopened WFWIP-56:
--------------------------------
> IndexOutOfBoundsException in JDBC HA scenario
> ---------------------------------------------
>
> Key: WFWIP-56
> URL: https://issues.jboss.org/browse/WFWIP-56
> Project: WildFly WIP
> Issue Type: Bug
> Components: Artemis
> Reporter: Erich Duda
> Assignee: Martyn Taylor
> Priority: Blocker
> Labels: feature-branch-blocker
>
> PR: https://github.com/wildfly/wildfly/pull/11355
> *Scenario:*
> * There are two Wildfly servers each containing two Artemis brokers
> * Artemis brokers creates two live-backup pairs. Each Wildfly instance contains one live and one backup
> * Second server has MDB which resends messages from InQueue to OutQueue
> * The second server is stopped and restarted
> After the second server is restarted, everything works correctly for a while. Failback performs successfully, MDB resends messages. However after some time the server starts to print many errors \[1\]. It looks like some messages where corrupted during failover/failback and cannot be delivered.
> The reproducer always hits the issue. The issue is not relevant to particular database, I hit it with Oracle12c and DB2 11.1. I also tried to reduce amount of messages and their size, but I still hit the issue. So it is not related to paging or large messages.
> \[1\]
> {code}
> 15:20:50,010 ERROR [org.apache.activemq.artemis.ra] (Thread-7 (ActiveMQ-client-global-threads)) AMQ154004: Failed to deliver message: java.lang.IndexOutOfBoundsException
> at org.apache.activemq.artemis.core.buffers.impl.ChannelBufferWrapper.readSimpleStringInternal(ChannelBufferWrapper.java:93) [artemis-commons-1.5.5.jbossorg-012.jar:1.5.5.jbossorg-012]
> at org.apache.activemq.artemis.core.buffers.impl.ChannelBufferWrapper.readNullableSimpleString(ChannelBufferWrapper.java:73) [artemis-commons-1.5.5.jbossorg-012.jar:1.5.5.jbossorg-012]
> at org.apache.activemq.artemis.reader.TextMessageUtil.readBodyText(TextMessageUtil.java:37) [artemis-core-client-1.5.5.jbossorg-012.jar:1.5.5.jbossorg-012]
> at org.apache.activemq.artemis.jms.client.ActiveMQTextMessage.doBeforeReceive(ActiveMQTextMessage.java:112) [artemis-jms-client-1.5.5.jbossorg-012.jar:1.5.5.jbossorg-012]
> at org.apache.activemq.artemis.ra.inflow.ActiveMQMessageHandler.onMessage(ActiveMQMessageHandler.java:295) [artemis-ra-1.5.5.jbossorg-012.jar:1.5.5.jbossorg-012]
> at org.apache.activemq.artemis.core.client.impl.ClientConsumerImpl.callOnMessage(ClientConsumerImpl.java:1001) [artemis-core-client-1.5.5.jbossorg-012.jar:1.5.5.jbossorg-012]
> at org.apache.activemq.artemis.core.client.impl.ClientConsumerImpl.access$400(ClientConsumerImpl.java:49) [artemis-core-client-1.5.5.jbossorg-012.jar:1.5.5.jbossorg-012]
> at org.apache.activemq.artemis.core.client.impl.ClientConsumerImpl$Runner.run(ClientConsumerImpl.java:1124) [artemis-core-client-1.5.5.jbossorg-012.jar:1.5.5.jbossorg-012]
> at org.apache.activemq.artemis.utils.OrderedExecutorFactory$OrderedExecutor$ExecutorTask.run(OrderedExecutorFactory.java:122) [artemis-commons-1.5.5.jbossorg-012.jar:1.5.5.jbossorg-012]
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [rt.jar:1.8.0_171]
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [rt.jar:1.8.0_171]
> at java.lang.Thread.run(Thread.java:748) [rt.jar:1.8.0_171]
> {code}
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
5 years, 4 months
[JBoss JIRA] (WFWIP-56) IndexOutOfBoundsException in JDBC HA scenario
by Martyn Taylor (JIRA)
[ https://issues.jboss.org/browse/WFWIP-56?page=com.atlassian.jira.plugin.s... ]
Martyn Taylor commented on WFWIP-56:
------------------------------------
Opening. Managed to recreate this issue in a separate scenario: org.jboss.qa.artemis.test.failover.DedicatedFailoverCoreBridges#testFailbackKillWithBridgeWithNettyDiscovery
> IndexOutOfBoundsException in JDBC HA scenario
> ---------------------------------------------
>
> Key: WFWIP-56
> URL: https://issues.jboss.org/browse/WFWIP-56
> Project: WildFly WIP
> Issue Type: Bug
> Components: Artemis
> Reporter: Erich Duda
> Assignee: Martyn Taylor
> Priority: Blocker
> Labels: feature-branch-blocker
>
> PR: https://github.com/wildfly/wildfly/pull/11355
> *Scenario:*
> * There are two Wildfly servers each containing two Artemis brokers
> * Artemis brokers creates two live-backup pairs. Each Wildfly instance contains one live and one backup
> * Second server has MDB which resends messages from InQueue to OutQueue
> * The second server is stopped and restarted
> After the second server is restarted, everything works correctly for a while. Failback performs successfully, MDB resends messages. However after some time the server starts to print many errors \[1\]. It looks like some messages where corrupted during failover/failback and cannot be delivered.
> The reproducer always hits the issue. The issue is not relevant to particular database, I hit it with Oracle12c and DB2 11.1. I also tried to reduce amount of messages and their size, but I still hit the issue. So it is not related to paging or large messages.
> \[1\]
> {code}
> 15:20:50,010 ERROR [org.apache.activemq.artemis.ra] (Thread-7 (ActiveMQ-client-global-threads)) AMQ154004: Failed to deliver message: java.lang.IndexOutOfBoundsException
> at org.apache.activemq.artemis.core.buffers.impl.ChannelBufferWrapper.readSimpleStringInternal(ChannelBufferWrapper.java:93) [artemis-commons-1.5.5.jbossorg-012.jar:1.5.5.jbossorg-012]
> at org.apache.activemq.artemis.core.buffers.impl.ChannelBufferWrapper.readNullableSimpleString(ChannelBufferWrapper.java:73) [artemis-commons-1.5.5.jbossorg-012.jar:1.5.5.jbossorg-012]
> at org.apache.activemq.artemis.reader.TextMessageUtil.readBodyText(TextMessageUtil.java:37) [artemis-core-client-1.5.5.jbossorg-012.jar:1.5.5.jbossorg-012]
> at org.apache.activemq.artemis.jms.client.ActiveMQTextMessage.doBeforeReceive(ActiveMQTextMessage.java:112) [artemis-jms-client-1.5.5.jbossorg-012.jar:1.5.5.jbossorg-012]
> at org.apache.activemq.artemis.ra.inflow.ActiveMQMessageHandler.onMessage(ActiveMQMessageHandler.java:295) [artemis-ra-1.5.5.jbossorg-012.jar:1.5.5.jbossorg-012]
> at org.apache.activemq.artemis.core.client.impl.ClientConsumerImpl.callOnMessage(ClientConsumerImpl.java:1001) [artemis-core-client-1.5.5.jbossorg-012.jar:1.5.5.jbossorg-012]
> at org.apache.activemq.artemis.core.client.impl.ClientConsumerImpl.access$400(ClientConsumerImpl.java:49) [artemis-core-client-1.5.5.jbossorg-012.jar:1.5.5.jbossorg-012]
> at org.apache.activemq.artemis.core.client.impl.ClientConsumerImpl$Runner.run(ClientConsumerImpl.java:1124) [artemis-core-client-1.5.5.jbossorg-012.jar:1.5.5.jbossorg-012]
> at org.apache.activemq.artemis.utils.OrderedExecutorFactory$OrderedExecutor$ExecutorTask.run(OrderedExecutorFactory.java:122) [artemis-commons-1.5.5.jbossorg-012.jar:1.5.5.jbossorg-012]
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [rt.jar:1.8.0_171]
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [rt.jar:1.8.0_171]
> at java.lang.Thread.run(Thread.java:748) [rt.jar:1.8.0_171]
> {code}
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
5 years, 4 months