]
Bogdan Sikora commented on ISPN-6940:
-------------------------------------
[~pferraro] To whom should i assign the Jira?
Unavailable servers with Replication timeout exception
------------------------------------------------------
Key: ISPN-6940
URL:
https://issues.jboss.org/browse/ISPN-6940
Project: Infinispan
Issue Type: Bug
Affects Versions: 8.2.3.Final
Reporter: Bogdan Sikora
Priority: Critical
Attachments: clusterbench.war
Exception in log after every request
{noformat}
ERROR [org.infinispan.interceptors.InvocationContextInterceptor] (default task-1)
ISPN000136: Error executing command GetKeyValueCommand, writing keys []:
org.infinispan.util.concurrent.TimeoutException: Replication timeout for jboss-eap-7.1-1
at
org.infinispan.remoting.transport.jgroups.JGroupsTransport.checkRsp(JGroupsTransport.java:801)
at
org.infinispan.remoting.transport.jgroups.JGroupsTransport.lambda$invokeRemotelyAsync$1(JGroupsTransport.java:642)
at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602)
at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962)
at
org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.staggeredProcessNext(CommandAwareRpcDispatcher.java:375)
at
org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.lambda$processCallsStaggered$3(CommandAwareRpcDispatcher.java:357)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
09:46:25,427 ERROR [io.undertow.request] (default task-1) UT005023: Exception handling
request to /clusterbench/jvmroute: org.infinispan.util.concurrent.TimeoutException:
Replication timeout for jboss-eap-7.1-1
at
org.infinispan.remoting.transport.jgroups.JGroupsTransport.checkRsp(JGroupsTransport.java:801)
at
org.infinispan.remoting.transport.jgroups.JGroupsTransport.lambda$invokeRemotelyAsync$1(JGroupsTransport.java:642)
at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602)
at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962)
at
org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.staggeredProcessNext(CommandAwareRpcDispatcher.java:375)
at
org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.lambda$processCallsStaggered$3(CommandAwareRpcDispatcher.java:357)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{noformat}
But it never disabled whole server to every request, eap generates this pretty page with
replication error exception
{noformat}
<html><head><title>ERROR</title><style>body {
font-family: "Lucida Grande", "Lucida Sans Unicode",
"Trebuchet MS", Helvetica, Arial, Verdana, sans-serif;
margin: 5px;
}
.header {
background-image: linear-gradient(bottom, rgb(153,151,153) 8%, rgb(199,199,199)
54%);
background-image: -o-linear-gradient(bottom, rgb(153,151,153) 8%, rgb(199,199,199)
54%);
background-image: -moz-linear-gradient(bottom, rgb(153,151,153) 8%, rgb(199,199,199)
54%);
background-image: -webkit-linear-gradient(bottom, rgb(153,151,153) 8%,
rgb(199,199,199) 54%);
background-image: -ms-linear-gradient(bottom, rgb(153,151,153) 8%, rgb(199,199,199)
54%);
background-image: -webkit-gradient(
linear,
left bottom,
left top,
color-stop(0.08, rgb(153,151,153)),
color-stop(0.54, rgb(199,199,199))
);
color: black;
padding: 2px;
font-weight: normal;
border: solid 1px;
font-size: 170%;
text-align: left;
vertical-align: middle;
height: 32px;
}
.error-div {
display: inline-block; width: 32px; height: 32px; background:
url('data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAACAAAAAgCAYAAABzenr0AAAABmJLR0QAAAAAAAD5Q7t/AAAACXBIWXMAAABIAAAASABGyWs+AAAACXZwQWcAAAAgAAAAIACH+pydAAAGGElEQVRYw8WXW2wcVxnHf+fM7NXrdXbdtZ3ipGqCEzvQKmlFoUggwkvvGBoaqVVV5Y0nkCoQPPCAhMQbQlzUh/IWHoh6QUVKZVLxUCGogKZRW9uJkyg0iVzZ2dhre9c7u7M7c87hYWc2M7t2bBASI306O/85+v7/7zLn24H/8yV2u/EcFNvwVNK2X0DKw1qpYaX1gCWlY0m5arRe8JQ6I2DmaVj/nwk4CxO2Zf1aw9dKo6N67+eOZHN7x0gViiSHBmlXa7hrazjLt1i+eKmxUi5bwpg/e8a8PA3X/msBr0MyJ+WvgFOffehY8v7HH5OW78HqGjgOtNrge2AnIJWAgQEYLqBsm3/NvKM/mZ1rYczpRa2/9x3w/iMBM1BCiHfuGRmZevClF9NJ5cP1m9B0+/aa3t/ZDNw3TtuSfPy737trK6sXtTGPPwOruxLwNhSEEHMHpqZGDz93wubKFdio3ZXUGNOHs2cIDh/g6pt/9G8uXLnlGPP5k1Dt5bN60m6lhXj34OTkxKET37SZm4d6oz/KgFRH7yOrBozbwlTWGf7ql6RZrWTcytrxaTj9Ro9OGb3JS/mz0nDxgYnnvmWbuUsdJ6FTY2JmImShqZA83N90UbOXOXDiG4nCcPFoTsqfbluCP8DenJQLx1/+7pBcXILaZl9qo2svFmalb48xkM9hxkf5+29+u9HSenIayn0ZyEr5i0NHpnKy7aGrtW6UehsLozRBRlRQEh3BwzLpWh3h++yfPJRLSvnzvhK8DkPamOl7n3nCUjcXtyXuK0X0WViaAFc9QtSNJe594uu2MubZGcjHBGTgqeFiQQml0K12nDgSpd4iyi4uRMci6Q8xJSWq7SGEZCg/qD14LCYgIcTzoxMHc6qygYk40aETKTFSdh2aANOWhbEsjJSYCK6C/V0LcH+9xvCB/TlbiBdCAXbQMJPZ/fvQjUbXEULAa6/1deyuh0dwNU+e7DSzEOhmi8xn9iI+ujgVE6ChlBwpYcqVjuIgA3IXBMYYtNbd30IIpJQI0ZEa+gPQnk9ieA8GSrESaGNyiWIBrVQntUHqdrq01iiluo0WilBK4fs+WutOr4S9YAyJoRzamMGYAClEve046ETyTr13EKCU6kZ+N4GNCxfwKpVOryQSuOUVhBCbsRIIuN1cWRkaSCfRrdaOTncijgltNPDn5xHZLIn7x3GNh4CVmADgcv3a9YmBY0cxm06nAQH31KlOao3BL5fxbtzAOE73hAt7JXb6RU7PGO44WK5Do1ZHw6WYAN+YM+X5S8dLXziWizVNvY5aWcFfWuq8Idsctb1Yr8DwmRwrUr1yfVMbcyYmwIWZ2mbdai9+gvfBAsaAbrUwrrtllLuJPibIGGQ2hUlIHLdlA+diTXgSqkKIt5b/OevL+8bwqlWU63anW3jyhaa4c9T2Hs0hrsL5EOCJQ/sov7/gS8Mb07AZEwDgav2D5aVy3eSziEL+Dllk0PSRcmcMqx6BUeGykINsksrttboPP4w2aVfACVhGiFevvft+I/XwJCad3DF6tc2MiD4zmRSZhyZY/OusI4V4JTqKYwIA6lr/2Gm6s5/+7UMv/egDmEyqL7VbRR/LViR6sikGvniYpffm2m6r/VFd65/0vqZ9R/tbsMeG+WKpMDL+lWMJ58Jl/NVqvOG26PAoDmCXhhg4epDl9+a9zUp1uQkPbvWfcMvZchbuEXAuk0kf2X/84YxpuDQv3kA5zW1fw9CswSyZyXFEJsmnf/m46bntuTY8+SxUtuLadri9CokxeEUK8WJhtJgaeeSIxG3RurWGv+Gg3RbaU5CwkOkU1lCWxFgBmbS4ff6qX7297rXh9Gn4/llobMcjtsFywACQ+zZMTsOP8vBIOpNS+bFiJj2cxx7MYg+k8ZwmrWqD9lqNjVvrTc9tWRtw/k345dtwFagDTmTdUUA2EDAYXffB2JPw6FH4ch7GUpCzIemD50J9A1Y+hH/8Cc4vdTq9Tud9D9dNOt+MajclyNL53xZmIhusmcBSQJLOd4UJnHqACzTppDy0kHizl/yuPRB5nggIM0A6IE4GuA34EQFtoBUQuwGm7kbwb+eaEEXmuV5dAAAAJXRFWHRjcmVhdGUtZGF0ZQAyMDA5LTExLTEwVDE5OjM4OjI0LTA3OjAwdDKp4gAAACV0RVh0ZGF0ZTpjcmVhdGUAMjAxMC0wMi0yMFQyMzoyNjoyNC0wNzowMC7DUNYAAAAldEVYdGRhdGU6bW9kaWZ5ADIwMTAtMDEtMTFUMDg6NTc6MzUtMDc6MDCruapPAAAAMnRFWHRMaWNlbnNlAGh0dHA6Ly9lbi53aWtpcGVkaWEub3JnL3dpa2kvUHVibGljX2RvbWFpbj/96s8AAAAldEVYdG1vZGlmeS1kYXRlADIwMDktMTEtMTBUMTk6Mzg6MjQtMDc6MDArg9/WAAAAGXRFWHRTb3VyY2UAVGFuZ28gSWNvbiBMaWJyYXJ5VM/tggAAADp0RVh0U291cmNlX1VSTABodHRwOi8vdGFuZ28uZnJlZWRlc2t0b3Aub3JnL1RhbmdvX0ljb25fTGlicmFyebzIrdYAAAAASUVORK5CYII=')
left center no-repeat;
}.error-text-div {
display: inline-block; vertical-align: top; height: 32px;}.label {
font-weight:bold; display: inline-block;}.value { display:
inline-block;}</style></head><body><div
class="header"><div class="error-div"></div><div
class="error-text-div">Error processing
request</div></div><div class="label">Context
Path:</div><div
class="value">/clusterbench</div><br/><div
class="label">Servlet Path:</div><div
class="value">/jvmroute</div><br/><div
class="label">Path Info:</div><div
class="value">null</div><br/><div
class="label">Query String:</div><div
class="value">null</div><br/><b>Stack
Trace</b><br/>org.infinispan.util.concurrent.TimeoutException: Replication
timeout for
jboss-eap-7.1-1<br/>org.infinispan.remoting.transport.jgroups.JGroupsTransport.checkRsp(JGroupsTransport.java:801)<br/>org.infinispan.remoting.transport.jgroups.JGroupsTransport.lambda$invokeRemotelyAsync$1(JGroupsTransport.java:642)<br/>java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602)<br/>java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)<br/>java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)<br/>java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962)<br/>org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.staggeredProcessNext(CommandAwareRpcDispatcher.java:375)<br/>org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.lambda$processCallsStaggered$3(CommandAwareRpcDispatcher.java:357)<br/>java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)<br/>java.util.concurrent.FutureTask.run(FutureTask.java:266)<br/>java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)<br/>java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)<br/>java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)<br/>java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)<br/>java.lang.Thread.run(Thread.java:745)<br/></body></html>
{noformat}
Reproducing
# Unzip jboss-eap-7.1.0.DR2.zip
# copy recurcive (cp -R) to have 3 jboss-eap folders (jboss-eap-7.1, jboss-eap-7.1-2,
jboss-eap-7.1-3)
# open configuration file standalone-ha.xml
# add some offset to server ports (management-http, management-https, ajp,http, https) to
remove conficts
# add with names (value) what you want
{noformat}
<system-properties>
<property name="jboss.mod_cluster.jvmRoute"
value="jboss-eap-7.1-3"/>
<property name="jboss.node.name" value="jboss-eap-7.1-3"/>
</system-properties>
{noformat}
# You should probably change ip from localhost to prevent firewall to misbehave
# copy clusterbench.war ro standalone/deployments
# start servers with standalone-ha.xml
# curl {noformat} ${YOUR_SET_IP_ADDRESS}:${SERVER_PORT}/clusterbench/jvmroute {noformat}
# one of the servers should return 500, if not restart one of them
I had third server unable to response, when i turn off second one, third came to live.
After start of second one, third one stops responding again. Sometimes more then one
server stops to respond, i have got scenario where whole cluster stopped responding.