[
https://issues.jboss.org/browse/ISPN-5505?page=com.atlassian.jira.plugin....
]
Galder Zamarreño edited comment on ISPN-5505 at 5/26/15 7:00 AM:
-----------------------------------------------------------------
As part of the final fix for ISPN-5477 in
https://github.com/infinispan/infinispan/pull/3503, I've reduced the max retries for
DistTopologyChangeUnderLoadTest to 1 retry. That should be enough because there's only
two servers, so if one fails, it should go to the other one. That's not what's
happening in the logs, here's some snippets:
{code}
06694 INFO [org.infinispan.test.fwk.UnitTestTestNGListener]
(testng-DistTopologyChangeUnderLoadTest:) Starting test
testPutsSucceedWhileTopologyChanges(org.infinispan.client.hotrod.DistTopologyChangeUnderLoadTest)
...
06736 TRACE [org.infinispan.client.hotrod.impl.transport.tcp.TcpTransportFactory]
(testng-DistTopologyChangeUnderLoadTest:) New list: [/127.0.0.1:15267]
...
08418 TRACE [org.infinispan.client.hotrod.impl.transport.tcp.TcpTransportFactory]
(ForkThread-1,DistTopologyChangeUnderLoadTest:) New list: [/127.0.0.1:15272,
/127.0.0.1:15267]
...
10560 TRACE [org.infinispan.client.hotrod.impl.transport.tcp.TcpTransportFactory]
(ForkThread-1,DistTopologyChangeUnderLoadTest:) Using consistent hash for determining the
server: /127.0.0.1:15272
...
10574 TRACE [org.infinispan.client.hotrod.impl.operations.RetryOnFailureOperation]
(ForkThread-1,DistTopologyChangeUnderLoadTest:) Exception encountered. Retry 0 out of 1
...
10575 TRACE [org.infinispan.client.hotrod.impl.transport.tcp.TcpTransportFactory]
(ForkThread-1,DistTopologyChangeUnderLoadTest:) Using the balancer for determining the
server: /127.0.0.1:15272
...
10577 TRACE [org.infinispan.server.hotrod.HotRodDecoder] (HotRodServerWorker-86-1:)
Decoded header HotRodHeader{op=PutRequest, version=22, messageId=1099, cacheName=, flag=6,
clientIntelligence=3, topologyId=2}
...
10589 TRACE [org.infinispan.server.hotrod.HotRodEncoder] (HotRodServerWorker-86-1:) Encode
msg ErrorResponse{version=22, messageId=1099, operation=ErrorResponse,
status=IllegalLifecycleState, msg=org.infinispan.IllegalLifecycleStateException:
ISPN000324: Default cache is in 'STOPPING' state and this is an invocation not
belonging to an on-going transaction, so it does not accept new invocations. Either
restart it or recreate the cache container.}
...
10591 ERROR [org.infinispan.client.hotrod.impl.operations.RetryOnFailureOperation]
(ForkThread-1,DistTopologyChangeUnderLoadTest:) ISPN004007: Exception encountered. Retry 1
out of 1
...
10591 ERROR [org.infinispan.client.hotrod.impl.operations.RetryOnFailureOperation]
(ForkThread-1,DistTopologyChangeUnderLoadTest:) ISPN004007: Exception encountered. Retry 1
out of 1
...
10722 TRACE [org.infinispan.client.hotrod.impl.transport.tcp.TcpTransportFactory]
(testng-DistTopologyChangeUnderLoadTest:) New list: [/127.0.0.1:15267]
...
10723 ERROR [org.infinispan.test.fwk.UnitTestTestNGListener]
(testng-DistTopologyChangeUnderLoadTest:) Test
testPutsSucceedWhileTopologyChanges(org.infinispan.client.hotrod.DistTopologyChangeUnderLoadTest)
failed.
infinispan.log:539144: at
org.infinispan.client.hotrod.DistTopologyChangeUnderLoadTest.testPutsSucceedWhileTopologyChanges(DistTopologyChangeUnderLoadTest.java:67)
{code}
As we see in the snippet, when the client makes a call against the server, it has the
updated topology with the two servers (:15272 and :15267). Based on consistent hashing, it
chooses :15272, which is fine but it fails, so it then falls back on the load balancer to
decide on next server to try, but it chooses again :15272 which is not right. It should
have chosen :15267.
The reason that's happening is because when the client receives a
{{TransportException}}, it updates the {{failedServers}} collection with the failed
server's address, but if a {{RemoteNodeSuspectException}} or a
{{RemoteIllegalLifecycleStateException}} is received, it does not update this collection.
{{RemoteNodeSuspectException}} should probably still not update the failed server
collection because this exception is normally related to a different server having been
suspected. {{RemoteIllegalLifecycleStateException}} on the other hand can be thrown
indicating failure about the node to which the client directed the call, so it should
update the failed server collection.
was (Author: galder.zamarreno):
As part of the final fix for ISPN-5477 in
https://github.com/infinispan/infinispan/pull/3503, I've reduced the max retries for
DistTopologyChangeUnderLoadTest to 1 retry. That should be enough because there's only
two servers, so if one fails, it should go to the other one. That's not what's
happening in the logs, here's some snippets:
{code}
06694 INFO [org.infinispan.test.fwk.UnitTestTestNGListener]
(testng-DistTopologyChangeUnderLoadTest:) Starting test
testPutsSucceedWhileTopologyChanges(org.infinispan.client.hotrod.DistTopologyChangeUnderLoadTest)
...
06736 TRACE [org.infinispan.client.hotrod.impl.transport.tcp.TcpTransportFactory]
(testng-DistTopologyChangeUnderLoadTest:) New list: [/127.0.0.1:15267]
...
2015-05-26 09:43:39,929 8418 TRACE
[org.infinispan.client.hotrod.impl.transport.tcp.TcpTransportFactory]
(ForkThread-1,DistTopologyChangeUnderLoadTest:) New list: [/127.0.0.1:15272,
/127.0.0.1:15267]
...
10560 TRACE [org.infinispan.client.hotrod.impl.transport.tcp.TcpTransportFactory]
(ForkThread-1,DistTopologyChangeUnderLoadTest:) Using consistent hash for determining the
server: /127.0.0.1:15272
...
10574 TRACE [org.infinispan.client.hotrod.impl.operations.RetryOnFailureOperation]
(ForkThread-1,DistTopologyChangeUnderLoadTest:) Exception encountered. Retry 0 out of 1
...
10575 TRACE [org.infinispan.client.hotrod.impl.transport.tcp.TcpTransportFactory]
(ForkThread-1,DistTopologyChangeUnderLoadTest:) Using the balancer for determining the
server: /127.0.0.1:15272
...
10577 TRACE [org.infinispan.server.hotrod.HotRodDecoder] (HotRodServerWorker-86-1:)
Decoded header HotRodHeader{op=PutRequest, version=22, messageId=1099, cacheName=, flag=6,
clientIntelligence=3, topologyId=2}
...
10589 TRACE [org.infinispan.server.hotrod.HotRodEncoder] (HotRodServerWorker-86-1:) Encode
msg ErrorResponse{version=22, messageId=1099, operation=ErrorResponse,
status=IllegalLifecycleState, msg=org.infinispan.IllegalLifecycleStateException:
ISPN000324: Default cache is in 'STOPPING' state and this is an invocation not
belonging to an on-going transaction, so it does not accept new invocations. Either
restart it or recreate the cache container.}
...
10591 ERROR [org.infinispan.client.hotrod.impl.operations.RetryOnFailureOperation]
(ForkThread-1,DistTopologyChangeUnderLoadTest:) ISPN004007: Exception encountered. Retry 1
out of 1
...
10591 ERROR [org.infinispan.client.hotrod.impl.operations.RetryOnFailureOperation]
(ForkThread-1,DistTopologyChangeUnderLoadTest:) ISPN004007: Exception encountered. Retry 1
out of 1
...
10722 TRACE [org.infinispan.client.hotrod.impl.transport.tcp.TcpTransportFactory]
(testng-DistTopologyChangeUnderLoadTest:) New list: [/127.0.0.1:15267]
...
10723 ERROR [org.infinispan.test.fwk.UnitTestTestNGListener]
(testng-DistTopologyChangeUnderLoadTest:) Test
testPutsSucceedWhileTopologyChanges(org.infinispan.client.hotrod.DistTopologyChangeUnderLoadTest)
failed.
infinispan.log:539144: at
org.infinispan.client.hotrod.DistTopologyChangeUnderLoadTest.testPutsSucceedWhileTopologyChanges(DistTopologyChangeUnderLoadTest.java:67)
{code}
As we see in the snippet, when the client makes a call against the server, it has the
updated topology with the two servers (:15272 and :15267). Based on consistent hashing, it
chooses :15272, which is fine but it fails, so it then falls back on the load balancer to
decide on next server to try, but it chooses again :15272 which is not right. It should
have chosen :15267.
The reason that's happening is because when the client receives a
{{TransportException}}, it updates the {{failedServers}} collection with the failed
server's address, but if a {{RemoteNodeSuspectException}} or a
{{RemoteIllegalLifecycleStateException}} is received, it does not update this collection.
{{RemoteNodeSuspectException}} should probably still not update the failed server
collection because this exception is normally related to a different server having been
suspected. {{RemoteIllegalLifecycleStateException}} on the other hand can be thrown
indicating failure about the node to which the client directed the call, so it should
update the failed server collection.
DistTopologyChangeUnderLoadTest randomly fails with
IllegalLifecycleStateException
----------------------------------------------------------------------------------
Key: ISPN-5505
URL:
https://issues.jboss.org/browse/ISPN-5505
Project: Infinispan
Issue Type: Bug
Components: Remote Protocols
Affects Versions: 7.2.1.Final, 8.0.0.Alpha1
Reporter: Galder Zamarreño
Assignee: Galder Zamarreño
Labels: testsuite_stability
Fix For: 8.0.0.Alpha2
{code}
testPutsSucceedWhileTopologyChanges(org.infinispan.client.hotrod.DistTopologyChangeUnderLoadTest)
Time elapsed: 0.334 sec <<< FAILURE!
java.util.concurrent.ExecutionException:
org.infinispan.client.hotrod.exceptions.RemoteIllegalLifecycleStateException:Request for
message id[1099] returned server error (status=0x88):
org.infinispan.IllegalLifecycleStateException: ISPN000324: Default cache is in
'STOPPING' state and this is an invocation not belonging to an on-going
transaction, so it does not accept new invocations. Either restart it or recreate the
cache container.
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at
org.infinispan.client.hotrod.DistTopologyChangeUnderLoadTest.testPutsSucceedWhileTopologyChanges(DistTopologyChangeUnderLoadTest.java:67)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at
org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:84)
at org.testng.internal.Invoker.invokeMethod(Invoker.java:714)
at org.testng.internal.Invoker.invokeTestMethod(Invoker.java:901)
at org.testng.internal.Invoker.invokeTestMethods(Invoker.java:1231)
at org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:127)
at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:111)
at org.testng.TestRunner.privateRun(TestRunner.java:767)
at org.testng.TestRunner.run(TestRunner.java:617)
at org.testng.SuiteRunner.runTest(SuiteRunner.java:348)
at org.testng.SuiteRunner.access$000(SuiteRunner.java:38)
at org.testng.SuiteRunner$SuiteWorker.run(SuiteRunner.java:382)
at org.testng.internal.thread.ThreadUtil$2.call(ThreadUtil.java:64)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by:
org.infinispan.client.hotrod.exceptions.RemoteIllegalLifecycleStateException:Request for
message id[1099] returned server error (status=0x88):
org.infinispan.IllegalLifecycleStateException: ISPN000324: Default cache is in
'STOPPING' state and this is an invocation not belonging to an on-going
transaction, so it does not accept new invocations. Either restart it or recreate the
cache container.
at
org.infinispan.client.hotrod.impl.protocol.Codec20.checkForErrorsInResponseStatus(Codec20.java:340)
at
org.infinispan.client.hotrod.impl.protocol.Codec20.readPartialHeader(Codec20.java:126)
at org.infinispan.client.hotrod.impl.protocol.Codec20.readHeader(Codec20.java:112)
at
org.infinispan.client.hotrod.impl.operations.HotRodOperation.readHeaderAndValidate(HotRodOperation.java:56)
at
org.infinispan.client.hotrod.impl.operations.AbstractKeyValueOperation.sendPutOperation(AbstractKeyValueOperation.java:57)
at
org.infinispan.client.hotrod.impl.operations.PutOperation.executeOperation(PutOperation.java:31)
at
org.infinispan.client.hotrod.impl.operations.PutOperation.executeOperation(PutOperation.java:20)
at
org.infinispan.client.hotrod.impl.operations.RetryOnFailureOperation.execute(RetryOnFailureOperation.java:52)
at org.infinispan.client.hotrod.impl.RemoteCacheImpl.put(RemoteCacheImpl.java:247)
at org.infinispan.client.hotrod.impl.RemoteCacheSupport.put(RemoteCacheSupport.java:79)
at
org.infinispan.client.hotrod.DistTopologyChangeUnderLoadTest$PutHammer.call(DistTopologyChangeUnderLoadTest.java:80)
at
org.infinispan.client.hotrod.DistTopologyChangeUnderLoadTest$PutHammer.call(DistTopologyChangeUnderLoadTest.java:72)
at
org.infinispan.test.AbstractInfinispanTest$LoggingCallable.call(AbstractInfinispanTest.java:435)
... 4 more
{code}
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)