[infinispan-issues] [JBoss JIRA] (ISPN-5505) DistTopologyChangeUnderLoadTest randomly fails with IllegalLifecycleStateException
Galder Zamarreño (JIRA)
issues at jboss.org
Tue May 26 07:01:19 EDT 2015
[ https://issues.jboss.org/browse/ISPN-5505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13070954#comment-13070954 ]
Galder Zamarreño edited comment on ISPN-5505 at 5/26/15 7:00 AM:
-----------------------------------------------------------------
As part of the final fix for ISPN-5477 in https://github.com/infinispan/infinispan/pull/3503, I've reduced the max retries for DistTopologyChangeUnderLoadTest to 1 retry. That should be enough because there's only two servers, so if one fails, it should go to the other one. That's not what's happening in the logs, here's some snippets:
{code}
06694 INFO [org.infinispan.test.fwk.UnitTestTestNGListener] (testng-DistTopologyChangeUnderLoadTest:) Starting test testPutsSucceedWhileTopologyChanges(org.infinispan.client.hotrod.DistTopologyChangeUnderLoadTest)
...
06736 TRACE [org.infinispan.client.hotrod.impl.transport.tcp.TcpTransportFactory] (testng-DistTopologyChangeUnderLoadTest:) New list: [/127.0.0.1:15267]
...
2015-05-26 09:43:39,929 8418 TRACE [org.infinispan.client.hotrod.impl.transport.tcp.TcpTransportFactory] (ForkThread-1,DistTopologyChangeUnderLoadTest:) New list: [/127.0.0.1:15272, /127.0.0.1:15267]
...
10560 TRACE [org.infinispan.client.hotrod.impl.transport.tcp.TcpTransportFactory] (ForkThread-1,DistTopologyChangeUnderLoadTest:) Using consistent hash for determining the server: /127.0.0.1:15272
...
10574 TRACE [org.infinispan.client.hotrod.impl.operations.RetryOnFailureOperation] (ForkThread-1,DistTopologyChangeUnderLoadTest:) Exception encountered. Retry 0 out of 1
...
10575 TRACE [org.infinispan.client.hotrod.impl.transport.tcp.TcpTransportFactory] (ForkThread-1,DistTopologyChangeUnderLoadTest:) Using the balancer for determining the server: /127.0.0.1:15272
...
10577 TRACE [org.infinispan.server.hotrod.HotRodDecoder] (HotRodServerWorker-86-1:) Decoded header HotRodHeader{op=PutRequest, version=22, messageId=1099, cacheName=, flag=6, clientIntelligence=3, topologyId=2}
...
10589 TRACE [org.infinispan.server.hotrod.HotRodEncoder] (HotRodServerWorker-86-1:) Encode msg ErrorResponse{version=22, messageId=1099, operation=ErrorResponse, status=IllegalLifecycleState, msg=org.infinispan.IllegalLifecycleStateException: ISPN000324: Default cache is in 'STOPPING' state and this is an invocation not belonging to an on-going transaction, so it does not accept new invocations. Either restart it or recreate the cache container.}
...
10591 ERROR [org.infinispan.client.hotrod.impl.operations.RetryOnFailureOperation] (ForkThread-1,DistTopologyChangeUnderLoadTest:) ISPN004007: Exception encountered. Retry 1 out of 1
...
10591 ERROR [org.infinispan.client.hotrod.impl.operations.RetryOnFailureOperation] (ForkThread-1,DistTopologyChangeUnderLoadTest:) ISPN004007: Exception encountered. Retry 1 out of 1
...
10722 TRACE [org.infinispan.client.hotrod.impl.transport.tcp.TcpTransportFactory] (testng-DistTopologyChangeUnderLoadTest:) New list: [/127.0.0.1:15267]
...
10723 ERROR [org.infinispan.test.fwk.UnitTestTestNGListener] (testng-DistTopologyChangeUnderLoadTest:) Test testPutsSucceedWhileTopologyChanges(org.infinispan.client.hotrod.DistTopologyChangeUnderLoadTest) failed.
infinispan.log:539144: at org.infinispan.client.hotrod.DistTopologyChangeUnderLoadTest.testPutsSucceedWhileTopologyChanges(DistTopologyChangeUnderLoadTest.java:67)
{code}
As we see in the snippet, when the client makes a call against the server, it has the updated topology with the two servers (:15272 and :15267). Based on consistent hashing, it chooses :15272, which is fine but it fails, so it then falls back on the load balancer to decide on next server to try, but it chooses again :15272 which is not right. It should have chosen :15267.
The reason that's happening is because when the client receives a {{TransportException}}, it updates the {{failedServers}} collection with the failed server's address, but if a {{RemoteNodeSuspectException}} or a {{RemoteIllegalLifecycleStateException}} is received, it does not update this collection. {{RemoteNodeSuspectException}} should probably still not update the failed server collection because this exception is normally related to a different server having been suspected. {{RemoteIllegalLifecycleStateException}} on the other hand can be thrown indicating failure about the node to which the client directed the call, so it should update the failed server collection.
was (Author: galder.zamarreno):
As part of the final fix for ISPN-5477 in https://github.com/infinispan/infinispan/pull/3503, I've reduced the max retries for DistTopologyChangeUnderLoadTest to 1 retry. That should be enough because there's only two servers, so if one fails, it should go to the other one. That's not what's happening in the logs, here's some snippets:
{code}
2015-05-26 09:43:38,205 6694 INFO [org.infinispan.test.fwk.UnitTestTestNGListener] (testng-DistTopologyChangeUnderLoadTest:) Starting test testPutsSucceedWhileTopologyChanges(org.infinispan.client.hotrod.DistTopologyChangeUnderLoadTest)
...
2015-05-26 09:43:38,247 6736 TRACE [org.infinispan.client.hotrod.impl.transport.tcp.TcpTransportFactory] (testng-DistTopologyChangeUnderLoadTest:) New list: [/127.0.0.1:15267]
...
2015-05-26 09:43:39,929 8418 TRACE [org.infinispan.client.hotrod.impl.transport.tcp.TcpTransportFactory] (ForkThread-1,DistTopologyChangeUnderLoadTest:) New list: [/127.0.0.1:15272, /127.0.0.1:15267]
...
2015-05-26 09:43:42,071 10560 TRACE [org.infinispan.client.hotrod.impl.transport.tcp.TcpTransportFactory] (ForkThread-1,DistTopologyChangeUnderLoadTest:) Using consistent hash for determining the server: /127.0.0.1:15272
...
2015-05-26 09:43:42,085 10574 TRACE [org.infinispan.client.hotrod.impl.operations.RetryOnFailureOperation] (ForkThread-1,DistTopologyChangeUnderLoadTest:) Exception encountered. Retry 0 out of 1
...
2015-05-26 09:43:42,086 10575 TRACE [org.infinispan.client.hotrod.impl.transport.tcp.TcpTransportFactory] (ForkThread-1,DistTopologyChangeUnderLoadTest:) Using the balancer for determining the server: /127.0.0.1:15272
...
2015-05-26 09:43:42,088 10577 TRACE [org.infinispan.server.hotrod.HotRodDecoder] (HotRodServerWorker-86-1:) Decoded header HotRodHeader{op=PutRequest, version=22, messageId=1099, cacheName=, flag=6, clientIntelligence=3, topologyId=2}
...
2015-05-26 09:43:42,100 10589 TRACE [org.infinispan.server.hotrod.HotRodEncoder] (HotRodServerWorker-86-1:) Encode msg ErrorResponse{version=22, messageId=1099, operation=ErrorResponse, status=IllegalLifecycleState, msg=org.infinispan.IllegalLifecycleStateException: ISPN000324: Default cache is in 'STOPPING' state and this is an invocation not belonging to an on-going transaction, so it does not accept new invocations. Either restart it or recreate the cache container.}
...
2015-05-26 09:43:42,102 10591 ERROR [org.infinispan.client.hotrod.impl.operations.RetryOnFailureOperation] (ForkThread-1,DistTopologyChangeUnderLoadTest:) ISPN004007: Exception encountered. Retry 1 out of 1
...
2015-05-26 09:43:42,102 10591 ERROR [org.infinispan.client.hotrod.impl.operations.RetryOnFailureOperation] (ForkThread-1,DistTopologyChangeUnderLoadTest:) ISPN004007: Exception encountered. Retry 1 out of 1
...
2015-05-26 09:43:42,233 10722 TRACE [org.infinispan.client.hotrod.impl.transport.tcp.TcpTransportFactory] (testng-DistTopologyChangeUnderLoadTest:) New list: [/127.0.0.1:15267]
...
2015-05-26 09:43:42,234 10723 ERROR [org.infinispan.test.fwk.UnitTestTestNGListener] (testng-DistTopologyChangeUnderLoadTest:) Test testPutsSucceedWhileTopologyChanges(org.infinispan.client.hotrod.DistTopologyChangeUnderLoadTest) failed.
infinispan.log:539144: at org.infinispan.client.hotrod.DistTopologyChangeUnderLoadTest.testPutsSucceedWhileTopologyChanges(DistTopologyChangeUnderLoadTest.java:67)
{code}
As we see in the snippet, when the client makes a call against the server, it has the updated topology with the two servers (:15272 and :15267). Based on consistent hashing, it chooses :15272, which is fine but it fails, so it then falls back on the load balancer to decide on next server to try, but it chooses again :15272 which is not right. It should have chosen :15267.
The reason that's happening is because when the client receives a {{TransportException}}, it updates the {{failedServers}} collection with the failed server's address, but if a {{RemoteNodeSuspectException}} or a {{RemoteIllegalLifecycleStateException}} is received, it does not update this collection. {{RemoteNodeSuspectException}} should probably still not update the failed server collection because this exception is normally related to a different server having been suspected. {{RemoteIllegalLifecycleStateException}} on the other hand can be thrown indicating failure about the node to which the client directed the call, so it should update the failed server collection.
> DistTopologyChangeUnderLoadTest randomly fails with IllegalLifecycleStateException
> ----------------------------------------------------------------------------------
>
> Key: ISPN-5505
> URL: https://issues.jboss.org/browse/ISPN-5505
> Project: Infinispan
> Issue Type: Bug
> Components: Remote Protocols
> Affects Versions: 7.2.1.Final, 8.0.0.Alpha1
> Reporter: Galder Zamarreño
> Assignee: Galder Zamarreño
> Labels: testsuite_stability
> Fix For: 8.0.0.Alpha2
>
>
> {code}
> testPutsSucceedWhileTopologyChanges(org.infinispan.client.hotrod.DistTopologyChangeUnderLoadTest) Time elapsed: 0.334 sec <<< FAILURE!
> java.util.concurrent.ExecutionException: org.infinispan.client.hotrod.exceptions.RemoteIllegalLifecycleStateException:Request for message id[1099] returned server error (status=0x88): org.infinispan.IllegalLifecycleStateException: ISPN000324: Default cache is in 'STOPPING' state and this is an invocation not belonging to an on-going transaction, so it does not accept new invocations. Either restart it or recreate the cache container.
> at java.util.concurrent.FutureTask.report(FutureTask.java:122)
> at java.util.concurrent.FutureTask.get(FutureTask.java:192)
> at org.infinispan.client.hotrod.DistTopologyChangeUnderLoadTest.testPutsSucceedWhileTopologyChanges(DistTopologyChangeUnderLoadTest.java:67)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:84)
> at org.testng.internal.Invoker.invokeMethod(Invoker.java:714)
> at org.testng.internal.Invoker.invokeTestMethod(Invoker.java:901)
> at org.testng.internal.Invoker.invokeTestMethods(Invoker.java:1231)
> at org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:127)
> at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:111)
> at org.testng.TestRunner.privateRun(TestRunner.java:767)
> at org.testng.TestRunner.run(TestRunner.java:617)
> at org.testng.SuiteRunner.runTest(SuiteRunner.java:348)
> at org.testng.SuiteRunner.access$000(SuiteRunner.java:38)
> at org.testng.SuiteRunner$SuiteWorker.run(SuiteRunner.java:382)
> at org.testng.internal.thread.ThreadUtil$2.call(ThreadUtil.java:64)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: org.infinispan.client.hotrod.exceptions.RemoteIllegalLifecycleStateException:Request for message id[1099] returned server error (status=0x88): org.infinispan.IllegalLifecycleStateException: ISPN000324: Default cache is in 'STOPPING' state and this is an invocation not belonging to an on-going transaction, so it does not accept new invocations. Either restart it or recreate the cache container.
> at org.infinispan.client.hotrod.impl.protocol.Codec20.checkForErrorsInResponseStatus(Codec20.java:340)
> at org.infinispan.client.hotrod.impl.protocol.Codec20.readPartialHeader(Codec20.java:126)
> at org.infinispan.client.hotrod.impl.protocol.Codec20.readHeader(Codec20.java:112)
> at org.infinispan.client.hotrod.impl.operations.HotRodOperation.readHeaderAndValidate(HotRodOperation.java:56)
> at org.infinispan.client.hotrod.impl.operations.AbstractKeyValueOperation.sendPutOperation(AbstractKeyValueOperation.java:57)
> at org.infinispan.client.hotrod.impl.operations.PutOperation.executeOperation(PutOperation.java:31)
> at org.infinispan.client.hotrod.impl.operations.PutOperation.executeOperation(PutOperation.java:20)
> at org.infinispan.client.hotrod.impl.operations.RetryOnFailureOperation.execute(RetryOnFailureOperation.java:52)
> at org.infinispan.client.hotrod.impl.RemoteCacheImpl.put(RemoteCacheImpl.java:247)
> at org.infinispan.client.hotrod.impl.RemoteCacheSupport.put(RemoteCacheSupport.java:79)
> at org.infinispan.client.hotrod.DistTopologyChangeUnderLoadTest$PutHammer.call(DistTopologyChangeUnderLoadTest.java:80)
> at org.infinispan.client.hotrod.DistTopologyChangeUnderLoadTest$PutHammer.call(DistTopologyChangeUnderLoadTest.java:72)
> at org.infinispan.test.AbstractInfinispanTest$LoggingCallable.call(AbstractInfinispanTest.java:435)
> ... 4 more
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
More information about the infinispan-issues
mailing list