Failover valve logic flawed, causing
MultiThreadFailoverTest.testMultiThreadFailoverSeveralThreadsTransacted() to loop forever
------------------------------------------------------------------------------------------------------------------------------
Key: JBMESSAGING-893
URL:
http://jira.jboss.com/jira/browse/JBMESSAGING-893
Project: JBoss Messaging
Issue Type: Sub-task
Reporter: Ovidiu Feodorov
Assigned To: Clebert Suconic
Priority: Blocker
Fix For: 1.2.0.GA
This is an analysis of the
MultiThreadFailoverTest.testMultiThreadFailoverSeveralThreadsTransacted() failure
5 producers and 10 consumers concurrently send/receive messages transactionally, using a
connection to server 1.
At 20:18:23,484, the test kills server 1:
20:18:23,484 INFO @main [MultiThreadFailoverTest] Killing server 1
At 20:18:24,156, the first failure is detected, LocalThreadConsumer-6 that was sending a
transaction to the server detects remoting failure:
20:18:24,156 DEBUG @LocalThreadConsumer-6 [MicroSocketClientInvoker]
SocketClientInvoker[faa03e, bisocket://localhost:2808] got SocketException
java.net.SocketException: Connection reset
LocalThreadConsumer-6 triggers FailoverValveInterceptor and notifies FailoverCommandCenter
of failure, that uses it to close the valve:
20:18:24,234 DEBUG @LocalThreadConsumer-6 [FailoverValveInterceptor]
FailoverValve.ConnectionDelegate[5, SID=1] detected network failure, putting
sendTransaction() on hold until failover completes
20:18:24,234 DEBUG @LocalThreadConsumer-6 [FailoverCommandCenter] failure detected by
FailoverValve.ConnectionDelegate[5, SID=1]
20:18:24,234 TRACE @LocalThreadConsumer-6 [FailoverCommandCenter]
FailoverCommandCenter[ConnectionState[5]] broadcasting FAILURE_DETECTED
20:18:24,234 DEBUG @LocalThreadConsumer-6 [FailoverCommandCenter]
FailoverCommandCenter[ConnectionState[5]] initiating client-side failover
20:18:24,234 DEBUG @LocalThreadConsumer-6 [FailoverValve] FailoverValve[connectionID=5]
closing ...
However, the valve cannot be closed at this moment, since it's "busy", the
write lock is taken by other threads, so LocalThreadConsumer-6 starts looping trying to
close it.
Concurrently, various other threads detect failure, in this order: LocalThreadConsumer-2
(20:18:25,234), @Thread-117 (connection validator thread, 20:18:25,328),
LocalThreadConsumer-0 (20:18:26,234), LocalThreadProducer-1 (20:18:27,250),
LocalThreadProducer-3 (20:18:28,250), LocalThreadProducer-2 (20:18:29,250),
LocalThreadConsumer-9 (20:18:30,359), LocalThreadProducer-4 (20:18:31,359 ),
LocalThreadConsumer-1 (20:18:32,375).
They cannot close the valve, so they start looping.
At 20:18:33,281, LocalThreadProducer-0 detects failure, and successfully closes the valve,
and the client side fail-over begins:
20:18:33,281 DEBUG @LocalThreadProducer-0 [MicroSocketClientInvoker]
SocketClientInvoker[faa03e, bisocket://localhost:2808] got Exception
java.net.ConnectException: Connection refused: connect, creation attempt took 906 ms
20:18:33,281 DEBUG @LocalThreadProducer-0 [FailoverValveInterceptor]
FailoverValve.ConnectionDelegate[5, SID=1] detected network failure, putting
sendTransaction() on hold until failover completes
20:18:33,281 DEBUG @LocalThreadProducer-0 [FailoverCommandCenter] failure detected by
FailoverValve.ConnectionDelegate[5, SID=1]
20:18:33,281 TRACE @LocalThreadProducer-0 [FailoverCommandCenter]
FailoverCommandCenter[ConnectionState[5]] broadcasting FAILURE_DETECTED
20:18:33,281 DEBUG @LocalThreadProducer-0 [FailoverCommandCenter]
FailoverCommandCenter[ConnectionState[5]] initiating client-side failover
20:18:33,281 DEBUG @LocalThreadProducer-0 [FailoverValve] FailoverValve[connectionID=5]
closing ...
20:18:33,281 DEBUG @LocalThreadProducer-0 [FailoverValve] FailoverValve[connectionID=5]
closed
...
20:18:34,281 TRACE @LocalThreadProducer-0 [FailoverCommandCenter]
FailoverCommandCenter[ConnectionState[5]] broadcasting FAILOVER_STARTED
20:18:34,281 TRACE @LocalThreadProducer-0 [ClientLogInterceptor] invoking
ClusteredConnectionFactoryDelegate[SIDs={0,2}].createConnectionDelegate(null, *****, 1)
...
However, the threads that were trying to close the valve are still spinning, and they
will, forever.
For example:
20:18:40,250 WARN @LocalThreadConsumer-2 [FailoverValve] FailoverValve[connectionID=6]
could not close, trying again ...
java.lang.Exception
at org.jboss.jms.client.FailoverValve.close(FailoverValve.java:160)
at
org.jboss.jms.client.FailoverCommandCenter.failureDetected(FailoverCommandCenter.java:80)
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
http://jira.jboss.com/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
http://www.atlassian.com/software/jira