[ http://jira.jboss.com/jira/browse/JBMESSAGING-1114?page=comments#action_1... ]
Tim Fox commented on JBMESSAGING-1114:
--------------------------------------
I found out why this is happening (why the callback handler can't be found):
1. An invocation fails because remoting timed out waiting for a TCP connection to be
returned to the pool (pool size too small).
2. In this case remoting throws a java.net.SocketException:

   throw new SocketException("Can not obtain client socket connection from pool. " +
      "Have waited " + (System.currentTimeMillis() - start) +
      " milliseconds for available connection (" + usedPooled + " in use)");
3. JBM catches the exception and, since it is a java.net.SocketException, assumes a
fatal problem has occurred with the connection, so it falsely initiates failover to
another node.
4. Before failing over JBM closes the failed connection.
5. This results in the JBR Callback Connector getting closed.
6. All this while callbacks are still arriving from the server, since that connection is
actually still fine.
7. The connector closing process removes the callback handler from the callback
ServerInvoker, so any further callbacks that arrive fail because the handler cannot be
found.
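As an aside, the pool exhaustion in step 1 can be mitigated by raising the client pool
size; in JBoss Remoting's socket transport this is, to my understanding, the
clientMaxPoolSize parameter on the InvokerLocator (the host, port, and value below are
illustrative only):

```
socket://myhost:4457/?clientMaxPoolSize=200
```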
So:
After increasing the client pool size I'm not getting these exceptions any more, but
imho there are two issues here:
1) Why does remoting throw SocketException? This seems inappropriate, since the socket
itself is fine. I would suggest throwing some kind of org.jboss.remoting.RemotingException
instead, which would allow JBM to catch it and not initiate failover in this case.
In the meantime I will have to do some kind of text comparison:

   if (exception.getMessage().startsWith("Can not obtain client"))
   {
      // don't do failover
   }
   else
   {
      // do failover
   }
This will work for now but is ugly and brittle.
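The string-matching workaround above can at least be isolated in a small predicate so it
is easy to delete once remoting throws a distinct exception type. A self-contained sketch
(the class and method names are hypothetical, not actual JBM code):

```java
import java.net.SocketException;

// Hypothetical helper (not JBM code) isolating the failover decision.
public class FailoverCheck
{
   // The pool-exhaustion SocketException from remoting does not indicate a
   // dead connection, so screen it out by message prefix; any other
   // SocketException is treated as a genuine connection failure.
   public static boolean shouldFailover(Throwable t)
   {
      if (!(t instanceof SocketException))
      {
         return false;
      }
      String msg = t.getMessage();
      if (msg != null && msg.startsWith("Can not obtain client socket connection from pool"))
      {
         return false; // pool exhaustion - the connection itself is fine
      }
      return true;
   }

   public static void main(String[] args)
   {
      System.out.println(shouldFailover(new SocketException(
         "Can not obtain client socket connection from pool. Have waited 5000 milliseconds")));
      System.out.println(shouldFailover(new SocketException("Connection reset")));
   }
}
```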
2) The connector close process is not clean. I would suggest the code should be changed so
the server threads are shut down at the beginning of the connector close process.
I.e.
a) Wait for current invocations on that server invoker to complete - and don't allow
any more.
b) Once all are complete shut it down.
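Steps (a) and (b) could be sketched roughly as follows (a hypothetical class, not actual
JBR code): a flag rejects new invocations, and a counter lets close() drain in-flight
ones before tearing anything down.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the proposed connector close sequence.
public class GracefulInvoker
{
   private final AtomicBoolean closing = new AtomicBoolean(false);
   private final AtomicInteger inFlight = new AtomicInteger(0);

   // Called at the start of each invocation; returns false if the invoker
   // is closing and the invocation must be rejected.
   public boolean beginInvocation()
   {
      if (closing.get()) return false;
      inFlight.incrementAndGet();
      if (closing.get())
      {
         // Raced with close(); back out.
         inFlight.decrementAndGet();
         return false;
      }
      return true;
   }

   public void endInvocation()
   {
      inFlight.decrementAndGet();
   }

   // (a) stop accepting invocations and wait for current ones to complete;
   // (b) only then is it safe to remove the callback handler and shut down.
   public void close() throws InterruptedException
   {
      closing.set(true);
      while (inFlight.get() > 0)
      {
         Thread.sleep(10);
      }
      // resource teardown (e.g. removing the callback handler) goes here
   }

   public static void main(String[] args) throws InterruptedException
   {
      GracefulInvoker inv = new GracefulInvoker();
      System.out.println(inv.beginInvocation()); // accepted
      inv.endInvocation();
      inv.close();
      System.out.println(inv.beginInvocation()); // rejected after close
   }
}
```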
(I believe we also have a deadlock on JBoss AS, which Clebert discovered, that was
somehow related to connectors closing. I don't know if this is the same issue?)
JBoss Remoting fails under load
-------------------------------
Key: JBMESSAGING-1114
URL: http://jira.jboss.com/jira/browse/JBMESSAGING-1114
Project: JBoss Messaging
Issue Type: Bug
Affects Versions: 1.4.0.GA
Reporter: Tim Fox
Assigned To: Tim Fox
Priority: Critical
Fix For: 1.4.0.SP1
JBoss Remoting fails with various different errors when under extreme load.
To replicate this, set up two clustered server nodes, using a MySQL database.
These can both be on the same machine, using ServiceBindingManager.
On a second machine run Ovidiu's messkit toolkit, first to send some messages:
mess -stat send -size 10240 50000
And then to receive them back using 50 concurrent consumers:
mess -stat -sessions 50 receive all
You will notice that JBoss Remoting fails with errors:
I believe this is due to remoting incorrectly thinking a connection has failed and
shutting down the connection. Perhaps due to the load, the ping does not get through in
time to refresh the lease?
I would like a remoting solution that *does not ping* from server to client - for us this
is unnecessary.
It also seems remoting is continually timing out and recreating connections - this could
also be a source of error.
How do we configure remoting so it does not do this?