Hello!
I have a custom clustering system built on top of JBoss Remoting 2.2.0.SP4. It uses HAJndi
and JNDIDetector to manage and discover nodes. HAJndi is hosted on cluster coordinators.
Everything worked fine until we moved to the pre-production stage.
There the network load increased, and the central (coordinator) nodes began treating slave nodes as
dead and removing them from the registry. Yet at that same moment the slave nodes communicate with
the coordinators without any problem, so this is not a hardware failure. I can also open a
TCP connection from a coordinator node to the "lost" slave node with telnet while the
failures are happening.
Below is an excerpt from the log. It looks as if the slave and the master race each
other to remove and re-insert the detection object in JNDI.
| 2007-07-04 10:04:50,406 DEBUG [org.jboss.remoting.detection.jndi.JNDIDetector] Removed detection Detection (org.jboss.remoting.detection.Detection@8cddac31)
| 2007-07-04 10:05:56,848 DEBUG [org.jboss.remoting.detection.jndi.JNDIDetector] Removed detection Detection (org.jboss.remoting.detection.Detection@8cddac31)
| 2
The first thing I found while reviewing the Remoting code is a small timeout for connection
validation and only a single retry. I'm going to patch the code and check whether this
solves the problem. But I suppose there was a reason for making the timeout so small, and
patching it does not seem like an appropriate solution anyway.
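A minimal sketch of what a more tolerant check could look like, written against plain `java.net.Socket` rather than the Remoting internals. The class `ConnectionProbe` and its parameters (`timeoutMillis`, `attempts`, `pauseMillis`) are hypothetical and not part of the JBoss Remoting API. The idea is that one slow reply under load should not be enough to declare a node dead. Note that a bare TCP connect only proves the port is open (much like the telnet test), not that the remote invoker actually answers.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

/**
 * Hypothetical helper (NOT JBoss Remoting code): probes host:port with a
 * configurable connect timeout and number of attempts before giving up.
 */
public class ConnectionProbe {

    /** Returns true as soon as one attempt opens a TCP connection. */
    public static boolean isReachable(String host, int port, int timeoutMillis,
                                      int attempts, long pauseMillis) {
        for (int attempt = 1; attempt <= attempts; attempt++) {
            try (Socket socket = new Socket()) {
                // Throws SocketTimeoutException if the connect takes longer than timeoutMillis
                socket.connect(new InetSocketAddress(host, port), timeoutMillis);
                return true;
            } catch (IOException e) {
                // Timeouts and refusals are treated alike; pause, then retry unless this was the last try
                if (attempt < attempts) {
                    try {
                        Thread.sleep(pauseMillis);
                    } catch (InterruptedException ie) {
                        Thread.currentThread().interrupt();
                        return false;
                    }
                }
            }
        }
        // Every attempt failed (or attempts <= 0): report the node as unreachable
        return false;
    }

    public static void main(String[] args) throws IOException {
        // A local listening socket stands in for a slave node so the example runs anywhere
        try (ServerSocket server = new ServerSocket(0)) {
            System.out.println(isReachable("127.0.0.1", server.getLocalPort(), 5000, 3, 200));
        }
    }
}
```

The trade-off is detection latency: a node that really is dead is only reported after roughly attempts × timeoutMillis plus (attempts − 1) × pauseMillis, which may be one reason the original values are so small.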
I also see another unusual entry in the slaves' log:
| 2007-07-04 00:00:56,447 INFO [org.jboss.remoting.transport.socket.MicroSocketClientInvoker] Received version 254: treating as end of file
| 2007-07-04 00:00:56,447 INFO [org.jboss.remoting.transport.socket.MicroSocketClientInvoker] Received version 254: treating as end of file
Could this be connected with the first problem? What does it mean?
Any hints or advice?
View the original post :
http://www.jboss.com/index.html?module=bb&op=viewtopic&p=4060341#...