[netty-dev] Problem during connection to the server which hangs the server

Frederic Bregier fredbregier at free.fr
Tue Oct 21 11:48:26 EDT 2008


Hi Trustin,

I'm not sure whether this problem comes from Netty or from my program.
I'm really stuck with it, not least because I am not able to write a simple
example that reproduces this behaviour.
So let me explain what I observe, and perhaps (magic?) you can find a clue?

The server side accepts as many clients as connect.
Both client and server set bootstrap.setOption("connectTimeoutMillis", x)
("child.connectTimeoutMillis" on the server side), where x = 30000 (so 30 seconds).
Both use a ChannelPipelineFactory in order to create a new SimpleChannelHandler
for each connection (so that each connection has its own private data on both sides).
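
For reference, the pipeline factory looks roughly like this (a simplified
sketch only: the real IDMPipelineFactory also wires my protocol codecs, and
the way the handler class is instantiated here is just illustrative, as is
the generic bound on the handler class):

import org.jboss.netty.channel.ChannelPipeline;
import org.jboss.netty.channel.ChannelPipelineFactory;
import org.jboss.netty.channel.Channels;
import org.jboss.netty.channel.SimpleChannelHandler;

public class IDMPipelineFactory implements ChannelPipelineFactory {
    private final Class<? extends SimpleChannelHandler> handlerClass;

    public IDMPipelineFactory(Class<? extends SimpleChannelHandler> handlerClass) {
        this.handlerClass = handlerClass;
    }

    public ChannelPipeline getPipeline() throws Exception {
        ChannelPipeline pipeline = Channels.pipeline();
        // One fresh handler per connection, so each channel keeps its own
        // private state (the real factory also adds codecs here).
        pipeline.addLast("handler", handlerClass.newInstance());
        return pipeline;
    }
}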

The client is hosted by a JSP inside Tomcat: each new request to Tomcat
launches one new connection to the Netty server through a new Netty client.
This connection is closed immediately afterwards, so its lifetime is the
lifetime of one JSP request.
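
The flow of one request is roughly the following (a simplified sketch;
NetworkHandler is my client-side handler, host and port are placeholders,
and whether the factory and executors are shared or created per request is
not shown here):

import java.net.InetSocketAddress;
import java.util.concurrent.Executors;

import org.jboss.netty.bootstrap.ClientBootstrap;
import org.jboss.netty.channel.ChannelFuture;
import org.jboss.netty.channel.socket.nio.NioClientSocketChannelFactory;

// Hypothetical outline of one JSP request: connect, exchange, close.
public void handleOneRequest(String host, int port) {
    ClientBootstrap bootstrap = new ClientBootstrap(
            new NioClientSocketChannelFactory(
                    Executors.newCachedThreadPool(),
                    Executors.newCachedThreadPool()));
    bootstrap.setPipelineFactory(new IDMPipelineFactory(NetworkHandler.class));
    bootstrap.setOption("connectTimeoutMillis", 30000);

    ChannelFuture future = bootstrap.connect(new InetSocketAddress(host, port));
    future.awaitUninterruptibly();
    if (future.isSuccess()) {
        // ... write the request and wait for the answer in NetworkHandler ...
        future.getChannel().close().awaitUninterruptibly();
    }
}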

Everything is OK when I run the test manually or when I benchmark with a low
number of connections. But when I load with 20 concurrent users and repeat
this load more than 10 000 times (long-running test) on my Windows Vista
laptop, using JDK 1.6.0_05 and the very latest version of Netty (from svn),
everything is fine only up to about 2400 tries.

The server is started as follows (exec is a CachedThreadPool).
NB: Since I don't know which options apply to the server channel and which to
the child channels, I set almost all of them on both, except "readWriteFair",
which hangs when set as "child.readWriteFair".

ChannelFactory factory = new NioServerSocketChannelFactory(exec, exec, 64);
ServerBootstrap bootstrap = new ServerBootstrap(factory);
bootstrap.setPipelineFactory(new IDMPipelineFactory(FileServerHandler.class));
bootstrap.setOption("child.tcpNoDelay", true);
bootstrap.setOption("child.keepAlive", true);
bootstrap.setOption("child.reuseAddress", true);
bootstrap.setOption("child.connectTimeoutMillis", 30000);
// NEVER bootstrap.setOption("child.readWriteFair", true);
bootstrap.setOption("tcpNoDelay", true);
bootstrap.setOption("reuseAddress", true);
bootstrap.setOption("connectTimeoutMillis", 30000);
bootstrap.setOption("readWriteFair", true);
bootstrap.bind(new InetSocketAddress(SERVER_PORT));

I get the same behaviour with a configuration as simple as:

ChannelFactory factory = new NioServerSocketChannelFactory(exec, exec, 64);
ServerBootstrap bootstrap = new ServerBootstrap(factory);
bootstrap.setPipelineFactory(new IDMPipelineFactory(FileServerHandler.class));
bootstrap.setOption("child.tcpNoDelay", true);
bootstrap.setOption("child.keepAlive", true);
bootstrap.bind(new InetSocketAddress(SERVER_PORT));

Then, after about 2400 requests, I get the following:

- On the client side (Tomcat JSP), the client waits for the server's answer,
and the load test is blocked until a timeout occurs from the client's point
of view. An example of the logging:

"2008-10-21 15:02:57,868 WARN [http-8080-Processor20]
openidm.web.client.IDMGet [NetworkHandler.java : 355] Connection (exemple:
timeout 30000 < 20938) impossible since Failed to create a selector." 

The last part of the message is the exception's getMessage()
(i.e. "Failed to create a selector.").

Note that the first number is the theoretical timeout (30 seconds) and the
second number is the actual time (in ms) between the creation of the
channelHandler and this logging. The measured time is less than the timeout
configured on the bootstrap.

- On the server side, the server hangs on the bootstrap serving the clients,
with the following message coming directly from Netty:

2008-10-21 15:02:57,985 WARN [New I/O server boss #1 (channelId: 1f7ea494-011d-1000-b292-01fd0b48a883, /0.0.0.0:6666)] org.jboss.netty.channel.socket.nio.NioServerSocketPipelineSink [LoggingHandler.java warn : 148] Failed to initialize an accepted socket.
org.jboss.netty.channel.ChannelException: Failed to create a selector.
	at org.jboss.netty.channel.socket.nio.NioWorker.register(NioWorker.java:95)
	at org.jboss.netty.channel.socket.nio.NioServerSocketPipelineSink$Boss.run(NioServerSocketPipelineSink.java:209)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
	at java.lang.Thread.run(Thread.java:619)
Caused by: java.io.IOException: Unable to establish loopback connection
	at sun.nio.ch.PipeImpl$Initializer.run(PipeImpl.java:106)
	at java.security.AccessController.doPrivileged(Native Method)
	at sun.nio.ch.PipeImpl.<init>(PipeImpl.java:122)
	at sun.nio.ch.SelectorProviderImpl.openPipe(SelectorProviderImpl.java:27)
	at java.nio.channels.Pipe.open(Pipe.java:133)
	at sun.nio.ch.WindowsSelectorImpl.<init>(WindowsSelectorImpl.java:105)
	at sun.nio.ch.WindowsSelectorProvider.openSelector(WindowsSelectorProvider.java:26)
	at java.nio.channels.Selector.open(Selector.java:209)
	at org.jboss.netty.channel.socket.nio.NioWorker.register(NioWorker.java:93)
	... 4 common frames omitted
Caused by: java.net.ConnectException: Connection timed out: connect
	at sun.nio.ch.Net.connect(Native Method)
	at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:507)
	at java.nio.channels.SocketChannel.open(SocketChannel.java:146)
	at sun.nio.ch.PipeImpl$Initializer.run(PipeImpl.java:78)
	... 12 common frames omitted
2008-10-21 15:03:19,589 WARN [New I/O server boss #1 (channelId: 1f7ea494-011d-1000-b292-01fd0b48a883, /0.0.0.0:6666)] org.jboss.netty.channel.socket.nio.NioServerSocketPipelineSink [LoggingHandler.java warn : 148] Failed to initialize an accepted socket.
org.jboss.netty.channel.ChannelException: Failed to create a selector.
	(identical stack trace repeated: caused by java.io.IOException: Unable to establish loopback connection, caused by java.net.ConnectException: Connection timed out: connect)
	
No error or exception reaches the channel (it is never created), so I can't
get more information.

When this log appears, the server process goes to 100% CPU and is no longer
responsive on the bootstrap serving the clients. However, I have another
bootstrap (bind) for a special client (shutdown order) which triggers the
shutdown of everything (channels and ExecutorServices). This second bootstrap
is still functional (the connection is made, the message is received and
answered, and the shutdown function runs). But the server does not stop
(it does stop when I don't hit the previous error).
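
For completeness, the shutdown path behind this second bootstrap is roughly
the following (a simplified sketch; serverChannel, openChannels and exec are
my own names for the channel returned by bind(), my bookkeeping of open
channels, and the shared CachedThreadPool):

import java.util.Collection;
import java.util.concurrent.ExecutorService;

import org.jboss.netty.channel.Channel;

// Hypothetical sketch of the shutdown triggered by the second bootstrap.
public static void shutdownAll(Channel serverChannel,
                               Collection<Channel> openChannels,
                               ExecutorService exec) {
    serverChannel.close().awaitUninterruptibly();   // stop accepting new clients
    for (Channel channel : openChannels) {
        channel.close().awaitUninterruptibly();     // close remaining connections
    }
    exec.shutdownNow();                             // stop the ExecutorService
}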

So I took a thread dump of the running JVM. Here it is.
NB: The only application thread still running is "New I/O server boss #1".

C:\Program Files\Java\jdk1.6.0_05\bin>jstack -l 10204
2008-10-21 15:04:07
Full thread dump Java HotSpot(TM) Server VM (10.0-b19 mixed mode):

"DestroyJavaVM" prio=6 tid=0x02011c00 nid=0x10d0 waiting on condition
[0x00000000..0x0035fd20]
   java.lang.Thread.State: RUNNABLE

   Locked ownable synchronizers:
        - None

"New I/O server boss #1 (channelId: 1f7ea494-011d-1000-b292-01fd0b48a883,
/0.0.0.0:6666)" prio=6 tid=0x298bf400 nid=0x58c runnable
[0x2995f000..0x2995f9e8]
   java.lang.Thread.State: RUNNABLE
        at java.lang.Thread.yield(Native Method)
        at
org.jboss.netty.channel.socket.nio.NioWorker.register(NioWorker.java:102)
        at
org.jboss.netty.channel.socket.nio.NioServerSocketPipelineSink$Boss.run(NioServerSocketPipelineSink.java:209)
        at
org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:78)
        at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
        at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
        at java.lang.Thread.run(Thread.java:619)

   Locked ownable synchronizers:
        - <0x092dd910> (a
java.util.concurrent.locks.ReentrantLock$NonfairSync)

"Low Memory Detector" daemon prio=6 tid=0x020eac00 nid=0x27f0 runnable
[0x00000000..0x00000000]
   java.lang.Thread.State: RUNNABLE

   Locked ownable synchronizers:
        - None

"CompilerThread1" daemon prio=10 tid=0x020e4c00 nid=0x1830 waiting on
condition[0x00000000..0x2961f258]
   java.lang.Thread.State: RUNNABLE

   Locked ownable synchronizers:
        - None

"CompilerThread0" daemon prio=10 tid=0x020e0c00 nid=0x2038 waiting on
condition[0x00000000..0x295cf2d8]
   java.lang.Thread.State: RUNNABLE

   Locked ownable synchronizers:
        - None

"Attach Listener" daemon prio=10 tid=0x020e0400 nid=0x2da8 waiting on
condition[0x00000000..0x00000000]
   java.lang.Thread.State: RUNNABLE

   Locked ownable synchronizers:
        - None

"Signal Dispatcher" daemon prio=10 tid=0x020e0000 nid=0x1cf4 runnable
[0x00000000..0x00000000]
   java.lang.Thread.State: RUNNABLE

   Locked ownable synchronizers:
        - None

"Finalizer" daemon prio=8 tid=0x020c9c00 nid=0x177c in Object.wait()
[0x294df000..0x294dfd68]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x09264c50> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:116)
        - locked <0x09264c50> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:132)
        at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)

   Locked ownable synchronizers:
        - None

"Reference Handler" daemon prio=10 tid=0x020c6000 nid=0x2be8 in
Object.wait() [0x2948f000..0x2948f9e8]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x092b75a0> (a java.lang.ref.Reference$Lock)
        at java.lang.Object.wait(Object.java:485)
        at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
        - locked <0x092b75a0> (a java.lang.ref.Reference$Lock)

   Locked ownable synchronizers:
        - None

"VM Thread" prio=10 tid=0x020c2800 nid=0x2a94 runnable

"GC task thread#0 (ParallelGC)" prio=6 tid=0x008efc00 nid=0x2e80 runnable

"GC task thread#1 (ParallelGC)" prio=6 tid=0x02012800 nid=0x16a8 runnable

"VM Periodic Task Thread" prio=10 tid=0x020eb000 nid=0x1f54 waiting on
condition


JNI global references: 1163


I don't know what's happening. I'm surely missing something...
I know that without a functional example that reproduces this behaviour it
will not be obvious where the problem could be. But, who knows, perhaps you
will find a clue...?

Thank you,
Frederic

NB: I have another version where I try to implement something close to a pool
of connections to the Netty server, in order to keep the cost low with fewer
connect/close cycles, but this time I'm stuck with strange behaviour that
seems to come from the Tomcat servlet side (Content-Length: 0, "Bogus chunk
size or missing CR"), and the problems seem to come from the client side this
time, not from the server side.
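
The pool itself is nothing fancier than something like this (a very rough
sketch, names are mine; the caller opens a new connection through the
ClientBootstrap whenever acquire() returns null):

import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

import org.jboss.netty.channel.Channel;

// Hypothetical sketch of the client-side channel pool: reuse a connected
// channel when one is available, otherwise let the caller open a new one.
public class ChannelPool {
    private final Queue<Channel> idle = new ConcurrentLinkedQueue<Channel>();

    public Channel acquire() {
        Channel channel = idle.poll();
        if (channel != null && channel.isConnected()) {
            return channel;
        }
        return null; // caller opens a new connection via the ClientBootstrap
    }

    public void release(Channel channel) {
        if (channel.isConnected()) {
            idle.offer(channel);
        } else {
            channel.close();
        }
    }
}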


-----
Hardware/Software Architect
-- 



