We are replacing tomcat connector with undertow in a spring-boot setting, for servlet 3.0
based container. During load performance test, see a possible issue that some request are
not dispatched from undertow, the count is very small (~ 0.01%), but highly repeatable.
Example test Setting:
Jmeter 1 client (keep-alive traffic) --> undertow, traffic is on the same client
thread, it is a blocking client, so all in-coming request to undertow are sequential.
Request sent non-stopping (as fast as possible).
(1) verified that it is not due to network packet, we used tcpdump to trace the missed
call that client-side sent the request, and server "ack" the request. But the
response is never sent. See below shows when a call is successful vs. unsuccessful"
(successful)
(client send request)
05:01:05.325799 IP 10.24.48.104.39687 > 10.25.171.155.pcsync-https: Flags [P.], seq
20304084:20304121, ack 100946003, win 501, options [nop,nop,TS val 3490584463 ecr
72906421], length 37
05:01:05.325850 IP 10.24.48.104.39687 > 10.25.171.155.pcsync-https: Flags [P.], seq
20304121:20304366, ack 100946003, win 501, options [nop,nop,TS val 3490584463 ecr
72906421], length 245
(from server)
05:01:05.325995 IP 10.25.171.155.pcsync-https > 10.24.48.104.39687: Flags [.], ack
20304366, win 1636, options [nop,nop,TS val 72906421 ecr 3490584463], length 0
05:01:05.327110 IP 10.25.171.155.pcsync-https > 10.24.48.104.39687: Flags [P.], seq
100946003:100947405, ack 20304366, win 1636, options [nop,nop,TS val 72906422 ecr
3490584463], length 1402
(unsuccessful)
(client send request)
05:01:05.327440 IP 10.24.48.104.39687 > 10.25.171.155.pcsync-https: Flags [P.], seq
20304366:20304403, ack 100947405, win 501, options [nop,nop,TS val 3490584465 ecr
72906422], length 37
05:01:05.327485 IP 10.24.48.104.39687 > 10.25.171.155.pcsync-https: Flags [P.], seq
20304403:20304648, ack 100947405, win 501, options [nop,nop,TS val 3490584465 ecr
72906422], length 245
(from server)
05:01:05.327628 IP 10.25.171.155.pcsync-https > 10.24.48.104.39687: Flags [.], ack
20304648, win 1636, options [nop,nop,TS val 72906422 ecr 3490584465], length 0
[response from server never received]
(2) From application logging, here is example when request is successful vs unsuccessful
(successful response)
2016-10-19 04:05:23,268 TRACE [XNIO-APPLICATION I/O-8] org.xnio.nio.selector Selected on
sun.nio.ch.EPollSelectorImpl@4129489f
2016-10-19 04:05:23,268 TRACE [XNIO-APPLICATION I/O-8] org.xnio.nio.selector Beginning
select on sun.nio.ch.EPollSelectorImpl@4129489f (with timeout)
2016-10-19 04:05:23,269 TRACE [XNIO-APPLICATION I/O-8] org.xnio.nio.selector Selected on
sun.nio.ch.EPollSelectorImpl@4129489f
2016-10-19 04:05:23,269 TRACE [XNIO-APPLICATION I/O-8] org.xnio.nio.selector Selected key
sun.nio.ch.SelectionKeyImpl@4218d66e for java.nio.channels.SocketChannel[connected
local=/127.0.0.1:8443 remote=/127.0.0.1:51626]
2016-10-19 04:05:23,269 TRACE [XNIO-APPLICATION I/O-8] org.xnio.listener Invoking listener
io.undertow.server.protocol.http.HttpReadListener@192666c4 on channel
org.xnio.conduits.ConduitStreamSourceChannel@5d338d2a
2016-10-19 04:05:23,270 WARN [XNIO-APPLICATION task-14]
com.paypal.raptor.samples.impl.SampleResourceImpl 156539
2016-10-19 04:05:23,270 TRACE [XNIO-APPLICATION task-14]
io.undertow.server.HttpServerExchange Starting to write response for HttpServerExchange{
GET /v1/qbd1serv/lnp/hello request {Connection=[keep-alive], Accept=[text/html, image/gif,
image/jpeg, *; q=.2, */*; q=.2], User-Agent=[Java/1.7.0_45],
Host=[qbd1serv59287.qa.paypal.com:8443]} response {Connection=[keep-alive],
SERVER_INFO=[qbd1serv:ppaas_1_2.v1.qbd1serv.lnp.hello.GET&CalThreadId=114&TopLevelTxnStartTime=157dc9dfa45&Host=kncntusrstage13875&pid=2901],
Content-Type=[text/plain; charset=UTF-8], Content-Length=[1024],
CORRELATION-ID=[4dcef4a4f99c3], Date=[Wed, 19 Oct 2016 11:05:23 GMT]}}
2016-10-19 04:05:23,270 TRACE [XNIO-APPLICATION I/O-8] org.xnio.nio Running task
io.undertow.protocols.ssl.SslConduit$1@326a9cf4
2016-10-19 04:05:23,270 TRACE [XNIO-APPLICATION task-14] org.xnio.safe-close Closing
resource
io.undertow.servlet.core.ServletBlockingHttpExchange@d9add7a<mailto:io.undertow.servlet.core.ServletBlockingHttpExchange@d9add7a>
(unsuccessful )
2016-10-19 04:05:23,270 TRACE [XNIO-APPLICATION I/O-8] org.xnio.nio.selector Beginning
select on sun.nio.ch.EPollSelectorImpl@4129489f (with timeout)
2016-10-19 04:05:24,001 TRACE [XNIO-APPLICATION I/O-8] org.xnio.nio.selector Selected on
sun.nio.ch.EPollSelectorImpl@4129489f
2016-10-19 04:05:24,001 TRACE [XNIO-APPLICATION I/O-8] org.xnio.nio Running task
io.undertow.util.DateUtils$2@d27f1e9
2016-10-19 04:05:24,001 TRACE [XNIO-APPLICATION I/O-8] org.xnio.nio.selector Beginning
select on sun.nio.ch.EPollSelectorImpl@4129489f (with timeout)
Note that rest server actions are missing, from server log, I used a counter in a servlet
filter, also server counter < client count
Note: the timestamp on the above might not perfect match on client vs server time stamp
respectively.
(3) I am on spring-boot 1.4.1, also tested using 1.3.x same behavior. I also overrides
xnio 3.4 vs 3.3.x, also same behavior. Changed configured the following other than default
..
(3.a) Undertow io- 8 threads, task worker 25 core threads, max 64 threads. Server-idle
timeout 20 sec, server-no request timeout 20 sec.
(3.b) If I use concurrent jmeter client, same behavior is also observed, the ratio did not
worse, actually, it might be slightly better than just single test client.
Please share insights if there is any configuration from my side, or else.
Thanks much,
Qian