We are replacing tomcat connector with undertow in a spring-boot setting, for servlet 3.0 based container. During load performance test, see a possible issue that some request are
not dispatched from undertow, the count is very small (~ 0.01%), but highly repeatable.
Example test Setting:
Jmeter 1 client (keep-alive traffic)
à undertow, traffic is on the same client thread, it is a blocking client, so all in-coming request to undertow are sequential. Request sent non-stopping
(as fast as possible).
(1) verified that it is not due to network packet, we used tcpdump to trace the missed call that client-side sent the request, and server “ack” the request. But the response is never
sent. See below shows when a call is successful vs. unsuccessful”
(successful)
(client send request)
05:01:05.325799 IP 10.24.48.104.39687 > 10.25.171.155.pcsync-https: Flags [P.], seq 20304084:20304121, ack 100946003, win 501, options [nop,nop,TS val 3490584463 ecr 72906421], length
37
05:01:05.325850 IP 10.24.48.104.39687 > 10.25.171.155.pcsync-https: Flags [P.], seq 20304121:20304366, ack 100946003, win 501, options [nop,nop,TS val 3490584463 ecr 72906421], length
245
(from server)
05:01:05.325995 IP 10.25.171.155.pcsync-https > 10.24.48.104.39687: Flags [.], ack 20304366, win 1636, options [nop,nop,TS val 72906421 ecr 3490584463], length 0
05:01:05.327110 IP 10.25.171.155.pcsync-https > 10.24.48.104.39687: Flags [P.], seq 100946003:100947405, ack 20304366, win 1636, options [nop,nop,TS val 72906422 ecr 3490584463], length
1402
(unsuccessful)
(client send request)
05:01:05.327440 IP 10.24.48.104.39687 > 10.25.171.155.pcsync-https: Flags [P.], seq 20304366:20304403, ack 100947405, win 501, options [nop,nop,TS val 3490584465 ecr 72906422], length
37
05:01:05.327485 IP 10.24.48.104.39687 > 10.25.171.155.pcsync-https: Flags [P.], seq 20304403:20304648, ack 100947405, win 501, options [nop,nop,TS val 3490584465 ecr 72906422], length
245
(from server)
05:01:05.327628 IP 10.25.171.155.pcsync-https > 10.24.48.104.39687: Flags [.], ack 20304648, win 1636, options [nop,nop,TS val 72906422 ecr 3490584465], length 0
[response from server never received]
(2) From application logging, here is example when request is successful vs unsuccessful
(successful response)
2016-10-19 04:05:23,268 TRACE [XNIO-APPLICATION I/O-8] org.xnio.nio.selector Selected on sun.nio.ch.EPollSelectorImpl@4129489f
2016-10-19 04:05:23,268 TRACE [XNIO-APPLICATION I/O-8] org.xnio.nio.selector Beginning select on sun.nio.ch.EPollSelectorImpl@4129489f (with timeout)
2016-10-19 04:05:23,269 TRACE [XNIO-APPLICATION I/O-8] org.xnio.nio.selector Selected on sun.nio.ch.EPollSelectorImpl@4129489f
2016-10-19 04:05:23,269 TRACE [XNIO-APPLICATION I/O-8] org.xnio.nio.selector Selected key sun.nio.ch.SelectionKeyImpl@4218d66e for java.nio.channels.SocketChannel[connected local=/127.0.0.1:8443 remote=/127.0.0.1:51626]
2016-10-19 04:05:23,269 TRACE [XNIO-APPLICATION I/O-8] org.xnio.listener Invoking listener io.undertow.server.protocol.http.HttpReadListener@192666c4 on channel org.xnio.conduits.ConduitStreamSourceChannel@5d338d2a
2016-10-19 04:05:23,270 WARN [XNIO-APPLICATION task-14] com.paypal.raptor.samples.impl.SampleResourceImpl 156539
2016-10-19 04:05:23,270 TRACE [XNIO-APPLICATION task-14] io.undertow.server.HttpServerExchange Starting to write response for HttpServerExchange{ GET /v1/qbd1serv/lnp/hello request {Connection=[keep-alive], Accept=[text/html, image/gif, image/jpeg, *; q=.2,
*/*; q=.2], User-Agent=[Java/1.7.0_45], Host=[qbd1serv59287.qa.paypal.com:8443]} response {Connection=[keep-alive], SERVER_INFO=[qbd1serv:ppaas_1_2.v1.qbd1serv.lnp.hello.GET&CalThreadId=114&TopLevelTxnStartTime=157dc9dfa45&Host=kncntusrstage13875&pid=2901],
Content-Type=[text/plain; charset=UTF-8], Content-Length=[1024], CORRELATION-ID=[4dcef4a4f99c3], Date=[Wed, 19 Oct 2016 11:05:23 GMT]}}
2016-10-19 04:05:23,270 TRACE [XNIO-APPLICATION I/O-8] org.xnio.nio Running task io.undertow.protocols.ssl.SslConduit$1@326a9cf4
2016-10-19 04:05:23,270 TRACE [XNIO-APPLICATION task-14] org.xnio.safe-close Closing resource
io.undertow.servlet.core.ServletBlockingHttpExchange@d9add7a
(unsuccessful )
2016-10-19 04:05:23,270 TRACE [XNIO-APPLICATION I/O-8] org.xnio.nio.selector Beginning select on sun.nio.ch.EPollSelectorImpl@4129489f (with timeout)
2016-10-19 04:05:24,001 TRACE [XNIO-APPLICATION I/O-8] org.xnio.nio.selector Selected on sun.nio.ch.EPollSelectorImpl@4129489f
2016-10-19 04:05:24,001 TRACE [XNIO-APPLICATION I/O-8] org.xnio.nio Running task io.undertow.util.DateUtils$2@d27f1e9
2016-10-19 04:05:24,001 TRACE [XNIO-APPLICATION I/O-8] org.xnio.nio.selector Beginning select on sun.nio.ch.EPollSelectorImpl@4129489f (with timeout)
Note that rest server actions are missing, from server log, I used a counter in a servlet filter, also server counter < client count
Note: the timestamp on the above might not perfect match on client vs server time stamp respectively.
(3) I am on spring-boot 1.4.1, also tested using 1.3.x same behavior. I also overrides xnio 3.4 vs 3.3.x, also same behavior. Changed configured the following other than default ..
(3.a) Undertow io- 8 threads, task worker 25 core threads, max 64 threads. Server-idle timeout 20 sec, server-no request timeout 20 sec.
(3.b) If I use concurrent jmeter client, same behavior is also observed, the ratio did not worse, actually, it might be slightly better than just single test client.
Please share insights if there is any configuration from my side, or else.
Thanks much,
Qian