I am not 100% sure what you are trying to test here.
Global state protected by a sync lock will not perform very well,
and does not in any way represent the way the majority of web
applications manage their state.
What sort of real-world situation are you trying to simulate here?
i've written a java database engine - the sync lock is simulating this
database. initially i'm focusing on low-end VPS hosts, so i'm more
interested in efficiency than peak performance. i'm just trying to get a
baseline (as opposed to a benchmark) of what async is capable of on various
platforms
my database is (kilim) fiber-based and can handle far more concurrent
queries than the OS can support threads, so my motivation is to verify that
the web server can support a comparable number of open connections. this
isn't apples to apples - in the real case there could be many more
connections and much more delay, but this seems like a good start
here's the faster version:
https://github.com/nqzero/jempower/blob/master/utow/src/main/java/UtowAsy...
it still has the global lock, but does the actual processing in multiple
threads. the repository has versions for some other platforms as well
(jetty, comsat and kilim)
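
to make the shape of that handler concrete, here is a minimal stdlib-only sketch of the pattern described above: one global lock around the shared state, with the work handed off to a small pool of threads. it is not the code in the linked repository and does not use undertow at all; the class name `LockedStoreSketch`, the counter standing in for the database, and the pool size of 3 are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-in for "a database behind a global lock, served by a few threads".
final class LockedStoreSketch {
    private final Object lock = new Object();
    private long counter;                       // stands in for the database state
    private final ExecutorService workers;

    LockedStoreSketch(int threads) {
        workers = Executors.newFixedThreadPool(threads);
    }

    // Accepts a request without blocking the caller; the query runs on a worker thread.
    CompletableFuture<String> handle() {
        return CompletableFuture.supplyAsync(() -> {
            long value;
            synchronized (lock) {               // only the state access is serialized
                value = ++counter;
            }
            return "hello " + value;            // the reply is built outside the lock
        }, workers);
    }

    void shutdown() {
        workers.shutdown();
    }

    public static void main(String[] args) {
        LockedStoreSketch store = new LockedStoreSketch(3);
        List<CompletableFuture<String>> replies = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            replies.add(store.handle());        // 10k requests in flight, 3 threads
        }
        CompletableFuture.allOf(replies.toArray(new CompletableFuture[0])).join();
        System.out.println("completed " + replies.size() + " requests");
        store.shutdown();
    }
}
```

the point of the sketch is only that the number of outstanding requests is decoupled from the number of threads; how closely it mirrors the undertow handler is not something it can show.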
comparing the 3 undertow handlers
124k req/s - sync (from TechEmpower benchmark)
92k req/s - async (reply inside lock)
118k req/s - async2 (with 3 threads)
this is on an i3, and i'm testing with ab (2 instances are required to
saturate the sync and async2 versions) at various concurrency levels
(mostly 1000 - 10000), and the TechEmpower version results in some receive
errors at lower concurrencies than the async versions
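
for reference, the three throughput figures above can be turned into relative differences with a few lines of arithmetic. the sketch below uses only the numbers already quoted (124k, 92k, 118k req/s); the class and method names are made up for illustration.

```java
// Relative throughput of the two async handlers against the sync baseline.
final class ThroughputDelta {
    // Percentage by which `measured` falls short of `baseline`.
    static double percentBelow(double baseline, double measured) {
        return 100.0 * (baseline - measured) / baseline;
    }

    public static void main(String[] args) {
        double sync = 124_000;    // req/s, sync (TechEmpower version)
        double async = 92_000;    // req/s, async (reply inside lock)
        double async2 = 118_000;  // req/s, async2 (with 3 threads)
        System.out.printf("async:  %.1f%% below sync%n", percentBelow(sync, async));
        System.out.printf("async2: %.1f%% below sync%n", percentBelow(sync, async2));
    }
}
```

this works out to roughly 26% below sync for the single-threaded async handler and roughly 5% below sync for async2.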
i also need an example of asynchronous usage so i can tell people how to
build a webapp using my database with undertow, so i want to be doing the
"right" things - hence the simple async version. thanks for the feedback on
my questions. if you can point me to a simpler example of async undertow
(that doesn't involve a thread-per-connection), i'm happy to use that
instead
my conclusion is that for undertow, using async adds at most a small
overhead (async2 is about 5% below sync in the throughput numbers above),
and that it can be implemented easily using the dispatch/getResponseSender
pair. does this sound right to you?
any idea how much memory is allocated for each outstanding request? i'm
assuming it is much less than the memory needed for an OS thread. using top,
i see resident memory grow by about 300M when concurrency grows by 40k,
but i realize that this is a crude technique
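
as a rough check on that estimate, the per-connection figure implied by the top measurement is simple division. the sketch below assumes the "M" reported by top means MiB, which may not hold for every configuration, and the names are hypothetical.

```java
// Back-of-the-envelope memory per outstanding connection from the top deltas.
final class PerConnectionMemory {
    // Assumes the resident-memory delta is given in MiB (1 MiB = 1024 KiB).
    static double kibPerConnection(double residentDeltaMib, long connectionDelta) {
        return residentDeltaMib * 1024.0 / connectionDelta;
    }

    public static void main(String[] args) {
        double residentDeltaMib = 300;   // observed growth in resident memory
        long connectionDelta = 40_000;   // observed growth in concurrent connections
        System.out.println(kibPerConnection(residentDeltaMib, connectionDelta)
                + " KiB per connection");
    }
}
```

under that assumption the result is about 7.7 KiB per connection. this covers whatever the process happened to keep resident during the test, so it is an upper-level estimate rather than a measurement of undertow's own per-request allocation.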
if you'd like to see any changes or want to add a caveat so i don't
misrepresent undertow, open a pull request or just let me know - my goal is
to advocate for async while being (reasonably) platform agnostic