> I am not 100% sure what you are trying to test here
> The global state protected by a sync lock will not perform very well, 
> and does not in any way represent the way the majority of web applications manage their state.
>
> What sort of real world situation you are trying to simulate here?


i've written a java database engine - the sync lock is simulating this database. initially i'm focusing on low-end VPS hosts, so i'm more interested in efficiency than peak performance. i'm just trying to get a baseline (as opposed to a benchmark) of what async is capable of on various platforms

my database is fiber-based (kilim) and can handle far more concurrent queries than the OS could support as threads, so my motivation is to verify that the web server can support a comparable number of open connections. this isn't apples to apples - in the real case there could be many more connections and much more delay, but this seems like a good start



here's the faster version:
https://github.com/nqzero/jempower/blob/master/utow/src/main/java/UtowAsync2.java

it still has the global lock, but does the actual processing in multiple threads. the repository has versions for some other platforms as well (jetty, comsat and kilim)
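
to make the pattern concrete, here's a stripped-down, JDK-only sketch of the idea. it is not the code from UtowAsync2.java: the class and method names are made up for illustration, the counter is a stand-in for the database, and replying outside the lock is an assumption of the sketch

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongConsumer;

// hypothetical stand-in for the database: one global lock guards all state
final class LockedStoreSketch {
    private final Object lock = new Object();
    private long counter = 0;
    private final ExecutorService workers;

    LockedStoreSketch(int threads) {
        workers = Executors.newFixedThreadPool(threads);
    }

    // returns immediately; the reply callback runs later on a worker thread
    void query(LongConsumer reply) {
        workers.execute(() -> {
            long value;
            synchronized (lock) {   // keep the critical section short
                value = ++counter;
            }
            reply.accept(value);    // assumption: reply happens outside the lock
        });
    }

    void shutdown() {
        workers.shutdown();
    }

    // submits n queries, waits for all replies, returns the sum of the replied values
    static long runQueries(int n, int threads) {
        LockedStoreSketch store = new LockedStoreSketch(threads);
        AtomicLong sum = new AtomicLong();
        CountDownLatch done = new CountDownLatch(n);
        for (int i = 0; i < n; i++) {
            store.query(v -> { sum.addAndGet(v); done.countDown(); });
        }
        try {
            if (!done.await(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("queries did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            store.shutdown();
        }
        return sum.get();
    }

    public static void main(String[] args) {
        // each query sees a unique counter value 1..1000, so this prints 500500
        System.out.println(runQueries(1000, 3));
    }
}
```

the caller never blocks on the lock itself, only the worker threads do, which is the property i care about for the web server's io threads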

comparing the 3 undertow handlers
124k req/s - sync (from TechEmpower benchmark)
 92k req/s - async (reply inside lock)
118k req/s - async2 (with 3 threads)
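
for reference, the relative differences between those three numbers can be worked out like this (just arithmetic on the figures above, with a helper name i made up; it says nothing about run-to-run variance):

```java
final class ThroughputDelta {
    // percent of throughput lost relative to a baseline (same unit for both)
    static double percentBelow(double baseline, double measured) {
        return (baseline - measured) * 100.0 / baseline;
    }

    public static void main(String[] args) {
        double sync = 124, async = 92, async2 = 118;   // k req/s, from the list above
        // roughly 25.8 for async and 4.8 for async2
        System.out.println("async  % below sync: " + percentBelow(sync, async));
        System.out.println("async2 % below sync: " + percentBelow(sync, async2));
    }
}
```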

this is on an i3, and i'm testing with ab at various concurrency levels (mostly 1000 - 10000). 2 instances of ab are required to saturate the sync and async2 versions. the TechEmpower (sync) version starts showing receive errors at lower concurrency levels than the async versions do

i also need an example of asynchronous usage so i can tell people how to build a webapp using my database with undertow, so i want to be doing the "right" things - hence the simple async version. thanks for the feedback to my questions. if you can point me to a simpler example of async undertow (that doesn't involve a thread-per-connection), i'm happy to use that instead

my conclusion is that for undertow, async adds only a small overhead, on the order of 5% of throughput (118k vs 124k req/s, async2 vs sync), and is easy to implement using the dispatch/getResponseSender pair

does this sound right to you ?

any idea how much memory is allocated for each outstanding request ? i'm assuming it is much less than the memory needed for an OS thread (using top i see a 40k concurrency delta -> a delta of 300M in resident memory usage, but realize that this is a crude technique)
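
the back-of-the-envelope version of that estimate, assuming top's "M" means MiB and that the whole resident delta is attributable to the extra connections (both are assumptions, and the helper name is made up):

```java
final class PerRequestMemory {
    // average bytes of resident memory per additional open connection
    static long bytesPerRequest(long residentDeltaBytes, long connectionDelta) {
        return residentDeltaBytes / connectionDelta;
    }

    public static void main(String[] args) {
        long residentDelta = 300L * 1024 * 1024;   // 300M, read as MiB
        long connections = 40_000;                  // 40k concurrency delta
        // prints 7864, i.e. a bit under 8k per outstanding request
        System.out.println(bytesPerRequest(residentDelta, connections));
    }
}
```

for comparison, a JVM thread stack is often around 1M by default on 64-bit linux, though that is reserved address space rather than resident memory, so the two numbers aren't directly comparable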

if you'd like to see any changes or want to add a caveat so i don't misrepresent undertow, open a pull request or just let me know - my goal is to advocate for async while being (reasonably) platform agnostic