Adding more IO threads was sufficient to alleviate my problem for now.
While investigating I also realized I was deserializing inbound
websocket messages on the IO thread, and moving that to a worker thread
has also reduced the amount of non-IO work being done on the IO
threads.
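
For reference, a minimal sketch of that hand-off pattern might look like
the following; the callback name, message type, and pool sizing here are
made up for illustration, since the real hook depends on how the websocket
receive listener is registered:

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    // Hypothetical sketch only: onRawMessage, deserialize and handle are
    // stand-ins for the application's real receive hook and message handling.
    public class OffloadingReceiver {
        private final ExecutorService workerPool = Executors.newFixedThreadPool(8);

        // Called on the IO thread with the raw frame text.
        public void onRawMessage(String rawFrame) {
            // Push the CPU-heavy deserialization and handling onto a worker
            // thread so the IO thread goes straight back to the selector.
            workerPool.execute(() -> handle(deserialize(rawFrame)));
        }

        private Object deserialize(String rawFrame) {
            return rawFrame; // placeholder for real message parsing
        }

        private void handle(Object message) {
            // placeholder for real message handling
        }
    }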
Thanks all,
-pete
--
(peter.royal|osi)(a)pobox.com - http://fotap.org/~osi
Another approach we could potentially try is simply maintaining a
count of active connections per thread, and allocating new connections
to the one with the lowest number of open connections.
This will make the connect itself slightly slower; however, it guarantees
the most even load distribution you can get without somehow moving
connections between threads (which I don't really see as being doable; any
gains would probably be lost to the additional thread-safety constructs
needed).
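
A rough sketch of that least-connections selection, with the counting hooks
and thread indexing assumed rather than taken from XNIO:

    import java.util.concurrent.atomic.AtomicInteger;

    // Illustrative only: XNIO's real accept path and thread registration are
    // not shown; this just demonstrates the selection logic.
    public class LeastConnectionsSelector {
        private final AtomicInteger[] openCounts;

        public LeastConnectionsSelector(int ioThreadCount) {
            openCounts = new AtomicInteger[ioThreadCount];
            for (int i = 0; i < ioThreadCount; i++) {
                openCounts[i] = new AtomicInteger();
            }
        }

        // Called by the (single) accepting thread: pick the IO thread with
        // the fewest open connections and charge the new connection to it.
        public int selectThread() {
            int best = 0;
            for (int i = 1; i < openCounts.length; i++) {
                if (openCounts[i].get() < openCounts[best].get()) {
                    best = i;
                }
            }
            openCounts[best].incrementAndGet();
            return best;
        }

        // Called when a connection owned by IO thread `index` closes.
        public void connectionClosed(int index) {
            openCounts[index].decrementAndGet();
        }
    }

The linear scan is the extra cost on connect mentioned above, but with a
single acceptor and a handful of IO threads it should be negligible.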
Stuart
On Fri, Jun 17, 2016 at 6:15 AM, peter royal <peter.royal(a)pobox.com> wrote:
> Undertow standalone.
>
> Thanks for the reminder about that - I went back and looked at things
> and realized I was doing deserialization of incoming messages on the IO
> thread prior to dispatching to the worker thread. The outbound path was
> clean though.
>
> Thanks again!
>
> -pete
>
> --
> (peter.royal|osi)(a)pobox.com - http://fotap.org/~osi
>
> On Thu, Jun 16, 2016, at 02:59 PM, Jason Greene wrote:
>> Are you using WildFly or Undertow standalone?
>>
>> If you are using undertow standalone, you might want to try enabling
>> dispatch to worker (this is the default on WildFly):
>> webSocketDeploymentInfo.setDispatchToWorkerThread(true)
>>
>> If you have message handlers that use significant CPU time or introduce
>> blocking (which a disparity like you see could potentially indicate),
>> then they can negatively impact the I/O thread’s ability to handle
>> connection events efficiently, and so dispatching to the worker pool
>> allows for long running tasks to execute without interfering with other
>> connections/activity.
>>
>> > On Jun 16, 2016, at 2:36 PM, peter royal <peter.royal(a)pobox.com> wrote:
>> >
>> > Understood.
>> >
>> > I'm going to test with increased IO threads, and if that fixes things
>> > I'm good. Using thread user CPU time might be a good metric, as looking
>> > at that the imbalance is clear:
>> >
>> > CPU: 2673514 ms
>> > CPU: 31270 ms
>> > CPU: 61962 ms
>> > CPU: 7952561 ms
>> >
>> > As I think through this more, optimal balancing requires pushing a lot
>> > of application-specific info down low, because a given WS connection
>> > might be high volume or not. It would be easier to migrate a connection
>> > that is detected to be high volume to another IO thread, but that'd be a
>> > hugely invasive change. The optimal strategy for me might just be to
>> > have 1 thread per connection as the counts aren't very high.
>> >
>> > Thanks for the help!
>> >
>> > --
>> > (peter.royal|osi)(a)pobox.com - http://fotap.org/~osi
>> >
>> > On Thu, Jun 16, 2016, at 02:17 PM, Jason Greene wrote:
>> >> The way our current approach works, which is the same approach as
>> >> SO_REUSEPORT’s impl, is that address:port are hashed to select the
>> >> destination. This is mainly so we can transition with no real
>> >> behavioral surprises. If you have some connections lasting
>> >> significantly longer than others, then you will eventually go out of
>> >> balance, because the current allocation state isn’t factored into the
>> >> decision. It’s possible to do more advanced algorithms factoring in
>> >> state, but once you do that you tie yourself to a single-threaded
>> >> acceptor (although that’s currently the case with our emulated
>> >> SO_REUSEPORT implementation). For many workloads this won’t matter
>> >> though, as you need massive connection rates to hit the accept
>> >> stability limits.
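
For illustration, the hashing idea described above boils down to something
like this (not XNIO's actual code):

    import java.net.InetSocketAddress;

    // Sketch of hash-based selection: the remote address:port pair is hashed
    // to pick a destination IO thread, so no allocation state is consulted.
    public final class HashThreadSelector {
        private HashThreadSelector() {
        }

        public static int selectThread(InetSocketAddress remote, int ioThreadCount) {
            int hash = remote.getAddress().hashCode() * 31 + remote.getPort();
            return Math.floorMod(hash, ioThreadCount);
        }
    }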
>> >>
>> >> Maybe you want to play with modifying QueuedTcpNioServer to compare a
>> >> few different algorithms? You could try balancing active connection
>> >> count as one strategy, and perhaps thread user cpu time as another.
>> >> For both approaches you probably want to have i/o threads individually
>> >> updating a volatile statistic field as part of their standard work,
>> >> and then the accept queuing thread scanning those values to select the
>> >> best destination.
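
A minimal sketch of that publish-and-scan idea, using thread user CPU time
as the statistic; the wiring of thread indexes is assumed, and none of this
is XNIO's actual code:

    import java.lang.management.ManagementFactory;
    import java.lang.management.ThreadMXBean;
    import java.util.concurrent.atomic.AtomicLongArray;

    // Each IO thread periodically publishes its own user CPU time; the
    // accept-queuing thread scans the published values and picks the
    // least-busy IO thread for a new connection.
    public class CpuTimeBalancer {
        private static final ThreadMXBean MX = ManagementFactory.getThreadMXBean();

        private final AtomicLongArray userCpuNanos;

        public CpuTimeBalancer(int ioThreadCount) {
            userCpuNanos = new AtomicLongArray(ioThreadCount);
        }

        // Called by IO thread `index` as part of its normal work loop.
        public void publish(int index) {
            long t = MX.getThreadUserTime(Thread.currentThread().getId());
            if (t >= 0) { // -1 means CPU time measurement is unavailable
                userCpuNanos.set(index, t);
            }
        }

        // Called by the accept-queuing thread to choose a destination.
        public int selectThread() {
            int best = 0;
            for (int i = 1; i < userCpuNanos.length(); i++) {
                if (userCpuNanos.get(i) < userCpuNanos.get(best)) {
                    best = i;
                }
            }
            return best;
        }
    }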
>> >>
>> >>> On Jun 16, 2016, at 2:01 PM, peter royal <peter.royal(a)pobox.com> wrote:
>> >>>
>> >>> Gotcha. I was digging through things and found the change where the
>> >>> new strategy was introduced. With my current # of IO threads it is
>> >>> giving uneven weightings:
>> >>>
>> >>> thread, connections
>> >>> 0, 6
>> >>> 1, 5
>> >>> 2, 3
>> >>> 3, 2
>> >>>
>> >>> I'm going to double my IO threads; it will still be less than
>> >>> optimal, but improved:
>> >>>
>> >>> thread, connections
>> >>> 0, 2
>> >>> 1, 1
>> >>> 2, 1
>> >>> 3, 1
>> >>> 4, 4
>> >>> 5, 4
>> >>> 6, 2
>> >>> 7, 1
>> >>>
>> >>> Random is only slightly better, eyeballing things.
>> >>>
>> >>> I'm using Undertow 1.3.22, which uses XNIO 3.3.6. Linux kernel 2.6.32,
>> >>> though.
>> >>>
>> >>> Digging into my problem more, I would probably need to balance on
>> >>> more than just connection count per IO thread, as some connections
>> >>> are busier than others.
>> >>>
>> >>> Can you point me towards any references about the forthcoming access
>> >>> to the native facility? I'm curious as to how that will work.
>> >>>
>> >>> -pete
>> >>>
>> >>> --
>> >>> (peter.royal|osi)(a)pobox.com - http://fotap.org/~osi
>> >>>
>> >>> On Thu, Jun 16, 2016, at 01:41 PM, Jason T. Greene wrote:
>> >>>> We recently changed xnio to balance connections by default using a
>> >>>> strategy similar to the new SO_REUSEPORT facility in the Linux
>> >>>> kernel (3.3.3 or later). In the near future, we will be switching to
>> >>>> the native facility when accessible in the JDK NIO implementation.
>> >>>> Older versions had a feature called balancing tokens that you could
>> >>>> use to balance connections fairly, but it had to be specially
>> >>>> configured.
>> >>>>
>> >>>>
>> >>>>> On Jun 16, 2016, at 1:00 PM, peter royal <peter.royal(a)pobox.com> wrote:
>> >>>>>
>> >>>>> (I believe the following is true... please correct me if not!)
>> >>>>>
>> >>>>> I have an application which heavily utilizes web sockets. It is an
>> >>>>> internal application which uses a small number of connections with
>> >>>>> reasonable load on each.
>> >>>>>
>> >>>>> When a new connection is received by Undertow, there is an
>> >>>>> at-connection-time assignment of an XNIO IO Thread to the
>> >>>>> connection. This is causing uneven load on my IO threads, due to
>> >>>>> chance.
>> >>>>>
>> >>>>> I'm increasing the number of IO threads as a temporary fix, but it
>> >>>>> might be useful to be able to either migrate a long-lived
>> >>>>> connection to another IO thread (harder) or do better load
>> >>>>> balancing amongst IO threads. For the latter, if Undertow was able
>> >>>>> to provide a strategy for picking a thread in
>> >>>>> NioXnioWorker.getIoThread(hashCode), it could try and pick a thread
>> >>>>> that had fewer connections assigned to it.
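
Purely as a sketch of the kind of hook being suggested (no such interface
exists in XNIO or Undertow), the strategy could be as small as:

    // Hypothetical strategy interface along the lines suggested above; the
    // name and the way it would plug into getIoThread are assumptions.
    public interface IoThreadSelectionStrategy {

        // Return the index of the IO thread that should own a new connection,
        // given the current open-connection count per IO thread.
        int chooseThread(int[] openConnectionsPerThread);
    }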
>> >>>>>
>> >>>>> Has anyone else run into this problem? Would a fix be accepted?
>> >>>>>
>> >>>>> -pete
>> >>>>>
>> >>>>> --
>> >>>>> (peter.royal|osi)(a)pobox.com - http://fotap.org/~osi
>> >>>>> _______________________________________________
>> >>>>> undertow-dev mailing list
>> >>>>> undertow-dev(a)lists.jboss.org
>> >>>>> https://lists.jboss.org/mailman/listinfo/undertow-dev
>> >>
>> >> --
>> >> Jason T. Greene
>> >> WildFly Lead / JBoss EAP Platform Architect
>> >> JBoss, a division of Red Hat
>> >>
>>
>> --
>> Jason T. Greene
>> WildFly Lead / JBoss EAP Platform Architect
>> JBoss, a division of Red Hat
>>
>
> _______________________________________________
> undertow-dev mailing list
> undertow-dev(a)lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/undertow-dev