[
https://issues.jboss.org/browse/WFLY-315?page=com.atlassian.jira.plugin.s...
]
Brian Stansberry commented on WFLY-315:
---------------------------------------
My work on mgmt op cancellation has led to a general look at thread management issues.
To allow request handling to be interrupted without breaking connections, we have to
isolate IO tasks from the cancellable thread, which means more async tasks and more
threading complexity.
I'm thinking we should go for a single thread pool (the host's or server's main pool)
with an unlimited size. We can use other techniques to throttle certain kinds of
requests: for example, put tasks in a queue and then let only a limited number of the
general-purpose threads process that queue concurrently.
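The queue-plus-limited-consumers idea could look roughly like the following Java sketch.
`ThrottledExecutor` is a hypothetical name, not an existing WildFly or JBoss Threads class,
and it assumes the unlimited shared pool is something like `Executors.newCachedThreadPool()`.
Waiting tasks sit in the throttle's own queue instead of each holding a pool thread, and at
most `maxConcurrent` pool threads drain that queue at any one time.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.Executor;

/**
 * Hypothetical helper (not a WildFly class): runs tasks on a shared, unbounded delegate
 * pool, but lets at most maxConcurrent of them occupy a pool thread at once.
 * Excess tasks wait in a queue instead of holding a thread.
 */
final class ThrottledExecutor implements Executor {
    private final Executor delegate;
    private final int maxConcurrent;
    private final Queue<Runnable> pending = new ArrayDeque<>(); // guarded by this
    private int running;                                         // guarded by this

    ThrottledExecutor(Executor delegate, int maxConcurrent) {
        if (maxConcurrent < 1) {
            throw new IllegalArgumentException("maxConcurrent must be >= 1");
        }
        this.delegate = delegate;
        this.maxConcurrent = maxConcurrent;
    }

    @Override
    public void execute(Runnable task) {
        synchronized (this) {
            if (running >= maxConcurrent) {
                pending.add(task); // no free slot: park the task, not a thread
                return;
            }
            running++;
        }
        try {
            delegate.execute(() -> drain(task));
        } catch (RuntimeException e) {
            synchronized (this) {
                running--; // delegate rejected the task, so give the slot back
            }
            throw e;
        }
    }

    // Runs the first task, then keeps pulling queued tasks on the same pool thread
    // until the queue is empty, at which point the slot is released.
    private void drain(Runnable first) {
        Runnable task = first;
        while (task != null) {
            try {
                task.run();
            } catch (RuntimeException e) {
                // A real implementation would log this; keep draining so the slot is not leaked.
            }
            synchronized (this) {
                task = pending.poll();
                if (task == null) {
                    running--;
                }
            }
        }
    }
}
```

With `maxConcurrent` set to 1, the same wrapper would serialize a category such as the
initial slave registration requests, while traffic that should not be throttled is handed
straight to the shared pool.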
Tasks we might throttle:
1) End-user requests. I believe this case was the original reason for a limited pool.
Note that I don't think there's any limit on the number of HTTP/REST requests, so this
limit is pretty incomplete anyway.
2) Slave registration requests. These will block anyway if executed concurrently, so why
have more than one thread doing them? I'm referring to the initial request; once the
message exchange starts, there's no reason to throttle the other requests, as throttling
the initial request will naturally throttle them too.
3) Master->slave and slave->server requests. Why throttle these? Doing so can only lead to
deadlock issues. If we're throttling end-user requests, we shouldn't need to worry about
how many requests a controlling process is sending to its controlled process.
4) Post-registration slave->master requests. Same as 3 above.
The other aspect we should look into is why these "slave pulls down missing data"
requests are transactional. It's not obvious to me why they can't return and release the
lock immediately, but I'm probably missing something.
Avoid running out of threads when connecting to the DC from a slave to pull down missing data
---------------------------------------------------------------------------------------------
Key: WFLY-315
URL: https://issues.jboss.org/browse/WFLY-315
Project: WildFly
Issue Type: Feature Request
Security Level: Public (Everyone can see)
Components: Domain Management
Reporter: Kabir Khan
Assignee: Emanuel Muckenhuber
Priority: Blocker
Fix For: 9.0.0.CR1
For WFLY-259, when a slave connects to the DC to pull down missing data, it does so either
by obtaining the DC lock or, if the request to update the slave's server-config was
executed as part of a composite operation that obtained a lock on the DC, by joining the
permit of that existing DC lock.
As it currently works, there is a thread per slave that is blocked until the transaction
completes. The DC's threads are a finite resource, so a large number of slaves trying to
pull down data will cause a deadlock.
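The failure mode can be reproduced in miniature with a fixed-size pool. This is a
simplified, hypothetical model (`StarvationDemo`, `run` and its parameters are illustrative,
not WildFly code). It assumes that the work which lets the DC transaction complete must
itself run on the same bounded pool whose threads the blocked slave requests occupy.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical model of the problem: each "slave pull" task parks a DC pool thread until
 * the DC transaction completes, but completing the transaction also needs a pool thread.
 */
final class StarvationDemo {
    /** Returns true if the transaction completed within the timeout, false if it starved. */
    static boolean run(int poolSize, int slaves, long timeoutMillis) throws InterruptedException {
        ExecutorService dcPool = Executors.newFixedThreadPool(poolSize); // finite DC threads
        CountDownLatch txComplete = new CountDownLatch(1);
        try {
            for (int i = 0; i < slaves; i++) {
                // One thread per slave, blocked until the transaction completes.
                dcPool.execute(() -> {
                    try {
                        txComplete.await();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            // The work that completes the transaction is queued behind the blocked slave tasks.
            dcPool.execute(txComplete::countDown);
            return txComplete.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } finally {
            dcPool.shutdownNow(); // interrupts the parked slave threads so the demo can exit
        }
    }
}
```

Under these assumptions, `run(4, 4, 200)` returns false because all four threads are parked
and the completing task never gets a thread, while `run(4, 3, 5000)` returns true because one
thread is still free. Backing the same tasks with an unbounded pool would avoid this
particular starvation.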