Hi Robert,
The semaphore is there to prevent overuse: without it, you can get too many concurrent requests, which would stall the entire system. If that condition is unlikely but you still want to guard against it with less overhead, consider using a non-fair semaphore instead.
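To illustrate what I mean, here is a minimal sketch, not your actual code. The class name RequestLimiter and its methods are made up for the example. The only JDK pieces are java.util.concurrent.Semaphore and its (permits, fair) constructor; passing false (which is also the default of the single-argument constructor) gives the non-fair variant, which skips FIFO ordering of waiting threads.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical helper: caps the number of requests running at the same time.
public class RequestLimiter {
    private final Semaphore permits;

    // fair = true hands out permits in FIFO order; fair = false allows barging,
    // which usually costs less under contention but gives no ordering guarantee.
    public RequestLimiter(int maxConcurrent, boolean fair) {
        this.permits = new Semaphore(maxConcurrent, fair);
    }

    // Blocks until a permit is free, runs the request, and always returns the permit.
    public <T> T call(Supplier<T> request) throws InterruptedException {
        permits.acquire();
        try {
            return request.get();
        } finally {
            permits.release();
        }
    }

    // Number of permits currently free (useful for checks and monitoring).
    public int availablePermits() {
        return permits.availablePermits();
    }
}
```

With new RequestLimiter(10, false), at most ten requests run concurrently, and the release in the finally block ensures a failing request does not leak its permit.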
Other than that I don't see any issues.
Hmm, odd that the JDK does not have a fixed-capacity queue implementation; that would fit the bill.
If you want, please share the code on GitHub. I would very much like to see it.