[
https://issues.redhat.com/browse/WFLY-13871?page=com.atlassian.jira.plugi...
]
Richard Achmatowicz edited comment on WFLY-13871 at 10/14/20 5:01 PM:
----------------------------------------------------------------------
Some further comments on this issue.
After looking in other parts of the codebase, I see that there *already is* a mechanism in
place for advising EJB clients of suspend/resume events on the server, but it seems like
it is just not working correctly. I'm going to try to describe the mechanism, which is
related to
https://issues.redhat.com/browse/WFLY-7207
Here is a brief summary: when an invocation for EJB component C arrives at the server, it
passes through EjbSuspendInterceptor. This interceptor checks the ControlPoint for
component C to see if the invocation is allowed to proceed, according to the statistics
collected by the RequestController subsystem. If it is not allowed to proceed, before
rejecting the invocation, a service EjbSuspendHandlerService is checked to see if *it*
thinks the invocation should be allowed to proceed. EjbSuspendHandlerService decides
whether or not an invocation should proceed based on two counts that it keeps: a count of
active invocations, and a count of active transactions. If a transaction is active, it
lets the invocation proceed, as transaction completion may depend on the invocation
completing. This is the point of WFLY-7207, to provide a kind of clean shutdown for
active transactions. EjbSuspendHandlerService will only report that an invocation may not
proceed when (roughly speaking) there are no longer any active transactions on the server.
This behaviour is activated by an attribute, gracefulTransactionShutdown, of the EJB
subsystem. If gracefulTransactionShutdown is true, the EjbSuspendInterceptor will wait
until all active transactions have completed; if false, it will not.
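
The decision described above can be sketched roughly as follows. This is only an illustrative model, not the WildFly code: the class SuspendGateSketch and all of its methods are hypothetical names, and the ControlPoint check is reduced to a simple "suspended" flag, which is an assumption.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical model of the two-stage check; not the real WildFly classes. */
final class SuspendGateSketch {
    private final boolean gracefulTransactionShutdown;
    private final AtomicInteger activeTransactions = new AtomicInteger();
    private volatile boolean suspended;

    SuspendGateSketch(boolean gracefulTransactionShutdown) {
        this.gracefulTransactionShutdown = gracefulTransactionShutdown;
    }

    void suspend() { suspended = true; }
    void resume() { suspended = false; }
    void transactionStarted() { activeTransactions.incrementAndGet(); }
    void transactionCompleted() { activeTransactions.decrementAndGet(); }

    /** Assumed stand-in for the ControlPoint: refuses requests once suspended. */
    boolean controlPointAccepts() {
        return !suspended;
    }

    /** Second opinion: allow the call while a transaction may still depend on it. */
    boolean handlerAccepts() {
        return gracefulTransactionShutdown && activeTransactions.get() > 0;
    }

    /** The interceptor rejects only if both checks refuse the invocation. */
    boolean mayProceed() {
        return controlPointAccepts() || handlerAccepts();
    }
}
```

With gracefulTransactionShutdown set to false in this model, the second check never overrides the first, so invocations are rejected as soon as the server is suspended.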
EjbSuspendHandlerService takes information from the SuspendController (i.e. it is a listener
for suspend/resume events) and takes information from LocalTransactionContext (i.e. it
listens for transaction create and transaction completion events). It also keeps track of
active invocations, via calls from EjbSuspendInterceptor.
In addition to this, once EjbSuspendHandlerService receives a suspend event from the
SuspendController, it tells the LocalTransactionContext to prohibit the creation of any
new transactions started with a call to beginTransaction that has the parameter
failOnSuspend == true. It also waits for any active transactions to complete (either
commit or rollback), and when the last one completes, it does two things: it notifies the
SuspendController that, as a ServerActivity listener, it is ready to suspend, and it
suspends the DeploymentRepositoryService by calling DeploymentRepository.suspend().
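
The suspend sequence described above might be modelled along these lines. This is a sketch under stated assumptions: SuspendSequenceSketch and its methods are hypothetical, the two final actions are recorded as strings rather than real calls, and completing the suspend immediately when no transaction is active is an assumption not stated above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the suspend sequence; not the real WildFly service. */
final class SuspendSequenceSketch {
    private int activeTransactions;
    private boolean suspending;
    private boolean suspendCompleted;
    private final List<String> events = new ArrayList<>();

    /** Returns false when the transaction is refused because a suspend is underway. */
    synchronized boolean beginTransaction(boolean failOnSuspend) {
        if (suspending && failOnSuspend) {
            return false;
        }
        activeTransactions++;
        return true;
    }

    /** Called on commit or rollback. */
    synchronized void transactionCompleted() {
        activeTransactions--;
        maybeFinishSuspend();
    }

    /** Models the suspend event arriving from the SuspendController. */
    synchronized void suspendRequested() {
        suspending = true;
        maybeFinishSuspend(); // assumption: nothing to wait for if no transaction is active
    }

    private void maybeFinishSuspend() {
        if (suspending && !suspendCompleted && activeTransactions == 0) {
            suspendCompleted = true;
            events.add("readyToSuspend");                 // notify the SuspendController
            events.add("deploymentRepositorySuspended");  // DeploymentRepository.suspend()
        }
    }

    synchronized List<String> events() {
        return new ArrayList<>(events);
    }
}
```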
The DeploymentRepository, when suspended, sends out deploymentSuspended() notifications to
all DeploymentRepository listeners for all currently deployed modules, and sends out
deploymentAvailable() followed by deploymentSuspended() notifications for any subsequently
deployed modules. When the DeploymentRepository is later resumed (by the
EjbSuspendHandlerService), it sends out deploymentResumed() notifications to all
DeploymentRepository listeners.
These deploymentSuspended() and deploymentResumed() notifications get converted into
moduleUnavailable() and moduleAvailable() messages, respectively, sent out to all EJB
clients connected to the server.
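
To make the intended notification flow concrete, here is a small model of the repository and of a listener that turns its callbacks into client messages. All type names are hypothetical, modules are plain strings, and mapping deploymentAvailable() to a moduleAvailable message is an assumption; only the suspended/resumed mapping is taken from the description above.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical listener interface mirroring the callbacks named above. */
interface RepositoryListenerSketch {
    void deploymentAvailable(String module);
    void deploymentSuspended(String module);
    void deploymentResumed(String module);
}

/** Hypothetical model of the repository's suspend/resume notifications. */
final class DeploymentRepositorySketch {
    private final Set<String> modules = new LinkedHashSet<>();
    private final List<RepositoryListenerSketch> listeners = new ArrayList<>();
    private boolean suspended;

    void addListener(RepositoryListenerSketch listener) {
        listeners.add(listener);
    }

    void deploy(String module) {
        modules.add(module);
        for (RepositoryListenerSketch l : listeners) {
            l.deploymentAvailable(module);
        }
        if (suspended) {
            // Modules deployed while suspended are immediately reported as suspended.
            for (RepositoryListenerSketch l : listeners) {
                l.deploymentSuspended(module);
            }
        }
    }

    void suspend() {
        suspended = true;
        for (String module : modules) {
            for (RepositoryListenerSketch l : listeners) {
                l.deploymentSuspended(module);
            }
        }
    }

    void resume() {
        suspended = false;
        for (String module : modules) {
            for (RepositoryListenerSketch l : listeners) {
                l.deploymentResumed(module);
            }
        }
    }
}

/** Records the messages that would be sent to connected EJB clients. */
final class ClientNotifierSketch implements RepositoryListenerSketch {
    final List<String> sent = new ArrayList<>();

    public void deploymentAvailable(String module) { sent.add("moduleAvailable:" + module); }
    public void deploymentSuspended(String module) { sent.add("moduleUnavailable:" + module); }
    public void deploymentResumed(String module) { sent.add("moduleAvailable:" + module); }
}
```

In this model, a client connected across a suspend/resume cycle sees each module go unavailable and then available again, which is the sequence the existing mechanism is expected to produce.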
So, this arrangement should be providing the correct sequence of moduleUnavailable and
moduleAvailable messages to EJB clients, but something is going wrong somewhere.
So, to summarise, the current PR will be thrown out and I'll focus on fixing the
existing mechanism.
was (Author: rachmato):
Some further comments on this issue. After looking in other parts of the codebase, I see
that there *already is* a mechanism in place for advising EJB clients of suspend/resume
events on the server, but it seems like it is just not working correctly. I'm going to
try to describe the mechanism, which is related to
https://issues.redhat.com/browse/WFLY-7207
Here is a brief summary: when an invocation for EJB component C arrives at the server, it
passes through EjbSuspendInterceptor. This interceptor checks the ControlPoint for
component C to see if the invocation is allowed to proceed. If it is not allowed to
proceed, before rejecting the invocation, a service EjbSuspendHandlerService is checked to
see if *it* thinks the invocation should be allowed to proceed. EjbSuspendHandlerService
decides whether or not an invocation should proceed based on two counts that it keeps: a
count of active invocations, and a count of active transactions. If a transaction is
active, it lets the invocation proceed, as transaction completion may depend on the
invocation completing. EjbSuspendHandlerService will only report that an invocation may
not proceed when (roughly speaking) there are no longer any active transactions on the
server. This behaviour is controlled by an attribute, gracefulTransactionShutdown, of the
EJB subsystem. If gracefulTransactionShutdown is true, the EjbSuspendInterceptor will wait
until all active transactions have completed; if false, it will not.
EjbSuspendHandlerService takes information from the SuspendController (i.e. it is a listener
for suspend/resume events) and takes information from LocalTransactionContext (i.e. it
listens for transaction create and transaction completion events). It also keeps track of
active invocations, via calls from EjbSuspendInterceptor.
In addition to this, once EjbSuspendHandlerService receives a suspend event from the
SuspendController, it tells the LocalTransactionContext to prohibit the creation of any
new transactions started with a call to beginTransaction that has the parameter
failOnSuspend == true. It also waits for any active transactions to complete (either
commit or rollback), and when the last one completes, it does two things: it notifies the
SuspendController that, as a ServerActivity listener, it is ready to suspend, and it
suspends the DeploymentRepositoryService by calling DeploymentRepository.suspend().
The DeploymentRepository, when suspended, sends out deploymentSuspended() notifications to
all DeploymentRepository listeners for all currently deployed modules, and sends out
deploymentAvailable() followed by deploymentSuspended() notifications for any subsequently
deployed modules. When the DeploymentRepository is later resumed (by the
EjbSuspendHandlerService), it sends out deploymentResumed() notifications to all
DeploymentRepository listeners.
These deploymentSuspended() and deploymentResumed() notifications get converted into
moduleUnavailable() and moduleAvailable() messages, respectively, sent out to all EJB
clients connected to the server.
So, this arrangement should be providing the correct sequence of moduleUnavailable and
moduleAvailable messages to EJB clients, but something is going wrong somewhere.
So, to summarise, the current PR will be thrown out and I'll focus on fixing the
existing mechanism.
Re-implement correct suspend/resume behaviour for EJB client
------------------------------------------------------------
Key: WFLY-13871
URL:
https://issues.redhat.com/browse/WFLY-13871
Project: WildFly
Issue Type: Bug
Components: EJB
Affects Versions: 21.0.0.Beta1
Reporter: Richard Achmatowicz
Assignee: Richard Achmatowicz
Priority: Major
Labels: downstream_dependency
Fix For: 22.0.0.Beta1
When a server is suspended, in order to avoid receipt of invocations which will never be
correctly processed, all connected clients should be notified that all modules on that
server are now unavailable; when the server is resumed, all connected clients should be
notified that all modules on that server are now again available so that the server may
again take part in processing invocations.
This behaviour was present in versions of EAP before EAP 7.1 but was not ported over to
the new EJB client server-side classes which were substantially refactored.
This issue will reinstate that behaviour. The two cases of standalone and clustered
deployments should be considered.