Great, now I fully understand the
behavior and consequences of changes we discussed. Thanks a lot
Marco!!!
In fact, idea behind putting transaction boundary inside session
was exactly the same to enclose TaskServerHandler operations in
transaction, but went to far with simplifications (without
considering LocalTaskService).
Looking forward to the wiki :)
P.S.
Great discussion btw, I hope more people will get involved soon
(as Eric did).
Maciej
On 13.09.2012 12:44, Marco Rietveld wrote:
Hi Maciej,
Thanks for reading the whole explanation (It probably could
have been a little shorter. :) )
Remote users will not have the same problem as local users:
1. Local users are in the same JVM as the (Local)TaskService,
which means that they can still access (lazy-load) the
properties.
2. Remote users will never be able to lazy-load properties: the
Task instance that they access has been serialized and the
remote user is also in a different JVM.
Serialization of the Task object (which happens when the Task
object is sent from the server to the remote user) forces all of
the (task) properties to be accessed and, of course, serialized
-- so that when the remote user receives the Task instance, all
properties are there.
In short, the serialization required to send the Task instance
to the remote user forces a preload.
You have a good point about having too (2) many tx's in one
operation: what we can do to avoid that is simply have the
TaskServerHandler open a tx before the (taskServiceSession)
operation and close it after the write. The tx logic that
happens in the operation will then recognize that there's an
active tx and do nothing.
Of course, this is yet another band-aid on the
human-task structure, but it's the best one -- any other changes
would impact the LocalTaskService as we've discussed.
One thing I'm glad about is that it seems that I've been able to
communicate just how painful the current human-task code is for
me -- and why we need to change it.
I have a bunch of code and text that shows my ideas about how
Human-Task should work -- I'll make sure to push that to a git
repository/wiki as soon as possible so that other people can
contribute and advise.
Some of the problems/issues that I fix or try to fix are:
1. non-normalized data model
2. "Ingrown" API -- problem domain was taken as the solution
domain
3. transformation/business logic is not centralized
4. the human-task thread structure is not at all enterprise
friendly
5. badly defined API
6. Unneccessary use of a notification architecture for logging
(Human-task events are logs, not events).
Thanks,
Marco
13-09-12 12:27, Maciej Swiderski:
Thanks
Marco for explanation. I would say that making LocalTaskService
transaction based on every operation is right way to do it that
will ensure we are consistent for all cases (regardless remote
or local).
As it comes for lazy loaded properties of a task we will have
that for both remote or local, don't you think? Even if
session.write is transactional users on the other side of the
wire won't be able to access properties that are lazy loaded
(difference could be that exception will not be thrown but
null/empty list will be returned) - assuming we are not going to
preload everything in advance before writing to the session.
All comes down to the issue we expose entities to the outside
world, so to say.
I agree that making session.write will resolve issue we
currently have with remote task services but it in fact could
sightly affect performance as in some case it will mean two
transactions for one operations, correct?
Thanks
Maciej
On 13.09.2012 09:57, Marco Rietveld wrote:
Hi Maciej,
Just so were on the same page, (and for clarification to
others reading along), this is what we're talking about (I
think :) ):
1. Changing the TaskServiceSession so that instantiation
starts a transaction and disposal ends the transaction.
(Currently, tx's in human-task are started at various
different points depending on the operation requested).
2. Changing the LocalTaskService so that a TaskServiceSession
is instantiated and disposed with every operation.
The main reason to do 2 is because otherwise, programs that
are already written that use the LocalTaskService might break.
At this point, users currently using the LocalTaskService
expect that the transaction (whether it's a local or JTA tx)
will be ended by the LocalTaskService at the end of an
operation.
If we only do 1 (change tx behaviour) but not 2, then a tx
will be opened when the LocalTaskService is initiated and a tx
will only be closed when the LocalTaskService is closed. (All
the tx logic inbetween will not fire, because the tx mgr will
see that there's an active tx and not do anything to modify
the status of the active tx.).
Except, for JTA tx's -- and probably also for Spring tx's --
this isn't true. Something else that the user is doing could
then end those tx's, and that would break the LocalTaskService
instance (which expects to be able to close a tx when it
disposes -- but can't, because the user already has. ) True,
in this situation everything would work (because of the inner
tx logic) until the LocalTaskService was disposed.
Besides the technical consideration above, there's also the
fact that users now expect the behaviour of the
LocalTaskService to be transactional. That means that if
they're using the LocalTaskService, and an exception is thrown
halfway through, that the things that have been already done
using the LocalTaskService will .. well, be done.
If we don't do 2 (ensure similar tx behaviour), then the
following situation can occur, and users will definitely be
angry about this:
1. User initializes LocalTaskService
2. User starts process (where by 5 tasks are created).
3. User completes task 1 (of 5) via LocalTaskService
4. User completes task 2 (of 5) via LocalTaskService
5. Exception is thrown by something, and we exit the stack.
If we don't do ensure similar tx behavior, then a. none of the
5 tasks will have been saved and b. 2 of the 5 tasks (which
won't even exist) won't have the status "Completed".
--------------------
On another note, I'm realizing that what I'm proposing above
is not something we can do anyways.
The problem, of course, comes back to the fact that the
API/DTO object is our entity. That means that if we go through
with the 1,2 (tx by init/dispose and new TaskServiceSession
per op), then we can have the following:
1. LocalTaskService initiated, etc..
2. User calls LocalTaskService and gets a Task object back.
- which means: a. entitymanager opened, b. tx opened, c.
retrieve task d. tx closed e. em closed.
3. User does something else with LocalTaskService
- which means.. (see above)
4. User tries to access something in the Task object -- but
something in a (lazily-loaded) collection that of course
hasn't been loaded yet.
5. Proxy instance of collection element tries to retrieve the
element using the em.. that was closed in 2e.
6. "Boom!" as they say, or in other words, exception and User
doesn't understand wtf is going on.
So it looks like we're back to my original Option 2 or 3:
2. Run through option tree in order to force loading
3. tx around session.write().
I'm favoring option 3, mostly because it's the least work and
probably the most robust. Obviously, neither option involves
changing the LocalTaskService.
Thanks,
Marco
13-09-12 09:15, Maciej Swiderski:
Marco, why we need to do that? Can't
we just use it as is, meaning that several operations will
be included in same transaction, like start, complete for
example? Will this break on query level or ...
I am not sure how often it is used like that - two task
service operations in single task service session?
I can see that in some cases beneficial (like all or
nothing) and in some cases not really welcome (inserting
users/groups - one fails roll backs all others).
Thanks
Maciej
On 12.09.2012 23:56, Marco Rietveld wrote:
Maciej,
I was thinking about that -- but doing that breaks the
LocalTaskService (or otherwise, we have to rewrite
LocalTaskService so that it opens a new TaskServiceSession
for every operation, just the way the TaskServerHandler
handles that).
Actually, the more I think about that, the better it
sounds. It might impact the performance of
LocalTaskService slightly, but it will be worth it, I
think.
Thanks,
Marco
12-09-12 17:16, Maciej Swiderski:
Marco, another way could be to
ensure transaction is started when taskservicesession is
created and closed (committed/rolledback) when
taskservicesession is disposed, I did that for a fix on
https://issues.jboss.org/browse/JBPM-3763
which is on postgresql and worked fine. So that way we
ensure that session.write is in transaction as well. Of
course not tested all possible cases but worked for main
ones.
Wdyt?
Maciej
On 12.09.2012 12:22, Marco Rietveld wrote:
Hi Maciej and Mauricio,
I'm struggling to find a good solution for a problem
and was hoping to get your advice about the following.
The human-task service uses it's entities as DTO's,
namely the Task class/instances.
However, we use Hibernate, which uses lazy-loading,
which means that Hibernate substitutes proxy instances
in collections until the actual collection elements
are needed.
With Hibernate 3, we miraculously were able to avoid
any large problems. However, testing with EAP 6 has
uncovered situations, primarily with postgresql, in
which this strategy (entity as DTO) just won't work.
The problem is that even if all the "persistence" work
is done in one tx, the collections are still
lazily-loaded. That means if a task service operation
has to return a Task instance, that the serialization
of the Task object (when it's being sent) triggers the
loading of entities. Due to postgresql's Large Object
facility, this means that there needs to be a
transaction around this action. Because we don't
surround the session.write(resultsCmnd); operation
with a tx, we get an exception.
(To tell the truth, I don't understand why this worked
with Hibernate 3.. )
As I've been writing this, I've come up with a couple
of solutions:
1. Turn off lazy-loading for all entities.
2. Force the loading of all relevant entities by going
through the object tree
(task.getPeopleAssignments().size(), etc.. )
3. Put a transaction around
session.write(resultsCmnd);
Option 1 has a big impact on performance, especially
if we start talking about high-volumes.
Option 2 has a slightly larger impact on performance
but Option 3 seems a little bit ugly to me.
Are there any options I missed? Any advice or
comments?
Thanks,
Marco
PS. This is (IMHO) one of the reasons we need to
rewrite human-task. I've been working on a
proposal/POC, but the important thing is that certain
problems that we have now aren't also present in the
rewritten version.
--
jBPM/Drools developer
Utrecht, the Netherlands