Great, now I fully understand the behavior and consequences of the changes we discussed. Thanks a lot, Marco!!!

In fact, the idea behind putting the transaction boundary inside the session was exactly the same: to enclose the TaskServerHandler operations in a transaction. But I went too far with the simplifications (without considering LocalTaskService).

Looking forward to the wiki :)

P.S.
Great discussion btw, I hope more people will get involved soon (as Eric did).

Maciej

On 13.09.2012 12:44, Marco Rietveld wrote:
Hi Maciej,

Thanks for reading the whole explanation. (It probably could have been a little shorter. :) )

Remote users will not have the same problem as local users:

1. Local users are in the same JVM as the (Local)TaskService, which means that they can still access (lazy-load) the properties.
2. Remote users will never be able to lazy-load properties: the Task instance that they access has been serialized and the remote user is also in a different JVM.

Serialization of the Task object (which happens when the Task object is sent from the server to the remote user) forces all of the (task) properties to be accessed and, of course, serialized -- so that when the remote user receives the Task instance, all properties are there.

In short, the serialization required to send the Task instance to the remote user forces a preload.
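
A minimal sketch of that preload effect, assuming the Task writes itself out field by field the way an Externalizable class would. ExternalTaskSketch, its lazily-loaded comments list and the loads counter are hypothetical stand-ins, not the actual human-task classes:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.List;

class SerializationPreloadSketch {
    /** Serializes the task to bytes and reads it back, as if it crossed the wire. */
    static ExternalTaskSketch roundTrip(ExternalTaskSketch task) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(task);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (ExternalTaskSketch) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ExternalTaskSketch original = new ExternalTaskSketch();
        ExternalTaskSketch copy = roundTrip(original);
        System.out.println("loads on the server side: " + original.loads);
        System.out.println("loads on the remote side: " + copy.loads);
        System.out.println("comments on the remote side: " + copy.getComments());
    }
}

// Hypothetical stand-in for a Task with one lazily-loaded collection.
class ExternalTaskSketch implements Externalizable {
    int loads = 0;                  // how often the simulated lazy load ran
    private List<String> comments;  // null until first access

    public ExternalTaskSketch() { } // Externalizable needs a public no-arg constructor

    List<String> getComments() {
        if (comments == null) {     // simulated lazy load
            loads++;
            comments = new ArrayList<>();
            comments.add("please review");
        }
        return comments;
    }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        List<String> all = getComments(); // writing touches the collection, so it loads
        out.writeInt(all.size());
        for (String c : all) {
            out.writeUTF(c);
        }
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException {
        int n = in.readInt();
        comments = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            comments.add(in.readUTF());
        }
    }
}
```

In this sketch the only load happens on the sending side, while the object is being written; the copy that arrives already carries its comments and never needs to load anything.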


You have a good point about having too (2) many tx's in one operation: what we can do to avoid that is simply have the TaskServerHandler open a tx before the (taskServiceSession) operation and close it after the write. The tx logic that happens in the operation will then recognize that there's an active tx and do nothing.
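
A minimal sketch of that arrangement, assuming a tx manager whose begin() reports whether the caller actually started the tx. JoiningTxManager, handle() and operation() are hypothetical names used for illustration only, not the real TaskServerHandler or TaskServiceSession code:

```java
import java.util.ArrayList;
import java.util.List;

class HandlerTxSketch {
    /** Inner operation: has its own tx logic, which joins a tx that is already active. */
    static String operation(JoiningTxManager tx) {
        boolean owner = tx.begin();
        try {
            tx.log.add("operation");
            tx.commit(owner);       // no-op when the handler owns the tx
            return "results";
        } catch (RuntimeException e) {
            tx.rollback(owner);
            throw e;
        }
    }

    /** Outer handler: opens the tx before the operation and closes it after the write. */
    static void handle(JoiningTxManager tx) {
        boolean owner = tx.begin();
        try {
            String results = operation(tx);
            tx.log.add("write " + results); // stands in for session.write(resultsCmnd)
            tx.commit(owner);
        } catch (RuntimeException e) {
            tx.rollback(owner);
            throw e;
        }
    }

    public static void main(String[] args) {
        JoiningTxManager tx = new JoiningTxManager();
        handle(tx);
        System.out.println(tx.log); // a single begin/commit pair wraps operation and write
    }
}

// Hypothetical tx manager: only the caller that started the tx may end it.
class JoiningTxManager {
    private boolean active = false;
    final List<String> log = new ArrayList<>();

    /** Starts a tx if none is active; returns true if the caller now owns it. */
    boolean begin() {
        if (active) {
            return false;           // a tx is already active: do nothing
        }
        active = true;
        log.add("begin");
        return true;
    }

    void commit(boolean owner) {
        if (owner && active) {
            active = false;
            log.add("commit");
        }
    }

    void rollback(boolean owner) {
        if (owner && active) {
            active = false;
            log.add("rollback");
        }
    }
}
```

The owner flag is what keeps the inner tx logic from ending a tx it did not start, so the operation and the write end up in one tx instead of two.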

Of course, this is yet another band-aid on the human-task structure, but it's the best one -- any other changes would impact the LocalTaskService as we've discussed.


One thing I'm glad about is that it seems that I've been able to communicate just how painful the current human-task code is for me -- and why we need to change it.

I have a bunch of code and text that shows my ideas about how Human-Task should work -- I'll make sure to push that to a git repository/wiki as soon as possible so that other people can contribute and advise.


Some of the problems/issues that I fix or try to fix are:

1. non-normalized data model
2. "Ingrown" API -- problem domain was taken as the solution domain
3. transformation/business logic is not centralized
4. the human-task thread structure is not at all enterprise friendly
5. badly defined API
6. Unnecessary use of a notification architecture for logging (Human-task events are logs, not events).


Thanks,
Marco


13-09-12 12:27, Maciej Swiderski:
Thanks, Marco, for the explanation. I would say that making LocalTaskService transactional on every operation is the right way to do it; that will ensure we are consistent in all cases (regardless of remote or local).

As for the lazy-loaded properties of a task, we will have that problem for both remote and local, don't you think? Even if session.write is transactional, users on the other side of the wire won't be able to access properties that are lazy loaded (the difference could be that no exception will be thrown but null or an empty list will be returned) - assuming we are not going to preload everything in advance before writing to the session.

It all comes down to the issue that we expose entities to the outside world, so to say.

I agree that making session.write transactional will resolve the issue we currently have with remote task services, but it could in fact slightly affect performance, as in some cases it will mean two transactions for one operation, correct?

Thanks
Maciej

On 13.09.2012 09:57, Marco Rietveld wrote:
Hi Maciej,

Just so we're on the same page (and for clarification to others reading along), this is what we're talking about (I think :) ):

1. Changing the TaskServiceSession so that instantiation starts a transaction and disposal ends the transaction. (Currently, tx's in human-task are started at various different points depending on the operation requested).
2. Changing the LocalTaskService so that a TaskServiceSession is instantiated and disposed with every operation.


The main reason to do 2 is because otherwise, programs that are already written that use the LocalTaskService might break. At this point, users currently using the LocalTaskService expect that the transaction (whether it's a local or JTA tx) will be ended by the LocalTaskService at the end of an operation.

If we only do 1 (change tx behaviour) but not 2, then a tx will be opened when the LocalTaskService is initiated and a tx will only be closed when the LocalTaskService is closed. (All the tx logic in between will not fire, because the tx mgr will see that there's an active tx and not do anything to modify the status of the active tx.)

Except, for JTA tx's -- and probably also for Spring tx's -- this isn't true. Something else that the user is doing could then end those tx's, and that would break the LocalTaskService instance (which expects to be able to close a tx when it disposes -- but can't, because the user already has). True, in this situation everything would work (because of the inner tx logic) until the LocalTaskService was disposed.

Besides the technical consideration above, there's also the fact that users now expect the behaviour of the LocalTaskService to be transactional. That means that if they're using the LocalTaskService, and an exception is thrown halfway through, that the things that have been already done using the LocalTaskService will .. well, be done.

If we don't do 2 (ensure similar tx behaviour), then the following situation can occur, and users will definitely be angry about this:

1. User initializes LocalTaskService
2. User starts process (whereby 5 tasks are created).
3. User completes task 1 (of 5) via LocalTaskService
4. User completes task 2 (of 5) via LocalTaskService
5. Exception is thrown by something, and we exit the stack.

If we don't ensure similar tx behavior, then a. none of the 5 tasks will have been saved and b. 2 of the 5 tasks (which won't even exist) won't have the status "Completed".
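
The scenario above can be replayed with a small sketch. StoreSketch is a hypothetical in-memory stand-in for the database and its tx, not actual human-task code; the only thing that varies is whether a commit follows every operation or everything stays in one long tx:

```java
import java.util.ArrayList;
import java.util.List;

class PerOperationTxSketch {
    /** Replays the five steps and returns what survived the exception. */
    static List<String> run(boolean commitPerOperation) {
        StoreSketch store = new StoreSketch();
        try {
            store.write("5 tasks created");
            if (commitPerOperation) store.commit();
            store.write("task 1 completed");
            if (commitPerOperation) store.commit();
            store.write("task 2 completed");
            if (commitPerOperation) store.commit();
            throw new IllegalStateException("something failed"); // step 5
        } catch (IllegalStateException e) {
            store.rollback(); // whatever is still uncommitted is lost
        }
        return store.committed;
    }

    public static void main(String[] args) {
        System.out.println("tx per operation: " + run(true));
        System.out.println("one long tx:      " + run(false));
    }
}

// Hypothetical in-memory store: writes are pending until commit() is called.
class StoreSketch {
    final List<String> committed = new ArrayList<>();
    private final List<String> pending = new ArrayList<>();

    void write(String change) {
        pending.add(change);
    }

    void commit() {
        committed.addAll(pending);
        pending.clear();
    }

    void rollback() {
        pending.clear();
    }
}
```

With a tx per operation all three changes survive the exception; with one long tx the rollback leaves nothing behind.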


--------------------

On another note, I'm realizing that what I'm proposing above is not something we can do anyways.

The problem, of course, comes back to the fact that the API/DTO object is our entity. That means that if we go through with 1 and 2 (tx by init/dispose and a new TaskServiceSession per op), then we can have the following:

1. LocalTaskService initiated, etc..
2. User calls LocalTaskService and gets a Task object back.
   - which means: a. entitymanager opened, b. tx opened, c. task retrieved, d. tx closed, e. em closed.
3. User does something else with LocalTaskService
   - which means.. (see above)
4. User tries to access something in the Task object -- but something in a (lazily-loaded) collection that of course hasn't been loaded yet.
5. Proxy instance of collection element tries to retrieve the element using the em.. that was closed in 2e.
6. "Boom!" as they say, or in other words, exception and User doesn't understand wtf is going on.
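
A minimal sketch of steps 2 to 6, with hypothetical stand-ins for the entity manager (ClosableSessionSketch) and the entity (LazyTaskSketch). Real Hibernate reports this kind of failure as a LazyInitializationException; the sketch uses an IllegalStateException instead:

```java
import java.util.Arrays;
import java.util.List;

class LazyLoadSketch {
    /** One operation: open the session, fetch the task, close the session again. */
    static LazyTaskSketch getTask(boolean preload) {
        try (ClosableSessionSketch session = new ClosableSessionSketch()) {
            LazyTaskSketch task = new LazyTaskSketch(session);
            if (preload) {
                task.getComments(); // force loading while the session is still open
            }
            return task;
        }
    }

    public static void main(String[] args) {
        System.out.println(getTask(true).getComments());
        try {
            getTask(false).getComments(); // steps 4 and 5: lazy load after the close
        } catch (IllegalStateException e) {
            System.out.println("Boom: " + e.getMessage());
        }
    }
}

// Hypothetical stand-in for the entity manager: it can only load while open.
class ClosableSessionSketch implements AutoCloseable {
    private boolean open = true;

    List<String> load(String what) {
        if (!open) {
            throw new IllegalStateException("session is closed, cannot load " + what);
        }
        return Arrays.asList("comment 1", "comment 2");
    }

    @Override
    public void close() {
        open = false;
    }
}

// Hypothetical stand-in for the Task entity with one lazily-loaded collection.
class LazyTaskSketch {
    private final ClosableSessionSketch session;
    private List<String> comments; // null until first access

    LazyTaskSketch(ClosableSessionSketch session) {
        this.session = session;
    }

    List<String> getComments() {
        if (comments == null) {
            comments = session.load("comments");
        }
        return comments;
    }
}
```

The preload flag corresponds to forcing the load before the session is closed; without it, the first access after the operation has returned fails.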

So it looks like we're back to my original Option 2 or 3:
2. Run through the object tree in order to force loading
3. tx around session.write().

I'm favoring option 3, mostly because it's the least work and probably the most robust. Obviously, neither option involves changing the LocalTaskService.


Thanks,
Marco


13-09-12 09:15, Maciej Swiderski:
Marco, why do we need to do that? Can't we just use it as is, meaning that several operations will be included in the same transaction, like start and complete for example? Will this break on the query level or ...
I am not sure how often it is used like that - two task service operations in a single task service session?

I can see that being beneficial in some cases (like all or nothing) and not really welcome in others (inserting users/groups - one fails and rolls back all the others).

Thanks
Maciej

On 12.09.2012 23:56, Marco Rietveld wrote:
Maciej,

I was thinking about that -- but doing that breaks the LocalTaskService (or otherwise, we have to rewrite LocalTaskService so that it opens a new TaskServiceSession for every operation, just the way the TaskServerHandler handles that).

Actually, the more I think about that, the better it sounds. It might impact the performance of LocalTaskService slightly, but it will be worth it, I think.

Thanks,
Marco

12-09-12 17:16, Maciej Swiderski:
Marco, another way could be to ensure that a transaction is started when the TaskServiceSession is created and closed (committed/rolled back) when the TaskServiceSession is disposed. I did that for a fix for https://issues.jboss.org/browse/JBPM-3763, which is on postgresql, and it worked fine. That way we ensure that session.write is in a transaction as well. Of course I have not tested all possible cases, but it worked for the main ones.

Wdyt?

Maciej

On 12.09.2012 12:22, Marco Rietveld wrote:
Hi Maciej and Mauricio,

I'm struggling to find a good solution for a problem and was hoping to get your advice about the following.


The human-task service uses its entities as DTOs, namely the Task class/instances.

However, we use Hibernate, which uses lazy-loading, which means that Hibernate substitutes proxy instances in collections until the actual collection elements are needed.

With Hibernate 3, we miraculously were able to avoid any large problems. However, testing with EAP 6 has uncovered situations, primarily with postgresql, in which this strategy (entity as DTO) just won't work.

The problem is that even if all the "persistence" work is done in one tx, the collections are still lazily-loaded. That means if a task service operation has to return a Task instance, that the serialization of the Task object (when it's being sent) triggers the loading of entities. Due to postgresql's Large Object facility, this means that there needs to be a transaction around this action. Because we don't surround the session.write(resultsCmnd); operation with a tx, we get an exception.

(To tell the truth, I don't understand why this worked with Hibernate 3.. )

As I've been writing this, I've come up with a couple of solutions:

1. Turn off lazy-loading for all entities.
2. Force the loading of all relevant entities by going through the object tree (task.getPeopleAssignments().size(), etc.. )
3. Put a transaction around session.write(resultsCmnd);

Option 1 has a big impact on performance, especially if we start talking about high-volumes.
Option 2 has a slightly larger impact on performance, but Option 3 seems a little bit ugly to me.


Are there any options I missed? Any advice or comments?

Thanks,
Marco

PS. This is (IMHO) one of the reasons we need to rewrite human-task. I've been working on a proposal/POC, but the important thing is that certain problems that we have now aren't also present in the rewritten version.










-- 
jBPM/Drools developer
Utrecht, the Netherlands