[wildfly-dev] discussion about JSR352 Batch and JPA

Thu Sep 19 09:49:36 EDT 2013

On 09/19/2013 05:13 AM, Stuart Douglas wrote:
>
> On Wed, Sep 18, 2013 at 2:54 PM, Scott Marlow <smarlow at redhat.com
> <mailto:smarlow at redhat.com>> wrote:
>
>     What are the requirements for supporting Batch section 11.6 [1]?  From
>     looking at JSR352, I get that each "chunk" has its own JTA transaction.
>     I previously had heard that we only supported starting the transaction
>     from the application processing code (via UserTransaction) but I think
>     the Batch container/runtime should start a JTA transaction for each
>     "chunk" that is processed.  What are we really doing for managing the
>     JTA transactions for batch?
>
>
> The spec says:
>
> 8.2.1 Each chunk is processed in a separate transaction. See section 9.7
> for more details on transactionality.
>
> To me that implies the batch implementation starts the transaction
> itself, although it does seem very vague. For instance looking at those
> diagrams it looks like the transaction is committed even if an exception
> is thrown, and it also does not seem to give you any possibility of
> managing the transaction yourself, or doing non-transactional work.

It looks like our batch implementation [4] is starting the transaction
(via UserTransaction); however, we do roll back the transaction (see line
284) if an exception is thrown.  I'm not clear on what happens to the
transaction in the different exception handling cases discussed in
"8.2.1.4.2 Retrying Exceptions".  Cheng?
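
A minimal sketch of the per-chunk pattern described above: the runtime
begins a transaction, writes one chunk, then commits, or rolls back if an
exception is thrown. This is not the ChunkRunner code from [4]. The Tx
interface is a hypothetical stand-in for UserTransaction, and ItemWriter,
RecordingTx and runChunks are names invented for illustration. Retry and
skip handling is deliberately not modeled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ChunkTxSketch {
    /** Hypothetical stand-in for a JTA UserTransaction. */
    interface Tx {
        void begin();
        void commit();
        void rollback();
    }

    /** Hypothetical writer callback; not the real batch API type. */
    interface ItemWriter {
        void write(List<String> chunk) throws Exception;
    }

    /** Records the transaction calls so the flow can be inspected. */
    static final class RecordingTx implements Tx {
        final List<String> log = new ArrayList<>();
        public void begin() { log.add("begin"); }
        public void commit() { log.add("commit"); }
        public void rollback() { log.add("rollback"); }
    }

    /** Runs one transaction per chunk; returns the number of committed chunks. */
    static int runChunks(List<String> items, int itemCount, Tx tx, ItemWriter writer) {
        int committed = 0;
        for (int start = 0; start < items.size(); start += itemCount) {
            int end = Math.min(start + itemCount, items.size());
            List<String> chunk = items.subList(start, end);
            tx.begin();
            try {
                writer.write(chunk);
                tx.commit();
                committed++;
            } catch (Exception e) {
                // Assumption: roll back and stop on the first failure.
                tx.rollback();
                break;
            }
        }
        return committed;
    }

    public static void main(String[] args) {
        RecordingTx tx = new RecordingTx();
        int committed = runChunks(Arrays.asList("a", "b", "c"), 2, tx, chunk -> { });
        System.out.println(committed + " chunks committed, calls: " + tx.log);
    }
}
```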

For non-transactional work, batch has the concept of a batchlet which is 
described as:

"
The batchlet element specifies a task-oriented batch step. It is 
specified as a child element of the step element. It is mutually 
exclusive with the chunk element.  See 9.1.2 for further details about 
batchlets. Steps of this type are useful for performing a variety of 
tasks that are not item-oriented, such as executing a command or doing 
file transfer.
"
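
To illustrate the kind of task-oriented step that quote describes, here is
a rough sketch of a batchlet doing a file transfer. The nested Batchlet
interface is a local stand-in so the example compiles on its own. Its
process() and stop() methods mirror the shape of javax.batch.api.Batchlet,
but CopyFileBatchlet and the "COMPLETED" exit status string are assumptions
made for illustration only.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class FileTransferBatchletSketch {
    /** Local stand-in mirroring the shape of javax.batch.api.Batchlet. */
    interface Batchlet {
        String process() throws Exception;
        void stop() throws Exception;
    }

    /** Hypothetical batchlet: copies one file, with no item-oriented loop. */
    static final class CopyFileBatchlet implements Batchlet {
        private final Path source;
        private final Path target;

        CopyFileBatchlet(Path source, Path target) {
            this.source = source;
            this.target = target;
        }

        @Override
        public String process() throws IOException {
            Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
            return "COMPLETED"; // exit status chosen for this sketch
        }

        @Override
        public void stop() {
            // A single copy has nothing to interrupt in this sketch.
        }
    }

    public static void main(String[] args) throws Exception {
        Path src = Files.createTempFile("batchlet-src", ".txt");
        Path dst = Files.createTempFile("batchlet-dst", ".txt");
        Files.write(src, "hello".getBytes(StandardCharsets.UTF_8));
        String status = new CopyFileBatchlet(src, dst).process();
        System.out.println("exit status: " + status);
    }
}
```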

Also, I gave a sample batch application [5] a spin yesterday and our JPA
deployment code scanned the transaction-scoped persistence context and
did the right thing.

[4] 
https://github.com/jberet/jsr352/blob/master/jberet-core/src/main/java/org/jberet/runtime/runner/ChunkRunner.java

[5] 
https://github.com/arun-gupta/javaee7-samples/tree/master/batch/chunk-csv-database
>
>
>
>     REGULAR CHUNK PROCESSING & JPA
>
>
>     For the JPA support for regular chunk processing [1], the following will
>     give a new underlying persistence context per chunk (jta tx):
>
>     @PersistenceContext(unitName = "chunkNonpartitionedAZTT4443334")
>     EntityManager em;
>
>
>     PARTITIONED CHUNK PROCESSING & JPA
>
>
>     For the JPA support for partitioned chunk processing [2], the following
>     will give a new underlying persistence context per chunk (jta tx):
>
>     @PersistenceContext(unitName = "chunkpartitionedAZTT4443334")
>     EntityManager em;
>
>     One concern that I have about partitioning is the performance impact of
>     deadlocking and waiting for the JTA transaction to time out.  Depending
>     on whether the work is configured to retry or not, hitting several
>     deadlocks in a batch process could defeat some of the performance gains
>     of partitioning.  Always reading/writing/locking the underlying database
>     resources in exactly the same order would help avoid deadlocks.
>
>
> IMHO if you are attempting to partition work that results in a deadlock
> then you are basically doing something wrong. Also, as each chunk should
> be basically running the same code but with different data, in general
> it should acquire locks in the same order anyway.

Hard to say since this is user code that is being invoked (all kinds of 
ordering issues could easily be introduced via several layers of 
conditional code).  There could also be unexpected use of the same 
application database while the batch process is running.  The 
application code could also access the database rows in a different
order (or pages, if using page-level locking).
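
The ordering idea can be sketched as follows: whatever order the caller
supplies row ids in, the locks are always taken in ascending id order, so
two partitions touching the same rows cannot wait on each other in a
cycle. In-memory ReentrantLock objects stand in for database row locks
here, and OrderedLockingSketch, lockOrder and withRowsLocked are
hypothetical names. With JPA the analogous step would presumably be
sorting the primary keys before requesting pessimistic locks.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

final class OrderedLockingSketch {
    // In-memory locks standing in for database row locks (assumption).
    private final Map<Long, ReentrantLock> locks = new ConcurrentHashMap<>();

    /** Returns the ids in canonical order: ascending, without duplicates. */
    static List<Long> lockOrder(Iterable<Long> rowIds) {
        TreeSet<Long> sorted = new TreeSet<>();
        for (Long id : rowIds) {
            sorted.add(id);
        }
        return new ArrayList<>(sorted);
    }

    /** Locks the rows in canonical order, runs the work, then unlocks. */
    void withRowsLocked(Iterable<Long> rowIds, Runnable work) {
        List<ReentrantLock> held = new ArrayList<>();
        try {
            for (Long id : lockOrder(rowIds)) {
                ReentrantLock lock = locks.computeIfAbsent(id, k -> new ReentrantLock());
                lock.lock();
                held.add(lock);
            }
            work.run();
        } finally {
            // Release in reverse acquisition order.
            for (int i = held.size() - 1; i >= 0; i--) {
                held.get(i).unlock();
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        OrderedLockingSketch sketch = new OrderedLockingSketch();
        // The two "partitions" name the same rows in opposite orders.
        Thread a = new Thread(() -> sketch.withRowsLocked(Arrays.asList(1L, 2L), () -> { }));
        Thread b = new Thread(() -> sketch.withRowsLocked(Arrays.asList(2L, 1L), () -> { }));
        a.start();
        b.start();
        a.join();
        b.join();
        System.out.println("both partitions finished");
    }
}
```

This only helps when every code path that touches the rows goes through
the same ordering step, which is exactly what is hard to guarantee for
arbitrary user code.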

>
> Stuart
>
>
>     Beyond the basic JPA capability of ensuring that each "chunk" has its
>     own persistence context, how else can we help the batch processing
>     experts writing JSR-352 applications on WildFly/Hibernate?
>
>     Anything else to discuss about JPA & Batch?
>
>     [1] Batch spec section 11.6 Regular Chunk Processing
>     https://gist.github.com/scottmarlow/6603746
>
>     [2] Batch spec section 11.7 Partitioned Chunk Processing
>     https://gist.github.com/scottmarlow/6607667
>
>     [3] persistence.xml https://gist.github.com/scottmarlow/6608533
>
>     _______________________________________________
>     wildfly-dev mailing list
>     wildfly-dev at lists.jboss.org <mailto:wildfly-dev at lists.jboss.org>
>     https://lists.jboss.org/mailman/listinfo/wildfly-dev
>
>