On 9/19/13 9:49 AM, Scott Marlow wrote:
On 09/19/2013 05:13 AM, Stuart Douglas wrote:

On Wed, Sep 18, 2013 at 2:54 PM, Scott Marlow <smarlow@redhat.com
<mailto:smarlow@redhat.com>> wrote:

    What are the requirements for supporting Batch section 11.6 [1]?  From
    looking at JSR352, I get that each "chunk" has its own JTA transaction.
    I previously had heard that we only supported starting the transaction
    from the application processing code (via UserTransaction), but I think
    the Batch container/runtime should start a JTA transaction for each
    "chunk" that is processed.  What are we really doing for managing the
    JTA transactions for batch?


The spec says:

8.2.1 Each chunk is processed in a separate transaction. See section 9.7
for more
details on transactionality.

To me that implies the batch implementation starts the transaction
itself, although it does seem very vague. For instance looking at those
diagrams it looks like the transaction is committed even if an exception
is thrown, and it also does not seem to give you any possibility of
managing the transaction yourself, or doing non-transactional work.
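Read that way, the per-chunk demarcation amounts to something like the loop below. This is only a sketch: `Tx` is a stand-in for `javax.transaction.UserTransaction` so the snippet is self-contained, and the chunk work is a placeholder, not the actual reader/processor/writer plumbing in any real implementation.

```java
// Sketch of per-chunk transaction demarcation as implied by spec section 8.2.1.
// Tx is a stand-in for javax.transaction.UserTransaction so the sketch compiles
// on its own; a real batch runtime would use the container's transaction.
public class ChunkTxSketch {
    interface Tx { void begin(); void commit(); void rollback(); }

    /** Records the call sequence so the demarcation is observable. */
    static class RecordingTx implements Tx {
        final StringBuilder log = new StringBuilder();
        public void begin()    { log.append("B"); }
        public void commit()   { log.append("C"); }
        public void rollback() { log.append("R"); }
    }

    /** Run one chunk's work inside its own transaction. */
    static void runChunk(Tx tx, Runnable chunkWork) {
        tx.begin();
        try {
            chunkWork.run();   // readItem/processItem/writeItems for one chunk
            tx.commit();       // commit at the chunk's checkpoint
        } catch (RuntimeException e) {
            tx.rollback();     // roll back the whole chunk on failure
            throw e;
        }
    }

    /** Two chunks: the first succeeds (commit), the second fails (rollback). */
    static String demo() {
        RecordingTx tx = new RecordingTx();
        runChunk(tx, () -> { /* chunk 1: all items succeed */ });
        try {
            runChunk(tx, () -> { throw new RuntimeException("item failed"); });
        } catch (RuntimeException expected) {
            // the failure propagates after rollback
        }
        return tx.log.toString();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints BCBR: begin/commit, then begin/rollback
    }
}
```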

It looks like our batch implementation [4] is starting the transaction (via UserTransaction); however, we do roll back the transaction (see line 284) if an exception is thrown.  I'm not clear on what happens to the transaction in the different exception handling cases described in "8.2.1.4.2 Retrying Exceptions".  Cheng?
For a retryable exception, the transaction behavior is further specified in 8.2.1.4.4 Default Retry Behavior - Rollback.  Basically it says the transaction will be rolled back before the retry, but it can be configured not to roll back.  You can also configure a retry limit for the chunk.

8.2.1.4.4 Default Retry Behavior - Rollback
When a retryable exception occurs, the default behavior is for the batch runtime to rollback the current chunk and re-process it with an item-count of 1 and a checkpoint policy of item. If the optional ChunkListener is configured on the step, the onError method is called before rollback. The default retry behavior can be overridden by configuring the no-rollback-exception-classes element. See section 8.2.1.4.5 for more information on specifying no-rollback exceptions.
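For illustration, that configuration lives on the chunk element in the job XML. The artifact names and the exception classes below are made up; only the element and attribute names come from the spec schema:

```xml
<step id="step1">
  <chunk item-count="10" retry-limit="3">
    <reader ref="myItemReader"/>
    <processor ref="myItemProcessor"/>
    <writer ref="myItemWriter"/>
    <!-- exceptions that trigger the rollback-and-retry behavior -->
    <retryable-exception-classes>
      <include class="java.sql.SQLTransientException"/>
    </retryable-exception-classes>
    <!-- exceptions that retry without rolling back the chunk -->
    <no-rollback-exception-classes>
      <include class="com.example.RecoverableException"/>
    </no-rollback-exception-classes>
  </chunk>
</step>
```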
Cheng

For non-transactional work, the Batch spec has the concept of a batchlet, which is described as:

"
The batchlet element specifies a task-oriented batch step. It is specified as a child element of the step element. It is mutually exclusive with the chunk element.  See 9.1.2 for further details about batchlets. Steps of this type are useful for performing a variety of tasks that are not item-oriented, such as executing a command or doing file transfer.
"
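In code terms, a batchlet is just an implementation of javax.batch.api.Batchlet: process() does the task and returns the exit status, and stop() is called if the job is stopped. A sketch, with the interface inlined as a stand-in so it compiles without the batch API jar (the real interface also declares `throws Exception` on both methods, and a real batchlet would carry a CDI @Named annotation); the "file transfer" work is a placeholder:

```java
// Sketch of a task-oriented (non-item-oriented) batchlet step.
// BatchletLike mirrors the two methods of javax.batch.api.Batchlet so the
// sketch is self-contained; the real interface declares throws Exception.
public class BatchletSketch {
    interface BatchletLike {
        String process();  // do the task, return the exit status
        void stop();       // invoked by the runtime when the job is stopped
    }

    static class FileTransferBatchlet implements BatchletLike {
        private volatile boolean stopRequested;

        @Override
        public String process() {
            // Placeholder for non-item-oriented work such as a file transfer;
            // here we loop over a few "blocks" and honor stop requests.
            for (int block = 0; block < 3; block++) {
                if (stopRequested) return "STOPPED";
                // transferBlock(block);  // hypothetical real work
            }
            return "COMPLETED";
        }

        @Override
        public void stop() { stopRequested = true; }
    }

    public static void main(String[] args) {
        System.out.println(new FileTransferBatchlet().process()); // prints COMPLETED
    }
}
```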

Also, I gave a sample batch application [5] a spin yesterday and our JPA deployment code scanned the transaction-scoped persistence context and did the right thing.

[4] https://github.com/jberet/jsr352/blob/master/jberet-core/src/main/java/org/jberet/runtime/runner/ChunkRunner.java

[5] https://github.com/arun-gupta/javaee7-samples/tree/master/batch/chunk-csv-database



    REGULAR CHUNK PROCESSING & JPA


    For the JPA support for regular chunk processing [1], the following will
    give a new underlying persistence context per chunk (jta tx):

    @PersistenceContext(unitName = "chunkNonpartitionedAZTT4443334")
    EntityManager em;


    PARTITIONED CHUNK PROCESSING & JPA


    For the JPA support for partitioned chunk processing [2], the following
    will give a new underlying persistence context per chunk (jta tx):

    @PersistenceContext(unitName = "chunkpartitionedAZTT4443334")
    EntityManager em;

    One concern that I have about partitioning is the performance impact of
    deadlocking and waiting for the JTA transaction to time out.  Depending
    on whether the work is configured to retry or not, hitting several
    deadlocks in a batch process could defeat some of the performance gains
    of partitioning.  Always reading/writing/locking the underlying database
    resources in the same exact order would help avoid deadlocks.


IMHO if you are attempting to partition work in a way that results in a
deadlock then you are basically doing something wrong. Also, as each chunk
should be running the same code but with different data, in general it
should acquire locks in the same order anyway.

Hard to say, since this is user code that is being invoked (all kinds of ordering issues could easily be introduced via several layers of conditional code).  There could also be unexpected use of the same application database while the batch process is running, and the application code could access the database rows in a different order (or pages, if using page-level locking).
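The classic mitigation is the one mentioned above: always acquire locks on the shared resources in one canonical order. A toy illustration in plain Java, where ReentrantLocks stand in for database row locks (the row ids and the "canonical order by ascending id" convention are made up for the example):

```java
import java.util.*;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Toy illustration of deadlock avoidance via canonical lock ordering.
// In real batch code the equivalent is touching rows/pages in a single
// agreed-upon order across all partitions.
public class LockOrdering {
    static final ConcurrentHashMap<Long, ReentrantLock> rowLocks =
            new ConcurrentHashMap<>();

    static ReentrantLock lockFor(long rowId) {
        return rowLocks.computeIfAbsent(rowId, id -> new ReentrantLock());
    }

    /** Lock the rows in ascending id order, run the work, unlock in reverse. */
    static List<Long> withRowLocks(Collection<Long> rowIds, Runnable work) {
        List<Long> ordered = new ArrayList<>(rowIds);
        Collections.sort(ordered);  // canonical order: ascending row id
        for (long id : ordered) {
            lockFor(id).lock();
        }
        try {
            work.run();
        } finally {
            for (int i = ordered.size() - 1; i >= 0; i--) {
                lockFor(ordered.get(i)).unlock();
            }
        }
        return ordered;  // the acquisition order that was actually used
    }

    public static void main(String[] args) {
        // Two "partitions" asking for the same rows in different orders still
        // acquire them in the same canonical order, so they cannot deadlock.
        System.out.println(withRowLocks(Arrays.asList(7L, 3L, 5L), () -> {})); // prints [3, 5, 7]
        System.out.println(withRowLocks(Arrays.asList(5L, 7L, 3L), () -> {})); // prints [3, 5, 7]
    }
}
```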


Stuart


    Beyond the basic JPA capability of ensuring that each "chunk" has its
    own persistence context, how else can we help the batch processing
    experts writing JSR-352 applications on WildFly/Hibernate?

    Anything else to discuss about JPA & Batch?

    [1] Batch spec section 11.6 Regular Chunk Processing
    https://gist.github.com/scottmarlow/6603746

    [2] Batch spec section 11.7 Partitioned Chunk Processing
    https://gist.github.com/scottmarlow/6607667

    [3] persistence.xml https://gist.github.com/scottmarlow/6608533

    _______________________________________________
    wildfly-dev mailing list
    wildfly-dev@lists.jboss.org <mailto:wildfly-dev@lists.jboss.org>
    https://lists.jboss.org/mailman/listinfo/wildfly-dev