[wildfly-dev] discussion about JSR352 Batch and JPA
Cheng Fang
cfang at redhat.com
Thu Sep 19 10:22:11 EDT 2013
On 9/19/13 9:49 AM, Scott Marlow wrote:
> On 09/19/2013 05:13 AM, Stuart Douglas wrote:
>>
>> On Wed, Sep 18, 2013 at 2:54 PM, Scott Marlow <smarlow at redhat.com> wrote:
>>
>> What are the requirements for supporting Batch section 11.6 [1]? From
>> looking at JSR352, I get that each "chunk" has its own JTA transaction.
>> I previously had heard that we only supported starting the transaction
>> from the application processing code (via UserTransaction) but I think
>> the Batch container/runtime should start a JTA transaction for each
>> "chunk" that is processed. What are we really doing for managing the
>> JTA transactions for batch?
>>
>>
>> The spec says:
>>
>> 8.2.1 Each chunk is processed in a separate transaction. See section 9.7
>> for more
>> details on transactionality.
>>
>> To me that implies the batch implementation starts the transaction
>> itself, although it does seem very vague. For instance, looking at those
>> diagrams, it looks like the transaction is committed even if an exception
>> is thrown, and it also does not seem to give you any possibility of
>> managing the transaction yourself, or of doing non-transactional work.
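To make that demarcation concrete, here is a rough, hand-written sketch (not
the JBeret code referenced below, and not taken from the spec) of what
runtime-managed, per-chunk JTA demarcation with the standard UserTransaction
API could look like:

import java.util.List;
import javax.naming.InitialContext;
import javax.transaction.UserTransaction;

public class ChunkTransactionSketch {

    // Illustrative only: one JTA transaction per chunk, committed on
    // success and rolled back if any item in the chunk fails.
    public void runChunk(List<Object> items) throws Exception {
        UserTransaction ut = (UserTransaction)
                new InitialContext().lookup("java:comp/UserTransaction");
        ut.begin();                      // start the chunk's transaction
        try {
            for (Object item : items) {
                process(item);           // reader/processor/writer work
            }
            ut.commit();                 // checkpoint the chunk
        } catch (Exception e) {
            ut.rollback();               // roll back the whole chunk
            throw e;
        }
    }

    private void process(Object item) {
        // application ItemProcessor/ItemWriter logic would run here
    }
}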
>
> It looks like our batch implementation [4] is starting the transaction
> (via UserTransaction), however, we do rollback the transaction (see
> line 284) if an exception is thrown. I'm not clear on what happens to
> the transaction for the different exception handling cases talked
> about in "8.2.1.4.2 Retrying Exceptions". Cheng?
For a retryable exception, the transaction behavior is further specified
in 8.2.1.4.4 Default Retry Behavior - Rollback. Basically it says the
transaction will be rolled back before the retry, but it can be configured
not to roll back. You can also configure a retry limit for the chunk.
8.2.1.4.4 Default Retry Behavior - Rollback
When a retryable exception occurs, the default behavior is for the
batch runtime to rollback the current chunk and re-process it with
an item-count of 1 and a checkpoint policy of item. If the optional
ChunkListener is configured on the step, the onError method is
called before rollback. The default retry behavior can be overridden
by configuring the no-rollback-exception-classes element. See
section 8.2.1.4.5 for more information on specifying no-rollback
exceptions.
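As a minimal sketch of the ChunkListener hook mentioned above (the class name
and the logging are mine; only the javax.batch types are from the spec), an
onError implementation that runs before the chunk is rolled back could look
like:

import javax.batch.api.chunk.listener.AbstractChunkListener;
import javax.inject.Named;

@Named
public class LoggingChunkListener extends AbstractChunkListener {

    @Override
    public void onError(Exception ex) throws Exception {
        // Invoked by the batch runtime before it rolls back the current
        // chunk transaction; the default retry then reprocesses the
        // chunk with an item-count of 1.
        System.err.println("Chunk failed, rolling back: " + ex);
    }
}

The retryable exceptions themselves, the retry-limit, and any
no-rollback-exception-classes are all declared on the chunk element in the
job XML.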
Cheng
>
> For non-transactional work, batch has the concept of a batchlet which
> is described as:
>
> "
> The batchlet element specifies a task-oriented batch step. It is
> specified as a child element of the step element. It is mutually
> exclusive with the chunk element. See 9.1.2 for further details about
> batchlets. Steps of this type are useful for performing a variety of
> tasks that are not item-oriented, such as executing a command or doing
> file transfer.
> "
>
> Also, I gave a sample batch application [5] a spin yesterday and our
> JPA deployment code scanned the transaction scoped persistence context
> and did the right thing.
>
> [4]
> https://github.com/jberet/jsr352/blob/master/jberet-core/src/main/java/org/jberet/runtime/runner/ChunkRunner.java
>
> [5]
> https://github.com/arun-gupta/javaee7-samples/tree/master/batch/chunk-csv-database
>>
>>
>>
>> REGULAR CHUNK PROCESSING & JPA
>>
>>
>> For the JPA support for regular chunk processing [1], the following
>> will give a new underlying persistence context per chunk (jta tx):
>>
>> @PersistenceContext(unitName = "chunkNonpartitionedAZTT4443334")
>> EntityManager em;
>>
>>
>> PARTITIONED CHUNK PROCESSING & JPA
>>
>>
>> For the JPA support for partitioned chunk processing [2], the following
>> will give a new underlying persistence context per chunk (jta tx):
>>
>> @PersistenceContext(unitName = "chunkpartitionedAZTT4443334")
>> EntityManager em;
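As a sketch of how that injection might actually be used inside a chunk
artifact (the writer class below is made up; only the unit name is copied
from the example above), a JPA-backed ItemWriter could be as simple as:

import java.util.List;
import javax.batch.api.chunk.AbstractItemWriter;
import javax.inject.Named;
import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;

@Named
public class JpaChunkItemWriter extends AbstractItemWriter {

    // Transaction-scoped persistence context: because each chunk runs in
    // its own JTA transaction, each chunk sees a fresh underlying
    // persistence context.
    @PersistenceContext(unitName = "chunkpartitionedAZTT4443334")
    EntityManager em;

    @Override
    public void writeItems(List<Object> items) throws Exception {
        for (Object item : items) {
            em.persist(item);   // flushed and committed with the chunk's JTA tx
        }
    }
}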
>>
>> One concern that I have about partitioning is the performance impact
>> of deadlocking and waiting for the JTA transaction to time out.
>> Depending on whether the work is configured to retry or not, hitting
>> several deadlocks in a batch process could defeat some of the
>> performance gains of partitioning. Always reading/writing/locking the
>> underlying database resources in the same exact order would help
>> avoid deadlocks.
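One way to apply that "same exact order" advice inside a chunk artifact
(everything below is an illustration I made up, including the Account entity)
is to sort the keys before touching the rows, so every partition acquires its
row locks in the same sequence:

import java.util.Collections;
import java.util.List;
import javax.persistence.Entity;
import javax.persistence.EntityManager;
import javax.persistence.Id;
import javax.persistence.LockModeType;

public class OrderedLockingSketch {

    // Hypothetical entity, defined here only to keep the sketch self-contained.
    @Entity
    public static class Account {
        @Id Long id;
        long balance;
    }

    public void updateInKeyOrder(EntityManager em, List<Long> ids) {
        Collections.sort(ids);                         // consistent lock order
        for (Long id : ids) {
            Account a = em.find(Account.class, id,
                                LockModeType.PESSIMISTIC_WRITE);
            a.balance += 1;                            // example update
        }
        // All of the above commits or rolls back with the chunk's JTA tx.
    }
}

This does not remove lock contention, but it turns potential deadlocks into
plain lock waits.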
>>
>>
>> IMHO if you are attempting to partition work that results in a deadlock
>> then you are basically doing something wrong. Also, as each chunk should
>> be basically running the same code but with different data, in general
>> it should acquire locks in the same order anyway.
>
> Hard to say since this is user code that is being invoked (all kinds
> of ordering issues could easily be introduced via several layers of
> conditional code). There could also be unexpected use of the same
> application database while the batch process is running. The
> application code could also access the database rows in a different
> order (or pages, if using page-level locking).
>
>>
>> Stuart
>>
>>
>> Beyond the basic JPA capability of ensuring that each "chunk" has its
>> own persistence context, how else can we help the batch processing
>> experts writing JSR-352 applications on WildFly/Hibernate?
>>
>> Anything else to discuss about JPA & Batch?
>>
>> [1] Batch spec section 11.6 Regular Chunk Processing
>> https://gist.github.com/scottmarlow/6603746
>>
>> [2] Batch spec section 11.7 Partitioned Chunk Processing
>> https://gist.github.com/scottmarlow/6607667
>>
>> [3] persistence.xml https://gist.github.com/scottmarlow/6608533
>>