WFLY-508, JBeret initial review and integration issues

David M. Lloyd

Wednesday, 24 July 2013

11:09 a.m.

On initial review of JBeret we have noticed a number of issues that need to be addressed. The culmination amounts to a series of questions and observations here:

#1) Why did we not choose to just use the RI? In other words, what benefit do we get from JBeret that is not also in the RI? In other, other words, why should we *use* this code instead of the RI at this point in time?

#2) Why does JBeret duplicate facilities already present in the WildFly code base and deployer chain - e.g. annotation indexing, reflection indexing, thread management, parsing facilities, etc.?

#3) Specific to algorithmic complexity - it appears that jobs are keyed by ID, yet accessed using a sequential search [1] - this does not scale well to large numbers of jobs. Is there no better approach?

#4) JAXB seems to be being used to parse XML, which is a departure from all of our other services which expect parsing to be done during deployment processing in a more efficient manner. Is there any better way we can integrate this, preferably not using JAXB?

#5) There are a number of resources present that seem inappropriate for the production JAR [2] [3]. Is this intentional?

#6) This code base makes extensive use of static state, including static fields that seem not to be adequately protected for thread-safety, and at least one static thread pool [4]. This needs to be fixed, as these kinds of things make embedding difficult or impossible.

[1] https://github.com/jberet/jsr352/blob/master/jberet-core/src/main/java/or...
[2] https://github.com/jberet/jsr352/tree/master/jberet-core/src/main/resourc...
[3] https://github.com/jberet/jsr352/blob/master/jberet-core/src/main/resourc...
[4] https://github.com/jberet/jsr352/blob/master/jberet-core/src/main/java/or...

--
- DML
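As an illustration of point #6, here is a minimal sketch of the alternative to a static thread pool: the runtime receives its executor from whoever embeds it, so the container (or a test) controls thread creation and shutdown. `BatchRuntime` and its `submit` method are hypothetical names used only for this sketch; they are not taken from JBeret or WildFly.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class EmbeddableRuntimeSketch {

    /** Hypothetical batch runtime: the executor is injected, not held in a static field. */
    static final class BatchRuntime {
        private final ExecutorService executor;

        BatchRuntime(ExecutorService executor) {
            this.executor = executor;
        }

        /** Runs a (fake) job on the injected executor and reports its name when done. */
        Future<String> submit(final String jobName) {
            Callable<String> task = () -> "completed " + jobName;
            return executor.submit(task);
        }
    }

    public static void main(String[] args) throws Exception {
        // The embedder owns the pool, so two runtimes never share hidden global state.
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            BatchRuntime runtime = new BatchRuntime(pool);
            System.out.println(runtime.submit("nightly-report").get(5, TimeUnit.SECONDS));
        } finally {
            pool.shutdown();
        }
    }
}
```

With this shape, an application server could pass in a managed executor, while a standalone distribution could create its own pool at startup.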

Jaikiran Pai

Wednesday, 24 July

11:15 a.m.

...

Cheng Fang

12:09 p.m.

...

One other question I had is related to where this code lies. I am not familiar with the JBeret project goals or even the spec, but I am not sure what purpose the code will serve outside of the WildFly codebase when most of it is going to be WildFly integration code (from what I understand).

-Jaikiran

On Wednesday 24 July 2013 09:39 PM, David M. Lloyd wrote:

...

Jaikiran Pai

Thursday, 25 July

6 a.m.

Thank you for outlining the goals. -Jaikiran On Wednesday 24 July 2013 10:39 PM, Cheng Fang wrote:

...

As required by the spec, it supports both Java SE and EE execution environments. As of now, I see the most important purpose as integrating it into WildFly and fulfilling the EE 7 requirement. I also see, in the longer term, business opportunities to promote it as a standalone distro, just like Spring Batch, and to integrate it into the middleware stack in a more flexible way.

Thanks,
Cheng

On 7/24/13 12:15 PM, Jaikiran Pai wrote:

...

Stuart Douglas

Wednesday, 24 July

11:52 a.m.

I have also been looking at this today, and there are quite a few things in the code base that worry me about its quality.

1) JdbcRepository seems to save jobs to the database but never seems to actually load them again or remove them? [1]

2) Some things seem to be implemented in a very inefficient manner, using lists when a map or a set would be more appropriate. For example, in <https://github.com/jberet/jsr352/blob/master/jberet-core/src/main/ja...> AbstractJobRepository all jobs are stored in a list, and as a result every operation on the repo is O(n) on the number of jobs. A map would be a far more suitable data structure here. This will be a real problem if a customer is ever trying to scale to even a moderately sized number of jobs and job instances.

3) Thread safety

Almost all objects in the code base are mutable (i.e. no use of final), and with a few exceptions most of the code is not synchronized. From what I can see not much thought has been given to thread safety, and looking through the code I think there are quite a few places where there is the potential for threading issues, e.g. in [2], where a list that is being modified concurrently is returned to the caller. The caller cannot safely use the list, as it may be modified by another thread as it is being iterated. There are other places where I think there is the potential for races, however I don't know the code well enough to be sure.

4) It looks like it has been designed as a standalone project to be embedded into a deployment, and no thought has been given to how to actually integrate it into WildFly. I know David already mentioned the statics issue, but this is a big problem, e.g. only one jberet.properties will be loaded, so if two applications have different properties files then one will leak into the other app, depending on the current TCCL when the BatchConfig class is first accessed.

Stuart

[1] https://github.com/jberet/jsr352/blob/master/jberet-core/src/main/java/or...
[2] https://github.com/jberet/jsr352/blob/master/jberet-core/src/main/java/or...

On Wed, Jul 24, 2013 at 6:09 PM, David M. Lloyd <david.lloyd(a)redhat.com> wrote:

...

Cheng Fang

1:14 p.m.

On 7/24/13 12:52 PM, Stuart Douglas wrote:

...

JdbcRepository is still a work in progress and is not used yet.

...

2) Some things seem to be implemented in a very inefficient manner, using lists when a map or a set would be more appropriate. For example, in <https://github.com/jberet/jsr352/blob/master/jberet-core/src/main/java/or...> AbstractJobRepository all jobs are stored in a list, and as a result every operation on the repo is O(n) on the number of jobs. A map would be a far more suitable data structure here. This will be a real problem if a customer is ever trying to scale to even a moderately sized number of jobs and job instances.

See my reply to the previous message. Initially I did implement it as a map, but I didn't like duplicating the id as the key, so I changed it to a list. I don't expect the number of jobs to be that large, or access to jobs to be a hot spot. But I'm open to switching it, since the feedback so far has favored a map lookup.
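For reference, a minimal sketch of the map-based lookup being discussed. It assumes a hypothetical `Job` type with a `getId()` accessor and a hypothetical `MapBackedRepository`; neither is JBeret's actual class. The repository derives the key from the job itself, so callers never have to supply the id twice, which may ease the duplication concern.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class JobRepositorySketch {

    /** Hypothetical stand-in for a job definition; not JBeret's actual type. */
    static final class Job {
        private final String id;

        Job(String id) {
            this.id = id;
        }

        String getId() {
            return id;
        }
    }

    /** Stores jobs in a map keyed by id, so a lookup does not scan every job. */
    static final class MapBackedRepository {
        private final ConcurrentMap<String, Job> jobsById = new ConcurrentHashMap<>();

        /** The key comes from the job itself; returns false if the id is already taken. */
        boolean addJob(Job job) {
            return jobsById.putIfAbsent(job.getId(), job) == null;
        }

        Optional<Job> getJob(String id) {
            return Optional.ofNullable(jobsById.get(id));
        }

        boolean removeJob(String id) {
            return jobsById.remove(id) != null;
        }

        /** Read-only view for callers that need to enumerate all jobs. */
        Collection<Job> getJobs() {
            return Collections.unmodifiableCollection(jobsById.values());
        }
    }

    public static void main(String[] args) {
        MapBackedRepository repo = new MapBackedRepository();
        repo.addJob(new Job("payroll"));
        repo.addJob(new Job("invoices"));
        System.out.println(repo.getJob("payroll").isPresent()); // true
        System.out.println(repo.getJob("missing").isPresent()); // false
    }
}
```

Using `ConcurrentHashMap` here is a design choice of the sketch: it gives expected constant-time lookup by id and also covers concurrent access from multiple job threads without external locking.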

...

3) Thread safety Almost all objects in the code base are mutable (i.e. no use of final), and with a few exceptions most of the code is not synchronized. From what I can see not much thought has been given to thread safety, and looking through the code I think there are quite a few places where there are the potential to have threading issues. e.g. In [2], where a list that is being modified concurrently is returned to the caller. The caller cannot safely use the list, as it may be modified by another thread as it is being iterated. There are other places where I think there are the potential for races, however I don't know the code well enough to be sure.

If most of the code is synchronized, I would also be worried ;-) . But I agree, thread safety is the area we need to look at more closely as we integrate into WildFly. In [2], what's your recommendation? To always return a new list to the caller, which seems a bit wasteful.

...

jberet.properties is only for the standalone distro. When running in WildFly, all configuration will be included in the subsystem configuration.

Appreciate all the feedback!

Cheng
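To make the leak concern concrete: it arises when a single static configuration is shared by every application in the VM. A minimal sketch of keeping configuration per deployment, keyed by class loader, might look like the following. `ConfigRegistry` and the property name `job-repository-type` are hypothetical and exist only for this sketch; they are not JBeret or WildFly APIs, and in WildFly the values would come from the subsystem configuration rather than from a properties file.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.Map;
import java.util.Properties;
import java.util.WeakHashMap;

public class PerDeploymentConfigSketch {

    /** Hypothetical registry holding one Properties instance per deployment class loader. */
    static final class ConfigRegistry {
        // WeakHashMap lets an entry be dropped once its class loader is unreachable.
        private final Map<ClassLoader, Properties> configs = new WeakHashMap<>();

        synchronized void register(ClassLoader deployment, Properties props) {
            configs.put(deployment, props);
        }

        /** Returns the deployment's own settings, or empty settings if none were registered. */
        synchronized Properties lookup(ClassLoader deployment) {
            Properties props = configs.get(deployment);
            return props != null ? props : new Properties();
        }
    }

    public static void main(String[] args) {
        // Two empty class loaders stand in for two deployed applications.
        ClassLoader appA = new URLClassLoader(new URL[0]);
        ClassLoader appB = new URLClassLoader(new URL[0]);

        Properties propsA = new Properties();
        propsA.setProperty("job-repository-type", "in-memory"); // hypothetical key
        Properties propsB = new Properties();
        propsB.setProperty("job-repository-type", "jdbc"); // hypothetical key

        ConfigRegistry registry = new ConfigRegistry();
        registry.register(appA, propsA);
        registry.register(appB, propsB);

        // Each application sees only its own value, regardless of which was loaded first.
        System.out.println(registry.lookup(appA).getProperty("job-repository-type"));
        System.out.println(registry.lookup(appB).getProperty("job-repository-type"));
    }
}
```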

...


Stuart Douglas

4:25 p.m.

On Wed, Jul 24, 2013 at 8:14 PM, Cheng Fang <cfang(a)redhat.com> wrote:

...

> In [2], what's your recommendation? To always return a new list to the caller, which seems a bit wasteful.

You must either return a new list or use a concurrent list as the backing data structure. Stuart
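The two options named above can be sketched as follows. This is an illustration only: `CopyingHolder` and `ConcurrentHolder` are hypothetical classes, not code from JBeret. The first returns a snapshot copy taken under a lock; the second uses `java.util.concurrent.CopyOnWriteArrayList`, whose iterators work on a stable snapshot of the list.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class SafeListSketch {

    /** Option 1: guard a plain list with a lock and hand out a snapshot copy. */
    static final class CopyingHolder {
        private final List<String> items = new ArrayList<>();

        synchronized void add(String item) {
            items.add(item);
        }

        synchronized List<String> getItems() {
            return new ArrayList<>(items); // caller owns the copy
        }
    }

    /** Option 2: back the holder with a concurrent list and expose a read-only view. */
    static final class ConcurrentHolder {
        private final List<String> items = new CopyOnWriteArrayList<>();

        void add(String item) {
            items.add(item);
        }

        List<String> getItems() {
            return Collections.unmodifiableList(items); // iteration uses a stable snapshot
        }
    }

    public static void main(String[] args) {
        CopyingHolder copying = new CopyingHolder();
        copying.add("execution-1");
        List<String> snapshot = copying.getItems();
        copying.add("execution-2");
        System.out.println(snapshot.size()); // 1: later additions do not affect the copy

        ConcurrentHolder concurrent = new ConcurrentHolder();
        concurrent.add("execution-1");
        for (String item : concurrent.getItems()) {
            concurrent.add(item + "-retry"); // safe: no ConcurrentModificationException
        }
        System.out.println(concurrent.getItems().size()); // 2
    }
}
```

On the "wasteful" point: the copying option pays on every read, while `CopyOnWriteArrayList` pays on every write, so the better fit depends on whether the list is read or modified more often.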

...


Stuart Douglas

4:34 p.m.

...


Cheng Fang

6:46 p.m.

JBeret currently only supports in-VM multi-threaded job executions. We are aware of this requirement, and it is planned for future iterations, at least after JBeret is well integrated into WildFly. I'm glad you brought up that point.

Cheng

On 7/24/13 5:34 PM, Stuart Douglas wrote:

...

Something else I thought I should ask: has any thought been given to how this would work in a clustered environment? I would assume that most customers that would want this would also want some form of HA for the jobs; if a single node goes down you would not want all your batch jobs to grind to a halt.

Stuart

Bruno Georges

Thursday, 25 July

5:56 a.m.

Good point. I recently met a large customer who not only needs to run in a clustered environment but also needs to run a large number of tasks. They are currently using Spring Batch and would prefer to go with what JBoss can ship and support. They like the idea of using standards rather than depending on non-standard implementations or libraries. Sent from my iPhone On 25 Jul, 2013, at 5:35, Stuart Douglas <stuart.w.douglas(a)gmail.com> wrote:

...

Something else I thought I should ask: has any thought been given to how this would work in a clustered environment? I would assume that most customers who want this would also want some form of HA for the jobs; if a single node goes down, you would not want all your batch jobs to grind to a halt.
Stuart
On Wed, Jul 24, 2013 at 11:25 PM, Stuart Douglas <stuart.w.douglas(a)gmail.com> wrote:
> On Wed, Jul 24, 2013 at 8:14 PM, Cheng Fang <cfang(a)redhat.com> wrote:
>> On 7/24/13 12:52 PM, Stuart Douglas wrote:
>>> I have also been looking at this today, and there are quite a few things in the code base that worry me about its quality.
>>>
>>> 1) JdbcRepository seems to save jobs to the database but never seems to actually load them again or remove them? [1]
>> JdbcRepository is still a work in progress and is not used yet.
>>
>>> 2) Some things seem to be implemented in a very inefficient manner, using lists when a map or a set would be more appropriate. For example, in AbstractJobRepository all jobs are stored in a list, and as a result every operation on the repo is O(n) in the number of jobs. A map would be a far more suitable data structure here. This will be a real problem if a customer ever tries to scale to even a moderately sized number of jobs and job instances.
>> See my reply to the previous message. Initially I did implement it as a map, but I didn't like duplicating the id as the key, so I changed it to a list. I don't expect the number of jobs to be that large, or access to jobs to be a hot spot. But I'm open to switching it, since the feedback so far has favored a mapping lookup.
>>
>>> 3) Thread safety
>>> Almost all objects in the code base are mutable (i.e. no use of final), and with a few exceptions most of the code is not synchronized. From what I can see not much thought has been given to thread safety, and looking through the code I think there are quite a few places where there is the potential for threading issues, e.g. in [2], where a list that is being modified concurrently is returned to the caller. The caller cannot safely use the list, as it may be modified by another thread while it is being iterated. There are other places where I think there is the potential for races; however, I don't know the code well enough to be sure.
>> If most of the code were synchronized, I would also be worried ;-). But I agree, thread safety is an area we need to look at more closely as we integrate into WildFly. In [2], what's your recommendation? To always return a new list to the caller? That seems a bit wasteful.
>
> You must either return a new list or use a concurrent list as the backing data structure.
>
> Stuart
>
>>> 4) It looks like it has been designed as a standalone project to be embedded into a deployment, and no thought has been given to how to actually integrate it into WildFly. I know David already mentioned the statics issue, but this is a big problem. For example, only one jberet.properties will be loaded, so if two applications have different properties files then one will leak into the other app, depending on the current TCCL when the BatchConfig class is first accessed.
>> jberet.properties is only for the standalone distro. For running in WildFly, all configuration will be included in the subsystem configuration.
>>
>> Appreciate all the feedback!
>>
>> Cheng
>>
>>> Stuart
>>>
>>> [1] https://github.com/jberet/jsr352/blob/master/jberet-core/src/main/java/or...
>>> [2] https://github.com/jberet/jsr352/blob/master/jberet-core/src/main/java/or...
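The two options Stuart names for [2] (return a new list, or back the repository with a concurrent list) can be sketched as follows. This is a minimal illustration, not JBeret code: `ExecutionListSketch` and its method names are hypothetical, and execution ids are reduced to plain strings.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical repository fragment showing both approaches side by side.
final class ExecutionListSketch {
    // Option 1: a plain list guarded by a lock; callers receive a copy.
    private final List<String> guarded = new ArrayList<>();

    // Option 2: a concurrent backing list whose iterators work on a stable snapshot.
    private final List<String> concurrent = new CopyOnWriteArrayList<>();

    void add(String executionId) {
        synchronized (guarded) {
            guarded.add(executionId);
        }
        concurrent.add(executionId);
    }

    // Returns an independent, read-only copy; later adds do not affect it.
    List<String> snapshot() {
        synchronized (guarded) {
            return Collections.unmodifiableList(new ArrayList<>(guarded));
        }
    }

    // Returns a read-only view; iteration never throws
    // ConcurrentModificationException because each iterator sees a snapshot.
    List<String> liveView() {
        return Collections.unmodifiableList(concurrent);
    }
}
```

The copy costs O(n) per call, which is the waste Cheng mentions; `CopyOnWriteArrayList` moves that cost to writes instead, so it tends to suit lists that are read far more often than they are modified.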

Cheng Fang

Wednesday, 24 July

12:05 p.m.

David, Thanks for sharing your comments and observations. More inline... On 7/24/13 12:09 PM, David M. Lloyd wrote:

...

Batch RI (http://java.net/projects/jbatch from IBM) was created solely for the purpose of a reference implementation, and is a subset of IBM's batch offering. The RI code base is refreshed periodically by IBM contributors and it doesn't seem to be open to community contribution. I haven't done a deep technical comparison between the two yet, but I guess there are areas where one is better than the other and vice versa. Looking a bit longer term, batch has been an area Java EE and JBoss haven't paid much attention to, and I believe it is an area that can offer future growth potential. Having our own impl would give us more flexibility when it comes to integration with the rest of the stack, design choices, and community building. I'm also adding Kev and Pete for their perspectives.

...

#2) Why does JBeret duplicate facilities already present in the WildFly code base and deployer chain - e.g. annotation indexing, reflection indexing, thread management, parsing facilities, etc.?

The batch spec requires an impl to be runnable in either a Java EE or a Java SE environment. So inevitably certain services have to reside in JBeret itself to satisfy the SE runtime. Since we started the impl as a standalone first, there may be certain aspects that do not fit nicely in WildFly. The plan is to better align with the appserver by leveraging existing services when running inside WildFly, for example by using the concurrency utils in EE.

...

#3) Specific to algorithmic complexity - it appears that jobs are keyed by ID, yet accessed using a sequential search [1] - this does not scale well to large numbers of jobs. Is there no better approach?

The expectation is that there is a large amount of data, but the number of jobs is not that large. Say we run a reporting job every day; it is still one single job with many JobInstances and JobExecutions. So I think the sequential access is acceptable. I guess another reason I didn't want to maintain a mapping is that I really don't want to duplicate the job id as the key.

...

#4) JAXB seems to be being used to parse XML, which is a departure from all of our other services which expect parsing to be done during deployment processing in a more efficient manner. Is there any better way we can integrate this, preferably not using JAXB?

It works well so far in the standalone distro, but I'm open to an alternative mechanism in either standalone or EE.

...

#5) There are a number of resources present that seem inappropriate for the production JAR [2] [3]. Is this intentional?

These are a work in progress. The SQL files are for implementing a JDBC job repository. Why are they inappropriate?

...

#6) This code base makes extensive use of static state, including static fields that seem not to be adequately protected for thread-safety, and at least one static thread pool [4]. This needs to be fixed, as these kinds of things make embedding difficult or impossible.

In an EE environment, the thread pool will switch to the managed service provided by WildFly, preferably the new concurrency utils. Can you list other places you've noticed that make bad use of static state? Appreciate your feedback. Cheng

...

[1] https://github.com/jberet/jsr352/blob/master/jberet-core/src/main/java/or... [2] https://github.com/jberet/jsr352/tree/master/jberet-core/src/main/resourc... [3] https://github.com/jberet/jsr352/blob/master/jberet-core/src/main/resourc... [4] https://github.com/jberet/jsr352/blob/master/jberet-core/src/main/java/or...

Pete Muir

12:16 p.m.

On 24 Jul 2013, at 18:05, Cheng Fang <cfang(a)redhat.com> wrote:

...


IIRC Jason G was keen that we build a batch impl, rather than reuse the RI. I can't remember his reasoning.

Jason Greene

12:37 p.m.

On Jul 24, 2013, at 12:16 PM, Pete Muir <pmuir(a)redhat.com> wrote:

...

> IIRC Jason G was keen that we build a batch impl, rather than reuse the RI. I can't remember his reasoning.

It wasn't me, but I do remember someone advocating this approach. My general opinion on this is that we have to weigh the pros/cons in each scenario. There are a number of cases where I advocate our own implementation, usually because of licensing or dependencies, or because we think we can have a competitive advantage that users will see. -- Jason T. Greene WildFly Lead / JBoss EAP Platform Architect JBoss, a division of Red Hat

Andrig Miller

12:42 p.m.

----- Original Message -----

...

From: "Cheng Fang" <cfang(a)redhat.com> To: wildfly-dev(a)lists.jboss.org, "Kevin Conner" <kconner(a)redhat.com>, "Pete Muir" <pmuir(a)redhat.com> Sent: Wednesday, July 24, 2013 11:05:36 AM Subject: Re: [wildfly-dev] WFLY-508, JBeret initial review and integration issues
On 7/24/13 12:09 PM, David M. Lloyd wrote:
> #3) Specific to algorithmic complexity - it appears that jobs are keyed by ID, yet accessed using a sequential search [1] - this does not scale well to large numbers of jobs. Is there no better approach?
The expectation is that there is a large amount of data, but the number of jobs is not that large. Say we run a reporting job every day; it is still one single job with many JobInstances and JobExecutions. So I think the sequential access is acceptable. I guess another reason I didn't want to maintain a mapping is that I really don't want to duplicate the job id as the key.

I think the assumption that there will be a small number of jobs with a large amount of data is a bad one. I did large-scale batch applications when I worked at Sprint, and the number of jobs was very large indeed. Many thousands, in fact. If Java EE batch is never going to be anything more than a toy, then this is fine, but if customers try to adopt this under the current design, I think they will quickly give up on it. Andy
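To make the scaling concern concrete, here is a minimal sketch contrasting the sequential search discussed in #3 with a keyed lookup. `JobDef` and `JobLookupSketch` are hypothetical stand-ins, not JBeret classes, and the job count is an arbitrary illustration of "many thousands".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a job definition.
final class JobDef {
    private final String id;
    JobDef(String id) { this.id = id; }
    String getId() { return id; }
}

final class JobLookupSketch {
    // List-backed store: each lookup walks the list, O(n) in the number of jobs.
    static JobDef findSequential(List<JobDef> jobs, String id) {
        for (JobDef job : jobs) {
            if (job.getId().equals(id)) {
                return job;
            }
        }
        return null;
    }

    // Map-backed store: each lookup is a hash probe, O(1) on average.
    static JobDef findKeyed(Map<String, JobDef> jobs, String id) {
        return jobs.get(id);
    }

    public static void main(String[] args) {
        List<JobDef> list = new ArrayList<>();
        Map<String, JobDef> map = new ConcurrentHashMap<>();
        for (int i = 0; i < 10_000; i++) {
            JobDef job = new JobDef("job-" + i);
            list.add(job);
            map.put(job.getId(), job);
        }
        // Both strategies return the same object; only the cost differs.
        System.out.println(findSequential(list, "job-9999") == findKeyed(map, "job-9999"));
    }
}
```

With a list, the worst-case lookup cost grows linearly with the number of jobs; with the map it stays roughly constant, and `ConcurrentHashMap` also allows concurrent access without external locking.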

...


Jason Greene

12:46 p.m.

On Jul 24, 2013, at 12:05 PM, Cheng Fang <cfang(a)redhat.com> wrote:

...

There are essentially 3 issues with JAXB:
1) It hurts either deployment time or startup time
2) It is very poor at handling versioned descriptors
3) It is not effective at reporting the location of syntax errors
We prefer StAX or SAX based parsers. They are harder to write initially, as they require lots of boilerplate. However they don't have the above problems, and maintaining them is not very difficult.
-- Jason T. Greene WildFly Lead / JBoss EAP Platform Architect JBoss, a division of Red Hat
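As an illustration of the StAX style Jason describes, the sketch below walks a simplified job document with the JDK's `javax.xml.stream` API and attaches the parser's location to a validation error. The element and attribute names (`step`, `id`) are simplified assumptions rather than the full JSR-352 job XML schema, and `JobXmlStaxSketch` is a hypothetical class.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class JobXmlStaxSketch {
    // Collects the id attribute of every <step> element, in document order.
    static List<String> parseStepIds(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        List<String> stepIds = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "step".equals(reader.getLocalName())) {
                    String id = reader.getAttributeValue(null, "id");
                    if (id == null) {
                        // The Location carries line and column for the error report.
                        throw new XMLStreamException(
                                "<step> requires an id attribute", reader.getLocation());
                    }
                    stepIds.add(id);
                }
            }
        } finally {
            reader.close();
        }
        return stepIds;
    }
}
```

The hand-written loop is the boilerplate cost mentioned above; in exchange, every error can be tied to a position in the descriptor, and a versioned descriptor could be handled by branching on the root element's namespace before dispatching.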

Kevin Conner

12:49 p.m.

On 24/07/2013 10:05, Cheng Fang wrote:

...

Batch RI (http://java.net/projects/jbatch from IBM) was created solely for the purpose of a reference implementation, and is a subset of IBM's batch offering.

The Batch RI did not exist when we started the jBeret project. The batch RI has always been developed internally within IBM with code drops as and when they thought was appropriate (not necessarily when promised). IBM have no obligation to support its development outside of what is needed to satisfy the JSR requirements and it has not been run as a pure open source project. Kev -- JBoss by Red Hat

David M. Lloyd

1:16 p.m.

On 07/24/2013 12:49 PM, Kevin Conner wrote:

...


That is good information, but the question is mainly around choices we make today as a result of this review. If there are problems with the proposed implementation, the choice is to either fix it or use another, and the criteria which feed into this choice are limited specifically to implementation quality (i.e. technical details) and supportability, which derives directly from implementation quality (by us, not relating to any other parties involved; e.g. we don't really care who "supports", say, commons-beanutils in practice because we "support" it in WildFly). -- - DML

Jason Greene

1:38 p.m.

On Jul 24, 2013, at 1:16 PM, "David M. Lloyd" <david.lloyd(a)redhat.com> wrote:

...


Looking at the RI, I see a number of problems there as well. -- Jason T. Greene WildFly Lead / JBoss EAP Platform Architect JBoss, a division of Red Hat

Jason Greene

12:52 p.m.

On Jul 24, 2013, at 12:05 PM, Cheng Fang <cfang(a)redhat.com> wrote:

...

> #2) Why does JBeret duplicate facilities already present in the WildFly code base and deployer chain - e.g. annotation indexing, reflection indexing, thread management, parsing facilities, etc.?
The batch spec requires an impl to be runnable in either a Java EE or a Java SE environment. So inevitably certain services have to reside in JBeret itself to satisfy the SE runtime. Since we started the impl as a standalone first, there may be certain aspects that do not fit nicely in WildFly. The plan is to better align with the appserver by leveraging existing services when running inside WildFly, for example by using the concurrency utils in EE.

Yeah it would be nice if we could abstract that bit. -- Jason T. Greene WildFly Lead / JBoss EAP Platform Architect JBoss, a division of Red Hat

Stuart Douglas

4:22 p.m.

On Wed, Jul 24, 2013 at 7:05 PM, Cheng Fang <cfang(a)redhat.com> wrote:

...

On 7/24/13 12:09 PM, David M. Lloyd wrote:
> #2) Why does JBeret duplicate facilities already present in the WildFly code base and deployer chain - e.g. annotation indexing, reflection indexing, thread management, parsing facilities, etc.?
The batch spec requires an impl to be runnable in either a Java EE or a Java SE environment. So inevitably certain services have to reside in JBeret itself to satisfy the SE runtime. Since we started the impl as a standalone first, there may be certain aspects that do not fit nicely in WildFly. The plan is to better align with the appserver by leveraging existing services when running inside WildFly, for example by using the concurrency utils in EE.

Where does the spec say this? From a WildFly point of view we should only need the Java EE implementation; it is only if you want to promote JBeret as a standalone JSR-352 implementation that this will be an issue. Either way, in order to make this work properly with WildFly it needs some kind of bootstrap SPI. For the Java SE impl, just provide another jar that implements the SPI but handles scanning, parsing, etc. in a standalone manner. A really good example of this is Weld, which provides an SPI that Weld-SE implements for Java SE support. If you design this SPI correctly you should no longer need 1 Maven artifact per test; it should be possible to bootstrap the JBeret implementation with different data each time, run the test, and then shut it down.
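A rough sketch of what such a bootstrap SPI might look like follows. Every name here (`BatchEnvironmentSketch`, `StandaloneEnvironmentSketch`, `BatchCoreSketch`) is hypothetical and chosen for illustration; nothing in the thread specifies the actual interface. The idea is only that the core receives its threads and parsed job definitions from the host instead of discovering them itself.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical bootstrap SPI: everything the batch core needs from its host.
interface BatchEnvironmentSketch {
    ExecutorService executorService();   // thread management supplied by the host
    Map<String, String> jobXmlByName();  // job definitions already located by the host
}

// Hypothetical Java SE implementation that owns its own thread pool.
final class StandaloneEnvironmentSketch implements BatchEnvironmentSketch {
    private final ExecutorService executor = Executors.newCachedThreadPool();
    private final Map<String, String> jobs;

    StandaloneEnvironmentSketch(Map<String, String> jobs) {
        this.jobs = Collections.unmodifiableMap(new LinkedHashMap<>(jobs));
    }

    @Override public ExecutorService executorService() { return executor; }
    @Override public Map<String, String> jobXmlByName() { return jobs; }
}

// The core keeps no static state, so it can be started and stopped per bootstrap.
final class BatchCoreSketch {
    private final BatchEnvironmentSketch environment;

    BatchCoreSketch(BatchEnvironmentSketch environment) {
        this.environment = environment;
    }

    boolean hasJob(String jobName) {
        return environment.jobXmlByName().containsKey(jobName);
    }

    void stop() {
        environment.executorService().shutdown();
    }
}
```

Under this shape, an application server integration would supply a different `BatchEnvironmentSketch` backed by its deployment processors and managed executors, and a test could construct a fresh core with different data each time, run, and call `stop()`.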

...


I'm not sure what you mean by "I really don't want to duplicate the job id as the key".

...

> #6) This code base makes extensive use of static state, including static fields that seem not to be adequately protected for thread-safety, and at least one static thread pool [4]. This needs to be fixed, as these kinds of things make embedding difficult or impossible.
In an EE environment, the thread pool will switch to the managed service provided by WildFly, preferably the new concurrency utils. Can you list other places you've noticed that make bad use of static state?

org.jberet.repository.InMemoryRepository.Holder#instance looks like another problematic one, as it means that there is only ever one in-memory repository, so jobs will be shared across all deployments. Also org.jberet.util.BatchUtil#executorService, which does not look like it is used. Stuart
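The concern about a single static holder can be illustrated with a sketch in which repositories are instance-scoped and keyed by deployment. The classes below are hypothetical, not JBeret's; whether the job repository should in fact be global is a separate design question, and elsewhere in this thread Cheng states that it is intended to be.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory repository with no static state.
final class InMemoryRepoSketch {
    private final Map<String, String> jobXmlById = new ConcurrentHashMap<>();

    void addJob(String id, String xml) { jobXmlById.put(id, xml); }
    boolean hasJob(String id) { return jobXmlById.containsKey(id); }
}

// Hypothetical owner, e.g. a subsystem service, holding one repository per deployment.
final class DeploymentReposSketch {
    private final Map<String, InMemoryRepoSketch> byDeployment = new ConcurrentHashMap<>();

    // Creates the repository on first use; later calls return the same instance.
    InMemoryRepoSketch forDeployment(String deploymentName) {
        return byDeployment.computeIfAbsent(deploymentName, name -> new InMemoryRepoSketch());
    }

    // Dropping the entry releases that deployment's jobs without touching others.
    void undeploy(String deploymentName) {
        byDeployment.remove(deploymentName);
    }
}
```

Because nothing is static, two embedded runtimes in the same JVM would not see each other's jobs, which is the property a static `Holder#instance` cannot provide.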

...


Cheng Fang

7:06 p.m.

On 7/24/13 5:22 PM, Stuart Douglas wrote:

...

> Where does the spec say this? From a WildFly point of view we should only need the Java EE implementation; it is only if you want to promote JBeret as a standalone JSR-352 implementation that this will be an issue. Either way, in order to make this work properly with WildFly it needs some kind of bootstrap SPI. For the Java SE impl, just provide another jar that implements the SPI but handles scanning, parsing, etc. in a standalone manner. A really good example of this is Weld, which provides an SPI that Weld-SE implements for Java SE support. If you design this SPI correctly you should no longer need 1 Maven artifact per test; it should be possible to bootstrap the JBeret implementation with different data each time, run the test, and then shut it down.

Agreed. Some SPI is needed to abstract out the difference between SE and EE. We do have tests that contain multiple jobs, each of which can be started individually. The batch spec defines certain batch config files scoped to the whole app or deployment, so for those tests we organize them into separate test projects. It's mainly a matter of test organization, not related to the implementation.

...

The map key is already contained in the associated map value, IOW, using a field of the value as map key.

...

BatchUtil.executorService is a leftover after moving concurrency-related code to its own class. Yes, I will clean it up.

The batch job repository is supposed to be global, accessible from all deployments. In a production environment, a database-backed job repository is typically used, especially considering clustered deployment.

Thanks,
Cheng
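
As an illustration of the alternative to a static thread pool raised in point #6, the sketch below passes the executor in as instance state so that the embedder owns its lifecycle. JobRunner is a hypothetical class invented for this example and does not correspond to any JBeret type.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

// Hypothetical runner: no static fields, so several instances can coexist
// in one JVM, each using whatever executor its host supplies.
final class JobRunner {
    private final ExecutorService executor; // owned and shut down by the caller

    JobRunner(ExecutorService executor) {
        if (executor == null) {
            throw new IllegalArgumentException("executor must not be null");
        }
        this.executor = executor;
    }

    // Submits work to the injected executor and returns a handle to the result.
    <T> Future<T> start(Callable<T> work) {
        return executor.submit(work);
    }
}
```

In an application server the host would hand in a managed executor; a standalone run could create one with java.util.concurrent.Executors and shut it down on exit.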

...

Stuart

Appreciate your feedback.

Cheng

> [1] https://github.com/jberet/jsr352/blob/master/jberet-core/src/main/java/or...
> [2] https://github.com/jberet/jsr352/tree/master/jberet-core/src/main/resourc...
> [3] https://github.com/jberet/jsr352/blob/master/jberet-core/src/main/resourc...
> [4] https://github.com/jberet/jsr352/blob/master/jberet-core/src/main/java/or...

Stuart Douglas

Thursday, 25 July Thu, 25 Jul

9:02 a.m.

...

> > #3) Specific to algorithmic complexity - it appears that jobs are
> > keyed by ID, yet accessed using a sequential search [1] - this does
> > not scale well to large numbers of jobs. Is there no better approach?
>
> The expectation is that there is a large amount of data, but the number
> of jobs is not that large. Say we run a reporting job every day, it is
> still one single job with many JobInstance and JobExecution. So I think
> the sequential access is acceptable. I guess another reason I didn't
> want to maintain a mapping is I really don't want to duplicate the job
> id as the key.

I'm not sure what you mean by "I really don't want to duplicate the job id as the key".

The map key is already contained in the associated map value, IOW, using a field of the value as map key.

I'm still not sure what you mean. Putting an item into a map does not make a copy of the key. Stuart
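
The point about keys can be shown with a small sketch. JobDef and JobIndex are hypothetical classes made up for this example, not JBeret types. The map stores a reference to the same String object held in the value's id field, so indexing by id does not duplicate the id, and lookup no longer needs a sequential search.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a job definition.
final class JobDef {
    private final String id;

    JobDef(String id) {
        this.id = id;
    }

    String getId() {
        return id;
    }
}

// Hypothetical index of jobs keyed by the id field of each value.
final class JobIndex {
    private final Map<String, JobDef> byId = new ConcurrentHashMap<>();

    void add(JobDef job) {
        // put() stores the reference returned by getId(); no copy is made.
        byId.put(job.getId(), job);
    }

    // Hash lookup instead of scanning every job.
    JobDef find(String id) {
        return byId.get(id);
    }

    // True if the stored key is the very same object as the value's id field.
    boolean keySharesReference(String id) {
        for (Map.Entry<String, JobDef> e : byId.entrySet()) {
            if (e.getKey().equals(id)) {
                return e.getKey() == e.getValue().getId();
            }
        }
        return false;
    }
}
```

The only extra memory is the map's own entry bookkeeping, not a second copy of each id.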

participants (9)

Andrig Miller
Bruno Georges
Cheng Fang
David M. Lloyd
Jaikiran Pai
Jason Greene
Kevin Conner
Pete Muir
Stuart Douglas