Something else I thought I should ask, has any thought been given to how this would work in a clustered environment? 

I would assume that most customers that would want this would also want some form of HA for the jobs, if a single node goes down you would not want all you batch jobs to grind to a halt. 

Stuart


On Wed, Jul 24, 2013 at 11:25 PM, Stuart Douglas <stuart.w.douglas@gmail.com> wrote:



On Wed, Jul 24, 2013 at 8:14 PM, Cheng Fang <cfang@redhat.com> wrote:

On 7/24/13 12:52 PM, Stuart Douglas wrote:
I have also been looking at this today, and there are quite a few things in the code base that worry me about its quality.

1) JdbcRepository seems to save jobs to the database but never seems to actually load them again or remove them? [1]
JdbcRepository is still work in progress, and is currently not used yet.


2) Some things seem to be implemented in a very inefficient manner, using lists when a map or a set would be more appropriate. For example in AbstractJobRepository all jobs are stored in a list, and as a result every operation on the repo is O(n) on the number of jobs. A map would be a far more suitable data structure here. This will be a real problem is a customer is ever trying to scale to even a moderately sized number of jobs and job instances.
See my reply to previous message.  Initially I did implement it as a map, but didn't like duplicating id as the key so changed it to list.  I don't expect the number of jobs to be that large, or access to jobs to be a hot spot.  But I'm open to switch it since the feedback so far has favored a mapping lookup.


3) Thread safety
Almost all objects in the code base are mutable (i.e. no use of final), and with a few exceptions most of the code is not synchronized. From what I can see not much thought has been given to thread safety, and looking through the code I think there are quite a few places where there are the potential to have threading issues. e.g. In [2], where a list that is being modified concurrently is returned to the caller. The caller cannot safely use the list, as it may be modified by another thread as it is being iterated. There are other places where I think there are the potential for races, however I don't know the code well enough to be sure.
If most of the code is synchronized, I would also be worried ;-)     .  But I agreeed, thread safety is the area we need to look more closely as we integrate to WildFly.  In {2], what's your recommendation? to always return a new list the the caller, which seems a bit wasteful.


You must either return a new list or use a concurrent list as the backing data structure.

Stuart
 


4) It looks like it has been designed as a standalone project to be embedded into a deployment, and no thought has been given to how to actually integrate it into Wildfly. I know David already mentioned the statics issue, but this is a big problem. e.g. only one jberet.properties will be loaded, so if two applications have different properties files then one will leak into the other app, depending on the current TCCL when the BatchConfig class is first accessed.
jberet.properties is only for standalone distro.  For running in WildFly, all configuration will be included in subsystem configuration.

Appreciate all the feedback!

Cheng



On Wed, Jul 24, 2013 at 6:09 PM, David M. Lloyd <david.lloyd@redhat.com> wrote:
On initial review of JBeret we have noticed a number of issues that need
to be addressed.  The culmination amounts to a series of questions and
observations here:

#1) Why did we not choose to just use the RI?  In other words, what
benefit do we get from JBeret that is not also in the RI?  In other,
other words, why should we *use* this code instead of the RI at this
point in time?

#2) Why does JBeret duplicate facilities already present in the WildFly
code base and deployer chain - e.g. annotation indexing, reflection
indexing, thread management, parsing facilities, etc.?

#3) Specific to algorithmic complexity - it appears that jobs are keyed
by ID, yet accessed using a sequential search [1] - this does not scale
well to large numbers of jobs.  Is there no better approach?

#4) JAXB seems to be being used to parse XML, which is a departure from
all of our other services which expect parsing to be done during
deployment processing in a more efficient manner.  Is there any better
way we can integrate this, preferably not using JAXB?

#5) There are a number of resources present that seem inappropriate for
the production JAR [2] [3].  Is this intentional?

#6) This code base makes extensive use of static state, including static
fields that seem not to be adequately protected for thread-safety, and
at least one static thread pool [4].  This needs to be fixed, as these
kinds of things make embedding difficult or impossible.

[1]
https://github.com/jberet/jsr352/blob/master/jberet-core/src/main/java/org/jberet/repository/AbstractRepository.java#L50
[2]
https://github.com/jberet/jsr352/tree/master/jberet-core/src/main/resources/sql
[3]
https://github.com/jberet/jsr352/blob/master/jberet-core/src/main/resources/jobXML.xjb
[4]
https://github.com/jberet/jsr352/blob/master/jberet-core/src/main/java/org/jberet/util/ConcurrencyService.java#L22

--
- DML
_______________________________________________
wildfly-dev mailing list
wildfly-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/wildfly-dev



_______________________________________________
wildfly-dev mailing list
wildfly-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/wildfly-dev


_______________________________________________
wildfly-dev mailing list
wildfly-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/wildfly-dev