David,
Thanks for sharing your comments and observations. More inline...
On 7/24/13 12:09 PM, David M. Lloyd wrote:
On initial review of JBeret we have noticed a number of issues that
need to be addressed. The culmination amounts to a series of
questions and observations here:
#1) Why did we not choose to just use the RI? In other words, what
benefit do we get from JBeret that is not also in the RI? In other,
other words, why should we *use* this code instead of the RI at this
point in time?
The Batch RI (http://java.net/projects/jbatch, from IBM) was created
solely to serve as a reference implementation, and is a subset of IBM's
batch offering. The RI code base is refreshed periodically by IBM
contributors, and it doesn't seem to be open to community contribution. I
haven't done a deep technical comparison between the two yet, but I expect
there are areas where one is better than the other, and vice versa.
Looking a bit longer term, batch is an area that Java EE and JBoss
haven't paid much attention to, and I believe it offers future growth
potential. Having our own impl would give us more flexibility in
integration with the rest of the stack, design choices, and community
building. I'm also adding Kev and Pete for their perspectives.
#2) Why does JBeret duplicate facilities already present in the
WildFly code base and deployer chain - e.g. annotation indexing,
reflection indexing, thread management, parsing facilities, etc.?
The batch spec requires an impl to run in either a Java EE or a Java SE
environment, so inevitably certain services have to reside in JBeret
itself to satisfy the SE runtime. Since we started the impl as a
standalone project, some aspects may not fit nicely in WildFly. We plan
to align better with the appserver by leveraging existing services when
running inside WildFly, for example the EE concurrency utilities.
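Roughly what I have in mind for the thread pool, as a minimal sketch:
look up the container's default managed executor
(java:comp/DefaultManagedExecutorService, the default JNDI name defined
in Java EE 7) and fall back to a plain pool in SE. ExecutorSelector and
Selected are hypothetical names for illustration, not existing JBeret
classes.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import javax.naming.InitialContext;
import javax.naming.NamingException;

/** Hypothetical helper: prefer the container's managed executor, else fall back to an SE pool. */
final class ExecutorSelector {

    // Default ManagedExecutorService JNDI name defined by Java EE 7.
    static final String DEFAULT_MES = "java:comp/DefaultManagedExecutorService";

    /** The chosen executor plus whether the container owns its lifecycle. */
    static final class Selected {
        final ExecutorService executor;
        final boolean managed; // true: the container owns it, never call shutdown()

        Selected(ExecutorService executor, boolean managed) {
            this.executor = executor;
            this.managed = managed;
        }
    }

    private ExecutorSelector() {}

    static Selected select(int fallbackThreads) {
        try {
            InitialContext ctx = new InitialContext();
            try {
                Object found = ctx.lookup(DEFAULT_MES);
                // ManagedExecutorService extends ExecutorService, so no EE API is needed to use it.
                if (found instanceof ExecutorService) {
                    return new Selected((ExecutorService) found, true);
                }
            } finally {
                ctx.close();
            }
        } catch (NamingException e) {
            // No JNDI provider or no such binding: most likely plain Java SE.
        }
        // SE fallback: the caller owns this pool and must shut it down.
        return new Selected(Executors.newFixedThreadPool(fallbackThreads), false);
    }
}
```

Inside WildFly the same call would hand back the container-managed pool;
the managed flag just tells the caller whose job it is to shut it down.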
#3) Specific to algorithmic complexity - it appears that jobs are
keyed by ID, yet accessed using a sequential search [1] - this does
not scale well to large numbers of jobs. Is there no better approach?
The expectation is that there is a large amount of data, but the number
of jobs is not that large. Say we run a reporting job every day: it is
still one single job with many JobInstances and JobExecutions. So I
think sequential access is acceptable. Another reason I didn't want to
maintain a mapping is that I don't want to duplicate the job id as the
key.
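For comparison, a minimal sketch of the two lookups side by side,
assuming a hypothetical JobDef class with a String id (not the actual
JBeret types). Note that the map key is the same String instance as
JobDef.id, so the "duplication" amounts to one extra reference per job.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a job definition; only the id matters here. */
final class JobDef {
    final String id;

    JobDef(String id) {
        this.id = id;
    }
}

/** Hypothetical registry contrasting a sequential scan with an id index. */
final class JobRegistry {
    private final List<JobDef> jobs = new ArrayList<>();
    // Optional index keyed by the job's own id reference (no string copy).
    private final Map<String, JobDef> byId = new HashMap<>();

    void add(JobDef job) {
        // Assumes job ids are unique; reject duplicates so both lookups agree.
        if (byId.putIfAbsent(job.id, job) != null) {
            throw new IllegalArgumentException("duplicate job id: " + job.id);
        }
        jobs.add(job);
    }

    /** Current approach: O(n) scan, fine while the number of jobs stays small. */
    JobDef findByScan(String id) {
        for (JobDef job : jobs) {
            if (job.id.equals(id)) {
                return job;
            }
        }
        return null;
    }

    /** Indexed approach: expected O(1), at the cost of keeping the map in sync. */
    JobDef findByIndex(String id) {
        return byId.get(id);
    }
}
```

If the number of job definitions ever does grow large, switching from
the scan to the index is a local change behind the same lookup method.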
#4) JAXB seems to be being used to parse XML, which is a departure
from all of our other services which expect parsing to be done during
deployment processing in a more efficient manner. Is there any better
way we can integrate this, preferably not using JAXB?
It has worked well so far in the standalone distro, but I'm open to
alternative mechanisms in either standalone or EE.
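Since WildFly's own descriptor parsers are StAX-based, one alternative to
JAXB would be a plain XMLStreamReader pass over the job XML. A minimal
sketch, assuming we only care about the job id and step ids; a real JSL
parser would also have to handle flows, splits, decisions, listeners,
properties, etc. JobXmlSketch is a hypothetical name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical minimal StAX reader for a job XML: collects the job id and step ids only. */
final class JobXmlSketch {
    final String jobId;
    final List<String> stepIds;

    private JobXmlSketch(String jobId, List<String> stepIds) {
        this.jobId = jobId;
        this.stepIds = stepIds;
    }

    static JobXmlSketch parse(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Job XML should not need DTDs; disabling them also guards against XXE.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        String jobId = null;
        List<String> steps = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    // Compare local names so the javaee namespace does not matter here.
                    String name = reader.getLocalName();
                    if ("job".equals(name)) {
                        jobId = reader.getAttributeValue(null, "id");
                    } else if ("step".equals(name)) {
                        steps.add(reader.getAttributeValue(null, "id"));
                    }
                }
            }
        } finally {
            reader.close();
        }
        return new JobXmlSketch(jobId, steps);
    }
}
```

The same loop structure could run inside a deployment processor in
WildFly and from a simple loader in SE, without a JAXB context.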
#5) There are a number of resources present that seem inappropriate
for the production JAR [2] [3]. Is this intentional?
These are a work in progress. The SQL files are for implementing a JDBC
job repository. Why are they inappropriate?
#6) This code base makes extensive use of static state, including
static fields that seem not to be adequately protected for
thread-safety, and at least one static thread pool [4]. This needs to
be fixed, as these kinds of things make embedding difficult or
impossible.
In an EE environment, the thread pool will switch to the managed service
provided by WildFly, preferably the new EE concurrency utilities. Can you
list other places you've noticed that make bad use of static state?
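For the static pool itself, one possible direction, sketched with a
hypothetical JobRunner class (not JBeret's actual code): pass the
executor in through the constructor, so whoever embeds the runtime
decides which pool to use and when to shut it down, and two runtimes in
the same JVM no longer share hidden state.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;

/** Hypothetical job runner that receives its executor instead of holding a static pool. */
final class JobRunner {
    // Instead of: private static final ExecutorService POOL = Executors.newCachedThreadPool();
    private final ExecutorService executor;

    JobRunner(ExecutorService executor) {
        this.executor = executor;
    }

    /** Runs a placeholder job body asynchronously on the supplied executor. */
    CompletableFuture<String> start(String jobName) {
        return CompletableFuture.supplyAsync(
                () -> jobName + " ran on " + Thread.currentThread().getName(),
                executor);
    }
}
```

In SE the launcher would create and later shut down its own pool; in
WildFly the subsystem would pass in the managed executor instead.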
Appreciate your feedback.
Cheng