[JBoss JIRA] (ARTIF-683) Switch to RDBMS, Hibernate ORM, Hibernate Search, and Hibernate 2nd Level Cache as the persistence solution

Thursday, 7 May 2015

    [ https://issues.jboss.org/browse/ARTIF-683?page=com.atlassian.jira.plugin.... ]

Howard Pearlmutter commented on ARTIF-683:
------------------------------------------

Hey, this is JBoss land -- and the open standard for this kind of thing is JPA :) 

Use JPA, which wraps HibernateORM etc, and then any RDBMS can be plugged in behind that.

Then Hibernate OGM comes essentially for free. That lets you use the same JPA interface,
but plug NoSQL DBs in behind.

Then consider that the reason for Artificer using JCR ought to be less about hierarchical
structure, and more about the already-well-architected rich support for
content/documents/artifacts. Many potential Artificer users (such as myself) will want
that extra functionality (for example, I want my SRAMP content to cohabit with my
non-SRAMP content, and be able to write code to exploit colocation synergies, and to
exploit JCR/Modeshape functionality.)  Modeshape has very rich relational and network
capability (internally it has a generalized graph topology, not just hierarchical),
and is a much higher level developer interface onto both relational and filesystem
abstractions -- so your whole argument really only boils down to a matter of runtime
efficiencies (storage space & execution speed). 

I've worked with the JBoss ecosystem now for 15 years (since EJBoss in 1999). And
I've advised many CIOs, architects, and developers on questions like this. I strongly
suggest you see Artificer as a very high level functional application for business
organizations and technical organizations, rather than yet-another-low-level-engine (of
which there are hundreds and hundreds). While space and time efficiency are always
important, they don't make the top 10 list of what CIOs (should, and do) consider
critical -- such as reliability, manageability, scalability, security, longevity,
stability, developer ease, etc. So architect with the key longterm open standards;
don't strip down and try to hotrod.

Instead of stripdown hotrodding, go Architectural. Architecturally, for speed, you'll
want ISPN (which, BTW, has a fast API called Hotrod ;)). Hibernate caching is done with
ISPN. Modeshape sits on ISPN. ISPN can be backed with *anything* (JDBC/RDBMS, NoSQL,
JClouds/S3, etc) as long as a CacheStore is written.

Most importantly, stick with open standards, so you let your users (like me) make our own
best decision of what to put behind the standard interface.

So it boils down to JPA & JCR. 

The real interesting point here is the opportunity to combine the 2 in an intelligent way
to leverage the best of each.

Side-by-side is possible, but there are other more interesting approaches to consider.

JCR can back onto JPA, via ISPN CacheStore. (Tunnelling or bypassing then become options
for your code that needs to get directly at relational for efficiency reasons.)

JPA can back onto JCR --- here are a few to consider --

https://code.google.com/p/jpa4jcr

https://github.com/Kobee1203/jcrom

  --- and then you can have nice integrity between the 2 paradigms. (&/or tunnel, or
bypass, or best *tune* when you need performance)

IMHO, you ought to have a chat with Randall and figure out an optimal architecture.

FYI, I came to Artificer for the Errai, and stayed (so far) for the ModeShape ;)

...
 Switch to RDBMS, Hibernate ORM, Hibernate Search, and Hibernate 2nd
Level Cache as the persistence solution

-----------------------------------------------------------------------------------------------------------

                 Key: ARTIF-683
                 URL: https://issues.jboss.org/browse/ARTIF-683
             Project: Artificer
          Issue Type: Feature Request
            Reporter: Brett Meyer
            Assignee: Brett Meyer

 Artificer currently uses ModeShape + Infinispan + JDBC as its storage.  Back when
Artificer was a simple S-RAMP impl, JCR made a lot of sense.  The S-RAMP spec is
essentially a hierarchical artifact repo that maintains the node metadata and
relationships between them.  However, the "hierarchical" bit is overstated --
it's limited to a primary artifact and its derived artifact (ex -- primary: XSD,
derived: type declarations).  So, the hierarchy is at most 2 levels and could be
represented by a simple relationship or one-to-one foreign key.  The only time the
hierarchical structure is helpful is when we look up an artifact by its UUID (due to a
specific tree structure we use).  But otherwise, I think it's a bit of a misnomer.
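
 As a rough illustration of the "simple relationship or foreign key" idea above, here is a minimal plain-Java sketch, not Artificer code. The names ArtifactRecord, FlatArtifactStore, and primaryUuid are hypothetical, and it assumes a primary may have several derived artifacts, each holding a reference to its primary the way a foreign-key column would.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Hypothetical flat model: a derived artifact points at its primary by UUID,
// much like a foreign-key column in a relational table.
final class ArtifactRecord {
    final String uuid;
    final String name;
    final String primaryUuid; // null for a primary artifact

    ArtifactRecord(String uuid, String name, String primaryUuid) {
        this.uuid = uuid;
        this.name = name;
        this.primaryUuid = primaryUuid;
    }

    boolean isDerived() {
        return primaryUuid != null;
    }
}

// Hypothetical in-memory stand-in for a table keyed by UUID.
final class FlatArtifactStore {
    private final Map<String, ArtifactRecord> byUuid = new HashMap<>();

    ArtifactRecord addPrimary(String name) {
        ArtifactRecord r = new ArtifactRecord(UUID.randomUUID().toString(), name, null);
        byUuid.put(r.uuid, r);
        return r;
    }

    ArtifactRecord addDerived(String name, ArtifactRecord primary) {
        // Enforce the two-level limit: a derived artifact cannot be a parent.
        if (primary.isDerived()) {
            throw new IllegalArgumentException("hierarchy is limited to two levels");
        }
        if (!byUuid.containsKey(primary.uuid)) {
            throw new IllegalArgumentException("unknown primary artifact");
        }
        ArtifactRecord r = new ArtifactRecord(UUID.randomUUID().toString(), name, primary.uuid);
        byUuid.put(r.uuid, r);
        return r;
    }

    // UUID lookup is a plain key lookup; no tree walk is needed.
    ArtifactRecord findByUuid(String uuid) {
        return byUuid.get(uuid);
    }

    // Equivalent to "SELECT ... WHERE primary_uuid = ?" in a relational schema.
    List<ArtifactRecord> derivedOf(String primaryUuid) {
        List<ArtifactRecord> out = new ArrayList<>();
        for (ArtifactRecord r : byUuid.values()) {
            if (primaryUuid.equals(r.primaryUuid)) {
                out.add(r);
            }
        }
        return out;
    }
}
```

 In a real schema the map would be a table and derivedOf would be an indexed query; the sketch only shows that no deeper hierarchy is needed to express the primary/derived link.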
 We're now extending well beyond S-RAMP.  In addition to an artifact/metadata/info
repo, we're trying to position the project as a more general repo for multiple
projects and service information.  Most importantly, the relationship requirements will
expand the most.  As such, I'm thinking we'd be better served by alternatives.
 Note that this is essentially a read-intensive system.  Writes do of course occur, but
they're almost always *additions*.  Nodes are rarely updated once created.  Locking
and isolation should be used, but can be extremely optimistic.  Also note that most
artifacts have files with them.  That currently uses a local filesystem store through
ISPN, but could certainly be NAS.
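
 To make the "additions mostly, extremely optimistic locking" point concrete, here is a minimal plain-Java sketch of version-checked updates. It is an illustration under assumptions, not the project's implementation: VersionedNode and OptimisticNodeStore are hypothetical names, and an in-memory map stands in for the real store.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical immutable node snapshot carrying a version counter.
final class VersionedNode {
    final String uuid;
    final long version;
    final String payload;

    VersionedNode(String uuid, long version, String payload) {
        this.uuid = uuid;
        this.version = version;
        this.payload = payload;
    }
}

// Hypothetical store: additions never block readers, and the rare update
// succeeds only if nobody else changed the node since it was read.
final class OptimisticNodeStore {
    private final ConcurrentMap<String, VersionedNode> nodes = new ConcurrentHashMap<>();

    // The common case: a pure addition. Returns false if the UUID already exists.
    boolean add(String uuid, String payload) {
        return nodes.putIfAbsent(uuid, new VersionedNode(uuid, 0L, payload)) == null;
    }

    VersionedNode read(String uuid) {
        return nodes.get(uuid);
    }

    // The rare case: an update. The compare-and-replace fails when 'seen' is stale,
    // and the caller is expected to re-read and retry.
    boolean update(VersionedNode seen, String newPayload) {
        VersionedNode next = new VersionedNode(seen.uuid, seen.version + 1, newPayload);
        return nodes.replace(seen.uuid, seen, next);
    }
}
```

 With an ORM the same idea is usually expressed as a version column checked on write; the sketch only shows why readers never need to wait when updates are this rare.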
 Additional fuel for the fire: many enterprise-level development shops have millions of
artifacts, exponentially higher once derivation kicks in.  Further, many have multiple
relationships defined.
 Ideas:
 1.) Switch to RDBMS + Hibernate ORM + Hibernate Search + Hibernate 2nd Level Caching. 
Although the structure originally looked JCR-specific, it may make a lot more sense as a
relational DB.  HSearch is a no-brainer -- the full-text search capability would be vastly
improved, right out of the box.  And the RDBMS + in-memory-cache would be perfect for the
read-intensive environment and scalability.
 2.) Graph databases: Neo4j (w/ or w/o Hibernate OGM), OrientDB, etc.  The concern here
is mainly horizontal scaling and, from what I understand, their (lack of adequate)
clustering support.  But, it's definitely an option.
 3.) Distributed but strongly consistent database: RocksDB (a variant of LevelDB),
CockroachDB. These are newer, but can (theoretically) scale larger than relational, and
because they replicate data it might be more durable or at least recover faster in the
event of failure. On the other hand, this may be more difficult for enterprises to adopt.
 4.) Stick with MS + ISPN, but use Cassandra behind it (instead of JDBC).  Arguably, this
wouldn't really change things and could potentially end up worse.
 5.) Tinkerpop/Blueprints (graph API).  Hawkular is using this.  However, from what
I've heard elsewhere, it's a horrible standard.  Solutions that attempt to
implement it end up in a state of twisted adaptation, resulting in performance hits.
 In the end, I'd argue that #1 is the best from enterprise-level, scalability,
reliability, and configurability standpoints. 
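
 For illustration only, the "RDBMS + in-memory cache" pairing in #1 can be sketched as a read-through cache in plain Java. This is not Hibernate's 2nd level cache API: ReadThroughCache is a hypothetical class, and the loader function is an assumed stand-in for a database query.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical read-through cache: repeated reads of the same UUID are served
// from memory, and the loader (standing in for a database query) runs only on a miss.
final class ReadThroughCache {
    private final ConcurrentMap<String, String> cache = new ConcurrentHashMap<>();
    private final Function<String, String> loader;
    private final AtomicInteger loads = new AtomicInteger();

    ReadThroughCache(Function<String, String> loader) {
        this.loader = loader;
    }

    String get(String uuid) {
        // computeIfAbsent invokes the loader at most once per missing key;
        // a null result from the loader is not cached.
        return cache.computeIfAbsent(uuid, key -> {
            loads.incrementAndGet();
            return loader.apply(key);
        });
    }

    // Called on the rare update so the next read goes back to the database.
    void evict(String uuid) {
        cache.remove(uuid);
    }

    int loadCount() {
        return loads.get();
    }
}
```

 Because nodes are rarely updated once created, evictions should be infrequent and most reads would stay in memory, which is the property the read-intensive argument relies on.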

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
