[overlord-issues] [JBoss JIRA] (ARTIF-683) Switch to RDBMS, Hibernate ORM, Hibernate Search, and Hibernate 2nd Level Cache as the persistence solution

Howard Pearlmutter (JIRA) issues at jboss.org
Thu May 7 18:51:46 EDT 2015


    [ https://issues.jboss.org/browse/ARTIF-683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13066388#comment-13066388 ] 

Howard Pearlmutter edited comment on ARTIF-683 at 5/7/15 6:51 PM:
------------------------------------------------------------------

Hey Brett, this is JBoss land -- and the open standard for this kind of thing is JPA :) 

Use JPA, which Hibernate ORM (among others) implements, and then any RDBMS can be plugged in behind that.

Then Hibernate OGM comes essentially for free. That lets you use the same JPA interface, but plug NoSQL DBs in behind.

Then consider that the reason for Artificer using JCR ought to be less about hierarchical structure, and more about the already-well-architected rich support for content/documents/artifacts. Many potential Artificer users (such as myself) will want that extra functionality (for example, I want my S-RAMP content to cohabit with my non-S-RAMP content, and to be able to write code that exploits colocation synergies and JCR/ModeShape functionality). ModeShape has very rich relational and network capability (internally it has a generalized graph topology, not just a hierarchical one), and is a much higher-level developer interface onto both relational and filesystem abstractions -- so your whole argument really only boils down to a matter of runtime efficiencies (storage space & execution speed).

I've worked with the JBoss ecosystem now for 15 years (since EJBoss in 1999). And I've advised many CIOs, architects, and developers on questions like this. I strongly suggest you see Artificer as a very high-level functional application for business organizations and technical organizations, rather than yet-another-low-level-engine (of which there are hundreds and hundreds). While space and time efficiency are always important, they don't make the top 10 list of what CIOs (should, and do) consider critical -- such as reliability, manageability, scalability, security, longevity, stability, developer ease, etc. So architect with the key long-term open standards; don't strip down and try to hotrod.

Instead of stripdown hotrodding, go Architectural. Architecturally, for speed, you'll want ISPN (which, BTW, has a fast remote protocol called Hot Rod ;)). Hibernate caching is done with ISPN. ModeShape sits on ISPN. ISPN can be backed with *anything* (JDBC/RDBMS, NoSQL, JClouds/S3, etc.) as long as a CacheStore is written.
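
To make the "backed with *anything*" point concrete, here is a rough plain-Java sketch of a read-through / write-through cache with a pluggable backend. Every name in it (BackingStore, InMemoryBackingStore, ReadThroughCache) is made up for illustration; this is not Infinispan's actual CacheStore interface, just the shape of the idea.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical SPI standing in for the role a cache store plays.
// This is NOT Infinispan's real interface, only an illustration.
interface BackingStore<K, V> {
    Optional<V> load(K key);
    void store(K key, V value);
}

// One possible backend. A JDBC-, NoSQL- or S3-based one would
// implement the same two methods.
class InMemoryBackingStore<K, V> implements BackingStore<K, V> {
    private final Map<K, V> data = new HashMap<>();
    int loads = 0; // counts reads that actually reached the backend

    @Override
    public Optional<V> load(K key) {
        loads++;
        return Optional.ofNullable(data.get(key));
    }

    @Override
    public void store(K key, V value) {
        data.put(key, value);
    }
}

// Callers talk to the cache only; they never see which backend is plugged in.
class ReadThroughCache<K, V> {
    private final Map<K, V> memory = new HashMap<>();
    private final BackingStore<K, V> backend;

    ReadThroughCache(BackingStore<K, V> backend) {
        this.backend = backend;
    }

    Optional<V> get(K key) {
        V cached = memory.get(key);
        if (cached != null) {
            return Optional.of(cached); // served from memory
        }
        Optional<V> loaded = backend.load(key); // read-through on a miss
        loaded.ifPresent(value -> memory.put(key, value));
        return loaded;
    }

    void put(K key, V value) {
        memory.put(key, value);
        backend.store(key, value); // write-through
    }
}

class CacheStoreSketch {
    public static void main(String[] args) {
        InMemoryBackingStore<String, String> backend = new InMemoryBackingStore<>();
        backend.store("artifact-1", "schema.xsd");
        ReadThroughCache<String, String> cache = new ReadThroughCache<>(backend);
        cache.get("artifact-1"); // miss in memory, loaded from the backend
        cache.get("artifact-1"); // now served from memory
        System.out.println("backend loads: " + backend.loads);
    }
}
```

Swapping the storage technology then means writing another BackingStore implementation; the code calling get/put does not change.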

Most importantly, stick with open standards, so you let your users (like me) make our own best decision of what to put behind the standard interface.

So it boils down to JPA & JCR. 

The really interesting point here is the opportunity to combine the two in an intelligent way to leverage the best of each.

Side-by-side is possible, but there are other more interesting approaches to consider.

JCR can back onto JPA, via ISPN CacheStore. (Tunnelling or bypassing then become options for your code that needs to get directly at relational for efficiency reasons.)

JPA can back onto JCR -- here are a couple of projects to consider --

https://code.google.com/p/jpa4jcr

https://github.com/Kobee1203/jcrom

  -- and then you can have nice integrity between the two paradigms (and/or tunnel, or bypass, or, best of all, *tune* when you need performance).

IMHO, you ought to have a chat with Randall and figure out an optimal architecture.

FYI, I came to Artificer for the Errai, and stayed (so far) for the ModeShape ;)






> Switch to RDBMS, Hibernate ORM, Hibernate Search, and Hibernate 2nd Level Cache as the persistence solution
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: ARTIF-683
>                 URL: https://issues.jboss.org/browse/ARTIF-683
>             Project: Artificer
>          Issue Type: Feature Request
>            Reporter: Brett Meyer
>            Assignee: Brett Meyer
>
> Artificer currently uses ModeShape + Infinispan + JDBC as its storage.  Back when Artificer was a simple S-RAMP impl, JCR made a lot of sense.  The S-RAMP spec is essentially a hierarchical artifact repo that maintains the node metadata and relationships between them.  However, the "hierarchical" bit is overstated -- it's limited to a primary artifact and its derived artifact (ex -- primary: XSD, derived: type declarations).  So, the hierarchy is at most 2 levels and could be represented by a simple relationship or one-to-one foreign key.  The only time the hierarchical structure is helpful is when we look up an artifact by its UUID (due to a specific tree structure we use).  But otherwise, I think it's a bit of a misnomer.
> We're now extending well beyond S-RAMP.  In addition to an artifact/metadata/info repo, we're trying to position the project as a more general repo for multiple projects and service information.  Most importantly, the relationship requirements will expand the most.  As such, I'm thinking we'd be better served by alternatives.
> Note that this is essentially a read-intensive system.  Writes do of course occur, but they're almost always *additions*.  Nodes are rarely updated once created.  Locking and isolation should be used, but can be extremely optimistic.  Also note that most artifacts have files with them.  That currently uses a local filesystem store through ISPN, but could certainly be NAS.
> Additional fuel for the fire: many enterprise-level development shops have millions of artifacts, exponentially higher once derivation kicks in.  Further, many have multiple relationships defined.
> Ideas:
> 1.) Switch to RDBMS + Hibernate ORM + Hibernate Search + Hibernate 2nd Level Caching.  Although the structure originally looked JCR-specific, it may make a lot more sense as a relational DB.  HSearch is a no brainer -- the full-text search capability would be vastly improved, right out of the box.  And the RDBMS + in-memory-cache would be perfect for the read-intensive environment and scalability.
> 2.) Graph databases: Neo4j (w/ or w/o Hibernate OGM), OrientDB, etc.  The concern here is mainly horizontal scaling and, from what I understand, their (lack of adequate) clustering support.  But, it's definitely an option.
> 3.) Distributed but strongly consistent database: RocksDB (a variant of LevelDB), CockroachDB. These are newer, but can (theoretically) scale larger than relational, and because they replicate data it might be more durable or at least recover faster in the event of failure. On the other hand, this may be more difficult for enterprises to adopt.
> 4.) Stick with MS + ISPN, but use Cassandra behind it (instead of JDBC).  Arguably, this wouldn't really change things and could potentially end up worse.
> 5.) Tinkerpop/Blueprints (graph API).  Hawkular is using this.  However, from what I've heard elsewhere, it's a horrible standard.  Solutions that attempt to implement it end up in a state of twisted adaptation, resulting in performance hits.
> In the end, I'd argue that #1 is the best from enterprise-level, scalability, reliability, and configurability standpoints.
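
For readers weighing the quoted claim that the S-RAMP hierarchy is at most two levels and "could be represented by a simple relationship or one-to-one foreign key", here is a minimal plain-Java sketch of what that flat shape might look like. The class and field names (ArtifactRow, parentId, FlatArtifactStore) are hypothetical and are not taken from Artificer's code; the two maps simply stand in for a primary-key index and a foreign-key index.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Hypothetical row shape: parentId plays the role of a nullable foreign key.
// A primary artifact has parentId == null; a derived artifact points at its primary.
final class ArtifactRow {
    final UUID id;
    final String name;
    final UUID parentId;

    ArtifactRow(UUID id, String name, UUID parentId) {
        this.id = id;
        this.name = name;
        this.parentId = parentId;
    }
}

class FlatArtifactStore {
    private final Map<UUID, ArtifactRow> byId = new HashMap<>();           // primary-key index
    private final Map<UUID, List<ArtifactRow>> byParent = new HashMap<>(); // foreign-key index

    ArtifactRow addPrimary(String name) {
        return insert(new ArtifactRow(UUID.randomUUID(), name, null));
    }

    ArtifactRow addDerived(ArtifactRow primary, String name) {
        // Enforce the two-level cap: only primary artifacts may have derived children.
        if (primary.parentId != null) {
            throw new IllegalArgumentException("a derived artifact cannot have derived children");
        }
        if (!byId.containsKey(primary.id)) {
            throw new IllegalArgumentException("unknown primary artifact");
        }
        return insert(new ArtifactRow(UUID.randomUUID(), name, primary.id));
    }

    private ArtifactRow insert(ArtifactRow row) {
        byId.put(row.id, row);
        if (row.parentId != null) {
            byParent.computeIfAbsent(row.parentId, k -> new ArrayList<>()).add(row);
        }
        return row;
    }

    // Lookup by UUID is a single keyed read; no tree walk is involved.
    ArtifactRow findById(UUID id) {
        return byId.get(id);
    }

    List<ArtifactRow> derivedFrom(UUID primaryId) {
        return new ArrayList<>(byParent.getOrDefault(primaryId, new ArrayList<>()));
    }
}
```

This only illustrates the primary/derived relation; it says nothing about the richer relationship types the description expects to grow, which is where the choice between the options listed above actually matters.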


