[overlord-issues] [JBoss JIRA] (ARTIF-683) Switch to RDBMS, JPA (Hibernate), Hibernate Search, and Hibernate 2nd Level Cache as the default persistence solution

Tuesday, 12 May 2015

    [ https://issues.jboss.org/browse/ARTIF-683?page=com.atlassian.jira.plugin.... ]

Brett Meyer commented on ARTIF-683:
-----------------------------------

[~hxp], I should clarify: this isn't *replacing* the JCR plugin.  For the time being,
we'll continue to support that and you'll be able to continue using it.  However,
this new setup will be the new *default*.

I'm obviously biased (I'm a committer and one of the core devs of Hibernate ORM),
but this new setup has been a long time coming and has a lot of benefits:

- Horizontal scaling: Several potential users have brought up their typical
enterprise-level use cases, and they all involved primary artifacts on the order of *one
million*.  Once derivation kicks in, that could look more like a hundred million, maybe
more.  I had a lot of concerns about JCR's ability to handle queries with that much
data.  RDBMS + proper indexing + in-mem caching has a proven track record...
- CQRS pattern: Being able to query-by-column using pure SQL, rather than pulling out
entire nodes (à la JCR), has plenty of benefits on its own
(http://martinfowler.com/bliki/CQRS.html).
- Full-text search: Fully-indexed, full-text search is improving in ModeShape, but
it's not quite there.  Using Hibernate Search and Lucene opens the doors to several
new features.
- BLOB storage: I've actually added a new SPI that allows either SQL BLOBs or the
filesystem to be used for binary storage.  For most use cases, BLOBs tend to have several
pros over the filesystem, so at least having them as an option is beneficial.  Note that
this SPI would also allow for new plugins: Ceph, Gluster, etc.
- Flexibility: Right out of the gate, this tremendously increases what we're able to
support.  Any RDBMS (that's supported by Hibernate), any Hibernate caching solution
(not limited to Infinispan), etc.
- Fewer external dependencies: nearly the entire stack is already included in WildFly/EAP
out of the box.
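
To make the BLOB storage bullet a bit more concrete, here is a minimal sketch of what a pluggable binary-storage SPI could look like.  This is not the actual Artificer SPI: `BinaryStore`, `FileSystemStore`, and `InMemoryStore` are hypothetical names, and the in-memory class only stands in for a SQL BLOB column so the example stays self-contained.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.concurrent.ConcurrentHashMap;

public class BinaryStoreSketch {

    /** Hypothetical SPI: one implementation per storage backend. */
    interface BinaryStore {
        void write(String artifactUuid, byte[] content);
        byte[] read(String artifactUuid);
    }

    /** Keeps each artifact's bytes in a file named after its UUID. */
    static final class FileSystemStore implements BinaryStore {
        private final Path root;

        FileSystemStore(Path root) {
            this.root = root;
        }

        @Override
        public void write(String artifactUuid, byte[] content) {
            try {
                Files.createDirectories(root);
                Files.write(root.resolve(artifactUuid), content);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        @Override
        public byte[] read(String artifactUuid) {
            try {
                return Files.readAllBytes(root.resolve(artifactUuid));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    /** Stand-in for a SQL BLOB column (assumption: a real one would go through JPA/JDBC). */
    static final class InMemoryStore implements BinaryStore {
        private final Map<String, byte[]> rows = new ConcurrentHashMap<>();

        @Override
        public void write(String artifactUuid, byte[] content) {
            rows.put(artifactUuid, content.clone());
        }

        @Override
        public byte[] read(String artifactUuid) {
            byte[] content = rows.get(artifactUuid);
            if (content == null) {
                throw new NoSuchElementException("no content for " + artifactUuid);
            }
            return content.clone();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] xsd = "<xs:schema/>".getBytes(StandardCharsets.UTF_8);
        BinaryStore[] stores = {
            new FileSystemStore(Files.createTempDirectory("artifacts")),
            new InMemoryStore()
        };
        // The calling code is identical regardless of the backend behind the SPI.
        for (BinaryStore store : stores) {
            store.write("uuid-1", xsd);
            System.out.println(store.getClass().getSimpleName() + ": "
                    + store.read("uuid-1").length + " bytes");
        }
    }
}
```

The point is only that callers see one interface, so a Ceph or Gluster plugin would be one more implementation rather than a change to the core.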

{quote}
Use JPA
{quote}
FWIW, I am using pure JPA, rather than Hibernate ORM.  Although, the driving force behind
that is mainly the JPA Criteria Query API...

{quote}
Then Hibernate OGM comes essentially for free.
{quote}
Definitely a fair point.  OGM/NoSQL, Cassandra, GraphDBs, etc. could certainly be added as
additional plugins, if the community wants them (and is willing to help develop them ;)

{quote}
so your whole argument really only boils down to a matter of runtime efficiencies (storage
space & execution speed). 
{quote}
Simply not true -- there's *a lot* more to it

{quote}
I strongly suggest you see Artificer as a relatively high level functional application
platform for business organizations and technical organizations, rather than
yet-another-lower-level-engine (of which there are hundreds and hundreds). While space and
time efficiency are always important, they don't make the top 10 list of what CIOs
(should, and do) consider critical – such as reliability, manageability, scalability,
security, longevity, stability, developer ease, etc. So architect with the key longterm
open standards
{quote}
This is an absolutely vital point, and one that I focus on.  Artificer needs to continue
to become a powerful, out of the box application.  It certainly needs to support
developers and be usable as a powerful platform, but in the end, focusing on it as an
*application* is where it will shine.  But I'd argue that you're contradicting
yourself, at least a bit.  Through this tried-and-true stack, the "reliability,
manageability, scalability, security, longevity, stability" increase across the
board.  And 80-90% of users will see the storage solution as simply an internal
implementation detail.

{quote}
Instead of stripdown hotrodding, go Architectural
{quote}
Again, you're focusing only on the performance considerations, which are only a small
piece of the decision.  Realistically, we have a *long* backlog of tasks and new features
to add, many of which would at least be difficult to support with JCR.  A flexible
architecture is definitely one area being gained...

{quote}
Most importantly, stick with open standards, so you let your users (like me) make our own
best decision of what to put behind the standard interface.
{quote}
With JPA, that's what you'd be getting...

{quote}
and then you can have nice integrity between the 2 paradigms
{quote}
I have to say, I fully agree with Randall and Horia's points on
https://developer.jboss.org/message/928792.  Mixing JPA with JCR doesn't make much
sense.

{quote}
IMHO, you ought to have a chat with Randall and figure out an optimal architecture.
{quote}
Randall, several other committers on the Hibernate/Infinispan teams, and I have been
involved in these discussions and the architecture.  That should probably have been more
in the open...

I definitely value alternative opinions -- I don't want this to sound like I'm just
outright shooting you down.  I just wanted to clarify how this has shaped up.  Feel
free to continue the discussion!

...
 Switch to RDBMS, JPA (Hibernate), Hibernate Search, and Hibernate 2nd
Level Cache as the default persistence solution

---------------------------------------------------------------------------------------------------------------------

                 Key: ARTIF-683
                 URL: https://issues.jboss.org/browse/ARTIF-683
             Project: Artificer
          Issue Type: Feature Request
            Reporter: Brett Meyer
            Assignee: Brett Meyer

 Artificer currently uses ModeShape + Infinispan + JDBC as its storage.  Back when
Artificer was a simple S-RAMP impl, JCR made a lot of sense.  The S-RAMP spec is
essentially a hierarchical artifact repo that maintains the node metadata and
relationships between them.  However, the "hierarchical" bit is overstated --
it's limited to a primary artifact and its derived artifacts (e.g. primary: XSD,
derived: type declarations).  So, the hierarchy is at most 2 levels and could be
represented by a simple relationship or one-to-one foreign key.  The only time the
hierarchical structure is helpful is when we look up an artifact by its UUID (due to a
specific tree structure we use).  But otherwise, I think it's a bit of a misnomer.
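
 As a rough illustration of the "at most 2 levels" point, the sketch below models a derived artifact as a plain reference to its primary (the in-memory analogue of a foreign key) and keeps a UUID index for direct lookups.  `Artifact`, `Repo`, and their methods are hypothetical names for this example only, not Artificer's real model.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

public class ArtifactModelSketch {

    /** An artifact; derivedFrom is null for a primary artifact. */
    static final class Artifact {
        final UUID uuid = UUID.randomUUID();
        final String name;
        final Artifact derivedFrom; // plays the role of a foreign key to the primary

        Artifact(String name, Artifact derivedFrom) {
            this.name = name;
            this.derivedFrom = derivedFrom;
        }

        boolean isDerived() {
            return derivedFrom != null;
        }
    }

    /** Flat store: no tree is needed, only a UUID index and the parent reference. */
    static final class Repo {
        private final Map<UUID, Artifact> byUuid = new HashMap<>();

        Artifact addPrimary(String name) {
            Artifact artifact = new Artifact(name, null);
            byUuid.put(artifact.uuid, artifact);
            return artifact;
        }

        Artifact addDerived(Artifact primary, String name) {
            if (primary.isDerived()) {
                // Enforces the two-level limit described above.
                throw new IllegalArgumentException("derived artifacts cannot have children");
            }
            Artifact artifact = new Artifact(name, primary);
            byUuid.put(artifact.uuid, artifact);
            return artifact;
        }

        Artifact find(UUID uuid) {
            return byUuid.get(uuid);
        }

        /** Equivalent of "select ... where derived_from = ?" in a relational layout. */
        List<Artifact> derivedOf(Artifact primary) {
            List<Artifact> result = new ArrayList<>();
            for (Artifact candidate : byUuid.values()) {
                if (candidate.derivedFrom == primary) {
                    result.add(candidate);
                }
            }
            return result;
        }
    }

    public static void main(String[] args) {
        Repo repo = new Repo();
        Artifact xsd = repo.addPrimary("orders.xsd");
        repo.addDerived(xsd, "OrderType");
        repo.addDerived(xsd, "LineItemType");
        System.out.println(xsd.name + " has " + repo.derivedOf(xsd).size() + " derived artifacts");
    }
}
```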
 We're now extending well beyond S-RAMP.  In addition to an artifact/metadata/info
repo, we're trying to position the project as a more general repo for multiple
projects and service information.  Most importantly, the relationship requirements will
expand the most.  As such, I'm thinking we'd be better served by alternatives.
 Note that this is essentially a read-intensive system.  Writes do of course occur, but
they're almost always *additions*.  Nodes are rarely updated once created.  Locking
and isolation should be used, but can be extremely optimistic.  Also note that most
artifacts have files with them.  That currently uses a local filesystem store through
ISPN, but could certainly be NAS.
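
 The "extremely optimistic" locking mentioned above can be sketched as a version check on update.  JPA offers this through the `@Version` annotation; the plain-Java stand-in below only shows the idea, and `Table`, `Row`, and `update` are hypothetical names for this example.

```java
import java.util.HashMap;
import java.util.Map;

public class OptimisticUpdateSketch {

    /** Immutable snapshot of a stored row plus the version it was read at. */
    static final class Row {
        final String description;
        final long version;

        Row(String description, long version) {
            this.description = description;
            this.version = version;
        }
    }

    /** Tiny in-memory "table"; synchronized methods stand in for atomic DB statements. */
    static final class Table {
        private final Map<String, Row> rows = new HashMap<>();

        synchronized void insert(String id, String description) {
            rows.put(id, new Row(description, 0));
        }

        synchronized Row read(String id) {
            return rows.get(id);
        }

        /** Succeeds only if the row is still at the version the caller read. */
        synchronized boolean update(String id, String newDescription, long expectedVersion) {
            Row current = rows.get(id);
            if (current == null || current.version != expectedVersion) {
                return false; // someone else updated first; caller should re-read and retry
            }
            rows.put(id, new Row(newDescription, expectedVersion + 1));
            return true;
        }
    }

    public static void main(String[] args) {
        Table table = new Table();
        table.insert("artifact-1", "initial");

        // Two writers read the same version without taking any lock.
        Row seenByA = table.read("artifact-1");
        Row seenByB = table.read("artifact-1");

        System.out.println("A wins: " + table.update("artifact-1", "from A", seenByA.version));
        System.out.println("B wins: " + table.update("artifact-1", "from B", seenByB.version));
    }
}
```

 Since nodes are rarely updated once created, conflicts like the second update above should be rare, which is what makes the optimistic approach cheap here.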
 Additional fuel for the fire: many enterprise-level development shops have millions of
artifacts, and orders of magnitude more once derivation kicks in.  Further, many have
multiple relationships defined.
 Ideas:
 1.) Switch to RDBMS, JPA (Hibernate), Hibernate Search, and Hibernate 2nd Level Cache. 
Although the structure originally looked JCR-specific, it may make a lot more sense as a
relational DB.  HSearch is a no brainer -- the full-text search capability would be vastly
improved, right out of the box.  And the RDBMS + in-memory-cache would be perfect for the
read-intensive environment and scalability.
 2.) Graph databases: Neo4j (w/ or w/o Hibernate OGM), OrientDB, etc.  The concern here
is mainly horizontal scaling and, from what I understand, their (lack of adequate)
clustering support.  But, it's definitely an option.
 3.) Distributed but strongly consistent database: RocksDB (a variant of LevelDB),
CockroachDB. These are newer, but can (theoretically) scale larger than relational, and
because they replicate data it might be more durable or at least recover faster in the
event of failure. On the other hand, this may be more difficult for enterprises to adopt
 4.) Stick with ModeShape + Infinispan, but use Cassandra behind it (instead of JDBC).
Arguably, this wouldn't really change things and could potentially end up worse.
 5.) TinkerPop/Blueprints (graph API).  Hawkular is using this.  However, from what
I've heard elsewhere, it's a horrible standard.  Solutions that attempt to
implement it end up in a state of twisted adaptation, resulting in performance hits.
 In the end, I'd argue that #1 is the best from enterprise-level, scalability,
reliability, and configurability standpoints. 

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
