[JBoss JIRA] (ARTIF-683) Switch to RDBMS, JPA (Hibernate), Hibernate Search, and Hibernate 2nd Level Cache as the default persistence solution
by Brett Meyer (JIRA)
[ https://issues.jboss.org/browse/ARTIF-683?page=com.atlassian.jira.plugin.... ]
Brett Meyer resolved ARTIF-683.
-------------------------------
Fix Version/s: 1.1.0.Final, 1.0.0.Beta1
Resolution: Done
> Switch to RDBMS, JPA (Hibernate), Hibernate Search, and Hibernate 2nd Level Cache as the default persistence solution
> ---------------------------------------------------------------------------------------------------------------------
>
> Key: ARTIF-683
> URL: https://issues.jboss.org/browse/ARTIF-683
> Project: Artificer
> Issue Type: Feature Request
> Reporter: Brett Meyer
> Assignee: Brett Meyer
> Fix For: 1.1.0.Final, 1.0.0.Beta1
>
>
> Artificer currently uses ModeShape + Infinispan + JDBC as its storage. Back when Artificer was a simple S-RAMP impl, JCR made a lot of sense. The S-RAMP spec is essentially a hierarchical artifact repo that maintains the node metadata and relationships between them. However, the "hierarchical" bit is overstated -- it's limited to a primary artifact and its derived artifact (ex -- primary: XSD, derived: type declarations). So, the hierarchy is at most 2 levels and could be represented by a simple relationship or one-to-one foreign key. The only time the hierarchical structure is helpful is when we look up an artifact by its UUID (due to a specific tree structure we use). But otherwise, I think it's a bit of a misnomer.
> We're now extending well beyond S-RAMP. In addition to an artifact/metadata/info repo, we're trying to position the project as a more general repo for multiple projects and service information. Most importantly, the relationship requirements are the part that will grow the most. As such, I'm thinking we'd be better served by alternatives.
> Note that this is essentially a read-intensive system. Writes do of course occur, but they're almost always *additions*. Nodes are rarely updated once created. Locking and isolation should be used, but can be extremely optimistic. Also note that most artifacts have files with them. That currently uses a local filesystem store through ISPN, but could certainly be NAS.
> Additional fuel for the fire: many enterprise-level development shops have millions of artifacts, exponentially higher once derivation kicks in. Further, many have multiple relationships defined.
> Ideas:
> 1.) Switch to RDBMS, JPA (Hibernate), Hibernate Search, and Hibernate 2nd Level Cache. Although the structure originally looked JCR-specific, it may make a lot more sense as a relational DB. HSearch is a no-brainer -- the full-text search capability would be vastly improved, right out of the box. And the RDBMS + in-memory-cache would be perfect for the read-intensive environment and scalability.
> 2.) Graph databases: Neo4j (w/ or w/o Hibernate OGM), OrientDB, etc. The concern here is mainly horizontal scaling and, from what I understand, their (lack of adequate) clustering support. But, it's definitely an option.
> 3.) Distributed but strongly consistent database: RocksDB (a variant of LevelDB), CockroachDB. These are newer, but can (theoretically) scale larger than relational, and because they replicate data they might be more durable or at least recover faster in the event of failure. On the other hand, this may be more difficult for enterprises to adopt.
> 4.) Stick with MS + ISPN, but use Cassandra behind it (instead of JDBC). Arguably, this wouldn't really change things and could potentially end up worse.
> 5.) Tinkerpop/Blueprints (graph API). Hawkular is using this. However, from what I've heard elsewhere, it's a horrible standard. Solutions that attempt to implement it end up in a state of twisted adaptation, resulting in performance hits.
> In the end, I'd argue that #1 is the best from enterprise-level, scalability, reliability, and configurability standpoints.
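
The description above notes that the hierarchy is at most two levels deep and could be expressed as a simple relationship or foreign key. A minimal sketch of that idea in plain Java follows. The class, field, and column names (`ArtifactModel`, `derivedFrom`, `derived_from`) are hypothetical and are not taken from Artificer's actual schema; the in-memory list only stands in for a table.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

// Hypothetical model for illustration only; not Artificer's actual schema.
final class ArtifactModel {

    // One row of a hypothetical "artifact" table.
    static final class Artifact {
        final UUID uuid;
        final String type;
        // Plays the role of a nullable foreign key to the primary artifact.
        // Null means this row is itself a primary artifact.
        final UUID derivedFrom;

        Artifact(UUID uuid, String type, UUID derivedFrom) {
            this.uuid = uuid;
            this.type = type;
            this.derivedFrom = derivedFrom;
        }

        boolean isDerived() {
            return derivedFrom != null;
        }
    }

    // Rough equivalent of: SELECT * FROM artifact WHERE derived_from = ?
    static List<Artifact> derivedOf(List<Artifact> table, UUID primaryUuid) {
        List<Artifact> result = new ArrayList<>();
        for (Artifact a : table) {
            if (primaryUuid.equals(a.derivedFrom)) {
                result.add(a);
            }
        }
        return result;
    }
}
```

With this shape, the two-level "hierarchy" is just one extra column, and finding the derived artifacts of a primary one is a single filtered lookup instead of a tree traversal.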
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
9 years, 6 months
[JBoss JIRA] (ARTIF-693) Move query paging into the query itself
by Brett Meyer (JIRA)
Brett Meyer created ARTIF-693:
---------------------------------
Summary: Move query paging into the query itself
Key: ARTIF-693
URL: https://issues.jboss.org/browse/ARTIF-693
Project: Artificer
Issue Type: Enhancement
Reporter: Brett Meyer
Assignee: Brett Meyer
Due to JCR-SQL2's lack of paging support, paging was handled in-memory *after* the query was run. For obvious performance reasons, re-architect the setup and move paging into the query itself.
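
To illustrate the difference, here is a minimal sketch in plain Java. `FakeStore` is a hypothetical stand-in for the query backend, not an Artificer or Hibernate class; it only counts how many rows are handed back to the caller. With JPA, the pushed-down variant would typically map to `Query.setFirstResult` and `Query.setMaxResults`, though the issue does not say which calls the fix will use.

```java
import java.util.ArrayList;
import java.util.List;

final class PagingSketch {

    // Hypothetical stand-in for a query backend; counts rows returned to the caller.
    static final class FakeStore {
        private final List<String> rows;
        int rowsReturned = 0;

        FakeStore(List<String> rows) {
            this.rows = rows;
        }

        // Query without paging: every matching row is materialized.
        List<String> queryAll() {
            rowsReturned += rows.size();
            return new ArrayList<>(rows);
        }

        // Query with paging pushed down, analogous to OFFSET/LIMIT.
        List<String> queryPage(int offset, int limit) {
            int from = clamp(offset, rows.size());
            int to = (int) Math.min((long) from + Math.max(limit, 0), rows.size());
            rowsReturned += to - from;
            return new ArrayList<>(rows.subList(from, to));
        }
    }

    // Restricts an index to the range [0, size].
    private static int clamp(int index, int size) {
        return Math.min(Math.max(index, 0), size);
    }

    // Old approach: run the full query, then slice the result in memory.
    static List<String> pageInMemory(FakeStore store, int offset, int limit) {
        List<String> all = store.queryAll();
        int from = clamp(offset, all.size());
        int to = (int) Math.min((long) from + Math.max(limit, 0), all.size());
        return new ArrayList<>(all.subList(from, to));
    }

    // New approach: paging is part of the query itself.
    static List<String> pageInQuery(FakeStore store, int offset, int limit) {
        return store.queryPage(offset, limit);
    }
}
```

Both methods return the same page, but the in-memory variant pulls every matching row out of the store first, so its cost grows with the total result size instead of the page size.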
[JBoss JIRA] (ARTIF-692) Provide Apache Tika JARs in our WAR, once HSEARCH-1885 corrected
by Brett Meyer (JIRA)
Brett Meyer created ARTIF-692:
---------------------------------
Summary: Provide Apache Tika JARs in our WAR, once HSEARCH-1885 corrected
Key: ARTIF-692
URL: https://issues.jboss.org/browse/ARTIF-692
Project: Artificer
Issue Type: Task
Reporter: Brett Meyer
Assignee: Brett Meyer
Until HSEARCH-1885 is corrected, we need to deploy our own WF/EAP Tika module (stolen from the ModeShape distro) and force the Search module to depend on it. Once HSEARCH-1885 is corrected and we upgrade, remove tika-module.zip, configureHibernateSearch.xslt, etc. and instead provide the Tika JARs in our WAR.
[JBoss JIRA] (ARTIF-689) Remove the JCR persistence adapter
by Brett Meyer (JIRA)
[ https://issues.jboss.org/browse/ARTIF-689?page=com.atlassian.jira.plugin.... ]
Brett Meyer resolved ARTIF-689.
-------------------------------
Resolution: Done
> Remove the JCR persistence adapter
> ----------------------------------
>
> Key: ARTIF-689
> URL: https://issues.jboss.org/browse/ARTIF-689
> Project: Artificer
> Issue Type: Task
> Reporter: Brett Meyer
> Assignee: Brett Meyer
> Fix For: 1.1.0.Final, 1.0.0.Beta1
>
>
> ARTIF-683 switched to a JPA/RDBMS default persistence strategy. Originally, we intended to keep the JCR plugin and continue to maintain it, however:
> First and foremost, initial performance tests are looking extremely promising. With the exception of uploads (slower for JPA due to Hibernate Search indexing and complicated INSERTs, as opposed to JCR node creation), everything else is 2-5x faster, including complex queries.
> Further, I'm starting to hit certain dependency conflicts between Hibernate and ModeShape that might eventually require a split in Artificer distros. Maintaining both is already becoming a headache.
> Is there much of a point in keeping both? The only argument I can think of is if for some reason a user was integrating S-RAMP's JCR storage with their existing, *non-RDBMS* source (Cassandra, etc.) through ModeShape/Infinispan. But that's highly unlikely. In my mind, everything else trumps the off-chance of backward-compatibility issues. Most users we talk to see JCR purely as an implementation detail.
> We will, however, provide a migration strategy. It would simply be code fragments from both adapters, translating to and from the backends. Honestly simple stuff.
[JBoss JIRA] (ARTIF-689) Remove the JCR persistence adapter
by Brett Meyer (JIRA)
[ https://issues.jboss.org/browse/ARTIF-689?page=com.atlassian.jira.plugin.... ]
Brett Meyer updated ARTIF-689:
------------------------------
Fix Version/s: 1.1.0.Final, 1.0.0.Beta1
[JBoss JIRA] (ARTIF-683) Switch to RDBMS, JPA (Hibernate), Hibernate Search, and Hibernate 2nd Level Cache as the default persistence solution
by Brett Meyer (JIRA)
[ https://issues.jboss.org/browse/ARTIF-683?page=com.atlassian.jira.plugin.... ]
Work on ARTIF-683 started by Brett Meyer.
-----------------------------------------