[
https://issues.jboss.org/browse/ARTIF-683?page=com.atlassian.jira.plugin....
]
Brett Meyer updated ARTIF-683:
------------------------------
Description:
Artificer currently uses ModeShape + Infinispan + JDBC as its storage. Back when
Artificer was a simple S-RAMP impl, JCR made a lot of sense. The S-RAMP spec is
essentially a hierarchical artifact repo that maintains the node metadata and
relationships between them. However, the "hierarchical" bit is overstated --
it's limited to a primary artifact and its derived artifacts (ex -- primary: XSD,
derived: type declarations). So, the hierarchy is at most 2 levels and could be
represented by a simple relationship or one-to-one foreign key. The only time the
hierarchical structure is helpful is when we look up an artifact by its UUID (due to a
specific tree structure we use). But otherwise, I think it's a bit of a misnomer.
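
To make the "simple relationship or foreign key" idea concrete, here is a minimal sketch using
Python's standard-library sqlite3 module. The table, the column names (artifact, derived_from,
etc.), and the helper functions are hypothetical and are not Artificer's actual schema. The
point is only that a two-level hierarchy fits in one self-referencing column, and that lookup
by UUID becomes a plain primary-key query.

```python
import sqlite3
import uuid


def open_repo():
    """Create an in-memory database with one hypothetical artifact table."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        """
        CREATE TABLE artifact (
            uuid          TEXT PRIMARY KEY,
            name          TEXT NOT NULL,
            artifact_type TEXT NOT NULL,
            -- NULL for a primary artifact, the primary's UUID for a derived one
            derived_from  TEXT REFERENCES artifact(uuid)
        )
        """
    )
    conn.execute("CREATE INDEX idx_derived_from ON artifact(derived_from)")
    return conn


def add_artifact(conn, name, artifact_type, derived_from=None):
    """Insert an artifact and return its newly generated UUID."""
    new_uuid = str(uuid.uuid4())
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO artifact (uuid, name, artifact_type, derived_from) "
            "VALUES (?, ?, ?, ?)",
            (new_uuid, name, artifact_type, derived_from),
        )
    return new_uuid


def find_by_uuid(conn, artifact_uuid):
    """Look up a single artifact by primary key; no tree walk is needed."""
    return conn.execute(
        "SELECT name, artifact_type, derived_from FROM artifact WHERE uuid = ?",
        (artifact_uuid,),
    ).fetchone()


def derived_artifacts(conn, primary_uuid):
    """Return the names of all artifacts derived from the given primary."""
    rows = conn.execute(
        "SELECT name FROM artifact WHERE derived_from = ? ORDER BY name",
        (primary_uuid,),
    )
    return [row[0] for row in rows]
```

Usage would look like adding an XSD as a primary artifact, then adding each type declaration
with derived_from set to the XSD's UUID.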
We're now extending well beyond S-RAMP. In addition to an artifact/metadata/info
repo, we're trying to position the project as a more general repo for multiple
projects and service information. The relationship requirements, in particular, are
what will expand the most. As such, I'm thinking we'd be better served by alternatives.
Note that this is essentially a read-intensive system. Writes do of course occur, but
they're almost always *additions*. Nodes are rarely updated once created. Locking
and isolation should be used, but can be extremely optimistic. Also note that most
artifacts have files attached. Those are currently kept in a local filesystem store through
ISPN, but could certainly live on a NAS.
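
As a rough illustration of what "extremely optimistic" locking could mean on a relational
store, the sketch below uses a version column, again with Python's sqlite3 module. The schema
and function names are assumptions made up for this example. JPA's @Version attribute works
on the same principle, but this is not Hibernate code.

```python
import sqlite3


def open_store():
    """Create an in-memory table carrying a hypothetical version column."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE artifact ("
        " uuid TEXT PRIMARY KEY,"
        " description TEXT,"
        " version INTEGER NOT NULL DEFAULT 0)"
    )
    return conn


def create(conn, artifact_uuid, description):
    """Additions are the common case and need no version check at all."""
    with conn:
        conn.execute(
            "INSERT INTO artifact (uuid, description) VALUES (?, ?)",
            (artifact_uuid, description),
        )


def read(conn, artifact_uuid):
    """Return (description, version) so the caller can echo the version back."""
    return conn.execute(
        "SELECT description, version FROM artifact WHERE uuid = ?",
        (artifact_uuid,),
    ).fetchone()


def update_description(conn, artifact_uuid, new_description, expected_version):
    """Apply the update only if nobody else changed the row since it was read.

    Returns True when the update was applied and False when the version
    check failed, in which case the caller should re-read and retry.
    """
    with conn:
        cursor = conn.execute(
            "UPDATE artifact SET description = ?, version = version + 1 "
            "WHERE uuid = ? AND version = ?",
            (new_description, artifact_uuid, expected_version),
        )
    return cursor.rowcount == 1
```

Since nodes are rarely updated once created, the failed-check path should be rare, which is
what makes the optimistic approach cheap here.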
Additional fuel for the fire: many enterprise-level development shops have millions of
artifacts, and far more once derivation kicks in. Further, many have multiple
relationships defined.
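
For the relationship side, one possible relational shape is a single typed edge table. This
is only a sketch with Python's sqlite3 module; the table name, the relationship type strings,
and the UUID placeholders are invented for illustration and do not come from S-RAMP or
Artificer.

```python
import sqlite3


def open_graph():
    """Create an in-memory edge table with indexes for both directions."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE relationship (
            source_uuid TEXT NOT NULL,
            rel_type    TEXT NOT NULL,
            target_uuid TEXT NOT NULL,
            PRIMARY KEY (source_uuid, rel_type, target_uuid)
        );
        -- The primary key serves source-side lookups; this serves the reverse.
        CREATE INDEX idx_rel_target ON relationship(target_uuid, rel_type);
        """
    )
    return conn


def relate(conn, source_uuid, rel_type, target_uuid):
    """Record one typed relationship between two artifacts."""
    with conn:
        conn.execute(
            "INSERT INTO relationship (source_uuid, rel_type, target_uuid) "
            "VALUES (?, ?, ?)",
            (source_uuid, rel_type, target_uuid),
        )


def targets(conn, source_uuid, rel_type):
    """Artifacts that the source points to through the given relationship."""
    rows = conn.execute(
        "SELECT target_uuid FROM relationship "
        "WHERE source_uuid = ? AND rel_type = ? ORDER BY target_uuid",
        (source_uuid, rel_type),
    )
    return [row[0] for row in rows]


def sources(conn, target_uuid, rel_type):
    """Artifacts that point to the target through the given relationship."""
    rows = conn.execute(
        "SELECT source_uuid FROM relationship "
        "WHERE target_uuid = ? AND rel_type = ? ORDER BY source_uuid",
        (target_uuid, rel_type),
    )
    return [row[0] for row in rows]
```

With millions of artifacts and several relationships each, both directions stay indexed
lookups rather than tree traversals.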
Ideas:
1.) Switch to RDBMS, JPA (Hibernate), Hibernate Search, and Hibernate 2nd Level Cache.
Although the structure originally looked JCR-specific, it may make a lot more sense as a
relational DB. HSearch is a no brainer -- the full-text search capability would be vastly
improved, right out of the box. And the RDBMS + in-memory-cache would be perfect for the
read-intensive environment and scalability.
2.) Graph databases: Neo4j (w/ or w/o Hibernate OGM), OrientDB, etc. The concern here is
mainly horizontal scaling and, from what I understand, their (lack of adequate) clustering
support. But, it's definitely an option.
3.) Distributed but strongly consistent database: CockroachDB (built on RocksDB, itself a
fork of LevelDB). Databases of this kind are newer, but can (theoretically) scale larger
than relational, and because they replicate data they might be more durable or at least
recover faster in the event of failure. On the other hand, this may be more difficult for
enterprises to adopt.
4.) Stick with MS + ISPN, but use Cassandra behind it (instead of JDBC). Arguably, this
wouldn't really change things and could potentially end up worse.
5.) Tinkerpop/Blueprints (graph API). Hawkular is using this. However, from what
I've heard elsewhere, it's a horrible standard. Solutions that attempt to
implement it end up in a state of twisted adaptation, resulting in performance hits.
In the end, I'd argue that #1 is the best from enterprise-level, scalability,
reliability, and configurability standpoints.
was:
Artificer currently uses ModeShape + Infinispan + JDBC as its storage. Back when
Artificer was a simple S-RAMP impl, JCR made a lot of sense. The S-RAMP spec is
essentially a hierarchical artifact repo that maintains the node metadata and
relationships between them. However, the "hierarchical" bit is overstated --
it's limited to a primary artifact and its derived artifact (ex -- primary: XSD,
derived: type declarations). So, the hierarchy is at most 2 levels and could be
represented by a simple relationship or one-to-one foreign key. The only time the
hierarchical structure is helpful is when we look up an artifact by its UUID (due to a
specific tree structure we use). But otherwise, I think it's a bit of a misnomer.
We're now extending well beyond S-RAMP. In addition to an artifact/metadata/info
repo, we're trying to position the project as a more general repo for multiple
projects and service information. Most importantly, the relationship requirements will
expand the most. As such, I'm thinking we'd be better served by alternatives.
Note that this is essentially a read-intensive system. Writes do of course occur, but
they're almost always *additions*. Nodes are rarely updated once created. Locking
and isolation should be used, but can be extremely optimistic. Also note that most
artifacts have files with them. That currently uses a local filesystem store through
ISPN, but could certainly be NAS.
Additional fuel for the fire: many enterprise-level development shops have millions of
artifacts, exponentially higher once derivation kicks in. Further, many have multiple
relationships defined.
Ideas:
1.) Switch to RDBMS + Hibernate ORM + Hibernate Search + Hibernate 2nd Level Caching.
Although the structure originally looked JCR-specific, it may make a lot more sense as a
relational DB. HSearch is a no brainer -- the full-text search capability would be vastly
improved, right out of the box. And the RDBMS + in-memory-cache would be perfect for the
read-intensive environment and scalability.
2.) Graph databases: Neo4j (w/ or w/o Hibernate OGM), OrientDB, etc.. The concern here is
mainly horizontal scaling and, from what I understand, their (lack of adequate) clustering
support. But, it's definitely an option.
3.) Distributed but strongly consistent database: RocksDB (a variant of LevelDB),
CockroachDB. These are newer, but can (theoretically) scale larger than relational, and
because they replicate data it might be more durable or at least recover faster in the
event of failure. On the other hand, this may be more difficult for enterprises to adopt
3.) Stick with MS + ISPN, but use Cassandra behind it (instead of JDBC). Arguably, this
wouldn't really change things and could potentially end up worse.
4.) Tinkerpop/Blueprints (graph API). Hawkular is using this. However, from what
I've heard elsewhere, it's a horrible standard. Solutions that attempt to
implement it end up in a state of twisted adaptation, resulting in performance hits.
In the end, I'd argue that #1 is the best from enterprise-level, scalability,
reliability, and configurability standpoints.
Switch to RDBMS, JPA (Hibernate), Hibernate Search, and Hibernate 2nd
Level Cache as the default persistence solution
---------------------------------------------------------------------------------------------------------------------
Key: ARTIF-683
URL: https://issues.jboss.org/browse/ARTIF-683
Project: Artificer
Issue Type: Feature Request
Reporter: Brett Meyer
Assignee: Brett Meyer