[JBoss JIRA] Created: (DNA-261) Improve how Binary values are created and managed
by Randall Hauch (JIRA)
Improve how Binary values are created and managed
-------------------------------------------------
Key: DNA-261
URL: https://jira.jboss.org/jira/browse/DNA-261
Project: DNA
Issue Type: Feature Request
Components: Graph
Reporter: Randall Hauch
Priority: Minor
Fix For: 0.4
The current InMemoryBinaryValueFactory is creating separate InMemoryBinary objects each time 'create(...)' is called. We should have a Binary factory (and corresponding Binary implementations) that caches the in-memory values using the content's hash as a key. While this may slightly degrade performance (since the hash would be computed immediately rather than lazily if needed), it could result in more efficient use of memory. Note that we'd probably have to use weak or soft references inside the Binary object to the cached values, so that as the Binary objects are garbage collected, the cache can remove the unused values.
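A minimal Java sketch of such a hash-keyed cache follows. The names `HashKeyedBinaryCache` and `CachedBinary` are hypothetical and not part of the DNA API, and the use of SHA-1 is an assumption. The sketch keeps the weak references in the cache map itself, which is a variation on holding them inside the Binary objects.

```java
import java.lang.ref.WeakReference;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical value object: immutable content plus its precomputed hash.
final class CachedBinary {
    private final String hash;
    private final byte[] content;

    CachedBinary(String hash, byte[] content) {
        this.hash = hash;
        this.content = content;
    }

    String hash() { return hash; }

    byte[] bytes() { return content.clone(); }
}

// Hypothetical factory that returns one shared instance per distinct content.
final class HashKeyedBinaryCache {
    private final Map<String, WeakReference<CachedBinary>> cache = new ConcurrentHashMap<>();

    CachedBinary create(byte[] content) {
        String key = sha1Hex(content); // the hash is computed eagerly, not lazily
        CachedBinary[] result = new CachedBinary[1];
        cache.compute(key, (k, ref) -> {
            CachedBinary existing = (ref == null) ? null : ref.get();
            if (existing != null) {
                result[0] = existing; // reuse the live cached instance
                return ref;
            }
            // No entry, or the previous instance was garbage collected.
            result[0] = new CachedBinary(k, content.clone());
            return new WeakReference<>(result[0]);
        });
        return result[0];
    }

    static String sha1Hex(byte[] content) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-1");
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest(content)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Map entries whose referent has been collected are only replaced when the same content is created again; a fuller version would probably drain a `ReferenceQueue` to purge them.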
Another potential improvement is to store large values on disk, and to memory-map them using direct buffers. This would keep large values (e.g., gigabyte-sized values, such as those used to represent very large files) out of memory.
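As a rough illustration of the memory-mapping idea, the following Java sketch maps a file read-only with `FileChannel.map`. The class and method names are hypothetical. Note that a single mapping is limited to `Integer.MAX_VALUE` bytes, so values larger than about 2 GB would need several mappings.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not part of the DNA API.
final class MappedBinarySketch {

    /**
     * Maps the whole file into a read-only direct buffer. The content is paged
     * in by the operating system on demand rather than copied onto the heap.
     */
    static MappedByteBuffer mapReadOnly(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = channel.size();
            if (size > Integer.MAX_VALUE) {
                throw new IOException("File too large for a single mapping: " + size);
            }
            // The mapping stays valid after the channel is closed.
            return channel.map(FileChannel.MapMode.READ_ONLY, 0, size);
        }
    }
}
```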
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: https://jira.jboss.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
17 years, 3 months
[JBoss JIRA] Created: (DNA-260) Add factory method to create a Binary instance from a (local) File
by Randall Hauch (JIRA)
Add factory method to create a Binary instance from a (local) File
------------------------------------------------------------------
Key: DNA-260
URL: https://jira.jboss.org/jira/browse/DNA-260
Project: DNA
Issue Type: Feature Request
Components: API, Graph
Affects Versions: 0.3
Reporter: Randall Hauch
Priority: Minor
Fix For: 0.4
A client sometimes has a file and wants to represent the content of the file with a Binary value object. Currently, the client would have to obtain an InputStream or a Reader to the file, and create the Binary value object with that. Then, when somebody wants to read the binary value, they have to create another stream/reader. Essentially, the content is streamed 1+n times, where n is the number of times the value is read. This can be very expensive when the size of the file is large (or very large, as in gigabytes).
By adding a factory method that takes a File object, the factory could optimize the behavior. Rather than creating a stream/reader to put the content into the Binary value object, the factory could just create a Binary implementation that delegates to a File. Then, the only time a stream/reader is created is when the client wants to read the value. In other words, the content is streamed only n times.
We don't yet have a BinaryFactory interface, and are currently using ValueFactory<Binary>. Therefore, the first step is to create the BinaryFactory interface and use this in the ValueFactories interface. (Most uses of the 'getBinaryFactory()' method would not need to change, since they're either just immediately calling 'create(...)' on the returned reference, or are using ValueFactory<Binary>, which would still work.)
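The file-delegating Binary described above might look roughly like the following Java sketch. `SketchBinary`, `FileBackedBinary`, and `SketchBinaryFactory` are hypothetical stand-ins, not the real Binary or BinaryFactory interfaces, whose methods are not shown in this issue.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical, simplified stand-in for the Binary value interface.
interface SketchBinary {
    long size();

    InputStream openStream() throws IOException;
}

// Delegates to the file; no content is read when the value is created.
final class FileBackedBinary implements SketchBinary {
    private final File file;

    FileBackedBinary(File file) {
        if (!file.isFile()) {
            throw new IllegalArgumentException("Not a regular file: " + file);
        }
        this.file = file;
    }

    @Override
    public long size() {
        return file.length();
    }

    @Override
    public InputStream openStream() throws IOException {
        // A new stream per read, so the content is streamed n times, not 1+n.
        return new FileInputStream(file);
    }
}

// Hypothetical factory method taking a File.
final class SketchBinaryFactory {
    SketchBinary create(File file) {
        return new FileBackedBinary(file);
    }
}
```

One consequence of this design is that the value is only as stable as the file: if the file changes or is deleted after the Binary is created, later reads will see that.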
17 years, 3 months
[JBoss JIRA] Created: (DNA-250) Creating a path from a parent path and a segment is not fast
by Randall Hauch (JIRA)
Creating a path from a parent path and a segment is not fast
------------------------------------------------------------
Key: DNA-250
URL: https://jira.jboss.org/jira/browse/DNA-250
Project: DNA
Issue Type: Bug
Components: Graph
Affects Versions: 0.3
Reporter: Randall Hauch
Assignee: Randall Hauch
Fix For: 0.3
Currently, we only have one implementation of Path, and each instance contains a list of the Path.Segment instances in the path. Although the actual Path.Segment instances are shared when creating a path using a parent path, the list itself is not shared, so a new list is allocated for each Path. This could be more efficient, especially since creating a path from a parent path and a name is perhaps one of the most frequently used methods.
We probably need an implementation of Path that contains a name plus a reference to a parent. This would have a smaller memory footprint as well as be faster to create.
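A minimal Java sketch of a parent-linked path follows. `ParentLinkedPath` is a hypothetical name, and a plain String stands in for Path.Segment. Creating a child allocates one small object and copies nothing.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical parent-linked path; not the DNA Path interface.
final class ParentLinkedPath {
    static final ParentLinkedPath ROOT = new ParentLinkedPath(null, null, 0);

    private final ParentLinkedPath parent; // null only for the root
    private final String segment;          // null only for the root
    private final int size;                // number of segments, cached

    private ParentLinkedPath(ParentLinkedPath parent, String segment, int size) {
        this.parent = parent;
        this.segment = segment;
        this.size = size;
    }

    /** Constant-time creation: the parent is shared, not copied. */
    ParentLinkedPath child(String segment) {
        return new ParentLinkedPath(this, segment, size + 1);
    }

    ParentLinkedPath parent() { return parent; }

    int size() { return size; }

    @Override
    public String toString() {
        if (parent == null) return "/";
        // Walk up to the root, pushing so the topmost segment ends up first.
        Deque<String> segments = new ArrayDeque<>();
        for (ParentLinkedPath p = this; p.parent != null; p = p.parent) {
            segments.push(p.segment);
        }
        return "/" + String.join("/", segments);
    }
}
```

The trade-off is that operations needing all segments, such as `toString()` here, walk the parent chain and cost time proportional to the depth.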
17 years, 4 months
[JBoss JIRA] Commented: (DNA-40) Persistant storage for information not stored in other repository sources
by Randall Hauch (JIRA)
[ https://jira.jboss.org/jira/browse/DNA-40?page=com.atlassian.jira.plugin.... ]
Randall Hauch commented on DNA-40:
----------------------------------
Related to deleting a subgraph, per Section 8.3.7.4 of the JSR-170 spec: "When a node is removed, a NODE_REMOVED event must be generated for the node on which the remove was called. Additionally, an implementation should also generate a NODE_REMOVED or PROPERTY_REMOVED (as appropriate) for each item in the removed subtree."
This means that the "mark" portion of the deletion process will probably have to walk the deleted structure anyway. The "sweep" could therefore also be performed during the delete request, and it may actually be a pretty straightforward series of SQL DELETE commands.
> Persistant storage for information not stored in other repository sources
> -------------------------------------------------------------------------
>
> Key: DNA-40
> URL: https://jira.jboss.org/jira/browse/DNA-40
> Project: DNA
> Issue Type: Feature Request
> Components: Connectors
> Reporter: Randall Hauch
> Assignee: Randall Hauch
> Fix For: 0.4
>
>
> Create a federation connector that is able to manage information in a relational database using a DNA-defined schema. This would enable the persistent storage of information that isn't being managed by other connectors.
> Requirements include efficiently storing and accessing large numbers of nodes, large property values (e.g., Binary values), and large numbers of children on nodes.
17 years, 5 months
[JBoss JIRA] Commented: (DNA-40) Persistant storage for information not stored in other repository sources
by Randall Hauch (JIRA)
[ https://jira.jboss.org/jira/browse/DNA-40?page=com.atlassian.jira.plugin.... ]
Randall Hauch commented on DNA-40:
----------------------------------
Copying a subgraph could also be expensive, since that would require navigating the subgraph and creating the copy of each node. One interesting and challenging aspect is translating references between nodes contained in the original subgraph into references that correctly point to the new subgraph. Like the delete, this could benefit from a 2-pass approach. The first pass would involve copying the structure (ChildEntity records) and allow us to build up the translation map of original-UUID-to-copy-UUID. Then, the second pass would involve copying the properties and translating any property values (references, paths).
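The two-pass copy can be sketched in Java against a simplified in-memory model. `SketchNode` and `TwoPassCopy` are hypothetical and stand in for the ChildEntity and property records; a property value of type UUID is treated as a reference, and path translation is omitted.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Hypothetical in-memory node; UUID-typed property values act as references.
final class SketchNode {
    final UUID id;
    final List<SketchNode> children = new ArrayList<>();
    final Map<String, Object> properties = new HashMap<>();

    SketchNode(UUID id) { this.id = id; }
}

final class TwoPassCopy {

    static SketchNode copy(SketchNode root) {
        Map<UUID, UUID> originalToCopy = new HashMap<>();
        Map<SketchNode, SketchNode> nodePairs = new IdentityHashMap<>();

        // Pass 1: copy the structure only and build the UUID translation map.
        SketchNode copyRoot = copyStructure(root, originalToCopy, nodePairs);

        // Pass 2: copy the properties, translating references that point
        // inside the original subgraph. Other references are left unchanged.
        for (Map.Entry<SketchNode, SketchNode> pair : nodePairs.entrySet()) {
            for (Map.Entry<String, Object> property : pair.getKey().properties.entrySet()) {
                Object value = property.getValue();
                if (value instanceof UUID && originalToCopy.containsKey(value)) {
                    value = originalToCopy.get(value);
                }
                pair.getValue().properties.put(property.getKey(), value);
            }
        }
        return copyRoot;
    }

    private static SketchNode copyStructure(SketchNode original,
                                            Map<UUID, UUID> originalToCopy,
                                            Map<SketchNode, SketchNode> nodePairs) {
        SketchNode copy = new SketchNode(UUID.randomUUID());
        originalToCopy.put(original.id, copy.id);
        nodePairs.put(original, copy);
        for (SketchNode child : original.children) {
            copy.children.add(copyStructure(child, originalToCopy, nodePairs));
        }
        return copy;
    }
}
```

The second pass has to wait for the first to finish because a reference may point at a node that appears later in the traversal.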
Note that the JSR-170 spec says the following in Section 8.3.7.6: "When a subtree is copied, an implementation must generate a single NODE_ADDED event reflecting the addition of the root of the copied subtree at the destination location. Additionally, an implementation should generate appropriate events for each resulting node and property addition in the copied subtree."
17 years, 5 months
[JBoss JIRA] Commented: (DNA-40) Persistant storage for information not stored in other repository sources
by Randall Hauch (JIRA)
[ https://jira.jboss.org/jira/browse/DNA-40?page=com.atlassian.jira.plugin.... ]
Randall Hauch commented on DNA-40:
----------------------------------
Deleting a subgraph in this connector could be expensive, since no records contain the path and thus deleting requires walking the subgraph and deleting the records for the individual nodes.
However, an approach that will probably work well is "mark and sweep", where any node(s) to be deleted are simply marked as "deleted" but left until a later request/process can "sweep" the deleted records from the database. This could be pretty efficient, since it may be possible to only mark the nodes that are explicitly deleted, as long as all other operations exclude any records that are marked for deletion (or that are descendants of marked nodes).
We may need to add a new request type that represents a "compaction" or "cleanup" or "maintenance" operation, which could be called periodically by a separate process/thread. Ideally, each connector would implement this new request in such a way that it doesn't block other requests.
To implement the "mark and sweep" approach in this connector, I added a "deleted" attribute (boolean type, null if false) to ChildEntity and added an "and child.deleted is null" criterion to the where clauses of all the ChildEntity named queries. Since these ChildEntity named queries are used to verify the path/UUID combination for Locations used in all requests, the descendant nodes don't need to be marked for deletion (if an ancestor node is marked for deletion, it will not be found as the path is verified, resulting in a PathNotFoundException).
Since we're using UUIDs as the primary key for each node (all the various records), a node that is marked for deletion can still exist in the database without the chance of a clash.
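The approach can be illustrated with a small in-memory Java sketch. `MarkAndSweepStore` is hypothetical: it stands in for the ChildEntity table and its named queries, and it returns null where the connector would raise a PathNotFoundException.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.UUID;

// Hypothetical in-memory stand-in for the ChildEntity records.
final class MarkAndSweepStore {
    private static final class ChildRecord {
        final UUID parent;
        final String name;
        Boolean deleted; // null means "not deleted", as in the description above

        ChildRecord(UUID parent, String name) {
            this.parent = parent;
            this.name = name;
        }
    }

    final UUID rootId = UUID.randomUUID();
    private final Map<UUID, ChildRecord> records = new HashMap<>();

    UUID addChild(UUID parent, String name) {
        UUID id = UUID.randomUUID();
        records.put(id, new ChildRecord(parent, name));
        return id;
    }

    /** Mark: only the explicitly deleted node is touched. */
    void markDeleted(UUID id) {
        records.get(id).deleted = Boolean.TRUE;
    }

    /** Resolves a path from the root, ignoring marked records; null if not found. */
    UUID resolve(String... segments) {
        UUID current = rootId;
        for (String segment : segments) {
            UUID next = null;
            for (Map.Entry<UUID, ChildRecord> entry : records.entrySet()) {
                ChildRecord r = entry.getValue();
                if (r.deleted == null && r.parent.equals(current) && r.name.equals(segment)) {
                    next = entry.getKey();
                    break;
                }
            }
            if (next == null) return null; // a marked ancestor hides its descendants
            current = next;
        }
        return current;
    }

    /** Sweep: removes marked records and records orphaned by earlier removals. */
    int sweep() {
        int removed = 0;
        boolean changed;
        do {
            changed = false;
            Iterator<Map.Entry<UUID, ChildRecord>> it = records.entrySet().iterator();
            while (it.hasNext()) {
                ChildRecord r = it.next().getValue();
                boolean orphan = !r.parent.equals(rootId) && !records.containsKey(r.parent);
                if (r.deleted != null || orphan) {
                    it.remove();
                    removed++;
                    changed = true;
                }
            }
        } while (changed);
        return removed;
    }

    int size() { return records.size(); }
}
```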
17 years, 5 months
[JBoss JIRA] Commented: (DNA-40) Persistant storage for information not stored in other repository sources
by Randall Hauch (JIRA)
[ https://jira.jboss.org/jira/browse/DNA-40?page=com.atlassian.jira.plugin.... ]
Randall Hauch commented on DNA-40:
----------------------------------
Improved functionality and testing, including some performance testing of creating 100s and 1000s of nodes. (These tests are commented out due to the time required to run them.)
Also made minor improvements to the Graph API. Specifically, added the ability to get the UUID out of a Location, changed the CreateNodeRequest.toString() to be more readable, and corrected the interface returned from Graph.create(...) and Graph.Batch.create(...) methods (previously required two .and() calls).
17 years, 5 months