[infinispan-issues] [JBoss JIRA] (ISPN-1103) Soft schema-based storage

Randall Hauch (Commented) (JIRA) jira-events at lists.jboss.org
Thu Oct 13 10:11:16 EDT 2011


    [ https://issues.jboss.org/browse/ISPN-1103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12634598#comment-12634598 ] 

Randall Hauch commented on ISPN-1103:
-------------------------------------

The design has been evolving, and I've been pushing (overwriting) new versions of the branch. Here's a summary of the basic design:

The primary goal is to enable storing dynamically-structured values with metadata, and to also enable describing the structure of each value (and metadata) using a schema-based approach. [JSON|http://json.org] documents provide an excellent way to offer structure that is extremely flexible, while [JSON Schema|http://json-schema.org/] offers a way to define the structure of JSON documents in a way that can be easily validated. (Note that a JSON Schema is just a JSON document that conforms to the JSON meta-schema, which is rich enough to be self-describing. It's actually a very nice specification.)
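
As a rough illustration of what schema-based validation of an in-memory document involves, here is a minimal sketch. It is not the validator used in this branch: the class name TinySchemaCheck is hypothetical, documents are assumed to be plain nested java.util.Map instances, and only three JSON Schema keywords are handled ("type", "properties", and "required" in the array form used by later JSON Schema drafts).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Infinispan: checks an in-memory document
// (nested Maps) against a tiny subset of JSON Schema.
final class TinySchemaCheck {

    // Returns human-readable problems; an empty list means the value conforms.
    static List<String> validate(Object value, Map<String, Object> schema, String path) {
        List<String> problems = new ArrayList<>();
        Object type = schema.get("type");
        if (type instanceof String && !matchesType(value, (String) type)) {
            problems.add(path + ": expected " + type);
            return problems; // do not descend into a value of the wrong type
        }
        if (value instanceof Map) {
            Map<?, ?> doc = (Map<?, ?>) value;
            Object required = schema.get("required");
            if (required instanceof List) {
                for (Object name : (List<?>) required) {
                    if (!doc.containsKey(name)) {
                        problems.add(path + ": missing required property '" + name + "'");
                    }
                }
            }
            Object properties = schema.get("properties");
            if (properties instanceof Map) {
                for (Map.Entry<?, ?> e : ((Map<?, ?>) properties).entrySet()) {
                    if (doc.containsKey(e.getKey())) {
                        @SuppressWarnings("unchecked")
                        Map<String, Object> sub = (Map<String, Object>) e.getValue();
                        problems.addAll(validate(doc.get(e.getKey()), sub, path + "/" + e.getKey()));
                    }
                }
            }
        }
        return problems;
    }

    private static boolean matchesType(Object value, String type) {
        switch (type) {
            case "object":  return value instanceof Map;
            case "array":   return value instanceof List;
            case "string":  return value instanceof String;
            case "number":  return value instanceof Number;
            case "boolean": return value instanceof Boolean;
            case "null":    return value == null;
            default:        return true; // unrecognized types are accepted in this sketch
        }
    }

    public static void main(String[] args) {
        Map<String, Object> titleSchema = new LinkedHashMap<>();
        titleSchema.put("type", "string");
        Map<String, Object> properties = new LinkedHashMap<>();
        properties.put("title", titleSchema);
        Map<String, Object> schema = new LinkedHashMap<>();
        schema.put("type", "object");
        schema.put("required", Arrays.asList("title"));
        schema.put("properties", properties);

        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("title", 42); // wrong type on purpose
        System.out.println(validate(doc, schema, ""));
    }
}
```

Because the schema is itself just a document, the same nested-map representation serves for both the schema and the values it describes.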

Manik originally suggested storing the metadata and value (henceforth referred to as 'content') as strings, but doing so would mean that in order to access any information within the metadata or content, the JSON strings would first need to be parsed into an in-memory representation. Likewise, every modification would require parsing the JSON string, changing the resulting document, and writing it back out as a string before storing it. The cost of this repeated parsing and writing on every access would become prohibitive.

Since Infinispan is essentially a large heap of memory, it makes far more sense to represent the content and metadata as _in-memory documents_, as long as the in-memory representation is compatible with JSON, is easy to use, and can be validated using JSON Schemas. Additionally, if the representation also supported [BSON|http://bsonspec.org] data types (e.g., binary values, UUIDs, dates, regular expressions, etc.), more types of user content could be supported (including just raw binary data). These in-memory documents could at any time be read from or written to JSON or BSON formats. Having the schematic values be _delta-aware_ with _fine-grained locking_ (see ISPN-1115) would provide significant advantages w/r/t performance and concurrency. (Note that efficient delta-aware support means that the schematic value can capture the changes made to the documents by the client application and use those changes as the delta, rather than having to compare the changed document to a prior version to compute the changes.)
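
To make the "capture the changes as the delta" idea concrete, here is a minimal sketch. It is not the implementation in the branch and does not use Infinispan's own delta classes: RecordingDocument, Op, takeDelta, and snapshot are hypothetical names, and documents are assumed to be flat maps of fields.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: an editable document that records each mutation as an
// operation, so the delta never has to be computed by diffing two versions.
final class RecordingDocument {

    // One recorded change: either set a field to a value or remove the field.
    static final class Op {
        final String field;
        final Object value;
        final boolean remove;

        Op(String field, Object value, boolean remove) {
            this.field = field;
            this.value = value;
            this.remove = remove;
        }

        // Replays this change on another copy of the document.
        void applyTo(Map<String, Object> target) {
            if (remove) {
                target.remove(field);
            } else {
                target.put(field, value);
            }
        }
    }

    private final Map<String, Object> fields = new LinkedHashMap<>();
    private final List<Op> pending = new ArrayList<>();

    void set(String field, Object value) {
        fields.put(field, value);
        pending.add(new Op(field, value, false));
    }

    void remove(String field) {
        fields.remove(field);
        pending.add(new Op(field, null, true));
    }

    // Hands over the changes made since the last call and starts a fresh recording.
    List<Op> takeDelta() {
        List<Op> delta = new ArrayList<>(pending);
        pending.clear();
        return delta;
    }

    // Copy of the current state, e.g. for seeding a replica.
    Map<String, Object> snapshot() {
        return new LinkedHashMap<>(fields);
    }

    public static void main(String[] args) {
        RecordingDocument doc = new RecordingDocument();
        doc.set("title", "draft");
        doc.set("pages", 12);
        doc.takeDelta(); // pretend the initial state has already been shipped

        Map<String, Object> replica = doc.snapshot();
        doc.set("title", "final");
        doc.remove("pages");

        List<Op> delta = doc.takeDelta();
        for (Op op : delta) {
            op.applyTo(replica);
        }
        System.out.println(delta.size() + " ops, replica = " + replica);
        // prints: 2 ops, replica = {title=final}
    }
}
```

Only the two recorded operations need to travel to the replica, regardless of how large the document is.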

Using an in-memory representation also means that the content and metadata need not be stored as separate objects, but could instead be represented by a single document that is conceptually:

{code}
{
   "metadata" : {
      /* metadata as a nested document */
   },
   "content" : /* user's content, as a nested document or binary value */ 
}
{code}

This is the approach taken by the current design. The primary packages are:

* org.infinispan.schematic
* org.infinispan.schematic.document
* org.infinispan.schematic.internal.*

The first two packages contain the public API, whereas all implementation-specific classes are contained within the "internal" packages.

The primary API interfaces are:

* SchematicDb - similar to Cache but tailored to make it easy for users to store a content document (or binary value) with a metadata document. Each SchematicDb has a JSON Schema library and provides a map-reduce-based validation mechanism. Internally this uses a Cache<String,SchematicEntry>.
* SchematicEntry - the value actually stored within Infinispan, which contains a content object (either a Document or a Binary value) and a metadata Document. There are methods for getting mutable interfaces to the content and metadata documents. Since tracking the MIME type of the content is likely very common, the SchematicEntry interface provides methods for getting and setting the MIME type (which is actually stored in the metadata).
* Document - an immutable interface to an in-memory document
* EditableDocument - a mutable interface to an in-memory document
* Json - utility class for parsing JSON formatted streams/files into Document instances, and for writing Document instances as JSON
* Bson - utility class for parsing BSON formatted streams/files into Document instances, and for writing Document instances as BSON
* JsonSchema - utility class for working with JSON Schemas
* Various interfaces for representing JSON/BSON values: Array, Binary, Symbol, Timestamp, Code, CodeWithScope
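
To show how an immutable view, a mutable view, and a single entry holding both metadata and content can fit together, here is a minimal sketch. Every name and signature in it (ReadOnlyDoc, EditableDoc, MapDoc, EntrySketch, and the "contentType" field) is invented for illustration and should not be read as the org.infinispan.schematic API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// All names and signatures below are invented for illustration only.

// Read-only view of an in-memory document.
interface ReadOnlyDoc {
    Object get(String field);
    boolean containsField(String field);
}

// Mutable view of the same document.
interface EditableDoc extends ReadOnlyDoc {
    EditableDoc set(String field, Object value);
}

// Simple map-backed document that can be handed out through either view.
final class MapDoc implements EditableDoc {
    private final Map<String, Object> fields = new LinkedHashMap<>();

    public Object get(String field) {
        return fields.get(field);
    }

    public boolean containsField(String field) {
        return fields.containsKey(field);
    }

    public EditableDoc set(String field, Object value) {
        fields.put(field, value);
        return this;
    }
}

// One stored value holding both parts: { "metadata": {...}, "content": ... }.
final class EntrySketch {
    private final MapDoc root = new MapDoc();

    EntrySketch(Object content, String mimeType) {
        root.set("metadata", new MapDoc().set("contentType", mimeType));
        root.set("content", content); // a nested document, or a byte[] for binary content
    }

    ReadOnlyDoc metadata() {
        return (ReadOnlyDoc) root.get("metadata");
    }

    EditableDoc editMetadata() {
        return (EditableDoc) root.get("metadata");
    }

    Object content() {
        return root.get("content");
    }

    // Convenience accessors: the MIME type lives inside the metadata document.
    String mimeType() {
        return (String) metadata().get("contentType");
    }

    void mimeType(String type) {
        editMetadata().set("contentType", type);
    }
}

final class EntryDemo {
    public static void main(String[] args) {
        EntrySketch entry = new EntrySketch(new byte[] {1, 2, 3}, "application/octet-stream");
        entry.mimeType("image/png");
        System.out.println(entry.mimeType());
    }
}
```

Keeping the MIME type inside the metadata document means the convenience accessors add no extra state to the entry.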

The current status is that this works in LOCAL mode, but additional work is required before DISTRIBUTED and REPLICATED modes will work correctly with delta-aware values and [fine-grained locking|ISPN-1115].

As always, feedback is appreciated.
                
> Soft schema-based storage
> -------------------------
>
>                 Key: ISPN-1103
>                 URL: https://issues.jboss.org/browse/ISPN-1103
>             Project: Infinispan
>          Issue Type: Feature Request
>          Components: Core API
>            Reporter: Manik Surtani
>            Assignee: Randall Hauch
>            Priority: Critical
>             Fix For: 5.1.0.BETA2, 5.1.0.FINAL
>
>
> This JIRA is about storing metadata alongside values.  Perhaps encapsulating values as SchematicValues, which could be described as:
> {code}
>   class SchematicValue {
>     String jsonMetadata;
>     String jsonObject;
>   }
> {code}
> Metadata would allow for a few interesting features:
> * Extraction of lifespan and timestamp data if manipulated over a remote protocol (REST, HotRod, etc)
> * Content type for REST responses
> * Timestamps and SHA-1 hashes, useful for HTTP headers (e.g., ETag, Cache-Control, etc.)
> * Validation information (may not be processed by Infinispan, but can be used by client libs)
> * Classloader/marshaller/classdef version info
> * General structure of the information stored
> * Reference to the schema for this document
> * Storage of older versions
