From dna-commits at lists.jboss.org Wed Apr 30 13:48:50 2008
Content-Type: multipart/mixed; boundary="===============1672541724339409897=="
MIME-Version: 1.0
From: dna-commits at lists.jboss.org
To: dna-commits at lists.jboss.org
Subject: [dna-commits] DNA SVN: r112 - trunk/docs/getting_started/en.
Date: Wed, 30 Apr 2008 13:48:50 -0400
Message-ID:
--===============1672541724339409897==
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable

Author: rhauch
Date: 2008-04-30 13:48:50 -0400 (Wed, 30 Apr 2008)
New Revision: 112

Added:
trunk/docs/getting_started/en/Author_Group.xml
trunk/docs/getting_started/en/Legal_Notice.xml
Modified:
trunk/docs/getting_started/en/master.xml
Log:
Restructured into a single docbook file, using an older version of the JDocBook plugin

Added: trunk/docs/getting_started/en/Author_Group.xml
===================================================================
--- trunk/docs/getting_started/en/Author_Group.xml (rev 0)
+++ trunk/docs/getting_started/en/Author_Group.xml 2008-04-30 17:48:50 UTC (rev 112)
@@ -0,0 +1,6 @@
+ + + + Randall Hauch +
\ No newline at end of file
Added: trunk/docs/getting_started/en/Legal_Notice.xml
===================================================================
--- trunk/docs/getting_started/en/Legal_Notice.xml (rev 0)
+++ trunk/docs/getting_started/en/Legal_Notice.xml 2008-04-30 17:48:50 UTC (rev 112)
@@ -0,0 +1,16 @@
+ + + + Legal Notice + +
+ 1801 Varsity Drive + Raleigh, NC 27606-2072 USA + Phone: +1 919 754 3700 + Phone: 888 733 4281 + Fax: +1 919 754 3701 + PO Box 13588, Research Triangle Park, NC 27709 USA +
+
+
+
\ No newline at end of file
Modified: trunk/docs/getting_started/en/master.xml
===================================================================
--- trunk/docs/getting_started/en/master.xml 2008-04-28 19:55:31 UTC (rev 111)
+++ trunk/docs/getting_started/en/master.xml 2008-04-30 17:48:50 UTC (rev 112)
@@ -1,41 +1,26 @@
- - - + + +]> + JBoss DNA Getting Started Guide - 0.1 + 0.1 + 1 + + - - Table of Contents - What this book covers The goal of this book is to help you learn about JBoss DNA and how you can use it in your own applications to get the most out of your JCR repositories. - The first part of the book provides some background on content repositories and the Java Content Repository (JCR) API. - Content repositories are an important aspect of JBoss DNA, so it's probably worth reading even if you're already familiar - with these technologies. Besides, it's really not that long. - The If you're already familiar with these technologies, you can probably skip this section. However, most readers will - probably want to - and then introduces JBoss DNA project and its relationship to JCR. The second part - - Part I introduces JBoss DNAFeature Analysis and Design - Part 1 + The first part of the book starts out with an introduction to content repositories and an overview of the JCR API, both of which are + important aspects of JBoss DNA. This is followed by an overview of the JBoss DNA project, its architecture, and a basic + roadmap for what's coming next. + The next part of the book covers how to download and build the examples, how to use JBoss DNA with + existing repositories, and how to build and use custom sequencers. - This document introduces the JBoss DNA project. - - - - The build process is simplified and standardized.
Just follow the instructions in this guide to setup your - docs - directory and copy a very simple - pom.xml - file. - - - If you have any questions or comments, please feel free to contact JBoss DNA's user mailing list
@@ -51,10 +36,518 @@
same thing. - - - - - - + + Introduction + There are a lot of choices for how applications can store information persistently so that it can be accessed at a + later time and by other processes. The challenge developers face is to use an approach that most closely matches the needs of + their application. This choice becomes more important as developers choose to focus their efforts on the application-specific + logic, delegating much of the responsibility for persistence to libraries and frameworks. + + Perhaps one of the easiest techniques is to simply store information in + files + . The Java language makes working with files relatively easy, but Java really doesn't provide many bells and whistles. So + using files is an easy choice when the information is either not complicated (for example, property files) or when users may + need to read or change the information outside of the application (for example, log files or configuration files). But using + files to persist information becomes more difficult as the information becomes more complex, as its volume increases, + or if it needs to be accessed by multiple processes. For these situations, other techniques are often a better choice. + + + Another technique built into the Java language is + Java serialization + , which is capable of persisting the state of an object graph so that it can be read back in at a later time. However, Java + serialization can quickly become tricky if the classes are changed, so it is usually most beneficial when the information is + persisted for a very short period of time. For example, serialization is sometimes used to send an object graph from one + process to another.
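The serialization approach described above can be illustrated with a minimal, self-contained sketch that uses only the standard `java.io` classes. The `Document` and `Author` classes are hypothetical examples and are not part of JBoss DNA. Declaring `serialVersionUID` explicitly is what allows compatible class changes (such as adding a field) to still read previously serialized data; incompatible changes make deserialization fail, which is one reason serialization suits short-lived data best.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

public class SerializationRoundTrip {

    // Hypothetical object graph: a document that references its authors.
    static class Author implements Serializable {
        // Explicit version ID so compatible class changes can still read old data.
        private static final long serialVersionUID = 1L;
        final String name;
        Author(String name) { this.name = name; }
    }

    static class Document implements Serializable {
        private static final long serialVersionUID = 1L;
        final String title;
        final List<Author> authors = new ArrayList<>();
        Document(String title) { this.title = title; }
    }

    // Writes the whole graph reachable from 'doc' into a byte array.
    static byte[] serialize(Document doc) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(doc);
        }
        return bytes.toByteArray();
    }

    // Reads the graph back; this throws if the classes changed incompatibly.
    static Document deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (Document) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Document doc = new Document("Getting Started");
        doc.authors.add(new Author("Randall Hauch"));
        Document copy = deserialize(serialize(doc));
        System.out.println(copy.title + " by " + copy.authors.get(0).name);
    }
}
```

Running `main` writes the graph to a byte array and reads it back into a new copy with the same values; sending those bytes over a socket instead of keeping them in memory is the process-to-process case mentioned above.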
+ + + One of the more popular persistence technologies is the + relational database + . Relational database management systems have been around for decades and are very capable. The Java Database Connectivity + (JDBC) API provides a standard interface for connecting to and interacting with relational databases. However, it is a + low-level API that requires a lot of code to use correctly, and it still doesn't abstract away the DBMS-specific SQL + grammar. Also, working with relational data in an object-oriented language can feel somewhat unnatural, so many developers + map this data to classes that fit much more cleanly into their application. The problem is that manually creating this + mapping layer requires a lot of repetitive and non-trivial JDBC code. + + + Object-relational mapping + libraries automate the creation of this mapping layer and result in far less code that is much more maintainable, often + with performance as good as (if not better than) handwritten JDBC code. The new + Java Persistence API (JPA) + provides a standard mechanism for defining the mappings (through annotations) and working with these entity objects. Several + commercial and open-source libraries implement JPA, and some even offer additional capabilities and features that go beyond + JPA. For example, + Hibernate + is one of the most feature-rich JPA implementations and offers object caching, statement caching, extra association + mappings, and other features that help to improve performance and usefulness. + + + While relational databases and JPA are solutions that work for many applications, they become more limited in cases when the + information structure is highly flexible, is not known a priori, or is subject to frequent change and customization. In + these situations, + content repositories + may offer a better choice for persistence.
Content repositories are almost a hybrid between relational databases and file + systems, and typically provide other capabilities as well, including versioning, indexing, search, access control, + transactions, and observation. Because of this, content repositories are used by content management systems (CMS), document + management systems (DMS), and other applications that manage electronic files (e.g., documents, images, multimedia, web + content, etc.) and metadata associated with them (e.g., author, date, status, security information, etc.). The + Content Repository for Java technology API + provides a standard Java API for working with content repositories. Abbreviated "JCR", this API was developed as part of the + Java Community Process under + JSR-170 + and is being revised under + JSR-283 + . + + + The + JBoss DNA project + is building the tools and services that surround content repositories. Nearly all of these capabilities are to be hidden + below the JCR API and involve automated processing of the information in the repository. Thus, JBoss DNA can add value to + existing repository implementations. For example, JCR repositories offer the ability to upload files into the repository and + have the file content indexed for search purposes. JBoss DNA also defines a library for sequencing that content to extract + meaningful information and store it in the repository, where it can then be searched, accessed, and analyzed using the JCR + API. + + JBoss DNA is building other features as well. One goal of JBoss DNA is to create federated repositories that + dynamically merge the information from multiple databases, services, applications, and other JCR repositories. Another is to + create customized views based upon the type of data and the role of the user that is accessing the data. And yet another is + to create a RESTful API to allow the JCR content to be accessed easily by other applications written in other languages.
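The sequencing idea described above (upload a file, then extract meaningful information from its content) can be sketched in plain Java under clearly stated assumptions: the `Sequencer` interface and the `PngSequencer` class below are hypothetical and are not the JBoss DNA API, and instead of writing nodes back through the JCR API the sketch simply returns the extracted properties as a map. It reads the width, height, and bit depth from the IHDR chunk that the PNG format requires at the start of every file.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

public class SequencerSketch {

    // Hypothetical contract (not the JBoss DNA API): turn raw file content
    // into name/value properties that could be stored next to the file.
    interface Sequencer {
        Map<String, Object> sequence(byte[] content);
    }

    // Extracts basic image metadata from the IHDR chunk of a PNG file.
    static class PngSequencer implements Sequencer {
        private static final byte[] SIGNATURE = {
            (byte) 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A
        };

        @Override
        public Map<String, Object> sequence(byte[] content) {
            Map<String, Object> props = new LinkedHashMap<>();
            // Layout: 8-byte signature, then IHDR (4-byte length, 4-byte type,
            // 4-byte width, 4-byte height, 1-byte bit depth, ...).
            if (content.length < 26 || !Arrays.equals(Arrays.copyOf(content, 8), SIGNATURE)) {
                return props; // not a PNG: nothing to extract
            }
            String chunkType = new String(content, 12, 4, StandardCharsets.US_ASCII);
            if (!"IHDR".equals(chunkType)) {
                return props;
            }
            ByteBuffer buf = ByteBuffer.wrap(content); // big-endian by default, as PNG requires
            props.put("formatName", "PNG");
            props.put("width", buf.getInt(16));
            props.put("height", buf.getInt(20));
            props.put("bitDepth", content[24] & 0xFF);
            return props;
        }
    }

    public static void main(String[] args) {
        // Hand-built PNG header for a 640x480 image with 8 bits per channel;
        // the remaining header fields and the CRC are left as zero bytes.
        ByteBuffer png = ByteBuffer.allocate(33);
        png.put(new byte[] {(byte) 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A});
        png.putInt(13).put("IHDR".getBytes(StandardCharsets.US_ASCII));
        png.putInt(640).putInt(480).put((byte) 8).put((byte) 2);
        System.out.println(new PngSequencer().sequence(png.array()));
    }
}
```

In a real deployment the extracted properties would be saved into the repository alongside the uploaded file so that they can be searched and accessed through the JCR API; returning a map just keeps the sketch self-contained.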
+ + + The + next chapter + in this book goes into more detail about JBoss DNA and its architecture, the different components, what's available now, and + what's coming in future releases. + Chapter 3 + then provides instructions for downloading and compiling the sequencer examples for the current release. + Chapter 4 + walks through these examples, while + Chapter 5 + goes over how to create custom sequencers. Finally, + Chapter 6 + wraps things up. + + + + JBoss DNA + + Sequencers + The current JBoss DNA release contains a sequencing framework that is designed to sequence data (typically files) + stored in a JCR repository to automatically extract meaningful and useful information. This additional information is then + saved back into the repository, where it can be accessed and used. + In other words, you can just upload various kinds of files into a JCR repository, and DNA automatically processes + those files to extract meaningful structured information. For example, load DDL files into the repository, and let + sequencers extract the structure and metadata for the database schema. Load Hibernate configuration files into the + repository, and let sequencers extract the schema and mapping information. Load Java source into the repository, and let + sequencers extract the class structure, JavaDoc, and annotations. Load a PNG, JPEG, or other image into the repository, and + let sequencers extract the metadata from the image and save it in the repository. The same goes for XSDs, WSDL, WS policies, + UML, MetaMatrix models, etc. + + JBoss DNA sequencers sit on top of existing JCR repositories (including federated repositories); they basically extract + more useful information from what's already stored in the repository, and they use the existing JCR versioning system. Each + sequencer typically processes a single kind of file format.
The following sequencer is included in JBoss DNA: + + + + Image sequencer - A sequencer that processes the binary content of an image file, extracts the metadata for the + image, and then writes that image metadata to the repository. It gets the file format, image resolution, number of bits + per pixel, and optionally the number of images, comments, and physical resolution from JPEG, GIF, BMP, PCX, PNG, IFF, RAS, + PBM, PGM, PPM, and PSD files. (This sequencer may be improved in the future to also extract EXIF metadata from JPEG + files; see + DNA-26 + .) + + + + + + As the community develops additional sequencers, they will also be included in JBoss DNA. Some of those that have been + identified as being useful include: + + + + XML Schema Definition (XSD) Sequencer - Process XSD files and extract the various elements, attributes, complex types, + simple types, and groups. (See + DNA-32 + ) + + + + + Web Services Description Language (WSDL) Sequencer - Process WSDL files and extract the services, bindings, ports, + operations, parameters, and other information. (See + DNA-33 + ) + + + + + Hibernate File Sequencer - Process Hibernate configuration (cfg.xml) and mapping (hbm.xml) files to extract the + configuration and mapping information. (See + DNA-61 + ) + + + + + XML Metadata Interchange (XMI) Sequencer - Process XMI documents that contain UML models or models using another + metamodel, extracting the model structure into the repository. (See + DNA-31 + ) + + + + + ZIP Archive Sequencer - Process ZIP archive files to extract (explode) the contents into the repository. (See + DNA-63 + ) + + + + + Java Archive (JAR) Sequencer - Process JAR files to extract (explode) the contents into classes and file + resources. (See + DNA-64 + ) + + + + + Java Class File Sequencer - Process Java class files (bytecode) to extract the class structure (including + annotations) into the repository.
(See + DNA-62 + ) + + + + + Java Source File Sequencer - Process Java source files to extract the class structure (including + annotations) into the repository. (See + DNA-51 + ) + + + + + PDF Sequencer - Process PDF files to extract the document metadata, including the table of contents. (See + DNA-50 + ) + + + + + Maven 2 POM Sequencer - Process Maven 2 Project Object Model (POM) files to extract the project information, + dependencies, plugins, and other content. (See + DNA-24 + ) + + + + + Data Definition Language (DDL) Sequencer - Process various dialects of DDL, including those from Oracle, SQL Server, + MySQL, PostgreSQL, and others. This may need to be split into a separate sequencer for each dialect. (See + DNA-26 + ) + + + + + MP3 and MP4 Sequencer - Process MP3 and MP4 audio files to extract the name of the song, artist, album, track + number, and other metadata. (See + DNA-30 + ) + + + + + + The + examples + in this book go into more detail about how sequencers are managed and used, and + Chapter 5 goes into detail about how to + write custom sequencers. + + + + Federation + There is a lot of information stored in many different places: databases, repositories, SCM systems, + registries, file systems, services, etc. The purpose of the federation engine is to allow applications to use the JCR API + to access that information as if it were all stored in a single JCR repository, while actually leaving the information where + it is. + Why not just move the information into a JCR repository? Most likely there are existing applications that rely upon + that information being where it is. If we were to move it, then all those applications would break, or they'd have to be + changed to use JCR. If the information is being used, the most practical thing is to leave it where it is. + + Then why not just copy the information into a JCR repository?
Actually, there are times when it's perfectly reasonable to + make a copy of the data. Perhaps the system managing the existing information cannot handle the additional load of more + clients. Or, perhaps the information doesn't change, or it does change and we want snapshots that don't change. But more + likely, the data + does + change. So if applications are to use the most current information and we make copies of the data, we have to keep the + copies synchronized with the master. That's generally a lot of work. + + The JBoss DNA federation engine lets us leave the information where it is, yet lets client applications use the JCR + API to access all the information without caring where the information really exists. If the underlying information + changes, client applications using JCR observation will be notified of the changes. If a JBoss DNA federated repository is + configured to allow updates, client applications can change the information in the repository and JBoss DNA will propagate + those changes down to the original source. + + Connectors + + The JBoss DNA federation engine will use connectors to interact with different information sources to get at the content + in those systems. Some ideas for connectors include: + + + JCR Repository Connector - Connect to and interact with other JCR repositories. + + + File System Connector - Expose the files and directories on a file system through JCR. + + + Maven 2 Repository Connector - Access and expose the contents of a Maven 2 repository (either on the + local file system or via HTTP) through JCR. + + + JDBC Metadata Connector - Connect to relational databases via JDBC and expose their schema as content in a + repository. + + + UDDI Connector - Interact with UDDI registries to integrate their content into a repository. + + + + SVN Connector - Interact with Subversion software configuration management (SCM) repositories to expose the + managed resources through JCR.
Consider using the + SVNKit + (dual-licensed) library as the API into Subversion. + + + + CVS Connector - Interact with CVS software configuration management (SCM) repositories to expose the + managed resources through JCR. + + + JDBC Storage Connector - Store and access information in a relational database. Also useful for persisting + information in the federated repository not stored elsewhere. + + + + Distributed Database Connector - Store and access information in + Hypertable + or + HBase + distributed databases. Also useful for persisting information in the federated repository not stored elsewhere. + + + + + + If the connectors allow the information they contribute to be updated, they must provide an + XAResource + implementation that can be used with a Java Transaction Service. Connectors that provide read-only access need not + provide an implementation. + + + + Sources + + Each JBoss DNA federated repository is configured to federate and integrate information from one or more + sources + . Each source contains the configuration details (e.g., connection information, location, properties, options, etc.) for + working with that particular source, as well as a reference to the connector that should be used to establish + connections to the source. And of course, sources can be added or removed without having to stop and restart the + federated repository. + + + + Building the unified graph + The federation engine works by effectively building up a single graph by querying each source and merging or + unifying the responses. This information is cached, which improves performance, reduces the number of (potentially + expensive) remote calls, reduces the load on the sources, and helps mitigate problems with source availability. As + clients interact with the repository, this cache is consulted first. When the requested portion of the graph (or + "subgraph") is contained completely in the cache, it is returned immediately.
However, if any part of the requested + subgraph is not in the cache, each source is consulted for its contributions to that subgraph, and any results are + cached. + This basic flow makes it possible for the federated repository to build up a local cache of the integrated graph + (or at least the portions that are used by clients). In fact, the federated repository caches information in a manner + that is similar to that of the Domain Name System (DNS). As sources are consulted for their contributions, each source + also specifies whether it is the authoritative source for this information (some sources that are themselves federated + may not be the information's authority), whether the information may be modified, the time-to-live (TTL) value (the time + after which the cached information should be refreshed), and the expiration time (the time after which the cached + information is no longer valid). In effect, the source has complete control over how the information it contributes is + cached and used. + + The federated repository also needs to incorporate + negative caching + , which is the storage of the knowledge that something does not exist. Sources can be configured to contribute information + only below certain paths (e.g., + /A/B/C + ), and the federation engine can take advantage of this by never consulting that source for contributions to information + on other paths. However, below that path, any negative responses must also be cached (with appropriate TTL and expiry + parameters) so that the source is neither excluded permanently (it may have information to contribute at a later time) + nor checked too frequently. + + + + Queries + The JBoss DNA federated repository will also support queries against the integrated and unified graph.
In some + situations the query can be determined to apply to a single sour= ce, but in most situations the query must be planned + (and possibly rewritten) such that it can be pushed down to all = the appropriate sources. Also, the cached results must + be consulted prior to returning the query results, as the result= s from one source might have contributions from another + source. + It is hoped that the MetaMatrix query engine can be used fo= r this purpose, after it is open sourced. This engine + implements sophisticated query planning and optimization techniq= ues for working efficiently with multiple sources. + + + + Updates + + The JBoss DNA federated repositories also make it possible for c= lient applications to make changes to the unified graph + within the context of distributed transactions. According to the= JCR API, client applications use the Java Transaction + API (JTA) to control the boundaries of their transactions. Meanw= hile, the federated repository uses a + distributed transac= tion service + to coordinate the XA resources provided by the connectors. + + It is quite possible that clients add properties to nodes i= n the unified graph, and that this information cannot be + handled by the same underlying source that contributed to the no= de. In this case, the federated repository can be + configured with a fallback source that will be used used to stor= e this "extra" information. + + It is a goal that non-XA sources (i.e., sources that use connect= ors without XA resources) can participate in distributed + transactions through the use of + compensating transactions + . Because the JBoss DNA federation engine implements the JCR obs= ervation system, it is capable of recording all of the + changes made to the distributed graph (and those changes sent to= each updatable source). Therefore, if a non-XA source + is involved in a distributed transaction that must be rolled bac= k, any changes made to non-XA sources can be undone. 
(Of + course, this does not make the underlying source transactional: non-transactional sources may still expose the interim + changes to other clients.) + + + + Events + The JCR API supports observing a repository to receive notifications of additions, changes, and deletions of nodes + and properties. The JBoss DNA federated repository will support this API through two primary means. + When the changes are made through the federated repository, the JBoss DNA federation engine is well aware of the + set of changes that have been (or are being) made to the unified graph. These events are directly propagated to + listeners. + Sources have the ability to publish events, making it possible for the JBoss DNA federation engine and clients that + have registered listeners to be notified of changes in the information managed by that source. These events are first + processed by the federation engine and possibly altered based upon contributions from other sources. (The federation + engine also uses these events to update or purge information in the cache, which may add to the event set.) The + resulting (and possibly altered) event set is then sent to all client listeners. + + + + + + Downloading the examples + JBoss DNA is built using Maven 2, so it's much easier to follow along with the examples in this document if you + install and configure Maven. Once this is done, you can very easily build the examples or even create a Maven project that + depends on the JBoss DNA JARs. Maven will automatically download the right versions of the JARs, including those other + libraries on which JBoss DNA depends. Maven also makes it very easy to create an assembly of your final application so that + you can package it into a distributable form. + + The examples created for this User Guide use Maven 2 to achieve exactly this, so it is highly recommended that you + download + these first and take a look at how they work.
+ + + + To build and run the examples, you first need to install and configure Maven 2.0.7, available from + http://maven.apache.org/ + + Installation is performed by downloading and unzipping the maven-2.0.7-bin.zip file to a convenient + location on your local disk. Configuration consists of adding $MAVEN_HOME/bin to your path and adding the following + profile to your ~/.m2/settings.xml file:
+ <settings>
+   <profiles>
+     <profile>
+       <id>jboss.repository</id>
+       <activation>
+         <property>
+           <name>!jboss.repository.off</name>
+         </property>
+       </activation>
+       <repositories>
+         <repository>
+           <id>snapshots.jboss.org</id>
+           <url>http://snapshots.jboss.org/maven2</url>
+           <snapshots>
+             <enabled>true</enabled>
+           </snapshots>
+         </repository>
+         <repository>
+           <id>repository.jboss.org</id>
+           <url>http://repository.jboss.org/maven2</url>
+           <snapshots>
+             <enabled>false</enabled>
+           </snapshots>
+         </repository>
+       </repositories>
+       <pluginRepositories>
+         <pluginRepository>
+           <id>repository.jboss.org</id>
+           <url>http://repository.jboss.org/maven2</url>
+           <snapshots>
+             <enabled>false</enabled>
+           </snapshots>
+         </pluginRepository>
+         <pluginRepository>
+           <id>snapshots.jboss.org</id>
+           <url>http://snapshots.jboss.org/maven2</url>
+           <snapshots>
+             <enabled>true</enabled>
+           </snapshots>
+         </pluginRepository>
+       </pluginRepositories>
+     </profile>
+   </profiles>
+</settings>
+ This profile informs Maven of the two JBoss repositories (snapshots and releases) that are needed to download the JBoss Microcontainer and dependent JARs.
+ + Once you have configured Maven and downloaded the examples, you can go to one of the following subdirectories in the examples/User_Guide directory and enter mvn install to perform a build: + + + gettingStarted - projects for creating and using a service together with AOP + + + pojoDevelopment - examples of creating and configuring POJOs using XML and annotations + + + aopDevelopment - examples of using AOP to add behaviour to POJOs + + + extending - examples of how we created various extensions to the microcontainer by creating new dependencies + + + Instructions on how to run the individual examples can be found in the corresponding parts of this guide. + + + Using JBoss DNA + + + + Custom sequencers + + + + Future directions + What's next for JBoss DNA? Well, sequencers are just the beginning. + Remember our architecture? + There are a lot of components on our roadmap, including federating + + + Roadmap: We'll start on the federation engine as soon as 0.1 is out. The 0.1 release will contain the sequencing + system, and while it will include only one sequencer, we hope the community will help build the ones they need. Serge has started + on a Java sequencer, and MetaMatrix is starting on a sequencer for MetaMatrix models. Check out JIRA for the list of the + ones we've thought of. + Your need for a web UI is very typical, which is why we also want to create a web interface (and RESTful service) that + presents data using "domain-specific" views, that is, views that are specific to the type of data and user role. For + example, if a user is viewing database information, the views should be structured to show all the information for a table + and its columns, keys, and indexes. (This is in contrast with a "generic" node-based view where there is one page that shows + the table and only links to the other columns, keys, etc. See http://www.jcr-explorer.org/screenshots.html for an example of + a "generic" web UI.)
+ \ No newline at end of file --===============1672541724339409897==--