I saw this within my first tests of Jackrabbit. I created several
load tests to create wide, deep, and wide+deep trees (of varying
sizes). Once I knew my tests worked correctly, I cranked up the sizes
and waited. And waited. And waited some more. The times for
even moderately wide trees were unimpressive.

Sometimes it's nice to not be first - we're very lucky to be able to
learn from their mistakes. :-)
On Jun 12, 2008, at 8:15 PM, Michael Neale wrote:
That's great! Yes, Jackrabbit has some serious limitations in its
persistence. It seems to work OK for deep trees, but even so, there
are architectural challenges with this.
It sounds like you guys have seen this and are tackling it head on -
which is great news! Keep up the good work!
On Fri, Jun 13, 2008 at 10:46 AM, Randall Hauch <rhauch(a)redhat.com>
wrote:
> DNA is specifically addressing this in a couple of ways. First, the core
> interfaces inside the federation engine address this by using iterators,
> and providing certain guarantees about using those iterators (e.g., within
> the same transaction). This means that the federation engine will not
> require all children to be in memory. We may also provide more optimized
> behaviors, such as an explicit paging mechanism for accessing the children
> of a node.
>
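
The iterator idea above can be sketched in a few lines. This is only an illustration under stated assumptions, not DNA's actual interface: ChildPageSource, PagedChildIterator, and the in-memory stand-in are hypothetical names invented for the example. It shows an iterator that holds at most one page of child names in memory at a time.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical source of child names, read one page at a time. */
interface ChildPageSource {
    /** Returns up to 'limit' child names starting at 'offset'; empty when exhausted. */
    List<String> fetchPage(int offset, int limit);
}

/** Iterates over children while holding at most one page in memory. */
final class PagedChildIterator implements Iterator<String> {
    private final ChildPageSource source;
    private final int pageSize;
    private List<String> page = new ArrayList<>();
    private int indexInPage = 0;
    private int nextOffset = 0;
    private boolean exhausted = false;

    PagedChildIterator(ChildPageSource source, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        this.source = source;
        this.pageSize = pageSize;
    }

    @Override
    public boolean hasNext() {
        if (indexInPage < page.size()) {
            return true;                      // still items left in the current page
        }
        if (exhausted) {
            return false;                     // a short page was already seen
        }
        page = source.fetchPage(nextOffset, pageSize);  // replace, do not accumulate
        indexInPage = 0;
        nextOffset += page.size();
        if (page.size() < pageSize) {
            exhausted = true;                 // short (or empty) page means no more data
        }
        return !page.isEmpty();
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return page.get(indexInPage++);
    }
}

final class PagedChildrenDemo {
    /** In-memory stand-in (an assumption) for a real store holding 'total' children. */
    static ChildPageSource inMemory(int total) {
        return (offset, limit) -> {
            List<String> result = new ArrayList<>();
            for (int i = offset; i < Math.min(total, offset + limit); i++) {
                result.add("child-" + i);
            }
            return result;
        };
    }

    /** Walks all children through the paged iterator and counts them. */
    static int countChildren(int total, int pageSize) {
        Iterator<String> it = new PagedChildIterator(inMemory(total), pageSize);
        int count = 0;
        while (it.hasNext()) {
            it.next();
            count++;
        }
        return count;
    }
}
```

Calling PagedChildrenDemo.countChildren(1001, 100) visits all 1001 children while never keeping more than 100 names in memory. A real implementation would also need the transactional guarantees mentioned above, which this sketch does not attempt.
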
> Second, we'll be developing a connector
> (http://jira.jboss.org/jira/browse/DNA-40) that stores its information in a
> relational database, and this will not be storing all children in one chunk.
> (Jackrabbit does store them in one chunk, so if you add a 1001st child node,
> the whole list of 1001 children has to be written to the store.
> Unfortunately for them, this is designed into the foundation for their
> persistence layer.)
>
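
To make the 1001st-child point concrete, here is a rough cost model. It is a sketch under assumptions, not Jackrabbit's or the DNA connector's real code: the class names are hypothetical, and "one record per child" is just one plausible way of not storing all children in one chunk. Each store reports how many child entries it has to write for a single add.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical store that reports how many child entries each add writes. */
interface WriteCountingStore {
    int addChild(String name);
}

/** Models keeping all children of a node in one serialized list. */
final class SingleChunkChildren implements WriteCountingStore {
    private final List<String> children = new ArrayList<>();

    @Override
    public int addChild(String name) {
        children.add(name);
        return children.size();   // the whole list is rewritten on every add
    }
}

/** Models keeping one record per child (an assumed design, e.g. one row each). */
final class PerChildRecords implements WriteCountingStore {
    private final List<String> records = new ArrayList<>();

    @Override
    public int addChild(String name) {
        records.add(name);
        return 1;                 // only the new record is written
    }
}

final class ChildWriteCostDemo {
    /** Adds 'count' children and returns the write cost of the final add. */
    static int costOfLastAdd(WriteCountingStore store, int count) {
        int last = 0;
        for (int i = 1; i <= count; i++) {
            last = store.addChild("child-" + i);
        }
        return last;
    }

    public static void main(String[] args) {
        System.out.println(costOfLastAdd(new SingleChunkChildren(), 1001)); // prints 1001
        System.out.println(costOfLastAdd(new PerChildRecords(), 1001));     // prints 1
    }
}
```

In this model the single-chunk layout writes n entries for the n-th add, so building a node with n children costs on the order of n squared writes in total, while the per-child layout stays linear.
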
> On Jun 12, 2008, at 6:45 PM, Michael Neale wrote:
>
>> I guess I am thinking of the case of a large volume of small
>> "children" nodes - specifically. As you say, it can be a problem -
>> will DNA help with this at all?
>>
>> On Fri, Jun 13, 2008 at 3:58 AM, Randall Hauch <rhauch(a)redhat.com>
>> wrote:
>>>
>>> You're right that JCR handles heterogeneous data better than almost
>>> anything else, especially when the information structure changes/evolves
>>> over time.
>>>
>>> And I thought the InfoQ architecture was brilliant - use multiple
>>> independent JCRs for infrequently changing data, eliminating the need to
>>> create/maintain/scale a cluster. Very elegant and simple solution. And in
>>> this particular case, it doesn't really matter if there is a slight
>>> difference in content among the different machines during the push of new
>>> information to the independent repositories.
>>>
>>> But can you elaborate on your thought that JCR might not be useful for
>>> transactional data?
>>>
>>> IMO, JCR is useful in a lot of situations, and of course it is limited in
>>> others. Right now, the implementations don't do clustering or very large
>>> repositories well. Most impls also seem to be limited in the efficient
>>> handling of large numbers of children for any given node. Incorporation of
>>> information outside of JCR is also difficult, as it has to be done above
>>> JCR - although DNA will change this. But I'm not sure that frequently
>>> changing data is universally a limitation. Perhaps frequent additions of
>>> large volumes of data are a problem because you quickly get to volumes of
>>> data that are too large. Or frequent changes to data may be a problem if
>>> versioning is used, as it could quickly lead to unusable numbers of
>>> versions.
>>>
>>>
>>> On Jun 10, 2008, at 9:45 PM, Michael Neale wrote:
>>>
>>>> JCR seems to have a lot of traction, I have noticed. It certainly seems
>>>> to be the default choice now for heterogeneous data. And data is
>>>> increasingly heterogeneous.
>>>>
>>>> I guess my only thoughts on it are its limitations: should JCR *not* be
>>>> used for transactional data - i.e., feeds of incoming data that change
>>>> often?
>>>>
>>>> On Wed, Jun 11, 2008 at 12:11 PM, Randall Hauch <rhauch(a)redhat.com>
>>>> wrote:
>>>>>
>>>>> There have been a couple of recent articles on InfoQ about JCR and/or
>>>>> REST. In case you haven't seen them, they're all worth a good read.
>>>>>
>>>>> Interview with David Nuescheler, from Day Software:
>>>>> http://www.infoq.com/articles/nuescheler-jcr-rest
>>>>> InfoQ architecture and use of JCR:
>>>>> http://www.infoq.com/presentations/design-and-architecture-of-infoq
>>>>>
>>>>> Best regards,
>>>>> Randall
>>>>> _______________________________________________
>>>>> dna-dev mailing list
>>>>> dna-dev(a)lists.jboss.org
>>>>> https://lists.jboss.org/mailman/listinfo/dna-dev
>>>>>
>>>>
>>>> --
>>>> Michael D Neale
>>>> home: www.michaelneale.net
>>>> blog: michaelneale.blogspot.com