Hi,
What I was hoping to avoid was MongoDB shifting data in a collection to
make room for new elements in a document (i.e. RowKey), as that appears
to be quite an expensive operation. I noticed a correlation between many
of the big spikes in my results and the following entries in mongodb.log
(by default it logs queries taking longer than 100ms). Most of the slow
updates had the "moved" flag set on the log entry, e.g.:
update ogm_test_database.associations_BlogEntry query: { _id:
ObjectId('4f957de9ca8af8159d604763') } update: { $push: { rows: { table:
"BlogEntry", columns: { id: 52105 }, tuple: { id: 52105, author_a_id:
15134 } } } } idhack:1 *moved:1 1831ms*
Since then I've read the following page, which explains that an amount
of padding is added to each record using a heuristic, calculated from
whether and by how much documents in a collection grow over its
lifetime. So for "real world" use this may not be such a problem. In the
performance tests we start with an empty collection, so there tend to be
a number of these spikes before the heuristic kicks in:
http://www.mongodb.org/display/DOCS/Padding+Factor
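To illustrate why the spikes cluster at the start of a run, here is a toy
simulation of an adaptive padding heuristic. This is NOT MongoDB's actual
algorithm; the constants (the 0.001 increment and the 2.0 cap) are made
up for illustration. Each record gets size * padding_factor bytes, a
document that outgrows its allocation is "moved", and every move nudges
the collection-wide padding factor upward:

```python
def simulate(n_docs=1000, initial_size=100, growth=40, n_pushes=3):
    """Toy model of move-driven adaptive padding (not MongoDB's real one)."""
    padding = 1.0                                  # collection padding factor
    sizes = [initial_size] * n_docs                # logical document sizes
    allocations = [int(initial_size * padding) for _ in range(n_docs)]
    moves_per_push = []
    for _ in range(n_pushes):                      # each pass grows every doc,
        moves = 0                                  # like a $push per document
        for i in range(n_docs):
            sizes[i] += growth
            if sizes[i] > allocations[i]:
                # Document no longer fits in place: "move" it and bump the
                # padding factor so future allocations leave more headroom.
                moves += 1
                padding = min(2.0, padding + 0.001)
                allocations[i] = int(sizes[i] * padding)
        moves_per_push.append(moves)
    return moves_per_push

print(simulate())
```

On an empty collection every document moves on the first growth pass, and
only once the padding factor has adapted do later passes cause fewer
moves, which matches the early spikes seen in the tests.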
On 04/25/2012 08:13 AM, Emmanuel Bernard wrote:
Hi Alan and all,
I have been looking into the spikes you encountered in the stress test from a
theoretical point of view.
You were trying a different association storage approach (one association row per
document rather than the whole association in a single document). Does that give
better results?
I am skeptical for a few reasons. MongoDB has a global write lock per mongod process
(they are working on a more fine-grained solution), so if the spikes are due to lock
contention, shuffling data won't help much. Also, make sure you use MongoDB 2.0 rather
than 1.8: 2.0 yields the lock on page faults, which should solve a lot of these spike
problems.
I have found this blog entry to be quite insightful
http://blog.pythonisito.com/2011/12/mongodbs-write-lock.html
Generally speaking, if all data can stay in memory, MongoDB should behave wonderfully.
Which leads to my demo and the time difference between Infinispan (5s) and MongoDB
(20s). I can see several reasons:
- we don't really batch operations in the MongoDB dialect, and we should.
We should accumulate operations and apply them in one batch at the end of the
flush operation. That will require some new infrastructure in OGM's engine,
though, to tell the dialect when to "flush".
- my VM might swap memory on disk which would explain the difference
- or it could simply be that Infinispan is four times faster, which would not be
too surprising as Infinispan runs in-process.
Emmanuel
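The accumulate-then-flush idea from the first point above could be sketched
roughly like this. The class and method names (BatchingDialect, update_tuple,
execute_batch) are hypothetical and do not reflect the actual OGM dialect
interface; the point is only the pattern of queuing operations and sending
them in one round-trip when the engine signals the end of a flush:

```python
class BatchingDialect:
    """Hypothetical dialect that queues updates instead of sending them
    one by one, then applies them in a single batch on flush()."""

    def __init__(self, backend):
        self.backend = backend   # e.g. a facade over the MongoDB driver
        self.pending = []        # queued (collection, update) operations

    def update_tuple(self, collection, update):
        # No round-trip here: just record the operation for later.
        self.pending.append((collection, update))

    def flush(self):
        # One batched call when the engine tells us to "flush".
        if self.pending:
            self.backend.execute_batch(self.pending)
            self.pending = []


class RecordingBackend:
    """Stand-in backend that records batches instead of talking to MongoDB."""
    def __init__(self):
        self.batches = []

    def execute_batch(self, ops):
        self.batches.append(list(ops))


backend = RecordingBackend()
dialect = BatchingDialect(backend)
for i in range(3):
    dialect.update_tuple("associations_BlogEntry",
                         {"$push": {"rows": {"id": i}}})
dialect.flush()
# All three updates reach the backend in a single batch.
```

The trade-off is that the engine must define a clear flush boundary and the
dialect must preserve operation order within the batch.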