[hibernate-dev] [OGM] MongoDB time spikes

Alan F alan at eth0.org.uk
Wed Apr 25 06:31:29 EDT 2012


Hi,

What I was hoping to avoid was MongoDB shifting data in a collection to 
make room for new elements in a document (i.e. a RowKey), since that 
appears to be quite an expensive operation. I noticed a correlation 
between many of the big spikes in my results and log entries like the 
following in mongodb.log (MongoDB logs operations slower than 100ms by 
default). Most of the long updates had the "moved" key set in the log 
entry, e.g.:

update ogm_test_database.associations_BlogEntry query: { _id: 
ObjectId('4f957de9ca8af8159d604763') } update: { $push: { rows: { table: 
"BlogEntry", columns: { id: 52105 }, tuple: { id: 52105, author_a_id: 
15134 } } } } idhack:1 *moved:1 1831ms*
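
As an aside, a rough way to dig these out programmatically (this is only 
a sketch against the plain 2.x Java driver, and the database name is just 
the one from my test) is to lower the profiler threshold and then query 
system.profile for operations flagged as moved:

    import java.util.List;

    import com.mongodb.BasicDBObject;
    import com.mongodb.DB;
    import com.mongodb.DBCollection;
    import com.mongodb.DBObject;
    import com.mongodb.Mongo;

    public class SlowMoveCheck {
        public static void main(String[] args) throws Exception {
            Mongo mongo = new Mongo( "localhost", 27017 );
            DB db = mongo.getDB( "ogm_test_database" );

            // profile (and log) everything slower than 50ms instead of the default 100ms
            db.command( new BasicDBObject( "profile", 1 ).append( "slowms", 50 ) );

            // profiled operations end up in the capped system.profile collection;
            // entries for updates that had to relocate the document carry moved:true
            DBCollection profile = db.getCollection( "system.profile" );
            List<DBObject> movedOps = profile.find( new BasicDBObject( "moved", true ) ).toArray();
            for ( DBObject op : movedOps ) {
                System.out.println( op.get( "ns" ) + " took " + op.get( "millis" ) + "ms" );
            }

            mongo.close();
        }
    }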


Since then I've read the following page, which explains that an amount 
of padding is added using a heuristic, calculated from whether (and by 
how much) documents in a collection grow over its lifetime. So for "real 
world" use maybe this isn't such a problem. In the performance tests we 
start from an empty collection, so there tends to be a number of these 
spikes before the heuristic kicks in:

http://www.mongodb.org/display/DOCS/Padding+Factor
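
As a side note, the padding factor the heuristic has reached can be read 
back via collStats. A minimal sketch, assuming the 2.x Java driver and 
the collection from the log entry above:

    import com.mongodb.CommandResult;
    import com.mongodb.DB;
    import com.mongodb.DBCollection;
    import com.mongodb.Mongo;

    public class PaddingFactorCheck {
        public static void main(String[] args) throws Exception {
            Mongo mongo = new Mongo( "localhost", 27017 );
            DB db = mongo.getDB( "ogm_test_database" );
            DBCollection associations = db.getCollection( "associations_BlogEntry" );

            // getStats() runs the collStats command; paddingFactor starts at 1.0
            // and creeps up as documents in the collection get moved
            CommandResult stats = associations.getStats();
            System.out.println( "paddingFactor: " + stats.get( "paddingFactor" ) );

            mongo.close();
        }
    }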


On 04/25/2012 08:13 AM, Emmanuel Bernard wrote:
> Hi Alan and all,
>
> I have been researching the spikes issue you encountered in the stress test from a theoretical point of view.
> You were trying a different association storage approach (splitting an association into one row per document rather than storing the whole association in a single document). Does that give better results?
>
> I am skeptical for a few reasons. MongoDB has a global write lock per mongod process (they are working on a more fine-grained solution), so if the spikes are due to lock contention, shuffling data around won't help much. Also, make sure you use MongoDB 2.0 instead of 1.8, as it yields the lock on page faults, which should solve a lot of these spike problems.
>
> I have found this blog entry to be quite insightful: http://blog.pythonisito.com/2011/12/mongodbs-write-lock.html
>
> Generally speaking, if all data can stay in memory, MongoDB should behave wonderfully.
>
> Which leads to my demo and the time difference between Infinispan (5s) and MongoDB (20s). I can see several reasons:
>
> - we don't really batch operations in the MongoDB dialect, and we should.
>    We should accumulate operations and apply them at the end of the flush operation in one batch. That will require some new infrastructure in OGM's engine, though,
>    to tell the dialect when to "flush".
> - my VM might be swapping memory to disk, which would explain the difference
> - or it could be that Infinispan is simply 4 times faster, which would not be too surprising as Infinispan runs in-process.
>
> Emmanuel
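
Regarding the batching point above: just to make concrete the kind of 
thing I understand is meant (this is not OGM code, only a sketch against 
the plain 2.x Java driver, and it only covers inserts for simplicity), 
the dialect would queue pending documents and hit the database once per 
flush, e.g. via the list form of insert():

    import java.util.ArrayList;
    import java.util.List;

    import com.mongodb.DBCollection;
    import com.mongodb.DBObject;

    // Illustration only: accumulate writes and apply them in one call when "flushed".
    public class BatchingWriter {

        private final DBCollection collection;
        private final List<DBObject> pendingInserts = new ArrayList<DBObject>();

        public BatchingWriter(DBCollection collection) {
            this.collection = collection;
        }

        public void queueInsert(DBObject document) {
            pendingInserts.add( document );
        }

        // called once at the end of the unit of work instead of once per entity
        public void flush() {
            if ( !pendingInserts.isEmpty() ) {
                collection.insert( pendingInserts );
                pendingInserts.clear();
            }
        }
    }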


