Hi,
What I was hoping to avoid was MongoDB shifting data in a collection to
make room for new elements in a document (i.e. RowKey), as that appears
to be quite an expensive operation. I noticed a correlation between many
of the big spikes in my results and the following entries in mongodb.log
(by default it logs queries taking longer than 100ms). Most of the slow
updates had the "moved" flag set on the log entry, e.g.:
update ogm_test_database.associations_BlogEntry query: { _id:
ObjectId('4f957de9ca8af8159d604763') } update: { $push: { rows: { table:
"BlogEntry", columns: { id: 52105 }, tuple: { id: 52105, author_a_id:
15134 } } } } idhack:1 *moved:1 1831ms*
Since then I've read the following page, which explains that an amount
of padding is added to each record using a heuristic, calculated from
whether and by how much documents in a collection grow over its
lifetime. So for "real world" use this may not be such a problem. In the
performance tests we start with an empty collection, so there tend to be
a number of these spikes before the heuristic kicks in:
http://www.mongodb.org/display/DOCS/Padding+Factor
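To illustrate why the spikes cluster at the start of a run, here is a toy
simulation of an adaptive padding heuristic. This is NOT MongoDB's actual
algorithm; the constants (the 0.001 increment and the 2.0 cap) are made
up for illustration. Each record gets size * padding_factor bytes, a
document that outgrows its allocation is "moved", and every move nudges
the collection-wide padding factor upward:

```python
def simulate(n_docs=1000, initial_size=100, growth=40, n_pushes=3):
    """Toy model of move-driven adaptive padding (not MongoDB's real one)."""
    padding = 1.0                                  # collection padding factor
    sizes = [initial_size] * n_docs                # logical document sizes
    allocations = [int(initial_size * padding) for _ in range(n_docs)]
    moves_per_push = []
    for _ in range(n_pushes):                      # each pass grows every doc,
        moves = 0                                  # like a $push per document
        for i in range(n_docs):
            sizes[i] += growth
            if sizes[i] > allocations[i]:
                # Document no longer fits in place: "move" it and bump the
                # padding factor so future allocations leave more headroom.
                moves += 1
                padding = min(2.0, padding + 0.001)
                allocations[i] = int(sizes[i] * padding)
        moves_per_push.append(moves)
    return moves_per_push

print(simulate())
```

On an empty collection every document moves on the first growth pass, and
only once the padding factor has adapted do later passes cause fewer
moves, which matches the early spikes seen in the tests.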
On 04/25/2012 08:13 AM, Emmanuel Bernard wrote:
Hi Alan and all,
I have been looking into the spikes you encountered in the stress test from a
theoretical point of view.
You were trying a different association storage approach (one association row per
document rather than the whole association in a single document). Does that give
better results?
I am skeptical for a few reasons. MongoDB has a global write lock per mongod process
(they are working on a more fine-grained solution), so if the spikes are due to lock
contention, shuffling data won't help much. Also, make sure you use MongoDB 2.0 rather
than 1.8: 2.0 yields the lock on page faults, which should solve a lot of these spike
problems.
I have found this blog entry to be quite insightful
http://blog.pythonisito.com/2011/12/mongodbs-write-lock.html
Generally speaking, if all data can stay in memory, MongoDB should behave wonderfully.
Which leads to my demo and the time difference between Infinispan (5s) and MongoDB
(20s). I can see several reasons:
- we don't really batch operations in the MongoDB dialect, and we should.
We should accumulate operations and apply them in one batch at the end of the
flush operation. That will require some new infrastructure in OGM's engine,
though, to tell the dialect when to "flush".
- my VM might swap memory on disk which would explain the difference
- or it could simply be that Infinispan is four times faster, which would not be
too surprising as Infinispan runs in-process.
Emmanuel
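The accumulate-then-flush idea from the first point above could be sketched
roughly like this. The class and method names (BatchingDialect, update_tuple,
execute_batch) are hypothetical and do not reflect the actual OGM dialect
interface; the point is only the pattern of queuing operations and sending
them in one round-trip when the engine signals the end of a flush:

```python
class BatchingDialect:
    """Hypothetical dialect that queues updates instead of sending them
    one by one, then applies them in a single batch on flush()."""

    def __init__(self, backend):
        self.backend = backend   # e.g. a facade over the MongoDB driver
        self.pending = []        # queued (collection, update) operations

    def update_tuple(self, collection, update):
        # No round-trip here: just record the operation for later.
        self.pending.append((collection, update))

    def flush(self):
        # One batched call when the engine tells us to "flush".
        if self.pending:
            self.backend.execute_batch(self.pending)
            self.pending = []


class RecordingBackend:
    """Stand-in backend that records batches instead of talking to MongoDB."""
    def __init__(self):
        self.batches = []

    def execute_batch(self, ops):
        self.batches.append(list(ops))


backend = RecordingBackend()
dialect = BatchingDialect(backend)
for i in range(3):
    dialect.update_tuple("associations_BlogEntry",
                         {"$push": {"rows": {"id": i}}})
dialect.flush()
# All three updates reach the backend in a single batch.
```

The trade-off is that the engine must define a clear flush boundary and the
dialect must preserve operation order within the batch.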