[hibernate-dev] Memory consumption

Andrej Golovnin golovnin at gmx.net
Thu May 17 12:53:50 EDT 2012


Hi Steve,

> Deferring initialization of these batch loaders is one option, but that just really delays the memory consumption.  Personally, if it were my app I'd rather that sizing be done up front rather than over the course of the next few hours or weeks as the system continues to run.

The question is, if you really need all LockModes for an entity. I personally doubt it.

> 
> Its been proposed a number of times to instead generate just a single loader for loading that number of items.  This would mean generating a single SQL statement that has parameter holders for the full batch size and then somehow "filling out" the empty slots when the number of things to load is less that 16.  If this could be made workable across all dialects, I personally think it would be the best approach as it probably gets close to the start up sizings you claimed but would never grow beyond that.  But we would need to discuss this to see if it is a real possibility.  Lets work an example with batch-size = 3.  That would produce SQL like "select a.x, a.y, a.z from a a where a.x = ? or a.x = ? or a.x = ?"

I think you mean "select a.x, a.y, a.z from a a where a.x in (?, ?, ?)".
At least this is what Hibernate generates now.
The first rule I learned in SQL: avoid "OR" as much as possible
in WHERE clause. Query optimizers produce better query execution
plans for "IN" as for equivalent "OR"-statements.

> 
> So for a case where we have 2 things to batch load, how do we best leverage that single statement?  Which really comes down to how we handle the the 3rd parameter binding.
> 
> I have seen simply reusing one of the keys as a proposal, but that will not work.  Null should work, but would it work in all cases against all dialects?

Both, reusing one of the keys and NULL, should work.
Following queries must produce always the same results
in any DBMS, which claims to support SQL:

"select a.x, a.y, a.z from a a where a.x in (1, 2, 1)"
"select a.x, a.y, a.z from a a where a.x in (1, 2, null)"
"select a.x, a.y, a.z from a a where a.x in (1, 2)"

The only difference may be in the costs for the query calculated by the optimizer.
But the query execution time should be the same.

Example from PostgreSQL:

Table:
CREATE TABLE test (
  oid integer NOT NULL,
  ttt character varying(10),
  CONSTRAINT test_pkey PRIMARY KEY (oid )
);

Number of rows: 100000

Query: SELECT * FROM test WHERE oid IN (100, 101, 102);
Explain output:
"Index Scan using test_pkey on public.test  (cost=0.00..16.91 rows=3 width=9) (actual time=0.025..0.035 rows=3 loops=1)"
"  Output: oid, ttt"
"  Index Cond: (test.oid = ANY ('{100,101,102}'::integer[]))"
"  Buffers: shared hit=7"

Query: SELECT * FROM test WHERE oid IN (100, 101, 102, 100, 100, 100);
"Index Scan using test_pkey on public.test  (cost=0.00..29.72 rows=6 width=9) (actual time=0.024..0.034 rows=3 loops=1)"
"  Output: oid, ttt"
"  Index Cond: (test.oid = ANY ('{100,101,102,100,100,100}'::integer[]))"
"  Buffers: shared hit=7"

Explain output:
Query: SELECT * FROM test WHERE oid IN (100, 101, 102, null, null, null);
"Index Scan using test_pkey on public.test  (cost=0.00..29.69 rows=3 width=9) (actual time=0.023..0.033 rows=3 loops=1)"
"  Output: oid, ttt"
"  Index Cond: (test.oid = ANY ('{100,101,102,NULL,NULL,NULL}'::integer[]))"
"  Buffers: shared hit=7"

As you see the difference is only in the calculated costs.
If you want, I could show on monday explain output from MySQL 5.5 and Oracle 11g.

Best regards
Andrej Golovnin


More information about the hibernate-dev mailing list