[hibernate-issues] [JIRA] (HHH-15706) ETAG-like transactional local caching for single entity graph queries

Friday, 11 November 2022

Christian Beikov (
https://hibernate.atlassian.net/secure/ViewProfile.jspa?accountId=557058%...
) *created* an issue

Hibernate ORM (
https://hibernate.atlassian.net/browse/HHH?atlOrigin=eyJpIjoiMjRhNDBkMWU5...
) / New Feature (
https://hibernate.atlassian.net/browse/HHH-15706?atlOrigin=eyJpIjoiMjRhND...
) HHH-15706 (
https://hibernate.atlassian.net/browse/HHH-15706?atlOrigin=eyJpIjoiMjRhND...
) ETAG-like transactional local caching for single entity graph queries (
https://hibernate.atlassian.net/browse/HHH-15706?atlOrigin=eyJpIjoiMjRhND...
)

Issue Type: New Feature Assignee: Unassigned Components: hibernate-core Created:
11/Nov/2022 05:34 AM Fix Versions: 6.wishlist Priority: Major Reporter: Christian Beikov (
https://hibernate.atlassian.net/secure/ViewProfile.jspa?accountId=557058%...
)

I just had a chat with Sanne about transactional caches and how Infinispan uses a two
phase protocol to ensure “repeatable read” semantics. To support transactional *local*
caches (which are useful in general but necessary for Quarkus), we need some trickery
which I want to outline here:

* The following only works on a per row basis, so byId lookups and byId queries (without
limit/offset)
* Every table that is involved in a query needs an optimistic version
* Changes to join tables must increment the optimistic version of the owner (or both
sides?)

When this is given, on read, we can use the following protocol to reduce load on the
database and network

* Get the local cache entry. If it doesn’t exist, just run the query
* Extract the versions of all involved tables from the cache entry
* Query the database with ORIGINAL_QUERY AND EXISTS(VERSION_CHECK_QUERY)
* If the result is empty, the cache entry was current, so use that
* If there is a result, use that as JDBC result

The VERSION_CHECK_QUERY looks roughly like this:

select 1 from table1 t where t.id = :t1_id and t.version > :t1_version
union all
select 1 from tableN t where t.id = :tN_id and t.version > :tN_version

This query will return at least one row, when the data read from the cache is stale and
return no rows when it is current. Since it is embedded in an EXISTS predicate, the main
query will return no results when the cache entry is current and otherwise will return
actual results.

On write, we only need to specially handle queries that use the query cache, because for
entities, the version is naturally part of the cache entry already.

To support this in the query cache we need to alter the created SQL-AST a bit.
Specifically, we need to add SqlSelection (without corresponding DomainResult ) at the end
of the select item list to select the versions of the involved tables. The premise is,
that the query has a single root, the only parameter the query contains is used to filter
by root id, and the query contains no subqueries or aggregation. This should cover the
most common cases of simple projections.

The query cache entry will then contain the versions which, on read, we can then extract
again and run the same query as shown above.

Supporting queries with more parameters requires that we maintain a version per
parametrization, which we can investigate in the future. One idea that comes to mind is
that we could have a table e.g. query_cache_tombstones(query, params, version) into which
we insert a tombstone version as part of a transaction that modifies rows that might be
part of a query cache parametrization.

The query cache entry is equipped with a version. The query which should be cached
receives an additional SqlSelection , e.g. 1+coalesce((select max(version) from
query_cache_tombstones where query = '...' and params = '...'),0). Maybe
this query can also be executed separately or we can encode that with union somehow. The
point is, that this version is then added to the query cache entry, and then on read we
can use ORIGINAL_QUERY WHERE NOT EXISTS(select 1 from query_cache_tombstones where query =
'...' and param = '...' and version > :version).

Obviously, we need a better encoding for the query than the query string itself and
possibly also a good encoding for parameters.

Idea for supporting subqueries: Hoist them into the from clause, possibly adding a lateral
, and then passing through the table row versions again.

Idea for supporting aggregations: Have an additional table aggregation_results that
records the row primary keys and versions of the aggregated rows. We can keep track of the
versions by adding an array_agg or listagg to the original query. On read, we again look
for tombstones. On write of entities, we have to insert tombstones if we find that there
is an intersection with aggregation_results. On insert or delete, we’d probably have to
insert tombstones unconditionally.

(
https://hibernate.atlassian.net/browse/HHH-15706#add-comment?atlOrigin=ey...
) Add Comment (
https://hibernate.atlassian.net/browse/HHH-15706#add-comment?atlOrigin=ey...
)

Get Jira notifications on your phone! Download the Jira Cloud app for Android (
https://play.google.com/store/apps/details?id=com.atlassian.android.jira....
) or iOS (
https://itunes.apple.com/app/apple-store/id1006972087?pt=696495&ct=Em...
) This message was sent by Atlassian Jira (v1001.0.0-SNAPSHOT#100210- sha1:583150f )

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

[hibernate-issues] [JIRA] (HHH-15706) ETAG-like transactional local caching for single entity graph queries