I just had a chat with Sanne about transactional caches and how Infinispan uses a two-phase protocol to ensure “repeatable read” semantics. To support transactional local caches (which are useful in general, but necessary for Quarkus), we need some trickery, which I want to outline here:
- This only works on a per-row basis, i.e. for byId lookups and byId queries (without limit/offset)
- Every table that is involved in a query needs an optimistic version column
- Changes to join tables must increment the optimistic version of the owner (or both sides?)
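The join-table rule could look like this in SQL; a minimal sketch assuming a hypothetical book_author join table whose owner is book (all table and column names are illustrative):

```sql
-- In the same transaction that modifies the join table, bump the owner's
-- optimistic version so that version checks observe the association change:
INSERT INTO book_author (book_id, author_id) VALUES (:bookId, :authorId);
UPDATE book SET version = version + 1 WHERE id = :bookId;
```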
Given these prerequisites, on read we can use the following protocol to reduce load on the database and network:
- Get the local cache entry. If it doesn’t exist, just run the query
- Extract the versions of all involved tables from the cache entry
- Query the database with ORIGINAL_QUERY AND EXISTS(VERSION_CHECK_QUERY)
- If the result is empty, the cache entry was current, so use the cached entry
- If there is a result, use that as the JDBC result
The VERSION_CHECK_QUERY looks roughly like this:
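A minimal sketch, assuming the cache entry stored versions for two hypothetical tables author and book (all identifiers and bind parameters are illustrative):

```sql
-- Returns a row iff any involved table row changed since the cache entry
-- was created, i.e. iff at least one cached version is stale.
SELECT 1 FROM author a
WHERE a.id = :authorId AND a.version <> :cachedAuthorVersion
UNION ALL
SELECT 1 FROM book b
WHERE b.id = :bookId AND b.version <> :cachedBookVersion
```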
This query will return at least one row when the data read from the cache is stale, and no rows when it is current. Since it is embedded in an EXISTS predicate, the main query will return no results when the cache entry is current and will otherwise return actual results.

On write, we only need to specially handle queries that use the query cache, because for entities the version is naturally part of the cache entry already. To support this in the query cache, we need to alter the created SQL-AST a bit. Specifically, we need to add a SqlSelection (without a corresponding DomainResult) at the end of the select item list to select the versions of the involved tables. The premise is that the query has a single root, the only parameter the query contains is used to filter by root id, and the query contains no subqueries or aggregation. This should cover the most common cases of simple projections. The query cache entry will then contain the versions which, on read, we can extract again to run the same query as shown above.

Supporting queries with more parameters requires that we maintain a version per parametrization, which we can investigate in the future. One idea that comes to mind is a table, e.g. query_cache_tombstones(query, params, version), into which we insert a tombstone version as part of every transaction that modifies rows that might be part of a query cache parametrization. The query cache entry is equipped with a version: the query which should be cached receives an additional SqlSelection, e.g. 1 + coalesce((select max(version) from query_cache_tombstones where query = '...' and params = '...'), 0). Maybe this query can also be executed separately, or we could encode it with a union somehow. The point is that this version is then added to the query cache entry, and on read we can use ORIGINAL_QUERY WHERE NOT EXISTS(select 1 from query_cache_tombstones where query = '...' and params = '...' and version > :version).
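The tombstone idea could be sketched as follows; the table layout, the '<query-key>'/'<params-key>' placeholders, and the column types are assumptions, not a settled design:

```sql
-- Hypothetical tombstone table; query and params would use a compact
-- encoding rather than the raw query string.
CREATE TABLE query_cache_tombstones (
    query   VARCHAR(255) NOT NULL,
    params  VARCHAR(255) NOT NULL,
    version BIGINT NOT NULL
);

-- Writers insert a tombstone in the same transaction that modifies rows
-- possibly covered by a cached parametrization:
INSERT INTO query_cache_tombstones (query, params, version)
VALUES ('<query-key>', '<params-key>', :nextVersion);

-- Readers guard the cached query with the tombstone check; no rows back
-- means the cache entry carrying :version is still current:
SELECT ... -- ORIGINAL_QUERY
WHERE NOT EXISTS (
    SELECT 1 FROM query_cache_tombstones t
    WHERE t.query = '<query-key>' AND t.params = '<params-key>'
      AND t.version > :version
)
```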
Obviously, we need a better encoding for the query than the query string itself, and possibly also a good encoding for parameters.

Idea for supporting subqueries: hoist them into the from clause, possibly adding a lateral join, and then pass through the table row versions again.

Idea for supporting aggregations: have an additional table, aggregation_results, that records the row primary keys and versions of the aggregated rows. We can keep track of the versions by adding an array_agg or listagg to the original query. On read, we again look for tombstones. On write of entities, we have to insert tombstones if we find that there is an intersection with aggregation_results. On insert or delete, we would probably have to insert tombstones unconditionally.
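The aggregation idea could be sketched like this, assuming a hypothetical query that sums order amounts per customer; the orders table, the aggregation_results layout, and all names are illustrative:

```sql
-- Original aggregation, extended to also collect the primary keys and
-- versions of the aggregated rows, to be recorded in aggregation_results:
SELECT o.customer_id,
       SUM(o.amount),
       ARRAY_AGG(o.id),       -- primary keys of the aggregated rows
       ARRAY_AGG(o.version)   -- their optimistic versions
FROM orders o
WHERE o.customer_id = :customerId
GROUP BY o.customer_id;

-- On write of an entity, insert a tombstone if the modified row
-- intersects the recorded aggregation inputs:
INSERT INTO query_cache_tombstones (query, params, version)
SELECT '<query-key>', '<params-key>', :nextVersion
WHERE EXISTS (
    SELECT 1 FROM aggregation_results ar
    WHERE ar.query = '<query-key>' AND ar.row_id = :modifiedRowId
);
```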