Galder Zamarreño updated ISPN-5876:
-----------------------------------
Fix Version/s: 5.2.15.Final
Pre-commit cache invalidation creates stale cache vulnerability
---------------------------------------------------------------
Key: ISPN-5876
URL:
https://issues.jboss.org/browse/ISPN-5876
Project: Infinispan
Issue Type: Bug
Components: Eviction
Affects Versions: 5.2.7.Final
Reporter: Stephen Fikes
Assignee: Galder Zamarreño
Fix For: 5.2.15.Final, 8.1.0.Beta1, 8.1.0.Final
In a cluster where Infinispan serves as the level 2 cache for Hibernate (configured for
invalidation), because invalidation requests for modified entities are sent *before*
database commit, it is possible for nodes receiving the invalidation request to perform
eviction and then (due to "local" read requests) reload the evicted entities
prior to the time the database commit takes place in the server where the entity was
modified.
Consequently, other servers in the cluster may contain data that remains stale until a
subsequent change in another server or until the entity times out from lack of use.
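The ordering problem described above can be illustrated with a small, single-threaded simulation. This is only a sketch under stated assumptions: the class StaleCacheRace, the two maps standing in for the database and the "server 2" cache, and the entity id "e1" are all hypothetical names for illustration, and none of it is Infinispan or Hibernate code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the race; not Infinispan or Hibernate code.
class StaleCacheRace {
    // Committed database state: entity id -> value.
    static final Map<String, String> database = new HashMap<>();
    // Second-level cache contents on "server 2".
    static final Map<String, String> server2Cache = new HashMap<>();

    // A "local" read on server 2: on a cache miss, reload the committed row.
    static String readOnServer2(String id) {
        return server2Cache.computeIfAbsent(id, database::get);
    }

    // Replays one update made by "server 1" and returns what server 2 ends up caching.
    static String run(boolean invalidateBeforeCommit) {
        database.clear();
        server2Cache.clear();
        database.put("e1", "v1");
        readOnServer2("e1");              // server 2 caches v1

        if (invalidateBeforeCommit) {
            server2Cache.remove("e1");    // invalidation arrives before the commit
            readOnServer2("e1");          // local read reloads the still-committed v1
            database.put("e1", "v2");     // server 1 commits afterwards
        } else {
            database.put("e1", "v2");     // server 1 commits first
            server2Cache.remove("e1");    // invalidation arrives after the commit
            readOnServer2("e1");          // local read reloads v2
        }
        return server2Cache.get("e1");
    }

    public static void main(String[] args) {
        System.out.println("invalidate before commit: " + run(true));
        System.out.println("invalidate after commit:  " + run(false));
    }
}
```

In this model, run(true) leaves server 2 holding the old value after the commit, while run(false) leaves it holding the new one.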
It isn't easy to write a testcase for this - it required manual intervention to
reproduce - but can be seen with any entity class, cluster, etc. (at least using Oracle -
results may vary with specific databases) so I've not attached a testcase. The issue
can be seen/understood by code inspection (i.e. the timing of invalidation vs. database
commit). That said, my test consisted of a two node cluster and I used Byteman rules to
delay database commit of a change to an entity (with an optimistic version property) long
enough in "server 1" for eviction to complete and a subsequent re-read (by a
worker thread on behalf of an EJB) to take place in "server 2". Following the
re-read in "server 2", the database commit proceeds in "server 1"
and "server 2" now has a stale copy of the entity in cache.
One option is pessimistic locking which will block any read attempt until the DB commit
completes. It is not feasible, however, for many applications to use pessimistic locking
for all reads as this can have a severe impact on concurrency - and is the reason for
using optimistic version control. But due to the early timing of invalidation broadcast
(*before* database commit, while the data is not yet stale), optimistic locking is
insufficient to guard against "permanently" stale data. We did see that some
databases default to blocking repeatable reads even outside of transactions and without
explicit lock requests. Oracle does not provide such a mode, so all reads would have to be
implemented with pessimistic locks (which must be enclosed in explicit transactions -
(b)locking reads are disallowed when autocommit=true in Oracle). Converting every read to a
pessimistic read could require significant effort (re-writes), in addition to the
performance issues this can introduce.
If broadcast of an invalidation message always occurs *after* database commit, optimistic
control attributes are sufficient to block attempts to write stale data and though a few
failures may occur (as they would in a single server with multiple active threads), it can
be known that the stale data will be removed in some finite period.
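As a rough illustration of why an optimistic version attribute blocks writes made from stale data, the sketch below emulates a versioned update of the form "UPDATE ... WHERE id = ? AND version = ?". The class OptimisticVersionCheck, its Row type and the in-memory map are assumptions made for this example only; a real application would rely on Hibernate's version property and the database itself.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of an optimistic version check.
class OptimisticVersionCheck {
    // A row carrying a value and its optimistic version number.
    static final class Row {
        final String value;
        final long version;

        Row(String value, long version) {
            this.value = value;
            this.version = version;
        }
    }

    static final Map<String, Row> database = new HashMap<>();

    // Succeeds only if the caller's copy has the same version as the committed row,
    // and bumps the version on success.
    static boolean update(String id, String newValue, long expectedVersion) {
        Row current = database.get(id);
        if (current == null || current.version != expectedVersion) {
            return false; // stale copy: the write is rejected
        }
        database.put(id, new Row(newValue, expectedVersion + 1));
        return true;
    }

    // Returns { result of server 1's write, result of server 2's write from a stale copy }.
    static boolean[] demo() {
        database.clear();
        database.put("e1", new Row("v1", 1));
        Row staleCopy = database.get("e1");                      // server 2 caches version 1
        boolean first = update("e1", "v2", 1);                   // server 1 commits version 2
        boolean second = update("e1", "v3", staleCopy.version);  // server 2 writes from its stale copy
        return new boolean[] { first, second };
    }

    public static void main(String[] args) {
        boolean[] results = demo();
        System.out.println("server 1 write accepted: " + results[0]);
        System.out.println("server 2 stale write accepted: " + results[1]);
    }
}
```

The second write fails because its expected version no longer matches the committed row, which is the kind of occasional failure the description anticipates; the stale cached copy itself is only removed once the invalidation arrives.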