Dan Berindei updated ISPN-8962:
-------------------------------
Status: Pull Request Sent (was: Reopened)
Git Pull Request:
PreferAvailabilityStrategy: Rely less on the stable topology
------------------------------------------------------------
Key: ISPN-8962
URL: https://issues.jboss.org/browse/ISPN-8962
Project: Infinispan
Issue Type: Bug
Components: Core
Affects Versions: 9.2.0.Final
Reporter: Dan Berindei
Assignee: Dan Berindei
Fix For: 9.2.2.Final, 9.3.0.Alpha1
{{PreferAvailabilityStrategy}} checks the size of the stable topology and, when picking a
post-merge topology, only considers cache topologies that are derived from the biggest
stable topology (the one with the most members).
Unfortunately, in some situations this algorithm fails pretty badly. If a node has a very
long GC pause, when it comes back it will report the old topology *and* the old stable
topology. If the rest of the cluster rebalanced in the meantime, the rest of the cluster
now has both a smaller current topology and a smaller stable topology than the ones the
paused node reports.
Furthermore, the stable topology is updated asynchronously, independently of the current
topology. So even if there's a split and the minority partition installs a current
topology with fewer members, it may take some time for its stable topology to be updated
with fewer members. In fact, it appears that when a rebalance is not needed (e.g. because
the partition has a single node), the stable topology is never updated!
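
The two situations above can be reproduced with a small model. This is only a sketch, not
Infinispan code: {{ReportedTopology}} and {{candidates_by_stable_size}} are hypothetical
names, and "derived from the biggest stable topology" is simplified to "reports a stable
topology with the most members". Node names and membership sets are made up for
illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class ReportedTopology:
    """What one node reports after a merge (hypothetical model, not Infinispan's API)."""
    sender: str
    current_members: FrozenSet[str]
    stable_members: FrozenSet[str]


def candidates_by_stable_size(reports: List[ReportedTopology]) -> List[ReportedTopology]:
    """Keep only the reports whose stable topology has the most members.

    Simplified stand-in for the selection rule described in this issue.
    """
    biggest = max(len(r.stable_members) for r in reports)
    return [r for r in reports if len(r.stable_members) == biggest]


full = frozenset({"A", "B", "C", "D"})
rest = frozenset({"A", "B", "C"})

# Scenario 1: D has a long GC pause while A, B, C rebalance without it.
# D comes back reporting the old current topology and the old stable topology.
gc_pause_reports = [
    ReportedTopology("A", rest, rest),
    ReportedTopology("B", rest, rest),
    ReportedTopology("C", rest, rest),
    ReportedTopology("D", full, full),  # stale view from the paused node
]
gc_pause_choice = candidates_by_stable_size(gc_pause_reports)

# Scenario 2: a split leaves D alone. D installs a single-member current
# topology, but no rebalance is needed, so its stable topology stays at 4 members.
split_reports = [
    ReportedTopology("A", rest, rest),
    ReportedTopology("B", rest, rest),
    ReportedTopology("C", rest, rest),
    ReportedTopology("D", frozenset({"D"}), full),  # stable topology never updated
]
split_choice = candidates_by_stable_size(split_reports)

print([r.sender for r in gc_pause_choice])                         # ['D']
print([(r.sender, len(r.current_members)) for r in split_choice])  # [('D', 1)]
```

In both cases the size check keeps only D's report, even though A, B and C form the larger
and more up-to-date partition. In the second case the surviving candidate's current
topology has a single member.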