[infinispan-dev] Kubernetes/OpenShift Rolling updates and configuration changes
Emmanuel Bernard
emmanuel at hibernate.org
Tue Jul 19 04:08:58 EDT 2016
Considering that very few options can be changed safely at runtime, should
we rather focus on a strategy where we start a new grid, populate it from
the old grid, and only then flip the proxy to the new one?
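
A rough sketch of how the new grid could pull data from the old one,
borrowing the RemoteStore wiring used by Infinispan's Hot Rod rolling
upgrades (the host and cache names below are made up):

    import org.infinispan.configuration.cache.Configuration;
    import org.infinispan.configuration.cache.ConfigurationBuilder;
    import org.infinispan.persistence.remote.configuration.RemoteStoreConfigurationBuilder;

    public class TargetGridConfig {
        // Configuration for a cache on the NEW grid that pulls entries
        // from the OLD grid over Hot Rod until the proxy is flipped and
        // the old grid can be disconnected.
        public static Configuration pullingFromOldGrid() {
            ConfigurationBuilder builder = new ConfigurationBuilder();
            builder.persistence()
                   .addStore(RemoteStoreConfigurationBuilder.class)
                      .remoteCacheName("myCache")     // cache on the old grid
                      .hotRodWrapping(true)           // keep the Hot Rod wire format
                      .addServer()
                         .host("old-grid.example")    // old grid's Hot Rod endpoint
                         .port(11222);
            return builder.build();
        }
    }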
On Mon 2016-07-18 17:12, Tristan Tarrant wrote:
> On 14/07/16 12:17, Sebastian Laskawiec wrote:
> > Hey!
> >
> > I've been thinking about a potential use of the Kubernetes/OpenShift
> > (OpenShift = Kubernetes + additional features) Rolling Update
> > mechanism for updating the configuration of Hot Rod servers. You can
> > find more information about rolling updates here [1][2], but put
> > simply, Kubernetes replaces the nodes in the cluster one at a time.
> > Notably, Kubernetes ensures that the newly created replica is fully
> > operational before taking down the next one.
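
(Kubernetes decides "fully operational" via the pod's readiness probe; a
toy readiness endpoint for a Hot Rod node could look like the following
sketch, where the actual health check is only an assumption:)

    import com.sun.net.httpserver.HttpServer;
    import java.io.OutputStream;
    import java.net.InetSocketAddress;

    public class ReadinessEndpoint {
        public static void main(String[] args) throws Exception {
            HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
            // A Kubernetes readinessProbe would be pointed at GET /ready.
            server.createContext("/ready", exchange -> {
                boolean ready = joinedClusterAndStateTransferDone();
                byte[] body = (ready ? "OK" : "NOT READY").getBytes();
                exchange.sendResponseHeaders(ready ? 200 : 503, body.length);
                try (OutputStream os = exchange.getResponseBody()) {
                    os.write(body);
                }
            });
            server.start();
        }

        // Hypothetical check: true once the node has joined the cluster
        // and received its share of the state.
        static boolean joinedClusterAndStateTransferDone() {
            return true;
        }
    }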
> >
> > There are two things that have me scratching my head...
> >
> > #1 - Which configuration changes can we introduce using rolling
> > updates?
> >
> > I'm pretty sure introducing a new cache definition won't do any harm.
> > But what if we change a cache type from Distributed to Replicated? Do
> > you have any idea which configuration changes are safe and which are
> > not? Could we come up with such a list?
> Very few changes are safe, but obviously this would need to be verified
> on a per-attribute basis. All of the attributes which can be changed at
> runtime (timeouts, eviction size) are safe.
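
To illustrate with the programmatic API (a sketch against the Infinispan 8
era builder; the values are made up): the timeout and eviction attributes
below are the runtime-tunable kind, while cacheMode defines the topology
and is not safe to change in a rolling update.

    import org.infinispan.configuration.cache.CacheMode;
    import org.infinispan.configuration.cache.Configuration;
    import org.infinispan.configuration.cache.ConfigurationBuilder;

    public class SafeVsUnsafeAttributes {
        public static Configuration example() {
            ConfigurationBuilder b = new ConfigurationBuilder();
            b.clustering()
                .cacheMode(CacheMode.DIST_SYNC)  // topology: NOT safe to change
                .remoteTimeout(17_500)           // timeout: safe to change
             .locking()
                .lockAcquisitionTimeout(10_000)  // timeout: safe to change
             .eviction()
                .maxEntries(10_000);             // eviction size: safe to change
            return b.build();
        }
    }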
>
> >
> > #2 - How to prevent losing data during the rolling update process?
> > In Kubernetes we have a mechanism called lifecycle hooks [3] (we can
> > invoke a script during container startup/shutdown). The problem with
> > the shutdown script is that it is time-constrained (if it doesn't
> > finish within a certain amount of time, Kubernetes will simply kill
> > the container). Fortunately this timeout is configurable.
> >
> > The idea for preventing data loss would be to invoke (enqueue and
> > wait for completion) a state transfer process triggered by the
> > shutdown hook (with the timeout set to the maximum value). If for
> > some reason this doesn't work (e.g. a user has so much data that
> > migrating it this way would take ages), there is a backup plan:
> > Infinispan Rolling Upgrades [4].
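
A minimal sketch of the graceful-shutdown part, assuming the embedded API
(the configuration file and cache name are placeholders): the script run
by the lifecycle hook only has to stop the JVM cleanly, and a JVM shutdown
hook does the rest.

    import org.infinispan.manager.DefaultCacheManager;

    public class HotRodNodeMain {
        public static void main(String[] args) throws Exception {
            DefaultCacheManager manager = new DefaultCacheManager("infinispan.xml");
            // When Kubernetes tears the pod down (preStop hook, then
            // SIGTERM), this shutdown hook stops the cache manager. The
            // node leaves the cluster cleanly and the remaining members
            // rebalance; with numOwners > 1 no data is lost. The pod's
            // termination grace period must be large enough for that
            // state transfer to complete.
            Runtime.getRuntime().addShutdownHook(new Thread(manager::stop));
            manager.getCache("myCache");
        }
    }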
> The thing that concerns me here is the amount of churn involved: the
> safest bet for us is that the net topology doesn't change, i.e. you end
> up with the exact number of nodes you started with, and they are
> replaced one by one in a way that the replacement assumes the identity
> of the replaced (persistent uuid, owned segments and data in a
> persistent store). Other scenarios could be supported, but they will
> definitely carry a level of risk.
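
For the replacement to be able to assume that identity, the global state
(which holds the persistent uuid) has to live on storage that outlives the
container, e.g. a persistent volume mounted at a fixed path. A sketch,
assuming the global state API (the path is an example):

    import org.infinispan.configuration.global.GlobalConfigurationBuilder;
    import org.infinispan.manager.DefaultCacheManager;

    public class IdentityPreservingNode {
        public static void main(String[] args) {
            GlobalConfigurationBuilder global =
                GlobalConfigurationBuilder.defaultClusteredBuilder();
            // Persist the node's global state (including its persistent
            // uuid) to a volume that survives container replacement, so
            // the new pod comes back with the same identity and owned
            // segments as the node it replaces.
            global.globalState().enable().persistentLocation("/var/infinispan/state");
            DefaultCacheManager manager = new DefaultCacheManager(global.build());
        }
    }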
> Also we don't have any guarantees that a newer version will be able to
> cluster with an older one...
>
> Tristan