[infinispan-dev] Infinispan and OpenShift/Kubernetes PetSets

Sebastian Laskawiec slaskawi at redhat.com
Wed Aug 24 03:06:39 EDT 2016


Thanks Rob! That clarifies a lot!

On Mon, Aug 22, 2016 at 4:02 PM, Rob Cernich <rcernich at redhat.com> wrote:

>
>
> ------------------------------
>
> Hey Rob!
>
> Thanks a lot for clarification!
>
> More comments inlined.
>
> Thanks
> Sebastian
>
> On Sat, Aug 20, 2016 at 12:04 AM, Rob Cernich <rcernich at redhat.com> wrote:
>
>> A couple of things...
>>
>> re. volumes:
>> We also need to consider the mounting behavior for scale down scenarios
>> and for overage scenarios when doing upgrades.  For the latter, OpenShift
>> can spin up pods of the new version before the older version pods have
>> terminated.  This may mean that some volumes from the old pods are
>> orphaned.  We did see this when testing A-MQ during upgrades.  With a
>> single pod, the upgrade process caused the new version to have a new mount
>> and the original mount was left orphaned (another upgrade would cause the
>> newer pod to pick up the orphaned mount, leaving the new mount orphaned).
>> I believe we worked around this by specifying an overage of 0% during
>> upgrades.  This ensured the new pods would pick up the volumes left behind
>> by the old pods.  (Actually, we were using subdirectories in the mount,
>> since all pods shared the same volume.)
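>>
>> For what it's worth, the overage can be pinned down in the deployment
>> strategy.  A rough sketch of the relevant DeploymentConfig fragment (the
>> values here are only an illustration, not what we actually shipped):
>>
>>   spec:
>>     strategy:
>>       type: Rolling
>>       rollingParams:
>>         # 0% overage: new pods are only started as old ones terminate,
>>         # so they can pick up the volumes the old pods leave behind
>>         maxSurge: "0%"
>>         maxUnavailable: "25%"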
>>
>>
> I think PetSets try to address this kind of problem. According to the
> manual page [11], the storage is linked to the Pod ordinal and hostname
> and should be stable.
>
> [11] http://kubernetes.io/docs/user-guide/petset/#when-to-use-pet-set
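>
> For illustration, a stripped-down PetSet along the lines of the docs (the
> names, image and sizes below are made up, not the exact yaml from [10]):
>
>   apiVersion: apps/v1alpha1
>   kind: PetSet
>   metadata:
>     name: infinispan
>   spec:
>     serviceName: infinispan              # the governing (headless) service
>     replicas: 3
>     template:
>       metadata:
>         labels:
>           app: infinispan
>         annotations:
>           pod.alpha.kubernetes.io/initialized: "true"   # alpha debugging hook
>       spec:
>         containers:
>         - name: infinispan-server
>           image: jboss/infinispan-server          # hypothetical image
>           volumeMounts:
>           - name: data
>             mountPath: /opt/infinispan/data       # hypothetical path
>     volumeClaimTemplates:
>     - metadata:
>         name: data
>       spec:
>         accessModes: [ "ReadWriteOnce" ]
>         resources:
>           requests:
>             storage: 1Gi
>
> Each pet keeps its claim (data-infinispan-0, data-infinispan-1, ...) across
> restarts, so the same pod always gets the same volume back.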
>
>>
>> re. dns:
>> DNS should work fine as-is, but there are a couple things that you need
>> to consider.
>> 1. Service endpoints are only available in DNS after the pod becomes
>> ready (SVC records on the service name).  Because Infinispan attaches
>> itself to the cluster, this meant pods were all started as a cluster of one,
>> then merged once they noticed the other pods.  This had a significant
>> impact on startup.  Since then, OpenShift has added the ability to query
>> the endpoints associated with a service as soon as the pod is created,
>> which would allow initialization to work correctly.  To make this work,
>> we'd have to change the form of the DNS query to pick up the service
>> endpoints (I forget the naming scheme).
>>
>
> Yes, I agree. Adding nodes one after another will have a significant impact
> on cluster startup time. However, it should be safe to query the cluster
> (and even put data in it) during a rebalance. So I would say that if a node
> is up and the cluster is not damaged, we should treat it as ready.
>
> To be clear, the issue was around the lifecycle and interaction with the
> readiness probe.  OpenShift/Kubernetes only add the pod to the service once
> it's "ready."  Our readiness probe defines ready as startup is complete
> (i.e. x of y services started).  The issue with this is that as pods come
> up, they only see other pods that are ready when they initialize, so if you
> start with an initial cluster size of five, the new nodes don't see any of
> the other nodes until they finish startup and refresh the list of nodes,
> after which they all have to merge with each other.  This had a significant
> impact on performance.  Since then, a feature has been added which allows
> you to query DNS for all endpoints regardless of ready state.  This allows
> nodes to see the other nodes before they become ready, which allows for a
> more natural cluster formation.  To reiterate, the impact on startup was
> significant, especially when scaling up under load.
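>
> For reference (as far as I can tell from the docs), the switch for this on
> the service side is an alpha annotation on the governing service, roughly:
>
>   metadata:
>     annotations:
>       # publish endpoints in DNS even before the pods report ready
>       service.alpha.kubernetes.io/tolerate-unready-endpoints: "true"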
>
>
> NB - I proposed a HealthCheck API for Infinispan 9 (currently under
> development) [12][13]. The overall cluster health can be in one of three
> statuses: GREEN (everything is fine), YELLOW (rebalance in progress), or RED
> (cluster not healthy). The Kubernetes/OpenShift readiness probe should check
> whether the status is GREEN or YELLOW. The HealthCheck API is attached to the
> WildFly management API, so you can query it with curl or the ispn_cli.sh script.
>
> [12] https://github.com/infinispan/infinispan/wiki/Health-check-API
> [13] https://github.com/infinispan/infinispan/pull/4499
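>
> So the readiness probe could end up being something like this (just a
> sketch - the script name and path below are made up; the real check would
> go through the management API or the ispn_cli.sh script):
>
>   readinessProbe:
>     exec:
>       command:
>       - /opt/jboss/infinispan-server/bin/is_healthy.sh   # hypothetical wrapper
>     initialDelaySeconds: 30
>     periodSeconds: 10
>     timeoutSeconds: 5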
>
>>
>> Another thing to keep in mind is that looking up pods by labels allows
>> any pod with the specified label to be added to the cluster.  I'm not sure
>> of a use case for this, but it would allow other deployments to be included
>> in the cluster.  (You could also argue that the service is the authority
>> for this and any pod with said label would be added as a service endpoint,
>> thus achieving the same behavior...probably more simply too.)
>>
>
> I think this is a scenario where someone might try to attach Infinispan in
> library mode (a dependency in a WAR file, for example) to the Hot Rod cluster.
> Gustavo answered a question like this a while ago [14].
>
> [14] https://developer.jboss.org/message/961568
>
>> Lastly, DNS was a little flaky when we first implemented this, which was
>> part of the reason we went straight to Kubernetes.  Users were using
>> dnsmasq with wildcards that worked well for routes, but ended up routing
>> services to the router IP instead of the pod IP.  Needless to say, there
>> were a lot of complications trying to use DNS and debug user problems with
>> service resolution.
>>
>
> I think a governing headless service [15] is required here (PetSets
> require a service, but considering how Infinispan works, it should be a
> headless one in my opinion).
>
> [15] http://kubernetes.io/docs/user-guide/services/#headless-services
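>
> Something along these lines - a minimal sketch, with the name, labels and
> port being illustrative only:
>
>   apiVersion: v1
>   kind: Service
>   metadata:
>     name: infinispan
>     labels:
>       app: infinispan
>   spec:
>     clusterIP: None          # headless - DNS returns the pod IPs directly
>     selector:
>       app: infinispan
>     ports:
>     - name: hotrod
>       port: 11222
>
> Together with the tolerate-unready-endpoints annotation mentioned above,
> a DNS query for the service should return all pods, ready or not.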
>
>>
>>
>> Hope that helps,
>> Rob
>>
>> ------------------------------
>>
>> Hey Bela!
>>
>> No no, the resolution can be done with pure JDK.
>>
>> Thanks
>> Sebastian
>>
>> On Fri, Aug 19, 2016 at 11:18 AM, Bela Ban <bban at redhat.com> wrote:
>>
>>> Hi Sebastian
>>>
>>> the usual restrictions apply: if DNS discovery depends on external libs,
>>> then it should be hosted in jgroups-extras, otherwise we can add it to
>>> JGroups itself.
>>>
>>> On 19/08/16 11:00, Sebastian Laskawiec wrote:
>>>
>>>> Hey!
>>>>
>>>> I've been playing with Kubernetes PetSets [1] for a while and I'd like
>>>> to share some thoughts. Before I dig in, let me give you some PetSets
>>>> highlights:
>>>>
>>>>   * PetSets are alpha resources for managing stateful apps in Kubernetes
>>>>     1.3 (and OpenShift Origin 1.3).
>>>>   * Since this is an alpha resource, there are no guarantees about
>>>>     backwards compatibility. Alpha resources can also be disabled in
>>>>     some public cloud providers (you can control which API versions are
>>>>     accessible [2]).
>>>>   * PetSets allow starting pods in sequence (not relevant for us, but
>>>>     this is a killer feature for master-slave systems).
>>>>   * Each Pod has its own unique entry in DNS, which makes discovery
>>>>     very simple (I'll dig into that a bit later).
>>>>   * Volumes are always mounted to the same Pods, which is very important
>>>>     in Cache Store scenarios when we restart pods (e.g. Rolling Upgrades
>>>>     [3]).
>>>>
>>>> Thoughts and ideas after spending some time playing with this feature:
>>>>
>>>>   * PetSets make discovery a lot easier. It's a combination of two
>>>>     things: Headless Services [4], which create multiple A records in
>>>>     DNS, and predictable host names. Each Pod has its own unique DNS
>>>>     entry following the pattern {PetSetName}-{PodIndex}.{ServiceName} [5].
>>>>     Here's an example of an Infinispan PetSet deployed on my local
>>>>     cluster [6]. As you can see, we get all domain names and IPs from a
>>>>     single DNS query.
>>>>   * Maybe we could perform discovery using this mechanism? I'm aware of
>>>>     the DNS discovery implemented in KUBE_PING [7][8], but the code looks
>>>>     trivial [9], so maybe it should be implemented inside JGroups? @Bela -
>>>>     WDYT?
>>>>   * PetSets do not integrate well with the OpenShift 'new-app' command. In
>>>>     other words, our users will need to use the provided yaml (or json)
>>>>     files to create an Infinispan cluster. It's not a show-stopper, but it's
>>>>     a bit less convenient than 'oc new-app'.
>>>>   * Since PetSets are alpha resources, they need to be considered a
>>>>     secondary way to deploy Infinispan on Kubernetes and OpenShift.
>>>>   * Finally, the persistent volumes - since a Pod always gets the same
>>>>     volume, it would be safe to use any file-based cache store.
>>>>
>>>> If you'd like to play with PetSets on your local environment, here are
>>>> necessary yaml files [10].
>>>>
>>>> Thanks
>>>> Sebastian
>>>>
>>>>
>>>> [1] http://kubernetes.io/docs/user-guide/petset/
>>>> [2] For checking which APIs are accessible, use 'kubectl api-versions'
>>>> [3] http://infinispan.org/docs/stable/user_guide/user_guide.html#_Rolling_chapter
>>>> [4] http://kubernetes.io/docs/user-guide/services/#headless-services
>>>> [5] http://kubernetes.io/docs/user-guide/petset/#peer-discovery
>>>> [6] https://gist.github.com/slaskawi/0866e63a39276f8ab66376229716a676
>>>> [7] https://github.com/jboss-openshift/openshift-ping/tree/master/dns
>>>> [8] https://github.com/jgroups-extras/jgroups-kubernetes/tree/master/dns
>>>> [9] http://stackoverflow.com/a/12405896/562699
>>>> [10] https://gist.github.com/slaskawi/7cffb5588dabb770f654557579c5f2d0
>>>> (you might need to adjust the ImageStream)
>>>>
>>>
>>> --
>>> Bela Ban, JGroups lead (http://www.jgroups.org)
>>>
>>>
>>
>>
>
>