[keycloak-dev] Infinispan error during update in ha configuration

Мартынов Илья imartynovsp at gmail.com
Mon Jul 8 10:31:55 EDT 2019


I've estimated the work needed to support ReplicaSets, and it looks possible to
implement in one day. So yes, I'll start working on it.

On Mon, Jul 8, 2019 at 3:28 PM Sebastian Laskawiec <slaskawi at redhat.com> wrote:

> oooh right, I guess you hit this issue:
> https://github.com/jgroups-extras/jgroups-kubernetes/issues/50
>
> Sorry to hear that. Perhaps you'd be interested in contributing a fix to
> KUBE_PING?
>
> On Mon, Jul 8, 2019 at 1:35 PM Мартынов Илья <imartynovsp at gmail.com>
> wrote:
>
>> I've tried the split_clusters_during_rolling_update option, but KUBE_PING
>> starts failing with an NPE:
>> Caused by: java.lang.NullPointerException
>>         at java.util.Objects.requireNonNull(Objects.java:203)
>>         at java.util.Optional.<init>(Optional.java:96)
>>         at java.util.Optional.of(Optional.java:108)
>>         at java.util.stream.FindOps$FindSink$OfRef.get(FindOps.java:193)
>>         at java.util.stream.FindOps$FindSink$OfRef.get(FindOps.java:190)
>>         at
>> java.util.stream.FindOps$FindOp.evaluateSequential(FindOps.java:152)
>>         at
>> java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
>>         at
>> java.util.stream.ReferencePipeline.findFirst(ReferencePipeline.java:464)
>>         at
>> org.jgroups.protocols.kubernetes.KUBE_PING.findMembers(KUBE_PING.java:240)
>>         at org.jgroups.protocols.Discovery.findMembers(Discovery.java:211)
>>         at org.jgroups.protocols.Discovery.down(Discovery.java:350)
>>
>> In the debugger I can see the code in KUBE_PING that tries to find the
>> pod's parent deployment:
>>                 String senderParentDeployment = hosts.stream()
>>                       .filter(pod -> senderIp.contains(pod.getIp()))
>>                       .map(Pod::getParentDeployment)
>>                       .findFirst().orElse(null);
>> This code fails with an NPE because Pod::getParentDeployment returns null.
>> The parent deployment is not set because KUBE_PING cannot read the
>> deployment from the JSON describing the pod.
>> There is already a KUBE_PING issue for this:
>> https://github.com/jgroups-extras/jgroups-kubernetes/issues/50
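>> A possible local workaround (not the upstream fix) would be to skip pods
>> whose parent deployment could not be resolved, e.g. with an extra null
>> filter (this assumes java.util.Objects is imported):
>>                 String senderParentDeployment = hosts.stream()
>>                       .filter(pod -> senderIp.contains(pod.getIp()))
>>                       .map(Pod::getParentDeployment)
>>                       .filter(Objects::nonNull) // findFirst() throws NPE on a null element
>>                       .findFirst().orElse(null);
>> That only masks the missing parentDeployment, of course; issue #50 above is
>> about fixing the JSON parsing itself.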
>>
>>
>> On Mon, Jul 8, 2019 at 10:25 AM Sebastian Laskawiec <slaskawi at redhat.com> wrote:
>>
>>> Ok, I'm glad it worked.
>>>
>>> Just for the future: with KUBE_PING there's an additional option,
>>> "split_clusters_during_rolling_update", which can be set to true. See the
>>> documentation:
>>> https://github.com/jgroups-extras/jgroups-kubernetes#kube_ping-configuration
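>>>
>>> As a rough sketch (untested here), the protocol entry in the jgroups stack
>>> of standalone-ha.xml could look something like this; the namespace property
>>> is just an illustration, adjust it to your setup:
>>>     <protocol type="kubernetes.KUBE_PING">
>>>         <property name="namespace">${env.KUBERNETES_NAMESPACE:default}</property>
>>>         <property name="split_clusters_during_rolling_update">true</property>
>>>     </protocol>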
>>>
>>> On Wed, Jul 3, 2019 at 4:32 PM Мартынов Илья <imartynovsp at gmail.com>
>>> wrote:
>>>
>>>> Thanks Sebastian,
>>>> I use the KUBE_PING JGroups discovery plugin; it seems to be more reliable
>>>> than IP multicasting.
>>>> In the end I went with choice #1 by simply changing the Deployment
>>>> strategy in Kubernetes from the default "RollingUpdate" to "Recreate".
>>>> Recreate does exactly what we need: it first drops all existing pods and
>>>> only then creates the pods with the new version.
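>>>>
>>>> For reference, that's just the strategy field on the Deployment; a sketch
>>>> with placeholder names:
>>>>     apiVersion: apps/v1
>>>>     kind: Deployment
>>>>     metadata:
>>>>       name: keycloak            # placeholder name
>>>>     spec:
>>>>       strategy:
>>>>         type: Recreate          # default is RollingUpdate
>>>>       # selector, template, etc. stay as they were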
>>>>
>>>>
>>>>
>>>> On Wed, Jul 3, 2019 at 3:15 PM Sebastian Laskawiec <slaskawi at redhat.com> wrote:
>>>>
>>>>> If you're using standalone-ha.xml without any extra parameters, you're
>>>>> using the UDP JGroups stack with IP multicast discovery. You also
>>>>> suggested that you're using Pods, so I'm assuming you're running
>>>>> Kubernetes with the Flannel network plugin (as far as I know, other
>>>>> plugins do not support IP multicast out of the box).
>>>>>
>>>>> So effectively you have two choices:
>>>>> - Turn everything off, do the migration, and start it again. Note that
>>>>> you need to shut everything down (this is not a rolling update
>>>>> procedure).
>>>>> - Reconfigure JGroups so the new pods do not join the same cluster. The
>>>>> easiest thing to do here is to modify the cluster attribute and make
>>>>> sure it's different for the old and the new cluster (see the sketch
>>>>> below).
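>>>>>
>>>>> A minimal sketch of the second option, assuming the channel element of
>>>>> the jgroups subsystem in standalone-ha.xml (the cluster name below is
>>>>> just an example, pick anything the old deployment does not use):
>>>>>     <channels default="ee">
>>>>>         <!-- the new deployment gets a cluster name the old one doesn't use -->
>>>>>         <channel name="ee" stack="udp" cluster="keycloak-new"/>
>>>>>     </channels>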
>>>>>
>>>>> Thanks,
>>>>> Sebastian
>>>>>
>>>>> On Wed, Jul 3, 2019 at 11:10 AM Мартынов Илья <imartynovsp at gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Just another question. Actually, I don't even need a zero-downtime
>>>>>> upgrade; I need just any upgrade. Right now the upgrade is blocked
>>>>>> because the new pod cannot start.
>>>>>> Can we somehow make the new pod ignore non-readable messages from the
>>>>>> old pod and continue loading?
>>>>>> Yes, session state won't be replicated, but it's a more automated way
>>>>>> than manually stopping the old pod and starting a new pod after that.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Jul 3, 2019 at 9:39 AM Мартынов Илья <imartynovsp at gmail.com> wrote:
>>>>>>
>>>>>> > Hi Marek,
>>>>>> >
>>>>>> > Ok, I got it, thank you for response!
>>>>>> >
>>>>>> > On Tue, Jul 2, 2019 at 7:25 PM Marek Posolda <mposolda at redhat.com> wrote:
>>>>>> >
>>>>>> >> I think you can't mix old and new Keycloak servers in the same
>>>>>> >> cluster. And a rolling upgrade (zero-downtime upgrade) is not yet
>>>>>> >> supported. We plan to add support for it, but it won't be in the very
>>>>>> >> near future, as it will likely require quite a lot of work...
>>>>>> >>
>>>>>> >> In short, the recommendation is to stop all pods with 4.5.0 and then
>>>>>> >> start the pods with 6.0.1.
>>>>>> >>
>>>>>> >> Marek
>>>>>> >>
>>>>>> >>
>>>>>> >> On 02/07/2019 16:25, Мартынов Илья wrote:
>>>>>> >> > Hello!
>>>>>> >> > I have Keycloak 4.5.0.Final deployed in a standalone-ha configuration
>>>>>> >> > in a k8s cluster. When I try to update Keycloak to version 6.0.1, the
>>>>>> >> > following happens:
>>>>>> >> > 1. K8s starts a new pod with version 6.0.1
>>>>>> >> > 2. The old pod is still running; it will be terminated on a successful
>>>>>> >> > readiness probe of the new pod
>>>>>> >> > 3. The new pod discovers the old pod via JGroups and cache
>>>>>> >> > synchronization starts
>>>>>> >> > 4. Exception in the new pod:
>>>>>> >> > 02-07-2019;13:34:29,220 WARN [stateTransferExecutor-thread--p25-t1]
>>>>>> >> > org.infinispan.statetransfer.InboundTransferTask ISPN000210: Failed to
>>>>>> >> > request state of cache work from node idp-6569c544b-hsd6g, segments
>>>>>> >> > {0-255}: org.infinispan.remoting.RemoteException: ISPN000217: Received
>>>>>> >> > exception from idp-6569c544b-hsd6g, see cause for remote stack trace
>>>>>> >> >          at org.infinispan at 9.4.8.Final
>>>>>> >> > //org.infinispan.remoting.transport.ResponseCollectors.wrapRemoteException(ResponseCollectors.java:28)
>>>>>> >> > ...
>>>>>> >> > Caused by: java.io.IOException: Unknown type: 132
>>>>>> >> >          at
>>>>>> >> > org.infinispan.marshall.core.GlobalMarshaller.readNonNullableObject(GlobalMarshaller.java:681)
>>>>>> >> >          at
>>>>>> >> > org.infinispan.marshall.core.GlobalMarshaller.readNullableObject(GlobalMarshaller.java:355)
>>>>>> >> >          at
>>>>>> >> > org.infinispan.marshall.core.BytesObjectInput.readObject(BytesObjectInput.java:40)
>>>>>> >> >
>>>>>> >> > Looks like this exception blocks further Keycloak startup, because
>>>>>> >> > nothing happens in the logs afterwards. Also, my REST service deployed
>>>>>> >> > as a JAX-RS bean doesn't respond either, so the pod is not treated as
>>>>>> >> > alive by Kubernetes.
>>>>>> >> > Please help.
>>>>>> >> > _______________________________________________
>>>>>> >> > keycloak-dev mailing list
>>>>>> >> > keycloak-dev at lists.jboss.org
>>>>>> >> > https://lists.jboss.org/mailman/listinfo/keycloak-dev
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> _______________________________________________
>>>>>> keycloak-dev mailing list
>>>>>> keycloak-dev at lists.jboss.org
>>>>>> https://lists.jboss.org/mailman/listinfo/keycloak-dev
>>>>>
>>>>>

