[jboss-jira] [JBoss JIRA] (SWSQE-200) B12 OpenShift Cluster is flaky
Kevin Earls (JIRA)
issues at jboss.org
Tue May 15 04:06:01 EDT 2018
[ https://issues.jboss.org/browse/SWSQE-200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kevin Earls reopened SWSQE-200:
-------------------------------
Sorry [~gbaufake], but I don't think we can say this is done yet. When I came in this morning, the Elasticsearch cluster that I had left running yesterday was in a failed state even though nothing had been using it. I shut it down and tried to redeploy, but so far that isn't working. I get the following message on the monitoring/events page:
Failed Create Pod Sand Box Failed create pod sandbox: rpc error: code = Unknown desc = failed to start sandbox container for pod "elasticsearch-0": Error response from daemon: grpc: the connection is unavailable
174 times in the last
Can you at least restart the cluster so I can try to finish up a couple of things today?
Also, we should probably leave this bug open (or replace it with more specific ones) until we're sure the cluster is healthy. Thanks.
> B12 OpenShift Cluster is flaky
> ------------------------------
>
> Key: SWSQE-200
> URL: https://issues.jboss.org/browse/SWSQE-200
> Project: Kiali QE
> Issue Type: QE Task
> Reporter: Kevin Earls
> Assignee: Guilherme Baufaker Rêgo
> Priority: Minor
>
> I'm opening this mostly as a placeholder and will update it as I get more information on the problems I've been experiencing. Since I started using B12, I've seen more frequent failures than on other clusters, including Minishift on my laptop and the CNCF CI Jenkins. Here are a couple of instances:
> 1. Deploying Elasticsearch along with the Jaeger production templates requires allocating 2Gi of memory, even though the default of 512M works fine elsewhere. This can be seen in the Jaeger Standalone Performance Test job here: https://jenkins-jaeger-test.openshift3.jonqe.lab.eng.bos.redhat.com/job/Jaeger%20Standalone%20Performance%20Test/ . If you run the job without changing the ES_MEMORY parameter to 2Gi, it fails.
> 2. I have a set of smoke tests for the Red Hat productized artifacts of the Jaeger Java client, which run against the Jaeger all-in-one template. They are fairly simple, but on B12 the deployment of the Jaeger all-in-one images fails on every other build. I've made a copy of the job here: https://jenkins-jaeger-test.openshift3.jonqe.lab.eng.bos.redhat.com/job/Flaky%20Test/ and its build history shows the pattern.
> So far I have not been able to get any useful information about why this is failing. In the OpenShift console, Jaeger appears to have started correctly. There are no errors in the logs, and none under monitoring. But if you click the Jaeger link, you get the message "Application is not available. The application is currently not serving requests at this endpoint. It may not have been started or is still starting."
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)