I have improved the error reporting mechanism now.
I did this in two phases:
* https://github.com/wildfly/wildfly/pull/18172 + https://github.com/wildfly-extras/wildfly-cloud-tests/pull/194 introduced a new mechanism
* https://github.com/wildfly/wildfly/pull/18174 + https://github.com/wildfly-extras/wildfly-cloud-tests/pull/195 removed the old mechanism
Previously, the Cloud Tests Trigger workflow triggered by WildFly PRs would
wait until the tests on the wildfly-cloud-tests side had completed. The
remote job communicated its status via a push to a branch that the trigger
was polling, watching for a commit containing the status. I was never happy
with this approach, and came across a better way while adding CI somewhere
else.
What happens now is that the Cloud Tests Trigger issues a repository
dispatch against the WildFly Cloud Tests repository. This is the same as
before, but the trigger now returns immediately after the dispatch.
The dispatch is done by the cloud-test-pr-trigger.yml workflow
<https://github.com/wildfly/wildfly/blob/main/.github/workflows/cloud-test...>,
and the *trigger-cloud-tests-pr* event is handled on the cloud tests side
by the wildfly-pull-request-runner.yml workflow
<https://github.com/wildfly-extras/wildfly-cloud-tests/blob/main/.github/w...>.
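Roughly, the dispatch step boils down to something like this (an illustrative sketch rather than the exact contents of the workflow; the secret name and the client_payload fields are just placeholders):

    # Sketch of the dispatch step in cloud-test-pr-trigger.yml; the token
    # secret and the client_payload fields are placeholders, not the real names.
    - name: Trigger the cloud tests
      run: |
        curl -sS -X POST \
          -H "Authorization: Bearer ${{ secrets.CLOUD_TESTS_TOKEN }}" \
          -H "Accept: application/vnd.github+json" \
          https://api.github.com/repos/wildfly-extras/wildfly-cloud-tests/dispatches \
          -d '{"event_type": "trigger-cloud-tests-pr",
               "client_payload": {"pr": "${{ github.event.number }}",
                                  "sha": "${{ github.event.pull_request.head.sha }}"}}'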
The first thing the wildfly-pull-request-runner.yml workflow does is a
repository dispatch back to the wildfly repository to set the status of the
job as pending
<https://github.com/wildfly/wildfly/blob/main/.github/workflows/cloud-test...>.
The *report-cloud-tests-pr-pending* event type is handled on the wildfly
side by cloud-test-pr-reporter.yml
<https://github.com/wildfly/wildfly/blob/main/.github/workflows/cloud-test...>,
which executes a call to add the status.
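The reporter side is essentially a repository_dispatch handler that calls GitHub's commit status API. A minimal sketch of the idea (the payload field names here are assumptions, not necessarily what the real workflows exchange):

    # Sketch of cloud-test-pr-reporter.yml handling the pending event; the
    # client_payload fields (sha, run_url) are assumed names for illustration.
    on:
      repository_dispatch:
        types: [report-cloud-tests-pr-pending]
    permissions:
      statuses: write
    jobs:
      report:
        runs-on: ubuntu-latest
        steps:
          - name: Mark the remote run as pending
            run: |
              curl -sS -X POST \
                -H "Authorization: Bearer ${{ secrets.GITHUB_TOKEN }}" \
                -H "Accept: application/vnd.github+json" \
                https://api.github.com/repos/${{ github.repository }}/statuses/${{ github.event.client_payload.sha }} \
                -d '{"state": "pending",
                     "context": "Cloud Tests Remote Run",
                     "target_url": "${{ github.event.client_payload.run_url }}"}'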
Once this is done, in the original PR, we see the 'Cloud Tests Trigger' job
has completed, and there is a new entry called 'Cloud Tests Remote Run',
which is in the pending status:
[image: Screenshot 2024-09-06 at 13.51.50.png]
The 'Details' link for 'Cloud Tests Remote Run' takes you to the workflow
run on the wildfly-cloud-tests side.
Once all the tests have run, the cloud tests wildfly-pull-request-runner.yml
reports the job status back to wildfly with another repository dispatch
<https://github.com/wildfly-extras/wildfly-cloud-tests/blob/main/.github/w...>.
Again, the *report-cloud-tests-pr-complete* event type is handled on the
wildfly side by cloud-test-pr-reporter.yml
<https://github.com/wildfly/wildfly/blob/main/.github/workflows/cloud-test...>,
which executes a call to update the status for the job on the PR. In this
case the job passed 🥳:
[image: Screenshot 2024-09-06 at 14.45.28.png]
As before the 'Details' link takes you back to the job run on the cloud
tests side.
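The completion report is the same pattern in reverse: a final step on the cloud tests side dispatches the outcome back. Something along these lines (again a sketch with placeholder names, not the actual workflow):

    # Sketch of the final reporting step in wildfly-pull-request-runner.yml;
    # the secret name and the payload fields are placeholders.
    - name: Report the result back to wildfly
      if: always()
      run: |
        curl -sS -X POST \
          -H "Authorization: Bearer ${{ secrets.WILDFLY_TOKEN }}" \
          -H "Accept: application/vnd.github+json" \
          https://api.github.com/repos/wildfly/wildfly/dispatches \
          -d '{"event_type": "report-cloud-tests-pr-complete",
               "client_payload": {"sha": "${{ github.event.client_payload.sha }}",
                                  "status": "${{ job.status }}",
                                  "run_url": "${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}"}}'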
A small niggle is that the concurrency check on the WildFly cloud tests
side will cancel all running jobs. This currently causes the status
reported back to be 'failed', due to something I still need to figure out;
ideally it should be reported as 'cancelled'. However, this is a bit of a
corner case, since once the job is cancelled and the new job starts, the
status will correctly be reported as 'pending' again.
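For context, the cancellation comes from a standard concurrency group along these lines (a generic example, not the exact group key used):

    # Generic example of the concurrency block; the group key is illustrative.
    concurrency:
      group: cloud-tests-pr-${{ github.event.client_payload.pr }}
      cancel-in-progress: true

With cancel-in-progress enabled, the superseded run finishes with a 'cancelled' job status, so the fix is presumably a matter of mapping that status to something other than 'failed' (or skipping the completion dispatch) in the reporting step.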
Thanks,
Kabir
On Wed, 4 Sept 2024 at 18:00, Kabir Khan <kkhan(a)redhat.com> wrote:
I've implemented the space saving part, and I now think the tests can
remain where they are.
I found that with the Kubernetes registry enabled I was able to push and
pull images from it. If I disable it and enable it again, the images I
pushed before restarting are no longer there. So it seems this cleans up
the registry, and should give a big space saving.
I added a kubernetes-ci profile used by the GitHub Actions workflow, which
enables the registry before each test is run, and disables it after it is
run [1]. I also clean the image for each test from the local docker
registry, although the space saving there is less (I believe it is just a
layer containing the test deployment on top of the pre-built server images).
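In essence the per-test hooks amount to something like the following (a rough sketch of the idea; the actual scripts wired into the kubernetes-ci profile are in [1], and the image variable is just a placeholder):

    # Rough sketch of the per-test hooks; the actual scripts live in the PR at [1].
    # Before a test: make sure the in-cluster registry is available.
    minikube addons enable registry

    # ... run the test ...

    # After the test: disabling the addon drops the images that were pushed to it,
    # and the test image is removed from the local docker daemon as well.
    minikube addons disable registry
    docker rmi "${TEST_IMAGE}"   # TEST_IMAGE is a placeholder for the per-test image tag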
For now I am keeping the server images built early on via the -Pimages
flag, since I think the space saving from pruning the Kubernetes registry
should be good enough. If this turns out to be a problem because we ever
get a lot more tests and server images, I think I can do something in the
scripts called by the kubernetes-ci profile to build those on demand and
remove them after the tests have completed.
The next step will be to look at the improved reporting back to the
WildFly PR I mentioned.
[1] -
https://github.com/wildfly-extras/wildfly-cloud-tests/pull/192
On Thu, 29 Aug 2024 at 15:30, Brian Stansberry <
brian.stansberry(a)redhat.com> wrote:
>
>
> On Thu, Aug 29, 2024 at 4:57 AM Kabir Khan <kkhan(a)redhat.com> wrote:
>
>> Ok, I vaguely thought about that too...
>>
>> I can keep them in wildfly-extras for now, and improve the reporting as
>> mentioned, and then look into how to deal with the space issue. I guess on
>> the wildfly-extras side it will be a trigger job calling out to the other
>> ones, so the overall status report probably will not be as tricky as I
>> imagined.
>>
>
> Ok, good.
>
> An overall CI execution for a PR takes about 4.5 hours, due to the
> Windows jobs on TeamCity, so even if GH-action-based jobs ended up queuing
> sometimes it's unlikely to delay the entire PR cycle. These jobs take about
> 20 minutes and other ones we run should be faster. So really we shouldn't
> block moving things to wildfly. But optimizing any jobs that run in the
> wildfly GH org is important.
>
>
>> On Wed, 28 Aug 2024 at 16:53, Brian Stansberry <
>> brian.stansberry(a)redhat.com> wrote:
>>
>>>
>>>
>>> On Wed, Aug 28, 2024 at 5:50 AM Kabir Khan <kkhan(a)redhat.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> These tests need some modernisation, and there are two things in my
>>>> opinion that need addressing.
>>>>
>>>> *1 Space issues*
>>>> Recently we were running out of space when running these tests. James
>>>> fixed this by deleting the built WildFly, but when trying to resurrect an
>>>> old PR I had forgotten all about, we ran out of space again.
>>>>
>>>> I believe the issue is the way the tests work at the moment, which is
>>>> to:
>>>> * Start minikube with the registry
>>>> * Build all the test images
>>>> * Run all the tests
>>>>
>>>> Essentially we end up building all the server images (different
>>>> layers) before running the tests, which takes space, and then each test
>>>> installs the image into minikube's registry. Also, some tests install
>>>> other images (e.g. postgres, strimzi) into the minikube instance.
>>>>
>>>> My initial thought was that it would be good to build the server
>>>> images more on demand, rather than before the tests, and to be able to call
>>>> 'docker system prune' now and again.
>>>>
>>>> However, this does not take into account the minikube registry, which
>>>> will also accumulate a lot of images. It will at least become populated
>>>> with the test images; I am unsure if it also becomes populated with the
>>>> images pulled from elsewhere (e.g. postgres, strimzi, etc.).
>>>>
>>>> If 'minikube addons disable registry' followed by 'minikube addons
>>>> enable registry' deletes the registry contents from the disk, having a hook
>>>> to do that between each test could be something easy to look into. Does
>>>> anyone know if this is the case?
>>>>
>>>> An alternative could be to have one job building wildfly, and
>>>> uploading the maven repository as an artifact, and then have separate jobs
>>>> to run each test (or perhaps each set of tests requiring the same WildFly
>>>> server image). However, as this setup is quite fiddly (it runs remotely),
>>>> I'm not sure how the reporting would look.
>>>>
>>>> *2 Pull request trigger*
>>>> PRs in wildfly/wildfly execute a remote dispatch which results in the
>>>> job getting run in the wildfly-extras/wildfly-cloud-tests repository.
>>>>
>>>> There is no reporting back from the wildfly-extras/wildfly-cloud-tests
>>>> repository about the run id of the resulting run.
>>>>
>>>> What I did when I implemented this was to have the calling
>>>> wildfly/wildfly job wait and poll a branch in
>>>> wildfly-extras/wildfly-cloud-tests for the results of the job (IIRC I have
>>>> a file with the triggering PR number). The job on the other side would then
>>>> write to this branch once the job is done. Which is all quite ugly!
>>>>
>>>> However, playing in other repositories, I found
>>>> https://www.kenmuse.com/blog/creating-github-checks/. Basically this
>>>> would result in:
>>>> * the WildFly pull request trigger completing immediately once it has
>>>> done the remote dispatch
>>>> * when the wildfly-cloud-tests job starts, it will do a remote dispatch
>>>> to wildfly, which will get picked up by a workflow which can add a status
>>>> check on the PR conversation page saying remote testing in
>>>> wildfly-cloud-tests is in progress
>>>> * once the wildfly-cloud-tests job is done, it will do another remote
>>>> dispatch to wildfly, which will update the status check with
>>>> success/failure
>>>>
>>>> So we'd have two checks in the section rather than the current one.
>>>>
>>>>
>>>> *Other ideas*
>>>> While writing the above, the following occurred to me.
>>>>
>>>> The reason for the split is that the cloud test framework is quite
>>>> involved, and IMO does not belong in WildFly. So the remote dispatch
>>>> approach was used.
>>>>
>>>> However, I wonder now if a saner approach would be to update the
>>>> wildfly-cloud-tests workflows to be reusable so they can be used from
>>>> WildFly?
>>>>
>>>> That would allow the tests, test framework etc., and the workflow to
>>>> continue to live in wildfly-cloud-tests, while running in wildfly itself.
>>>> That should get rid of the remote dispatch issues, and make that side of
>>>> things simpler.
>>>>
>>>> It does not address the space issue, but I think if this approach
>>>> works, it will be easier to deal with the space issue.
>>>>
>>>
>>> A downside is that this means the 3 actual test jobs (e.g.
>>> https://github.com/wildfly-extras/wildfly-cloud-tests/actions/runs/105839...)
>>> run using the wildfly GH org's set of runners.
>>>
>>> Relying on wildfly-extras to get around that is a hack though. But if
>>> we're going to move these I think we need to optimize as much as possible,
>>> e.g. not rebuild WildFly multiple times.
>>>
>>>>
>>>>
>>>> Any thoughts/insights are welcome.
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> Kabir
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Brian Stansberry
>>> Principal Architect, Red Hat JBoss EAP
>>> WildFly Project Lead
>>> He/Him/His
>>>
>>
>
> --
> Brian Stansberry
> Principal Architect, Red Hat JBoss EAP
> WildFly Project Lead
> He/Him/His
>