Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(k8s): Make RunContainer detect/report more error conditions like image pull errors #331

Draft
wants to merge 32 commits into
base: main
Choose a base branch
from

Conversation

cognifloyd
Copy link
Member

@cognifloyd cognifloyd commented Apr 30, 2022

This is an alternate implementation of #279 using the SharedInformer infra to watch events introduced in #519.
Closes #279

Overview

The Docker Runtime surfaces image pull errors naturally because of the blocking synchronous requests to pull an image or start a container.

The Kubernetes Runtime has to watch for events that show any errors with pulling images.

RunContainer changes in k8s runtime

Wait until one of these conditions before returning:

  • build canceled
  • container is running
  • got an image pull error

WaitContainer changes in k8s runtime

Wait until one of these conditions before returning:

  • build canceled (new)
  • container is terminated (was already doing this)

containerTracker additions

I added signal channels for Running and ImagePulled. I also added a channel for receiving image pull errors in RunContainer.

podTracker changes

I need pod Name and Namespace in more places, so I copy those into the podTracker now.

I was surprised that the SharedInformerFactory I used to create the Informer/Lister for watching the build Pod did NOT work for watching events. Apparently, events don't get labeled, so I had to add a separate eventInformerFactory that uses a FieldSelector instead of a LabelSelector when making the list/watch API calls.

I did not rename informerFactory to podInformerFactory because it can easily be used for other k8s resources in the future (if needed), just not for Events.

I also added the Event event handler funcs (wow that was a mouth full - events about Events).

podTracker.inspectContainerEvent() (called by the Event event handlers) is responsible for

  • sending the image pull events to RunContainer
  • closing the ImagePulled signal channel when an event indicates that the pull was successful

I also adjusted podTracker.inspectContainerStatuses() so that it can also:

  • surface image pull errors (in case the error is visible there, but not through the Events API - which is unlikely, but there doesn't seem to be any kind of guarantees about event availability, so this covers our bases)
  • close the Running signal channel when the container status shows that it is running.

TODO:

@cognifloyd cognifloyd added the bug Indicates a bug label Apr 30, 2022
@cognifloyd cognifloyd self-assigned this Apr 30, 2022
@cognifloyd cognifloyd added feature Indicates a new feature and removed bug Indicates a bug labels Apr 30, 2022
@cognifloyd cognifloyd changed the title K8s events feat(k8s): Make RunContainer detect/report more error conditions like image pull errors. Apr 30, 2022
@cognifloyd cognifloyd changed the title feat(k8s): Make RunContainer detect/report more error conditions like image pull errors. feat(k8s): Make RunContainer detect/report more error conditions like image pull errors Apr 30, 2022
@copart-jafloyd copart-jafloyd force-pushed the k8s-events branch 2 times, most recently from 2d5034b to f33af7f Compare May 3, 2022 20:08
@codecov
Copy link

codecov bot commented May 3, 2022

Codecov Report

Merging #331 (65cbe59) into master (a9cbe48) will increase coverage by 1.01%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #331      +/-   ##
==========================================
+ Coverage   81.18%   82.20%   +1.01%     
==========================================
  Files          68       68              
  Lines        5273     5472     +199     
==========================================
+ Hits         4281     4498     +217     
+ Misses        848      831      -17     
+ Partials      144      143       -1     
Impacted Files Coverage Δ
runtime/kubernetes/build.go 88.60% <100.00%> (+1.55%) ⬆️
runtime/kubernetes/container.go 91.64% <100.00%> (+9.61%) ⬆️
runtime/kubernetes/pod_tracker.go 97.37% <100.00%> (+1.51%) ⬆️

@copart-jafloyd copart-jafloyd force-pushed the k8s-events branch 2 times, most recently from abdb440 to bc846e0 Compare May 4, 2022 18:31
cognifloyd added 18 commits May 4, 2022 13:32
We have nothing to do when that happens.
we have to use different list options, so they can't share the same factory.
This needs access to the PodTracker's EvenLister, so we create
the function when initializing the containerTracker.
I've documented a bunch of other event reasons to facilitate
expanded event handling in the future.
Also, I keep running across these messages, so this help
me remember what they are and where they come from.
Now that RunContainer listens for k8s-state changes,
the tests need to actually simulate the state change
like the WaitContainer tests do.
@cognifloyd cognifloyd marked this pull request as ready for review May 4, 2022 18:56
@cognifloyd cognifloyd requested a review from a team as a code owner May 4, 2022 18:56
@cognifloyd cognifloyd requested review from jbrockopp and kneal May 4, 2022 18:56
// known kubelet event reasons are listed here:
// https://github.com/kubernetes/kubernetes/blob/v1.23.6/pkg/kubelet/events/event.go

// kubelet image event reasons.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to just use their constants or wrap them at least so we know when they change?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They have not packaged the code so that it can be used externally.
https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/events/event.go

If I do go get github.com/kubernetes/kubernetes I get an error module declares its path as: k8s.io/kubernetes
If I do go get k8s.io/kubernetes I get an error that k8s.io/[email protected] requires k8s.io/[email protected]: reading k8s.io/api/go.mod at revision v0.0.0: unknown revision v0.0.0

They seem to carefully curate what can be imported by external projects, so we can't import this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, just checking. I just noticed when I went to the code they were public on the package. So, thought they might be importable.

Copy link
Contributor

@jbrockopp jbrockopp May 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thought I'd provide a bit of clarity on this subject...

The tl;dr is you can vendor from k8s.io/kubernetes but its not pretty or straightforward 😅

Here's the reason why go get k8s.io/kubernetes fails:

kubernetes/kubernetes#79384 (comment)

And to actually vendor from k8s.io/kubernetes, you have to add a replace directive for all nested packages:

kubernetes/kubernetes#79384 (comment)

If we want, there are people who've scripted the approach to this so updating the version is easier:

kubernetes/kubernetes#79384 (comment)

To see a real world example of what this looks like in the go.mod file:

https://github.com/kubernetes-sigs/cloud-provider-huaweicloud/blob/9589bc854c29e399476479f9c5aa9599b2064da5/go.mod#L18-L40

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eww... All that work just to reuse some constants... Not worth it imo.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed 😄

runtime/kubernetes/pod_tracker.go Outdated Show resolved Hide resolved
Copy link
Contributor

@kneal kneal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🐬

Copy link
Contributor

@jbrockopp jbrockopp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@cognifloyd cognifloyd marked this pull request as draft May 7, 2022 17:41
@cognifloyd
Copy link
Member Author

I'm running into some issues with this, so I'm moving it back to draft for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature Indicates a new feature
Projects
Status: Todo
Development

Successfully merging this pull request may close these issues.

3 participants