-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
enable direct error handling for bundle plugin trigger method #7143
base: main
Are you sure you want to change the base?
enable direct error handling for bundle plugin trigger method #7143
Conversation
✅ Deploy Preview for openpolicyagent ready!
To edit notification comments on pull requests, go to your Netlify site configuration. |
201abe0
to
c510621
Compare
@@ -6522,6 +6522,68 @@ func TestPluginManualTriggerWithTimeout(t *testing.T) { | |||
} | |||
} | |||
|
|||
func TestPluginManualTriggerWithServerError(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test triggers the race condition.
My investigation led to the conclusion that this is triggered by having the bundle with the "periodic" trigger mode in the test.
The downloader.oneShot function will be triggered by the explicitly trigger method and also by the downloader.loop function.
The access to the shared variables (f.ex. etag, client) in Downloader is not synchronized and leads to the race condition.
A possible solution is to only trigger bundle dowloads with "manual" trigger mode in the trigger method.
F.ex. like this (in bundle.Plugin.Trigger)
func (p *Plugin) Trigger(ctx context.Context) error {
[...]
for name, dl := range p.downloaders {
downloaders[name] = dl
p.cfgMtx.RLock()
triggerMode := p.config.Bundles[name].Trigger
p.cfgMtx.RUnlock()
if triggerMode != nil && *triggerMode == plugins.TriggerManual {
downloaders[name] = dl
}
}
[...]
This would also solve the question if Trigger should return errors for bundles with "periodic" trigger mode.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ashutosh-narkar
Do you have a recommendation how to proceed with this?
To make the test green, I could remove the configuration for the bundle in "periodic" trigger mode but I think this might be a problem in the bundle plugin that should be addressed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think having only a manual trigger for the test should be fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the contribution @torwunder! One high level comment I have is about the structure of the error. If we can model it similar to https://github.com/open-policy-agent/opa/blob/main/plugins/bundle/status.go#L64, it would be more informative I think.
@ashutosh-narkar I modelled the error similar to the status. |
return fmt.Sprintf("Bundle name: %s, Code: %s, HTTPCode: %d, Message: %s", e.BundleName, errCode, e.HTTPCode, e.Message) | ||
} | ||
|
||
func (e Error) Unwrap() error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Is there a reason to define this method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it will allow using errors.As
for retrieving the wrapped error. Something like this:
errs := Errors{NewBundleError("example", download.HTTPError{StatusCode: 500})}
var httpError download.HTTPError
errors.As(errs, &httpError)
if httpError.StatusCode == 500 { ...
There is a test in errors_test.go
(TestUnwrap) that will start to fail if we remove the function.
plugins/bundle/plugin.go
Outdated
// plugin callback will also log the trigger error and include it in the bundle status | ||
err := d.Trigger(ctx) | ||
// only return errors for TriggerMode manual as periodic bundles will be retried | ||
if err != nil && *p.config.Bundles[name].Trigger == plugins.TriggerManual { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use the Config()
method on the plugin to retrieve the config.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ashutosh-narkar do you think it makes sense to only return the errors for bundles in manual
trigger mode?
It felt sensible to me as periodic
triggermode will be retried by the bundle plugin but might also just create confusion.
it also creates this additional if statement which will then not be tested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Manual trigger mode can currently only be configured via a library integration. So to keep parity with the discovery plugin, returning errors in the manual mode makes sense.
@@ -6522,6 +6522,68 @@ func TestPluginManualTriggerWithTimeout(t *testing.T) { | |||
} | |||
} | |||
|
|||
func TestPluginManualTriggerWithServerError(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think having only a manual trigger for the test should be fine.
356764b
to
2776139
Compare
@@ -321,15 +323,32 @@ main contains "hello" if { | |||
|
|||
ctx, cancel := context.WithCancel(context.Background()) | |||
defer cancel() | |||
go func() { | |||
go func(expectedErrors []string) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm trying to understand what we're testing here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the existing test is verifying that the errors are present in the status,
my goal was to also verify that these errors are also returned directly from the trigger method
handing over the expected error as a parameter became necessary due to a detected race condition as the goroutine will otherwise work on tc.expErrs
which is changed concurrently in the outer loop (l. 294)
@torwunder thanks for addressing the comments. Can you please fix the conflict and squash your changes and we can then get this in? Thanks! |
2776139
to
f9a46e2
Compare
…r errors for bundles in manual trigger mode Signed-off-by: towunderlich <[email protected]>
* model Error similar to bundle status Signed-off-by: towunderlich <[email protected]>
* only test error handling for bundle in manual trigger mode * use plugin config method Signed-off-by: towunderlich <[email protected]>
* enable new test to run in parallel Signed-off-by: towunderlich <[email protected]>
f9a46e2
to
bd3212c
Compare
@ashutosh-narkar I resolved the conflict and added the parallel setting to the test, in line with the upstream changes. |
Why the changes in this PR are needed?
When using opa as a library you may want to directly control when and how opa interacts with the bundle registry.
When switching to manual trigger mode and triggering the the bundle plugin directly you may want to react to errors that occurred when downloading the bundles.
The current capability of using the bundle status requires configuring a listener and wait for it to be triggered which differs from the approach taken f.ex. in the discovery plugin.
By returning the downloader errors directly in the bundle plugin trigger method you can f.ex. check for HTTPError and its status code so to determine if a retry can be successful (status code 5xx) or will yield the same result (4xx).
What are the changes in this PR?
Notes to assist PR review:
Further comments: