Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(cloud): implement distributed caching for tests #5237

Draft
wants to merge 42 commits into
base: main
Choose a base branch
from

Conversation

shumailxyz
Copy link
Contributor

@shumailxyz shumailxyz commented Oct 16, 2023

What this PR does / why we need it:

Adds support for the cloud backed distributed caching for the tests actions.

This introduces a new cache object field on the action config level with option for excluding certain config keys from the action version calculation. This is particularly useful for the new distributed caching feature where a certain keys might change across environments, but the action should still be cached.

For instance, if you have an 'deploy' action that relies on a variable to determine the ingress hostname, you may wish to specify the keys using that variable as one to be excluded. This ensures that the action's version remains consistent, even as the hostname varies across different environments.

Additionally, it adds support for wildcard in the paths.

Here's an example config for 2 actions, specifying the keys to be excluded from action version calculation:

kind: Deploy
type: container
name: vote
build: vote
cache:
  exclude:
    paths:
      - spec.ingresses.*.hostname
      - spec.*.HOSTNAME
description: The voting UI
spec:
  sync:
    paths:
      - target: /app/src
        source: src
        mode: one-way-replica
        exclude: [node_modules]
  ports:
    - name: http
      containerPort: 8080
      servicePort: 80
  healthCheck:
    httpGet:
      path: /
      port: http
  ingresses:
    - path: /
      port: http
      hostname: vote.${var.base-hostname}
  env:
    HOSTNAME: vote.${var.base-hostname}
    VITE_USERNAME: ${local.username}

---
kind: Deploy
type: container
name: api
build: api-build
cache:
  exclude:
    paths:
      - spec.ingresses.*.hostname
description: The backend deploy for the voting UI
spec:
  args: [python, app.py]
  sync:
    args: ["/bin/sh", "-c", "ls /app/app.py | entr -n -r python /app/app.py"]
    paths:
      - target: /app
        mode: "one-way-replica"
        exclude: [.venv]
  ports:
    - name: http
      protocol: TCP
      containerPort: 8080
      servicePort: 80
  ingresses:
    - path: /
      port: http
      hostname: api.${variables.base-hostname}
  healthCheck:
    httpGet:
      path: /health
      port: http

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:
While the wildcard is a good thing to have at hand, we'll need to document the risks of using wildcard clearly.
For example, consider the following config:

...
cache:
  exclude:
    paths:
      spec.ingresses.*.hostname
spec:
  ports:
    - name: http
      containerPort: 8080
      servicePort: 80
  healthCheck:
    httpGet:
      path: /
      port: http
  ingresses:
    - path: /
      port: http
      hostname: foo.${var.base-hostname}
    - path: /api
      port: http
      hostname: bar.${var.base-hostname}
    - path: /hello
      port: http
      hostname: hello.garden

And then the user updates the config to the following:

...
cache:
  exclude:
    paths:
      spec.ingresses.*.hostname
spec:
  ports:
    - name: http
      containerPort: 8080
      servicePort: 80
  healthCheck:
    httpGet:
      path: /
      port: http
  ingresses:
    - path: /
      port: http
      hostname: foo.${var.base-hostname}
    - path: /api
      port: http
      hostname: horizon.${var.base-hostname} # <-- this is changed
    - path: /hello
      port: http
      hostname: hello.xyz # <-- this is changed

in above case, the action version is not going to change as wildcard spec.ingresses.*.hostname basically ignores hostname altogether in each ingress object.

@shumailxyz shumailxyz changed the title WIP: Tests cache WIP: implement distributed caching for tests. Nov 2, 2023
core/src/actions/base.ts Outdated Show resolved Hide resolved
@shumailxyz shumailxyz marked this pull request as ready for review November 9, 2023 09:07
@shumailxyz shumailxyz changed the title WIP: implement distributed caching for tests. feat: implement distributed caching for tests Nov 9, 2023
@shumailxyz shumailxyz requested review from 10ko and a team November 9, 2023 09:07
@shumailxyz shumailxyz changed the title feat: implement distributed caching for tests feat(cloud): implement distributed caching for tests Nov 9, 2023
still be cached.

For instance, if you have an 'deploy' action that relies on a variable to determine the ingress hostname,
you may wish to designate the \`\${var.ingress-hostname}\` variable as one to be ignored. This ensures
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good feature but really calls for full examples and more clarity in the docs.

Also, this template string mentioned here is confusing… Isn’t the exclusion applied to the action config and not the input variables? How would that work in this case?

@@ -543,7 +562,9 @@ export abstract class BaseAction<

@Memoize()
private stringifyConfig() {
return stableStringify(omit(this._config, "internal"))
const clonedConfig = cloneDeep(this._config)
const configExcludingIgnoreKeys = omit(clonedConfig, "internal", ...this.ignoredKeysForVersion)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m not quite sure this works as intended. In many or most cases we want to exclude a specific nested path in the config from versioning. This really only filters out top-level keys, right?

@shumailxyz shumailxyz requested a review from thsig November 9, 2023 09:57
result.push({ key: `variables.${v}`, matchedValue: config.variables[v]?.toString() ?? "" })
}
})
return result
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah sorry, now I get it, ignore my last comment. However… I can imagine some issues with this approach. For one thing, the variables on a config itself may not be what you’re trying to mark as excluded. It also feels conflated and prone to mistakes to use one field to specify both variable names and names of config keys. I wonder if we should tease this apart. I’d also consider if full key paths make more sense than searching for any key in the config with a listed name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should tease this apart.

How would that look like?

The idea here was that user only specifies the variables to exclude under cache.exclude.variables and we find all the keys that are referencing that variable. And additionally, we look for the variable declarations in the action config and also exclude paths of those.

e.g., For the following action config, where the user specifies var.hostname to be ignored:

name: foo
kind: Deploy
type: container
cache:
  exclude:
    variables:
      - ${var.hostname}
variables:
  hostname: foo.com
spec:
  ingresses:
    - path: /
      port: http
      hostname: foo.${var.hostname}

we omit the 3 paths for action version calculation:

spec.ingresses.0.hostname
cache.exclude.variables.0
variables.hostname

I’d also consider if full key paths make more sense than searching for any key in the config with a listed name.

Not quite sure about the experience if user is expected to write the full paths. That might be more robust though, so open to suggestions. Something like following?

...
cache:
  exclude:
    paths:
      - spec.ingresses.0.hostname
      - cache.exclude.variables.0
      - variables.hostname
...     

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This gets more tricky if, for example, $merge is used.
Consider the following config where it's not easily possible to write the full key path.

garden.yml

apiVersion: garden.io/v1
kind: Project
name: project_foo

environments:
  - name: local1
    defaultNamespace: project_foo
    variables:
      base-hostname: foo.com
      nestedVars:
        EXTERNAL_API_URL: http://api.foo.com
providers:
  - name: local-kubernetes
    environments: [local1]
    namespace: ${environment.namespace}

---

kind: Deploy
type: container
name: action-with-merge
build: foo
cache:
  exclude:
    variables:
      - ${var.base-hostname}
      - ${var.nestedVars.EXTERNAL_API_URL}
description: Deploy the container
spec:
  ports:
    - name: http
      containerPort: 8080
      servicePort: 80
  healthCheck:
    httpGet:
      path: /
      port: http
  ingresses:
    - path: /
      port: http
      hostname: foo.${var.base-hostname}
  env:
    $merge: ${var.nestedVars}
---

cc @thsig as we were also discussing this.

Copy link
Collaborator

@thsig thsig Nov 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about the following:

cache:
  exclude:
    paths:
      - spec.ingresses.0.hostname
      - spec.ingresses.0.hostname
      - cache.exclude.variables.0
      - variables.hostname
    values:
      - ${var.hostname}

Also, it would probably be more intuitive / less surprising for the user if the values matching would substitute the value with a placeholder (e.g. the string <ignored> before hashing instead of omitting the entire key where the value appears (which might lead to unwanted cache hits when another templated value in the expression changes).

For example, here I'd want the version to change if var.memoryLimitUnit changes from Mi (megabytes) to G (gigabytes):

cache:
  exclude:
    values:
      - ${var.memoryLimit}
some:
  key: "${var.memoryLimit}${var.memoryLimitUnit}"

This is a silly template expression that would probably never be used in practice, but it's enough to illustrate the point.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also just start with paths for the initial implementation.

I think the values concept makes more sense at the project level, maybe under environments[] (adjacent to where environment-specific variables are defined).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch with the key: "${var.memoryLimit}${var.memoryLimitUnit}" example, thanks @thsig.

In that case, your suggestion with omitting paths and substituting values seems more logical.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we write this down in the RFC? I'm getting quite confused; I think we need a list of examples that we apply all these new concepts to, to see how intuitive they are in all cases.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The public-facing docs and examples definitely need to be clear easy to understand—should we start by writing/improving the docs around this (including code samples), or do you think this merits going back to the RFC proper for a more in-depth discussion?

Copy link
Member

@stefreak stefreak Nov 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think working on examples and public-facing docs first to help validate the concept might be a good idea (But I'm not saying it has to be done this way; It would just be a bummer to notice it late if it ends up to be a difficult UX). The RFC might be a good way to communicate the results of that.

Especially thinking about the edge cases and the user experience here: What happens when a user mistypes a path? How do you explain the concept and the syntax of paths? Does it make it harder to implement a VSCode plugin? What happens when users ignore a value that is extremely common, e.g. empty strings? What happens if the ignored value is an object, null or undefined? What about the templating in manifests, how do you express a path there?

Copy link
Contributor Author

@shumailxyz shumailxyz Nov 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the PR as we discussed. Adds option for excluding certain keys based on the path and removed excluding by values for now as it's prone to errors and unwanted cache hits.
Feel free to check it again and see how the excluding certain keys by path feels.

@shumailxyz shumailxyz marked this pull request as draft November 14, 2023 15:23
@shumailxyz shumailxyz closed this Jan 10, 2024
@10ko
Copy link
Member

10ko commented Jan 10, 2024

Hey @shumailxyz I'd leave this open for a bit longer, just to make sure it doesn't fall through the cracks.

@10ko 10ko reopened this Jan 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants