-
Notifications
You must be signed in to change notification settings - Fork 1
Update GKE versions to something that will support NetApp + Quotas #571
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
3aaf335 to
e5ef76c
Compare
6bec0b9 to
580ca1e
Compare
580ca1e to
7b79b34
Compare
7718578 to
2acbeae
Compare
d5e619e to
0cb4a1a
Compare
be751d2 to
337ca12
Compare
|
Corresponds to https://github.com/lsst/idf_deploy/actions/runs/14230558971/job/39880180904 The objects to be created look plausible to me. |
|
|
||
| moved { | ||
| # Google provider version update | ||
| from = module.service_account_cluster.google_project_iam_member.project-roles["cluster-science-platform-dev-7696=>roles/container.clusterAdmin"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thecluster-science-platform-dev-7696 service account is for dev only and is unique per project. Please add the service accounts from demo, int, and stable, too. In the plans the service account membership in clusterAdmin is removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Turns out we're gonna have to edit the state directly to do this.
|
I would consider removing integration and production tfvars from this PR. Granted this is a new feature, but there is a module upgrade. Safer to validate dev first then deploy int and prod in a separate PR. |
| # a storage pool/volume pair. | ||
| # | ||
| netapp_definitions = [ | ||
| { name = "home" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In a future PR you might consider moving many of these definitions out of the variables file and into static module instantiatiations in the main terraform.
That would make it easier to get consistent deployments between environments without relying on copy/pasting correctly between the variables files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would, but they're still going to vary across environments (sizes and default quotas and user quota overrides, at least, will differ). I think it will be less confusing to have the whole volume definition in the same place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could do variables like home_volume_size, scratch_volume_default_quota -- only having variables for the parts that vary between the environments.
modules/netapp_volumes/main.tf
Outdated
|
|
||
| # Backup policy (if needed); highly opinionated | ||
| resource "google_netapp_backup_policy" "instance" { | ||
| depends_on = [google_netapp_volume.instance] // needs each? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this depends_on is backwards -- the definition of the google_netapp_volume is referencing these backup policies, not the other way around.
If you make the change I recommended above, I think you can just remove this since Terraform will already be aware of the dependency.
|
|
||
| backup_config { | ||
| scheduled_backup_enabled = var.definition.backups_enabled | ||
| backup_policies = var.definition.backups_enabled ? ["projects/${var.project}/locations/${var.location}/backupPolicies/backup_${var.definition.name}"] : null |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be better to specify this as google_netapp_backup_policy.instance[0].id -- less error-prone and it makes Terraform aware of the dependency, so it will set up infrastructure in the correct order.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean, maybe, but doing this and the next thing requires plumbing at least one more variable through and apparently I need some kind of explicit dependency because it hung on backup_vault_id . And since I know what that string looks like and I'm responsible for creating these objects, I don't think it's any more error-prone.
We don't actually care about the ordering at this stage: the worst thing that happens is that we somehow don't get our first scheduled backup, which, since it's a new volume, is not a big deal. By the time we get the initial data copy to the new volume done, the backup vault and policy will both long have existed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one shouldn't require any extra variable plumbing -- you literally just replace the string with the module reference. The backup vault one does need a variable but that kind of dependency injection is a very standard terraform pattern.
Not sure what you mean by "hung" -- passing the variable through is how you declare the dependency.
The point of terraform is to allow a future developer to know what infrastructure was created and how all the pieces tie together. If you're using an implied string interpolation buried in a module, it's a lot more difficult for a future reader to trace back to where the resource is being created.
And you do care about the ordering -- the terraform deployment might fail if the volume is instantiated before the backup vault or backup policy, because the Google API will reject the reference to a non-existent resource. This is not going to be deterministic unless Terraform can see the dependency between the two resources.
| backup_config { | ||
| scheduled_backup_enabled = var.definition.backups_enabled | ||
| backup_policies = var.definition.backups_enabled ? ["projects/${var.project}/locations/${var.location}/backupPolicies/backup_${var.definition.name}"] : null | ||
| backup_vault = var.definition.backups_enabled ? "projects/${var.project}/locations/${var.location}/backupVaults/netapp-backup-vault" : null |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to the above, it would be better to pass in the ID of this vault from the top level instead of making an assumption about its location here.
| default = "127.0.0.1" | ||
| } | ||
|
|
||
| variable "definition" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In a future PR you might consider flattening out this object into separate variables here... as you discovered earlier in the week, Terraform's type-coercion rules can often lead to surprising errors with complex types. You'll get better error messages for typos if these are just module-level variables.
|
Looks good to me other than what Dan already called out. And you checked on the billing account change, right? |
…esource instance can have moved from only one source instance.
0f39e9d to
222c4ad
Compare
222c4ad to
6b5638f
Compare
Yeah, Dan said it was just the budget and was harmless. |
We need at least 6.21 for Netapp Volume Quotas.