
Conversation

@athornton (Contributor)

We need at least version 6.21 of the Google provider for NetApp Volume Quotas.
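A version constraint along these lines would enforce that minimum (a sketch; the actual file layout and any upper bound in idf_deploy may differ):

```hcl
# Pin the Google provider to at least 6.21, the first release with
# NetApp volume quota support.
terraform {
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = ">= 6.21"
    }
  }
}
```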

@athornton athornton marked this pull request as draft March 20, 2025 18:27
@athornton athornton force-pushed the tickets/DM-49593 branch 4 times, most recently from 3aaf335 to e5ef76c Compare March 31, 2025 15:26
@athornton athornton force-pushed the tickets/DM-49593 branch 7 times, most recently from 6bec0b9 to 580ca1e Compare April 2, 2025 17:13
@athornton athornton requested review from dhirving and dspeck1 April 2, 2025 22:14
@athornton athornton marked this pull request as ready for review April 2, 2025 22:15
@athornton (Contributor, Author)

Corresponds to https://github.com/lsst/idf_deploy/actions/runs/14230558971/job/39880180904

The objects to be created look plausible to me.


moved {
# Google provider version update
from = module.service_account_cluster.google_project_iam_member.project-roles["cluster-science-platform-dev-7696=>roles/container.clusterAdmin"]
@dspeck1 (Collaborator) Apr 3, 2025


The cluster-science-platform-dev-7696 service account is for dev only and is unique per project. Please add the service accounts from demo, int, and stable, too. In the plans, the service account's membership in clusterAdmin is removed.

@athornton (Contributor, Author)

Turns out we're gonna have to edit the state directly to do this.

@dspeck1 (Collaborator) commented Apr 3, 2025

I would consider removing the integration and production tfvars from this PR. Granted, this is a new feature, but there is also a module upgrade. It's safer to validate dev first, then deploy int and prod in a separate PR.

# a storage pool/volume pair.
#
netapp_definitions = [
{ name = "home"
@dhirving (Contributor)

In a future PR you might consider moving many of these definitions out of the variables file and into static module instantiations in the main Terraform.

That would make it easier to get consistent deployments between environments without relying on copy/pasting correctly between the variables files.

@athornton (Contributor, Author)

It would, but they're still going to vary across environments (sizes and default quotas and user quota overrides, at least, will differ). I think it will be less confusing to have the whole volume definition in the same place.

@dhirving (Contributor)

You could define variables like home_volume_size and scratch_volume_default_quota -- having variables only for the parts that vary between the environments.
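A sketch of what that interface might look like (the variable names here are the reviewer's hypothetical examples, not names from the PR):

```hcl
# Hypothetical per-environment variables: only the values that differ
# between dev/demo/int/stable would live in each tfvars file.
variable "home_volume_size" {
  type        = string
  description = "Capacity of the home volume, e.g. \"2TiB\""
}

variable "scratch_volume_default_quota" {
  type        = number
  description = "Default per-user quota on the scratch volume, in GiB"
}
```

The static module instantiations in the main Terraform would then carry the shared structure, interpolating only these values.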


# Backup policy (if needed); highly opinionated
resource "google_netapp_backup_policy" "instance" {
depends_on = [google_netapp_volume.instance] // needs each?
@dhirving (Contributor)

I think this depends_on is backwards -- the definition of the google_netapp_volume is referencing these backup policies, not the other way around.

If you make the change I recommended above, I think you can just remove this since Terraform will already be aware of the dependency.


backup_config {
scheduled_backup_enabled = var.definition.backups_enabled
backup_policies = var.definition.backups_enabled ? ["projects/${var.project}/locations/${var.location}/backupPolicies/backup_${var.definition.name}"] : null
@dhirving (Contributor)

It would be better to specify this as google_netapp_backup_policy.instance[0].id -- less error-prone and it makes Terraform aware of the dependency, so it will set up infrastructure in the correct order.
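A sketch of that suggestion, assuming the backup policy resource gates on backups_enabled via count:

```hcl
# Reference the backup policy resource directly instead of
# hand-assembling its "projects/.../backupPolicies/..." ID string.
backup_config {
  scheduled_backup_enabled = var.definition.backups_enabled
  backup_policies = var.definition.backups_enabled ? [google_netapp_backup_policy.instance[0].id] : null
}
```

With the reference in place, Terraform orders the policy before the volume on its own, so the explicit depends_on flagged above could simply be removed.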

@athornton (Contributor, Author)

I mean, maybe, but doing this and the next thing requires plumbing at least one more variable through, and apparently I need some kind of explicit dependency anyway, because it hung on backup_vault_id. And since I know what that string looks like, and I'm responsible for creating these objects, I don't think it's any more error-prone.

We don't actually care about the ordering at this stage: the worst thing that happens is that we somehow don't get our first scheduled backup, which, since it's a new volume, is not a big deal. By the time we get the initial data copy to the new volume done, the backup vault and policy will both long have existed.

@dhirving (Contributor)

This one shouldn't require any extra variable plumbing -- you literally just replace the string with the resource reference. The backup vault one does need a variable, but that kind of dependency injection is a very standard Terraform pattern.

Not sure what you mean by "hung" -- passing the variable through is how you declare the dependency.

The point of Terraform is to allow a future developer to know what infrastructure was created and how all the pieces tie together. If you're using an implied string interpolation buried in a module, it's a lot more difficult for a future reader to trace back to where the resource is being created.

And you do care about the ordering -- the Terraform deployment might fail if the volume is instantiated before the backup vault or backup policy, because the Google API will reject the reference to a non-existent resource. This is not going to be deterministic unless Terraform can see the dependency between the two resources.
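The dependency-injection pattern being described might look like this sketch (the module path and variable names are assumptions for illustration):

```hcl
# Top level: create the vault once and hand its ID to the volume module.
resource "google_netapp_backup_vault" "netapp" {
  name     = "netapp-backup-vault"
  location = var.location
}

module "volume" {
  source          = "./modules/netapp_volume"  # hypothetical path
  backup_vault_id = google_netapp_backup_vault.netapp.id
  # ... other volume inputs ...
}

# Inside the module, use the injected ID instead of interpolating a
# "projects/.../backupVaults/..." string, so Terraform sees the edge:
#   backup_vault = var.definition.backups_enabled ? var.backup_vault_id : null
```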

backup_config {
scheduled_backup_enabled = var.definition.backups_enabled
backup_policies = var.definition.backups_enabled ? ["projects/${var.project}/locations/${var.location}/backupPolicies/backup_${var.definition.name}"] : null
backup_vault = var.definition.backups_enabled ? "projects/${var.project}/locations/${var.location}/backupVaults/netapp-backup-vault" : null
@dhirving (Contributor)

Similar to the above, it would be better to pass in the ID of this vault from the top level instead of making an assumption about its location here.

default = "127.0.0.1"
}

variable "definition" {
@dhirving (Contributor)

In a future PR you might consider flattening out this object into separate variables here... as you discovered earlier in the week, Terraform's type-coercion rules can often lead to surprising errors with complex types. You'll get better error messages for typos if these are just module-level variables.
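A flattened interface could look like this sketch (field names are inferred from the diff; the full set of fields in the definition object is not shown in this thread):

```hcl
# Instead of one loosely-typed object variable such as
#   variable "definition" { type = any }
# separate, precisely-typed variables give Terraform enough information
# to produce a clear error when a caller misspells or mistypes a field.
variable "volume_name" {
  type = string
}

variable "backups_enabled" {
  type    = bool
  default = false
}
```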

@dhirving (Contributor) commented Apr 3, 2025

Looks good to me other than what Dan already called out. And you checked on the billing account change, right?

@dhirving dhirving self-requested a review April 3, 2025 18:10
@athornton athornton force-pushed the tickets/DM-49593 branch 2 times, most recently from 0f39e9d to 222c4ad Compare April 3, 2025 21:08
@athornton (Contributor, Author)
> Looks good to me other than what Dan already called out. And you checked on the billing account change, right?

Yeah, Dan said it was just the budget and was harmless.

@athornton athornton merged commit 595995b into main Apr 3, 2025
2 checks passed
@athornton athornton deleted the tickets/DM-49593 branch April 3, 2025 21:37