Replies: 17 comments
-
Looking forward to it!
-
I would definitely recommend using BashOperator for this :) Just to clear the air, I am not against a TerraformOperator, but I have used a lot of Terraform with Airflow in the past and BashOperator has just worked perfectly without having to care about different TF versions.
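For illustration, a minimal sketch of that BashOperator approach (the configuration path and CLI flags are placeholders, not taken from the comment above):

```python
from airflow.operators.bash import BashOperator

# Hypothetical task: run terraform against a configuration directory that is
# already available on the worker; the path below is a placeholder.
terraform_apply = BashOperator(
    task_id="terraform_apply",
    bash_command=(
        "cd /opt/terraform/my_stack && "
        "terraform init -input=false && "
        "terraform apply -auto-approve"
    ),
)
```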
-
I think it would be nice to have a Hook/Operator that could manage installing Terraform automatically and expose a Python API. There is the https://pypi.org/project/python-terraform/ wrapper, and installing Terraform is basically downloading the right binary from https://www.terraform.io/downloads.html (you can even specify the version of Terraform). This way you could use the power of Terraform without worrying about having it installed on your worker. I think just the fact that there is a "Terraform" hook/operator can make more Airflow users aware that they can actually use Terraform rather than dedicated operators. Plus, Terraform scripts are often rather complex - usually they are stored somewhere in a repository, not necessarily in the DAG's folder - so it would be great to have an option to somehow download (git-sync?) a specified set of Terraform scripts. Or maybe we can think of some more "airflow-y" way of distributing such scripts.
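To make that concrete, here is a rough sketch of what such a hook could look like; `TerraformHook`, its method names, and the install directory are hypothetical, while the download URL follows HashiCorp's public releases site:

```python
import io
import os
import stat
import urllib.request
import zipfile

from airflow.hooks.base import BaseHook


class TerraformHook(BaseHook):
    """Hypothetical hook that downloads a specific terraform binary on demand."""

    def __init__(self, version: str = "1.3.5", install_dir: str = "/tmp/terraform-bin"):
        super().__init__()
        self.version = version
        self.install_dir = install_dir

    def ensure_binary(self) -> str:
        """Fetch the requested terraform release if it is not already present."""
        binary = os.path.join(self.install_dir, f"terraform-{self.version}")
        if not os.path.exists(binary):
            url = (
                "https://releases.hashicorp.com/terraform/"
                f"{self.version}/terraform_{self.version}_linux_amd64.zip"
            )
            os.makedirs(self.install_dir, exist_ok=True)
            with urllib.request.urlopen(url) as resp:
                with zipfile.ZipFile(io.BytesIO(resp.read())) as archive:
                    with open(binary, "wb") as fh:
                        fh.write(archive.read("terraform"))
            os.chmod(binary, os.stat(binary).st_mode | stat.S_IEXEC)
        return binary
```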
-
Just an FYI: https://pypi.org/project/python-terraform/ (https://github.com/beelit94/python-terraform) hasn't been actively maintained, so if someone wants to work on this one, try finding another library.
-
@kaxil I'd love to hear more about your use of Terraform in Airflow.
Something that comes to mind for me is that, beyond the Terraform binary itself, Terraform scripts sometimes depend on other binaries, e.g. custom providers, or shell out and do hacky things from the null provider. The first thing that came to mind was KubernetesPodOperator. It would help if we made it easy to "bring your own Terraform image, or use this default image for this version (pulled from Docker Hub)".
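As a rough sketch of that "bring your own image" idea (the image tag and command are placeholders, and the import path varies between versions of the cncf.kubernetes provider):

```python
from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import KubernetesPodOperator

# Hypothetical: run a pinned terraform image so the DAG author controls the
# version and can bake custom providers or extra tooling into the image.
terraform_apply = KubernetesPodOperator(
    task_id="terraform_apply",
    name="terraform-apply",
    image="hashicorp/terraform:1.3.5",
    cmds=["terraform"],
    arguments=["apply", "-auto-approve"],
)
```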
-
Hey Jacob, in our setup the DAG author's first task was to set up the environment: it downloaded the necessary binaries from an internal Nexus, which hosted multiple versions because different teams relied on different versions of Terraform and Ansible. These binaries were stored in a temporary location and cleared by a task at the end of the DAG. We had bash scripts to run Terraform and Ansible because some clients also ran these things ad hoc from their local machines or through Jenkins. Running things via BashOperator was an easy way for us, without the maintenance overhead of a bash script plus a custom operator. But it would be different for different teams with different use cases, I suppose :)
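A sketch of that setup/run/cleanup pattern, with a placeholder binary URL and paths standing in for the internal Nexus setup described above:

```python
from airflow.operators.bash import BashOperator

# Hypothetical tasks bracketing the terraform work; the Nexus URL and paths
# below are placeholders, not the actual internal setup.
setup_env = BashOperator(
    task_id="setup_environment",
    bash_command=(
        "mkdir -p /tmp/tf-bin && "
        "curl -fsSL -o /tmp/tf-bin/terraform.zip https://nexus.example.internal/terraform/1.3.5/terraform.zip && "
        "unzip -o /tmp/tf-bin/terraform.zip -d /tmp/tf-bin"
    ),
)

run_terraform = BashOperator(
    task_id="terraform_apply",
    bash_command="PATH=/tmp/tf-bin:$PATH terraform -chdir=/opt/stacks/my_stack apply -auto-approve",
)

cleanup_env = BashOperator(
    task_id="cleanup_environment",
    bash_command="rm -rf /tmp/tf-bin",
    trigger_rule="all_done",  # clear the temporary binaries even if the apply fails
)

setup_env >> run_terraform >> cleanup_env
```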
-
@kaxil How would you pull your Terraform configuration source? Our idea would be to provide some abstraction of that "set up the environment" step, to make it easy to have a Terraform binary running in your Airflow execution environment. My first thought was to essentially create a subclass of KubernetesPodOperator with a git-sync init container and a Terraform container. This would give a lot of flexibility to the advanced user and take care of a lot of boilerplate.

```python
tf_task = TerraformOperator(
    command='terraform apply -auto-approve',
    git_ssh_key_secret_name='my_tf_git_sync',
    sub_path='terraform/my_dir',
    terraform_image='gcr.io/my/terragrunt-image',  # would default to 'hashicorp/terraform:latest'
    gcp_secret_name='gcp-terraform-key',
    aws_secret_path=None,
    azure_secret_path=None,
)
```

The drawback, naturally, is for non-k8s based Airflow deployments. I think to make this really useful at most enterprises we need to think about how to best handle secrets. The idea here is that the user manages a k8s secret for the god-like Terraform credentials and just tells our operator that secret's name, so we can mount it as a volume in our pod definition. Alternatively, these secret names could be omitted and we could fall back on provider-specific magic like Workload Identity.
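A sketch of how that secret handoff could look from the DAG author's side; the `Secret` helper is the one used with KubernetesPodOperator (its import path has moved between Airflow versions), and the secret name and mount path are placeholders:

```python
from airflow.kubernetes.secret import Secret  # import path differs in newer Airflow/provider versions

# Hypothetical: the operator would translate `gcp_secret_name` into a volume
# mount of the user-managed k8s secret holding the terraform credentials.
gcp_credentials = Secret(
    deploy_type="volume",
    deploy_target="/var/secrets/gcp",  # where the key file ends up inside the pod
    secret="gcp-terraform-key",
)
# The proposed operator could then pass this via the pod's `secrets=[...]` and
# point GOOGLE_APPLICATION_CREDENTIALS at the mounted file.
```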
-
I think the approach of subclassing KubernetesPodOperator is far from how we typically build Airflow integrations, and the heavy reliance on k8s secrets, as opposed to leaning on Airflow secrets managers, might be overly opinionated. I guess I'd like to ask: does this seem like something that belongs in Airflow core, or something that belongs as an opinionated plugin released elsewhere? Given the wide variety of binaries folks end up relying on in their Terraform scripts (even perhaps different Terraform versions in different DAGs) and the large permissions footprint required, I think it is difficult to develop a very general integration that gives the user enough flexibility to meet their security needs. Taking a step back:
-
We had everything in the bash script; our Terraform modules were in a private GitLab repo, but the SSH key of our Airflow box was added to GitLab, and the native Terraform + Git integration worked fine for us when running terraform init. Although I definitely feel there are more users out there who would be happy with your solution too :) so yes, please feel free to PR the TerraformOperator.
-
tftest is an alternative that is actively maintained.
-
Hey all, just wondering if there's potential for a Terraform/Terragrunt operator that simply returns Terraform output values to be used by downstream tasks. This operator would be scoped to only the output command. It would look something like this:

```python
terra_output = TerraOutputOperator(
    task_id='terra_output',
    binary='terraform',
    repo_url="https://company/infra-live",
    sub_path='infra/region/project',
    tf_version='1.3.5',
    tg_version="32.0.0",  # only used if `binary` arg is set to `terragrunt`
    gcp_iam_role=None,
    aws_iam_role="arn:aws:iam::role",
)

some_task = SomeCloudOperator(
    task_id="some_task",
    instance_arn=terra_output["instance_arn"],
)
```

Within the operator's execute method, it would use the tftest package:

```python
from tftest import TerraformTest, TerragruntTest

from airflow.utils.context import Context


def execute(self, context: Context):
    hook = TerraformHook()
    # install the repo containing the IaC files into a local tmp directory
    tfdir = hook.get_source(self.repo_url)
    # install the terraform binary if not already installed and put the version on PATH
    # (could use a terraform version manager like tfenv)
    hook.set_terraform_version(self.tf_version)
    if self.binary == "terraform":
        terra = TerraformTest(tfdir)
    elif self.binary == "terragrunt":
        # install the terragrunt binary if not already installed and put the version on PATH
        # (could use a terragrunt version manager like tgswitch)
        hook.set_terragrunt_version(self.tg_version)
        terra = TerragruntTest(tfdir)
    # parse the terraform/terragrunt output into a Python dictionary before cleaning up
    outputs = terra.output()
    # remove the tfdir directory (and could remove terraform/terragrunt if needed)
    hook.cleanup()
    return outputs
```

If this is within the scope of Airflow's built-in operators/hooks, I'd be happy to make a PR!
-
I think it's more appropriate for https://airflow.apache.org/ecosystem/ You can release your own project and add a PR to make it part of the ecosystem page. Cool for the community, cool for you, but you will have to take on the burden of supporting the people using it - which is also an opportunity to build relations and business contacts. I think Terraform - while popular and cool - is not something the community would bank on and support at the community level. This is my personal opinion, though. I like Terraform and I know how useful it can be, but I also know it's very far from the "base" software/application of Airflow. There are a number of opinionated decisions to make there, and I think it is difficult to make it into a "generic"/"reusable" solution (unlike our Helm chart), so making it part of the "product" is not a good thing :)
-
@potiuk Thanks for the link to the Airflow ecosystem page. I might consider adding it there. I see your point with regard to where Terraform stands within the Airflow community. Now that I think of it, it would be interesting if there was a solution but reversed: a Terraform Airflow provider that allowed DAGs to be defined within Terraform files, so that Terraform resources could be directly interpolated into DAG operator configurations. Although it seems like that would require some magic to sync the updated DAG to the hosted Airflow environment... Anyhow, thanks for the quick feedback!
-
Not sure if I understand, but for sure it is possible to just use Terraform to deploy something from within an Airflow task - for example a BashOperator running the terraform CLI.
-
And also, BTW, there are a number of Terraform modules out there for specific Airflow configurations - MWAA, EKS, etc.; you can find them if you search - and that pretty much reflects the idea that this should live outside the community: they are all very specific cases, highly opinionated for their use case and very difficult to generalise. Terraform makes it easy to build "specific deployments" from "generic modules", but developing a "generic Airflow module" that would serve multiple different cases is almost like replacing the whole of Terraform with that generic module. A much better approach is to have N specific Terraform scripts, each implementing a specific deployment, each opinionated and targeted at that deployment, with some "options" you can turn on/off. It just makes more sense.
-
Converted to a discussion - it belongs here after all of the above. I think we need to figure out what the "feature" to actually implement is.
-
This was certainly an interesting discussion to read through. I have spent a few hours researching how best to set up and destroy ephemeral infrastructure that tasks in a scheduled DAG rely on... Ignoring the debate as to whether Terraform should be used for that or not (I have become a fan of this opinion), my mind settled on continuing to use the Terraform code I already had, and just using Airflow to schedule (and scale, when needed) the infrastructure. So, I'm going to extend the Airflow image with Terraform, similar to how Rearc did with Bronco: https://github.com/rearc/bronco/tree/main Then I'll use the BashOperator similar to here: https://stackoverflow.com/a/66801089/6784817 In any case, if you get a TerraformOperator up and running in the future, I would be very interested to play with it. The problem I'm currently trying to solve may no longer exist by then, but if it does, I wouldn't mind considering a refactor to use a TerraformOperator.
-
Description
Create a terraform integration for apache airflow.
Use case / motivation
Use Terraform to manage ephemeral infrastructure used in Airflow DAGs, taking advantage of its "drift" detection features and wide array of existing integrations. For teams who use Terraform, this could replace tasks like the create/delete Dataproc cluster operators. This could be really interesting for automating nightly large-scale e2e integration tests of your Terraform and data pipeline code bases (terraform apply >> run data pipelines >> terraform destroy).
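A minimal sketch of that nightly e2e idea using BashOperator (directory, commands, and schedule are placeholders, since no TerraformOperator exists yet):

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="nightly_terraform_e2e",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    apply_infra = BashOperator(
        task_id="terraform_apply",
        bash_command=(
            "terraform -chdir=/opt/e2e/stack init -input=false && "
            "terraform -chdir=/opt/e2e/stack apply -auto-approve"
        ),
    )
    run_pipelines = BashOperator(
        task_id="run_data_pipelines",
        bash_command="python /opt/e2e/run_pipelines.py",  # placeholder for the real pipeline trigger
    )
    destroy_infra = BashOperator(
        task_id="terraform_destroy",
        bash_command="terraform -chdir=/opt/e2e/stack destroy -auto-approve",
        trigger_rule="all_done",  # tear the infrastructure down even if the pipelines fail
    )

    apply_infra >> run_pipelines >> destroy_infra
```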
Related Issues
Inspired by this discussion in #9593
cc: @brandonjbjelland @potiuk
Brandon and I will discuss a design for this next week.