Log fetches for EcsRunTaskOperator return incomplete logs #43717

Open
1 of 2 tasks
yaningz opened this issue Nov 5, 2024 · 0 comments
Labels
area:logging, area:providers, kind:bug, needs-triage, provider:amazon-aws

yaningz commented Nov 5, 2024

Apache Airflow Provider(s)

amazon

Versions of Apache Airflow Providers

apache-airflow==2.10.1
apache-airflow-providers-airbyte==3.9.0
apache-airflow-providers-amazon==8.28.0
apache-airflow-providers-databricks==6.9.0
apache-airflow-providers-docker==3.13.0
apache-airflow-providers-slack==8.9.0
apache-airflow-providers-snowflake==5.7.0

Apache Airflow version

2.10.1

Operating System

Amazon Linux

Deployment

Amazon (AWS) MWAA

Deployment details

Our MWAA environments are managed through Terraform. In addition to the publicly available providers, we install one extra provider and one internal provider.

What happened

Related to #40875

We previously reported the same issue. After upgrading to newer versions of Airflow and the AWS provider that included the change, we still see logs for ECS tasks failing to carry over between log groups.

In all versions of Airflow between 2.7 and 2.10, we have observed logs missing from the *-Task log group of our MWAA deployment. As in #40875, we continue to use the awslogs_group argument of the EcsRunTaskOperator to specify the log group that the UI should pull from.

On successful task execution

The log group associated with our ECS cluster ( /ecs/airflow2/airflow2 ) retains all logs, but these are not fully reflected in the *-Task log group ( airflow-airflow2-Task ). The following files demonstrate a pair of log streams that should contain all the same events, but all lines after [0m13:11:27 7 of 30 START test not_null_disease_location_proximity_context_results_parsed_updt_dttm [RUN] are missing in the log stream for the *-Task log group.

ECS cluster logstream (log group /ecs/airflow2/airflow2 ): 202411015-ecs:airflow2:airflow-logstream.csv

MWAA logstream (log group airflow-airflow2-Task ): 20241105-airflow-airflow2-Task-logstream.csv
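To locate where the two streams diverge, a comparison along the following lines can help. This is a minimal sketch, not part of Airflow or the AWS provider: the helper names `read_messages` and `missing_events` are hypothetical, and it assumes the CSV exports have a `message` column and that each ECS message appears verbatim somewhere inside the corresponding Airflow task log line.

```python
import csv
from typing import Iterable, List


def read_messages(csv_path: str, column: str = "message") -> List[str]:
    """Read one column of a log stream CSV export (column name is an assumption)."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return [row[column] for row in csv.DictReader(handle)]


def missing_events(source: Iterable[str], mirror: Iterable[str]) -> List[str]:
    """Return non-blank messages from `source` that appear nowhere in `mirror`.

    Substring matching is used because the mirrored task log wraps each
    ECS message in Airflow's own log formatting.
    """
    mirror_text = "\n".join(mirror)
    return [m for m in source if m.strip() and m not in mirror_text]
```

Calling `missing_events(read_messages(ecs_csv), read_messages(mwaa_csv))` on the two attached exports should list every ECS event that never reached the *-Task stream, in the order it was emitted.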

On unsuccessful task execution

Additionally, we found that for failed tasks the ::group::Post task execution logs section retrieved the remaining logs, whereas successful tasks left the remaining logs behind. The following is an example of a failed task whose log stream includes all log events, pulled in as part of the post task execution logs.

ECS cluster logstream (log group /ecs/airflow2/airflow2 ): 20241008-ecs:airflow2:airflow2-logstream.csv

MWAA logstream (log group airflow-airflow2-Task ): 20241008-airflow-airflow-Task-logstream.csv

EcsRunTaskOperator call

This is the most current version of our call to the EcsRunTaskOperator class:

        # Excerpt from the __init__ of our EcsRunTaskOperator subclass; it relies on
        # `import os` and `from datetime import timedelta` at module level.
        super().__init__(
            task_definition=task_definition,
            cluster=airflow_env_name,
            overrides={
                "containerOverrides": [
                    {
                        "name": "dbt-om1-task",
                        "command": command_list,
                    },
                ],
            },
            launch_type="FARGATE",
            network_configuration={
                "awsvpcConfiguration": {
                    "subnets": [
                        os.environ.get("AIRFLOW__VAR__PRIMARY_SUBNET_ID"),
                        os.environ.get("AIRFLOW__VAR__SECONDARY_SUBNET_ID"),
                        os.environ.get("AIRFLOW__VAR__TERTIARY_SUBNET_ID"),
                    ],
                    "securityGroups": [
                        os.environ.get("AIRFLOW__VAR__SECURITY_GROUP_ID")
                    ],
                    "assignPublicIp": "DISABLED",
                },
            },
            # Log group and stream prefix that the ECS task writes to.
            awslogs_group="/ecs/airflow2/airflow2",
            awslogs_region="us-east-1",
            awslogs_stream_prefix="airflow2/dbt-om1-task",
            # Poll CloudWatch Logs for new events every 30 seconds.
            awslogs_fetch_interval=timedelta(seconds=30),
            propagate_tags="TASK_DEFINITION",
            **kwargs,
        )
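
For reference when checking that the right stream is being read: the ECS awslogs driver names each log stream prefix-name/container-name/ecs-task-id. The sketch below derives the stream name we would expect for a given task, assuming (as in the call above) that awslogs_stream_prefix already ends with the container name. The helper `expected_stream_name` and the example ARN are hypothetical and not part of the provider.

```python
def expected_stream_name(awslogs_stream_prefix: str, task_arn: str) -> str:
    """Build the expected CloudWatch stream name for an ECS task.

    Assumes `awslogs_stream_prefix` has the form "prefix-name/container-name",
    so only the task ID (the last segment of the task ARN) is appended.
    """
    ecs_task_id = task_arn.rsplit("/", 1)[-1]
    return f"{awslogs_stream_prefix}/{ecs_task_id}"


# Hypothetical ARN, for illustration only.
example_arn = "arn:aws:ecs:us-east-1:123456789012:task/airflow2/0123456789abcdef"
stream_name = expected_stream_name("airflow2/dbt-om1-task", example_arn)
```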

What you think should happen instead

All log events in the ECS log group should be pulled forward into the MWAA *-Task log group by the task_log_fetcher.
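
As an illustration of the expected behavior, a final drain of the stream after the ECS task stops could look roughly like the sketch below. This is not the provider's implementation: `drain_log_stream` is a hypothetical helper, and `client` is assumed to be a boto3 CloudWatch Logs client (or any object with a compatible `get_log_events` method). It relies on the documented behavior that `get_log_events` returns the same `nextForwardToken` that was passed in once the end of the stream has been reached.

```python
from typing import Any, Dict, Iterator, Optional


def drain_log_stream(
    client: Any, group: str, stream: str, token: Optional[str] = None
) -> Iterator[Dict[str, Any]]:
    """Yield log events from `token` onward until the end of the stream."""
    while True:
        kwargs: Dict[str, Any] = {
            "logGroupName": group,
            "logStreamName": stream,
            "startFromHead": True,
        }
        if token is not None:
            kwargs["nextToken"] = token
        response = client.get_log_events(**kwargs)
        # A page may be empty even though more events are available,
        # so only the token comparison decides when to stop.
        yield from response["events"]
        next_token = response["nextForwardToken"]
        if next_token == token:
            break
        token = next_token
```

With a real client this would be called as `drain_log_stream(boto3.client("logs", region_name="us-east-1"), "/ecs/airflow2/airflow2", stream_name, last_token)`, where `last_token` is whatever position the periodic fetcher last reached. Running it once more after the task reaches a stopped state, for successful and failed tasks alike, is the behavior we would expect.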

How to reproduce

As in #40875, we are using a large (~2GB) custom DBT image inside our task definition. The problem has occurred on every execution of the EcsRunTaskOperator against this image.

Trigger a task using the EcsRunTaskOperator, then compare the log stream in the ECS cluster's log group with the log stream in the MWAA task log group. All logs should have been copied forward from the former to the latter.

Anything else

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

@yaningz added the area:providers, kind:bug, and needs-triage labels on Nov 5, 2024.
@dosubot added the area:logging and provider:amazon-aws labels on Nov 5, 2024.