Log fetches for EcsRunTaskOperator return incomplete logs #43717
Labels
area:logging
area:providers
kind:bug
needs-triage
provider:amazon-aws
Apache Airflow Provider(s)
amazon
Versions of Apache Airflow Providers
apache-airflow==2.10.1
apache-airflow-providers-airbyte==3.9.0
apache-airflow-providers-amazon==8.28.0
apache-airflow-providers-databricks==6.9.0
apache-airflow-providers-docker==3.13.0
apache-airflow-providers-slack==8.9.0
apache-airflow-providers-snowflake==5.7.0
Apache Airflow version
2.10.1
Operating System
Amazon Linux
Deployment
Amazon (AWS) MWAA
Deployment details
Our MWAA environments are managed through Terraform. We install one provider outside of the publicly available providers, as well as one internal provider.
What happened
Related to #40875
We previously reported this issue in #40875. After upgrading to versions of Airflow and the Amazon provider that include the change made for that issue, we still see ECS task logs that appear in one log group but not in the other.
In all versions of Airflow from 2.7 through 2.10, we have observed logs missing from the *-Task log group of our MWAA deployment. As in #40875, we continue to use the awslogs_group argument of the EcsRunTaskOperator to specify the log group that the UI should pull from.
On successful task execution
The log group associated with our ECS cluster (/ecs/airflow2/airflow2) retains all logs, but these are not fully reflected in the *-Task log group (airflow-airflow2-Task). The following files show a pair of log streams that should contain the same events, but every line after
[0m13:11:27 7 of 30 START test not_null_disease_location_proximity_context_results_parsed_updt_dttm [RUN]
is missing from the log stream in the *-Task log group.
ECS cluster log stream (log group /ecs/airflow2/airflow2): 202411015-ecs:airflow2:airflow-logstream.csv
MWAA log stream (log group airflow-airflow2-Task): 20241105-airflow-airflow2-Task-logstream.csv
On unsuccessful task execution
Additionally, we found that on failed tasks the ::group::Post task execution logs section does retrieve the remaining log events, whereas successful tasks leave those events behind. The following is an example of a failed task in which all log events reach the log stream as part of the post task execution logs.
ECS cluster log stream (log group /ecs/airflow2/airflow2): 20241008-ecs:airflow2:airflow2-logstream.csv
MWAA log stream (log group airflow-airflow2-Task): 20241008-airflow-airflow-Task-logstream.csv
EcsRunTaskOperator call
This is the most current version of our call to the EcsRunTaskOperator class:
What you think should happen instead
All log events in the ECS log group should be copied into the MWAA *-Task log group by the task_log_fetcher.
How to reproduce
As in #40875, we are using a large (~2 GB) custom DBT image in our task definition. The issue occurs on every execution of the EcsRunTaskOperator against this image.
After triggering a task using the EcsRunTaskOperator, all logs should be copied forward from the log stream in the ECS cluster's log group to the log stream within the MWAA task log group.
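To check a reproduction without comparing the two streams by eye, the CSV exports can be compared programmatically. The following is a minimal Python sketch, assuming each export has a header row with a message column (pass a different column name if your export uses another header); the helper names read_messages, first_missing and load are hypothetical. Because each event forwarded into the Airflow task log carries Airflow's own prefix, the sketch treats an ECS event as present when its text appears inside a later line of the MWAA stream, matching in order.

```python
import csv
import io
import sys
from typing import Iterable, List, Optional


def read_messages(csv_text: str, message_column: str = "message") -> List[str]:
    """Return the message column of a log-stream CSV export, in file order.

    Assumes a header row that contains a column named message_column.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row[message_column] for row in reader]


def first_missing(source: Iterable[str], copy: Iterable[str]) -> Optional[int]:
    """Return the index of the first source message absent from the copy, or None.

    Messages are matched in order; a source message counts as present when it
    appears as a substring of a later copy line, because the Airflow task log
    adds its own prefix to each forwarded event.
    """
    copy_iter = iter(copy)
    for index, message in enumerate(source):
        # Consume copy lines until one contains this message.
        for candidate in copy_iter:
            if message in candidate:
                break
        else:
            # The copy ran out before this message was found.
            return index
    return None


def load(path: str, column: str) -> List[str]:
    """Read one CSV export from disk and return its messages."""
    with open(path, newline="", encoding="utf-8") as handle:
        return read_messages(handle.read(), column)


if __name__ == "__main__" and len(sys.argv) >= 3:
    # Usage: python compare_streams.py ECS_EXPORT.csv MWAA_EXPORT.csv [COLUMN]
    column = sys.argv[3] if len(sys.argv) > 3 else "message"
    ecs_messages = load(sys.argv[1], column)
    mwaa_messages = load(sys.argv[2], column)
    missing_at = first_missing(ecs_messages, mwaa_messages)
    if missing_at is None:
        print(f"All {len(ecs_messages)} ECS events were found in the MWAA stream.")
    else:
        print(f"First missing ECS event (index {missing_at} of {len(ecs_messages)}):")
        print(ecs_messages[missing_at])
```

Matching in order rather than with a set keeps repeated lines (for example, identical dbt progress messages) from hiding a gap further down the stream.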
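The expected behavior can also be phrased as a testable property: events written between the fetcher's last periodic poll and the end of the task must still be copied. The following Python sketch simulates this with a hypothetical in-memory FakeLogStream that mimics the CloudWatch Logs GetLogEvents convention of returning the same forward token once the end of the stream is reached; drain and run_task are likewise hypothetical names. It is not the provider's task_log_fetcher code and does not claim to identify the cause of this bug; it only illustrates that the copy is complete only when a final read continues until the token stops changing.

```python
from typing import List, Tuple


class FakeLogStream:
    """Hypothetical in-memory stand-in for one CloudWatch log stream.

    Tokens are plain offsets. As with GetLogEvents, reading at the end of the
    stream returns the same token that was passed in.
    """

    def __init__(self, page_size: int = 2) -> None:
        self.events: List[str] = []
        self.page_size = page_size

    def write(self, message: str) -> None:
        self.events.append(message)

    def get_events(self, token: int) -> Tuple[List[str], int]:
        # Return at most page_size events starting at the token, plus the next token.
        page = self.events[token : token + self.page_size]
        return page, token + len(page)


def drain(stream: FakeLogStream, token: int, sink: List[str]) -> int:
    """Copy pages into sink until the stream stops returning a new token."""
    while True:
        page, next_token = stream.get_events(token)
        sink.extend(page)
        if next_token == token:
            return token
        token = next_token


def run_task(final_drain: bool) -> List[str]:
    """Simulate one task run and return the events copied to the task log."""
    stream = FakeLogStream()
    copied: List[str] = []
    for i in range(3):
        stream.write(f"event {i}")
    # One periodic poll while the task is still running (reads a single page).
    page, token = stream.get_events(0)
    copied.extend(page)
    # The task keeps logging and then stops before the next periodic poll.
    for i in range(3, 7):
        stream.write(f"event {i}")
    if final_drain:
        # Read until the end of the stream after the task has stopped.
        drain(stream, token, copied)
    return copied
```

In this simulation, run_task(final_drain=False) returns only "event 0" and "event 1", while run_task(final_drain=True) returns all seven events.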
Anything else
No response
Are you willing to submit PR?
Code of Conduct