@tadeja tadeja commented Feb 9, 2026

Rationale for this change

Recent CI checks are failing because the job AMD64 macOS 15-intel Python 3 is cancelled at 60 minutes with the message:
The job has exceeded the maximum execution time of 1h0m0s

What changes are included in this PR?

Temporary timeout increase from 60 to 75 minutes for the macOS Python 3 jobs (this applies to both the ARM64 and Intel jobs, as they share a common setting).

Are these changes tested?

To be tested on CI.

Are there any user-facing changes?

No.

github-actions bot commented Feb 9, 2026

⚠️ GitHub issue #49184 has been automatically assigned in GitHub to PR creator.

@github-actions github-actions bot added the awaiting review label Feb 9, 2026
kou commented Feb 9, 2026

We need to change the macos job instead of the docker job:

diff --git a/.github/workflows/python.yml b/.github/workflows/python.yml
index bc7fe3cd68..e9db71a8c7 100644
--- a/.github/workflows/python.yml
+++ b/.github/workflows/python.yml
@@ -142,7 +142,7 @@ jobs:
     name: ${{ matrix.architecture }} macOS ${{ matrix.macos-version }} Python 3
     runs-on: macos-${{ matrix.macos-version }}
     if: ${{ !contains(github.event.pull_request.title, 'WIP') }}
-    timeout-minutes: 60
+    timeout-minutes: 75
     strategy:
       fail-fast: false
       matrix:

@tadeja tadeja force-pushed the 49184-macos-15-intel-timeout branch from 92520c5 to d9590d8 Compare February 9, 2026 21:41
tadeja commented Feb 9, 2026

Thanks, @kou! Updated.

kou commented Feb 9, 2026

BTW, can we speed up macOS jobs...?

It seems that the tests finished in about 4 min on Linux:

https://github.com/apache/arrow/actions/runs/21831237516/job/62989873777#step:6:6707

= 7836 passed, 253 skipped, 22 xfailed, 2 xpassed, 53 warnings in 247.63s (0:04:07) =

But the tests took about 27 min on macOS:

https://github.com/apache/arrow/actions/runs/21831237516/job/62989873535#step:10:541

= 7527 passed, 573 skipped, 11 xfailed, 2 xpassed, 53 warnings in 1634.73s (0:27:14) =

Have you profiled tests on macOS?
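
To put a number on the gap between the two runs, here is a minimal sketch that parses the two pytest summary lines quoted above and computes the slowdown factor. The helper name `wall_seconds` is hypothetical and the regular expression only assumes the `in <seconds>s` suffix that pytest prints at the end of a run.

```python
import re

# Summary lines copied verbatim from the two CI logs quoted above.
LINUX_SUMMARY = "= 7836 passed, 253 skipped, 22 xfailed, 2 xpassed, 53 warnings in 247.63s (0:04:07) ="
MACOS_SUMMARY = "= 7527 passed, 573 skipped, 11 xfailed, 2 xpassed, 53 warnings in 1634.73s (0:27:14) ="


def wall_seconds(summary: str) -> float:
    """Return the wall-clock seconds reported in a pytest summary line."""
    match = re.search(r" in (\d+(?:\.\d+)?)s\b", summary)
    if match is None:
        raise ValueError(f"no duration found in: {summary!r}")
    return float(match.group(1))


# Ratio of macOS wall time to Linux wall time for the pytest step only.
slowdown = wall_seconds(MACOS_SUMMARY) / wall_seconds(LINUX_SUMMARY)
print(f"macOS pytest run took {slowdown:.1f}x as long as the Linux run")
```

The two jobs do not run exactly the same set of tests (7527 vs 7836 passed), so the ratio is only a rough indicator.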

tadeja commented Feb 10, 2026

@kou speeding up jobs would be best, indeed!

I've been checking the Build step and why ccache almost never gets a cache hit for macOS 15-intel: the cache gets evicted because of the 10 GB repo-wide cache limit.
Cache not found for input keys: python-ccache-macos-15-intel-

I see only around 7 successful completions in February:
12m in example A and 4m in example B with ccache.

(Perhaps using sccache for macOS could be an option?)

I will check the Test phase and profiling options next. For now I've added "--durations=20 -v" to the pytest arguments.

Locally on an M1, pytest finishes in 86.44s (0:01:26), with the following test durations for comparison:

=========================================== slowest 20 durations ============================================
5.91s call     tests/test_fs.py::test_s3_options[builtin_pickle]
5.90s call     tests/test_fs.py::test_s3_options[cloudpickle]
4.67s call     tests/test_cython.py::test_cython_api
3.80s call     tests/test_extension_type.py::test_cpp_extension_in_python
3.76s call     tests/test_cython.py::test_visit_strings
2.95s call     tests/test_dataset.py::test_write_dataset_with_backpressure
1.66s call     tests/test_csv.py::TestThreadedCSVTableRead::test_cancellation
1.47s call     tests/test_fs.py::test_s3_real_aws_region_selection
1.08s call     tests/test_io.py::test_compression_level[zstd]
1.06s call     tests/test_csv.py::TestSerialCSVTableRead::test_cancellation
1.02s call     tests/test_pandas.py::test_is_data_frame_race_condition
0.97s call     tests/test_fs.py::test_s3_finalize
0.66s call     tests/parquet/test_metadata.py::test_table_large_metadata
0.64s call     tests/test_pandas.py::TestConvertMisc::test_threaded_conversion_multiprocess
0.60s call     tests/test_fs.py::test_s3fs_wrong_region
0.60s call     tests/test_fs.py::test_s3_finalize_region_resolver
0.58s call     tests/test_fs.py::test_concurrent_s3fs_init
0.50s call     tests/test_fs.py::test_s3_real_aws
0.48s call     tests/test_ipc.py::test_read_year_month_nano_interval
0.46s call     tests/test_pandas.py::test_threaded_pandas_import

tadeja commented Feb 10, 2026

ARM64 macOS 14 Python 3 shows deterioration

============================= slowest 20 durations =============================
854.97s call     tests/test_pandas.py::test_nested_chunking_valid
84.06s call     tests/test_convert_builtin.py::test_string_too_large[ty1]
59.02s call     tests/test_convert_builtin.py::test_auto_chunking_binary_like
55.42s call     tests/test_convert_builtin.py::test_string_too_large[ty2]
54.07s call     tests/test_convert_builtin.py::test_large_binary_array[ty1]
46.97s call     tests/test_feather.py::test_chunked_binary_error_message
46.07s call     tests/test_convert_builtin.py::test_large_binary_array[ty0]
36.98s call     tests/test_pandas.py::TestConvertStringLikeTypes::test_bytes_exceed_2gb
30.02s call     tests/interchange/test_conversion.py::test_pyarrow_roundtrip_large_string
29.80s call     tests/test_convert_builtin.py::test_array_from_pylist_data_overflow
28.25s call     tests/test_convert_builtin.py::test_nested_auto_chunking[ty1-x]
26.66s call     tests/test_convert_builtin.py::test_nested_auto_chunking[ty0-x]
26.12s call     tests/parquet/test_parquet_writer.py::test_parquet_writer_chunk_size
25.14s call     tests/test_dataset.py::test_write_dataset_with_backpressure
24.57s call     tests/test_convert_builtin.py::test_string_too_large[ty0]
23.43s call     tests/test_array.py::test_list_child_overflow_to_chunked
22.41s call     tests/test_convert_builtin.py::test_auto_chunking_list_like
17.74s call     tests/test_pandas.py::TestConvertListTypes::test_auto_chunking_on_list_overflow
16.53s call     tests/test_pandas.py::TestConvertStringLikeTypes::test_auto_chunking_pandas_series_of_strings[x1]
15.80s call     tests/test_pandas.py::TestConvertStringLikeTypes::test_auto_chunking_pandas_series_of_strings[x0]

For AMD64 macOS 15-intel Python 3, the durations haven't printed yet because the job got cancelled at [ 99%], this time within the new 75m timeout setting. I will try to run it once more with a 90m timeout setting.
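
A `--durations` report is easier to compare across jobs when it is aggregated per test file. The following is a minimal sketch that does this for the first four lines of the ARM64 macOS 14 report above. The function name `seconds_per_file` and the line pattern are assumptions for illustration; the pattern expects test ids without spaces, which holds for the lines shown here.

```python
import re
from collections import defaultdict

# First four lines of the ARM64 macOS 14 report quoted above.
DURATIONS = """\
854.97s call     tests/test_pandas.py::test_nested_chunking_valid
84.06s call     tests/test_convert_builtin.py::test_string_too_large[ty1]
59.02s call     tests/test_convert_builtin.py::test_auto_chunking_binary_like
55.42s call     tests/test_convert_builtin.py::test_string_too_large[ty2]
"""

# "<seconds>s <phase> <node id>", as printed by pytest --durations.
LINE = re.compile(r"^\s*(\d+(?:\.\d+)?)s\s+(call|setup|teardown)\s+(\S+)\s*$")


def seconds_per_file(report: str) -> dict[str, float]:
    """Sum the reported durations per test file, ignoring other lines."""
    totals = defaultdict(float)
    for line in report.splitlines():
        match = LINE.match(line)
        if match is None:
            continue  # header, "..." or summary line
        seconds, _phase, node_id = match.groups()
        totals[node_id.split("::", 1)[0]] += float(seconds)
    return dict(totals)


totals = seconds_per_file(DURATIONS)
for path, secs in sorted(totals.items(), key=lambda item: item[1], reverse=True):
    print(f"{secs:8.2f}s  {path}")
```

Pasting a full report into `DURATIONS` gives a per-file ranking that can be diffed between the macOS 14 and macOS 15-intel jobs.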

tadeja commented Feb 10, 2026

AMD64 macOS 15-intel Python 3 finally succeeds in 1h 16m 47s

============================= slowest 40 durations =============================
1712.00s call     tests/test_pandas.py::test_nested_chunking_valid
114.92s call     tests/test_convert_builtin.py::test_large_binary_array[ty0]
90.11s call     tests/interchange/test_conversion.py::test_pyarrow_roundtrip_large_string
76.32s call     tests/test_convert_builtin.py::test_large_binary_array[ty1]
73.84s call     tests/test_convert_builtin.py::test_auto_chunking_binary_like
54.73s call     tests/test_convert_builtin.py::test_string_too_large[ty1]
35.69s call     tests/test_feather.py::test_chunked_binary_error_message
35.54s call     tests/test_convert_builtin.py::test_nested_auto_chunking[ty0-x]
35.05s call     tests/test_convert_builtin.py::test_nested_auto_chunking[ty1-x]
30.69s call     tests/test_cython.py::test_cython_api
28.96s call     tests/parquet/test_parquet_writer.py::test_parquet_writer_chunk_size
27.06s call     tests/test_convert_builtin.py::test_array_from_pylist_data_overflow
26.29s call     tests/test_pandas.py::TestConvertStringLikeTypes::test_bytes_exceed_2gb
25.96s call     tests/test_array.py::test_list_child_overflow_to_chunked
24.99s call     tests/test_dataset.py::test_write_dataset_with_backpressure
22.27s call     tests/test_io.py::test_compression_level[zstd]
21.09s call     tests/test_pandas.py::TestConvertListTypes::test_auto_chunking_on_list_overflow
18.70s call     tests/test_cython.py::test_visit_strings
17.46s call     tests/test_convert_builtin.py::test_auto_chunking_list_like
16.43s call     tests/test_extension_type.py::test_cpp_extension_in_python
14.16s call     tests/parquet/test_data_types.py::test_large_binary_overflow
...
=========================== short test summary info ============================

tadeja commented Feb 10, 2026

  • I changed PYARROW_TEST_LARGE_MEMORY to OFF only for the macOS 15-intel job to skip all large memory tests, including the longest one, tests/test_pandas.py::test_nested_chunking_valid, which takes 28 minutes.
    The slowest remaining tests are:
============================= slowest 40 durations =============================
97.90s call     tests/test_cython.py::test_cython_api
33.97s call     tests/test_cython.py::test_visit_strings
32.49s call     tests/test_dataset.py::test_write_dataset_with_backpressure
25.90s call     tests/parquet/test_metadata.py::test_table_large_metadata
25.56s call     tests/test_extension_type.py::test_cpp_extension_in_python
24.93s call     tests/test_io.py::test_compression_level[zstd]
17.69s call     tests/test_memory.py::test_env_var
14.40s setup    tests/parquet/test_dataset.py::test_read_s3fs
...

https://github.com/apache/arrow/actions/runs/21868873981/job/63117751343?pr=49189#step:10:8522
= 7506 passed, 594 skipped, 11 xfailed, 2 xpassed, 53 warnings in 746.69s (0:12:26) =

Skipping the large memory tests for 15-intel, together with a purely coincidental ccache hit, brought the macOS 15-intel job runtime down from 76 minutes to 32 minutes!

============================= slowest 40 durations =============================
803.30s call     tests/test_pandas.py::test_nested_chunking_valid
48.00s call     tests/test_convert_builtin.py::test_string_too_large[ty1]
43.55s call     tests/test_convert_builtin.py::test_auto_chunking_binary_like
36.04s call     tests/test_convert_builtin.py::test_large_binary_array[ty0]
33.03s call     tests/test_convert_builtin.py::test_large_binary_array[ty1]
28.53s call     tests/test_convert_builtin.py::test_string_too_large[ty2]
27.13s call     tests/test_feather.py::test_chunked_binary_error_message
23.89s call     tests/interchange/test_conversion.py::test_pyarrow_roundtrip_large_string
22.55s call     tests/test_convert_builtin.py::test_array_from_pylist_data_overflow
22.32s call     tests/test_pandas.py::TestConvertStringLikeTypes::test_bytes_exceed_2gb
21.72s call     tests/parquet/test_parquet_writer.py::test_parquet_writer_chunk_size
...
= 7527 passed, 573 skipped, 11 xfailed, 2 xpassed, 53 warnings in 1451.60s (0:24:11) =

--> Is skipping the large memory tests only for macOS 15-intel, and keeping them on ARM macOS 14, something that can be accepted for now to get CI to pass?
I'll be reverting the macOS timeout-minutes: 90 setting, so it goes back to 60.
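
For readers unfamiliar with this kind of switch, here is a minimal sketch of how an ON/OFF environment variable such as PYARROW_TEST_LARGE_MEMORY could gate a group of tests. This is not pyarrow's actual conftest: the helper names, the accepted spellings, and the default of "disabled when unset" are all assumptions made for illustration.

```python
import os

# Assumed spellings; the real test suite may accept a different set.
TRUE_VALUES = {"1", "true", "on", "yes", "y"}
FALSE_VALUES = {"0", "false", "off", "no", "n"}


def env_flag(name, default, environ=None):
    """Read a boolean flag from the environment (hypothetical helper)."""
    environ = os.environ if environ is None else environ
    value = environ.get(name, "").strip().lower()
    if not value:
        return default  # variable missing or empty
    if value in TRUE_VALUES:
        return True
    if value in FALSE_VALUES:
        return False
    raise ValueError(f"{name}={value!r} is not a valid boolean")


def should_skip_large_memory(environ=None):
    """Return True when large memory tests should be skipped."""
    # Assumption: the group is disabled unless explicitly turned on.
    enabled = env_flag("PYARROW_TEST_LARGE_MEMORY", default=False, environ=environ)
    return not enabled


print(should_skip_large_memory({"PYARROW_TEST_LARGE_MEMORY": "OFF"}))
```

In a test module, a result like this could feed `pytest.mark.skipif(...)`, so that a single CI matrix value decides per job whether the expensive tests run.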

PYARROW_TEST_LARGE_MEMORY: ${{ matrix.large-memory-tests }}
PYTEST_ARGS: "--durations=40 -v"
# Current oldest supported version according to https://endoflife.date/macos
MACOSX_DEPLOYMENT_TARGET: 12.0
Member

Perhaps we should bump this to 14, since it's the oldest version that is not EOL yet?

Contributor Author

I tried setting MACOSX_DEPLOYMENT_TARGET: 14.0 with the large memory tests re-enabled for macOS 15-intel, but there isn't any timing improvement for either macOS job.
15-intel is back to being cancelled at 60m, at 91% progress, while running the longest test in test_pandas.py:
https://github.com/apache/arrow/actions/runs/21907459012/job/63251481075?pr=49189#step:10:248

Member

I suppose disabling large-memory-tests is best for now then.

Contributor Author

@tadeja tadeja Feb 11, 2026

OK, back to the previous fix: disabling the large memory tests on 15-intel, which take more than two thirds of the test time there.

Member

@raulcd should we bump MACOSX_DEPLOYMENT_TARGETs?

Member

I am pretty sure I've seen this conversation pop up somewhere else, but I can't find where.
I think it's reasonable. We did update it 1.5 years ago here, but I think it's time to upgrade:
6db12f2
It's probably worth opening an issue and tracking this separately, also to give the change visibility in case there are any concerns.

Member

Opened an issue #49246

@github-actions github-actions bot added the awaiting changes label and removed the awaiting review label Feb 10, 2026
@github-actions github-actions bot added the awaiting change review label and removed the awaiting changes label Feb 11, 2026
@github-actions github-actions bot added the awaiting changes label and removed the awaiting change review label Feb 11, 2026
@github-actions github-actions bot added the awaiting change review and awaiting changes labels and removed the awaiting changes and awaiting change review labels Feb 11, 2026
@github-actions github-actions bot added the awaiting change review label and removed the awaiting changes label Feb 11, 2026
@github-actions github-actions bot added the awaiting changes label and removed the awaiting change review label Feb 11, 2026
@raulcd raulcd left a comment

Thanks @tadeja for tackling this!

ARROW_BUILD_TESTS: OFF
PYARROW_TEST_LARGE_MEMORY: ON
PYARROW_TEST_LARGE_MEMORY: ${{ matrix.large-memory-tests }}
PYTEST_ARGS: "--durations=40"
Member

Maybe we could add the duration output to more jobs as an improvement so we can act if we start finding some really slow tests 🤔. This would be a different issue though.

@github-actions github-actions bot added the awaiting merge label and removed the awaiting changes label Feb 12, 2026
raulcd commented Feb 12, 2026

Tests are still taking around 30 minutes to pass on macOS. If the problem is memory bound due to GitHub runner limitations, there's not much we can do. I think we should merge this so we fix CI, but maybe we should open a separate issue to keep investigating whether there's something we could do about that. For example, has someone with a Mac with "normal" specs (not a GitHub runner) validated whether the tests are that slow there?

tadeja commented Feb 12, 2026

Thank you, @raulcd.
What do you say about adding pytest-xdist to python/requirements-test.txt and running pytest -n auto only for macOS on CI while we continue to investigate?

I don't have macOS 14 or 15, but locally on an M1 running macOS 26, pytest with one worker (no parallelism) finishes in under 2 minutes!
(Earlier post here: #49189 (comment).
And today I have ====== 7552 passed, 513 skipped, 15 xfailed, 2 xpassed, 54 warnings in 90.61s (0:01:30) ====== )
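
The macOS-only parallelism suggested above could be expressed as a small argument builder. This is only a sketch of the idea, not what the PR does: the function name `ci_pytest_args` is hypothetical, `-n auto` requires the pytest-xdist plugin to be installed, and `--durations=40` mirrors the PYTEST_ARGS value used in this PR.

```python
import platform


def ci_pytest_args(system=None):
    """Build pytest arguments, enabling xdist workers only on macOS."""
    # platform.system() reports "Darwin" on macOS.
    system = platform.system() if system is None else system
    args = ["--durations=40"]
    if system == "Darwin":
        # One worker per available CPU; provided by pytest-xdist.
        args += ["-n", "auto"]
    return args


print(" ".join(["pytest", *ci_pytest_args("Darwin")]))
```

Whether parallel workers help here is an open question: if the slow tests are memory bound, running several of them at once could make things worse rather than better.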

raulcd commented Feb 12, 2026

====== 7552 passed, 513 skipped, 15 xfailed, 2 xpassed, 54 warnings in 90.61s (0:01:30) ====== )

On one hand that sounds awesome. On the other hand, looking at the specs for the GH runners, specifically macos-15-intel, 14GB of RAM should be more than enough:
https://docs.github.com/en/actions/reference/runners/github-hosted-runners#standard-github-hosted-runners-for-public-repositories

@rok rok left a comment

Glad to see these tests speed up! I think this is good to merge.

We should probably look at sccache in a separate effort as we seem to be doing a lot of early cache evictions.
