GH-49184: [CI] AMD64 macOS 15-intel Python 3 consistently times out #49189

tadeja · 2026-02-09T15:29:04Z

Rationale for this change

Recent CI checks have been failing because the job AMD64 macOS 15-intel Python 3 is cancelled at 60 minutes:
The job has exceeded the maximum execution time of 1h0m0s

What changes are included in this PR?

Temporary timeout increase from 60 to 75 minutes for macOS Python 3 jobs (both the ARM64 and Intel jobs, as they share a common setting).

Are these changes tested?

To be tested on CI.

Are there any user-facing changes?

No.

GitHub Issue: [CI] AMD64 macOS 15-intel Python 3 consistently times out #49184

github-actions · 2026-02-09T15:29:34Z

⚠️ GitHub issue #49184 has been automatically assigned in GitHub to PR creator.

kou · 2026-02-09T21:19:40Z

We need to change the macos job instead of the docker job:

diff --git a/.github/workflows/python.yml b/.github/workflows/python.yml
index bc7fe3cd68..e9db71a8c7 100644
--- a/.github/workflows/python.yml
+++ b/.github/workflows/python.yml
@@ -142,7 +142,7 @@ jobs:
     name: ${{ matrix.architecture }} macOS ${{ matrix.macos-version }} Python 3
     runs-on: macos-${{ matrix.macos-version }}
     if: ${{ !contains(github.event.pull_request.title, 'WIP') }}
-    timeout-minutes: 60
+    timeout-minutes: 75
     strategy:
       fail-fast: false
       matrix:

tadeja · 2026-02-09T21:49:41Z

Thanks, @kou! Updated.

kou · 2026-02-09T21:57:07Z

BTW, can we speed up macOS jobs...?

It seems that tests were finished in about 5min on Linux:

https://github.com/apache/arrow/actions/runs/21831237516/job/62989873777#step:6:6707

= 7836 passed, 253 skipped, 22 xfailed, 2 xpassed, 53 warnings in 247.63s (0:04:07) =

But tests were finished in about 30min on macOS:

https://github.com/apache/arrow/actions/runs/21831237516/job/62989873535#step:10:541

= 7527 passed, 573 skipped, 11 xfailed, 2 xpassed, 53 warnings in 1634.73s (0:27:14) =

Have you profiled tests on macOS?
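To put the two summary lines side by side, here is a small sketch that pulls the wall-clock seconds out of a pytest summary line and computes the slowdown. The helper name `summary_seconds` is made up for illustration; the two strings are the summary lines quoted above.

```python
import re

# Matches the "in 247.63s" part of a pytest final summary line.
_SUMMARY_RE = re.compile(r"\bin (\d+(?:\.\d+)?)s\b")


def summary_seconds(line: str) -> float:
    """Return the wall-clock seconds reported in a pytest summary line."""
    match = _SUMMARY_RE.search(line)
    if match is None:
        raise ValueError(f"no duration found in: {line!r}")
    return float(match.group(1))


linux = "= 7836 passed, 253 skipped, 22 xfailed, 2 xpassed, 53 warnings in 247.63s (0:04:07) ="
macos = "= 7527 passed, 573 skipped, 11 xfailed, 2 xpassed, 53 warnings in 1634.73s (0:27:14) ="

slowdown = summary_seconds(macos) / summary_seconds(linux)
print(f"macOS took {slowdown:.1f}x as long as Linux")
```

Note that the two jobs do not run exactly the same set of tests (the passed and skipped counts differ), so the ratio is only a rough indicator.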

tadeja · 2026-02-10T00:46:51Z

@kou speeding up jobs would be best, indeed!

I've been checking the Build step and why ccache almost never gets a hit for macOS 15-intel: the cache gets evicted due to the 10 GB repo-wide cache limit.
Cache not found for input keys: python-ccache-macos-15-intel-

I see just around 7 successful completions in February
12m example A and 4m example B with ccache

(Perhaps using sccache for macOS could be an option?)
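To illustrate how a cache key can go missing under a repo-wide size cap, here is a toy simulation. It assumes entries are evicted in order of oldest last access once the total exceeds the limit, which is a simplification of the behavior GitHub documents for Actions caches. The entry names other than the ccache key prefix, and all sizes, are invented and are not measurements from this repository.

```python
from dataclasses import dataclass


@dataclass
class CacheEntry:
    key: str
    size_gb: float
    last_access: int  # larger means more recently used


def evict_to_limit(entries: list[CacheEntry], limit_gb: float) -> list[CacheEntry]:
    """Drop least recently accessed entries until the total fits the limit."""
    kept = sorted(entries, key=lambda e: e.last_access)  # oldest first
    total = sum(e.size_gb for e in kept)
    while kept and total > limit_gb:
        total -= kept.pop(0).size_gb
    return kept


# Hypothetical entries: a job that rarely finishes saves its cache less often,
# so its entry tends to be the oldest when other jobs keep writing.
entries = [
    CacheEntry("python-ccache-macos-15-intel-", 2.0, last_access=1),
    CacheEntry("other-job-a", 4.0, last_access=2),
    CacheEntry("other-job-b", 3.0, last_access=3),
    CacheEntry("other-job-c", 3.0, last_access=4),
]
surviving = [e.key for e in evict_to_limit(entries, limit_gb=10.0)]
print(surviving)
```

Under these made-up numbers the macOS 15-intel entry is the one that gets dropped, which would match the "Cache not found" message on the next run.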

I will check the Test phase and profiling options next. For now I've added "--durations=20 -v" to the pytest arguments.

Locally on M1, pytest finishes in 86.44s (0:01:26), with the following test durations for comparison:

=========================================== slowest 20 durations ============================================
5.91s call     tests/test_fs.py::test_s3_options[builtin_pickle]
5.90s call     tests/test_fs.py::test_s3_options[cloudpickle]
4.67s call     tests/test_cython.py::test_cython_api
3.80s call     tests/test_extension_type.py::test_cpp_extension_in_python
3.76s call     tests/test_cython.py::test_visit_strings
2.95s call     tests/test_dataset.py::test_write_dataset_with_backpressure
1.66s call     tests/test_csv.py::TestThreadedCSVTableRead::test_cancellation
1.47s call     tests/test_fs.py::test_s3_real_aws_region_selection
1.08s call     tests/test_io.py::test_compression_level[zstd]
1.06s call     tests/test_csv.py::TestSerialCSVTableRead::test_cancellation
1.02s call     tests/test_pandas.py::test_is_data_frame_race_condition
0.97s call     tests/test_fs.py::test_s3_finalize
0.66s call     tests/parquet/test_metadata.py::test_table_large_metadata
0.64s call     tests/test_pandas.py::TestConvertMisc::test_threaded_conversion_multiprocess
0.60s call     tests/test_fs.py::test_s3fs_wrong_region
0.60s call     tests/test_fs.py::test_s3_finalize_region_resolver
0.58s call     tests/test_fs.py::test_concurrent_s3fs_init
0.50s call     tests/test_fs.py::test_s3_real_aws
0.48s call     tests/test_ipc.py::test_read_year_month_nano_interval
0.46s call     tests/test_pandas.py::test_threaded_pandas_import

tadeja · 2026-02-10T12:22:07Z

ARM64 macOS 14 Python 3 shows deterioration

============================= slowest 20 durations =============================
854.97s call     tests/test_pandas.py::test_nested_chunking_valid
84.06s call     tests/test_convert_builtin.py::test_string_too_large[ty1]
59.02s call     tests/test_convert_builtin.py::test_auto_chunking_binary_like
55.42s call     tests/test_convert_builtin.py::test_string_too_large[ty2]
54.07s call     tests/test_convert_builtin.py::test_large_binary_array[ty1]
46.97s call     tests/test_feather.py::test_chunked_binary_error_message
46.07s call     tests/test_convert_builtin.py::test_large_binary_array[ty0]
36.98s call     tests/test_pandas.py::TestConvertStringLikeTypes::test_bytes_exceed_2gb
30.02s call     tests/interchange/test_conversion.py::test_pyarrow_roundtrip_large_string
29.80s call     tests/test_convert_builtin.py::test_array_from_pylist_data_overflow
28.25s call     tests/test_convert_builtin.py::test_nested_auto_chunking[ty1-x]
26.66s call     tests/test_convert_builtin.py::test_nested_auto_chunking[ty0-x]
26.12s call     tests/parquet/test_parquet_writer.py::test_parquet_writer_chunk_size
25.14s call     tests/test_dataset.py::test_write_dataset_with_backpressure
24.57s call     tests/test_convert_builtin.py::test_string_too_large[ty0]
23.43s call     tests/test_array.py::test_list_child_overflow_to_chunked
22.41s call     tests/test_convert_builtin.py::test_auto_chunking_list_like
17.74s call     tests/test_pandas.py::TestConvertListTypes::test_auto_chunking_on_list_overflow
16.53s call     tests/test_pandas.py::TestConvertStringLikeTypes::test_auto_chunking_pandas_series_of_strings[x1]
15.80s call     tests/test_pandas.py::TestConvertStringLikeTypes::test_auto_chunking_pandas_series_of_strings[x0]
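A listing like this is easier to act on when grouped by test file. The following is a minimal sketch, assuming the `--durations` lines keep the `<seconds>s <phase> <test id>` shape shown above; `seconds_per_file` is a hypothetical helper, and the sample input is the first five lines of the listing.

```python
import re
from collections import defaultdict

# One line of pytest --durations output: "<seconds>s <phase> <test id>".
_DURATION_RE = re.compile(r"^(\d+(?:\.\d+)?)s\s+(call|setup|teardown)\s+(\S+)")


def seconds_per_file(report: str) -> dict[str, float]:
    """Sum reported durations by test file (the part before '::')."""
    totals: dict[str, float] = defaultdict(float)
    for line in report.splitlines():
        match = _DURATION_RE.match(line.strip())
        if match:
            seconds, _phase, test_id = match.groups()
            totals[test_id.split("::", 1)[0]] += float(seconds)
    return dict(totals)


report = """\
854.97s call     tests/test_pandas.py::test_nested_chunking_valid
84.06s call     tests/test_convert_builtin.py::test_string_too_large[ty1]
59.02s call     tests/test_convert_builtin.py::test_auto_chunking_binary_like
55.42s call     tests/test_convert_builtin.py::test_string_too_large[ty2]
54.07s call     tests/test_convert_builtin.py::test_large_binary_array[ty1]
"""

# Print files from slowest to fastest.
for path, total in sorted(seconds_per_file(report).items(), key=lambda kv: -kv[1]):
    print(f"{total:8.2f}s  {path}")
```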

For ARM64 macOS 14 Python 3, durations didn't print yet as the job got cancelled at [ 99%], this time within the new 75m timeout setting. I will try to run once more with a 90m timeout setting.

tadeja · 2026-02-10T14:11:15Z

AMD64 macOS 15-intel Python 3 finally succeeds in 1h 16m 47s

============================= slowest 40 durations =============================
1712.00s call     tests/test_pandas.py::test_nested_chunking_valid
114.92s call     tests/test_convert_builtin.py::test_large_binary_array[ty0]
90.11s call     tests/interchange/test_conversion.py::test_pyarrow_roundtrip_large_string
76.32s call     tests/test_convert_builtin.py::test_large_binary_array[ty1]
73.84s call     tests/test_convert_builtin.py::test_auto_chunking_binary_like
54.73s call     tests/test_convert_builtin.py::test_string_too_large[ty1]
35.69s call     tests/test_feather.py::test_chunked_binary_error_message
35.54s call     tests/test_convert_builtin.py::test_nested_auto_chunking[ty0-x]
35.05s call     tests/test_convert_builtin.py::test_nested_auto_chunking[ty1-x]
30.69s call     tests/test_cython.py::test_cython_api
28.96s call     tests/parquet/test_parquet_writer.py::test_parquet_writer_chunk_size
27.06s call     tests/test_convert_builtin.py::test_array_from_pylist_data_overflow
26.29s call     tests/test_pandas.py::TestConvertStringLikeTypes::test_bytes_exceed_2gb
25.96s call     tests/test_array.py::test_list_child_overflow_to_chunked
24.99s call     tests/test_dataset.py::test_write_dataset_with_backpressure
22.27s call     tests/test_io.py::test_compression_level[zstd]
21.09s call     tests/test_pandas.py::TestConvertListTypes::test_auto_chunking_on_list_overflow
18.70s call     tests/test_cython.py::test_visit_strings
17.46s call     tests/test_convert_builtin.py::test_auto_chunking_list_like
16.43s call     tests/test_extension_type.py::test_cpp_extension_in_python
14.16s call     tests/parquet/test_data_types.py::test_large_binary_overflow
...
=========================== short test summary info ============================

tadeja · 2026-02-10T17:46:59Z

I changed PYARROW_TEST_LARGE_MEMORY: to OFF only for the macOS 15-intel job to skip all large memory tests, including the longest one, the 28-minute tests/test_pandas.py::test_nested_chunking_valid.
The remaining slowest tests are:

============================= slowest 40 durations =============================
97.90s call     tests/test_cython.py::test_cython_api
33.97s call     tests/test_cython.py::test_visit_strings
32.49s call     tests/test_dataset.py::test_write_dataset_with_backpressure
25.90s call     tests/parquet/test_metadata.py::test_table_large_metadata
25.56s call     tests/test_extension_type.py::test_cpp_extension_in_python
24.93s call     tests/test_io.py::test_compression_level[zstd]
17.69s call     tests/test_memory.py::test_env_var
14.40s setup    tests/parquet/test_dataset.py::test_read_s3fs
...

https://github.com/apache/arrow/actions/runs/21868873981/job/63117751343?pr=49189#step:10:8522
= 7506 passed, 594 skipped, 11 xfailed, 2 xpassed, 53 warnings in 746.69s (0:12:26) =

Skipping large memory tests for 15-intel, plus a purely coincidental ccache hit, got the macOS 15-intel job runtime from 76 minutes down to 32 minutes!
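For readers unfamiliar with how an ON/OFF environment flag can gate a group of tests, here is a minimal sketch. It is not PyArrow's actual conftest; the `flag_enabled` helper, the accepted spellings, and the `large_memory` marker name are assumptions made for illustration.

```python
import os


def flag_enabled(name: str, environ=None, default: bool = False) -> bool:
    """Interpret an ON/OFF style environment variable as a boolean."""
    environ = os.environ if environ is None else environ
    value = environ.get(name, "").strip().lower()
    if not value:
        return default
    if value in {"1", "true", "on", "yes", "y"}:
        return True
    if value in {"0", "false", "off", "no", "n"}:
        return False
    raise ValueError(f"unrecognized value for {name}: {value!r}")


def pytest_collection_modifyitems(config, items):
    """conftest.py hook: skip tests marked large_memory unless the flag is on."""
    if flag_enabled("PYARROW_TEST_LARGE_MEMORY"):
        return
    import pytest  # imported lazily so flag_enabled works without pytest

    skip = pytest.mark.skip(reason="PYARROW_TEST_LARGE_MEMORY is not enabled")
    for item in items:
        if "large_memory" in item.keywords:
            item.add_marker(skip)
```

With a gate of this kind, setting the variable to OFF for one job turns the marked tests into skips there, which is consistent with the higher skipped count (594 versus 573) in the summary above.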

For ARM macOS 14 PYARROW_TEST_LARGE_MEMORY: remains ON, so large memory tests remain there.

============================= slowest 40 durations =============================
803.30s call     tests/test_pandas.py::test_nested_chunking_valid
48.00s call     tests/test_convert_builtin.py::test_string_too_large[ty1]
43.55s call     tests/test_convert_builtin.py::test_auto_chunking_binary_like
36.04s call     tests/test_convert_builtin.py::test_large_binary_array[ty0]
33.03s call     tests/test_convert_builtin.py::test_large_binary_array[ty1]
28.53s call     tests/test_convert_builtin.py::test_string_too_large[ty2]
27.13s call     tests/test_feather.py::test_chunked_binary_error_message
23.89s call     tests/interchange/test_conversion.py::test_pyarrow_roundtrip_large_string
22.55s call     tests/test_convert_builtin.py::test_array_from_pylist_data_overflow
22.32s call     tests/test_pandas.py::TestConvertStringLikeTypes::test_bytes_exceed_2gb
21.72s call     tests/parquet/test_parquet_writer.py::test_parquet_writer_chunk_size
...
= 7527 passed, 573 skipped, 11 xfailed, 2 xpassed, 53 warnings in 1451.60s (0:24:11) =
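As a quick check on how much of the ARM64 run a single test accounts for, the arithmetic below uses only the two numbers from this listing. It assumes the tests run serially, so that a per-test duration can be compared directly with the pytest wall time.

```python
# Numbers copied from the ARM64 macOS 14 run above.
longest_test_s = 803.30   # tests/test_pandas.py::test_nested_chunking_valid
total_pytest_s = 1451.60  # "in 1451.60s (0:24:11)"

share = longest_test_s / total_pytest_s
remaining_s = total_pytest_s - longest_test_s

print(f"longest test: {share:.1%} of pytest wall time")
print(f"everything else: {remaining_s:.2f}s")
```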

--> Is skipping large memory tests only for macOS 15-intel, and keeping them on ARM macOS 14, something that can be accepted for now to get CI to pass?
I'll be removing the macOS timeout-minutes: 90 setting, so it goes back to 60.
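The per-job split proposed here can be pictured as one flag per matrix entry that ends up in the job environment. The sketch below models that in Python rather than workflow YAML. The dictionary layout and the `job_name` and `job_env` helpers are illustrative; only the job names and the ON/OFF value per job come from this thread.

```python
# Illustrative model of the macOS matrix; the real definition lives in
# .github/workflows/python.yml and may be laid out differently.
MATRIX = [
    {"architecture": "ARM64", "macos-version": "14", "large-memory-tests": "ON"},
    {"architecture": "AMD64", "macos-version": "15-intel", "large-memory-tests": "OFF"},
]


def job_name(entry: dict[str, str]) -> str:
    """Mirror the job name template used in the workflow."""
    return f"{entry['architecture']} macOS {entry['macos-version']} Python 3"


def job_env(entry: dict[str, str]) -> dict[str, str]:
    """Environment a matrix entry would produce for the test step."""
    return {"PYARROW_TEST_LARGE_MEMORY": entry["large-memory-tests"]}


for entry in MATRIX:
    print(job_name(entry), job_env(entry))
```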

rok · 2026-02-10T21:10:11Z

.github/workflows/python.yml

+      PYARROW_TEST_LARGE_MEMORY: ${{ matrix.large-memory-tests }}
+      PYTEST_ARGS: "--durations=40 -v"
      # Current oldest supported version according to https://endoflife.date/macos
      MACOSX_DEPLOYMENT_TARGET: 12.0


Perhaps we should bump this to 14? Since it's the oldest not EOL yet.

tadeja · 2026-02-11

Tried setting MACOSX_DEPLOYMENT_TARGET: 14.0 and having large memory tests reenabled for macOS 15-intel but there isn't any timing improvement for either macOS.
15-intel is back to being cancelled at 60m while it's at 91% of the longest test_pandas.py
https://github.com/apache/arrow/actions/runs/21907459012/job/63251481075?pr=49189#step:10:248

rok · 2026-02-11

I suppose disabling large-memory-tests is best for now then.

tadeja · 2026-02-11

Ok, back to the previous fix - disabling large memory tests on 15-intel which take more than two thirds of test time there.

rok · 2026-02-11

@raulcd should we bump MACOSX_DEPLOYMENT_TARGETs?

raulcd · 2026-02-11

I am pretty sure I've seen this conversation popping up somewhere else but I can't find where.
I think it's reasonable, we did update it 1.5 years ago here but I think it's time to upgrade:
6db12f2
Probably worth opening an issue and tracking this individually also to give visibility to the issue in case there are any concerns.

rok · 2026-02-11

Opened an issue #49246

raulcd

Thanks @tadeja for tackling this!

raulcd · 2026-02-12T08:40:52Z

.github/workflows/python.yml

      ARROW_BUILD_TESTS: OFF
-      PYARROW_TEST_LARGE_MEMORY: ON
+      PYARROW_TEST_LARGE_MEMORY: ${{ matrix.large-memory-tests }}
+      PYTEST_ARGS: "--durations=40"


Maybe we could add the duration output to more jobs as an improvement so we can act if we start finding some really slow tests 🤔. This would be a different issue though.

raulcd · 2026-02-12T08:47:17Z

Tests are still taking around 30 minutes to pass on macOS. If the problem is memory bound due to GitHub runner limitations, there's not much we can do. I think we should merge this so we fix CI, but maybe we should open a separate issue to keep investigating whether there's something we could do about that. For example, has someone with a macOS machine with "normal" specs (not a GitHub runner) validated whether the tests are that slow there?

tadeja · 2026-02-12T10:38:02Z

Thank you, @raulcd.
What do you say about adding pytest-xdist to python/requirements-test.txt and running pytest -n auto only for macOS on CI while we continue to investigate?

I don't have macOS 14 or 15, but locally on an M1 with macOS 26, with one worker and no pytest parallelism, the tests finish in under 2 minutes!
(Earlier post here: #49189 (comment).
And today I have ====== 7552 passed, 513 skipped, 15 xfailed, 2 xpassed, 54 warnings in 90.61s (0:01:30) ====== )
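The idea of enabling parallel workers only on macOS could be expressed as a small argument builder. This is a sketch under assumptions: `build_pytest_args` is a hypothetical helper, not something in the Arrow repository, and `-n auto` only works when the pytest-xdist plugin is installed.

```python
import sys


def build_pytest_args(platform: str = sys.platform, durations: int = 40) -> list[str]:
    """Assemble pytest arguments, enabling xdist workers only on macOS."""
    args = [f"--durations={durations}"]
    if platform == "darwin":
        # -n auto is provided by the pytest-xdist plugin and starts
        # one worker per available CPU.
        args += ["-n", "auto"]
    return args


print(build_pytest_args("darwin"))
print(build_pytest_args("linux"))
```

One thing to keep in mind with this approach is that running several memory-heavy tests at the same time may raise peak memory use on the runner, so the large memory tests might need separate handling.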

raulcd · 2026-02-12T11:04:20Z

====== 7552 passed, 513 skipped, 15 xfailed, 2 xpassed, 54 warnings in 90.61s (0:01:30) ====== )

On one hand that sounds awesome; on the other hand, looking at the specs for the GH runners, specifically for macos-15-intel, 14GB of RAM should be more than enough:
https://docs.github.com/en/actions/reference/runners/github-hosted-runners#standard-github-hosted-runners-for-public-repositories

rok

Glad to see these tests speed up! I think this is good to merge.

We should probably look at sccache in a separate effort as we seem to be doing a lot of early cache evictions.

Temporary timeout increase

603dafc

tadeja requested review from assignUser, jonkeane, kou and raulcd as code owners February 9, 2026 15:29

github-actions bot added the awaiting review label Feb 9, 2026

tadeja added 3 commits February 9, 2026 17:43

Run CI with 75 timeout

c74c435

Run CI with 75 timeout

79a2fcc

Run CI with 75 timeout

9d81700

Revert and update macos job

d9590d8

tadeja force-pushed the 49184-macos-15-intel-timeout branch from 92520c5 to d9590d8 Compare February 9, 2026 21:41

Pytest verbose

33e313e

Temporary timeout 90

b0b5aa5

tadeja added 2 commits February 10, 2026 15:13

Skip large memory tests for Intel

5571a78

Skip large memory tests for Intel

d382fa2

Revert macOS timeout to 60

343956a

rok reviewed Feb 10, 2026

github-actions bot added awaiting changes and removed awaiting review labels Feb 10, 2026

Test run target 14 instead 12

241f634

github-actions bot added awaiting change review and removed awaiting changes labels Feb 11, 2026

Run again with large memory tests enabled

3b65e68

github-actions bot added awaiting changes and removed awaiting change review labels Feb 11, 2026

Back to disabling large memory tests on 15-intel

033cc64

github-actions bot added awaiting change review, awaiting changes and removed awaiting changes, awaiting change review labels Feb 11, 2026

Back to MacOS target 12

3fd861e

github-actions bot added awaiting change review and removed awaiting changes labels Feb 11, 2026

rok mentioned this pull request Feb 11, 2026

[CI] Bump MACOSX_DEPLOYMENT_TARGET to 14 #49246

Open

github-actions bot added awaiting changes and removed awaiting change review labels Feb 11, 2026

raulcd approved these changes Feb 12, 2026

github-actions bot added awaiting merge and removed awaiting changes labels Feb 12, 2026

Testing pytest-xdist for macOS

8acec56

tadeja requested a review from AlenkaF as a code owner February 12, 2026 10:58

github-actions bot added the Component: Python label Feb 12, 2026

rok approved these changes Feb 12, 2026
