
Segmentation fault in to_dataset_dict with local netcdf files - dask related? #697

Open
menzel-gfdl opened this issue Jan 17, 2025 · 12 comments

@menzel-gfdl

Description

I am trying to use a catalog of local netCDF datasets, but I am seeing a segmentation fault. It occurs when calling to_dataset_dict(), though not always when processing the same file (and occasionally it does not occur at all). From the traceback, it appears to happen when dask runs the _open_dataset routine in parallel using threads:

Fatal Python error: Segmentation fault

Thread 0x00007f5bf37fe700 (most recent call first):
  File "/local2/home/cleanup-analysis-scripts/core/analysis_scripts/env/lib/python3.10/site-packages/xarray/backends/file_manager.py", line 217 in _acquire_with_cache_info
  File "/local2/home/cleanup-analysis-scripts/core/analysis_scripts/env/lib/python3.10/site-packages/xarray/backends/file_manager.py", line 199 in acquire_context
  File "/app/conda/miniforge/envs/python-310/lib/python3.10/contextlib.py", line 135 in __enter__
  File "/local2/home/cleanup-analysis-scripts/core/analysis_scripts/env/lib/python3.10/site-packages/xarray/backends/netCDF4_.py", line 455 in _acquire
  File "/local2/home/cleanup-analysis-scripts/core/analysis_scripts/env/lib/python3.10/site-packages/xarray/backends/netCDF4_.py", line 461 in ds
  File "/local2/home/cleanup-analysis-scripts/core/analysis_scripts/env/lib/python3.10/site-packages/xarray/backends/netCDF4_.py", line 393 in __init__
  File "/local2/home/cleanup-analysis-scripts/core/analysis_scripts/env/lib/python3.10/site-packages/xarray/backends/netCDF4_.py", line 452 in open
  File "/local2/home/cleanup-analysis-scripts/core/analysis_scripts/env/lib/python3.10/site-packages/xarray/backends/netCDF4_.py", line 666 in open_dataset
  File "/local2/home/cleanup-analysis-scripts/core/analysis_scripts/env/lib/python3.10/site-packages/xarray/backends/api.py", line 679 in open_dataset
  File "/local2/home/cleanup-analysis-scripts/core/analysis_scripts/env/lib/python3.10/site-packages/intake_esm/source.py", line 77 in _open_dataset
  File "/local2/home/cleanup-analysis-scripts/core/analysis_scripts/env/lib/python3.10/site-packages/dask/utils.py", line 77 in apply
  File "/local2/home/cleanup-analysis-scripts/core/analysis_scripts/env/lib/python3.10/site-packages/dask/_task_spec.py", line 745 in __call__
  File "/local2/home/cleanup-analysis-scripts/core/analysis_scripts/env/lib/python3.10/site-packages/dask/local.py", line 229 in execute_task
  File "/local2/home/cleanup-analysis-scripts/core/analysis_scripts/env/lib/python3.10/site-packages/dask/local.py", line 243 in <listcomp>
  File "/local2/home/cleanup-analysis-scripts/core/analysis_scripts/env/lib/python3.10/site-packages/dask/local.py", line 243 in batch_execute_tasks
  File "/app/conda/miniforge/envs/python-310/lib/python3.10/concurrent/futures/thread.py", line 58 in run
  File "/app/conda/miniforge/envs/python-310/lib/python3.10/concurrent/futures/thread.py", line 83 in _worker
  File "/app/conda/miniforge/envs/python-310/lib/python3.10/threading.py", line 953 in run
  File "/app/conda/miniforge/envs/python-310/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
  File "/app/conda/miniforge/envs/python-310/lib/python3.10/threading.py", line 973 in _bootstrap

Catalog JSON I am using:

{                                                                                              
  "esmcat_version": "0.0.1",                                                                   
  "attributes": [                                                                              
    {                                                                                          
      "column_name": "activity_id",                                                            
      "vocabulary": "",                                                                        
      "required": false                                                                        
    },                                                                                         
    {                                                                                          
      "column_name": "institution_id",                                                         
      "vocabulary": "",                                                                        
      "required": false                                                                        
    },                                                                                         
    {                                                                                          
      "column_name": "source_id",                                                              
      "vocabulary": "",                                                                        
      "required": false                                                                        
    },                                                                                         
    {                                                                                          
      "column_name": "experiment_id",                                                          
      "vocabulary": "",                                                                        
      "required": true                                                                         
    },                                                                                         
    {                                                                                          
      "column_name": "frequency",                                                              
      "vocabulary": "https://raw.githubusercontent.com/NOAA-GFDL/CMIP6_CVs/master/CMIP6_frequency.json",
      "required": true                                                                         
    },                                                                                         
    {                                                                                          
      "column_name": "realm",                                                                  
      "vocabulary": "",                                                                        
      "required": true                                                                         
    },                                                                                         
    {                                                                                          
      "column_name": "table_id",                                                               
      "vocabulary": "",                                                                        
      "required": false                                                                        
    },                                                                                         
    {                                                                                          
      "column_name": "member_id",                                                              
      "vocabulary": "",                                                                        
      "required": false                                                                        
    },                                                                                         
    {                                                                                          
      "column_name": "grid_label",                                                             
      "vocabulary": "",                                                                        
      "required": false                                                                        
    },                                                                                         
    {                                                                                          
      "column_name": "variable_id",                                                            
      "vocabulary": "",                                                                     
      "required": true                                                                      
    },                                                                                      
    {                                                                                       
      "column_name": "time_range",                                                          
      "vocabulary": "",                                                                     
      "required": true                                                                      
    },                                                                                      
    {                                                                                       
      "column_name": "chunk_freq",                                                          
      "required": false                                                                     
    },                                                                                      
    {                                                                                       
      "column_name": "platform",                                                            
      "vocabulary": "",                                                                     
      "required": false                                                                     
    },                                                                                      
    {                                                                                       
      "column_name": "target",                                                              
      "vocabulary": "",                                                                     
      "required": false                                                                     
    },                                                                                      
    {                                                                                       
      "column_name": "cell_methods",                                                        
      "vocabulary": "",                                                                     
      "required": "enhanced"                                                                
    },                                                                                      
    {                                                                                       
      "column_name": "path",                                                                
      "vocabulary": "",                                                                     
      "required": true                                                                      
    },                                                                                      
    {                                                                                       
      "column_name": "dimensions",                                                          
      "vocabulary": "",                                                                     
      "required": "enhanced"                                                                
    },                                                                                      
    {                                                                                       
      "column_name": "version_id",                                                          
      "vocabulary": "",                                                                     
      "required": false                                                                     
    },                                                                                      
    {                                                                                       
      "column_name": "standard_name",                                                       
      "vocabulary": "",                                                                     
      "required": "enhanced"                                                                
    }                                                                                       
  ],                                                                                        
  "assets": {                                                                                  
    "column_name": "path",                                                                     
    "format": "netcdf",                                                                        
    "format_column_name": null                                                                 
  },                                                                                           
  "aggregation_control": {                                                                     
    "variable_column_name": "variable_id",                                                     
    "groupby_attrs": [                                                                         
      "source_id",                                                                             
      "experiment_id",                                                                         
      "frequency",                                                                             
      "table_id",                                                                              
      "grid_label",                                                                            
      "realm",                                                                                 
      "member_id",                                                                             
      "chunk_freq"                                                                             
    ],                                                                                         
    "aggregations": [                                                                          
      {                                                                                        
        "type": "union",                                                                       
        "attribute_name": "variable_id",                                                       
        "options": {}                                                                          
      },                                                                                       
      {                                                                                        
        "type": "join_existing",                                                               
        "attribute_name": "time_range",                                                        
        "options": {                                                                           
          "dim": "time",                                                                       
          "coords": "minimal",                                                                 
          "compat": "override"                                                                 
        }                                                                                      
      }                                                                                        
    ]                                                                                          
  },                                                                                           
  "id": "esm_catalog_ESM4",                                                                    
  "description": null,                                                                         
  "title": null,                                                                               
  "last_updated": "2023-05-07T16:35:52Z",                                                      
  "catalog_file": "/local2/home/cleanup-analysis-scripts/user-analysis-scripts/freanalysis_radiation/tests/mycatalog.csv"
}
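[Editor's note] For context, the `groupby_attrs` in `aggregation_control` above determine how catalog rows are collapsed into dataset keys before the parallel opens happen. A minimal stdlib sketch of that grouping, with made-up rows and a simplified key format (not intake-esm's exact implementation):

```python
import csv
import io
from collections import defaultdict

# Hypothetical catalog rows mirroring a few of the groupby_attrs columns.
catalog_csv = """source_id,experiment_id,frequency,variable_id,path
ESM4,historical,mon,rsdt,/data/rsdt_000101-000512.nc
ESM4,historical,mon,rsdt,/data/rsdt_000601-001012.nc
ESM4,historical,mon,rlut,/data/rlut_000101-000512.nc
"""

groupby_attrs = ["source_id", "experiment_id", "frequency"]

groups = defaultdict(list)
for row in csv.DictReader(io.StringIO(catalog_csv)):
    key = ".".join(row[a] for a in groupby_attrs)
    groups[key].append(row["path"])

# All three rows share the same key, so they become one dataset whose
# files are opened (in parallel, under dask) and then aggregated.
print(dict(groups))
# {'ESM4.historical.mon': ['/data/rsdt_000101-000512.nc',
#                          '/data/rsdt_000601-001012.nc',
#                          '/data/rlut_000101-000512.nc']}
```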

Versions of the software that I have installed:

>>> import intake_esm; intake_esm.show_versions()

INSTALLED VERSIONS
------------------

cftime: 1.6.4.post1
dask: 2024.12.1
fastprogress: 1.0.3
fsspec: 2024.10.0
gcsfs: None
intake: 0.7.0
intake_esm: 2024.2.6
netCDF4: 1.7.2
pandas: 2.2.3
requests: 2.32.3
s3fs: None
xarray: 2025.1.1
zarr: 2.18.3

What I have tried

By playing around with the source code, I found that removing the @dask.delayed decorator and the corresponding compute calls from here gets rid of the segmentation fault (but makes the code run more slowly). Is there a better solution?
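[Editor's note] A less invasive workaround than patching intake-esm may be to force dask onto its serial scheduler before calling to_dataset_dict(), so the delayed _open_dataset tasks never run concurrently in threads. This is a sketch assuming a recent dask; it trades away parallelism for stability, much like removing the decorator:

```python
import dask

# Run all dask tasks sequentially in the calling thread; this avoids
# concurrent entry into the netCDF4/HDF5 stack at the cost of parallelism.
dask.config.set(scheduler="synchronous")

# datasets = catalog.search(...).to_dataset_dict()  # then proceed as before
```

The setting can also be applied temporarily with `with dask.config.set(scheduler="synchronous"): ...` so it only affects the catalog load.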

@aradhakrishnanGFDL

I wonder if this is related to pydata/xarray#9779.

I don't exactly recall whether I got the same errors, but the resolution was to update my conda environment, though this needs further testing.

cftime: 1.6.4
dask: 2024.8.1
fastprogress: 1.0.3
fsspec: 2024.6.1
gcsfs: None
intake: 0.7.0
intake_esm: 2024.2.6
netCDF4: 1.7.1
pandas: 2.2.2
requests: 2.32.3
s3fs: None
xarray: 2024.7.0
zarr: 2.18.2

@charles-turner-1 (Collaborator)

@menzel-gfdl Can you try pip install xarray[complete] and see if that resolves the issue?

I vaguely recall having the same issue a while ago when I first started playing with intake-esm & I think that resolved it for me and @marc-white.

If it does, that should confirm that it's a dependency issue, & we can start looking into how to resolve that.

@menzel-gfdl (Author)

menzel-gfdl commented Jan 21, 2025

@charles-turner-1 @marc-white , thank you. I just tried pip install xarray[complete], but still get a segmentation fault.

(env) $ python3 test_freanalysis_radiation.py 
Fatal Python error: Segmentation fault

Thread 0x00007f1af97fa700 (most recent call first):
  File "/local2/home/cleanup-analysis-scripts/core/analysis_scripts/env/lib/python3.10/site-packages/xarray/backends/file_manager.py", line 217 in _acquire_with_cache_info
  File "/local2/home/cleanup-analysis-scripts/core/analysis_scripts/env/lib/python3.10/site-packages/xarray/backends/file_manager.py", line 199 in acquire_context
  File "/app/conda/miniforge/envs/python-310/lib/python3.10/contextlib.py", line 135 in __enter__
  File "/local2/home/cleanup-analysis-scripts/core/analysis_scripts/env/lib/python3.10/site-packages/xarray/backends/netCDF4_.py", line 455 in _acquire
  File "/local2/home/cleanup-analysis-scripts/core/analysis_scripts/env/lib/python3.10/site-packages/xarray/backends/netCDF4_.py", line 461 in ds
  File "/local2/home/cleanup-analysis-scripts/core/analysis_scripts/env/lib/python3.10/site-packages/xarray/backends/netCDF4_.py", line 393 in __init__
  File "/local2/home/cleanup-analysis-scripts/core/analysis_scripts/env/lib/python3.10/site-packages/xarray/backends/netCDF4_.py", line 452 in open
  File "/local2/home/cleanup-analysis-scripts/core/analysis_scripts/env/lib/python3.10/site-packages/xarray/backends/netCDF4_.py", line 666 in open_dataset
  File "/local2/home/cleanup-analysis-scripts/core/analysis_scripts/env/lib/python3.10/site-packages/xarray/backends/api.py", line 679 in open_dataset
  File "/local2/home/cleanup-analysis-scripts/core/analysis_scripts/env/lib/python3.10/site-packages/intake_esm/source.py", line 77 in _open_dataset
  File "/local2/home/cleanup-analysis-scripts/core/analysis_scripts/env/lib/python3.10/site-packages/dask/utils.py", line 77 in apply
  File "/local2/home/cleanup-analysis-scripts/core/analysis_scripts/env/lib/python3.10/site-packages/dask/_task_spec.py", line 745 in __call__
  File "/local2/home/cleanup-analysis-scripts/core/analysis_scripts/env/lib/python3.10/site-packages/dask/local.py", line 229 in execute_task
  File "/local2/home/cleanup-analysis-scripts/core/analysis_scripts/env/lib/python3.10/site-packages/dask/local.py", line 243 in <listcomp>
  File "/local2/home/cleanup-analysis-scripts/core/analysis_scripts/env/lib/python3.10/site-packages/dask/local.py", line 243 in batch_execute_tasks
  File "/app/conda/miniforge/envs/python-310/lib/python3.10/concurrent/futures/thread.py", line 58 in run
  File "/app/conda/miniforge/envs/python-310/lib/python3.10/concurrent/futures/thread.py", line 83 in _worker
  File "/app/conda/miniforge/envs/python-310/lib/python3.10/threading.py", line 953 in run
  File "/app/conda/miniforge/envs/python-310/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
  File "/app/conda/miniforge/envs/python-310/lib/python3.10/threading.py", line 973 in _bootstrap

[Threads 0x00007f1b817fa700, 0x00007f1b81ffb700 (the current thread), 0x00007f1b82ffd700, 0x00007f1b837fe700, and 0x00007f1b83fff700 show identical stacks.]
  File "/local2/home/cleanup-analysis-scripts/core/analysis_scripts/env/lib/python3.10/site-packages/dask/_task_spec.py", line 745 in __call__
  File "/local2/home/cleanup-analysis-scripts/core/analysis_scripts/env/lib/python3.10/site-packages/dask/local.py", line 229 in execute_task
  File "/local2/home/cleanup-analysis-scripts/core/analysis_scripts/env/lib/python3.10/site-packages/dask/local.py", line 243 in <listcomp>
  File "/local2/home/cleanup-analysis-scripts/core/analysis_scripts/env/lib/python3.10/site-packages/dask/local.py", line 243 in batch_execute_tasks
  File "/app/conda/miniforge/envs/python-310/lib/python3.10/concurrent/futures/thread.py", line 58 in run
  File "/app/conda/miniforge/envs/python-310/lib/python3.10/concurrent/futures/thread.py", line 83 in _worker
  File "/app/conda/miniforge/envs/python-310/lib/python3.10/threading.py", line 953 in run
  File "/app/conda/miniforge/envs/python-310/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
  File "/app/conda/miniforge/envs/python-310/lib/python3.10/threading.py", line 973 in _bootstrap

Thread 0x00007f1b98981700 (most recent call first):
  File "/local2/home/cleanup-analysis-scripts/core/analysis_scripts/env/lib/python3.10/site-packages/xarray/backends/file_manager.py", line 217 in _acquire_with_cache_info
  File "/local2/home/cleanup-analysis-scripts/core/analysis_scripts/env/lib/python3.10/site-packages/xarray/backends/file_manager.py", line 199 in acquire_context
  File "/app/conda/miniforge/envs/python-310/lib/python3.10/contextlib.py", line 135 in __enter__
  File "/local2/home/cleanup-analysis-scripts/core/analysis_scripts/env/lib/python3.10/site-packages/xarray/backends/netCDF4_.py", line 455 in _acquire
  File "/local2/home/cleanup-analysis-scripts/core/analysis_scripts/env/lib/python3.10/site-packages/xarray/backends/netCDF4_.py", line 461 in ds
  File "/local2/home/cleanup-analysis-scripts/core/analysis_scripts/env/lib/python3.10/site-packages/xarray/backends/netCDF4_.py", line 393 in __init__
  File "/local2/home/cleanup-analysis-scripts/core/analysis_scripts/env/lib/python3.10/site-packages/xarray/backends/netCDF4_.py", line 452 in open
  File "/local2/home/cleanup-analysis-scripts/core/analysis_scripts/env/lib/python3.10/site-packages/xarray/backends/netCDF4_.py", line 666 in open_dataset
  File "/local2/home/cleanup-analysis-scripts/core/analysis_scripts/env/lib/python3.10/site-packages/xarray/backends/api.py", line 679 in open_dataset
  File "/local2/home/cleanup-analysis-scripts/core/analysis_scripts/env/lib/python3.10/site-packages/intake_esm/source.py", line 77 in _open_dataset
  File "/local2/home/cleanup-analysis-scripts/core/analysis_scripts/env/lib/python3.10/site-packages/dask/utils.py", line 77 in apply
  File "/local2/home/cleanup-analysis-scripts/core/analysis_scripts/env/lib/python3.10/site-packages/dask/_task_spec.py", line 745 in __call__
  File "/local2/home/cleanup-analysis-scripts/core/analysis_scripts/env/lib/python3.10/site-packages/dask/local.py", line 229 in execute_task
  File "/local2/home/cleanup-analysis-scripts/core/analysis_scripts/env/lib/python3.10/site-packages/dask/local.py", line 243 in <listcomp>
  File "/local2/home/cleanup-analysis-scripts/core/analysis_scripts/env/lib/python3.10/site-packages/dask/local.py", line 243 in batch_execute_tasks
  File "/app/conda/miniforge/envs/python-310/lib/python3.10/concurrent/futures/thread.py", line 58 in run
  File "/app/conda/miniforge/envs/python-310/lib/python3.10/concurrent/futures/thread.py", line 83 in _worker
  File "/app/conda/miniforge/envs/python-310/lib/python3.10/threading.py", line 953 in run
  File "/app/conda/miniforge/envs/python-310/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
  File "/app/conda/miniforge/envs/python-310/lib/python3.10/threading.py", line 973 in _bootstrap

Thread 0x00007f1b99182700 (most recent call first):
  File "/local2/home/cleanup-analysis-scripts/core/analysis_scripts/env/lib/python3.10/site-packages/fsspec/implementations/local.py", line 364 in _open
  File "/local2/home/cleanup-analysis-scripts/core/analysis_scripts/env/lib/python3.10/site-packages/fsspec/implementations/local.py", line 359 in __init__
  File "/local2/home/cleanup-analysis-scripts/core/analysis_scripts/env/lib/python3.10/site-packages/fsspec/implementations/local.py", line 195 in _open
  File "/local2/home/cleanup-analysis-scripts/core/analysis_scripts/env/lib/python3.10/site-packages/fsspec/spec.py", line 1301 in open
  File "/local2/home/cleanup-analysis-scripts/core/analysis_scripts/env/lib/python3.10/site-packages/fsspec/core.py", line 105 in __enter__
  File "/local2/home/cleanup-analysis-scripts/core/analysis_scripts/env/lib/python3.10/site-packages/fsspec/core.py", line 190 in <listcomp>
  File "/local2/home/cleanup-analysis-scripts/core/analysis_scripts/env/lib/python3.10/site-packages/fsspec/core.py", line 190 in __enter__
  File "/local2/home/cleanup-analysis-scripts/core/analysis_scripts/env/lib/python3.10/site-packages/fsspec/core.py
Segmentation fault (core dumped)
Python 3.10.14 | packaged by conda-forge | (main, Mar 20 2024, 12:45:18) [GCC 12.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import intake_esm; intake_esm.show_versions()

INSTALLED VERSIONS
------------------

cftime: 1.6.4.post1
dask: 2024.12.1
fastprogress: 1.0.3
fsspec: 2024.10.0
gcsfs: None
intake: 0.7.0
intake_esm: 2024.2.6
netCDF4: 1.7.2
pandas: 2.2.3
requests: 2.32.3
s3fs: None
xarray: 2025.1.1
zarr: 2.18.3

@charles-turner-1
Collaborator

No worries. I'll try and reproduce the error and see what I can find out about the cause.

@menzel-gfdl
Author

Thank you. Removing dask from the _open_dataset call does seem to get rid of this issue. It seems like some race condition on the file locks that the OS doesn't like. I think that xarray issue linked above is seeing this same problem.
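For anyone hitting this in the meantime, one way to take dask's threads out of the picture without patching intake-esm is to force the synchronous scheduler around the offending call. A minimal sketch of the knob — the array computation is just a placeholder workload, not the intake-esm code path:

```python
# Sketch: force dask to run tasks serially with the synchronous scheduler,
# so only one task (and hence only one netCDF open) executes at a time.
import dask
import dask.array as da

with dask.config.set(scheduler="synchronous"):
    # Everything computed inside this block runs in the calling thread, serially.
    total = int(da.ones((100,), chunks=10).sum().compute())
```

Wrapping the `to_dataset_dict()` call in the same `dask.config.set(scheduler="synchronous")` context should have much the same effect as removing dask from `_open_dataset`.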

@charles-turner-1
Collaborator

Okay, so I can reproduce a segfault running the tests in a fresh conda environment where I haven't installed from ci/environment.yml, i.e.:

$ conda create --name intake-esm-seg python=3.11 && conda activate intake-esm-seg
(intake-esm-seg) $ pip install intake-esm pytest pytest-cov
(intake-esm-seg) $ pytest
...
tests/test_core.py::test_to_dataset_dict[/Users/u1166368/catalog/intake-esm/tests/sample-catalogs/cmip6-netcdf.json-query1-xarray_open_kwargs1] Fatal Python error: Segmentation fault

Rerunning the tests inside the ci environment gets rid of the segfault:

$ conda env create -f ci/environment.yml --name intake-esm && conda activate intake-esm
(intake-esm) $ pytest
...
Coverage XML written to file coverage.xml


Results (178.29s (0:02:58)):
     152 passed
       2 xfailed

I'm currently working through bisecting the dependencies to work out exactly where this issue arises, but it doesn't seem to be anything obvious.

In the meantime, can you try installing the dependencies in ci/environment.yml and let me know if that resolves the error for you? It'll also let me pinpoint if I'm getting a segfault for the same reasons as you.

@menzel-gfdl
Author

It is much rarer, but with the CI conda environment I still see an occasional failure:

(/local2/home/conda/env/test-intake-esm-dev) rlm:/local2/home/cleanup-analysis-scripts/user-analysis-scripts/freanalysis_radiation/tests> python3 ray1.py

--> The keys in the returned dictionary of datasets are constructed as follows:
	'source_id.experiment_id.frequency.realm.chunk_freq'
 |█████████████████████████████████████████████| 100.00% [1/1 00:02<00:00]
 [... the same three lines of progress output repeat for 17 more successful calls ...]
--> The keys in the returned dictionary of datasets are constructed as follows:
	'source_id.experiment_id.frequency.realm.chunk_freq'
 |---------------------------------------------|   0.00% [0/1 00:00<?]
Traceback (most recent call last):
  File "/local2/home/conda/env/test-intake-esm-dev/lib/python3.12/site-packages/intake_esm/source.py", line 259, in _open_dataset
    self._ds = xr.combine_by_coords(datasets, **self.xarray_combine_by_coords_kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/local2/home/conda/env/test-intake-esm-dev/lib/python3.12/site-packages/xarray/core/combine.py", line 973, in combine_by_coords
    concatenated_grouped_by_data_vars = tuple(
                                        ^^^^^^
  File "/local2/home/conda/env/test-intake-esm-dev/lib/python3.12/site-packages/xarray/core/combine.py", line 974, in <genexpr>
    _combine_single_variable_hypercube(
  File "/local2/home/conda/env/test-intake-esm-dev/lib/python3.12/site-packages/xarray/core/combine.py", line 645, in _combine_single_variable_hypercube
    concatenated = _combine_nd(
                   ^^^^^^^^^^^^
  File "/local2/home/conda/env/test-intake-esm-dev/lib/python3.12/site-packages/xarray/core/combine.py", line 247, in _combine_nd
    combined_ids = _combine_all_along_first_dim(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/local2/home/conda/env/test-intake-esm-dev/lib/python3.12/site-packages/xarray/core/combine.py", line 282, in _combine_all_along_first_dim
    new_combined_ids[new_id] = _combine_1d(
                               ^^^^^^^^^^^^
  File "/local2/home/conda/env/test-intake-esm-dev/lib/python3.12/site-packages/xarray/core/combine.py", line 305, in _combine_1d
    combined = concat(
               ^^^^^^^
  File "/local2/home/conda/env/test-intake-esm-dev/lib/python3.12/site-packages/xarray/core/concat.py", line 277, in concat
    return _dataset_concat(
           ^^^^^^^^^^^^^^^^
  File "/local2/home/conda/env/test-intake-esm-dev/lib/python3.12/site-packages/xarray/core/concat.py", line 545, in _dataset_concat
    concat_over, equals, concat_dim_lengths = _calc_concat_over(
                                              ^^^^^^^^^^^^^^^^^^
  File "/local2/home/conda/env/test-intake-esm-dev/lib/python3.12/site-packages/xarray/core/concat.py", line 440, in _calc_concat_over
    process_subset_opt(coords, "coords")
  File "/local2/home/conda/env/test-intake-esm-dev/lib/python3.12/site-packages/xarray/core/concat.py", line 396, in process_subset_opt
    v_rhs = ds_rhs.variables[k].compute()
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/local2/home/conda/env/test-intake-esm-dev/lib/python3.12/site-packages/xarray/core/variable.py", line 1028, in compute
    return new.load(**kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "/local2/home/conda/env/test-intake-esm-dev/lib/python3.12/site-packages/xarray/core/variable.py", line 1006, in load
    self._data = to_duck_array(self._data, **kwargs)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/local2/home/conda/env/test-intake-esm-dev/lib/python3.12/site-packages/xarray/namedarray/pycompat.py", line 130, in to_duck_array
    loaded_data, *_ = chunkmanager.compute(data, **kwargs)  # type: ignore[var-annotated]
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/local2/home/conda/env/test-intake-esm-dev/lib/python3.12/site-packages/xarray/namedarray/daskmanager.py", line 85, in compute
    return compute(*data, **kwargs)  # type: ignore[no-untyped-call, no-any-return]
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/local2/home/conda/env/test-intake-esm-dev/lib/python3.12/site-packages/dask/base.py", line 662, in compute
    results = schedule(dsk, keys, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/local2/home/conda/env/test-intake-esm-dev/lib/python3.12/site-packages/xarray/core/indexing.py", line 573, in __array__
    return np.asarray(self.get_duck_array(), dtype=dtype, copy=copy)
                      ^^^^^^^^^^^^^^^^^^^^^
  File "/local2/home/conda/env/test-intake-esm-dev/lib/python3.12/site-packages/xarray/core/indexing.py", line 578, in get_duck_array
    return self.array.get_duck_array()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/local2/home/conda/env/test-intake-esm-dev/lib/python3.12/site-packages/xarray/core/indexing.py", line 789, in get_duck_array
    return self.array.get_duck_array()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/local2/home/conda/env/test-intake-esm-dev/lib/python3.12/site-packages/xarray/core/indexing.py", line 652, in get_duck_array
    array = self.array[self.key]
            ~~~~~~~~~~^^^^^^^^^^
  File "/local2/home/conda/env/test-intake-esm-dev/lib/python3.12/site-packages/xarray/backends/netCDF4_.py", line 103, in __getitem__
    return indexing.explicit_indexing_adapter(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/local2/home/conda/env/test-intake-esm-dev/lib/python3.12/site-packages/xarray/core/indexing.py", line 1013, in explicit_indexing_adapter
    result = raw_indexing_method(raw_key.tuple)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/local2/home/conda/env/test-intake-esm-dev/lib/python3.12/site-packages/xarray/backends/netCDF4_.py", line 116, in _getitem
    array = getitem(original_array, key)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "src/netCDF4/_netCDF4.pyx", line 5055, in netCDF4._netCDF4.Variable.__getitem__
  File "src/netCDF4/_netCDF4.pyx", line 4628, in netCDF4._netCDF4.Variable.shape.__get__
  File "src/netCDF4/_netCDF4.pyx", line 4580, in netCDF4._netCDF4.Variable._getdims
  File "src/netCDF4/_netCDF4.pyx", line 2193, in netCDF4._netCDF4._inq_vardimid
  File "src/netCDF4/_netCDF4.pyx", line 2185, in netCDF4._netCDF4._inq_varndims
  File "src/netCDF4/_netCDF4.pyx", line 2164, in netCDF4._netCDF4._ensure_nc_success
RuntimeError: NetCDF: Not a valid ID

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/local2/home/cleanup-analysis-scripts/user-analysis-scripts/freanalysis_radiation/tests/ray1.py", line 49, in <module>
    d_ = catalog.search(**query_params).to_dataset_dict(progressbar=True)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/local2/home/conda/env/test-intake-esm-dev/lib/python3.12/site-packages/pydantic/_internal/_validate_call.py", line 38, in wrapper_function
    return wrapper(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/local2/home/conda/env/test-intake-esm-dev/lib/python3.12/site-packages/pydantic/_internal/_validate_call.py", line 111, in __call__
    res = self.__pydantic_validator__.validate_python(pydantic_core.ArgsKwargs(args, kwargs))
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/local2/home/conda/env/test-intake-esm-dev/lib/python3.12/site-packages/intake_esm/core.py", line 686, in to_dataset_dict
    raise exc
  File "/local2/home/conda/env/test-intake-esm-dev/lib/python3.12/site-packages/intake_esm/core.py", line 682, in to_dataset_dict
    key, ds = task.result()
              ^^^^^^^^^^^^^
  File "/local2/home/conda/env/test-intake-esm-dev/lib/python3.12/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/local2/home/conda/env/test-intake-esm-dev/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/local2/home/conda/env/test-intake-esm-dev/lib/python3.12/concurrent/futures/thread.py", line 59, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/local2/home/conda/env/test-intake-esm-dev/lib/python3.12/site-packages/intake_esm/core.py", line 833, in _load_source
    return key, source.to_dask()
                ^^^^^^^^^^^^^^^^
  File "/local2/home/conda/env/test-intake-esm-dev/lib/python3.12/site-packages/intake_esm/source.py", line 272, in to_dask
    self._load_metadata()
  File "/local2/home/conda/env/test-intake-esm-dev/lib/python3.12/site-packages/intake/source/base.py", line 283, in _load_metadata
    self._schema = self._get_schema()
                   ^^^^^^^^^^^^^^^^^^
  File "/local2/home/conda/env/test-intake-esm-dev/lib/python3.12/site-packages/intake_esm/source.py", line 208, in _get_schema
    self._open_dataset()
  File "/local2/home/conda/env/test-intake-esm-dev/lib/python3.12/site-packages/intake_esm/source.py", line 264, in _open_dataset
    raise ESMDataSourceError(
intake_esm.source.ESMDataSourceError: Failed to load dataset with key='am5.c96L65_am5f8d0r0_amip.mon.atmos.1yr'
                 You can use `cat['am5.c96L65_am5f8d0r0_amip.mon.atmos.1yr'].df` to inspect the assets/files for this key.
                 
(/local2/home/conda/env/test-intake-esm-dev) rlm:/local2/home/cleanup-analysis-scripts/user-analysis-scripts/freanalysis_radiation/tests>

@charles-turner-1
Collaborator

Okay cool, that's still erroring out but it's not a segfault.

It looks like something dependency-related, so I'll see if I can work out which missing dependency is causing the issue.

There are a couple of things I can think of to try now:

  1. Take a look at your dask configuration - we've had similar (although I'm not convinced the same) issues arising from the number of dask workers - see here.
  2. Can you take a look at the netCDF files listed by cat['am5.c96L65_am5f8d0r0_amip.mon.atmos.1yr'].df (the suggestion in the ESMDataSourceError) for me? If your dask configuration isn't causing the issue, it's possible you have a corrupted netCDF file that you're trying to open. A script that iterates through all the netCDF files in cat['am5.c96L65_am5f8d0r0_amip.mon.atmos.1yr'].df and opens/closes them in turn should identify if this is the case.

@menzel-gfdl
Author

Thank you:

1.) It looks like intake_esm tries to use as many worker threads as possible? On my machine, that number is 8:

(/local2/home/conda/env/test-intake-esm-dev) rlm:/local2/home/cleanup-analysis-scripts/user-analysis-scripts/freanalysis_radiation/tests> python3
Python 3.12.8 | packaged by conda-forge | (main, Dec  5 2024, 14:24:40) [GCC 13.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import dask
>>> dask.system.CPU_COUNT
8

I agree with the link that you provided - netCDF is not thread-safe, and it appears the netCDF backend in xarray contains a race condition when used with dask. Although the above error is not a segmentation fault, I think it is still evidence of this same race condition.

2.) Running this script on my catalog does not crash:

from netCDF4 import Dataset


def main():
    with open("mycatalog.csv") as catalog:
        test_values = {
            "source_id": "am5",
            "experiment_id": "c96L65_am5f8d0r0_amip",
            "frequency": "mon",
            "realm": "atmos",
            "chunk_freq": "1yr"
        }
        column_names = [x.strip() for x in catalog.readline().split(",")]
        indices = {x: column_names.index(x) for x in test_values.keys()}
        for line in catalog:
            parameters = [x.strip() for x in line.split(",")]
            for key, value in test_values.items():
                if value != parameters[indices[key]]:
                    break
            else:
                path = parameters[-1]
                print(f"Found dataset {path}.")
                with Dataset(path):
                    pass


if __name__ == "__main__":
    main()

Likewise, when I comment out the use of dask, the scripts run successfully. So I am pretty sure there are no corrupt files.
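The classic mitigation for a non-thread-safe C library is to serialize the dangerous calls behind a single lock, which is essentially what xarray's file-locking machinery is meant to guarantee. A stdlib-only sketch of that idea; `open_netcdf` here is a hypothetical stand-in for the real `netCDF4.Dataset` constructor, not intake-esm code:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

_open_lock = threading.Lock()

def open_netcdf(path):
    # Hypothetical stand-in for netCDF4.Dataset(path): the lock ensures only
    # one thread is ever inside the (non-thread-safe) open call at a time.
    with _open_lock:
        return f"dataset:{path}"

# Eight workers mirror the dask.system.CPU_COUNT value discussed above,
# but the lock serializes the opens regardless of how many workers run.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(open_netcdf, [f"file{i}.nc" for i in range(8)]))
```

If the lock were missing (or, as suspected here, racy), several threads could enter the C-level open concurrently — matching the intermittent segfault/`NetCDF: Not a valid ID` behaviour seen above.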

@charles-turner-1
Collaborator

Okay, I've done a bit more digging, and this appears to be some sort of version incompatibility coming out of pip's dependency resolver when I try to reproduce the bug.

Can you let me know how you installed intake-esm? Was it pip or conda?

If I run (in a fresh environment)

$ conda create --name intake-esm-pip python=3.11 && conda activate intake-esm-pip
(intake-esm-pip) $ pip install intake-esm pytest pytest-cov
(intake-esm-pip) $ pytest tests
...
tests/test_core.py::test_to_dataset_dict[/Users/u1166368/catalog/intake-esm/tests/sample-catalogs/cmip6-netcdf.json-query1-xarray_open_kwargs1] Fatal Python error: Segmentation fault

as before. However, installing with conda:

$ conda create --name intake-esm-conda python=3.11 && conda activate intake-esm-conda
(intake-esm-conda) $ conda install intake-esm 
(intake-esm-conda) $ pip install pytest pytest-cov
(intake-esm-conda) $ pytest tests
...
================================================== short test summary info ==================================================
FAILED tests/test_cat.py::test_esmcatmodel_load[https://raw.githubusercontent.com/NCAR/cesm-lens-aws/master/intake-catalogs/aws-cesm1-le.json] - ImportError: HTTPFileSystem requires "requests" and "aiohttp" to be installed
FAILED tests/test_cat.py::test_esmcatmodel_load[https://storage.googleapis.com/cmip6/pangeo-cmip6.json] - ImportError: HTTPFileSystem requires "requests" and "aiohttp" to be installed
FAILED tests/test_core.py::test_catalog_init[https://raw.githubusercontent.com/NCAR/cesm-lens-aws/master/intake-catalogs/aws-cesm1-le.json-.-None-None] - ImportError: HTTPFileSystem requires "requests" and "aiohttp" to be installed
FAILED tests/test_core.py::test_catalog_init[https://storage.googleapis.com/cmip6/pangeo-cmip6.json-*-None-None] - ImportError: HTTPFileSystem requires "requests" and "aiohttp" to be installed
FAILED tests/test_core.py::test_catalog_with_registry_search - ImportError: HTTPFileSystem requires "requests" and "aiohttp" to be installed
FAILED tests/test_core.py::test_to_dataset_dict[https://storage.googleapis.com/cmip6/pangeo-cmip6.json-query0-xarray_open_kwargs0] - ImportError: HTTPFileSystem requires "requests" and "aiohttp" to be installed
FAILED tests/test_core.py::test_to_dataset_dict[/Users/u1166368/catalog/intake-esm/tests/sample-catalogs/cmip6-bcc-mixed-formats.json-query3-xarray_open_kwargs3] - intake_esm.source.ESMDataSourceError: Failed to load dataset with key='CMIP.BCC.BCC-ESM1.piControl.Amon.gn'
FAILED tests/test_core.py::test_to_datatree[https://storage.googleapis.com/cmip6/pangeo-cmip6.json-query0-xarray_open_kwargs0] - ImportError: HTTPFileSystem requires "requests" and "aiohttp" to be installed
FAILED tests/test_core.py::test_to_datatree_levels - ImportError: HTTPFileSystem requires "requests" and "aiohttp" to be installed
FAILED tests/test_core.py::test_to_dask[https://storage.googleapis.com/cmip6/pangeo-cmip6.json-query0-xarray_open_kwargs0] - ImportError: HTTPFileSystem requires "requests" and "aiohttp" to be installed
FAILED tests/test_core.py::test_to_dataset_dict_with_registry - intake_esm.source.ESMDataSourceError: Failed to load dataset with key='atm.20C.daily'
FAILED tests/test_core.py::test_to_dask_opendap - intake_esm.source.ESMDataSourceError: Failed to load dataset with key='2005001.482'
FAILED tests/test_core.py::test_options - intake_esm.source.ESMDataSourceError: Failed to load dataset with key='atm.20C.daily'
FAILED tests/test_derived.py::test_registry_derive_variables - ImportError: tutorial.open_dataset depends on pooch to download and manage datasets. To proceed please install pooch.
FAILED tests/test_derived.py::test_registry_derive_variables_error - ImportError: tutorial.open_dataset depends on pooch to download and manage datasets. To proceed please install pooch.
FAILED tests/test_source.py::test_open_dataset[tar://tasmax_Amon_HadGEM2-AO_rcp85_r1i1p1_200511-200512.nc::/var/folders/qg/vhg6q9gn4zl01xbg1mfw6jzhxzf1wt/T/tmpv3e42sct/test.tar-2-scipy] - ValueError: unrecognized engine 'scipy' must be one of your download engines: ['netcdf4', 'store', 'zarr']. To install a...
FAILED tests/test_source.py::test_open_dataset_kerchunk - ImportError: Install s3fs to access S3
FAILED tests/test_tutorial.py::test_open_from_url[aws_cesm2_le-https://raw.githubusercontent.com/intake/intake-esm/main/tutorial-catalogs/AWS-CESM2-LENS.json] - ImportError: HTTPFileSystem requires "requests" and "aiohttp" to be installed
FAILED tests/test_tutorial.py::test_open_from_url[aws_cmip6-https://raw.githubusercontent.com/intake/intake-esm/main/tutorial-catalogs/AWS-CMIP6.json] - ImportError: HTTPFileSystem requires "requests" and "aiohttp" to be installed
FAILED tests/test_tutorial.py::test_open_from_url[google_cmip6-https://raw.githubusercontent.com/intake/intake-esm/main/tutorial-catalogs/GOOGLE-CMIP6.json] - ImportError: HTTPFileSystem requires "requests" and "aiohttp" to be installed
================================== 20 failed, 132 passed, 2 xfailed, 96 warnings in 18.11s

I've left the tests that fail due to other uninstalled dependencies alone, to keep the environment diff as limited as possible - those would be easy to fix.

Looking at the environment diffs:

(intake-esm-conda) $ conda list > env_conda.txt
(intake-esm-conda) $ conda deactivate && conda activate intake-esm-pip
(intake-esm-pip) $ conda list > env_pip.txt 

If we then compare the environments there are substantial differences:

$ diff env_conda.txt env_pip.txt
1c1
< # packages in environment at /Users/u1166368/miniforge3/envs/intake-esm-conda:
---
> # packages in environment at /Users/u1166368/miniforge3/envs/intake-esm-pip:
4,26c4,6
< annotated-types           0.7.0              pyhd8ed1ab_1    conda-forge
< appdirs                   1.4.4              pyhd8ed1ab_1    conda-forge
< aws-c-auth                0.8.1                hfc2798a_0    conda-forge
< aws-c-cal                 0.8.1                hc8a0bd2_3    conda-forge
< aws-c-common              0.10.6               h5505292_0    conda-forge
< aws-c-compression         0.3.0                hc8a0bd2_5    conda-forge
< aws-c-event-stream        0.5.0               h54f970a_11    conda-forge
< aws-c-http                0.9.2                h96aa502_4    conda-forge
< aws-c-io                  0.15.3               haba67d1_6    conda-forge
< aws-c-mqtt                0.11.0              h24f418c_12    conda-forge
< aws-c-s3                  0.7.9                hf37e03c_1    conda-forge
< aws-c-sdkutils            0.2.2                hc8a0bd2_0    conda-forge
< aws-checksums             0.2.2                hc8a0bd2_4    conda-forge
< aws-crt-cpp               0.29.9               ha81f72f_2    conda-forge
< aws-sdk-cpp               1.11.489             h0e5014b_0    conda-forge
< azure-core-cpp            1.14.0               hd50102c_0    conda-forge
< azure-identity-cpp        1.10.0               hc602bab_0    conda-forge
< azure-storage-blobs-cpp   12.13.0              h7585a09_1    conda-forge
< azure-storage-common-cpp  12.8.0               h9ca1f76_1    conda-forge
< azure-storage-files-datalake-cpp 12.12.0              hcdd55da_1    conda-forge
< blosc                     1.21.6               h7dd00d9_1    conda-forge
< bokeh                     3.6.2              pyhd8ed1ab_1    conda-forge
< brotli-python             1.1.0           py311h3f08180_2    conda-forge
---
> annotated-types           0.7.0                    pypi_0    pypi
> appdirs                   1.4.4                    pypi_0    pypi
> bokeh                     3.6.2                    pypi_0    pypi
28d7
< c-ares                    1.34.4               h5505292_0    conda-forge
30,36c9,14
< certifi                   2024.12.14         pyhd8ed1ab_0    conda-forge
< cffi                      1.17.1          py311h3a79f62_0    conda-forge
< cftime                    1.6.4           py311h0f07fe1_1    conda-forge
< charset-normalizer        3.4.1              pyhd8ed1ab_0    conda-forge
< click                     8.1.8              pyh707e725_0    conda-forge
< cloudpickle               3.1.1              pyhd8ed1ab_0    conda-forge
< contourpy                 1.3.1           py311h210dab8_0    conda-forge
---
> certifi                   2024.12.14               pypi_0    pypi
> cftime                    1.6.4.post1              pypi_0    pypi
> charset-normalizer        3.4.1                    pypi_0    pypi
> click                     8.1.8                    pypi_0    pypi
> cloudpickle               3.1.1                    pypi_0    pypi
> contourpy                 1.3.1                    pypi_0    pypi
38,58c16,25
< crc32c                    2.7.1           py311h917b07b_0    conda-forge
< cytoolz                   1.0.1           py311h917b07b_0    conda-forge
< dask                      2025.1.0           pyhd8ed1ab_0    conda-forge
< dask-core                 2025.1.0           pyhd8ed1ab_0    conda-forge
< deprecated                1.2.18             pyhd8ed1ab_0    conda-forge
< distributed               2025.1.0           pyhd8ed1ab_0    conda-forge
< donfig                    0.8.1.post1        pyhd8ed1ab_1    conda-forge
< entrypoints               0.4                pyhd8ed1ab_1    conda-forge
< fastprogress              1.0.3              pyhd8ed1ab_1    conda-forge
< freetype                  2.12.1               hadb7bae_2    conda-forge
< fsspec                    2024.12.0          pyhd8ed1ab_0    conda-forge
< gflags                    2.2.2             hf9b8971_1005    conda-forge
< glog                      0.7.1                heb240a5_0    conda-forge
< h2                        4.1.0              pyhd8ed1ab_1    conda-forge
< hdf4                      4.2.15               h2ee6834_7    conda-forge
< hdf5                      1.14.4          nompi_ha698983_105    conda-forge
< hpack                     4.1.0              pyhd8ed1ab_0    conda-forge
< hyperframe                6.1.0              pyhd8ed1ab_0    conda-forge
< icu                       75.1                 hfee45f7_0    conda-forge
< idna                      3.10               pyhd8ed1ab_1    conda-forge
< importlib-metadata        8.6.1              pyha770c72_0    conda-forge
---
> crc32c                    2.7.1                    pypi_0    pypi
> dask                      2025.1.0                 pypi_0    pypi
> deprecated                1.2.18                   pypi_0    pypi
> distributed               2025.1.0                 pypi_0    pypi
> donfig                    0.8.1.post1              pypi_0    pypi
> entrypoints               0.4                      pypi_0    pypi
> fastprogress              1.0.3                    pypi_0    pypi
> fsspec                    2024.12.0                pypi_0    pypi
> idna                      3.10                     pypi_0    pypi
> importlib-metadata        8.6.1                    pypi_0    pypi
60,83c27,29
< intake                    0.7.0              pyhd8ed1ab_0    conda-forge
< intake-esm                2024.2.6           pyhd8ed1ab_2    conda-forge
< jinja2                    3.1.5              pyhd8ed1ab_0    conda-forge
< krb5                      1.21.3               h237132a_0    conda-forge
< lcms2                     2.16                 ha0e7c42_0    conda-forge
< lerc                      4.0.0                h9a09cb3_0    conda-forge
< libabseil                 20240722.0      cxx17_h07bc746_4    conda-forge
< libaec                    1.1.3                hebf3989_0    conda-forge
< libarrow                  19.0.0           h819e3af_8_cpu    conda-forge
< libarrow-acero            19.0.0           hf07054f_8_cpu    conda-forge
< libarrow-dataset          19.0.0           hf07054f_8_cpu    conda-forge
< libarrow-substrait        19.0.0           h4239455_8_cpu    conda-forge
< libblas                   3.9.0           27_h10e41b3_openblas    conda-forge
< libbrotlicommon           1.1.0                hd74edd7_2    conda-forge
< libbrotlidec              1.1.0                hd74edd7_2    conda-forge
< libbrotlienc              1.1.0                hd74edd7_2    conda-forge
< libcblas                  3.9.0           27_hd702729_openblas    conda-forge
< libcrc32c                 1.1.2                hbdafb3b_0    conda-forge
< libcurl                   8.11.1               h73640d1_0    conda-forge
< libcxx                    19.1.7               ha82da77_0    conda-forge
< libdeflate                1.23                 hec38601_0    conda-forge
< libedit                   3.1.20240808    pl5321hafb1f1b_0    conda-forge
< libev                     4.33                 h93a5062_2    conda-forge
< libevent                  2.1.12               h2757513_1    conda-forge
---
> intake                    0.7.0                    pypi_0    pypi
> intake-esm                2024.2.6                 pypi_0    pypi
> jinja2                    3.1.5                    pypi_0    pypi
86,93d31
< libgfortran               5.0.0           13_2_0_hd922786_3    conda-forge
< libgfortran5              13.2.0               hf226fd6_3    conda-forge
< libgoogle-cloud           2.34.0               hdbe95d5_0    conda-forge
< libgoogle-cloud-storage   2.34.0               h7081f7f_0    conda-forge
< libgrpc                   1.67.1               h0a426d6_1    conda-forge
< libiconv                  1.17                 h0d3ecfb_2    conda-forge
< libjpeg-turbo             3.0.0                hb547adb_1    conda-forge
< liblapack                 3.9.0           27_hc9a63f6_openblas    conda-forge
95,103d32
< libnetcdf                 4.9.2           nompi_h6569565_116    conda-forge
< libnghttp2                1.64.0               h6d7220d_0    conda-forge
< libopenblas               0.3.28          openmp_hf332438_1    conda-forge
< libopentelemetry-cpp      1.18.0               h0c05b2d_1    conda-forge
< libopentelemetry-cpp-headers 1.18.0               hce30654_1    conda-forge
< libparquet                19.0.0           h636d7b7_8_cpu    conda-forge
< libpng                    1.6.46               h3783ad8_0    conda-forge
< libprotobuf               5.28.3               h3bd63a1_1    conda-forge
< libre2-11                 2024.07.02           h07bc746_2    conda-forge
105,112d33
< libssh2                   1.11.1               h9cc3647_0    conda-forge
< libthrift                 0.21.0               h64651cc_0    conda-forge
< libtiff                   4.7.0                h551f018_3    conda-forge
< libutf8proc               2.10.0               hda25de7_0    conda-forge
< libwebp-base              1.5.0                h2471fea_0    conda-forge
< libxcb                    1.17.0               hdb1d25a_0    conda-forge
< libxml2                   2.13.5               h178c5d8_1    conda-forge
< libzip                    1.11.2               h1336266_0    conda-forge
114,119c35,38
< llvm-openmp               19.1.7               hdb05f8b_0    conda-forge
< locket                    1.0.0              pyhd8ed1ab_0    conda-forge
< lz4                       4.3.3           py311h3a49619_2    conda-forge
< lz4-c                     1.10.0               h286801f_1    conda-forge
< markupsafe                3.0.2           py311h4921393_1    conda-forge
< msgpack-python            1.1.0           py311h2c37856_0    conda-forge
---
> locket                    1.0.0                    pypi_0    pypi
> lz4                       4.4.3                    pypi_0    pypi
> markupsafe                3.0.2                    pypi_0    pypi
> msgpack                   1.1.0                    pypi_0    pypi
121,125c40,42
< netcdf4                   1.7.2           nompi_py311ha5aeccf_101    conda-forge
< nlohmann_json             3.11.3               h00cdb27_1    conda-forge
< numcodecs                 0.15.0          py311hca32420_0    conda-forge
< numpy                     2.2.2           py311h762c074_0    conda-forge
< openjpeg                  2.5.3                h8a3d83b_0    conda-forge
---
> netcdf4                   1.7.2                    pypi_0    pypi
> numcodecs                 0.15.0                   pypi_0    pypi
> numpy                     2.2.2                    pypi_0    pypi
127,131c44,47
< orc                       2.0.3                h0ff2369_2    conda-forge
< packaging                 24.2               pyhd8ed1ab_2    conda-forge
< pandas                    2.2.3           py311h9cb3ce9_1    conda-forge
< partd                     1.4.2              pyhd8ed1ab_0    conda-forge
< pillow                    11.1.0          py311hb9ba9e9_0    conda-forge
---
> packaging                 24.2                     pypi_0    pypi
> pandas                    2.2.3                    pypi_0    pypi
> partd                     1.4.2                    pypi_0    pypi
> pillow                    11.1.0                   pypi_0    pypi
134,142c50,53
< prometheus-cpp            1.3.0                h0967b3e_0    conda-forge
< psutil                    6.1.1           py311h917b07b_0    conda-forge
< pthread-stubs             0.4               hd74edd7_1002    conda-forge
< pyarrow                   19.0.0          py311ha1ab1f8_0    conda-forge
< pyarrow-core              19.0.0          py311he04fa90_0_cpu    conda-forge
< pycparser                 2.22               pyh29332c3_1    conda-forge
< pydantic                  2.10.6             pyh3cfb1c2_0    conda-forge
< pydantic-core             2.27.2          py311h3ff9189_0    conda-forge
< pysocks                   1.7.1              pyha55dd90_7    conda-forge
---
> psutil                    6.1.1                    pypi_0    pypi
> pyarrow                   19.0.0                   pypi_0    pypi
> pydantic                  2.10.6                   pypi_0    pypi
> pydantic-core             2.27.2                   pypi_0    pypi
146,151c57,59
< python-dateutil           2.9.0.post0        pyhff2d567_1    conda-forge
< python-tzdata             2025.1             pyhd8ed1ab_0    conda-forge
< python_abi                3.11                    5_cp311    conda-forge
< pytz                      2024.1             pyhd8ed1ab_0    conda-forge
< pyyaml                    6.0.2           py311h4921393_2    conda-forge
< re2                       2024.07.02           h6589ca4_2    conda-forge
---
> python-dateutil           2.9.0.post0              pypi_0    pypi
> pytz                      2024.2                   pypi_0    pypi
> pyyaml                    6.0.2                    pypi_0    pypi
153c61
< requests                  2.32.3             pyhd8ed1ab_1    conda-forge
---
> requests                  2.32.3                   pypi_0    pypi
155,158c63,65
< six                       1.17.0             pyhd8ed1ab_0    conda-forge
< snappy                    1.2.1                h98b9ce2_1    conda-forge
< sortedcontainers          2.4.0              pyhd8ed1ab_0    conda-forge
< tblib                     3.0.0              pyhd8ed1ab_1    conda-forge
---
> six                       1.17.0                   pypi_0    pypi
> sortedcontainers          2.4.0                    pypi_0    pypi
> tblib                     3.0.0                    pypi_0    pypi
160,165c67,71
< toolz                     1.0.0              pyhd8ed1ab_1    conda-forge
< tornado                   6.4.2           py311h917b07b_0    conda-forge
< typing-extensions         4.12.2               hd8ed1ab_1    conda-forge
< typing_extensions         4.12.2             pyha770c72_1    conda-forge
< tzdata                    2025a                h78e105d_0    conda-forge
< urllib3                   2.3.0              pyhd8ed1ab_0    conda-forge
---
> toolz                     1.0.0                    pypi_0    pypi
> tornado                   6.4.2                    pypi_0    pypi
> typing-extensions         4.12.2                   pypi_0    pypi
> tzdata                    2025.1                   pypi_0    pypi
> urllib3                   2.3.0                    pypi_0    pypi
167,178c73,78
< wrapt                     1.17.2          py311h917b07b_0    conda-forge
< xarray                    2025.1.1           pyhd8ed1ab_0    conda-forge
< xorg-libxau               1.0.12               h5505292_0    conda-forge
< xorg-libxdmcp             1.1.5                hd74edd7_0    conda-forge
< xyzservices               2025.1.0           pyhd8ed1ab_0    conda-forge
< yaml                      0.2.5                h3422bc3_2    conda-forge
< zarr                      3.0.1              pyhd8ed1ab_0    conda-forge
< zict                      3.0.0              pyhd8ed1ab_1    conda-forge
< zipp                      3.21.0             pyhd8ed1ab_1    conda-forge
< zlib                      1.3.1                h8359307_2    conda-forge
< zstandard                 0.23.0          py311ha60cc69_1    conda-forge
< zstd                      1.5.6                hb46c0d2_0    conda-forge
---
> wrapt                     1.17.2                   pypi_0    pypi
> xarray                    2025.1.1                 pypi_0    pypi
> xyzservices               2025.1.0                 pypi_0    pypi
> zarr                      3.0.1                    pypi_0    pypi
> zict                      3.0.0                    pypi_0    pypi
> zipp                      3.21.0                   pypi_0    pypi

I haven't been through this list in detail to determine which dependency causes the issue.
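To narrow the list down, one approach is to reduce each `conda list` dump to package → version pairs, so that build strings and channels don't drown out the real version differences (for example, the diff above already shows `lz4` at 4.3.3 on conda vs 4.4.3 on pip, and `cftime` 1.6.4 vs 1.6.4.post1). A minimal sketch, assuming the `env_conda.txt`/`env_pip.txt` dumps from above:

```python
def versions(text):
    """Map package name -> version from `conda list` output."""
    out = {}
    for line in text.splitlines():
        parts = line.split()
        if line.startswith("#") or len(parts) < 2:
            continue
        out[parts[0]] = parts[1]  # name, version; ignore build/channel
    return out

# With the two dumps above you could then do:
#   conda = versions(open("env_conda.txt").read())
#   pip = versions(open("env_pip.txt").read())
#   mismatched = {n for n in conda.keys() & pip.keys() if conda[n] != pip[n]}
sample = """# packages in environment at /env:
cftime                    1.6.4           py311h0f07fe1_1    conda-forge
lz4                       4.3.3           py311h3a49619_2    conda-forge"""
print(versions(sample))
```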

Assuming you've installed intake-esm with pip, can you reinstall it with conda and then try to reproduce the segfault again?

@menzel-gfdl

Author

@charles-turner-1 I installed intake-esm with pip. But I also tried with a conda environment, created from the environment.yml you suggested (see #697 (comment)). With pip, I get a segmentation fault regularly. With conda, I get intake_esm.source.ESMDataSourceError errors, but they are much rarer. Either way, it would be nice to find a solution that is robust.
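Since the traceback in the original report shows the crash inside xarray's file manager while dask worker threads call `_open_dataset` concurrently, one way to test the threading hypothesis is to serialize the open calls; in practice that would mean something like `dask.config.set(scheduler="synchronous")` before calling `to_dataset_dict()`. A stdlib-only sketch of the idea (hypothetical: `open_one` stands in for the real `_open_dataset`, and the file names are placeholders):

```python
import threading

# If the netCDF/HDF5 C layer is not thread-safe, letting only one thread
# at a time perform the open should make the intermittent segfault vanish.
_open_lock = threading.Lock()
results = []

def open_one(path):
    with _open_lock:          # only one thread touches the C library
        results.append(path)  # placeholder for xr.open_dataset(path)

threads = [threading.Thread(target=open_one, args=(f"file{i}.nc",))
           for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

If the crash disappears under a synchronous scheduler but returns with the threaded one, that points squarely at thread-unsafe netCDF/HDF5 access rather than at intake-esm itself.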

Even with your conda install, it looks like you still get similar intake_esm.source.ESMDataSourceError errors?

FAILED tests/test_core.py::test_to_dataset_dict[/Users/u1166368/catalog/intake-esm/tests/sample-catalogs/cmip6-bcc-mixed-formats.json-query3-xarray_open_kwargs3] - intake_esm.source.ESMDataSourceError: Failed to load dataset with key='CMIP.BCC.BCC-ESM1.piControl.Amon.gn'
FAILED tests/test_core.py::test_to_dataset_dict_with_registry - intake_esm.source.ESMDataSourceError: Failed to load dataset with key='atm.20C.daily'
FAILED tests/test_core.py::test_to_dask_opendap - intake_esm.source.ESMDataSourceError: Failed to load dataset with key='2005001.482'
FAILED tests/test_core.py::test_options - intake_esm.source.ESMDataSourceError: Failed to load dataset with key='atm.20C.daily'

Am I right in thinking that these tests should pass (even with your minimal conda environment)? Do these tests use netCDF datasets in their catalogs?

@charles-turner-1
Collaborator

So, after installing the dependencies needed for the tests, I get the following:

(intake-esm-conda) $ conda install requests aiohttp pooch scipy s3fs
(intake-esm-conda) $ pytest tests
...
tests/test_core.py::test_to_dataset_dict[https://storage.googleapis.com/cmip6/pangeo-cmip6.json-query0-xarray_open_kwargs0] FAILED                                    [ 72/154]
...
FAILED tests/test_core.py::test_to_dataset_dict[https://storage.googleapis.com/cmip6/pangeo-cmip6.json-query0-xarray_open_kwargs0] - intake_esm.source.ESMDataSourceError: Failed to load dataset with key='AerChemMIP.BCC.BCC-ESM1.ssp370.Amon.gn'
FAILED tests/test_core.py::test_to_dataset_dict[/Users/u1166368/catalog/intake-esm/tests/sample-catalogs/cmip6-bcc-mixed-formats.json-query3-xarray_open_kwargs3] - intake_esm.source.ESMDataSourceError: Failed to load dataset with key='CMIP.BCC.BCC-ESM1.piControl.Amon.gn'
FAILED tests/test_core.py::test_to_datatree[https://storage.googleapis.com/cmip6/pangeo-cmip6.json-query0-xarray_open_kwargs0] - intake_esm.source.ESMDataSourceError: Failed to load dataset with key='AerChemMIP/BCC/BCC-ESM1/ssp370/Amon/gn' 
FAILED tests/test_core.py::test_to_datatree_levels - intake_esm.source.ESMDataSourceError: Failed to load dataset with key='BCC-ESM1'
FAILED tests/test_core.py::test_to_dask[https://storage.googleapis.com/cmip6/pangeo-cmip6.json-query0-xarray_open_kwargs0] - intake_esm.source.ESMDataSourceError: Failed to load dataset with key='AerChemMIP.BCC.BCC-ESM1.ssp370.Amon.gn'
FAILED tests/test_core.py::test_to_dask_opendap - intake_esm.source.ESMDataSourceError: Failed to load dataset with key='2005001.482'

Ignoring the tests that look to be cold start/cloud related (see #681 (comment)):

FAILED tests/test_core.py::test_to_dataset_dict[https://storage.googleapis.com/cmip6/pangeo-cmip6.json-query0-xarray_open_kwargs0] - intake_esm.source.ESMDataSourceError: Failed to load dataset with key='AerChemMIP.BCC.BCC-ESM1.ssp370.Amon.gn'
^ Cloud service down/ cold start related
FAILED tests/test_core.py::test_to_dataset_dict[/Users/u1166368/catalog/intake-esm/tests/sample-catalogs/cmip6-bcc-mixed-formats.json-query3-xarray_open_kwargs3] - intake_esm.source.ESMDataSourceError: Failed to load dataset with key='CMIP.BCC.BCC-ESM1.piControl.Amon.gn'
^ Attempt to open unavailable dataset from above
FAILED tests/test_core.py::test_to_datatree[https://storage.googleapis.com/cmip6/pangeo-cmip6.json-query0-xarray_open_kwargs0] - intake_esm.source.ESMDataSourceError: Failed to load dataset with key='AerChemMIP/BCC/BCC-ESM1/ssp370/Amon/gn' 
^ Cloud service down/ cold start related
FAILED tests/test_core.py::test_to_datatree_levels - intake_esm.source.ESMDataSourceError: Failed to load dataset with key='BCC-ESM1'
^ Attempt to open unavailable dataset from above
FAILED tests/test_core.py::test_to_dask[https://storage.googleapis.com/cmip6/pangeo-cmip6.json-query0-xarray_open_kwargs0] - intake_esm.source.ESMDataSourceError: Failed to load dataset with key='AerChemMIP.BCC.BCC-ESM1.ssp370.Amon.gn'
^ Attempt to open unavailable dataset from above 
FAILED tests/test_core.py::test_to_dask_opendap - intake_esm.source.ESMDataSourceError: Failed to load dataset with key='2005001.482'

So all of the errors, save the bottom one (which I haven't checked explicitly), are related to cloud storage issues. I'd hazard a guess that this final error is too, just not obviously.

Noting that your catalog file uses a remotely hosted schema

...
    {                                                                                          
      "column_name": "frequency",                                                              
      "vocabulary": "https://raw.githubusercontent.com/NOAA-GFDL/CMIP6_CVs/master/CMIP6_frequency.json",
      "required": true                                                                         
    }, 

can you try, as a further troubleshooting step, downloading that schema locally, updating the path to the schema in your catalog.json, and seeing whether that resolves the issue?
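A sketch of that workaround (hedged: the top-level `attributes` key follows the esm-collection-spec layout shown in the snippet above, and the file names are assumptions about your setup). You would first fetch the vocabulary once, e.g. with `urllib.request.urlretrieve(url, "CMIP6_frequency.json")`, then rewrite the matching catalog entry:

```python
import json

def localize_vocab(catalog, url, local_path):
    """Point any attribute whose vocabulary is `url` at a local copy."""
    for attr in catalog.get("attributes", []):
        if attr.get("vocabulary") == url:
            attr["vocabulary"] = local_path
    return catalog

url = ("https://raw.githubusercontent.com/NOAA-GFDL/CMIP6_CVs/"
       "master/CMIP6_frequency.json")
# Stand-in for json.load(open("catalog.json")), mirroring the snippet above:
catalog = {"attributes": [{"column_name": "frequency",
                           "vocabulary": url,
                           "required": True}]}
catalog = localize_vocab(catalog, url, "CMIP6_frequency.json")
print(json.dumps(catalog, indent=2))
```

If the intermittent failures are caused by fetching the remote vocabulary, pointing at a local copy should make them disappear entirely.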
