@wm-ytakano wm-ytakano commented Jan 8, 2026

Earth2Studio Pull Request

Description

Background / Motivation

ECMWF has released AIFS-Single v1.1 (HF: ecmwf/aifs-single-1.1). Compared to v1.0, the checkpoint’s variable ordering differs, and the current implementation assumes a fixed ordering (including hard-coded positions for generated forcings). This can lead to incorrect feature alignment when loading v1.1.

Additionally, AIFS-Single v1.1 requires a newer anemoi stack than v1.0, so we likely need a separate optional dependency group to avoid upgrading dependencies for v1.0 users.

What this change proposes

  1. Add an optional dependency group aifs11 to install the anemoi stack needed for AIFS-Single v1.1.
  2. Support multiple AIFS-Single checkpoint versions in the AIFS class via an explicit version switch:
    • AIFS.load_default_package() -> v1.0
    • AIFS.load_default_package(version="1.1") -> v1.1
  3. Derive variable ordering from checkpoint metadata (ai-models.json, dataset.variables) instead of relying on a fixed VARIABLES ordering.
  4. Remove the hard-coded assumption that generated forcings live at indices 92..100; compute indices by variable name in the checkpoint ordering.
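
As a rough illustration of the version switch in item 2, the selection logic could look something like the sketch below. The helper name `default_repo_for`, the `ecmwf/aifs-single-1.0` repository id, and the choice of `ValueError` for unknown versions are assumptions made for this example; only `ecmwf/aifs-single-1.1` is named in the description above.

```python
# Hypothetical helper: the PR's real logic lives in AIFS.load_default_package().
_REPOS = {
    "1.0": "ecmwf/aifs-single-1.0",  # assumed repo id for v1.0
    "1.1": "ecmwf/aifs-single-1.1",  # repo id named in the PR description
}


def default_repo_for(version: str = "1.0") -> str:
    """Return the Hugging Face repo id for a supported AIFS-Single version."""
    try:
        return _REPOS[version]
    except KeyError:
        supported = ", ".join(sorted(_REPOS))
        raise ValueError(
            f"Unsupported AIFS-Single version {version!r}; expected one of: {supported}"
        ) from None
```

Defaulting to "1.0" keeps existing callers unchanged, which matches the behavior listed for `AIFS.load_default_package()`.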

Implementation details (high level)

  • When loading the checkpoint, read metadata["dataset"]["variables"] from ai-models.json. If missing/unexpected, fall back to the existing VARIABLES list.
  • Keep two aligned variable lists:
    • ckpt_variables: raw checkpoint names in checkpoint ordering
    • variables: Earth2Studio-facing names derived from ckpt_variables via a mapping function (e.g., 10u -> u10m, q_50 -> q50, tp -> tp06)
  • Treat generated forcings as a named set (e.g., cos_latitude, sin_longitude, insolation) and compute their indices from ckpt_variables. This makes the feature insertion and removal robust to ordering differences.
  • Add a small dependency check for v1.1 (anemoi-inference>=..., anemoi-models>=...) that raises an actionable OptionalDependencyError pointing users to uv add earth2studio --extra aifs11.
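
A minimal sketch of the first and third bullets, assuming the metadata is available as a standalone JSON file on disk. The function names, the short `FALLBACK_VARIABLES` stand-in, and the three-name forcing set are illustrative only and are not taken from the PR code.

```python
import json
from pathlib import Path

# Stand-in for the existing fixed VARIABLES list (shortened for the example).
FALLBACK_VARIABLES = ["10u", "10v", "2t"]

# Illustrative subset of generated forcings, using names from the description.
GENERATED_FORCINGS = ("cos_latitude", "sin_longitude", "insolation")


def read_ckpt_variables(metadata_path, fallback):
    """Read metadata["dataset"]["variables"]; use the fallback if missing or malformed."""
    try:
        metadata = json.loads(Path(metadata_path).read_text())
        variables = metadata["dataset"]["variables"]
    except (OSError, ValueError, KeyError, TypeError):
        return list(fallback)
    is_valid = (
        isinstance(variables, list)
        and bool(variables)
        and all(isinstance(v, str) for v in variables)
    )
    return list(variables) if is_valid else list(fallback)


def forcing_indices(ckpt_variables, forcings=GENERATED_FORCINGS):
    """Look up forcing positions by name in checkpoint order (ascending)."""
    position = {name: i for i, name in enumerate(ckpt_variables)}
    return sorted(position[name] for name in forcings if name in position)


order = ["10u", "cos_latitude", "2t", "insolation", "sin_longitude"]
print(forcing_indices(order))  # [1, 3, 4]
```

Because the indices come from a name lookup, the same code works whether the forcings sit at 92..100 or anywhere else in the checkpoint ordering.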

API / UX concern: redundant version specification

Right now the explicit call can look redundant, e.g.:

model = AIFS.load_model(AIFS.load_default_package(version="1.1"), version="1.1")

Docs / Tests

  • Update install docs to mention aifs11 and provide pip/uv examples.
  • Add unit tests for:
    • checkpoint variable-name mapping to Earth2Studio IDs
    • load_default_package(version=...) behavior and invalid version handling

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.
  • The CHANGELOG.md is up to date with these changes.
  • An issue is linked to this pull request.
  • Assess and address Greptile feedback (AI code review bot for guidance; use discretion, addressing all feedback is not required).

Dependencies

- Add optional dependency group "aifs11" for the newer anemoi stack
- Load checkpoint variable ordering from ai-models.json to handle v1.0 vs v1.1 differences
- Make generated forcing indices checkpoint-driven (remove hard-coded 92..100 assumptions)
- Add version switching to AIFS.load_default_package() and AIFS.load_model()
- Update install docs and add regression tests for variable-name mapping and version selection

@greptile-apps greptile-apps bot left a comment

Greptile Overview

Greptile Summary

adds AIFS-Single v1.1 support with checkpoint-driven variable ordering to handle differences between v1.0 and v1.1. key changes include:

  • new aifs11 optional dependency group with pinned anemoi versions (0.6.3/0.5.0)
  • AIFS.load_default_package(version="1.1") to load v1.1 checkpoints
  • variable ordering derived from checkpoint metadata (ai-models.json) instead of hardcoded
  • generated forcing indices computed dynamically by variable name rather than fixed positions (92..100)
  • dependency validation for v1.1 with actionable error messages

the implementation correctly refactors the rigid v1.0 assumptions into a flexible checkpoint-aware system. variable mapping (_ckpt_var_to_e2s) handles ECMWF shorthand (10u -> u10m, q_50 -> q50, etc.) and the dual variable lists (_ckpt_variables and _variables) maintain alignment.
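
a rough sketch of what such a mapping could look like, covering only the three examples given in this PR (10u -> u10m, q_50 -> q50, tp -> tp06). the function name `ckpt_var_to_e2s`, the lookup table, and the regular expression are assumptions for illustration and are not the PR's actual `_ckpt_var_to_e2s` implementation.

```python
import re

# Direct renames taken from the examples in the PR description.
_RENAMES = {"10u": "u10m", "tp": "tp06"}

# Pressure-level pattern such as "q_50": letters, underscore, level digits.
_PRESSURE_LEVEL = re.compile(r"([a-z]+)_(\d+)")


def ckpt_var_to_e2s(name: str) -> str:
    """Map a raw checkpoint variable name to an Earth2Studio-style name."""
    if name in _RENAMES:
        return _RENAMES[name]
    match = _PRESSURE_LEVEL.fullmatch(name)
    if match:
        return f"{match.group(1)}{match.group(2)}"
    # Unknown names (e.g. generated forcings) pass through unchanged.
    return name


ckpt_variables = ["10u", "q_50", "tp", "cos_latitude"]
variables = [ckpt_var_to_e2s(v) for v in ckpt_variables]
print(variables)  # ['u10m', 'q50', 'tp06', 'cos_latitude']
```

mapping element by element keeps the two lists the same length and in the same order, which is what keeps them aligned.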

Confidence Score: 3/5

  • safe to merge after fixing the dependency check bug for autodetected v1.1 loads
  • the refactoring is well-structured and removes hardcoded assumptions, but contains a critical bug where autodetected v1.1 checkpoints bypass dependency validation. this could cause cryptic errors for users without the correct anemoi stack. once fixed, the implementation is solid with good test coverage.
  • earth2studio/models/px/aifs.py - must add dependency check after checkpoint autodetection resolves to v1.1

Important Files Changed

File Analysis

| Filename | Score | Overview |
|---|---|---|
| earth2studio/models/px/aifs.py | 3/5 | adds checkpoint-driven variable ordering and v1.1 support with version switching, but missing dependency check for autodetected v1.1 loads |
| pyproject.toml | 5/5 | adds aifs11 optional dependency group with correct version pins and conflict declarations |
| test/models/px/test_aifs.py | 5/5 | adds tests for checkpoint variable mapping and version switching behavior |

Comment on lines +416 to +459
if version == "1.1":
    cls._require_aifs11_optional_dependencies()

# Load model
model_path = package.resolve("aifs-single-mse-1.0.ckpt")
ckpt_candidates: list[tuple[str, str]] = []
if version == "1.0":
    ckpt_candidates = [("aifs-single-mse-1.0.ckpt", "1.0")]
elif version == "1.1":
    ckpt_candidates = [("aifs-single-mse-1.1.ckpt", "1.1")]
else:
    # Best-effort autodetect based on package root and available ckpt name
    if "aifs-single-1.0" in package.root:
        ckpt_candidates = [
            ("aifs-single-mse-1.0.ckpt", "1.0"),
            ("aifs-single-mse-1.1.ckpt", "1.1"),
        ]
    elif "aifs-single-1.1" in package.root:
        ckpt_candidates = [
            ("aifs-single-mse-1.1.ckpt", "1.1"),
            ("aifs-single-mse-1.0.ckpt", "1.0"),
        ]
    else:
        ckpt_candidates = [
            ("aifs-single-mse-1.1.ckpt", "1.1"),
            ("aifs-single-mse-1.0.ckpt", "1.0"),
        ]

model_path = None
resolved_version = None
last_err: Exception | None = None
for ckpt_name, v in ckpt_candidates:
    try:
        model_path = package.resolve(ckpt_name)
        resolved_version = v
        break
    except Exception as e:  # pragma: no cover - depends on remote FS
        last_err = e
        continue

if model_path is None or resolved_version is None:
    msg = "Could not resolve any known AIFS-Single checkpoint from package."
    if version is not None:
        msg += f" Requested version={version}."
    raise FileNotFoundError(msg) from last_err

dependency check missing for autodetected v1.1

when version=None and autodetection resolves to v1.1 (line 449: resolved_version = "1.1"), _require_aifs11_optional_dependencies() is never called. the check on lines 416-417 only runs when the user explicitly passes version="1.1".

add dependency validation after checkpoint resolution:

Suggested change
if model_path is None or resolved_version is None:
    msg = "Could not resolve any known AIFS-Single checkpoint from package."
    if version is not None:
        msg += f" Requested version={version}."
    raise FileNotFoundError(msg) from last_err
if resolved_version == "1.1":
    cls._require_aifs11_optional_dependencies()
model = torch.load(

Ensure AIFS.load_model() runs the v1.1 optional-dependency check when checkpoint
autodetection resolves to 1.1, preventing cryptic runtime errors when the aifs11
(anemoi) stack is not installed.
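
one way to picture the fix is the self-contained sketch below, which validates dependencies against the resolved version rather than the requested one. `resolve_checkpoint`, `load`, and the injected `resolve` / `require_v11_deps` callables are hypothetical stand-ins for `package.resolve` and `_require_aifs11_optional_dependencies()`, not the PR's code.

```python
def resolve_checkpoint(resolve, candidates):
    """Return (path, version) for the first candidate that resolves."""
    last_err = None
    for ckpt_name, ckpt_version in candidates:
        try:
            return resolve(ckpt_name), ckpt_version
        except Exception as err:  # resolution may fail for many FS-specific reasons
            last_err = err
    raise FileNotFoundError(
        "Could not resolve any known AIFS-Single checkpoint from package."
    ) from last_err


def load(resolve, candidates, require_v11_deps):
    """Resolve first, then check dependencies based on what was actually found."""
    path, resolved_version = resolve_checkpoint(resolve, candidates)
    if resolved_version == "1.1":
        # Runs for both explicit version="1.1" and autodetected v1.1 checkpoints.
        require_v11_deps()
    return path, resolved_version
```

because the check keys off `resolved_version`, an explicit version="1.1" request and an autodetected v1.1 checkpoint go through the same validation path.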
@NickGeneva

Hi @wm-ytakano

Thanks a ton for this PR. This is awesome. I've been merging in your commits with the new checkpoint support in another existing PR we have here: #606

We will just move the wrapper onto the 1.1 version of the checkpoint.

I'll close this one when we get the other merged.
