
Registry of echoIA datasets: standardized metadata and Zenodo links for intrinsic alignment data compilations (A_IA vs luminosity, mass, redshift, etc.).


echo-IA/echoia-datasets


echoIA Datasets Registry

This repository is the index and governance hub for datasets curated by the echoIA Collaboration. It is where you create, update, and register echoIA datasets before publishing them on Zenodo.

How to use this repository

Use this repo to scaffold and register datasets following echoIA's standardized schemas. Each schema (that is, the dataset type, e.g. aia-lum, aia-z, or eta) has its own scaffolding script and validation tool to ensure consistency.

Workflow

  • Create a branch for your dataset addition or update, and give it a descriptive name.
  • Pick a schema from the options available in schemas/.
  • Run the matching scaffold script in scripts/ to generate the dataset files.
  • Fill in the generated CSV and metadata YAML with your measurements and dataset details.
  • Upload a ZIP archive (the data CSV plus a short README, but not the metadata YAML) to Zenodo, then record the resulting DOI in the metadata YAML.
  • Validate with the corresponding scripts/validate_*.py tool.
  • Commit only the metadata YAML and the registry entry, not the data files.
  • Open a pull request to add or update the dataset in the registry.
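The ZIP-packaging step can be scripted. The sketch below is only an illustration and not one of the tools in scripts/: it assumes the scaffold file names listed for aia-lum (<id>-data.csv and <id>-README.txt), treats that README as the short README to upload, and uses a hypothetical helper name, build_zenodo_zip, and a hypothetical archive name, <id>.zip. The metadata YAML is deliberately left out of the archive.

```python
import zipfile
from pathlib import Path


def build_zenodo_zip(dataset_dir: Path, dataset_id: str, out_dir: Path) -> Path:
    """Bundle the data CSV and README for upload to Zenodo (hypothetical helper).

    Assumes the scaffold naming <id>-data.csv / <id>-README.txt. The metadata
    YAML is intentionally not included because it stays in Git.
    """
    data_csv = dataset_dir / f"{dataset_id}-data.csv"
    readme = dataset_dir / f"{dataset_id}-README.txt"
    for path in (data_csv, readme):
        if not path.is_file():
            raise FileNotFoundError(path)

    out_dir.mkdir(parents=True, exist_ok=True)
    zip_path = out_dir / f"{dataset_id}.zip"
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in (data_csv, readme):
            # arcname stores only the file name, not the local directory layout
            zf.write(path, arcname=path.name)
    return zip_path
```

The returned path is the file to upload on Zenodo; the DOI that Zenodo assigns then goes into the metadata YAML.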

See datasets/aia-lum/README.md for full details on aia-lum datasets.

Directory structure

schemas/ — dataset schema definitions
scripts/ — scaffolding and validation tools
datasets/ — scaffolded records and metadata

Each dataset corresponds to a single Zenodo record, linked here by its canonical ID and DOI.
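Because each registry entry points at exactly one Zenodo record, a quick sanity check on the recorded DOI can catch copy-paste mistakes. The sketch below is hypothetical (zenodo_doi_url is not part of this repo's tooling, and the validate_*.py tools may check DOIs differently); it relies only on Zenodo record DOIs having the form 10.5281/zenodo.<record number> and on the standard https://doi.org/ resolver.

```python
import re

# Zenodo record DOIs use the 10.5281 prefix, e.g. 10.5281/zenodo.<digits>.
ZENODO_DOI = re.compile(r"10\.5281/zenodo\.\d+")


def zenodo_doi_url(doi: str) -> str:
    """Return the doi.org resolver URL for a Zenodo DOI, or raise ValueError."""
    doi = doi.strip()
    # Accept a pasted resolver URL or a "doi:" prefix as well as a bare DOI.
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if ZENODO_DOI.fullmatch(doi) is None:
        raise ValueError(f"not a Zenodo DOI: {doi!r}")
    return f"https://doi.org/{doi}"
```

For example, zenodo_doi_url("10.5281/zenodo.1234567") returns "https://doi.org/10.5281/zenodo.1234567" (the record number here is made up).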


Dataset naming convention

Each dataset is assigned a canonical ID, used both on Zenodo and in this registry. This ensures consistent citation, easy discovery, and machine-readable linking across echoIA datasets.

General pattern:

<category>-<yyyy>-<sample>-<firstauthoryy>

where:

  • <category> — short dataset family tag (e.g., aia-lum, aia-z, eta, mock-ia)
  • <yyyy> — publication year of the original measurement or dataset source
  • <sample> — short descriptor of the galaxy sample, survey, or subsample (lowercase, hyphenated)
  • <firstauthoryy> — first author tag with publication year, e.g. Author et al. (2025) → a25

Example:

aia-lum-2017-sdss-redmapper-u17

Rules:

  • Lowercase, hyphen-separated
  • <sample> may include short survey tag or subsample identifier
  • <yyyy> = publication year of the original measurement
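Because both <category> and <sample> may contain hyphens, an ID cannot be split on "-" naively; anchoring on the four-digit year and on the trailing author tag resolves the ambiguity. The following is a sketch of such a check, assuming the author tag is one or more lowercase letters followed by two digits (as in a25 and u17). The names parse_canonical_id and CanonicalId are illustrative and not part of the validate_*.py tools.

```python
import re
from typing import NamedTuple

# <category>-<yyyy>-<sample>-<firstauthoryy>, lowercase and hyphen-separated.
# The lazy category group stops at the first "-<4 digits>" that lets the rest
# match; the sample group then backtracks to leave room for the author tag.
CANONICAL_ID = re.compile(
    r"(?P<category>[a-z][a-z0-9]*(?:-[a-z0-9]+)*?)"
    r"-(?P<yyyy>\d{4})"
    r"-(?P<sample>[a-z0-9]+(?:-[a-z0-9]+)*)"
    r"-(?P<author>[a-z]+\d{2})"  # assumption: letters + two-digit year
)


class CanonicalId(NamedTuple):
    category: str
    year: int
    sample: str
    author_tag: str


def parse_canonical_id(dataset_id: str) -> CanonicalId:
    """Split a canonical echoIA ID into its parts, or raise ValueError."""
    m = CANONICAL_ID.fullmatch(dataset_id)
    if m is None:
        raise ValueError(f"not a canonical echoIA ID: {dataset_id!r}")
    return CanonicalId(m["category"], int(m["yyyy"]), m["sample"], m["author"])
```

Applied to the example above, parse_canonical_id("aia-lum-2017-sdss-redmapper-u17") returns CanonicalId(category='aia-lum', year=2017, sample='sdss-redmapper', author_tag='u17'). Uppercase IDs and IDs missing any of the four parts are rejected.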

Currently supported dataset schemas:

  • aia-lum — intrinsic alignment amplitude vs luminosity

Example: intrinsic alignment vs luminosity (aia-lum schema)

Run from the repo root:

python scripts/make_ia_vs_lum_dataset.py \
    --schema aia-lum \
    --id aia-lum-YYYY-sample-firstauthoryy \
    --title "Descriptive dataset title" \
    --year YYYY \
    --first-author SurnameYY \
    --sample sample-tag \
    --creator-name "echoIA Collaboration" \
    --creator-affil "echoIA"
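When scaffolding several datasets, a small wrapper can derive --id from the same year, sample, and author tag that are passed to the other flags, so the ID and the metadata cannot drift apart. This is a hedged sketch that uses only the flags shown in the command above; scaffold_command is a hypothetical helper, and it hardcodes aia-lum because that is the schema this particular script handles.

```python
import sys

SCHEMA = "aia-lum"  # schema handled by scripts/make_ia_vs_lum_dataset.py


def scaffold_command(
    year: int,
    sample: str,
    author_tag: str,
    title: str,
    first_author: str,
    creator_name: str = "echoIA Collaboration",
    creator_affil: str = "echoIA",
) -> list[str]:
    """Build the argv for the aia-lum scaffold script with a derived --id."""
    # The ID reuses exactly the year and sample given to --year and --sample.
    dataset_id = f"{SCHEMA}-{year}-{sample}-{author_tag}"
    return [
        sys.executable, "scripts/make_ia_vs_lum_dataset.py",
        "--schema", SCHEMA,
        "--id", dataset_id,
        "--title", title,
        "--year", str(year),
        "--first-author", first_author,
        "--sample", sample,
        "--creator-name", creator_name,
        "--creator-affil", creator_affil,
    ]
```

Run the returned list from the repo root with subprocess.run(cmd, check=True). Using sys.executable keeps the scaffold on the same interpreter as the caller.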

This creates three files in datasets/aia-lum/:

  • <id>-metadata.yaml — echoIA-side metadata (kept in Git)
  • <id>-data.csv — the dataset (uploaded to Zenodo, not stored in Git)
  • <id>-README.txt — helper notes (temporary)
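Since the data CSV belongs on Zenodo rather than in Git, a pre-commit check can catch it being staged by accident. The sketch below is an illustration under stated assumptions, not a tool shipped in scripts/: it flags staged files whose names end in -data.csv, and also -README.txt on the reading that the temporary helper notes are not meant to be committed either. It shells out to git diff --cached --name-only, a standard Git command that lists staged paths.

```python
import subprocess
from pathlib import PurePosixPath

# Suffixes of scaffolded files that stay out of Git (assumption: the helper
# README is treated like the data file because it is described as temporary).
NOT_IN_GIT_SUFFIXES = ("-data.csv", "-README.txt")


def forbidden_in_git(paths: list[str]) -> list[str]:
    """Return the paths whose file names mark them as Zenodo-only files."""
    return [p for p in paths if PurePosixPath(p).name.endswith(NOT_IN_GIT_SUFFIXES)]


def staged_files(repo_root: str = ".") -> list[str]:
    """List paths staged for the next commit, as printed by Git."""
    result = subprocess.run(
        ["git", "diff", "--cached", "--name-only"],
        cwd=repo_root, capture_output=True, text=True, check=True,
    )
    return [line for line in result.stdout.splitlines() if line]
```

For instance, a pre-commit hook could call forbidden_in_git(staged_files()) and abort the commit if the result is non-empty. Directory READMEs such as datasets/aia-lum/README.md are not flagged.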

License & Attribution

  • License: CC-BY-4.0
  • We do not claim ownership of original research results.
  • All numerical values are factual reproductions from cited works.
  • When using an echoIA dataset:
    • Cite the original paper(s) listed in the metadata
    • Optionally cite the echoIA registry DOI
