This repository is the index and governance hub for datasets curated by the echoIA Collaboration. It is where you create, update, and register echoIA datasets before publishing them on Zenodo.
Use this repo to scaffold and register datasets following echoIA’s standardized schemas. Each schema corresponds to one dataset type (e.g. aia-lum, aia-z, eta) and has its own scaffolding script and validation tool to ensure consistency.
- Create a branch for your dataset addition or update. Name the branch descriptively.
- Pick a schema — choose from available options in schemas/.
- Run the matching scaffold script in scripts/ to generate the dataset files.
- Fill in the generated CSV and metadata YAML with your measurements and details.
- Upload the ZIP (data + short README, without the metadata YAML file) to Zenodo, then record its DOI in the metadata.
- Validate with the corresponding validate_*.py tool.
- Commit only the metadata and registry entry (not the data files).
- Open a pull request to add or update the dataset in the registry.
See datasets/aia-lum/README.md for full details on aia-lum datasets.
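
The packaging step in the workflow above (ZIP with data + short README, no metadata YAML) could be sketched as follows. This is a minimal illustration, not a script shipped in this repo: the function name `bundle_for_zenodo` and the archive name `<id>.zip` are assumptions, and the file names follow the `<id>-data.csv` / `<id>-README.txt` convention produced by the scaffold script.

```python
import zipfile
from pathlib import Path


def bundle_for_zenodo(dataset_id: str, folder: Path, out_dir: Path) -> Path:
    """Zip the data CSV and README of one dataset; the metadata YAML is left out.

    Hypothetical helper: name and archive layout are assumptions.
    """
    members = [
        folder / f"{dataset_id}-data.csv",
        folder / f"{dataset_id}-README.txt",
    ]
    missing = [p.name for p in members if not p.is_file()]
    if missing:
        raise FileNotFoundError(f"missing files: {', '.join(missing)}")

    out_dir.mkdir(parents=True, exist_ok=True)
    archive = out_dir / f"{dataset_id}.zip"  # assumed archive name
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in members:
            # Store files flat, without the local directory prefix.
            zf.write(path, arcname=path.name)
    return archive
```

Writing the archive to a directory outside datasets/ helps keep the ZIP itself from being committed by accident.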
schemas/ — dataset schema definitions
scripts/ — scaffolding and validation tools
datasets/ — scaffolded records and metadata
Each dataset corresponds to a single Zenodo record, linked here by its canonical ID and DOI.
Each dataset is assigned a canonical ID, used both on Zenodo and in this registry. This ensures consistent citation, easy discovery, and machine-readable linking across echoIA datasets.
General pattern:
<category>-<yyyy>-<sample>-<firstauthoryy>
where:
- <category>: short dataset family tag (e.g., aia-lum, aia-z, eta, mock-ia)
- <yyyy>: publication year of the original measurement or dataset source
- <sample>: short descriptor of the galaxy sample, survey, or subsample (lowercase, hyphenated)
- <firstauthoryy>: first author tag with publication year, e.g. Author et al. (2025) → a25
Example:
aia-lum-2017-sdss-redmapper-u17
Rules:
- Lowercase, hyphen-separated
- <sample> may include a short survey tag or subsample identifier
- <yyyy> = publication year of the original measurement
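
As a rough illustration of the pattern and rules above, a canonical ID can be checked with a regular expression. This is a sketch under stated assumptions, not the repo's validate_*.py logic: the category list is copied from the examples in this README, the author tag is assumed to be lowercase letters followed by two digits, and `parse_canonical_id` is a hypothetical name.

```python
import re

# Assumed category tags, taken from the examples in this README.
KNOWN_CATEGORIES = ("aia-lum", "aia-z", "eta", "mock-ia")

ID_PATTERN = re.compile(
    r"^(?P<category>" + "|".join(re.escape(c) for c in KNOWN_CATEGORIES) + r")"
    r"-(?P<year>\d{4})"                       # <yyyy>
    r"-(?P<sample>[a-z0-9]+(?:-[a-z0-9]+)*)"  # <sample>, lowercase and hyphenated
    r"-(?P<author>[a-z]+\d{2})$"              # <firstauthoryy>, e.g. a25 (assumed shape)
)


def parse_canonical_id(dataset_id: str) -> dict:
    """Split a canonical ID into its four parts, or raise ValueError."""
    match = ID_PATTERN.match(dataset_id)
    if match is None:
        raise ValueError(f"not a valid canonical ID: {dataset_id!r}")
    return match.groupdict()


print(parse_canonical_id("aia-lum-2017-sdss-redmapper-u17"))
# {'category': 'aia-lum', 'year': '2017', 'sample': 'sdss-redmapper', 'author': 'u17'}
```

Because the sample part may itself contain hyphens, the pattern treats the last hyphen-separated token as the author tag.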
aia-lum: intrinsic alignment amplitude vs luminosity
Run from the repo root:
python scripts/make_ia_vs_lum_dataset.py \
--schema aia-lum \
--id aia-lum-YYYY-sample-firstauthor \
--title "Descriptive dataset title" \
--year YYYY \
--first-author SurnameYY \
--sample sample-tag \
--creator-name "echoIA Collaboration" \
--creator-affil "echoIA"

This creates three files in datasets/aia-lum/:
- <id>-metadata.yaml: echoIA-side metadata (kept in Git)
- <id>-data.csv: the dataset (uploaded to Zenodo, not stored in Git)
- <id>-README.txt: helper notes (temporary)
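
Since only the metadata is kept in Git, it can help to check staged paths before committing. The sketch below is hypothetical (the function name `files_to_unstage` is an assumption, and it assumes the temporary README is also left out of Git); it relies only on the file-name suffixes listed above.

```python
from pathlib import PurePosixPath

# Suffixes of scaffolded files that should stay out of Git (assumption for the README).
NOT_TRACKED_SUFFIXES = ("-data.csv", "-README.txt")


def files_to_unstage(staged: list[str]) -> list[str]:
    """Return the staged paths that look like data files or temporary READMEs."""
    return [
        path
        for path in staged
        if PurePosixPath(path).name.endswith(NOT_TRACKED_SUFFIXES)
    ]
```

The list of staged paths could come, for example, from `git diff --cached --name-only`; an empty result means nothing needs to be unstaged.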
- License: CC-BY-4.0
- We do not claim ownership of original research results.
- All numerical values are factual reproductions from cited works.
- When using an echoIA dataset:
  - Cite the original paper(s) listed in the metadata
  - Optionally cite the echoIA registry DOI