This repo contains a small set of scripts to prepare data, train the U-Net, run inference, and shuttle artifacts between machines.
- Conda/YAML envs live in `Config/Environment Defs/`:
  - CUDA: `icme3.12-cuda.yml`
  - Apple Metal: `icme3.12-metal.yml`
  - CPU: `icme3.12.yml`
- Example: `conda env create -f "Config/Environment Defs/icme3.12-cuda.yml" && conda activate icme3.12`
- PIP fallback: `pip install -r "Config/Environment Defs/requirements.txt"`
- Per-host settings are in `Config/Machine.json`. The key must match `socket.gethostname()`. It's possible to override the host with `$MACHINE`.
- Required keys:
  - `fits_root`, `masks_root`, `hmi_root`: where raw data lives
  - `artifact_root`: where parquet/plots/models are written (per host)
  - `train_batch_size`, `apply_batch_size`, `chunk_size`, `max_inflight_plots`, `plot_threads`
- Optional: `inherits` lets a host clone another entry and override only a few paths.
- Global plot settings: `Config/Plot.json` (`target_px`, `dpi`).
- Paths/parquet outputs live under `Outputs/Artifacts/<hostname>/`.
- Models save to `Outputs/Models/<architecture><date_range>.keras`.
- Model definitions and date ranges live in `Models/` (e.g., `Models/A1.py`).
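The host lookup described above (hostname key, `$MACHINE` override, `inherits`) can be sketched roughly as follows. This is an illustration and not the code in `Library/Config.py`: the function names `resolve_host_name` and `resolve_host_config`, the merge order, and every path and number in the sample JSON are assumptions made up for the example.

```python
import json
import os
import socket


def resolve_host_name(environ=None):
    """Return $MACHINE if it is set, otherwise the local hostname."""
    environ = os.environ if environ is None else environ
    return environ.get("MACHINE") or socket.gethostname()


def resolve_host_config(machines, host, _seen=None):
    """Merge a host entry on top of the entry named by its `inherits` key."""
    seen = set() if _seen is None else _seen
    if host in seen:
        raise ValueError(f"inherits cycle at {host!r}")
    if host not in machines:
        raise KeyError(f"no entry for host {host!r} in Machine.json")
    seen.add(host)
    entry = dict(machines[host])
    parent = entry.pop("inherits", None)
    if parent is None:
        return entry
    merged = resolve_host_config(machines, parent, seen)
    merged.update(entry)  # keys set on the child override inherited ones
    return merged


# Hypothetical Machine.json contents; all values are invented for illustration.
machines = json.loads("""
{
  "miracle": {
    "fits_root": "/data/fits", "masks_root": "/data/masks", "hmi_root": "/data/hmi",
    "artifact_root": "Outputs/Artifacts/miracle",
    "train_batch_size": 8, "apply_batch_size": 16, "chunk_size": 256,
    "max_inflight_plots": 4, "plot_threads": 2
  },
  "miracle_mini": {"inherits": "miracle", "fits_root": "/mini/fits"}
}
""")

config = resolve_host_config(machines, "miracle_mini")
print(config["fits_root"], config["masks_root"])
```

In this sketch `miracle_mini` overrides only `fits_root` and picks up every other key from `miracle`.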
Build the dataset parquet from raw FITS/masks/HMI roots.

`python Scripts/Make.py Dataset [hourly]`
- Uses roots from `Config/Machine.json`.
- Default is `hourly=False` (keep all matches). Pass `hourly` to keep one sample per hour per day.
- Writes `Outputs/Artifacts/<host>/Paths.parquet` plus helper CSVs for missing data.
- Run `python Scripts/Make.py` to list available Make scripts.
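As a rough illustration of what `hourly` thinning means, the sketch below keeps one timestamp per (day, hour) bucket. It assumes the `YYYYMMDD_HHMM` timestamp strings used elsewhere in this README, and it assumes the earliest sample in each hour is the one kept; the actual selection rule in the Dataset script may differ. The name `keep_one_per_hour` is hypothetical.

```python
from datetime import datetime


def keep_one_per_hour(timestamps):
    """Keep the earliest YYYYMMDD_HHMM timestamp in each (day, hour) bucket."""
    kept = {}
    for stamp in sorted(timestamps):
        moment = datetime.strptime(stamp, "%Y%m%d_%H%M")
        bucket = (moment.date(), moment.hour)
        kept.setdefault(bucket, stamp)  # the first (earliest) stamp wins
    return list(kept.values())


stamps = ["20170601_0012", "20170601_0000", "20170601_0100", "20170602_0030"]
print(keep_one_per_hour(stamps))
# ['20170601_0000', '20170601_0100', '20170602_0030']
```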
`python Scripts/Train.py <architecture_id> <date_range_id>`
- Example: `python Scripts/Train.py A2 D1`
- Loads `Models/<architecture_id>.py` and injects `train_batch_size` from `Machine.json` only if `batch_size` is not set in the model definition.
- Date ranges are defined in `Models/<architecture_id>.py` and selected by `<date_range_id>`.
- Uses generator-based training with optional `correct_steps_by_n`.
- Saves model to `Outputs/Models/<architecture_id><date_range_id>.keras`.
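The batch-size fallback and the output naming above can be sketched as below. Representing the model definition as a plain dict is an assumption made for the example (the real definitions are Python files under `Models/`), and `resolve_batch_size` and `model_path` are hypothetical helper names. A `batch_size` of `None` is treated here the same as a missing key.

```python
from pathlib import Path


def resolve_batch_size(model_def, machine_cfg):
    """Use the model's batch_size if set, else the host's train_batch_size."""
    if model_def.get("batch_size") is not None:
        return model_def["batch_size"]
    return machine_cfg["train_batch_size"]


def model_path(architecture_id, date_range_id, root="Outputs/Models"):
    """Build Outputs/Models/<architecture_id><date_range_id>.keras."""
    return Path(root) / f"{architecture_id}{date_range_id}.keras"


machine_cfg = {"train_batch_size": 8}  # illustrative value
print(resolve_batch_size({}, machine_cfg))                 # 8, taken from the host
print(resolve_batch_size({"batch_size": 4}, machine_cfg))  # 4, the model wins
print(model_path("A2", "D1").as_posix())                   # Outputs/Models/A2D1.keras
```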
`python Scripts/Apply.py <architecture_id> <date_range_id> <postprocessing> <start> <end>`
- Example: `python Scripts/Apply.py A2 D1 P1 20170601 20170701`
- `<postprocessing>` must match a file in `Config/Postprocessing/` (e.g., `P0`, `P1`, `Custom`).
- `<start>`/`<end>` slice the `Paths.parquet` index (timestamp strings like `YYYYMMDD_HHMM`).
- Uses `apply_config` from `Machine.json` (`apply_batch_size`, `chunk_size`, `plot_threads`, `max_inflight_plots`).
- Outputs:
  - `.npy` pmaps via `Library.IO.pmap_path` (co-located with masks)
  - CH overlay PNGs and mask-only PNGs for requested and baseline `P0` postprocessing
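To make the `<start>`/`<end>` slicing and `chunk_size` concrete, here is a minimal sketch over a sorted list of timestamp strings. It assumes plain lexicographic comparison with both bounds inclusive; whether `Apply.py` compares the index this way is not confirmed here. `slice_index` and `chunks` are hypothetical names.

```python
from bisect import bisect_left, bisect_right


def slice_index(index, start, end):
    """Return entries of a sorted string index with start <= entry <= end."""
    lo = bisect_left(index, start)
    hi = bisect_right(index, end)
    return index[lo:hi]


def chunks(items, chunk_size):
    """Yield consecutive pieces of at most chunk_size items."""
    for offset in range(0, len(items), chunk_size):
        yield items[offset:offset + chunk_size]


index = ["20170531_2300", "20170601_0000", "20170615_1200", "20170701_0000"]
selected = slice_index(index, "20170601", "20170701")
print(selected)                   # ['20170601_0000', '20170615_1200']
print(list(chunks(selected, 1)))  # [['20170601_0000'], ['20170615_1200']]
```

Under string comparison a bare date such as `20170701` sorts before `20170701_0000`, so in this sketch an end bound given as a date excludes that day's samples.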
`python Scripts/Make.py Stats <architecture_id> <date_range_id> <postprocessing> [synoptic]`
- Example: `python Scripts/Make.py Stats A1 D1 P1`
- Add `synoptic` to read `Paths (Synoptic).parquet` instead of `Paths.parquet`.
- Writes `Outputs/Artifacts/<host>/Stats/<architecture><date_range><postprocessing>_stats.parquet`.
Sync FITS/masks/HMI trees between the main and “mini” hosts.

    python Scripts/Make.py Synoptic up       # copy miracle -> miracle_mini
    python Scripts/Make.py Synoptic down     # copy miracle_mini -> miracle (rsync entire roots)
    python Scripts/Make.py Synoptic inplace  # build synoptic subset only

- Relies on `miracle` and `miracle_mini` entries in `Config/Machine.json`.
- Uses rsync; creates missing destination directories.
- In `up` mode, builds a synoptic subset (00/06/12/18) before copying.
- In `inplace` mode, only builds `Paths (Synoptic).parquet` from `Paths.parquet`.
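The synoptic subset (00/06/12/18) can be pictured as a filter on the hour field of each timestamp. The sketch below assumes `YYYYMMDD_HHMM` strings and keeps any sample whose hour is a synoptic hour, whatever the minute; the real script may be stricter (for example, one sample per synoptic hour). `synoptic_subset` is a hypothetical name.

```python
SYNOPTIC_HOURS = {0, 6, 12, 18}


def synoptic_subset(timestamps, hours=SYNOPTIC_HOURS):
    """Keep YYYYMMDD_HHMM stamps whose HH field is a synoptic hour."""
    # Characters 9-10 hold HH: 8 date digits, then "_", then HHMM.
    return [stamp for stamp in timestamps if int(stamp[9:11]) in hours]


stamps = ["20170601_0000", "20170601_0300", "20170601_0615", "20170601_1800"]
print(synoptic_subset(stamps))
# ['20170601_0000', '20170601_0615', '20170601_1800']
```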
- TF/XLA logging is suppressed in scripts; GPU growth is enabled in Apply.
- `Library/Config.py` auto-selects the host section from `Machine.json` and exposes `paths`, `apply_config`, and `train_batch_size` to scripts.
- Run `python -m Scripts` to list available commands.