After logging into your CSD3 account (on an Icelake node), first load the correct Python module:

```bash
module load python/3.9.12/gcc/pdcqf4o5
```
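As a quick sanity check (optional; `module list` and `python --version` are standard commands on CSD3):

```bash
# Confirm the module is loaded and the expected interpreter is active
module list
python --version   # should report Python 3.9.12
```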
Clone the repository:

```bash
git clone [email protected]:ukaea/fair-mast-ingestion.git
cd fair-mast-ingestion
```
Create and activate a virtual environment:

```bash
python -m venv fair-mast-ingestion
source fair-mast-ingestion/bin/activate
```
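To verify the environment is active, the interpreter should now resolve inside the virtual environment:

```bash
# Should print a path ending in fair-mast-ingestion/bin/python
which python
```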
Update pip and install the required packages:

```bash
python -m pip install -U pip
python -m pip install -e .
```
The final step of the installation is to build and install `mastcodes`:

```bash
git clone [email protected]:MAST-U/mastcodes.git
cd mastcodes
```

Edit `uda/python/setup.py` and change the `version` to `1.3.9`. Then install the UDA Python bindings and return to the repository root:

```bash
python -m pip install uda/python
cd ..
```
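If you would rather script the version change, something like the following may work; the exact `sed` pattern is an assumption about how the version string appears in `setup.py`, so adjust it to match:

```bash
# Pin the declared package version to 1.3.9 (pattern is a guess; check setup.py first)
sed -i 's/version="[^"]*"/version="1.3.9"/' uda/python/setup.py
```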
Then source the UDA SSL setup script:

```bash
source ~/rds/rds-ukaea-ap002-mOlK9qn0PlQ/fairmast/uda-ssl.sh
```
Finally, for uploading to S3 we need to install `s5cmd` and make sure it is on the path:

```bash
wget https://github.com/peak/s5cmd/releases/download/v2.2.2/s5cmd_2.2.2_Linux-64bit.tar.gz
tar -xvzf s5cmd_2.2.2_Linux-64bit.tar.gz
export PATH=$PWD:$PATH
```
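You can confirm the binary is found on the path with:

```bash
# Prints the installed s5cmd version (v2.2.2)
s5cmd version
```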
Then add a config file for the bucket keys by creating a file called `.s5cfg.stfc` with the following contents:

```ini
[default]
aws_access_key_id=<access-key>
aws_secret_access_key=<secret-key>
```
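One way to create the file and keep the credentials private (a sketch; substitute your real keys for the placeholders):

```bash
# Write the config and restrict it to your user only
cat > .s5cfg.stfc <<'EOF'
[default]
aws_access_key_id=<access-key>
aws_secret_access_key=<secret-key>
EOF
chmod 600 .s5cfg.stfc
```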
You should now be able to run the following commands.

- First submit a job to collect all the metadata:

  ```bash
  sbatch ./jobs/metadata.csd3.slurm.sh
  ```

- Then submit an ingestion job:

  ```bash
  sbatch ./jobs/ingest.csd3.slurm.sh campaign_shots/tiny_campaign.csv s3://mast/test/shots/ amc
  ```
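You can monitor the submitted jobs with standard SLURM tooling:

```bash
# List your queued and running jobs on CSD3
squeue -u $USER
```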
The following section details how to ingest data into a local folder on Freia with UDA.

- Parse the metadata for all signals and sources for a list of shots with the following command:

  ```bash
  mpirun -n 16 python3 -m src.create_uda_metadata data/uda campaign_shots/tiny_campaign.csv
  ```

- Run the ingestion pipeline:

  ```bash
  mpirun -np 16 python3 -m src.main data/local campaign_shots/tiny_campaign.csv --metadata_dir data/uda --source_names amc xsx --file_format nc
  ```

Files will be output in NetCDF format to `data/local`.
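To spot-check the output, you can dump a file's header with the standard netCDF tool `ncdump`, assuming it is available on Freia (the file name below is a placeholder):

```bash
# Show the variables and attributes of one ingested shot file
ncdump -h data/local/<shot-file>.nc
```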
The following section details how to ingest data into S3 storage on Freia with UDA.

- Parse the metadata for all signals and sources for a list of shots with the following command:

  ```bash
  mpirun -n 16 python3 -m src.create_uda_metadata data/uda campaign_shots/tiny_campaign.csv
  ```

  This will create the metadata for the tiny campaign. You may do the same for full campaigns such as M9.

- Run the ingestion pipeline by submitting the following job:

  ```bash
  mpirun -np 16 python3 -m src.main data/local campaign_shots/tiny_campaign.csv --bucket_path s3://mast/test/shots --source_names amc xsx --file_format zarr --upload --force
  ```

  This will submit a job to the Freia job queue that will ingest all of the shots in the tiny campaign and push them to the S3 bucket.
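Once the job completes, you can check that objects arrived in the bucket, assuming your s5cmd build supports the `--credentials-file` flag and substituting the real STFC S3 endpoint for the placeholder:

```bash
# List uploaded shot objects (endpoint URL is a placeholder)
s5cmd --credentials-file .s5cfg.stfc --endpoint-url https://<stfc-s3-endpoint> ls "s3://mast/test/shots/*"
```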
To parse CPF metadata we can use the following script (only on Freia):

```bash
qsub ./jobs/freia_write_cpf.qsub campaign_shots/tiny_campaign.csv
```
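Since `qsub` suggests Freia runs Grid Engine, the job status can be checked with:

```bash
# Show the status of your queued CPF job
qstat -u $USER
```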