Pytorch dataloader #7

Open
mariusgiger opened this issue May 6, 2022 · 5 comments

Comments

@mariusgiger

Hi all,

Thanks for the great work to provide more suitable primitives for Machine Learning in Heliophysics!

It would be great to have a PyTorch DataLoader for the SDO ML v2 dataset, to ease data loading and facilitate reproducibility.

Ideally the DataLoader should:

  • allow access to header information as well as optional labels during training/validation/testing
  • provide straightforward extensibility for a custom set of labels (e.g. flaring/non-flaring) in addition to the already existing header information
  • implement a set of default transformations that are suitable for the different channels (clamping, normalizing)
  • provide a strategy to downsample the data temporally
  • provide a way to downsample the data spatially (target size)
  • provide a suitable strategy to split the data into train/validation sets
  • provide a guideline on how to split data between the test and train/val sets
  • ...?
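
To make a few of these points concrete, here is a minimal, framework-independent sketch of temporal downsampling, a time-based train/val/test split, and a clamp-and-normalize transform. It is not taken from the linked implementations. The function names, the month choices, and the value ranges are all assumptions made for illustration. Splitting by calendar month is one possible way to keep temporally adjacent frames out of different sets; the thread does not prescribe it.

```python
from datetime import datetime, timedelta


def downsample_temporally(timestamps, cadence):
    """Keep the first timestamp, then each later one at least `cadence` after
    the previously kept one. `timestamps` must be sorted in ascending order."""
    kept = []
    for ts in timestamps:
        if not kept or ts - kept[-1] >= cadence:
            kept.append(ts)
    return kept


def split_by_month(timestamps, val_months=(10,), test_months=(11, 12)):
    """Assign each timestamp to train/val/test by calendar month.

    The default months are hypothetical, not a recommendation from the
    SDO ML v2 maintainers.
    """
    splits = {"train": [], "val": [], "test": []}
    for ts in timestamps:
        if ts.month in test_months:
            splits["test"].append(ts)
        elif ts.month in val_months:
            splits["val"].append(ts)
        else:
            splits["train"].append(ts)
    return splits


def clamp_and_normalize(value, lower, upper):
    """Clamp a pixel value to [lower, upper], then rescale it to [0, 1].

    Suitable bounds differ per channel; the caller has to supply them.
    """
    clamped = min(max(value, lower), upper)
    return (clamped - lower) / (upper - lower)


# Usage with synthetic timestamps spaced 6 minutes apart (made-up data).
start = datetime(2015, 9, 30, 23, 0)
stamps = [start + timedelta(minutes=6 * i) for i in range(40)]
hourly = downsample_temporally(stamps, timedelta(hours=1))
splits = split_by_month(hourly)
```

The resulting lists of timestamps could then serve as the index of a map-style dataset (an object with `__len__` and `__getitem__`), which is what a PyTorch `DataLoader` consumes. Loading the actual image data and headers is left out here.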

A potential starting point can be found here: https://github.com/i4Ds/awesome-helio/pull/10/files (still a few TODOs) - feedback is welcome.

Cheers,
Marius

@PaulJWright
Member

PaulJWright commented May 6, 2022

That would be great! I will take a look at the one there when I get a moment; please do feel free to do a PR to incorporate your DataLoader when it's finished!

@mariusgiger
Author

A more advanced version can be found here: https://github.com/i4Ds/sdo-cli/blob/main/src/sdo/sood/data/sdo_ml_v2_dataset.py. There are still a few open issues, but it implements some of the features mentioned above.

@PaulJWright
Member

PaulJWright commented Jul 10, 2022

Thanks Marius. This looks great!

I am toying with developing a DataLoader, but in the meantime, can we link to https://github.com/i4Ds/sdo-cli/blob/main/src/sdo/sood/data/sdo_ml_v2_dataset.py from the SDOML GitHub?

@mariusgiger
Author

Hi @PaulJWright, sure, you can link it. Not everything I have put there will be needed by others, but it can serve as inspiration.

Let me know if you need some help.

Cheers!

@PaulJWright
Member

PaulJWright commented Jul 12, 2022

Great, will do @mariusgiger! I like yours a lot, so I think I will keep the one I'm developing for the most basic use cases (the notebooks here, for example), and direct people over to yours for more complicated things!


2 participants