Are you working reproducibly? Why or why not? #227
Replies: 2 comments 6 replies
Hi there, just stumbled upon calkit. It's a very interesting endeavor and I'll be following its development!

I have been reviewing the workflows in my projects this year, trying to adopt best practices from the DevOps, MLOps and DataOps worlds. So far, everything that is strictly software related has been a breeze to implement (e.g. automatic checks, tests, doc-building on CI/CD). However, I struggle to find good solutions for maintaining and automating my data pipelines, especially because I work in an HPC environment.

Each of my projects will typically have a dataset living somewhere in the cluster filesystem. It's not always clear whether the data will stay there, and therefore have restricted access, or whether it can be packaged and published (e.g. to Zenodo) together with a publication. On top of that, a lot of the steps in my pipeline need to be scheduled to compute nodes via Slurm (so a CI/CD runner wouldn't be able to run these steps).

I generally like the approach of tools like DVC, but I haven't yet figured out how to make it work for my setup. In the forums, interfacing with Slurm systems seems to be a current pain point (e.g. treeverse/dvc#1057). And ideally I would like to avoid duplication of storage from Do you have any thoughts about this use case?
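To make the Slurm part of this use case concrete, here is a minimal sketch of how a pipeline stage could block on a Slurm job, so that a tool that runs stages as ordinary commands only sees the stage finish when the job does. It relies on the `--wait` and `--parsable` options of `sbatch`. The names `build_sbatch_command`, `run_stage` and `train.sh` are hypothetical, and nothing here is an existing calkit or DVC feature.

```python
import subprocess
from typing import List, Optional


def build_sbatch_command(
    script: str, time_limit: Optional[str] = None, wait: bool = True
) -> List[str]:
    """Build an sbatch command line for one pipeline stage (hypothetical helper)."""
    # --parsable makes sbatch print "jobid" or "jobid;cluster" instead of a sentence.
    cmd = ["sbatch", "--parsable"]
    if wait:
        # --wait keeps sbatch in the foreground until the job terminates and
        # makes sbatch exit with the job's exit code.
        cmd.append("--wait")
    if time_limit is not None:
        # Slurm accepts "days-hours:minutes:seconds", e.g. "3-00:00:00".
        cmd.append(f"--time={time_limit}")
    cmd.append(script)
    return cmd


def run_stage(script: str, time_limit: Optional[str] = None) -> str:
    """Submit the stage, block until it finishes, and return the Slurm job ID.

    Raises subprocess.CalledProcessError if submission or the job itself fails.
    Must be run on a machine where sbatch is available (e.g. a login node).
    """
    cmd = build_sbatch_command(script, time_limit=time_limit, wait=True)
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout.strip().split(";")[0]
```

A stage would then be triggered manually from a login node with something like `run_stage("train.sh", time_limit="3-00:00:00")`. Blocking for days is only practical inside a persistent session (for example `tmux`), which is one reason this is a sketch rather than a recommendation.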
Re. ideal workflow: I don't know if I could trigger the pipelines from GitLab CI/CD, because the runner is not on the cluster and does not have access to the compute nodes. And even if I could, some jobs are really long-running (3 days), so I would prefer to trigger them manually. As for whether I trigger jobs from my local machine or from the cluster, I don't think there's much difference for me. I wouldn't mind having to log in to the cluster to run the pipelines.

From your design brainstorming, I like the idea of the "machines" category, because I could imagine projects where I would have access to some cloud computing service, or run things in another HPC center. The
Working reproducibly can be loosely defined as:
What's your current workflow like?