Question about calkit with multiple subprojects #359
Replies: 3 comments
That's a good question. As of now, no subproject functionality has been built yet, though I started to try to make my own PhD work into a Calkit project. I would lean towards using a monorepo with no submodules, but during my PhD I created a new repo for each experiment, simulation campaign, paper, and one for the dissertation, so that seems to lend itself to subprojects with submodules. They all had overlapping dependencies as you say, and I did some wacky (non-reproducible) stuff to copy things back and forth when something was modified.

The overarching philosophy is that everything should be able to be reproduced with a single command. I could imagine something like this that uses submodules:

# calkit.yaml for the "super project"
name: my-phd
title: Investigating a thing
subprojects:
  - path: project-a
  - path: project-b
pipeline:
  stages:
    copy-files-from-project-a:
      kind: shell-command
      command: cp project-a/somefile.tex project-a.tex
      inputs:
        - project-a/somefile.tex
      outputs:
        - path: project-a.tex
          storage: null # Keep out of version control here since it's already in project A
    copy-files-from-project-b:
      kind: shell-command
      command: cp project-b/somefile.tex project-b.tex
      inputs:
        - project-b/somefile.tex
      outputs:
        - path: project-b.tex
          storage: null
    build-thesis:
      kind: latex
      tex_file_path: thesis.tex
      inputs:
        - from_stage_outputs: copy-files-from-project-a
        - from_stage_outputs: copy-files-from-project-b

What do you think? Do you have an ideal workflow in mind? Are project A and B done now, so all you need to work on is the dissertation, but you'd like to collect up all of the publications, code, figures, data, etc., into one project so they can be consumed by others in one place, sort of like a bundle of all of your PhD research? Or are you mostly concerned with syncing text from the papers into the dissertation?

If you feel like sharing the projects, I'd be happy to take a look and see if I can build something that helps make things easier.
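To make the "single command" idea in the `calkit.yaml` sketch a bit more concrete, here is a rough Python illustration of how the `from_stage_outputs` references imply a run order: both copy stages have to finish before `build-thesis`. This is not how Calkit or DVC resolve dependencies internally; the `stages` dict is transcribed by hand from the sketch, and `stage_order` is a hypothetical helper written only for this example.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Stage inputs transcribed by hand from the calkit.yaml sketch (assumption:
# only the fields needed to show the dependency structure are kept).
stages = {
    "copy-files-from-project-a": {"inputs": ["project-a/somefile.tex"]},
    "copy-files-from-project-b": {"inputs": ["project-b/somefile.tex"]},
    "build-thesis": {
        "inputs": [
            {"from_stage_outputs": "copy-files-from-project-a"},
            {"from_stage_outputs": "copy-files-from-project-b"},
        ]
    },
}


def stage_order(stages):
    """Return stage names so that every stage comes after the stages it reads from.

    Hypothetical helper: only `from_stage_outputs` entries count as
    stage-to-stage dependencies; plain file paths are ignored.
    """
    graph = {}
    for name, stage in stages.items():
        deps = set()
        for item in stage.get("inputs", []):
            if isinstance(item, dict) and "from_stage_outputs" in item:
                deps.add(item["from_stage_outputs"])
        graph[name] = deps  # maps each stage to the stages it depends on
    return list(TopologicalSorter(graph).static_order())


order = stage_order(stages)
print(order)
```

The two copy stages can run in either order, but `build-thesis` always comes last, which is what lets one command rebuild the dissertation after a change in either subproject.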
In my case:
The submodule of project B inside project A isn't set up yet: currently

One thing that I suspect would be important for reproducibility is to allow different super projects to point to different commits of the subprojects, since the subprojects change over time. If someone is trying to reproduce an optimization paper, they would presumably like both to reproduce exactly what is published (i.e., using model v1) and to rerun the optimization using model v2 if a recent model enhancement has been made. The dissertation might also use slightly different versions of the text than the publications for each project do. It seems like that would be very difficult with a monorepo but easy with submodules. That also seems to suggest that perhaps the MDOcean model and optimization should be in separate repos, which is not currently the case.

All the projects are still changing. The consolidation of dissertation writing will start in a few months, and my goal would be to sync figures and data as well as text. Thanks for your thoughts.
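For reference, pinning a submodule to a particular commit in one super project comes down to a few standard Git commands: check out the wanted revision inside the submodule, then commit the updated submodule pointer in the super project. The Python sketch below only assembles and prints those commands. The path `project-b` and the tag `v1` are placeholders rather than names from the actual repos, and `pin_submodule_commands` and `pin_submodule` are hypothetical helpers, not part of Calkit.

```python
import shlex
import subprocess


def pin_submodule_commands(submodule_path: str, rev: str) -> list[list[str]]:
    """Build the Git commands that pin a submodule to `rev` in the super project."""
    return [
        # Make sure the wanted tag or commit is available locally in the submodule.
        ["git", "-C", submodule_path, "fetch", "--tags"],
        # Move the submodule's working tree to that revision (detached HEAD).
        ["git", "-C", submodule_path, "checkout", rev],
        # Stage the new submodule commit pointer in the super project...
        ["git", "add", submodule_path],
        # ...and record it, so this super project now references `rev`.
        ["git", "commit", "-m", f"Pin {submodule_path} to {rev}"],
    ]


def pin_submodule(submodule_path: str, rev: str, dry_run: bool = True) -> None:
    """Print the commands by default; run them only when dry_run is False."""
    for cmd in pin_submodule_commands(submodule_path, rev):
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)


# Placeholder names: "project-b" and "v1" are assumptions for illustration.
pin_submodule("project-b", "v1")
```

Because each super project stores its own pointer, the paper repo could stay on the published model version while the dissertation repo tracks a newer one.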
I set up the submodule (symbiotic-engineering/MDOcean#97) and updated the Calkit and DVC YAML files with the submodule paths (without any file copying as suggested). Then
Running again:
What is the intended use of Calkit for multiple projects with partially overlapping dependencies? For example, I have a dissertation.tex that only I need access to, which contains project_A.tex, project_B.tex, and project_C.tex. Each of those has its own GitHub repo and Overleaf doc with distinct collaborators, and project_A uses some code from project_B via a submodule.