Compile inputs into "annotation bundles" that are downloadable #938
Comments
I've created annotation bundles containing the inputs outlined on the Gathering Input Files wiki page. They run 50-100 GB unzipped, and I don't think there's much room to shrink them further.
At this point, I'd like to solicit feedback on
This is awesome. A couple thoughts:
For distributing them, I think a Google Cloud Storage and/or AWS S3 bucket would be appropriate. If we ran a file server, we would have to do that on a cloud server anyway, and the attached volume would be active disk, which costs more per month than just placing the files in a bucket. We have been using genomedata.org for this kind of thing for a while now, but it is surprising how much it costs to keep it up and maintain a backup.
Great feedback - thanks!
One option might be to distribute some partially filled-in input YAMLs with the bundle. If we do make versions of the pipelines that accept the bundles as a single input, I hope we also keep versions that don't. I'd like the flexibility to use existing files I have lying around various places without having to convert them into a bundle first. We already have a GCP bucket with the test input files from our repo, so that seems like a reasonable place to put it (as long as it isn't too expensive to host).
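To make the "partially filled-in input YAMLs" idea concrete, here is a rough sketch of how one could be generated from an unpacked bundle. It assumes the pipelines take CWL-style `File` entries; the function name, the input keys, and the file names are all hypothetical placeholders, not the actual bundle layout.

```python
import json
from pathlib import Path


def partial_inputs_yaml(bundle_dir, known_files, unfilled_keys):
    """Render a partially filled-in input YAML as a string.

    bundle_dir    -- where the bundle was unpacked (hypothetical layout)
    known_files   -- {input name: path relative to bundle_dir}
    unfilled_keys -- inputs the user still has to supply per run
    """
    bundle = Path(bundle_dir)
    lines = []
    for key, rel_path in known_files.items():
        # json.dumps yields a double-quoted string, which is also valid YAML.
        path = json.dumps((bundle / rel_path).as_posix())
        lines += [f"{key}:", "  class: File", f"  path: {path}"]
    for key in unfilled_keys:
        # Left empty (null in YAML) so the user fills it in.
        lines.append(f"{key}:  # TODO: supply per-run value")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Input names and file names here are illustrative only.
    print(partial_inputs_yaml(
        "/data/bundle",
        {"reference": "reference.fa"},
        ["tumor_bam"],
    ))
```

Since the template only pre-fills paths, anyone who wants to point an input at a file they already have elsewhere can just edit that entry, which keeps the non-bundle workflow intact.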
initially, just for common species (human, mouse) and updates to coincide with version releases