Topics to teach #15

Open
tracykteal opened this issue Jan 8, 2017 · 8 comments
@tracykteal

We'll discuss what topics to teach on the first day of the hackathon, but this is to give some more context for that discussion.

Data Carpentry workshops follow a narrative approach of how someone would go from start (getting their data back and setting up their project) through to the final output. In a regular Data Carpentry workshop, that would be a plot or figure, but here we're looking all the way through to publication of the code/notebook and data.

For instance, this is the overview of the R Reproducible Research curriculum
https://github.com/datacarpentry/rr-workshop/blob/gh-pages/workshopOverview.md

Therefore the narrative components to working reproducibly with data in the Jupyter notebook could be

  • Data organization, metadata and project organization
  • Loading data into the Jupyter notebook (potentially from remote as well as local sources) and exploring the data, getting a sense of the data
  • Data cleaning?
  • Data analysis and plotting
  • Version control
  • Automation
  • Publishing/making available notebooks
  • Publishing/making available data

The last few would include things like:

  • Put notebook(s) in more viewable form (e.g. mybinder)
  • DOIs: where to upload and distribute data and code
  • ORCID, linking ‘research products’ to your ORCID

What core topics are missing from this? What do you do in your workflow that's not included here? Any that shouldn't be included as core topics?

Since we only have two days in a workshop, we have to identify the core concepts and skills to teach. We can also identify good references or other lessons to link to for topics we don't have time to cover.

@bridgethass

bridgethass commented Jan 8, 2017

It might be useful to include a component on:

  • Techniques for debugging and troubleshooting.

This could potentially be nested under organization and/or version control - I have sometimes found it challenging to stay organized while developing code and I tend to make separate "scratch" scripts as I am writing functions that I like to save to remember everything I tried.

@kellieotto
Contributor

+1 for debugging and troubleshooting.

Unit testing might also be nested under version control or automation.

@choldgraf
Contributor

I think it'd also be useful to include some component of pushing your work out there into the world. There are a lot of repositories with neat analyses / data / etc but it's still not that useful unless other people find and use it. Maybe talking about the different avenues we have for sharing work would be useful, though it might be something that doesn't have a clear answer and is better as a general discussion or something.

@choldgraf
Contributor

Also, I'm +1 for data cleaning because I think it interacts nicely with data organization, etc. Maybe a quick intro to the concepts behind "tidy" data or something like this.

@dsoto
Contributor

dsoto commented Jan 9, 2017

I like this list and the narrative format. As I look at it, I see some themes emerging that we can return to for the participants.

  • Clean Structure: directories and filenames, tidy data, data cleaning, and other topics all share a concept of creating structures that are simultaneously useful to humans and machines.
  • Documentation: version control, data provenance, and code comments share a need to articulate what was done and why for future understanding or reuse.
  • Communication: what are the most portable formats we can store our work in so they can be easily shared in different media?

I'm sure there are others. I think these meta-topics could help provide coherence to a list of topics that could seem disconnected to some novices.

@mpacer

mpacer commented Jan 9, 2017

I said a lot of things in my comment in #8 that seem like they would be sub-points to these topics. I like this structure a lot so that makes me think that I'm thinking along similar lines, which is reassuring.

One thing that seems to be missing here is methods for collaborating with others who don't want to or won't use notebooks (e.g., advisors). An example would be something like mybinder, where there is little to no setup cost for the other person to at least see what the code and results look like directly.

Related to collaboration is integrating with extant code specific to your lab's prior work & software systems that don't easily integrate with notebooks. I'm not brimming with solutions but these are definitely problems that arise, especially in interdisciplinary work.

@butterflyology

butterflyology commented Jan 9, 2017

Backing up a bit, should there be an Introduction to Reproducible Research lesson?

Here is the Intro lesson for 'R' and the formatted 'gh-pages'

For reference, here are the Reproducible Research with R lessons:

@ErinBecker

ErinBecker commented Jan 9, 2017

To follow up on my verbal comment - I'd like to think about integrating discussion about learner mindset with respect to potentially feeling threatened/judged when making their code or analyses available to the public at large. This could include normalizing error, building a computational identity, and imposter syndrome. This wouldn't be its own half-day module, but I'd like to see how we can integrate these topics into each of the lessons and how we interact with the learners throughout this curriculum.

@hlapp added the discuss label Jan 9, 2017
9 participants