Pierrette Lo
with additional report by Nicola Long below
May 13 & 14, 2020
I had the pleasure of attending the csv,conf,v5 virtual conference last week. It's named after the CSV data format, but it's a unique mix of speakers and topics at the intersection of data science, journalism, tech, advocacy, open source, academic research, and government.
When I attended in person last year, I wasn't sure it would be relevant to our interests as biomedical researchers, but I came away inspired about open science and reproducibility. It was also refreshing to interact with so many interesting people from outside our usual bubble.
This year's conference was scheduled to take place in Washington, DC, but as a small upside of the pandemic, it was converted to a virtual conference. They had solid attendance, the Crowdcast platform worked well, and the organizers did a phenomenal job of keeping things running smoothly and on time while making everyone feel welcome. Between the chat/question windows, Slack, and live-tweeting, I'm amazed at the multitasking involved in participating in a virtual conference.
Below is a quick rundown of talks and other resources that I found most relevant to my work. There was so much more content that I didn't cover here, so if you have data-related side passions, I encourage you to explore the full schedule!
General links:
- Schedule & abstracts (will eventually link to all the talks and slides)
- Slide collection
- Video collection
Teaching/Community/Collaborations:
- Learning-centered teaching for the non-traditional data classroom (talk) – Helpful tips for teaching/learning in workshops, meetups, online – basically the types of learning that most of us do these days
- Data Communities and Those Who Build Them (talk) – How to build a community of practice to keep that nontraditional learning going. These "people" concepts are applicable to any area, not just data science
- Building successful collaborations around healthcare data (talk) – How medical experts and computationalists can collaborate happily
- A graduate student perspective on overcoming barriers to interacting with open-source software (article) – Good read whether you're the grad student or the person teaching them
Open data/reproducible research:
- Accessibility and reproducibility in ecological time series analysis (talk) – How to package your open dataset into a compendium that others will actually be able to use. Example compendium here
- Data and code for reproducible research (talk) – Interesting recap of the NLM's Reproducibility Workshop held last year – basically a group effort to try to reproduce a paper (spoiler alert: it was hard and it didn't work, but they learned a lot while trying!). There are still some good tutorials and other materials on best practices and tools for reproducible research on the workshop's website.
- How Frictionless Data can help you grease your data (talk) – "The Frictionless Data initiative at Open Knowledge Foundation aims to reduce friction in working with data, with a goal to make it effortless to transport data among different tools and platforms for further analysis, and with an emphasis on reproducible research and open data."
- Free virtual Frictionless Data workshop on May 20 – sign up here
- Protected health information breaches on GitHub (talk) – A terrifying true tale of PHI accidentally uploaded to GitHub 😱😱😱
- How to read a research compendium (article) – "Research compendia are an increasingly used form of publication, which packages not only the research paper’s text and figures, but also all data and software for better reproducibility. We introduce the existing conventions for research compendia and suggest how to utilise their shared properties in a structured reading process."
- How open science helps researchers succeed (article) – Practicing "open science" isn't just about altruism – it can help your career, too
R:
- RMarkdown-Driven Development (talk) – This was my favourite talk in terms of immediately useful tips. If you've been working in R but are ashamed to share your code with others – (a) don't be! Everyone feels the same way; but (b) the tips in this talk will help you clean things up, both for others and your future self. This talk is based on this excellent blog post and technical appendix.
Below are some additional talks that I found interesting:
- Decision making in "successful" data analysis (talk) - We think of data as being objective when it's not. When we're analyzing data, we make many explicit and implicit decisions along the way that affect the outcome.
- How data has transformed journalism. Inside and out. by Sisi Wei (talk) - Keynote address from an investigative journalist. I liked how she talked about how people in her industry can "collaborate on technology and compete on the stories". This also applies to our academic research. I also liked her idea of interviewing your data set: "If you had a human being with all of this knowledge, what would you ask them?" Great use of data visualizations.
- Data Solidarity: Lessons from co-building open source tools with Indigenous Peoples by Emily Jacobi (talk) - This was the second keynote address. It had little to do with our research work, but the data story around the technology of Mapeo was beautifully recounted and worth watching.