Davis pilot workshop reflection #7
Comments
Generally it went well, Michael, but here are some thoughts, and they should only be taken in the spirit of constructive criticism. While I think you did an excellent job covering the material outlined, and you obviously have a mastery of the material, I would have structured the lesson another way, because I think the lesson tended to over-emphasize advanced topics and underemphasize base concepts.

First, I think OpenRefine was a red herring at best. An idiosyncratic Java package with a dubious future and questionable scalability is not something I would have spent time on. It would be much better in my mind to learn just a few of the tasks that OpenRefine can do in R (such as grep, regexp, etc.). Yes, I know folks liked it, but they also like using Excel, and that is exactly what we are trying to move them away from.

Day 1 covered most of the high points of an average introduction-to-stats-software course, although I think a bit more time could be spent on the nature of data in R (data.frames versus scalars/vectors/matrices) than we did, and a bit less on logical data (an important topic, but only in the context of data manipulation/generation). Topics that might have been added include: importing data from other applications, transforming variables (not covered until Day 2), summary statistics (for categorical as well as continuous data), base functions (math, string, logical, dates), replacing/recoding data, and metadata (variable & value labels).

In some ways the instructor was handicapped by the choice of gapminder data, and if you really want to focus on social science topics, you are going to want a more mixed data set (e.g. survey data). Looking through the tidyr material, I just noticed we skipped join altogether. I do not usually cover this in a short class, but it would have been a useful topic as well, although you would need a more substantive example to drive the lesson.

Day 2 was really a hodge-podge of marginally related topics, and the lesson flow suffered as a result. For the most part the dplyr section was on target, but I do have a general reservation about teaching idiosyncratic library functions. Now I realize that R is mostly just idiosyncratic library functions, but I'm always hesitant to teach foreign functions before the students have even a basic understanding of the underlying base language. I also realize that, due to R's evolutionary development, what is a foreign library today may be in the base tomorrow, and I do not have enough experience to judge the merits of dplyr versus the alternatives, but I did want this issue to be raised, whether it be considered or disregarded.

The function section was pretty much a throwaway. Yes, it is a useful topic, but it has very limited application for a novice R user. In my opinion, they would have gotten more out of a discussion of loops than the brief exposure to functions. Similarly, the dynamic documents section was very cool from a programming point of view but an unnecessary diversion from a teaching point of view. A very useful bit of technology, but not really something I would spend time teaching beginners.

The statistics section was fine, but without a bit on non-parametrics (frequencies and crosstabs at least) it felt somewhat lacking. There are a ton of other techniques you could have covered (ANOVA, t-test, logistic regression), but I agree that there is only so much time you want to spend on this section. Perhaps a discussion of what is built in and what needs to be installed as a package might have been helpful.

Versioning was a major issue throughout the course. Something has to be done, probably at the beginning of the class, to make sure everyone is using the same version of R and the libraries. This issue cropped up way too often, although with the nature of R it might be unavoidable.

I am not sure how married the Data Carpentry program is to the two-day workshop, but my recommendation would be to pare down the class to its basics and squeeze it into a day. If you must have two days, then you may want to split the class into part I and part II, using part I as the prerequisite.

Finally, I think there is an underlying philosophical/pedagogical theme that runs through the course that I would encourage you to re-evaluate. This notion that course plans can and should be improvised is antithetical, in my mind, to a properly paced class that flows logically and keeps the students' interest. Perhaps what I say is heretical, flying in the face of a guiding principle of Data Carpentry, but my experience has shown me that the more defined the course, the more logical the process, and the tighter the presentation, the more effective the class. This is a guiding principle of education that we like to ignore in higher education because we are supposedly 'beyond' the need for such structure. My argument is that we are not; we just tend to be too lazy and to rationalize when our shortcuts fail -- we all do it, myself included. Note, a good part of this problem unavoidably stems from the modular nature of the course. I think the first R lesson flows the best because the topics are better linked conceptually than in the later lessons, so the problem is surmountable.

In conclusion, I am very impressed with how developed the materials are and with Michael's ability to teach a very difficult topic to an 'unusual' audience. Feedback was requested, so I thought I would add a few thoughts in the hopes of refining the overall lesson. It was not my intent to sound hypercritical, only to point out the shortcomings I perceived.
Thanks Michael for getting the conversation started here. I vastly enjoyed the workshop and thought that it was exceptionally good, particularly for our first pilot of the social sciences material. I have no experience teaching R, but do have training in instructional design and experience working with social scientists (education researchers), so will limit my comments to those areas.
This isn't simply a "guiding principle" of Data Carpentry. There is an extensive body of education research literature showing that responding to learners' difficulties in real time (i.e., during class) is a more effective method of teaching than working through a pre-set lesson plan to cover the prescribed material without getting any feedback from learners. This is why our workshops are taught the way they are and why the minute cards at the mid-point and end of each day are essential components of our teaching. I think Michael did a fantastic job of responding to learner feedback, both from minute cards and during class time, and have no doubt that learners had a more useful learning experience from this than they would have if he had walked through a pre-defined set of examples without getting learner input. That being said, I think this was implemented better the first day than the second, which may have been due to trying to fit too much into day 2.
Overall, I think the curriculum needs some tweaking, but it was a very solid pilot that provided a useful learning experience for the participants.
Hi Michael, I am going to echo Erin's comments that this was a great launch for the social sciences R workshop. I saw a suggestion in the student feedback about challenge questions for homework. I think that's a good idea: that way they can take as much time as they need and find out what is truly unclear vs. not having had a chance to process a lot of new information. Again, looking through the student feedback, it looks like they were happy with the pace on the first day but had too much the second day. Would Data Carpentry consider moving pieces of the workshop around? For example, on day 1 start with OpenRefine, continue with intro to R, project management, subsetting and data.frames, and maybe some tidyr. On day 2, start with statistics and plotting, and finish with dynamic documents, spreadsheets, and best practices -- easier to absorb even when the students are tired. I have qualms about suggesting doing the applied stuff prior to best practices, especially since spreadsheets leads so nicely into OpenRefine, but on the other hand it is important to take into account the students' ability to learn as they get more tired. Vessela
Hi Michael, Sorry for the delay in getting a response to this -- I've been traveling for the past few weeks. Most of my comments have been mentioned above (and I was only present for Day 1), but I would support re-arranging concepts so that Day 2 doesn't end up being overwhelming. I liked OpenRefine (I hadn't been exposed to it before) and I can definitely see myself using it in the future -- but as a learner I would also have liked knowing about the tools available to do the same thing in R (even just a mention of them, so I could look them up later if I wanted to streamline my workflow). Myfanwy
Feedback
Reflections
In general, I thought it went well; learners seemed happy with it. Day 1 especially (spreadsheets, OpenRefine, and R lessons 1-4) is, I think, pretty solid as is.
Day 2 could use some tinkering. Single-table `dplyr` took the whole morning. The afternoon consisted of `tidyr::gather`, statistical modeling, writing functions, and dynamic documents, in that order.

By the end of day 2, students were pretty fried. I don't know if there is a way around that: forging new neural connections for two days is just exhausting. But the (my) tendency to cram material into the second afternoon needs to be avoided. Reserving space for a capstone exercise might help with this, or students might be too spent to do that kind of independent work at the end. An alternative is a showcase of possible next steps: here's the kind of natural language processing you can do in R (showing without teaching) and your first resource to start learning it, and the same for social network analysis, structural equation modeling, etc.
Lesson 5 - `dplyr`

People like learning `dplyr`, understandably so: it handles most of what most people do. The basic structure is good, I think.

Piping of data.frames to the first argument of the subsequent function didn't sink in with some students, even though I felt like I went over it quite a few times. In exercises, several students would include data.frames in functions that were receiving them from a pipe. I think this is a symptom of the various arguments to `dplyr` functions not being clear enough, and the issue below about the structure of `mutate` and `summarise` being different than the others is part of this. Introducing all the verbs with intermediate assignment and then introducing piping at the very end might help.

The structure of `mutate` and `summarise` is different than the other verbs because they contain a `colName =` that the others don't. Maybe pointing explicitly to that syntactical difference a couple of times -- "these two functions create new columns, and we give those columns names with `colName =`" -- would help. Assignment to columns within piped functions and assigning the resulting data.frame to a variable is complicated, and at least some learners have a hard time grokking the component parts.

Piping to `head` at the end of `dplyr` chains inevitably leads to students copy-and-pasting code and assigning the head of a data.frame to a variable. `head` should be taught, with `str` and `summary`, but we can keep it separate from piping by using `tbl_df`'s nice printing. Maybe start the `dplyr` lesson with conversion to `tbl_df`: it's conceptually easy, would only take 30 seconds at the beginning of the lesson, and would avoid headaches further along.
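For concreteness, a minimal sketch of that ordering (conversion to `tbl_df` up front, intermediate assignment first, piping last), assuming the workshop's gapminder data as provided by the `gapminder` package:

```r
library(dplyr)
library(gapminder)  # assumed source of the workshop's gapminder data.frame

# Start the lesson by converting to tbl_df: compact printing, no more
# accidentally dumping the whole data.frame to the console
gm <- tbl_df(gapminder)

# First pass: one verb at a time, with intermediate assignment
recent  <- filter(gm, year == 2007)
recent  <- mutate(recent, gdp = gdpPercap * pop)  # mutate takes a colName =
by_cont <- group_by(recent, continent)
summarise(by_cont, mean_gdp = mean(gdp))          # so does summarise

# Second pass, at the end of the lesson: the same steps as one pipe,
# where each data.frame flows into the first argument of the next verb
gm %>%
  filter(year == 2007) %>%
  mutate(gdp = gdpPercap * pop) %>%
  group_by(continent) %>%
  summarise(mean_gdp = mean(gdp))
```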
Lesson 6 - `tidyr`
Something was missing from the `gather` part of lesson 6. I was trying to move quickly, so I gave a pretty quick explanation and worked one example before giving the students an exercise and moving on. I don't think the motivation was clear, and a lot of students had trouble with the various arguments to `gather` (`key` and `value` especially). A second example, perhaps bigger and more realistic, would be useful. Separately from this lesson, a student asked about working with three-dimensional arrays in R; he had subject-by-time-by-electrode data... tidying a dataset like that could be cool.

Making a stronger connection between tidy data and ggplot might help motivate. E.g., if you wanted to plot this wide data and map the various conditions to color, how would you do it in `ggplot`? You can't easily, but with `gather` you can convert it to the form that `ggplot` (and `lm` and more) expect.
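A second example might look something like this minimal sketch; the wide `scores` data and its column names are made up for illustration:

```r
library(tidyr)
library(ggplot2)

# Hypothetical wide data: one column per condition, as a survey export might look
scores <- data.frame(
  subject   = 1:4,
  control   = c(3.1, 2.8, 3.5, 3.0),
  treatment = c(4.2, 3.9, 4.8, 4.1)
)

# gather(data, key, value, columns...): key names the new grouping column,
# value names the column that will hold the measurements
long <- gather(scores, key = condition, value = score, control, treatment)

# The long form is what ggplot (and lm, etc.) expect:
ggplot(long, aes(x = condition, y = score, color = condition)) +
  geom_point()
```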
Lesson 9 - statistical modeling

The social scientists were hungry for this, as we rely heavily on statistical models. The content that is there worked well, and I love the connection with ggplot. Introducing a few more functions (t-test, ANOVA) might be useful and low cost.
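A sketch of the kind of low-cost additions meant here, again on the gapminder data; the particular comparisons are arbitrary examples, not the lesson's:

```r
library(gapminder)

gm07 <- subset(gapminder, year == 2007)

# Linear model, as already covered in the lesson
fit <- lm(lifeExp ~ gdpPercap, data = gm07)
summary(fit)

# t-test: compare life expectancy between two continents
with(gm07, t.test(lifeExp[continent == "Europe"],
                  lifeExp[continent == "Americas"]))

# ANOVA: life expectancy across all continents
summary(aov(lifeExp ~ continent, data = gm07))
```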
Lesson 7 - writing functions
I rushed through this. Students were able to write their own `F_to_C` function and source their `code/functions.R` files, so they actually got quite a bit rather quickly. Some saw the payoff in terms of organization, but we need a better motivating function after the temperature-conversion examples -- something that makes learners say "oh yeah, I do that over and over; it would be great to write one function and just be able to call that."
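For reference, a minimal sketch of the lesson's pattern, plus one hypothetical candidate for a more motivating function (`describe` is an invented name, not part of the lesson):

```r
## code/functions.R -----------------------------------------------------
F_to_C <- function(temp_F) {
  # Convert a temperature from Fahrenheit to Celsius
  (temp_F - 32) * 5 / 9
}

# A candidate "I do that over and over" function (hypothetical example):
# the summary stats learners keep computing by hand, in one call
describe <- function(x) {
  c(mean = mean(x, na.rm = TRUE),
    sd   = sd(x, na.rm = TRUE),
    n    = sum(!is.na(x)))
}

## In the analysis script -----------------------------------------------
source("code/functions.R")
F_to_C(212)                  # 100
describe(gapminder$lifeExp)  # one line instead of three repeated calls
```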
Lesson 8 - dynamic documents

This lesson needs some improvement. Making our own custom .Rmd template will help; that way we can introduce students gently (the first code chunk in the default template is probably overwhelming!). I'd start with the basics of markdown and later introduce code chunks and then chunk options.
Part of the problem is that this forces a break from the model of the rest of the workshop, especially if the instructor has been piping a live-script to learners' browsers. Not sure what to do with that, but again the custom template might help by getting instructor and learners doing the same things in the same place.
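A pared-down template might look something like the sketch below; this is an assumption about what such a template could contain, not the actual template:

````markdown
---
title: "My first report"
output: html_document
---

## A header

Start with plain *markdown*: headers, emphasis, a list or two.

```{r}
# One small, readable chunk, introduced only after the markdown basics
summary(cars)
```
````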
What's missing?

- `paste`, `gsub`, `grep`, etc. would be useful to many. I haven't used `stringr`, but I understand it uses a more consistent syntax than the base functions, so it would likely provide a gentler introduction (a short sketch follows the list).
- `lapply`. Not sure this belongs in the first two days, but it enables automated read/write and so much more (also sketched below).
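A minimal sketch of both suggestions, to make them concrete (the `data/` directory is hypothetical):

```r
# Base string functions
paste("file", 1:3, ".csv", sep = "")           # "file1.csv" "file2.csv" "file3.csv"
gsub("-", "_", "my-variable-name")             # "my_variable_name"
grep("gdp", c("gdpPercap", "lifeExp", "pop"))  # 1

# lapply for automated reading: pull in every .csv under data/
files    <- list.files("data", pattern = "\\.csv$", full.names = TRUE)
datasets <- lapply(files, read.csv)
```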