This is a suggestion from #29 to incorporate more about the General Refine Expression Language (GREL) in the lesson.
Several specific suggestions have been made in response, but no consensus has been reached on which specific suggestions should be implemented.
The Library OpenRefine uses a lot of GREL expressions, reformatting dates, reformatting names. People say that’s where they tend to spend a lot of their time.
More GREL: yes! GREL is of course the way to transform the data. I wonder if GREL should be introduced with simpler examples than .replace on strings, like incrementing numbers (value + 1) or combining strings to create URLs ("https://example.org/" + value).
What kind of GREL expressions should be added to the lesson?
Assuming the current dataset, the cells containing lists look like good candidates for GREL examples. For instance, the "items_owned" column can be manipulated using GREL to give a count of the most common items that are owned (mobile phones and radios just ahead of ploughs).
The current format of those lists makes the GREL needed to get a clean list slightly complicated. Done correctly, I think a series of steps that goes through the process of 'cleaning' this column could provide a really good set of learning materials. One of the great things about OpenRefine is the ability to get real-time feedback on changes as you work with the data.
OTOH if a more accessible example is needed the data set could be updated to simplify those lists to be just semi-colon separated which would make the process much simpler.
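To make the cleaning-then-counting idea concrete, here is a minimal sketch of the logic in Python rather than GREL. It only illustrates the steps (strip the brackets, split on semicolons, strip quotes and whitespace, count). The sample cells and the `clean_items` helper are hypothetical, and the bracketed, quoted list format is an assumption about how the "items_owned" cells look.

```python
from collections import Counter


def clean_items(cell: str) -> list:
    """Turn one bracketed, quoted, semicolon-separated cell into a clean list."""
    # Remove surrounding whitespace and the enclosing square brackets.
    inner = cell.strip().strip("[]")
    # Split on semicolons, then strip whitespace and quote marks from each part.
    items = [part.strip().strip("'\"") for part in inner.split(";")]
    # Drop empty strings left behind by empty lists.
    return [item for item in items if item]


# Hypothetical sample cells; the real column format is assumed, not confirmed.
sample_cells = [
    "['bicycle' ;  'television' ;  'solar_panel' ;  'table']",
    "['mobile_phone' ;  'radio']",
    "['radio' ;  'mobile_phone' ;  'plough']",
    "[]",
]

# Count how often each item appears across all cells.
item_counts = Counter(
    item for cell in sample_cells for item in clean_items(cell)
)
```

In OpenRefine itself the same steps would be spread over a few cell transforms and a facet, so learners could see the column change after each one.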
Another GREL example that would work with the current dataset is reformatting the "interview_date" column, which is currently in dd-MMM-yyyy (whereas the start and end columns use ISO 8601). Something like value.toDate("dd-MMM-yy").toString("yyyy-MM-dd") could provide a good example and give an opportunity to talk more generally about date manipulation in OpenRefine (I would have guessed that date issues come up commonly in social science datasets, but I may be wrong as it is not my area).
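The date conversion suggested above can be sketched outside OpenRefine as a parse step followed by a format step. This is a Python illustration of the same two-step idea, not GREL. The `reformat_date` helper and the sample value are hypothetical, and the input pattern mirrors the "dd-MMM-yy" pattern used in the expression (two-digit year, English month abbreviations assumed).

```python
from datetime import datetime


def reformat_date(value: str) -> str:
    """Parse a day-month-year string and return it as an ISO 8601 date."""
    # "%d-%b-%y" corresponds to the dd-MMM-yy pattern: day, abbreviated
    # month name, two-digit year. Month names are assumed to be English.
    parsed = datetime.strptime(value, "%d-%b-%y")
    # "%Y-%m-%d" corresponds to yyyy-MM-dd.
    return parsed.strftime("%Y-%m-%d")


# Hypothetical sample value, not taken from the dataset.
print(reformat_date("17-Nov-16"))
```

If the column actually holds four-digit years, the parse pattern would need to change accordingly ("%d-%b-%Y" here).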
One comment from teaching this recently with the list of items column - the lesson uses GREL to facet by subsets of the column but doesn't demonstrate how to change that column to something more usable (such as dummy variables for each category of item once they're cleaned). As a bonus, parsing it to columns also highlights for learners the difference between cell transforms, multi-valued cell splits, and column splits.
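As a rough picture of what "dummy variables for each category of item" means, here is a small Python sketch, assuming the item lists have already been cleaned. It is not an OpenRefine recipe. The `to_dummies` helper, the row IDs, and the sample items are all hypothetical.

```python
def to_dummies(rows: dict) -> dict:
    """Map each row ID to a dict of 0/1 flags, one flag per item category."""
    # Collect every distinct item seen in any row, in a stable order.
    categories = sorted({item for items in rows.values() for item in items})
    # For each row, flag whether each category is present (1) or absent (0).
    return {
        row_id: {cat: int(cat in items) for cat in categories}
        for row_id, items in rows.items()
    }


# Hypothetical cleaned data: row ID mapped to its list of owned items.
rows = {1: ["bicycle", "radio"], 2: ["radio"], 3: []}
dummies = to_dummies(rows)
```

Each key in the inner dicts would correspond to one new column in the project, which is what makes the column-split versus cell-transform distinction visible.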
All of that said, adding more GREL is also tricky when learners don't have programming experience because chaining functions can rapidly become confusing to novice coders.
As to your comment, @ndporter: the idea of using OpenRefine to create dummy variables from the items column had not yet crossed my mind. I like it. After trying and going through the manual and StackOverflow for a little bit, I think it is doable, but not in this workshop. It requires exporting the ID and items columns, doing the transformation in a new project and then importing the new columns (crossing them one by one, potentially) into the project. That is madness. Perhaps there are easier ways using column splitting, but I guess the current exercise of splitting to count is good enough. I'm open to other suggestions for introducing more GREL.
How could the content be improved?
This is a suggestion from #29 to incorporate more about the General Refine Expression Language (GREL) in the lesson.
Several specific suggestions have been made in response, but no consensus has been reached on which specific suggestions should be implemented.
Discussion from #29
The original discussion from the 2018 CAC included:
In my first comment on the issue, I wrote:
@ostephens suggested (copied from a Slack discussion) to my question in #29 (comment):
@ndporter suggested in #29 (comment):
To which I responded in #29 (comment):