Format exercises, add descriptive titles #159

Merged: 2 commits, Jun 12, 2023
36 changes: 23 additions & 13 deletions episodes/02-working-with-openrefine.md
@@ -114,6 +114,10 @@ OpenRefine interface.

Here we will use faceting to look for potential errors in data entry in the `village` column.

::::::::::::::::::::::::: challenge

### Finding (potential) errors

1. Scroll over to the `village` column.
2. Click the down arrow and choose `Facet` > `Text facet`.
3. In the left panel, you'll now see a box containing every unique value in the
@@ -131,7 +135,7 @@ Here we will use faceting to look for potential errors in data entry in the `vil

::::::::::::::: solution

-## Solution
+### Solution

- `Chirdozo` is likely a mis-entry of `Chirodzo`.
- `Ruca` is likely a mis-entry of `Ruaca`.
@@ -141,13 +145,15 @@ Here we will use faceting to look for potential errors in data entry in the `vil
mistyped entries in a later exercise.
- The entry `49` is almost certainly an error but you will not be able to fix
it by reference to other data.


:::::::::::::::::::::::::

:::::::::::::::::::::::::::::::::::
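
For readers curious about what a text facet computes, the idea can be sketched in a few lines of Python. This is only an illustration and not part of the OpenRefine workflow; the list of `village` values is made up, and only the misspellings and the `49` entry are taken from the solution above.

```python
from collections import Counter

# Hypothetical sample of `village` values (not the real survey data).
villages = [
    "Chirodzo", "Chirodzo", "Ruaca", "Ruaca", "Ruaca",
    "God", "God", "Chirdozo", "Ruca", "49",
]

# A text facet is essentially a tally of each unique value in the column.
facet = Counter(villages)

# Values that occur only once are good candidates for data-entry errors.
suspects = sorted(value for value, count in facet.items() if count == 1)
print(suspects)
```

Rare values are not always errors, so a tally like this only tells you where to look.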


::::::::::::::::::::::::::::::::::::::: challenge

-## Exercise
+### Interview collection dates

1. Using faceting, find out how many different `interview_date` values there
are in the survey results.
@@ -162,7 +168,7 @@ Here we will use faceting to look for potential errors in data entry in the `vil

::::::::::::::: solution

-## Solution
+### Solution

For the column `interview_date` do `Facet` > `Text facet`. A box will
appear in the left panel showing that there are 19 unique entries in
@@ -284,14 +290,14 @@ and the quotes.

::::::::::::::::::::::::::::::::::::::: challenge

-## Exercise
+### Remove unwanted characters

Use this same strategy to remove the single quote marks (`'`), the
right square brackets (`]`), and spaces from the `items_owned` column.

::::::::::::::: solution

-## Solution
+### Solution

1. `value.replace("'", "")`
2. `value.replace("]", "")`
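
As a rough analogy, the chained clean-up can be sketched with Python's `str.replace`, which also substitutes literal text. The function name `clean_cell` and the sample cell value are hypothetical, and the sketch assumes the left square bracket has already been removed in the earlier step.

```python
def clean_cell(value: str) -> str:
    """Strip single quotes, right square brackets, and spaces from one cell."""
    return value.replace("'", "").replace("]", "").replace(" ", "")


# Hypothetical `items_owned` cell, after the left bracket was already removed.
raw = "'bicycle' ;  'television']"
print(clean_cell(raw))
```

Each `replace` call works on the result of the previous one, so the order of these three removals does not matter.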
@@ -315,14 +321,14 @@ You should now see a new text facet box in the left-hand pane.

::::::::::::::::::::::::::::::::::::::: challenge

-## Exercise
+### Commonly owned items

Which two items are the most commonly owned? Which are the two
least commonly owned?

::::::::::::::: solution

-## Solution
+### Solution

Select `Sort by:` `count`. The most commonly owned items are
mobile phone and radio, the least commonly owned are cars and computers.
@@ -334,15 +340,15 @@ mobile phone and radio, the least commonly owned are cars and computers.

::::::::::::::::::::::::::::::::::::::: challenge

-## Exercise
+### Month(s) with farmers lacking food

Perform the same clean up steps and customized text faceting for
the `months_lack_food` column. Which month(s) were farmers
more likely to lack food?

::::::::::::::: solution

-## Solution
+### Solution

All four cleaning steps can be performed by combining `.replace`
statements. The command is:
@@ -357,7 +363,7 @@ November was the most common month for respondents to lack food.

::::::::::::::::::::::::::::::::::::::: challenge

-## Exercise
+### Clean up other columns

Perform the same clean up steps for the `months_no_water`, `liv_owned`,
`res_change`, and `no_food_mitigation` columns.
@@ -376,7 +382,7 @@ provides `Undo` and `Redo` operations to make this easy.

::::::::::::::::::::::::::::::::::::::: challenge

-## Exercise
+### Explore undo and redo

1. Click where it says `Undo / Redo` on the left side of the screen. All the
changes you have made so far are listed here.
@@ -388,7 +394,6 @@ provides `Undo` and `Redo` operations to make this easy.
Before moving on to the next lesson, redo all the steps in your analysis
so that all of the columns you modified are lacking in square brackets,
spaces, and single quotes.


::::::::::::::::::::::::::::::::::::::::::::::::::

@@ -409,13 +414,18 @@ This is then applied to the data in all columns.
OpenRefine also provides a menu option to remove blank
characters from the beginning and end of any entries in the column that you choose.

::::::::::::::::::::::::: challenge

### Remove a trailing space

1. Edit the `village` on the first row to introduce a space at the end, set to `God `.
2. Create a new text facet for the `village` column. You should now see two
different entries for `God`, one of which has a trailing whitespace.
3. To remove the whitespace, choose `Edit cells` > `Common transforms` >
`Trim leading and trailing whitespace`.
4. You should now see only four choices in your text facet again.

:::::::::::::::::::::::::::::::::::
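
The effect of trimming on the facet can be sketched in Python, where `str.strip()` removes leading and trailing whitespace. This is an illustration only; the four-item list is hypothetical, and only the `God ` value with its trailing space comes from the exercise above.

```python
# Hypothetical `village` values; the first one has a trailing space.
villages = ["God ", "God", "Chirodzo", "Ruaca"]

# Before trimming, "God " and "God" count as two different entries.
before = set(villages)

# Trim leading and trailing whitespace from every cell.
trimmed = [value.strip() for value in villages]
after = set(trimmed)

print(len(before), len(after))
```

Because the two spellings differ only by whitespace, they collapse into one entry once trimmed.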


:::::::::::::::::::::::::::::::::::::::: keypoints
34 changes: 15 additions & 19 deletions episodes/03-filter-sort.md
@@ -25,23 +25,23 @@ There are many entries in our data table. We can filter it to work on a subset
of the data in the list for the next set of operations. Please ensure you
perform this step to save time during the class.

::::::::::::::::::::::::::::::::::::::: challenge

### Using a Text Filter

1. Click the down arrow next to `respondent_roof_type` > `Text filter`. A
`respondent_roof_type` facet will appear on the left margin.
2. Type in `mabat` and press return. There are 58 matching rows of the original
131 rows (and these rows are selected for the subsequent steps).
3. At the top, change the view to `Show` 50 `rows`. This way you will see most
of the matching rows.

::::::::::::::::::::::::::::::::::::::: challenge

-## Exercise

-1. What roof types are selected by this procedure?
-2. How would you restrict this to only one of the roof types?
+4. Answer these questions:
+   1. What roof types are selected by this procedure?
+   2. How would you restrict this to only one of the roof types?

::::::::::::::: solution

-## Solution
+### Solution

1. Do `Facet` > `Text facet` on the `respondent_roof_type` column after
filtering. This will show that two names match your filter criteria.
@@ -53,7 +53,7 @@ perform this step to save time during the class.

::::::::::::::::::::::::::::::::::::::::::::::::::
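
A text filter behaves much like a substring match, which can be sketched in Python. The helper `text_filter` and the sample values are hypothetical (only `mabatisloping` appears in this lesson's text), and the sketch assumes the match is case-insensitive.

```python
def text_filter(rows, query):
    """Keep rows that contain `query` anywhere, ignoring case."""
    needle = query.lower()
    return [row for row in rows if needle in row.lower()]


# Hypothetical `respondent_roof_type` values for illustration.
roof_types = ["grass", "mabatisloping", "mabatipitched", "grass", "mabatisloping"]

matches = text_filter(roof_types, "mabat")

# Faceting the filtered rows shows which distinct values matched.
distinct = sorted(set(matches))
print(len(matches), distinct)
```

Typing a longer fragment narrows the match, which is one way to restrict the selection to a single roof type.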

-### Excluding entries
+## Excluding entries

In addition to the simple text filtering we used above, another way to narrow
our filter is to `include` and/or `exclude` entries in a facet. You will see
@@ -71,13 +71,13 @@ analysis.

::::::::::::::::::::::::::::::::::::::: challenge

-## Exercise
+### Including and excluding rows using a facet

Use `include / exclude` to select only entries from one of these two roof types.

::::::::::::::: solution

-## Solution
+### Solution

1. In the facet (left margin), click on one of the names, such as
`mabatisloping`. Notice that when you click on the name, or hover over
@@ -108,14 +108,14 @@ sorting.

::::::::::::::::::::::::::::::::::::::: challenge

-## Exercise
+### Finding GPS Altitude outliers

Sort the data by `gps_Altitude`. Do you think the first few entries may have
incorrect altitudes?

::::::::::::::: solution

-## Solution
+### Solution

In the `gps_Altitude` column, select `Sort...` > `numbers` and select
`smallest first`. The first few values are all 0. The altitudes are more
@@ -124,8 +124,6 @@ the gps information added automatically by the app. The lack of an altitude
value suggests that the smartphone was unable to provide it and it
defaulted to 0.



:::::::::::::::::::::::::

::::::::::::::::::::::::::::::::::::::::::::::::::
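
Sorting numbers smallest first is a quick way to surface placeholder values such as 0. A minimal Python sketch of the same idea follows; the altitude values are invented for illustration and are not taken from the survey data.

```python
# Hypothetical `gps_Altitude` values in metres; 0 stands in for a missing reading.
altitudes = [698, 0, 690, 0, 706]

# Sorting smallest first pushes the suspicious zeros to the top of the list.
ordered = sorted(altitudes)

# Count how many readings look like a defaulted value rather than a measurement.
zero_count = sum(1 for altitude in ordered if altitude == 0)
print(ordered, zero_count)
```

Whether 0 really means "missing" depends on how the data was collected, so treat the count as a prompt for checking rather than a correction.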
@@ -150,7 +148,7 @@ only column sorted, then data reverts to its original order.

::::::::::::::::::::::::::::::::::::::: challenge

-## Exercise
+### Finding village "49"

We discovered in an earlier lesson that the value for one of the `village`
entries was given as 49. This is clearly wrong. By looking at the GPS
@@ -170,15 +168,13 @@ the data in that column was collected from?

::::::::::::::: solution

-## Solution
+### Solution

The interview data for that row is in a small cluster of Chirodzo
interviews when sorting by GPS coordinates. When sorting by interview date,
it is also with Chirodzo interviews. In fact, only Chirodzo had interviews
conducted on that date.



:::::::::::::::::::::::::

::::::::::::::::::::::::::::::::::::::::::::::::::
9 changes: 3 additions & 6 deletions episodes/04-numbers.md
@@ -39,24 +39,22 @@ right-justified, and black to green in color.

::::::::::::::::::::::::::::::::::::::: challenge

-## Exercise
+### Transforming column contents to numbers

Transform three more columns, `no_membrs`, `years_liv`, and
`buildings_in_compound`, from text to numbers. Can all columns be transformed
to numbers? - Try it with `village` for example.

::::::::::::::: solution

-## Solution
+### Solution

Only observations that include only numerals (0-9) can be transformed to
numbers. If you apply a number transformation to a column that doesn't meet
this criterion, and then click the `Undo / Redo` tab, you will see a step
that starts with `Text transform on 0 cells`. This means that the data in
that column was not transformed.



:::::::::::::::::::::::::

::::::::::::::::::::::::::::::::::::::::::::::::::
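
The behaviour described in the solution, where non-numeric text is left untouched, can be sketched in Python. The helper `to_number` and the sample column are hypothetical, and the sketch only handles whole numbers.

```python
def to_number(value: str):
    """Return an int when the text is a whole number, else the original text."""
    try:
        return int(value)
    except ValueError:
        return value


# Hypothetical column mixing numeric text with a village name.
column = ["3", "12", "Chirodzo", "7"]

converted = [to_number(cell) for cell in column]

# Count the cells that actually changed type, similar to the cell count
# reported in the `Undo / Redo` history.
changed = sum(isinstance(cell, int) for cell in converted)
print(converted, changed)
```

A column of village names would give a count of 0, matching the `Text transform on 0 cells` step mentioned above.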
@@ -69,7 +67,7 @@ them. We can do that with a `Numeric facet`.

::::::::::::::::::::::::::::::::::::::: challenge

-## Exercise
+### Using a numeric facet

1. For a column you transformed to numbers, edit one or two cells, replacing
the numbers with text (such as `abc`) or blank (no number or text). You
@@ -82,7 +80,6 @@ them. We can do that with a `Numeric facet`.
`Non-numeric` and `Blank` if you changed some values.
4. Experiment with checking or unchecking these boxes to select subsets of
your data.


::::::::::::::::::::::::::::::::::::::::::::::::::

13 changes: 11 additions & 2 deletions episodes/05-scripts.md
@@ -30,7 +30,9 @@ files had the same column names, you could save the JSON script, open a new
file to clean in OpenRefine, paste in the script and run it. This gives you a
quick way to clean all of your related data.

-## Saving your work as a script
::::::::::::::::::::::::::::::::::::::: challenge

+### Saving your work as a script

1. In the `Undo / Redo` section, click `Extract...`, and select the steps that
you want to apply to other datasets by clicking the check boxes.
@@ -42,7 +44,12 @@ quick way to clean all of your related data.
text file. In TextEdit, do this by selecting `Format` > `Make plain text`
and save the file as a `.txt` file.

-## Importing a script to use against another dataset
::::::::::::::::::::::::::::::::::::::::::::::::::


::::::::::::::::::::::::::::::::::::::: challenge

+### Importing a script to use against another dataset

Let's practice running these steps on a new dataset. We'll test this on an
uncleaned version of the dataset we've been working with.
@@ -54,6 +61,8 @@ uncleaned version of the dataset we've been working with.
3. Click `Perform operations`. The dataset should now be the same as your other
cleaned dataset.

::::::::::::::::::::::::::::::::::::::::::::::::::

For convenience, we used the same dataset. In reality you could use this
process to clean related datasets. For example, data that you had collected
over different fieldwork periods or data that was collected by different
6 changes: 5 additions & 1 deletion episodes/06-saving.md
@@ -30,7 +30,9 @@ By default OpenRefine is saving your project continuously. If you close
OpenRefine and open it up again, you'll see a list of your projects. You can
click on any one of them to open it up again.

-### Exporting
::::::::::::::::::::::::: challenge

+### Exporting the project

You can also export a project. This is helpful, for instance, if you wanted to
send your raw data and cleaning steps to a collaborator, or share this
@@ -65,6 +67,8 @@ You should see:

:::::::::::::::::::::::::

:::::::::::::::::::::::::::::::::::

You can import an existing project into OpenRefine by clicking `Open...` in the
upper right > `Import Project` and selecting the `tar.gz` project file. This
project will include all of the raw data and cleaning steps that were part of
4 changes: 1 addition & 3 deletions episodes/07-resources.md
@@ -48,15 +48,13 @@ your journey.

::::::::::::::::::::::::::::::::::::::: challenge

-## Exercise
+### Discuss a resource

Visit one of these sites and share what you find with another person.


::::::::::::::::::::::::::::::::::::::::::::::::::



:::::::::::::::::::::::::::::::::::::::: keypoints

- Other examples and resources online are good for learning more about OpenRefine.