Book updates #110

Open
wants to merge 41 commits into
base: traducao-pt-2ed
41 commits
5286060
Fix typo, closes #1601
mine-cetinkaya-rundel Nov 20, 2023
85d2209
Fix typo in numbers.qmd (#1602)
pabloedug Nov 25, 2023
bfa59fa
Make variable names consistent in data-transform.qmd (#1604)
granieri Nov 30, 2023
b0aa400
Update quarto.qmd (#1621)
e-linc Jan 19, 2024
d685f8f
Fix typo (#1624)
ynsec37 Jan 28, 2024
6f512d1
Update Quarto set up action
mine-cetinkaya-rundel Jan 28, 2024
e89d083
Update workflow-scripts.qmd
mine-cetinkaya-rundel Jan 28, 2024
aa5eaac
logicals.qmd: use better variable names in example (#1623)
florisvdh Jan 29, 2024
e56ccb6
logicals.qmd: use if_else() in exercises instead of ifelse() (#1622)
florisvdh Jan 29, 2024
716cc5b
Add both versions of insert anything shortcut, closes #1607
mine-cetinkaya-rundel Jan 29, 2024
f40e78a
Chapter 13 (numbers): minor updates (#1627)
florisvdh Feb 2, 2024
33e7012
small typo fix (#1635)
kew24 Feb 26, 2024
f55997b
Some fixes for chapters regexps & factors (#1636)
florisvdh Mar 2, 2024
bdd847c
17.2: clarify language (#1637)
stevenprimeaux Mar 13, 2024
f312f48
Fix typo
mine-cetinkaya-rundel Apr 5, 2024
b88683e
Update iteration.qmd (#1646)
davidkane9 Apr 17, 2024
eedfa73
Update databases.qmd (#1648)
davidkane9 Apr 19, 2024
0e6ee25
Update databases.qmd (#1650)
davidkane9 Apr 20, 2024
e42ee44
probably a careless mistake (#1652)
mitsuoxv May 4, 2024
da70663
Fix/data transform.qmd (#1654)
mitsuoxv May 13, 2024
f46f3d9
Add "of" between numbers and rows (#1649)
daniel-stafford May 13, 2024
f703a6c
Fix/data-tidy.qmd small typos (#1655)
mitsuoxv May 17, 2024
fe302ac
Suggest/layers.qmd shape descriptions in fig-alt, etc. (#1660)
mitsuoxv May 28, 2024
43ee557
Fixing some minor errors (#1657)
davidrsch May 28, 2024
c70b13b
Suggest/data-import.qmd (#1659)
mitsuoxv May 28, 2024
87fb6ee
Fix typo, closes #1644
mine-cetinkaya-rundel May 31, 2024
5bfcc87
Fix typo in b calculation, closes #1638
mine-cetinkaya-rundel May 31, 2024
95f1cb1
Fix/communication.qmd, mainly fig-alt corrections (#1663)
mitsuoxv Jun 1, 2024
24b38c6
Update iteration.qmd (#1647)
davidkane9 Jun 1, 2024
bc998be
Fix typo, closes #1643
mine-cetinkaya-rundel Jun 1, 2024
caf872c
Fix/logicals.qmd and transform.qmd; correction of fig-alt, and typos …
mitsuoxv Jun 1, 2024
01afcfb
Undo wrong edit
mine-cetinkaya-rundel Jun 1, 2024
6e07796
probably typos in fig-alt (#1667)
mitsuoxv Jun 4, 2024
06f8d5c
Edit typo in EDA chapter, summaries -> summarizes (#1676)
LeoLuongVuong Jul 13, 2024
643ab1b
Edit a typo in the logicals chapter (#1677)
LeoLuongVuong Jul 14, 2024
9a9ec24
Fix typo (closes #1681) + various other copy edits
mine-cetinkaya-rundel Sep 2, 2024
3e8bf23
correct ordered factor definition (#1686)
leorjorge Sep 27, 2024
12c6aff
add Portuguese in the list of translations - index.qmd (#1691)
beatrizmilz Nov 17, 2024
f3b95c4
Added missing word joins.qmd (#1696)
ndrscalia Dec 30, 2024
6a1bb7a
Fix some typos (#1701)
kleintom Jan 12, 2025
d6c3daa
Update index.qmd (#1702)
MattTheCuber Jan 23, 2025
4 changes: 2 additions & 2 deletions .github/workflows/build_book.yaml
@@ -23,8 +23,8 @@ jobs:
     steps:
       - uses: actions/checkout@v2

-      - name: Install Quarto
-        uses: quarto-dev/quarto-actions/install-quarto@v1
+      - name: Set up Quarto
+        uses: quarto-dev/quarto-actions/setup@v2
         with:
           # To install LaTeX to build PDF book
           tinytex: true
4 changes: 2 additions & 2 deletions EDA.qmd
@@ -73,7 +73,7 @@ You can see variation easily in real life; if you measure any continuous variabl
This is true even if you measure quantities that are constant, like the speed of light.
Each of your measurements will include a small amount of error that varies from measurement to measurement.
Variables can also vary if you measure across different subjects (e.g., the eye colors of different people) or at different times (e.g., the energy levels of an electron at different moments).
-Every variable has its own pattern of variation, which can reveal interesting information about how that it varies between measurements on the same observation as well as across observations.
+Every variable has its own pattern of variation, which can reveal interesting information about how it varies between measurements on the same observation as well as across observations.
The best way to understand that pattern is to visualize the distribution of the variable's values, which you've learned about in @sec-data-visualization.

We'll start our exploration by visualizing the distribution of weights (`carat`) of \~54,000 diamonds from the `diamonds` dataset.
@@ -597,7 +597,7 @@ ggplot(smaller, aes(x = carat, y = price)) +
```

`cut_width(x, width)`, as used above, divides `x` into bins of width `width`.
-By default, boxplots look roughly the same (apart from number of outliers) regardless of how many observations there are, so it's difficult to tell that each boxplot summaries a different number of points.
+By default, boxplots look roughly the same (apart from number of outliers) regardless of how many observations there are, so it's difficult to tell that each boxplot summarizes a different number of points.
One way to show that is to make the width of the boxplot proportional to the number of points with `varwidth = TRUE`.

#### Exercises
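The `varwidth = TRUE` behavior touched by the second EDA.qmd hunk can be sketched as follows. This is a minimal illustration, not the chapter's own code: it assumes ggplot2 is installed, and the definition of `smaller` (diamonds under 3 carats) and the bin width of 0.1 are assumptions, since the hunk does not show them.

```r
library(ggplot2)

# Assumption: `smaller` keeps diamonds under 3 carats; the hunk does not show its definition.
smaller <- subset(diamonds, carat < 3)

# cut_width() bins carat into intervals of width 0.1; the number of rows per bin varies a lot.
bin_counts <- table(cut_width(smaller$carat, 0.1))
bin_counts

# With varwidth = TRUE, wider boxes correspond to bins that contain more observations.
p <- ggplot(smaller, aes(x = carat, y = price)) +
  geom_boxplot(aes(group = cut_width(carat, 0.1)), varwidth = TRUE)
p
```

Printing `bin_counts` next to the plot makes it easier to check that the box widths follow the group sizes.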
27 changes: 14 additions & 13 deletions communication.qmd
@@ -185,9 +185,10 @@ This useful package will automatically adjust labels so that they don't overlap:

```{r}
#| fig-alt: |
-#| Scatterplot of highway fuel efficiency versus engine size of cars, where
-#| points are colored according to the car class. Some points are labelled
-#| with the car's name. The labels are box with white, transparent background
+#| Scatterplot of highway mileage versus engine size where points are colored
+#| by drive type. Smooth curves for each drive type are overlaid.
+#| Text labels identify the curves as front-wheel, rear-wheel, and 4-wheel.
+#| The labels are box with white background
#| and positioned to not overlap.

ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
@@ -364,7 +365,7 @@ ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
You can use `labels` in the same way (a character vector the same length as `breaks`), but you can also set it to `NULL` to suppress the labels altogether.
This can be useful for maps, or for publishing plots where you can't share the absolute numbers.
You can also use `breaks` and `labels` to control the appearance of legends.
-For discrete scales for categorical variables, `labels` can be a named list of the existing levels names and the desired labels for them.
+For discrete scales for categorical variables, `labels` can be a named list of the existing level names and the desired labels for them.

```{r}
#| fig-alt: |
@@ -390,7 +391,7 @@ Note that `breaks` is in the original scale of the data.
#| fig-alt: |
#| Two side-by-side box plots of price versus cut of diamonds. The outliers
#| are transparent. On both plots the x-axis labels are formatted as dollars.
-#| The x-axis labels on the plot start at $0 and go to $15,000, increasing
+#| The x-axis labels on the left plot start at $0 and go to $15,000, increasing
#| by $5,000. The x-axis labels on the right plot start at $1K and go to
#| $19K, increasing by $6K.

@@ -461,7 +462,7 @@ The theme setting `legend.position` controls where the legend is drawn:
#| fig-alt: |
#| Four scatterplots of highway fuel efficiency versus engine size of cars
#| where points are colored based on class of car. Clockwise, the legend
-#| is placed on the right, left, top, and bottom of the plot.
+#| is placed on the right, left, bottom, and top of the plot.

base <- ggplot(mpg, aes(x = displ, y = hwy)) +
geom_point(aes(color = class))
@@ -575,7 +576,7 @@ This will also help ensure your plot is interpretable in black and white.

```{r}
#| fig-alt: |
-#| Two scatterplots of highway mileage versus engine size where both color
+#| Scatterplot of highway mileage versus engine size where both color
#| and shape of points are based on drive type. The color palette is not
#| the default ggplot2 palette.

@@ -686,8 +687,9 @@ Subsetting the data has affected the x and y scales as well as the smooth curve.
#| fig-width: 4
#| message: false
#| fig-alt: |
-#| On the left, scatterplot of highway mileage vs. displacement, with
-#| displacement. The smooth curve overlaid shows a decreasing, and then
+#| On the left, scatterplot of highway mileage vs. displacement
+#| where points are colored by drive type.
+#| The smooth curve overlaid shows a decreasing, and then
#| increasing trend, like a hockey stick. On the right, same variables
#| are plotted with displacement ranging only from 5 to 6 and highway
#| mileage ranging only from 10 to 25. The smooth curve overlaid shows a
@@ -969,10 +971,9 @@ In the following, `|` places the `p1` and `p3` next to each other and `/` moves
#| fig-alt: |
#| Three plots laid out such that first and third plot are next to each other
#| and the second plot stretched beneath them. The first plot is a
-#| scatterplot of highway mileage versus engine size, third plot is a
-#| scatterplot of highway mileage versus city mileage, and the third plot is
-#| side-by-side boxplots of highway mileage versus drive train) placed next
-#| to each other.
+#| scatterplot of highway mileage versus engine size, the third plot is a
+#| scatterplot of highway mileage versus city mileage, and the second plot is
+#| side-by-side boxplots of highway mileage versus drive train).

p3 <- ggplot(mpg, aes(x = cty, y = hwy)) +
geom_point() +
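For context on the `labels` and `legend.position` passages in the communication.qmd hunks above, here is a rough sketch of both options on the `mpg` data. It assumes ggplot2 is installed. It uses a named character vector for the discrete labels, where the diff text says "named list", and the choice of `"bottom"` is purely illustrative.

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point() +
  # labels = NULL suppresses the x-axis tick labels altogether.
  scale_x_continuous(labels = NULL) +
  # Named vector: existing level names on the left, desired legend labels on the right.
  scale_color_discrete(labels = c("4" = "4-wheel", "f" = "front", "r" = "rear")) +
  # legend.position controls where the legend is drawn; "bottom" is an illustrative choice.
  theme(legend.position = "bottom")
p
```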
6 changes: 3 additions & 3 deletions data-import.qmd
@@ -56,7 +56,7 @@ read_csv("data/students.csv") |>

We can read this file into R using `read_csv()`.
The first argument is the most important: the path to the file.
-You can think about the path as the address of the file: the file is called `students.csv` and that it lives in the `data` folder.
+You can think about the path as the address of the file: the file is called `students.csv` and it lives in the `data` folder.

```{r}
#| message: true
@@ -88,7 +88,7 @@ students

In the `favourite.food` column, there are a bunch of food items, and then the character string `N/A`, which should have been a real `NA` that R will recognize as "not available".
This is something we can address using the `na` argument.
-By default, `read_csv()` only recognizes empty strings (`""`) in this dataset as `NA`s, we want it to also recognize the character string `"N/A"`.
+By default, `read_csv()` only recognizes empty strings (`""`) in this dataset as `NA`s, and we want it to also recognize the character string `"N/A"`.

```{r}
#| message: false
@@ -131,7 +131,7 @@ students |>
Note that the values in the `meal_plan` variable have stayed the same, but the type of variable denoted underneath the variable name has changed from character (`<chr>`) to factor (`<fct>`).
You'll learn more about factors in @sec-factors.

-Before you analyze these data, you'll probably want to fix the `age` and `id` columns.
+Before you analyze these data, you'll probably want to fix the `age` column.
Currently, `age` is a character variable because one of the observations is typed out as `five` instead of a numeric `5`.
We discuss the details of fixing this issue in @sec-import-spreadsheets.

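The `na` behavior described in the data-import.qmd hunks can be checked with a small sketch. It assumes readr 2.0 or later, which accepts literal data wrapped in `I()`. The three-row CSV and its column names are hypothetical stand-ins, not the book's `students.csv`, and the `age` recoding at the end is only one possible fix.

```r
library(readr)

# Hypothetical three-row file, passed as literal data with I(); not the book's students.csv.
csv_text <- I("id,favourite_food,age\n1,N/A,4\n2,,five\n3,Pizza,6\n")

# Defaults: the empty field becomes NA, but "N/A" is kept as a character string.
default_na <- read_csv(csv_text, show_col_types = FALSE)

# Listing both markers in `na` makes read_csv() treat each of them as missing.
students <- read_csv(csv_text, na = c("N/A", ""), show_col_types = FALSE)

# `age` is read as character because of "five"; recode it, then parse to numeric.
students$age <- parse_number(ifelse(students$age == "five", "5", students$age))
students
```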
4 changes: 2 additions & 2 deletions data-tidy.qmd
@@ -397,7 +397,7 @@ household
```

This dataset contains data about five families, with the names and dates of birth of up to two children.
-The new challenge in this dataset is that the column names contain the names of two variables (`dob`, `name)` and the values of another (`child,` with values 1 or 2).
+The new challenge in this dataset is that the column names contain the names of two variables (`dob`, `name`) and the values of another (`child`, with values 1 or 2).
To solve this problem we again need to supply a vector to `names_to` but this time we use the special `".value"` sentinel; this isn't the name of a variable but a unique value that tells `pivot_longer()` to do something different.
This overrides the usual `values_to` argument to use the first component of the pivoted column name as a variable name in the output.

@@ -456,7 +456,7 @@ cms_patient_experience |>
Neither of these columns will make particularly great variable names: `measure_cd` doesn't hint at the meaning of the variable and `measure_title` is a long sentence containing spaces.
We'll use `measure_cd` as the source for our new column names for now, but in a real analysis you might want to create your own variable names that are both short and meaningful.

-`pivot_wider()` has the opposite interface to `pivot_longer()`: instead of choosing new column names, we need to provide the existing columns that define the values (`values_from`) and the column name (`names_from)`:
+`pivot_wider()` has the opposite interface to `pivot_longer()`: instead of choosing new column names, we need to provide the existing columns that define the values (`values_from`) and the column name (`names_from`):

```{r}
cms_patient_experience |>
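The `".value"` sentinel and the opposite interface of `pivot_wider()`, both discussed in the data-tidy.qmd hunks above, can be sketched on a small table. This assumes tidyr and tibble are installed. The two-family `household` tibble is a hypothetical reduction of the dataset the text describes, and the names `long` and `wide` are illustrative.

```r
library(tidyr)
library(tibble)

# Hypothetical two-family version of the `household` data described in the hunk.
household <- tribble(
  ~family, ~dob_child1,  ~dob_child2,  ~name_child1, ~name_child2,
  1,       "1998-11-26", "2000-01-29", "Susan",      "Jose",
  2,       "1996-06-22", NA,           "Mark",       NA
)

# ".value" turns the first name component (dob, name) into output columns;
# the second component (child1, child2) becomes the values of `child`.
long <- household |>
  pivot_longer(
    cols = !family,
    names_to = c(".value", "child"),
    names_sep = "_",
    values_drop_na = TRUE
  )
long

# pivot_wider() takes existing columns instead: names from `child`, values from `dob` and `name`.
wide <- long |>
  pivot_wider(names_from = child, values_from = c(dob, name))
wide
```

Widening `long` again recovers the original column names, which is a quick way to confirm that the two calls are inverses on this data.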