
Commit af79a5e

Fixing typos, including some broken cross-references (#486)
* Fixing typos, including some broken cross-references
* Update inference-many-means.qmd

Co-authored-by: Mine Cetinkaya-Rundel <[email protected]>
1 parent 9f0cb5f commit af79a5e

15 files changed (+33 additions, -33 deletions)

foundations-errors.qmd

Lines changed: 1 addition & 1 deletion
@@ -383,7 +383,7 @@ text(2.08, 0.21, "5%", cex = 1.2)
 ```
 
 First, suppose the sample difference was larger than 0.
-In a one-sided test, we would set $H_A:$ difference $> 0.$ If the observed difference falls in the upper 5% of the distribution, we would reject $H_0$ since the p-value would just be a the single tail.
+In a one-sided test, we would set $H_A:$ difference $> 0.$ If the observed difference falls in the upper 5% of the distribution, we would reject $H_0$ since the p-value would just be the single tail.
 Thus, if $H_0$ is true, we incorrectly reject $H_0$ about 5% of the time when the sample mean is above the null value, as shown above.
 
 Then, suppose the sample difference was smaller than 0.
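As an aside to the hunk above: the 5% one-sided Type I error rate it describes can be checked by simulation. A sketch using Python's standard library (the book's code is R; the sample size, seed, and normal cutoff here are illustrative assumptions):

```python
import random
import statistics as st

# Under H0 (true mean difference 0), a one-sided test that rejects when the
# statistic lands in the upper 5% of the null distribution should reject
# about 5% of the time. All settings below are illustrative.
random.seed(3)
reps, n = 4000, 100
crit = 1.645  # upper 5% cutoff of the standard normal

rejections = 0
for _ in range(reps):
    diffs = [random.gauss(0, 1) for _ in range(n)]  # H0 is true
    z = st.mean(diffs) / (st.stdev(diffs) / n ** 0.5)
    if z > crit:  # one-sided: only large positive differences reject
        rejections += 1

print(rejections / reps)  # close to 0.05
```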

foundations-mathematical.qmd

Lines changed: 4 additions & 4 deletions
@@ -546,7 +546,7 @@ openintro::normTail(m = 0, s = 1, L = 0.43)
 We can also find the Z score associated with a percentile.
 For example, to identify Z for the $80^{th}$ percentile, we use `qnorm()` which identifies the **quantile** for a given percentage.
 The quantile represents the cutoff value.
-(To remember the function `qnorm()` as providing a cutozff, notice that both `qnorm()` and "cutoff" start with the sound "kuh".
+(To remember the function `qnorm()` as providing a cutoff, notice that both `qnorm()` and "cutoff" start with the sound "kuh".
 To remember the `pnorm()` function as providing a probability from a given cutoff, notice that both `pnorm()` and probability start with the sound "puh".) We determine the Z score for the $80^{th}$ percentile using `qnorm()`: 0.84.
 
 ```{r}
@@ -1058,7 +1058,7 @@ When the sample size is sufficiently large, the normal approximation generally p
 
 ### Observed data
 
-In Section @sec-caseStudyOpportunityCost we were introduced to the opportunity cost study, which found that students became thriftier when they were reminded that not spending money now means the money can be spent on other things in the future.
+In @sec-caseStudyOpportunityCost we were introduced to the opportunity cost study, which found that students became thriftier when they were reminded that not spending money now means the money can be spent on other things in the future.
 Let's re-analyze the data in the context of the normal distribution and compare the results.
 
 ::: {.data data-latex=""}
@@ -1144,8 +1144,8 @@ Next, let's turn our attention to the medical consultant case study.
 
 ### Observed data
 
-In Section @sec-case-study-med-consult we learned about a medical consultant who reported that only 3 of their 62 clients who underwent a liver transplant had complications, which is less than the more common complication rate of 0.10.
-In that work, we did not model a null scenario, but we will discuss a simulation method for a one proportion null distribution in Section sec-one-prop-null-boot, such a distribution is provided in @fig-MedConsNullSim-w-normal.
+In @sec-case-study-med-consult we learned about a medical consultant who reported that only 3 of their 62 clients who underwent a liver transplant had complications, which is less than the more common complication rate of 0.10.
+In that work, we did not model a null scenario, but we will discuss a simulation method for a one proportion null distribution in @sec-one-prop-null-boot, such a distribution is provided in @fig-MedConsNullSim-w-normal.
 We have added the best-fitting normal curve to the figure, which has a mean of 0.10.
 Borrowing a formula that we'll encounter in [Chapter -@sec-inference-one-prop], the standard error of this distribution was also computed: $SE = 0.038.$
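An aside on the `qnorm()`/`pnorm()` mnemonic fixed in the first hunk: the two functions are inverses, which can be checked outside R. A minimal sketch with Python's stdlib `statistics.NormalDist` (an illustration, not part of the commit):

```python
from statistics import NormalDist

# R's qnorm(p) is the quantile (cutoff) function; pnorm(z) is the
# probability below a cutoff. NormalDist provides the same pair.
z80 = NormalDist(mu=0, sigma=1).inv_cdf(0.80)  # like qnorm(0.80)
print(round(z80, 2))  # 0.84, the Z score of the 80th percentile

# The two functions undo each other: pnorm(qnorm(p)) == p.
print(round(NormalDist().cdf(z80), 2))  # 0.80
```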

foundations-randomization.qmd

Lines changed: 1 addition & 1 deletion
@@ -28,7 +28,7 @@ You may agree that there is almost always variability in data -- one dataset wil
 However, quantifying the variability in the data is neither obvious nor easy to do, i.e., answering the question "*how* different is one dataset from another?" is not trivial.
 
 First, a note on notation.
-We generally use $p$ to denote a population proportion and $\hat{p}$ to a sample proportion.
+We generally use $p$ to denote a population proportion and $\hat{p}$ to denote a sample proportion.
 Similarly, we generally use $\mu$ to denote a population mean and $\bar{x}$ to denote a sample mean.
 
 ::: {.workedexample data-latex=""}

inf-model-applications.qmd

Lines changed: 1 addition & 1 deletion
@@ -293,7 +293,7 @@ Interpret the interval in context.[^27-inf-model-applications-8]
 :::
 
 [^27-inf-model-applications-8]: Because there were 1,000 bootstrap resamples, we look for the cutoffs which provide 50 bootstrap slopes on the left, 900 in the middle, and 50 on the right.
-Looking at the bootstrap histogram, the rough 95% confidence interval is \$9 to \$13.10.
+Looking at the bootstrap histogram, the rough 90% confidence interval is \$9 to \$13.10.
 For games that are new, the average price is higher by between \$9.00 and \$13.10 than games that are used, with 90% confidence.
 
 ### Cross-validation
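The footnote's counting rule (50 bootstrap slopes left of the interval, 900 inside, 50 right, out of 1,000, giving a 90% interval) can be sketched as follows. The resampled slopes here are simulated stand-ins, not the book's video game data:

```python
import random

random.seed(1)
# Simulated stand-ins for 1,000 bootstrap slopes (not the book's data).
boot_slopes = sorted(random.gauss(11.0, 1.2) for _ in range(1000))

# 90% percentile interval: 50 slopes below, 900 inside, 50 above.
lower = boot_slopes[49]   # 50th smallest slope
upper = boot_slopes[950]  # 50th largest slope
print(lower, upper)
```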

inf-model-logistic.qmd

Lines changed: 3 additions & 3 deletions
@@ -124,7 +124,7 @@ ggplot(spam_pred, aes(x = .pred_1, y = spam)) +
 ```
 
 We'd like to assess the quality of the model.
-For example, we might ask: if we look at emails that we modeled as having 10% chance of being spam, do we find out 10% of the actually are spam?
+For example, we might ask: if we look at emails that we modeled as having 10% chance of being spam, do we find out 10% of them actually are spam?
 We can check this for groups of the data by constructing a plot as follows:
 
 1. Bucket the observations into groups based on their predicted probabilities.
@@ -320,7 +320,7 @@ Using the example above and focusing on each of the variable p-values (here we w
 - $H_0: \beta_1 = 0$ given `cc`, `dollar`, and `urgent_subj` are included in the model
 - $H_0: \beta_2 = 0$ given `to_multiple`, `dollar`, and `urgent_subj` are included in the model
 - $H_0: \beta_3 = 0$ given `to_multiple`, `cc`, and `urgent_subj` are included in the model
-- $H_0: \beta_4 = 0$ given `to_multiple`, `dollar`, and `dollar` are included in the model
+- $H_0: \beta_4 = 0$ given `to_multiple`, `cc`, and `dollar` are included in the model
 
 The very low p-values from the software output tell us that three of the variables (that is, not `cc`) act as statistically discernible predictors in the model at the discernibility level of 0.05, despite the inclusion of any of the other variables.
 Consider the p-value on $H_0: \beta_1 = 0$.
@@ -346,7 +346,7 @@ A full treatment of cross-validation and logistic regression models is beyond th
 Using $k$-fold cross-validation, we can build $k$ different models which are used to predict the observations in each of the $k$ holdout samples.
 As with linear regression (see @sec-inf-mult-reg-cv), we compare a smaller logistic regression model to a larger logistic regression model.
 The smaller model uses only the `to_multiple` variable, see the complete dataset (not cross-validated) model output in @tbl-emaillogmodel1.
-The logistic regression model can be written as, where $\hat{p}$ is the estimated probability of being a spam email message:
+The logistic regression model can be written as follows, where $\hat{p}$ is the estimated probability of being a spam email message.
 
 ```{r}
 #| include: false
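The bucketing check described in the first hunk (group observations by predicted probability, then compare each group's mean prediction to its observed spam rate) can be sketched with simulated data. The bucket count, sample size, and simulated labels are assumptions for illustration, not the book's email data:

```python
import random

random.seed(42)
# Simulated (predicted probability, actual label) pairs: labels are drawn
# so the model is well calibrated by construction.
pairs = []
for _ in range(10_000):
    p = random.random()
    pairs.append((p, 1 if random.random() < p else 0))

# 1. Bucket the observations by predicted probability.
n_buckets = 10
buckets = [[] for _ in range(n_buckets)]
for p, y in pairs:
    buckets[min(int(p * n_buckets), n_buckets - 1)].append((p, y))

# 2. Within each bucket, compare the mean prediction to the observed rate.
for b in buckets:
    mean_pred = sum(p for p, _ in b) / len(b)
    obs_rate = sum(y for _, y in b) / len(b)
    print(f"predicted {mean_pred:.2f}  observed {obs_rate:.2f}")
```

For a well-calibrated model the two columns track each other bucket by bucket; a plot of one against the other (as the text goes on to build) should hug the diagonal.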

inf-model-mlr.qmd

Lines changed: 2 additions & 2 deletions
@@ -260,7 +260,7 @@ What is the difference in total amount?
 
 ------------------------------------------------------------------------
 
-Two samples of coins with the same number of low coins (3), but a different number of total coins (4 vs 5) and a different total total amount (\$0.41 vs \$0.66).
+Two samples of coins with the same number of low coins (3), but a different number of total coins (4 vs 5) and a different total amount (\$0.41 vs \$0.66).
 
 ```{r}
 #| label: lowsame
@@ -283,7 +283,7 @@ What is the difference in total amount?
 
 ------------------------------------------------------------------------
 
-Two samples of coins with the same total number of coins (4), but a different number of low coins (3 vs 4) and a different total total amount (\$0.41 vs \$0.17).
+Two samples of coins with the same total number of coins (4), but a different number of low coins (3 vs 4) and a different total amount (\$0.41 vs \$0.17).
 
 ```{r}
 #| label: totalsame

inf-model-slr.qmd

Lines changed: 2 additions & 2 deletions
@@ -149,7 +149,7 @@ ggplot(sandwich3, aes(x = ad, y = rev)) +
 
 \vspace{-5mm}
 
-@fig-sand-samp12 shows the two samples and the least squares regressions from fig-sand-samp on the same plot.
+@fig-sand-samp12 shows the two samples and the least squares regressions from @fig-sand-samp on the same plot.
 We can see that the two lines are different.
 That is, there is **variability** in the regression line from sample to sample.
 The concept of the sampling variability is something you've seen before, but in this lesson, you will focus on the variability of the line often measured through the variability of a single statistic: **the slope of the line**.
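An aside on the sampling variability this hunk describes: two samples from the same population yield different least squares slopes. A self-contained sketch, where the population, sample size, and noise level are made up for illustration:

```python
import random

def ls_slope(xs, ys):
    # Least squares slope: sum((x - xbar)(y - ybar)) / sum((x - xbar)^2)
    xbar, ybar = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    den = sum((x - xbar) ** 2 for x in xs)
    return num / den

def sample(n=20):
    # One sample from a made-up population whose true slope is 2.
    xs = [random.uniform(0, 10) for _ in range(n)]
    ys = [2.0 * x + random.gauss(0, 3) for x in xs]
    return xs, ys

random.seed(7)
slope1 = ls_slope(*sample())
slope2 = ls_slope(*sample())
print(slope1, slope2)  # two different estimates of the same true slope
```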
@@ -723,7 +723,7 @@ In America's two-party system (the vast majority of House members through histor
 In 2020 there were 232 Democrats, 198 Republicans, and 1 Libertarian in the House.
 
 To assess the validity of the claim related to unemployment and voting patterns, we can compile historical data and look for a connection.
-We consider every midterm election from 1898 to 2018, with the exception of those elections during the Great Depression.
+We consider every midterm election from 1898 to 2018, with the exception of the elections during the Great Depression.
 The House of Representatives is made up of 435 voting members.
 
 ::: {.data data-latex=""}

inference-applications.qmd

Lines changed: 2 additions & 2 deletions
@@ -175,7 +175,7 @@ tsim_table |>
 - One-sample or differences from paired data: the observations (or differences) must be independent and nearly normal. For larger sample sizes, we can relax the nearly normal requirement, e.g., slight skew is okay for sample sizes of 15, moderate skew for sample sizes of 30, and strong skew for sample sizes of 60.
 - For a difference of means when the data are not paired: each sample mean must separately satisfy the one-sample conditions for the $t$-distribution, and the data in the groups must also be independent.
 
-- Compute the point estimate of interest, the standard error, and the degrees of freedom For $df,$ use $n-1$ for one sample, and for two samples use either statistical software or the smaller of $n_1 - 1$ and $n_2 - 1.$
+- Compute the point estimate of interest, the standard error, and the degrees of freedom. For $df,$ use $n-1$ for one sample, and for two samples use either statistical software or the smaller of $n_1 - 1$ and $n_2 - 1.$
 
 - Compute the T score and p-value.
 
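The "point estimate, standard error, and degrees of freedom" step in the hunk above, sketched numerically. The summary statistics are hypothetical, not drawn from any example in the book:

```python
import math

def t_score(xbar1, s1, n1, xbar2, s2, n2):
    # T = (difference in sample means) / SE, with SE = sqrt(s1^2/n1 + s2^2/n2)
    se = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    return (xbar1 - xbar2) / se

# Hypothetical two-sample summary statistics.
t = t_score(xbar1=5.2, s1=1.1, n1=22, xbar2=4.4, s2=1.3, n2=22)
df = min(22 - 1, 22 - 1)  # without software: smaller of n1 - 1 and n2 - 1
print(round(t, 2), df)  # 2.2 21
```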
@@ -307,7 +307,7 @@ Remember that there are a total of 44 subjects in the study (22 English and 22 S
 There are two rows in the dataset for each of the subjects: one representing data from when they were shown an image with 4 items on it and the other with 16 items on it.
 Each subject was asked 10 questions for each type of image (with a different layout of items on the image for each question).
 The variable of interest to us is `redundant_perc`, which gives the percentage of questions the subject used a redundant adjective to identify "the blue triangle".
-Note that the variable in "percentage", and we are interested in the average percentage.
+Note that the variable is "percentage", and we are interested in the average percentage.
 Therefore, we will use methods for means.
 If the variable had been "success or failure" (e.g., "used redundant or didn't"), we would have used methods for proportions.

inference-one-mean.qmd

Lines changed: 1 addition & 1 deletion
@@ -679,7 +679,7 @@ pt(-2.10, df = 18)
 \vspace{-5mm}
 
 ::: {.workedexample data-latex=""}
-What proportion of the𝑡-distribution with 20 degrees of freedom falls above 1.65?
+What proportion of the 𝑡-distribution with 20 degrees of freedom falls above 1.65?
 
 ------------------------------------------------------------------------
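The worked example in this hunk asks for the upper tail of a $t$-distribution with 20 degrees of freedom beyond 1.65 (in R, `1 - pt(1.65, df = 20)`). A Monte Carlo sketch using only Python's stdlib, since it has no built-in $t$ CDF; the draw count and seed are illustrative:

```python
import math
import random

random.seed(0)
df, cutoff, n_draws = 20, 1.65, 100_000

# A t(df) draw is Z / sqrt(ChiSq(df) / df) for independent standard normals.
count = 0
for _ in range(n_draws):
    z = random.gauss(0, 1)
    chi2 = sum(random.gauss(0, 1) ** 2 for _ in range(df))
    if z / math.sqrt(chi2 / df) > cutoff:
        count += 1

print(count / n_draws)  # roughly 0.06
```

Note the tail area is a bit larger than the normal tail beyond 1.65 (about 0.05), since the $t$-distribution has thicker tails.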

inference-one-prop.qmd

Lines changed: 2 additions & 2 deletions
@@ -127,7 +127,7 @@ The proportions that are equal to or less than $\hat{p} = 0.0484$ are shaded.
 The shaded areas represent sample proportions under the null distribution that provide at least as much evidence as $\hat{p}$ favoring the alternative hypothesis.
 There were `r medical_consultant_n_sim` simulated sample proportions with $\hat{p}_{sim} \leq 0.0484.$ We use these to construct the null distribution's left-tail area and find the p-value:
 
-$$\text{left tail area} = \frac{\text{Number of observed simulations with }\hat{p}_{sim} \leq \text{ 00.0484}}{10000}$$
+$$\text{left tail area} = \frac{\text{Number of observed simulations with }\hat{p}_{sim} \leq \text{ 0.0484}}{10000}$$
 
 Of the 10,000 simulated $\hat{p}_{sim},$ `r medical_consultant_n_sim` were equal to or smaller than $\hat{p}.$ Since the hypothesis test is one-sided, the estimated p-value is equal to this tail area: `r medical_consultant_p_val`.
 
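The simulation the hunk above refers to (10,000 null samples of $n = 62$ at the null complication rate 0.10, counting proportions at or below $3/62 \approx 0.0484$) can be sketched as follows. The seed and resulting count are illustrative, not the book's `medical_consultant_p_val`:

```python
import random

random.seed(2024)
n, p_null, p_hat = 62, 0.10, 3 / 62  # 3 complications among 62 clients
sims = 10_000

left_tail = 0
for _ in range(sims):
    # One null sample: each client independently has a 0.10 complication chance.
    complications = sum(1 for _ in range(n) if random.random() < p_null)
    if complications / n <= p_hat:
        left_tail += 1

print(left_tail / sims)  # estimated one-sided p-value, roughly 0.12
```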
@@ -554,7 +554,7 @@ The single tail area which represents the p-value is 0.2776.
 Because the p-value is larger than 0.05, we do not reject $H_0.$ The poll does not provide convincing evidence that a majority of payday loan borrowers support regulations around credit checks and evaluation of debt payments.
 
 In @sec-two-prop-errors we discuss two-sided hypothesis tests of which the payday example may have been better structured.
-That is, we might have wanted to ask whether the borrows **support or oppose** the regulations (to study opinion in either direction away from the 50% benchmark).
+That is, we might have wanted to ask whether the borrowers **support or oppose** the regulations (to study opinion in either direction away from the 50% benchmark).
 In that case, the p-value would have been doubled to 0.5552 (again, we would not reject $H_0).$ In the two-sided hypothesis setting, the appropriate conclusion would be to claim that the poll does not provide convincing evidence that a majority of payday loan borrowers support or oppose regulations around credit checks and evaluation of debt payments.
 
 In both the one-sided or two-sided setting, the conclusion is somewhat unsatisfactory because there is no conclusion.
