sample_props_small often has fewer than the requested 25 elements.
The call to

 filter(scientist_work == "Doesn't benefit")

drops every replicate that has no "Doesn't benefit" responses in the small sample. As a result, any replicate with p_hat = 0 is filtered out and never displayed.

This issue is caused by using a small sample size and a true proportion close to 0 (p=.2).
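
As a rough illustration of how often this can happen, the chance that a sample contains no "Doesn't benefit" responses is (1 - p)^n. The sketch below uses p = 0.2 from this message; the sample size `n = 10` is an assumed value for illustration only, and `reps = 25` is taken from the commit title.

```r
# p comes from the commit message; n is a hypothetical small sample size,
# chosen only for illustration. reps matches the 25 elements in the title.
p <- 0.2     # true proportion of "Doesn't benefit"
n <- 10      # assumed small sample size (not stated in this commit)
reps <- 25   # number of replicates requested

# Probability that a single sample of size n has zero "Doesn't benefit" responses
p_zero <- (1 - p)^n

# Expected number of replicates the old filter() step would have dropped
expected_dropped <- reps * p_zero

p_zero            # about 0.107
expected_dropped  # about 2.68
```

Under these assumed values, roughly two or three of the 25 replicates would be missing on a typical run.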

I have replaced this filtering code with the following:

 group_by(replicate) %>% summarize(p_hat = mean(scientist_work == "Doesn't benefit"))
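
A minimal sketch of the difference, using a hand-built toy data frame rather than the lab's data. The object names `toy`, `old`, and `new` are hypothetical; replicate 2 deliberately contains no "Doesn't benefit" responses.

```r
library(dplyr)

# Hypothetical toy data: 3 replicates of size 4.
# Replicate 2 has no "Doesn't benefit" responses.
toy <- tibble(
  replicate = rep(1:3, each = 4),
  scientist_work = c(
    "Benefits", "Doesn't benefit", "Benefits", "Benefits",
    "Benefits", "Benefits", "Benefits", "Benefits",
    "Doesn't benefit", "Doesn't benefit", "Benefits", "Benefits"
  )
)

# Old approach: count, compute proportions, then filter.
# A replicate with no "Doesn't benefit" row has nothing left after filter().
old <- toy %>%
  group_by(replicate) %>%
  count(scientist_work) %>%
  mutate(p_hat = n / sum(n)) %>%
  filter(scientist_work == "Doesn't benefit")

# New approach: one row per replicate, including p_hat = 0.
new <- toy %>%
  group_by(replicate) %>%
  summarize(p_hat = mean(scientist_work == "Doesn't benefit"))

nrow(old)  # 2: replicate 2 is missing
nrow(new)  # 3: every replicate is kept
new$p_hat  # 0.25 0.00 0.50
```

Because `mean()` of a logical vector is the fraction of `TRUE` values, a replicate with no matches yields 0 instead of disappearing.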

Fixes OpenIntroStat#107.
mamcisaac committed Oct 31, 2022
1 parent 410c112 commit 2be915a
Showing 1 changed file with 6 additions and 11 deletions.
17 changes: 6 additions & 11 deletions 05a_sampling_distributions/sampling_distributions.Rmd
@@ -114,9 +114,7 @@ samp1 %>%
```{r inline-calc, include=FALSE}
# For use inline below
samp1_p_hat <- samp1 %>%
- count(scientist_work) %>%
- mutate(p_hat = n /sum(n)) %>%
- filter(scientist_work == "Doesn't benefit") %>%
+ summarize(p_hat = mean(scientist_work=="Doesn't benefit")) %>%
pull(p_hat) %>%
round(2)
```
@@ -138,15 +136,14 @@ Not surprisingly, every time you take another random sample, you might get a dif
It's useful to get a sense of just how much variability you should expect when estimating the population mean this way.
The distribution of sample proportions, called the *sampling distribution (of the proportion)*, can help you understand this variability.
In this lab, because you have access to the population, you can build up the sampling distribution for the sample proportion by repeating the above steps many times.
- Here, we use R to take 15,000 different samples of size 50 from the population, calculate the proportion of responses in each sample, filter for only the *Doesn't benefit* responses, and store each result in a vector called `sample_props50`.
+ Here, we use R to take 15,000 different samples of size 50 from the population, calculate the proportion of responses in each sample, count the *Doesn't benefit* responses, and store each result in a vector called `sample_props50`.
Note that we specify that `replace = TRUE` since sampling distributions are constructed by sampling with replacement.

```{r iterate}
sample_props50 <- global_monitor %>%
rep_sample_n(size = 50, reps = 15000, replace = TRUE) %>%
- count(scientist_work) %>%
- mutate(p_hat = n /sum(n)) %>%
- filter(scientist_work == "Doesn't benefit")
+ group_by(replicate)%>%
+ summarize(n = sum(scientist_work=="Doesn't benefit"), p_hat = mean(scientist_work=="Doesn't benefit"))
```

And we can visualize the distribution of these proportions with a histogram.
@@ -179,9 +176,7 @@ We would have to manually run the following code 15,000 times
```{r sample-code}
global_monitor %>%
sample_n(size = 50, replace = TRUE) %>%
- count(scientist_work) %>%
- mutate(p_hat = n /sum(n)) %>%
- filter(scientist_work == "Doesn't benefit")
+ summarize(n = sum(scientist_work=="Doesn't benefit"), p_hat = mean(scientist_work=="Doesn't benefit"))
```

as well as store the resulting sample proportions each time in a separate vector.
@@ -326,4 +321,4 @@ You are welcome to use the app for exploration.

------------------------------------------------------------------------

- <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">![Creative Commons License](https://i.creativecommons.org/l/by-sa/4.0/88x31.png){style="border-width:0"}</a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">Creative Commons Attribution-ShareAlike 4.0 International License</a>.
+ <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">![Creative Commons License](https://i.creativecommons.org/l/by-sa/4.0/88x31.png){style="border-width:0"}</a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">Creative Commons Attribution-ShareAlike 4.0 International License</a>.
