Based on Chapter 8 of ModernDive. Code for Quiz 12.
Load the R packages we will use.
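The load call itself is not shown here. A minimal sketch, assuming the usual ModernDive Chapter 8 setup plus the fivethirtyeight package (which ships the congress_age dataset); the exact set of packages used for the quiz is an assumption.

```r
# Assumed package set; the quiz text does not list it explicitly
library(tidyverse)        # pipe (%>%), dplyr verbs, ggplot2
library(moderndive)       # ModernDive companion package
library(infer)            # specify(), generate(), calculate(), visualize()
library(fivethirtyeight)  # congress_age dataset
```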
What is the average age of members who have served in Congress?
- Set the random seed to 123
- Take a sample of 100 from the dataset congress_age and assign it to congress_age_100
set.seed(123)
congress_age_100 <- congress_age %>%
rep_sample_n(size = 100)
Construct the confidence interval
1. Use specify to indicate the variable from congress_age_100 that you are interested in
congress_age_100 %>%
specify(response = age)
Response: age (numeric)
# A tibble: 100 x 1
age
<dbl>
1 53.1
2 54.9
3 65.3
4 60.1
5 43.8
6 57.9
7 55.3
8 46
9 42.1
10 37
# ... with 90 more rows
2. Generate 1000 replicates of your sample of 100
The output has 100,000 rows
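To see where the 100,000 rows come from, here is a rough base R sketch of what bootstrap resampling does: each of the 1000 replicates draws 100 values with replacement from the original sample. The `ages` vector is simulated stand-in data, not the real congress_age_100 sample, and `boot_samples` is a hypothetical name; with infer the same step is `generate(reps = 1000, type = "bootstrap")`.

```r
set.seed(123)

# Hypothetical stand-in for congress_age_100$age (simulated, not the real data)
ages <- round(rnorm(100, mean = 53, sd = 10), 1)

reps <- 1000
n <- length(ages)

# Each replicate resamples n values with replacement from the original sample
boot_samples <- data.frame(
  replicate = rep(seq_len(reps), each = n),
  age = unlist(lapply(seq_len(reps), function(i) {
    sample(ages, size = n, replace = TRUE)
  }))
)

# 1000 replicates x 100 observations each
nrow(boot_samples)
```

Because every replicate has the same size as the original sample, the row count is simply reps times n.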
3. Calculate the mean for each replicate
bootstrap_distribution_mean_age <- congress_age_100 %>%
  specify(response = age) %>%
  generate(reps = 1000, type = "bootstrap") %>%
  calculate(stat = "mean")
bootstrap_distribution_mean_age
# A tibble: 1,000 x 2
replicate stat
* <int> <dbl>
1 1 51.7
2 2 55.6
3 3 54.2
4 4 52.8
5 5 53.1
6 6 52.7
7 7 53.5
8 8 52.9
9 9 52.4
10 10 52.6
# ... with 990 more rows
4. Visualize the bootstrap distribution
visualize(bootstrap_distribution_mean_age)
Calculate the 95% confidence interval using the percentile method
- Assign the output to congress_ci_percentile
- Display congress_ci_percentile
congress_ci_percentile <- bootstrap_distribution_mean_age %>%
  get_confidence_interval(type = "percentile", level = 0.95)
congress_ci_percentile
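For intuition, the percentile method just takes the middle 95% of the bootstrap means, that is, the 2.5th and 97.5th percentiles. A minimal base R sketch, again on simulated stand-in data rather than the real sample (`ages`, `boot_means`, and `ci` are hypothetical names, and the resulting numbers will not match the quiz output):

```r
set.seed(123)

# Hypothetical stand-in for congress_age_100$age (simulated, not the real data)
ages <- round(rnorm(100, mean = 53, sd = 10), 1)

# Bootstrap distribution of the mean: resample with replacement, take the mean
boot_means <- replicate(1000, mean(sample(ages, replace = TRUE)))

# Percentile method: cut off 2.5% in each tail of the bootstrap distribution
ci <- quantile(boot_means, probs = c(0.025, 0.975))
ci
```

The two values returned by `quantile()` play the role of the lower and upper endpoints of the interval.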
- Calculate the observed point estimate of the mean and assign it to obs_mean_age
- Display obs_mean_age
obs_mean_age <- congress_age_100 %>%
specify(response = age) %>%
calculate(stat = "mean") %>%
pull()
obs_mean_age
[1] 53.36
visualize(bootstrap_distribution_mean_age) +
shade_confidence_interval(endpoints = congress_ci_percentile) +
geom_vline(xintercept = obs_mean_age, color = "hotpink", size = 1)
pop_mean_age <- congress_age %>%
  summarize(pop_mean = mean(age)) %>%
  pull()
pop_mean_age
[1] 53.31373
visualize(bootstrap_distribution_mean_age) +
shade_confidence_interval(endpoints = congress_ci_percentile) +
geom_vline(xintercept = obs_mean_age, color = "hotpink", size = 1) +
geom_vline(xintercept = pop_mean_age, color = "purple", size = 3)