Bootstrapping and Confidence Intervals

Based on Chapter 8 of ModernDive. Code for Quiz 12.

Load the R packages we will use.
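A minimal sketch of this step. It assumes the usual ModernDive setup: tidyverse (for %>%, summarize(), pull() and ggplot2), moderndive, and infer (for rep_sample_n(), specify(), generate(), calculate(), visualize(), get_confidence_interval() and shade_confidence_interval()). It also assumes that congress_age comes from the fivethirtyeight package. Adjust this if your copy of the data comes from somewhere else.

```r
# Assumed package set, inferred from the functions used in this document
library(tidyverse)       # %>%, summarize(), pull(), and ggplot2's geom_vline()
library(moderndive)      # ModernDive companion package
library(infer)           # rep_sample_n(), specify(), generate(), calculate(), ...
library(fivethirtyeight) # assumed source of the congress_age data set
```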

What is the average age of members who have served in Congress?

- Set the random number generator seed to 123.
- Take a sample of 100 rows from the data set congress_age and assign it to congress_age_100.

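set.seed(123)
# One sample of 100 rows, drawn without replacement
# (rep_sample_n() defaults: reps = 1, replace = FALSE)
congress_age_100 <- congress_age %>%
  rep_sample_n(size = 100)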
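For intuition, here is a base-R sketch of the same idea. It draws 100 row indices without replacement, so each member appears at most once in the sample. The function name draw_sample() is hypothetical, and unlike rep_sample_n() it does not add a replicate column.

```r
# Hypothetical helper: a simple random sample of n rows, without replacement
draw_sample <- function(df, n) {
  # without replacement we cannot draw more rows than the data contain
  stopifnot(n <= nrow(df))
  rows <- sample.int(nrow(df), size = n, replace = FALSE)
  df[rows, , drop = FALSE]
}
```

For example, `set.seed(123); draw_sample(congress_age, 100)` returns 100 distinct rows. Contrast this with the bootstrap steps below, which resample *with* replacement.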

Construct the confidence interval

1. Use specify() to indicate the variable in congress_age_100 that you are interested in, here age.

congress_age_100 %>% 
  specify(response = age)
Response: age (numeric)
# A tibble: 100 x 1
     age
   <dbl>
 1  53.1
 2  54.9
 3  65.3
 4  60.1
 5  43.8
 6  57.9
 7  55.3
 8  46  
 9  42.1
10  37  
# ... with 90 more rows

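2. Generate 1,000 bootstrap replicates of the sample of 100 by resampling it with replacement.

congress_age_100 %>% 
  specify(response = age) %>% 
  generate(reps = 1000, type = "bootstrap")

The output has 100,000 rows: 1,000 replicates of 100 resampled ages each.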

3. Calculate the mean for each replicate

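bootstrap_distribution_mean_age <- congress_age_100 %>% 
  specify(response = age) %>% 
  generate(reps = 1000, type = "bootstrap") %>% 
  calculate(stat = "mean")
bootstrap_distribution_mean_age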
# A tibble: 1,000 x 2
   replicate  stat
 *     <int> <dbl>
 1         1  51.7
 2         2  55.6
 3         3  54.2
 4         4  52.8
 5         5  53.1
 6         6  52.7
 7         7  53.5
 8         8  52.9
 9         9  52.4
10        10  52.6
# ... with 990 more rows
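Steps 2 and 3 together amount to a resampling loop. The following base-R sketch is for intuition and assumes a plain numeric vector. bootstrap_means() is a hypothetical name, and its values will not match infer's output draw for draw.

```r
# Hypothetical helper: bootstrap distribution of the sample mean.
# Each replicate resamples length(x) values from x WITH replacement
# and records the mean of that resample.
bootstrap_means <- function(x, reps = 1000) {
  n <- length(x)
  # index via sample.int() so a length-1 x is not treated as 1:x by sample()
  idx <- sample.int(n, size = n * reps, replace = TRUE)
  resamples <- matrix(x[idx], nrow = n, ncol = reps)  # one column per replicate
  colMeans(resamples)
}
```

For example, bootstrap_means(congress_age_100$age) returns 1,000 means, one per replicate. These play the same role as the stat column of bootstrap_distribution_mean_age.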

4. Visualize the bootstrap distribution

visualize(bootstrap_distribution_mean_age)
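visualize() draws a histogram of the stat column. Below is a rough ggplot2 equivalent. It assumes visualize()'s default of 15 bins. plot_bootstrap() is a hypothetical name, and its styling will differ from infer's.

```r
library(ggplot2)

# Hypothetical helper: histogram of bootstrap statistics stored in a `stat` column
plot_bootstrap <- function(boot, bins = 15) {
  ggplot(boot, aes(x = stat)) +
    geom_histogram(bins = bins, color = "white") +
    labs(x = "stat", y = "count")
}
```

For example, plot_bootstrap(bootstrap_distribution_mean_age) should look much like the visualize() output.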

Calculate the 95% confidence interval using the percentile method:

- Assign the output to congress_ci_percentile.
- Display congress_ci_percentile.

congress_ci_percentile <- bootstrap_distribution_mean_age %>% 
  get_confidence_interval(type = "percentile", level = 0.95)
congress_ci_percentile
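The percentile method keeps the middle 95% of the bootstrap distribution, so its endpoints are the 2.5th and 97.5th percentiles of the bootstrap statistics. The base-R sketch below assumes a numeric vector of bootstrap statistics. percentile_ci() is a hypothetical name, and its result should agree closely with get_confidence_interval(type = "percentile") on the same stat values.

```r
# Hypothetical helper: percentile-method confidence interval
percentile_ci <- function(stat, level = 0.95) {
  alpha <- 1 - level
  # leave alpha / 2 of the bootstrap statistics in each tail
  q <- quantile(stat, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
  c(lower_ci = q[1], upper_ci = q[2])
}
```

For example, percentile_ci(bootstrap_distribution_mean_age$stat) gives the two endpoints as a named vector.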

- Calculate the observed point estimate of the mean and assign it to obs_mean_age.
- Display obs_mean_age.

obs_mean_age <- congress_age_100 %>% 
  specify(response = age) %>% 
  calculate(stat = "mean") %>% 
  pull()
obs_mean_age
[1] 53.36
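
Plot the bootstrap distribution with the 95% confidence interval shaded and the observed mean marked by a pink vertical line.

visualize(bootstrap_distribution_mean_age) +
  shade_confidence_interval(endpoints = congress_ci_percentile) +
  # linewidth replaces the deprecated size argument for lines in ggplot2 >= 3.4.0
  geom_vline(xintercept = obs_mean_age, color = "hotpink", linewidth = 1)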

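Compute the population mean age, treating the full congress_age data set as the population, and assign it to pop_mean_age.

pop_mean_age <- congress_age %>% 
  summarize(pop_mean = mean(age)) %>% 
  pull()
pop_mean_age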
[1] 53.31373
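
Add the population mean as a thicker purple vertical line to see whether the 95% confidence interval captures it.

visualize(bootstrap_distribution_mean_age) +
  shade_confidence_interval(endpoints = congress_ci_percentile) +
  geom_vline(xintercept = obs_mean_age, color = "hotpink", linewidth = 1) +
  geom_vline(xintercept = pop_mean_age, color = "purple", linewidth = 3)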
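The final plot shows by eye whether the interval captures the population mean, and the same check can be done numerically. covers() in the sketch below is a hypothetical helper. It assumes the interval is a one-row data frame with lower_ci and upper_ci columns, which is the form get_confidence_interval() returns in recent versions of infer.

```r
# Hypothetical helper: TRUE if value lies inside [lower_ci, upper_ci]
covers <- function(ci, value) {
  value >= ci$lower_ci && value <= ci$upper_ci
}
```

For example, covers(congress_ci_percentile, pop_mean_age) returns TRUE exactly when the purple line falls inside the shaded region.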