Bootstrap Confidence Intervals

Bootstrap Confidence IntervalsThiyanga Talagala1 / 10

Data

Model building will be done using:

Response: science (standardised)
Explanatory variables: ST004D01T, JOYSCIE, ST013Q01TA, ST012Q01TA.

library(tidyverse)
load("pisa_au.rda")
pisa_au <- pisa_au %>% 
  mutate(science = (PV1SCIE+PV2SCIE+PV3SCIE+PV4SCIE+
                    PV5SCIE+PV6SCIE+PV7SCIE+PV8SCIE+
                    PV9SCIE+PV10SCIE)/10) %>%
  select(science, ST004D01T, JOYSCIE, ST013Q01TA, ST012Q01TA, SENWT)

2 / 10

Your turnremove cases with missing values
change gender to be a factor
standardise the science scores
3 / 10

Model building

aus_glm <- glm(science_std~gender+JOYSCIE+nbooks+ntvs, data=aus_nomiss, weights=SENWT)
summary(aus_glm)

## 
## Call:
## glm(formula = science_std ~ gender + JOYSCIE + nbooks + ntvs, 
##     data = aus_nomiss, weights = SENWT)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -2.42754  -0.29052  -0.00837   0.28447   1.97050  
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.324459   0.043067  -7.534 5.27e-14 ***
## genderm      0.052947   0.015299   3.461  0.00054 ***
## JOYSCIE      0.277368   0.006645  41.744  < 2e-16 ***
## nbooks       0.216462   0.005469  39.576  < 2e-16 ***
## ntvs        -0.113736   0.010720 -10.610  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 0.2542255)
## 
##     Null deviance: 4231.3  on 12355  degrees of freedom
## Residual deviance: 3139.9  on 12351  degrees of freedom
## AIC: 35006
## 
## Number of Fisher Scoring iterations: 2

4 / 10

Classical confidence interval

${\hat{β}}_{4} \pm t_{α / 2, n - (k + 1)} \times S . E . ({\hat{β}}_{4})$

coef <- summary(aus_glm)$coefficients
n <- nrow(aus_nomiss)# no. of observations
beta_4 <- coef[4, 1] # coefficient
se_4 <- coef[4, 2] # standard error
df <- n - 5 # degree of freedom = n - (k + 1)
t_crit <- qt(0.975, df) # t critical value
c(beta_4 - t_crit * se_4, beta_4 + t_crit * se_4) # (lower, upper)

## [1] 0.2057405 0.2271826

5 / 10

Confidence intervals via bootsrap

Step 1:

boot package can generate bootsrap samples for weighted data.
To use the boot function for drawing samples, you need a function to compute the statistic of interest.
Write the function to return the slope for nbooks after fitting a glm to a bootstrap sample.

The skeleton of the function:

calc_stat <- function(d, i) { # d-data, i - vector of indices of the bootstrap sample
  x <- d[i,] 
  mod <- FILL IN THE NECESSARY CODE
  stat <- FILL IN THE NECESSARY CODE
  return(stat)
}

6 / 10

Confidence intervals via bootsrap (cont.)

Step 2:

Make a 95% bootsrap confidence interval

library(boot)
stat <- boot(aus_nomiss, statistic=calc_stat, R=1000,
     weights=aus_nomiss$SENWT)
stat

7 / 10

Confidence intervals via bootsrap (cont.)

## 
## WEIGHTED BOOTSTRAP
## 
## 
## Call:
## boot(data = aus_nomiss, statistic = calc_stat, R = 1000, weights = aus_nomiss$SENWT)
## 
## 
## Bootstrap Statistics :
##      original       bias    std. error  mean(t*)
## t1* 0.2164616 -0.001094138 0.006303779 0.2153674

The 95% bootsrap confidence interval is

c(sort(stat$t)[25],sort(stat$t)[975])

## [1] 0.2023871 0.2271530

8 / 10

Bootstrap confidence interval for predicted value

Step 1:

Write a function to calculate the predicted value for a girl, enjoys science (JOYSCIE=1.0) two TVs and 26-100 books in the home. The weight for a student like this is 0.2.

The skeleton of the function:

calc_pred <- function(d, i, newd) {
  x <- d[i,]
  mod <- FILL IN THE NECESSARY CODE
  pred <- FILL IN THE NECESSARY CODE
  return(pred)
}

Step 02:

Make a 95% bootsrap confidence interval
Be sure to convert the values back into the actual math score range

9 / 10

Prediction Interval

Compute the residuals from the fitted model
Bootstrap the residuals
Find the desired quantiles of the residuals
Compute prediction intervals by adding residual quantiles to fitted value

calc_res <- function(d, i) {
  x <- d[i,]
  mod <- FILL IN THE NECESSARY CODE
  res <- FILL IN THE NECESSARY CODE
  return(c(l=min(res), u=max(res)))
}

10 / 10

Data

Model building will be done using:

Response: science (standardised)

Explanatory variables: ST004D01T, JOYSCIE, ST013Q01TA, ST012Q01TA.

library(tidyverse) load("pisa_au.rda") pisa_au <- pisa_au %>% mutate(science = (PV1SCIE+PV2SCIE+PV3SCIE+PV4SCIE+ PV5SCIE+PV6SCIE+PV7SCIE+PV8SCIE+ PV9SCIE+PV10SCIE)/10) %>% select(science, ST004D01T, JOYSCIE, ST013Q01TA, ST012Q01TA, SENWT)

↑, ←, Pg Up, k	Go to previous slide
↓, →, Pg Dn, Space, j	Go to next slide
Home	Go to first slide
End	Go to last slide
Number + Return	Go to specific slide
b / m / f	Toggle blackout / mirrored / fullscreen mode
c	Clone slideshow
p	Toggle presenter mode
t	Restart the presentation timer
?, h	Toggle this help

Bootstrap Confidence Intervals

Thiyanga Talagala

Data

Your turn

Model building

Classical confidence interval

Confidence intervals via bootsrap

Confidence intervals via bootsrap (cont.)

Confidence intervals via bootsrap (cont.)

Bootstrap confidence interval for predicted value

Prediction Interval

Data

Help