+ - 0:00:00
Notes for current slide
Notes for next slide

Bootstrap Confidence Intervals

Thiyanga Talagala

1 / 10

Data

Model building will be done using:

  • Response: science (standardised)
  • Explanatory variables: ST004D01T, JOYSCIE, ST013Q01TA, ST012Q01TA.
library(tidyverse)
load("pisa_au.rda")
pisa_au <- pisa_au %>%
mutate(science = (PV1SCIE+PV2SCIE+PV3SCIE+PV4SCIE+
PV5SCIE+PV6SCIE+PV7SCIE+PV8SCIE+
PV9SCIE+PV10SCIE)/10) %>%
select(science, ST004D01T, JOYSCIE, ST013Q01TA, ST012Q01TA, SENWT)
2 / 10

Your turn

  • remove cases with missing values
  • change gender to be a factor
  • standardise the science scores
3 / 10

Model building

aus_glm <- glm(science_std~gender+JOYSCIE+nbooks+ntvs, data=aus_nomiss, weights=SENWT)
summary(aus_glm)
##
## Call:
## glm(formula = science_std ~ gender + JOYSCIE + nbooks + ntvs,
## data = aus_nomiss, weights = SENWT)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.42754 -0.29052 -0.00837 0.28447 1.97050
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.324459 0.043067 -7.534 5.27e-14 ***
## genderm 0.052947 0.015299 3.461 0.00054 ***
## JOYSCIE 0.277368 0.006645 41.744 < 2e-16 ***
## nbooks 0.216462 0.005469 39.576 < 2e-16 ***
## ntvs -0.113736 0.010720 -10.610 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 0.2542255)
##
## Null deviance: 4231.3 on 12355 degrees of freedom
## Residual deviance: 3139.9 on 12351 degrees of freedom
## AIC: 35006
##
## Number of Fisher Scoring iterations: 2
4 / 10

Classical confidence interval

β^4±tα/2,n(k+1)×S.E.(β^4)

coef <- summary(aus_glm)$coefficients
n <- nrow(aus_nomiss)# no. of observations
beta_4 <- coef[4, 1] # coefficient
se_4 <- coef[4, 2] # standard error
df <- n - 5 # degree of freedom = n - (k + 1)
t_crit <- qt(0.975, df) # t critical value
c(beta_4 - t_crit * se_4, beta_4 + t_crit * se_4) # (lower, upper)
## [1] 0.2057405 0.2271826
5 / 10

Confidence intervals via bootsrap

Step 1:

  • boot package can generate bootsrap samples for weighted data.
  • To use the boot function for drawing samples, you need a function to compute the statistic of interest.
  • Write the function to return the slope for nbooks after fitting a glm to a bootstrap sample.

The skeleton of the function:

calc_stat <- function(d, i) { # d-data, i - vector of indices of the bootstrap sample
x <- d[i,]
mod <- FILL IN THE NECESSARY CODE
stat <- FILL IN THE NECESSARY CODE
return(stat)
}
6 / 10

Confidence intervals via bootsrap (cont.)

Step 2:

  • Make a 95% bootsrap confidence interval
library(boot)
stat <- boot(aus_nomiss, statistic=calc_stat, R=1000,
weights=aus_nomiss$SENWT)
stat
7 / 10

Confidence intervals via bootsrap (cont.)

##
## WEIGHTED BOOTSTRAP
##
##
## Call:
## boot(data = aus_nomiss, statistic = calc_stat, R = 1000, weights = aus_nomiss$SENWT)
##
##
## Bootstrap Statistics :
## original bias std. error mean(t*)
## t1* 0.2164616 -0.001094138 0.006303779 0.2153674

The 95% bootsrap confidence interval is

c(sort(stat$t)[25],sort(stat$t)[975])
## [1] 0.2023871 0.2271530
8 / 10

Bootstrap confidence interval for predicted value

Step 1:

  • Write a function to calculate the predicted value for a girl, enjoys science (JOYSCIE=1.0) two TVs and 26-100 books in the home. The weight for a student like this is 0.2.

The skeleton of the function:

calc_pred <- function(d, i, newd) {
x <- d[i,]
mod <- FILL IN THE NECESSARY CODE
pred <- FILL IN THE NECESSARY CODE
return(pred)
}

Step 02:

  • Make a 95% bootsrap confidence interval
  • Be sure to convert the values back into the actual math score range
9 / 10

Prediction Interval

  • Compute the residuals from the fitted model
  • Bootstrap the residuals
  • Find the desired quantiles of the residuals
  • Compute prediction intervals by adding residual quantiles to fitted value
calc_res <- function(d, i) {
x <- d[i,]
mod <- FILL IN THE NECESSARY CODE
res <- FILL IN THE NECESSARY CODE
return(c(l=min(res), u=max(res)))
}
10 / 10

Data

Model building will be done using:

  • Response: science (standardised)
  • Explanatory variables: ST004D01T, JOYSCIE, ST013Q01TA, ST012Q01TA.
library(tidyverse)
load("pisa_au.rda")
pisa_au <- pisa_au %>%
mutate(science = (PV1SCIE+PV2SCIE+PV3SCIE+PV4SCIE+
PV5SCIE+PV6SCIE+PV7SCIE+PV8SCIE+
PV9SCIE+PV10SCIE)/10) %>%
select(science, ST004D01T, JOYSCIE, ST013Q01TA, ST012Q01TA, SENWT)
2 / 10
Paused

Help

Keyboard shortcuts

, , Pg Up, k Go to previous slide
, , Pg Dn, Space, j Go to next slide
Home Go to first slide
End Go to last slide
Number + Return Go to specific slide
b / m / f Toggle blackout / mirrored / fullscreen mode
c Clone slideshow
p Toggle presenter mode
t Restart the presentation timer
?, h Toggle this help
Esc Back to slideshow