---
title: "Applied microeconometrics"
subtitle: "Weeks 7 and 8 - Instrumental variables"
author: "Josh Merfeld"
institute: "KDI School"
date: "2024-11-04"

date-format: long
format: 
  revealjs:
    self-contained: true
    slide-number: false
    progress: false
    theme: [serif, custom.scss]
    width: 1500
    height: 1500*(9/16)
    code-copy: true
    code-fold: show
    code-overflow: wrap
    highlight-style: github
execute:
  echo: true
  warning: false
  message: false
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE, dev = "png") # NOTE: switched to png instead of pdf to decrease size of the resulting pdf

def.chunk.hook  <- knitr::knit_hooks$get("chunk")
knitr::knit_hooks$set(chunk = function(x, options) {
  x <- def.chunk.hook(x, options)
  #ifelse(options$size != "a", paste0("\n \\", "tiny","\n\n", x, "\n\n \\normalsize"), x)
  ifelse(options$size != "normalsize", paste0("\n \\", options$size,"\n\n", x, "\n\n \\normalsize"), x)
})

knitr::knit_hooks$set(crop = knitr::hook_pdfcrop)

library(tidyverse)
library(kableExtra)
library(fixest)
library(ggpubr)
library(RColorBrewer)
library(haven)
library(fwildclusterboot)
library(modelsummary)
library(terra)
library(tidyterra)
library(cowplot)

```


## What are we doing today?

- Introduction to IVs
  - Requirements/assumptions

- IVs and RCTs

- In a world of LATE

- Weak instruments


## Instrumental variables

- Instrumental variables (IVs) are a way to estimate causal effects when we have endogeneity
  - The endogeneity can take many forms: omitted variables, measurement error, simultaneity, etc.

- Consider my paper: effects of pollution on agricultural productivity
  - What's the problem with simply regressing productivity on pollution?


## Endogeneity in the pollution example


![](week7assets/pollution1.png){fig-align="center"}


## Endogeneity in the pollution example


![](week7assets/pollution2.png){fig-align="center"}


## Putting structure on this

- What we really want to estimate is this:
\begin{gather} \label{eq:iv1} productivity_{it} = \beta_0 + \beta_1 pollution_{it} + \epsilon_{it} \end{gather}
where $\beta_1$ is the causal effect of pollution on productivity.


- Endogeneity is defined as $cov(pollution_{it}, \epsilon_{it})\neq0$
  - That is, the error term is correlated with the endogenous variable
  - A common example is omitted variables


## Putting structure on this

\begin{gather} \tag{1} productivity_{it} = \beta_0^* + \beta_1^* pollution_{it} + \epsilon_{it}^* \end{gather}

- When we estimate this, due to the way OLS works, the residuals and pollution will be orthogonal
  - That is, $cov(pollution_{it}, \epsilon_{it}^*)=0$
  - This is a property of OLS


- However, the issue is that under endogeneity, $\beta^*_1\neq\beta_1$
  - That is, the OLS estimate of $\beta_1$ is biased *for the true structural parameter*


## Putting structure on this
- Another way to think about it is that what we want to estimate is this:
\begin{gather} productivity_{it} = \beta_0 + \beta_1 pollution_{it} + \beta_2 X_{it} + \epsilon_{it} \end{gather}

- But if we don't properly control for everything -- in this case $X_{it}$ -- we are really estimating this:
\begin{gather} \label{eq:iv2} productivity_{it} = \tilde{\beta_0} + \tilde{\beta_1} pollution_{it} + \eta_{it}, \end{gather}
where $\eta_{it} = \beta_2 X_{it} + \epsilon_{it}$.
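

## Sketch: omitted variable bias in a simulation

A minimal simulation of the two equations above, using made-up data rather than anything from the paper. All `sim_*` names and coefficient values are assumptions chosen for illustration: the true effect of pollution is set to -0.5 and the omitted variable plays the role of $X_{it}$.

```{r sketch_ovb, echo = TRUE, eval = TRUE, message = FALSE, warning = TRUE, size = "tiny"}
set.seed(7)
sim_n <- 5000
# hypothetical omitted variable X (think: local economic growth)
sim_growth <- rnorm(sim_n)
# pollution rises with growth, so it is correlated with the omitted variable
sim_pollution <- 0.8 * sim_growth + rnorm(sim_n)
# assumed true effect of pollution is -0.5; growth raises productivity
sim_productivity <- 1 - 0.5 * sim_pollution + 1.0 * sim_growth + rnorm(sim_n)

sim_long  <- lm(sim_productivity ~ sim_pollution + sim_growth) # controls for X
sim_short <- lm(sim_productivity ~ sim_pollution)              # X is left in the error term

# compare the pollution coefficient with and without the control
round(c(long  = unname(coef(sim_long)["sim_pollution"]),
        short = unname(coef(sim_short)["sim_pollution"])), 3)
```

The long regression should land near -0.5, while the short regression absorbs part of the growth effect into the pollution coefficient.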


## Differences in differences?

- One solution is to use a differences-in-differences (DiD) approach

- This requires the assumption of parallel trends
  - That is, the trends in the outcome variable would have been the same in the absence of the treatment

- But what if changing economic growth is leading to changes in both pollution and productivity?
  - Then the parallel trends assumption is violated since areas with more pollution are also experiencing faster economic growth


## Control for growth?

- If you're willing to make assumptions about what the omitted variables are, maybe you could control for them

- But this is a strong assumption
  - No matter what we do, we'll have to make assumptions, though


## Enter: instruments

- Let's take a different approach

- We'll use an instrument
  - A variable that is correlated with the endogenous variable (pollution) but is not correlated with the error term


## Instrument in the pollution example


![](week7assets/pollution3.png){fig-align="center"}


## Requirements of an instrument

- I very purposefully created the example so that the instrument is correlated with pollution
  - But it's not *directly* correlated with productivity
  - And it's not correlated with the omitted variable (the error term... will show you this in a second)

- Let's look at these more formally


## Back to our problem

\begin{gather} \tag{3} productivity_{it} = \tilde{\beta_0} + \tilde{\beta_1} pollution_{it} + \eta_{it} \end{gather}

- Can we estimate a version of this equation -- that is, without controlling for $X_{it}$ -- and still get causal effects?

- Maybe, if we can find a valid instrument.

- So what makes an instrument valid?


## What else can instruments help with?

- It turns out IVs can also help with measurement error
  - If we have a variable that is measured with error, we can use an instrument to correct for this

- From Hansen, consider the model:
\begin{gather} X = Q + u, \end{gather}
where $X$ is the variable we observe, $Q$ is the variable we want to measure, and $u$ is measurement error.

- Assume that $cov(u, Q)=0$, so that the measurement error is *random*, i.e. uncorrelated with the true value of $Q$.
  - This is known as classical measurement error


## Classical measurement error and attenuation bias

- We want to estimate:
\begin{gather} Y = \beta_0 + \beta_1 Q + \epsilon, \end{gather}
but what we really estimate is:
\begin{gather} Y = \tilde{\beta}_0 + \tilde{\beta}_1 X + \tilde{\epsilon} = \tilde{\beta}_0 + \tilde{\beta}_1 (Q + u) + \tilde{\epsilon} \end{gather}


## Classical measurement error and attenuation bias

- This is what we get:
\begin{gather} \tilde{\beta}_1 = \beta_1\left(1-\frac{\mathbb{E}(u^2)}{\mathbb{E}(X^2)}\right) \end{gather}

- By definition, $\mathbb{E}(X^2)>\mathbb{E}(u^2)$, so $|\tilde{\beta}_1|<|\beta_1|$.
  - Why is this true? 
  - That is, the OLS estimate of $\beta_1$ is biased *towards zero*
  - This is called attenuation bias, but it is only guaranteed when the measurement error is classical (random)
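

## Sketch: attenuation bias in a simulation

A small check of the attenuation formula on simulated data. The `att_*` names, the true slope of 2, and the error standard deviation of 0.5 are assumptions for illustration, and both $Q$ and $u$ are drawn with mean zero so that second moments equal variances.

```{r sketch_attenuation, echo = TRUE, eval = TRUE, message = FALSE, warning = TRUE, size = "tiny"}
set.seed(8)
att_n <- 10000
att_q <- rnorm(att_n)              # true regressor Q (mean zero)
att_u <- rnorm(att_n, sd = 0.5)    # classical measurement error, independent of Q
att_x <- att_q + att_u             # observed X = Q + u
att_y <- 2 * att_q + rnorm(att_n)  # assumed true beta_1 = 2

# OLS slope using the mismeasured regressor
att_ols <- unname(coef(lm(att_y ~ att_x))["att_x"])
# slope implied by the attenuation formula, using sample second moments
att_pred <- 2 * (1 - mean(att_u^2) / mean(att_x^2))

round(c(ols = att_ols, formula = att_pred), 3)
```

With these assumed variances the formula implies a slope of about $2\times(1-0.25/1.25)=1.6$, well below the true value of 2.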


## Requirements for an instrument

\begin{gather} \tag{3} productivity_{it} = \tilde{\beta_0} + \tilde{\beta_1} pollution_{it} + \eta_{it} \end{gather}

1. The instrument must be correlated with the endogenous variable (pollution)

2. The instrument must not be correlated with the error term ($\eta_{it}$)
      - Note that this implies two things:
        - The instrument must not be correlated with any omitted variable (here $X_{it}$)
        - The instrument must not directly affect the outcome ($productivity_{it}$)


## Using an instrument

- If we can find a valid instrument, we can use it to estimate the causal effect of pollution on productivity

- The simplest example uses two stages:
  1. $pollution_{it} = \pi_0 + \pi_1 instrument_{it} + \nu_{it}$
  2. $productivity_{it} = \phi_0 + \phi_1 pollution_{it} + \zeta_{it}$

- We can then estimate $\phi_1$ using OLS
  - Note that only under certain circumstances will $\phi_1=\beta_1$
  - More on this later


## The intuition with Venn diagrams

![](week7assets/iv1.png){fig-align="center"}


## The IV only affects productivity through pollution

![](week7assets/iv2.png){fig-align="center"}


## This doesn't work. Direct effects on productivity!

![](week7assets/iv3.png){fig-align="center"}


## This doesn't work. Correlated with growth!

![](week7assets/iv4.png){fig-align="center"}


## Back to our "two stages", redefining names

$$\text{Stage}\;1:\;T_{it} = \pi_0 + \pi_1 Z_{it} + \nu_{it}$$
$$\text{Stage}\;2:\;Y_{it} = \phi_0 + \phi_1 T_{it} + \zeta_{it}$$


- Requirements:
  - $cov(Z_{it}, T_{it}) \neq 0$
  - $cov(Z_{it}, \zeta_{it}) = 0$

- We first regress T on the instrument to get $\hat{T}_{it}$
- Then, we use the predicted values of T to estimate the effects on Y
  - If the IV is valid, these predicted values *are unrelated to the omitted variables!*
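

## Sketch: the two stages by hand

The two stages can be sketched with base R on simulated data. Everything here is hypothetical: the `tsls_*` names, the omitted variable `tsls_w`, and the assumed true effect of 0.5. The instrument is valid by construction because it is drawn independently of the omitted variable.

```{r sketch_tsls, echo = TRUE, eval = TRUE, message = FALSE, warning = TRUE, size = "tiny"}
set.seed(9)
tsls_n <- 5000
tsls_w <- rnorm(tsls_n)                                   # omitted variable
tsls_z <- rnorm(tsls_n)                                   # instrument, independent of tsls_w
tsls_t <- 0.7 * tsls_z + 0.8 * tsls_w + rnorm(tsls_n)     # endogenous T
tsls_y <- 1 + 0.5 * tsls_t + 1.0 * tsls_w + rnorm(tsls_n) # assumed true effect of T is 0.5

# stage 1: regress T on Z and keep the predicted values
tsls_that <- fitted(lm(tsls_t ~ tsls_z))
# stage 2: regress Y on the predicted values (point estimate only; these SEs are not valid)
tsls_iv <- unname(coef(lm(tsls_y ~ tsls_that))["tsls_that"])
# naive OLS for comparison
tsls_ols <- unname(coef(lm(tsls_y ~ tsls_t))["tsls_t"])

round(c(ols = tsls_ols, iv = tsls_iv, cor_that_w = cor(tsls_that, tsls_w)), 3)
```

The predicted values are a linear function of the instrument alone, so their correlation with the omitted variable is close to zero, and the second-stage coefficient should be near 0.5 while OLS is not.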


## Some comments
$$\text{Stage}\;1:\;T_{it} = \pi_0 + \pi_1 Z_{it} + \nu_{it}$$


\begin{gather}cov(Z_{it}, T_{it}) \neq 0\end{gather}

- This is the first requirement

- We can test this!
  - F-test of all *excluded instruments* in the first stage
  - I say all excluded instruments because you can technically have more than one


## Some comments
$$\text{Stage}\;1:\;T_{it} = \pi_0 + \pi_1 Z_{it} + \nu_{it}$$
$$\text{Stage}\;2:\;Y_{it} = \phi_0 + \phi_1 T_{it} + \zeta_{it}$$

\begin{gather}cov(Z_{it}, \zeta_{it}) = 0\end{gather}

- This is the second requirement

- We cannot explicitly test this
  - This is an identifying *assumption*
  - We need this to be true to attribute causality to the second stage


## Some comments
$$\text{Stage}\;1:\;T_{it} = \pi_0 + \pi_1 Z_{it} + \nu_{it}$$
$$\text{Stage}\;2:\;Y_{it} = \phi_0 + \phi_1 T_{it} + \zeta_{it}$$

\begin{gather}cov(Z_{it}, \zeta_{it}) = 0\end{gather}

- Note that we will use $Z_{it}$ to predict $T_{it}$.
  - We cannot actually observe $cov(Z_{it}, \zeta_{it})$

- So if $cov(Z_{it}, \zeta_{it})\neq0$...
  - Then this correlation will be contained in the predicted values, $\hat{T}_{it}$
  - i.e. the predicted values will still be endogenous
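

## Sketch: what happens when the second requirement fails

A simulated example of an invalid instrument, using hypothetical `bad_*` names. Here the instrument is given a direct effect of 0.2 on the outcome, so $cov(Z_{it}, \zeta_{it})\neq0$ by construction. The assumed true effect of $T$ is 0.5 and the first-stage coefficient is 0.7.

```{r sketch_invalid_iv, echo = TRUE, eval = TRUE, message = FALSE, warning = TRUE, size = "tiny"}
set.seed(11)
bad_n <- 5000
bad_w <- rnorm(bad_n)                                  # omitted variable
bad_z <- rnorm(bad_n)                                  # instrument
bad_t <- 0.7 * bad_z + 0.8 * bad_w + rnorm(bad_n)      # endogenous T
# the instrument enters the outcome equation directly (coefficient 0.2)
bad_y <- 1 + 0.5 * bad_t + 0.2 * bad_z + 1.0 * bad_w + rnorm(bad_n)

# IV estimate with one instrument: cov(Z, Y) / cov(Z, T)
bad_iv <- cov(bad_z, bad_y) / cov(bad_z, bad_t)
round(bad_iv, 3)
```

In this design the IV estimate converges to $0.5 + 0.2/0.7 \approx 0.79$ rather than 0.5: the direct effect is scaled up by the first stage, and nothing in the data flags the problem.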


## IVs in supply and demand

- Economists have long been interested in supply and demand
  - Obviously...

- How does a change in supply affect prices?
  - Not a straightforward question to answer, because prices are determined jointly by supply and demand
  - We can't determine what is changing when we observe market prices
  - One option: an instrument that moves only one side of the market

- Small note: this is how IVs originally came about in economics


## Favara and Imbs, 2015 (*American Economic Review*)

- How does the availability of credit affect house prices?

- They use bank deregulation in the US, which varied across states and years
  - This deregulation led to an increase in credit supply
  - But it did not affect credit demand, since it was a supply-side change

- Idea: show the change in credit availability for banks affected by the change
  - And no change for banks not affected by the change


## Deregulation index across states and years

![](week7assets/deregulation1.png){fig-align="center"}


## Two stages: predict credit supply, then predict house prices
\begin{align} &\text{Stage 1: } credit_{ct} = \delta_0 + \delta_1 deregulation_{ct} + \delta_2 X_{ct} + \alpha_c + \gamma_t + \nu_{ct} \\
              &\text{Stage 2: } price_{ct} = \beta_0 + \beta_1 credit_{ct} + \beta_2 X_{ct} + \phi_c + \eta_t + \zeta_{ct} \end{align}

- They instrument for $credit$ using $deregulation$
  - $deregulation$ is correlated with $credit$ but not with $\zeta_{ct}$, according to the authors
  - (Let's ignore whether this is true for now since it's so contextual)

- They control for $X_{ct}$, which is a vector of controls
- This is also a two-way fixed effects specification:
  - $\alpha_c$ and $\gamma_t$ ($\phi_c$ and $\eta_t$ in stage 2) are county and year fixed effects


## Replication data: `week7files/hmda_merged.dta`
```{r rep1, echo = TRUE, eval = TRUE, message = FALSE, warning = TRUE, size = "tiny", out.width = "55%", fig.align = "center"}
library(haven)
df <- read_dta("week7files/hmda_merged.dta")
head(df)

# key controls: LDl_hpi Dl_inc LDl_inc Dl_pop LDl_pop Dl_her_v LDl_her_v
# instrument: Linter_bra
# endogenous variables: Dl_nloans_b Dl_vloans_b Dl_lir_b
# weights: w1
# restriction: border counties only (border==1)
# county and year FE
# cluster on state
```


## Reduced form

- It is common to also estimate the reduced form
  - This is a regression of the outcome of interest on the instrument

- In this case, this equals
\begin{gather} price_{ct} = B_0 + B_1 deregulation_{ct} + B_2 X_{ct} + \cdots \end{gather}


## Reduced form
```{r rep1b, echo = TRUE, eval = TRUE, message = FALSE, warning = TRUE, size = "tiny", out.width = "55%", fig.align = "center"}
bordercounties <- df |> filter(border==1)
summary(feols(Dl_hpi ~ Linter_bra + LDl_hpi + Dl_inc + LDl_inc + Dl_pop + LDl_pop + Dl_her_v + LDl_her_v | county + year, 
        data = bordercounties, weights = bordercounties$w1,
        cluster = "state_n"))
```


## First stage
```{r rep2, echo = TRUE, eval = TRUE, message = FALSE, warning = TRUE, size = "tiny", out.width = "55%", fig.align = "center"}
bordercounties <- df |> filter(border==1)
reg1 <- feols(Dl_nloans_b ~ Linter_bra + LDl_hpi + Dl_inc + LDl_inc + Dl_pop + LDl_pop + Dl_her_v + LDl_her_v | county + year, 
              data = bordercounties, weights = bordercounties$w1,
              cluster = "state_n")
reg2 <- feols(Dl_vloans_b ~ Linter_bra + LDl_hpi + Dl_inc + LDl_inc + Dl_pop + LDl_pop + Dl_her_v + LDl_her_v | county + year, 
              data = bordercounties, weights = bordercounties$w1,
              cluster = "state_n")
reg3 <- feols(Dl_lir_b ~ Linter_bra + LDl_hpi + Dl_inc + LDl_inc + Dl_pop + LDl_pop + Dl_her_v + LDl_her_v | county + year, 
              data = bordercounties, weights = bordercounties$w1,
              cluster = "state_n")
```


## First stage
```{r rep3, echo = FALSE, eval = TRUE, message = FALSE, warning = TRUE, size = "tiny", out.width = "55%", fig.align = "center"}
bordercounties <- df |> filter(border==1)
reg1 <- feols(Dl_nloans_b ~ Linter_bra + LDl_hpi + Dl_inc + LDl_inc + Dl_pop + LDl_pop + Dl_her_v + LDl_her_v | county + year, 
              data = bordercounties, weights = bordercounties$w1,
              cluster = "state_n")
reg2 <- feols(Dl_vloans_b ~ Linter_bra + LDl_hpi + Dl_inc + LDl_inc + Dl_pop + LDl_pop + Dl_her_v + LDl_her_v | county + year, 
              data = bordercounties, weights = bordercounties$w1,
              cluster = "state_n")
reg3 <- feols(Dl_lir_b ~ Linter_bra + LDl_hpi + Dl_inc + LDl_inc + Dl_pop + LDl_pop + Dl_her_v + LDl_her_v | county + year, 
              data = bordercounties, weights = bordercounties$w1,
              cluster = "state_n")
table <- etable(reg1, reg2, reg3,
                digits = 3, fitstat = c("n"), se.below = TRUE, depvar = FALSE,
                # change significance codes to the norm
                signif.code = c("***" = 0.01, "**" = 0.05, "*" = 0.1))
colnames(table) <- c("", "Loans", "Loan volume", "Loan-to-inc. ratio")
table[c(1,3,5,7,9,11,13,15),1] <- c("IV", "House price (lag)", "Inc. p.c.", "Inc. p.c. (lag)", "Population", "Population (lag)", "Herf. index", "Herf. index (lag)")
table <- table[-c(17:21),]
tabletemp <- etable(reg1, reg2, reg3,
                digits = 10, fitstat = c("n"), se.below = TRUE, depvar = FALSE,
                coefstat = "tstat",
                # change significance codes to the norm
                signif.code = c("***" = 0.01, "**" = 0.05, "*" = 0.1))
# extract t statistics for first coefficient only
tstats <- as_vector(tabletemp[2,2:4])
# remove parentheses
tstats <- gsub("\\(", "", tstats)
tstats <- gsub("\\)", "", tstats)
tstats <- as.numeric(tstats)
# square for F-test
tstats <- tstats^2
# round to three digits
tstats <- round(tstats, 3)
# add to bottom of table
table[18,] <- c("F-test for instrument", tstats)
kable(table, 
      align = "lccc", booktabs = TRUE, linesep = "", escape = FALSE, row.names = FALSE) |>
      row_spec(16, hline_after = TRUE) |>
      column_spec(1,width = "7cm") |>
      column_spec(c(2:4),width = "3.5cm") |>
      kable_styling() |>
      footnote("Note: F-test differs from results in paper due to differences in how xtreg calculates standard errors.", general_title = "",
                footnote_as_chunk = TRUE,
                escape = FALSE
                ) |>
      footnote("Standard errors clustered on state in parentheses.", general_title = "",
                footnote_as_chunk = TRUE,
                escape = FALSE
                ) |>
      kable_styling(font_size = 18)
```


## First stage predictions vs. actual values... what do you notice?
```{r rep4, echo = FALSE, eval = TRUE, message = FALSE, warning = FALSE, size = "tiny", out.width = "55%", fig.align = "center"}
bordercounties <- df |> filter(border==1)
reg1 <- feols(Dl_nloans_b ~ Linter_bra + LDl_hpi + Dl_inc + LDl_inc + Dl_pop + LDl_pop + Dl_her_v + LDl_her_v | county + year, 
              data = bordercounties, weights = bordercounties$w1,
              cluster = "state_n")
# predict out of first stage
bordercounties$credit <- NA
bordercounties$credit[reg1$obs_selection$obsRemoved] <- predict(reg1)
bordercounties$credit2 <- NA
bordercounties$credit2[reg2$obs_selection$obsRemoved] <- predict(reg2)
bordercounties$credit3 <- NA
bordercounties$credit3[reg3$obs_selection$obsRemoved] <- predict(reg3)

# create plot with predicted vs. actual
ggplot(bordercounties, aes(x = Dl_nloans_b, y = credit)) +
  geom_point(alpha = 0.5) +
  labs(x = "Actual change", y = "Predicted change") +
  # same limits on both axes
  coord_cartesian(xlim = c(min(bordercounties$Dl_nloans_b, na.rm = T), max(bordercounties$Dl_nloans_b, na.rm = T)), 
                  ylim = c(min(bordercounties$Dl_nloans_b, na.rm = T), max(bordercounties$Dl_nloans_b, na.rm = T))) +
  theme_minimal()

```


## First stage predictions vs. actual values... what do you notice?

```{r rep5, echo = FALSE, eval = TRUE, message = FALSE, warning = FALSE}
sums <- bordercounties |> 
          dplyr::select(Dl_nloans_b, credit) |> 
          na.omit() |>
          summarize(across(everything(), list(min = min, max = max, sd = sd)))
sums <- rbind(sums, sums)
sums[2,1:3] <- sums[2,4:6]
sums <- sums[,1:3]
sums <- cbind(sums[,1], sums)
sums[,1] <- c("Actual", "Predicted")
colnames(sums) <- c("", "min", "max", "SD")
sums[,2:4] <- round(sums[,2:4], 3)
kable(sums,
      align = "lccc", booktabs = TRUE, linesep = "", escape = FALSE, row.names = FALSE
      ) |>
      kable_styling() |>
      kable_styling(font_size = 18)
```


- Note how much less variance there is in the predicted values than the actual values
  - This is the point of using an instrument!
  - We are able to isolate the variation in the endogenous variable that is not correlated with the error term
    - This is of course only a subset of the total variation in the endogenous variable

- This will be important later


## We cannot simply use the predicted values in the second stage... standard errors will be wrong!
```{r rep6, echo = TRUE, eval = TRUE, message = FALSE, warning = TRUE, size = "tiny", out.width = "55%", fig.align = "center"}
# create a macro for the main regression controls (to avoid repetition and save space)
setFixest_fml(..controls = ~ LDl_hpi + Dl_inc + LDl_inc + Dl_pop + LDl_pop + Dl_her_v + LDl_her_v)
# Let's use feols to estimate the two stages
reg1 <- feols(Dl_hpi ~ ..controls | county + year | Dl_nloans_b ~ Linter_bra, 
              data = bordercounties, weights = bordercounties$w1,
              cluster = "state_n")
reg2 <- feols(Dl_hpi ~ ..controls | county + year | Dl_vloans_b ~ Linter_bra, 
              data = bordercounties, weights = bordercounties$w1,
              cluster = "state_n")
reg3 <- feols(Dl_hpi ~ ..controls | county + year | Dl_lir_b ~ Linter_bra, 
              data = bordercounties, weights = bordercounties$w1,
              cluster = "state_n")
```
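

## Sketch: why the manual second-stage standard errors are wrong

A base R illustration on simulated data, assuming homoskedastic errors and no clustering or weights (so it is simpler than the `feols` calls above). The `se_*` names are hypothetical. The manual second stage computes residuals with the *predicted* regressor, while the correct 2SLS residuals use the *actual* regressor.

```{r sketch_manual_se, echo = TRUE, eval = TRUE, message = FALSE, warning = TRUE, size = "tiny"}
set.seed(12)
se_n <- 5000
se_w <- rnorm(se_n)                               # omitted variable
se_z <- rnorm(se_n)                               # instrument
se_t <- 0.7 * se_z + 0.8 * se_w + rnorm(se_n)     # endogenous T
se_y <- 1 + 0.5 * se_t + 1.0 * se_w + rnorm(se_n)

# manual two stages
se_that   <- fitted(lm(se_t ~ se_z))
se_stage2 <- lm(se_y ~ se_that)
se_naive  <- summary(se_stage2)$coefficients["se_that", "Std. Error"]

# correct residuals: same coefficients, but applied to the actual T
se_b      <- coef(se_stage2)
se_resid  <- se_y - se_b[1] - se_b[2] * se_t
se_sigma2 <- sum(se_resid^2) / (se_n - 2)
# variance uses the predicted regressors: sigma^2 * (Xhat'Xhat)^{-1}
se_Xhat    <- cbind(1, se_that)
se_correct <- sqrt(se_sigma2 * solve(crossprod(se_Xhat))[2, 2])

round(c(naive = se_naive, correct = se_correct), 4)
```

The point estimate is identical in both cases; only the residual variance, and therefore the standard error, changes.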


## `fixest` will give us the correct standard errors, however (first stage)
```{r rep7, echo = TRUE, eval = FALSE, message = FALSE, warning = TRUE, size = "tiny", out.width = "55%", fig.align = "center"}
# first stage:
etable(
      reg1, reg2, reg3,
      stage = 1,
      se.below = TRUE,
      depvar = FALSE,
      signif.code = c("***" = 0.01, "**" = 0.05, "*" = 0.1),
      digits = "r3",
      digits.stats = "r0",
      fitstat = c("ivwald", "n"), # make sure to use ivwald for first-stage F-test
      coefstat = "se",
      group = list(controls = "LDl_hpi"),
      keep = "Linter_bra"
    )
```


## `fixest` will give us the correct standard errors, however (first stage)
```{r rep8, echo = FALSE, eval = TRUE, message = FALSE, warning = TRUE, size = "tiny", out.width = "55%", fig.align = "center"}
# first stage:
table <- etable(
              reg1, reg2, reg3,
              stage = 1,
              se.below = TRUE,
              depvar = FALSE,
              signif.code = c("***" = 0.01, "**" = 0.05, "*" = 0.1),
              digits = "r3",
              digits.stats = "r3",
              fitstat = c("ivwald", "n"), # make sure to use ivwald for first-stage F-test
              coefstat = "se",
              group = list(controls = "LDl_hpi"),
              keep = "Linter_bra"
            )
table[1,1] <- "IV (deregulation index)"
colnames(table) <- c("", "Loans", "Loan volume", "Loan-to-inc. ratio")
table[4,2:4] <- ""
table <- table[-c(7:8),]
kable(table, 
      align = "lccc", booktabs = TRUE, linesep = "", escape = FALSE, row.names = FALSE) |>
      row_spec(6, hline_after = TRUE) |>
      column_spec(1,width = "3cm") |>
      column_spec(c(2:4),width = "2.5cm") |>
      kable_styling() |>
      footnote("Note: The Wald (similar to F-test) values do not equal the values in the paper due to differences in how xtreg calculates standard errors.", general_title = "",
                footnote_as_chunk = TRUE,
                escape = FALSE
                ) |>
      footnote("Standard errors clustered on state in parentheses.", general_title = "",
                footnote_as_chunk = TRUE,
                escape = FALSE
                ) |>
      kable_styling(font_size = 18)
```


## `fixest` will give us the correct standard errors, however (second stage)
```{r rep9, echo = TRUE, eval = FALSE, message = FALSE, warning = TRUE, size = "tiny", out.width = "55%", fig.align = "center"}
# second stage:
etable(
      reg1, reg2, reg3,
      stage = 2,
      se.below = TRUE,
      depvar = FALSE,
      signif.code = c("***" = 0.01, "**" = 0.05, "*" = 0.1),
      digits = "r3",
      digits.stats = "r3",
      fitstat = c("ivwald", "n"), # make sure to use ivwald for first-stage F-test
      coefstat = "se",
      group = list(controls = "LDl_hpi"),
      keep = c("Dl_nloans_b", "Dl_vloans_b", "Dl_lir_b")
    )
```


## `fixest` will give us the correct standard errors, however (second stage)
```{r rep10, echo = FALSE, eval = TRUE, message = FALSE, warning = TRUE, size = "tiny", out.width = "55%", fig.align = "center"}
# second stage:
table <- etable(
              reg1, reg2, reg3,
              stage = 2,
              se.below = TRUE,
              depvar = FALSE,
              signif.code = c("***" = 0.01, "**" = 0.05, "*" = 0.1),
              digits = "r3",
              digits.stats = "r3",
              fitstat = c("ivwald", "n"),
              coefstat = "se",
              group = list(controls = "LDl_hpi"),
              keep = c("Dl_nloans_b", "Dl_vloans_b", "Dl_lir_b")
            )
table[c(1, 3, 5),1] <- c("Loans", "Loan volume", "Loan-to-inc. ratio")
colnames(table) <- c("", "(1)", "(2)", "(3)")
table[8,2:4] <- ""
table <- table[-c(11:12),]
table[11,3] <- table[12,3]
table[11,4] <- table[13,4]
table <- table[-c(12:13),]
table[11,1] <- "Wald (1st stage)"
kable(table, 
      align = "lccc", booktabs = TRUE, linesep = "", escape = FALSE, row.names = FALSE) |>
      row_spec(10, hline_after = TRUE) |>
      column_spec(1,width = "3cm") |>
      column_spec(c(2:4),width = "2.5cm") |>
      kable_styling() |>
      footnote("Note: The Wald (similar to F-test) values do not equal the values in the paper due to differences in how xtreg calculates standard errors.", general_title = "",
                footnote_as_chunk = TRUE,
                escape = FALSE
                ) |>
      footnote("Standard errors clustered on state in parentheses.", general_title = "",
                footnote_as_chunk = TRUE,
                escape = FALSE
                ) |>
      kable_styling(font_size = 18)
```


## Note the syntax for `fixest`

`feols(y ~ x | fe1 + fe2 | endogenousvar ~ z, ...)`

`feols(y ~ x | fe1 + fe2 | endogenousvar1 + endogenousvar2 ~ z1 + z2, ...)`


- All controls should be in the first stage, as well as the second
  - `fixest` does this for us automatically

- The package also automatically calculates correct standard errors in the second stage
  - For the "generated regressor"


## Estimating it all together

- With just a single instrument and a single endogenous variable, there is a single first stage

- Let's continue with our outcome $Y$, our endogenous variable $X$, and our exogenous variables $Z$ (which includes the instrument)

- It turns out that we can write $\hat{\beta}_{IV}$ as:
\begin{gather} \hat{\beta}_{IV}=\left((Z'Z)^{-1}(Z'X)\right)^{-1}\left((Z'Z)^{-1}(Z'Y)\right) \end{gather}


## Estimating it all together

\begin{gather} \tag{14} \hat{\beta}_{IV}=\left((Z'Z)^{-1}(Z'X)\right)^{-1}\left((Z'Z)^{-1}(Z'Y)\right) \end{gather}

- We can immediately see two things:

  - The requirement that $Z$ predicts $X$ is necessary to invert the first term

  - The IV estimate *scales the reduced form by the first stage*


## Just a quick note that this simplifies
\begin{align} \tag{14} \hat{\beta}_{IV}&=\left((Z'Z)^{-1}(Z'X)\right)^{-1}\left((Z'Z)^{-1}(Z'Y)\right) \\
                                        &=(Z'X)^{-1}(Z'Z)(Z'Z)^{-1}(Z'Y) \\
                                        &=(Z'X)^{-1}(Z'Y) \end{align}
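

## Sketch: checking the matrix formula numerically

The algebra above can be checked on simulated data with base R matrix operations. The `mat_*` names are hypothetical; `mat_Z` and `mat_X` each contain a constant plus one variable, so the model is just-identified.

```{r sketch_iv_matrix, echo = TRUE, eval = TRUE, message = FALSE, warning = TRUE, size = "tiny"}
set.seed(13)
mat_n <- 5000
mat_w <- rnorm(mat_n)                                 # omitted variable
mat_z <- rnorm(mat_n)                                 # instrument
mat_x <- 0.7 * mat_z + 0.8 * mat_w + rnorm(mat_n)     # endogenous variable
mat_y <- 1 + 0.5 * mat_x + 1.0 * mat_w + rnorm(mat_n)

mat_Z <- cbind(1, mat_z)   # constant and instrument
mat_X <- cbind(1, mat_x)   # constant and endogenous variable

# long form: ((Z'Z)^{-1} Z'X)^{-1} ((Z'Z)^{-1} Z'Y)
mat_beta_long <- solve(solve(crossprod(mat_Z), crossprod(mat_Z, mat_X))) %*%
  solve(crossprod(mat_Z), crossprod(mat_Z, mat_y))
# simplified form: (Z'X)^{-1} Z'Y
mat_beta_short <- solve(crossprod(mat_Z, mat_X), crossprod(mat_Z, mat_y))
# reduced-form slope divided by first-stage slope
mat_ratio <- unname(coef(lm(mat_y ~ mat_z))["mat_z"] / coef(lm(mat_x ~ mat_z))["mat_z"])

round(c(long = mat_beta_long[2], short = mat_beta_short[2], ratio = mat_ratio), 4)
```

All three numbers coincide, which is the sense in which the IV estimate scales the reduced form by the first stage.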


## Binary instrument and binary treatment

- Let's consider a binary instrument and a binary treatment
  - $Z$ and $D$ are binary, i.e. $Z,D\in\{0,1\}$

- It turns out there is a very real case where we can find a valid instrument that is binary
  - Treatment assignment in an RCT!


## RCTs and IV

- Banerjee et al. (2015): The Miracle of Microfinance? Evidence from a Randomized Evaluation (*AEJ: Applied*)

- They are interested in the effects of access to credit on outcomes
  - They randomly assign households (sort of) to microcredit *access*

- Z: whether or not the household was offered microcredit
  - This is a binary instrument
- D: whether or not the household received credit
  - This is a binary endogenous variable


## Effects of the program on outcomes in endline 1
```{r micro1, echo = TRUE, eval = TRUE, message = FALSE, warning = TRUE, size = "tiny", out.width = "55%", fig.align = "center"}
df <- read_dta("week7files/banerjeeetal.dta")
# create a macro for the main regression controls (to avoid repetition and save space)
setFixest_fml(..controls = ~ area_pop_base + area_debt_total_base + area_business_total_base + area_exp_pc_mean_base + 
                              area_literate_head_base + area_literate_base)
# they control for baseline values of NEIGHBORHOOD means of these variables
```
```{r microb, echo = FALSE, eval = TRUE, message = FALSE, warning = TRUE, size = "tiny", out.width = "55%", fig.align = "center"}
df <- df |> dplyr::select(areaid, treatment, w1, starts_with("area_"), anymfi_1, anyloan_1, bizassets_1, bizprofit_1, any_biz_1)
# make sure we have same sample
df <- df[complete.cases(df),]
```


- They estimate:
\begin{gather} y_{in} = \beta_0 + \beta_1 Z_{n} + \sum_{k=1}^K\gamma_k X_{kn} + \varepsilon_{in}, \end{gather}
where $Z_{n}$ is the treatment variable (microcredit access in neighborhood $n$), $X_{kn}$ are the baseline neighborhood-level controls, and standard errors are clustered at the neighborhood (`areaid`) level


## Reduced form
```{r micro2, echo = TRUE, eval = TRUE, message = FALSE, warning = TRUE, size = "tiny", out.width = "55%", fig.align = "center"}
reg1 <- feols(any_biz_1 ~ treatment + ..controls, 
              data = df, weights = df$w1,
              cluster = "areaid")
reg2 <- feols(bizassets_1 ~ treatment + ..controls, 
              data = df, weights = df$w1,
              cluster = "areaid")
reg3 <- feols(bizprofit_1 ~ treatment + ..controls, 
              data = df, weights = df$w1,
              cluster = "areaid")
table <- etable(reg1, reg2, reg3,
                digits = 3, fitstat = c("n"), se.below = TRUE, depvar = FALSE,
                # change significance codes to the norm
                signif.code = c("***" = 0.01, "**" = 0.05, "*" = 0.1),
                group = list(controls = "area_pop_base"), keep = "treatment")
```


## Reduced form, clean table
```{r micro3, echo = FALSE, eval = TRUE, message = FALSE, warning = TRUE, size = "tiny", out.width = "55%", fig.align = "center"}
reg1 <- feols(any_biz_1 ~ treatment + ..controls, 
              data = df, weights = df$w1,
              cluster = "areaid")
reg2 <- feols(bizassets_1 ~ treatment + ..controls, 
              data = df, weights = df$w1,
              cluster = "areaid")
reg3 <- feols(bizprofit_1 ~ treatment + ..controls, 
              data = df, weights = df$w1,
              cluster = "areaid")
table <- etable(reg1, reg2, reg3,
                digits = 3, fitstat = c("n"), se.below = TRUE, depvar = FALSE,
                # change significance codes to the norm
                signif.code = c("***" = 0.01, "**" = 0.05, "*" = 0.1),
                group = list(controls = "area_pop_base"), keep = "treatment")
colnames(table) <- c("", "Any biz?", "Biz assets", "Biz profits")
table <- table[-c(4:5),]
kable(table, 
      align = "lccc", booktabs = TRUE, linesep = "", escape = FALSE, row.names = FALSE) |>
      row_spec(3, hline_after = TRUE) |>
      column_spec(1,width = "2cm") |>
      column_spec(c(2:4),width = "1.5cm") |>
      kable_styling() |>
      footnote("Standard errors clustered on neighborhood in parentheses.", general_title = "",
                footnote_as_chunk = TRUE,
                escape = FALSE
                )
```


## First stage
```{r micro4, echo = TRUE, eval = TRUE, message = FALSE, warning = TRUE, size = "tiny", out.width = "55%", fig.align = "center"}
reg1 <- feols(anymfi_1 ~ treatment + ..controls, 
              data = df, weights = df$w1,
              cluster = "areaid")
reg2 <- feols(anyloan_1 ~ treatment + ..controls, 
              data = df, weights = df$w1,
              cluster = "areaid")
table <- etable(reg1, reg2,
                digits = 3, fitstat = c("n"), se.below = TRUE, depvar = FALSE,
                # change significance codes to the norm
                signif.code = c("***" = 0.01, "**" = 0.05, "*" = 0.1),
                group = list(controls = "area_pop_base"), keep = "treatment")
```


## First stage, clean table
```{r micro5, echo = FALSE, eval = TRUE, message = FALSE, warning = TRUE, size = "tiny", out.width = "55%", fig.align = "center"}
reg1 <- feols(anymfi_1 ~ treatment + ..controls, 
              data = df, weights = df$w1,
              cluster = "areaid")
reg2 <- feols(anyloan_1 ~ treatment + ..controls, 
              data = df, weights = df$w1,
              cluster = "areaid")
table <- etable(reg1, reg2,
                digits = 3, fitstat = c("n"), se.below = TRUE, depvar = FALSE,
                # change significance codes to the norm
                signif.code = c("***" = 0.01, "**" = 0.05, "*" = 0.1),
                group = list(controls = "area_pop_base"), keep = "treatment")
colnames(table) <- c("", "Any MFI loan?", "Any loan?")
table <- table[-c(4:5),]
kable(table, 
      align = "lcc", booktabs = TRUE, linesep = "", escape = FALSE, row.names = FALSE) |>
      row_spec(3, hline_after = TRUE) |>
      column_spec(1,width = "2cm") |>
      column_spec(c(2:3),width = "1.5cm") |>
      kable_styling() |>
      footnote("Standard errors clustered on neighborhood in parentheses.", general_title = "",
                footnote_as_chunk = TRUE,
                escape = FALSE
                )
```


## IV results
```{r micro6, echo = TRUE, eval = TRUE, message = FALSE, warning = TRUE, size = "tiny", out.width = "55%", fig.align = "center"}
reg1 <- feols(any_biz_1 ~ ..controls | anymfi_1 ~ treatment, 
              data = df, weights = df$w1,
              cluster = "areaid")
reg2 <- feols(bizassets_1 ~ ..controls | anymfi_1 ~ treatment, 
              data = df, weights = df$w1,
              cluster = "areaid")
reg3 <- feols(bizprofit_1 ~ ..controls | anymfi_1 ~ treatment, 
              data = df, weights = df$w1,
              cluster = "areaid")

table <- etable(reg1, reg2, reg3,
                digits = 3, fitstat = c("ivwald", "n"), se.below = TRUE, depvar = FALSE,
                # change significance codes to the norm
                signif.code = c("***" = 0.01, "**" = 0.05, "*" = 0.1),
                group = list(controls = "area_pop_base"), keep = "anymfi_1")
```


## IV results, clean table
```{r micro7, echo = FALSE, eval = TRUE, message = FALSE, warning = TRUE, size = "tiny", out.width = "55%", fig.align = "center"}
reg1 <- feols(any_biz_1 ~ ..controls | anymfi_1 ~ treatment, 
              data = df, weights = df$w1,
              cluster = "areaid")
reg2 <- feols(bizassets_1 ~ ..controls | anymfi_1 ~ treatment, 
              data = df, weights = df$w1,
              cluster = "areaid")
reg3 <- feols(bizprofit_1 ~ ..controls | anymfi_1 ~ treatment, 
              data = df, weights = df$w1,
              cluster = "areaid")

table <- etable(reg1, reg2, reg3,
                digits = 3, fitstat = c("ivwald", "n"), se.below = TRUE, depvar = FALSE,
                # change significance codes to the norm
                signif.code = c("***" = 0.01, "**" = 0.05, "*" = 0.1),
                group = list(controls = "area_pop_base"), keep = "anymfi_1")
colnames(table) <- c("", "Any biz?", "Biz assets", "Biz profits")
table <- table[-c(4:5),]
table[1,1] <- "Has MFI loan"
table[4,1] <- "Wald (1st stage)"
kable(table, 
      align = "lccc", booktabs = TRUE, linesep = "", escape = FALSE, row.names = FALSE) |>
      row_spec(3, hline_after = TRUE) |>
      column_spec(1,width = "2cm") |>
      column_spec(c(2:4),width = "1.5cm") |>
      kable_styling() |>
      footnote("Standard errors clustered on neighborhood in parentheses.", general_title = "",
                footnote_as_chunk = TRUE,
                escape = FALSE
                )
```


## Putting them together
```{r micro8, echo = TRUE, eval = TRUE, message = FALSE, warning = TRUE, size = "tiny", out.width = "55%", fig.align = "center"}
# reduced form
reg1 <- feols(any_biz_1 ~ treatment + ..controls, 
              data = df, weights = df$w1,
              cluster = "areaid")
# first stage
reg2 <- feols(anymfi_1 ~ treatment + ..controls, 
              data = df, weights = df$w1,
              cluster = "areaid")
# IV result
reg3 <- feols(any_biz_1 ~ ..controls | anymfi_1 ~ treatment, 
              data = df, weights = df$w1,
              cluster = "areaid")
```

- Coefficient on reduced form: `r round(reg1$coefficients[2], 3)`

- Coefficient on first stage: `r round(reg2$coefficients[2], 3)`

- Coefficient on IV: `r round(reg3$coefficients[2], 3)`
  - Can you figure out how this is related to the RF and FS?
  - This is a ratio: $\frac{\hat{\beta}_{RF}}{\hat{\beta}_{FS}} = \hat{\beta}_{IV}$
  - The IV result *scales the reduced form by the first stage*
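

## Putting them together, a simulated check

The ratio can be checked on made-up data. This is only a sketch: the data-generating process and the names `z`, `d`, `y`, and `u` are assumptions, and it uses base R's `lm()` instead of `feols()` so that it runs without the microfinance data.

```r
# Simulated data: z is a randomly assigned instrument, d the endogenous
# treatment, u an unobserved confounder. All values are made up.
set.seed(123)
n <- 5000
z <- rbinom(n, 1, 0.5)
u <- rnorm(n)
d <- as.numeric(0.3 * z + 0.5 * u + rnorm(n) > 0.5)
y <- 1 + 0.5 * d + u + rnorm(n)

rf <- lm(y ~ z)        # reduced form: outcome on instrument
fs <- lm(d ~ z)        # first stage: treatment on instrument
d_hat <- fitted(fs)    # predicted treatment from the first stage
ss <- lm(y ~ d_hat)    # second stage (point estimate only; these SEs are wrong)

beta_ratio <- unname(coef(rf)["z"] / coef(fs)["z"])
beta_2sls  <- unname(coef(ss)["d_hat"])
c(ratio = beta_ratio, tsls = beta_2sls)
```

With one instrument and the same controls in both regressions, the two numbers should agree up to floating-point error.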


## Putting them together, the intuition

- The IV estimate is a ratio of two coefficients
  - The reduced form coefficient and the first stage coefficient

- In this example, treatment increases MFI loan take-up by 8.2 percentage points.
  - In other words, the treatment effect is driven by a change in MFI loan take-up among 8.2 percent of households

- If the probability of owning a business goes up by 0.005 (0.5 p.p.), what is the change in probability of owning a business for those who take up the MFI loan?
  - $0.005/0.082 \approx 0.061$. This is the IV estimate


## The Wald estimator

- This is sometimes referred to as the Wald estimator (Wald 1940)
\begin{gather} \beta = \frac{\mathbb{E}\left[Y\mid Z=1\right]-\mathbb{E}\left[Y\mid Z=0\right]}{\mathbb{E}\left[X\mid Z=1\right]-\mathbb{E}\left[X\mid Z=0\right]} \end{gather}

- Note that these expectations are not observed
  - We estimate them with the reduced form and first stage
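

## The Wald estimator, by hand

A minimal sketch of the sample analogue, assuming a binary instrument and made-up data (the variable names and coefficients are illustrative, not from the microfinance example). It replaces each expectation with a group mean.

```r
# Made-up data with a binary instrument z and a binary treatment x.
set.seed(42)
n <- 10000
z <- rbinom(n, 1, 0.5)
u <- rnorm(n)
x <- as.numeric(0.4 * z + 0.5 * u + rnorm(n) > 0.6)
y <- 2 + 1.5 * x + u + rnorm(n)

# Sample analogues of the four conditional expectations
num <- mean(y[z == 1]) - mean(y[z == 0])   # reduced form
den <- mean(x[z == 1]) - mean(x[z == 0])   # first stage
wald <- num / den

# With a binary instrument this matches the ratio of covariances
iv_cov <- cov(y, z) / cov(x, z)
c(wald = wald, iv_cov = iv_cov)
```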


## Interpreting IV estimates

- So this IV estimate is driven by the change in MFI loan take-up among 8.2 percent of households
  - What does this mean for the effect of MFI loans on business ownership?

- Two worlds:
  - Homogeneous treatment effects
  - Heterogeneous treatment effects

- Remember how I said an IV identifies just certain kinds of variation?
  - This will come into play here


## Homogeneous treatment effects

- We had a similar discussion when we talked about DiD

- If everyone has the same treatment effect, then it doesn't matter what variation we isolate
  - All variation will be identifying the same effect

- In this case, the IV estimates the average treatment effect

- But what if effects are not homogeneous?


## Heterogeneous treatment effects

- What if not everyone has the same treatment effect?
  - In other words, what if different types of variation are identifying different effects?

- Imagine a world in which we have an endogenous variable, $D$
  - Imagine we also have multiple *valid* instruments: $Z_1$ and $Z_2$

- If $Z_1$ and $Z_2$ are correlated with different "parts" of $D$, then they can be isolating different variation in $D$
  - This also means that the IV results can lead to different estimates, even though both instruments are valid!


## Defining the LATE

- We need to define four separate groups:
  - Compliers
  - Always-takers
  - Never-takers
  - Defiers

- Let's look at these four groups assuming a binary treatment


## Compliers

![](week7assets/late1.png){fig-align="center"}


## Never-takers

![](week7assets/late2.png){fig-align="center"}


## Always-takers

![](week7assets/late3.png){fig-align="center"}


## Defiers

![](week7assets/late4.png){fig-align="center"}


## In Hansen, where X is treatment assignment

![](week7assets/hansenlate.png){fig-align="center"}


## Comparing the four groups

![](week7assets/late5.png){fig-align="center"}


## What are we estimating?

- Never-takers *never* take up the treatment
  - If we have no variation in treatment for them, we can't estimate the effect of the treatment on them
  - Same goes for always-takers

- That leaves us with two groups: compliers and defiers
  - Let's make one more assumption, monotonicity: $P(X(1)-X(0)<0)=0$ (or $>0$)
  - i.e. there are no defiers


## What are we estimating?

- With no defiers, what the IV estimates is called the local average treatment effect (LATE)

- This is the *effect of the treatment on compliers*
  - i.e. the effect of the treatment on those who are induced to take up treatment because of the instrument

- Again, if treatment effects are homogeneous, the effect on compliers is the same as the effect on everyone else
  - In this case, the LATE is the ATE
  - But, do we really think this is ever true?
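

## What are we estimating? A simulated sketch

To make this concrete, here is a hypothetical population with heterogeneous effects and no defiers. The group shares and effect sizes are made up. The point is only that the IV ratio lands near the complier effect, not the ATE.

```r
# Hypothetical population: 40% compliers, 30% always-takers, 30% never-takers,
# and no defiers. Treatment effects differ by type (numbers are made up).
set.seed(7)
n <- 200000
type <- sample(c("complier", "always", "never"), n,
               replace = TRUE, prob = c(0.4, 0.3, 0.3))
tau <- unname(c(complier = 1, always = 4, never = 0)[type])  # individual effects

z <- rbinom(n, 1, 0.5)                       # randomized instrument
d <- ifelse(type == "always", 1,
            ifelse(type == "never", 0, z))   # only compliers follow z
y <- rnorm(n) + tau * d

wald <- (mean(y[z == 1]) - mean(y[z == 0])) /
        (mean(d[z == 1]) - mean(d[z == 0]))

c(iv = wald,
  effect_on_compliers = mean(tau[type == "complier"]),
  ate = mean(tau))
```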


## Different instruments, different effects

- One implication of LATE is that different instruments can identify different effects
  - In other words, the group of "compliers" can differ across instruments, even if all the instruments are valid

- Example:
  - Interested in the effects of going to college
  - Instrument 1: whether or not you live close to a college
  - Instrument 2: whether or not you have a scholarship


## This might be okay, though

- When we think about interventions, we often think about the *margins* of the intervention
  - In other words, we are interested in the effect of the intervention on those who are induced to take up the intervention

- If a government is considering a new program/policy, then the effects will always be driven by those who are induced to take up the program/policy
  - In other words, the compliers
  - So identifying a LATE might actually be policy relevant in some contexts!

- One final note:
  - The LATE interpretation also holds for non-binary instruments
  - Interpretation of what it means to be a "complier" is a bit more complicated, though


## Some notes on compliers under LATE

- The first stage tells us the complier share of the overall population (if it's binary)
  - A small note: the more compliers there are, the less problematic violations of the exclusion restriction are (Angrist et al., 1996)

- We can learn a bit about characteristics of compliers, too, using a similar intuition
  - Works with discrete characteristics


## Weak instruments

- Let's return to our discussion about the first stage: $Z$ must be correlated with $X$
  - If $Z$ is not correlated with $X$, then we cannot identify the effect of $X$ on $Y$

- We often think about this in terms of the first stage F-statistic
  - Is the F-statistic high "enough"?
  - What is high "enough" in this context?

- We used to think about $F>10$, but recent literature argues it should be even higher!
  - e.g. Pflueger and Wang (2013) closer to 23
  - Lee et al. (2020) argue for 100 or higher
    - Focus on t-statistic, not the coefficient
    - Lower F-statistics mean the critical value should actually be higher than 1.96
  - No "right" answer, but higher is better
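

## First-stage F-statistic, a sketch

A minimal sketch of how the first-stage F on the excluded instrument can be computed by hand, using made-up data and base R. It assumes homoskedastic errors. With clustering you would want a robust version, such as the `ivwald` statistic reported by `etable()` earlier.

```r
# Made-up data: x is endogenous, z a single instrument, w an exogenous control.
set.seed(2024)
n <- 1000
w <- rnorm(n)
z <- rnorm(n)
x <- 0.15 * z + 0.5 * w + rnorm(n)

fs_full  <- lm(x ~ z + w)   # first stage
fs_restr <- lm(x ~ w)       # first stage without the excluded instrument

# F-test on the excluded instrument only (not the regression's overall F)
f_stat <- anova(fs_restr, fs_full)$F[2]

# With one instrument, F is the squared t-statistic on z
t_stat <- summary(fs_full)$coefficients["z", "t value"]
c(F = f_stat, t_squared = t_stat^2)
```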


## Compulsory school attendance and earnings

- Let's look at an example: Angrist and Krueger (1991)
  - They are interested in the returns to schooling

- Basic idea:
  - School attendance laws require students to stay in school until a certain age
  - Consider a school year that starts on August 1st
    - Someone born on August 2nd just misses the cutoff, so they will be almost one year older when they start school than someone born on July 31st

- Instrument for school attendance using the time of birth
  - "Individuals born in the beginning of the year start school at an older age, and can therefore drop out after completing less schooling than individuals born near the end of the year."


## Compulsory school attendance and earnings, year/quarter of birth

![](week7assets/angristkrueger1.png){fig-align="center"}


## Compulsory school attendance and earnings, reduced form

![](week7assets/angristkrueger2.png){fig-align="center"}


## The model

\begin{gather} y = \beta s + \varepsilon \\
                s = \gamma Z + \eta, \end{gather}

- $y$ is earnings
- $s$ is years of schooling
- $Z$ is the instrument
  - They use interactions between year and quarter of birth


## Bias in OLS

- If $\varepsilon$ and $s$ are correlated, then OLS gives biased estimates

- The bias is:
  \begin{gather} E\left[\hat\beta_{OLS}-\beta\right] = \frac{Cov(s,\varepsilon)}{Var(s)} \end{gather}

- Let's rename this ratio as $\frac{\sigma_{\varepsilon\eta}}{\sigma_{s}^2}$


## Bias in OLS and first stage F-statistics

- It turns out we can approximate the bias in 2SLS as:
\begin{gather}\frac{\sigma_{\varepsilon\eta}}{\sigma_{s}^2}\frac{1}{F+1} \end{gather}

- Note that if the first stage is weak, $F$ is closer to zero and the 2SLS bias is closer to the OLS bias
  - If the first stage is strong, $F$ is larger and the bias gets closer to zero


## Bound et al. (1995), *JASA*

- Bound et al. (1995) were the first to point this problem out
  - You see, Angrist and Krueger added a *lot* of instruments to some of their specifications
  - The addition of more instruments can be a problem: it tends to decrease the first-stage F-statistic

- Let's take a look at their results


## Note what happens to the IV coefficient as F decreases

![](week7assets/bound1.png){fig-align="center"}


## A weak first stage won't necessarily lead to large standard errors

- I used to think a weak first stage would lead to large standard errors
  - This is not necessarily true

- Bound et al. do a simulation exercise where they create *completely random instruments*
  - In other words, by construction, the instruments should not predict the endogenous variable


## Random instruments and standard errors

![](week7assets/bound2.png){fig-align="center"}


## More problems with weak instruments
\begin{gather*} \hat{\beta}_{2SLS} = \frac{Cov(Y, Z)}{Cov(X, Z)} \end{gather*}

- We've seen this before: the IV estimate is a ratio of covariances (or the ratio of the reduced form and the first stage)

- With weak instruments, $Cov(X,Z)$ is small
  - This means that small changes in $Cov(Y,Z)$ can lead to large changes in $\hat{\beta}_{2SLS}$
  - Asymptotically, this isn't a problem. But in small samples...
  - We're back to something we've seen before: might need relatively large sample sizes to reliably estimate what you want to estimate!

- This is a problem with ratios more generally. Try bootstrapping a ratio where the denominator is small and see what happens.
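

## A ratio with a small denominator, a sketch

A quick simulation of the point about ratios, using made-up data. It draws many samples and computes $Cov(Y,Z)/Cov(X,Z)$ under a weak and a strong first stage. The first-stage coefficients (0.02 and 0.5) and the sample size are arbitrary choices.

```r
# Sampling distribution of the IV ratio under a weak and a strong first
# stage. True beta = 1; sample size and coefficients are made up.
set.seed(5)
iv_draw <- function(pi_z, n = 200, beta = 1) {
  z <- rnorm(n)
  v <- rnorm(n)
  e <- 0.5 * v + rnorm(n)     # e is correlated with v, so x is endogenous
  x <- pi_z * z + v
  y <- beta * x + e
  cov(y, z) / cov(x, z)
}

weak   <- replicate(2000, iv_draw(pi_z = 0.02))
strong <- replicate(2000, iv_draw(pi_z = 0.50))

# Quantiles are more informative than means here, because the
# weak-instrument estimates have very heavy tails.
probs <- c(0.05, 0.25, 0.5, 0.75, 0.95)
rbind(weak = quantile(weak, probs), strong = quantile(strong, probs))
```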


## Example from Goldsmith-Pinkham's slides

- Rather than create my own, I'm going to use Paul's example
  - https://github.com/paulgp/applied-methods-phd

- Let's look at three things:
  - The behavior of the first stage when the instrument is weak (he calls this Pi hat)
  - The relationship between the first stage and the second stage
  - The behavior of the 2SLS estimator as a whole when the instrument is weak


## Marginally significant first stage, simulations

![](week7assets/weak_iv_tauhat_hist.png){fig-align="center"}


## Marginally significant first stage, simulations

![](week7assets/weak_iv_tauhat_betahat.png){fig-align="center"}


## Marginally significant first stage, simulations

![](week7assets/weak_iv_betahat_hist.png){fig-align="center"}


## Marginally significant first stage, simulations

- The distribution of $\hat\beta$ is absolutely not normal
  - Asymptotics won't save you here!

- Note that this problem can (mostly) disappear when the first stage is strong
  - For example, a larger sample size will lead to better behavior of the estimator

- Again, asymptotic approximations -- just like with the CLT and skewed distributions -- won't necessarily apply


## Takeaways

- Looking at the second stage won't *necessarily* tell you if the first stage is weak

- Nowadays, it is very common to report the first stage F-statistic
  - You can't write a paper without reporting it

- The key idea is that many instruments can increase bias, even if it isn't obvious
  - Part of the problem is related to overfitting, which we'll cover in a few weeks
  - In fact, Angrist and Kolesar (2023) argue that weak instruments may not be a huge problem in the just-identified (i.e. one instrument) case!

- Chernozhukov and Hansen (2008) detail a routine to calculate confidence intervals that are valid regardless of the strength of the first stage (in the just-identified case).
  - Packages in both Stata and R
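

## Weak-instrument-robust intervals, a sketch

A rough sketch in the spirit of the reduced-form approach, not the packaged routines. For each candidate $\beta_0$, regress $Y - \beta_0 X$ on $Z$ and keep the values where $Z$ is not significant. The data are made up, the grid is arbitrary, and the p-values assume homoskedastic errors.

```r
# Made-up just-identified example with true beta = 1.
set.seed(321)
n <- 1000
z <- rnorm(n)
v <- rnorm(n)
x <- 0.3 * z + v
y <- 1 * x + 0.5 * v + rnorm(n)

# For each candidate beta0, regress (y - beta0 * x) on z. If beta0 is the
# true effect and z is valid, z should not predict the adjusted outcome.
p_value <- function(beta0) {
  fit <- lm(I(y - beta0 * x) ~ z)
  summary(fit)$coefficients["z", "Pr(>|t|)"]
}

grid  <- seq(-2, 4, by = 0.01)
pvals <- vapply(grid, p_value, numeric(1))
keep  <- grid[pvals > 0.05]      # candidates we fail to reject at 5%

b_iv   <- cov(y, z) / cov(x, z)  # usual IV point estimate
ar_set <- range(keep)            # endpoints of the non-rejected values
c(iv = b_iv, lower = ar_set[1], upper = ar_set[2])
```

With a very weak first stage the set of non-rejected values can be unbounded or run off the grid, so `range()` is only a reasonable summary when the set is a bounded interval.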


## Overidentification tests

- In the previous case, we had many instruments
  - This is called overidentification

- With overidentification, it is possible to test the "validity" of the instruments...
  - ... if we are willing to assume at least one of the instruments is valid!

- The intuition: different instruments should give us the same result


## Overidentification tests

- Consider a single endogenous $X$ and two instruments, $Z_1$ and $Z_2$:
\begin{gather} \mathbb{E}\left[Z_1Y\right]=\mathbb{E}\left[Z_1X\right]\beta \\
                \mathbb{E}\left[Z_2Y\right]=\mathbb{E}\left[Z_2X\right]\beta \end{gather}

- The overidentifying restrictions say that $\beta$ solves both equations simultaneously
  - In other words, $\beta$ is the same for both instruments

- If one instrument is valid and the other isn't, they should give us different results
  - We can test this!
  - Sometimes referred to as an overidentification test, a Sargan test, Hansen's J test, or a Sargan-Hansen test
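

## Overidentification tests, a sketch

A minimal sketch of the classic Sargan version of the test, computed by hand on made-up data with two valid instruments and a homogeneous effect. It assumes homoskedastic errors, and the variable names are illustrative.

```r
# Made-up data: one endogenous x, two valid instruments, homogeneous effect.
set.seed(77)
n  <- 2000
z1 <- rnorm(n)
z2 <- rnorm(n)
v  <- rnorm(n)
x  <- 0.4 * z1 + 0.4 * z2 + v
y  <- 1 * x + 0.5 * v + rnorm(n)

# 2SLS by hand
x_hat <- fitted(lm(x ~ z1 + z2))
b     <- unname(coef(lm(y ~ x_hat)))
u_hat <- y - b[1] - b[2] * x      # residuals use the actual x, not x_hat

# Sargan statistic: n * R^2 from regressing the 2SLS residuals on the instruments
r2     <- summary(lm(u_hat ~ z1 + z2))$r.squared
sargan <- n * r2
df     <- 2 - 1                   # instruments minus endogenous regressors
p_val  <- pchisq(sargan, df = df, lower.tail = FALSE)
c(sargan = sargan, p_value = p_val)
```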


## Overidentification tests

- Consider a single endogenous $X$ and two instruments, $Z_1$ and $Z_2$:
\begin{gather} \tag{23} \mathbb{E}\left[Z_1Y\right]=\mathbb{E}\left[Z_1X\right]\beta \\
                \tag{24} \mathbb{E}\left[Z_2Y\right]=\mathbb{E}\left[Z_2X\right]\beta \end{gather}

- But there's a problem...
  - And I already mentioned it. What's the problem?

. . .

- In a world of LATEs, the instruments can identify different effects
  - So we can't really test the validity of the instruments!

- TLDR: overidentification tests are not very useful (my take, anyway)


# Shift-share instruments

## Shift-share instruments (SSIV)

- Much of this is based on [Peter Hull's notes](https://about.peterhull.net/metrix)
  - He is an expert on this stuff! I am not.
  
- Often referred to as "Bartik shocks/instruments" ([Bartik 1991](https://research.upjohn.org/up_press/77/))

- Can find a review in [Borusyak et al. (2024)](https://academic.oup.com/ectj/advance-article/doi/10.1093/ectj/utae003/7590835?login=true) (though it's a bit more general than just SSIV)


## Before the theory...

- Before getting into theory, let's look at an example

- [Autor et al. (2013)](https://pubs.aeaweb.org/doi/pdfplus/10.1257/aer.103.6.2121)
  - The China Syndrome: Local Labor Market Effects of Import Competition in the United States
  
::: {.callout-note appearance="minimal"}
## Abstract
We analyze the effect of rising Chinese import competition between 1990 and 2007 on US local labor markets, **exploiting cross-market variation in import exposure stemming from initial differences in industry specialization and instrumenting for US imports using changes in Chinese imports by other high-income countries.**
:::

. . .

- Basic idea: use initial *shares* of import exposure
  - Instrument using change in Chinese imports in *other* high-income countries
  - This is the basic setup for a SSIV
  - They do more, but we will focus on just the SSIV part


## Chinese exports and local labor markets

- Interested in wages ($W_i$), employment for traded goods ($L_{Ti}$), and employment for non-traded goods ($L_{Ni}$)

\begin{align} W_i =& \;\sum_j c_{ij}\frac{L_{ij}}{L_{Ni}}\left[\theta_{ijC}E_{Cj}-\sum_k \theta_{ijk}\phi_{Cjk}A_{Cj}\right] \\
              L_{Ti} =& \;\rho_i\sum_j c_{ij}\frac{L_{ij}}{L_{Ti}}\left[\theta_{ijC}E_{Cj}-\sum_k \theta_{ijk}\phi_{Cjk}A_{Cj}\right] \\
              L_{Ni} =& \;\rho_i\sum_j c_{ij}\frac{L_{ij}}{L_{Ni}}\left[-\theta_{ijC}E_{Cj}+\sum_k \theta_{ijk}\phi_{Cjk}A_{Cj}\right] \end{align}


## Chinese exports and local labor markets

\begin{align} W_i =& \;\sum_j c_{ij}\frac{L_{ij}}{L_{Ni}}\left[\theta_{ijC}E_{Cj}-\sum_k \theta_{ijk}\phi_{Cjk}A_{Cj}\right] \\
              L_{Ti} =& \;\rho_i\sum_j c_{ij}\frac{L_{ij}}{L_{Ti}}\left[\theta_{ijC}E_{Cj}-\sum_k \theta_{ijk}\phi_{Cjk}A_{Cj}\right] \\
              L_{Ni} =& \;\rho_i\sum_j c_{ij}\frac{L_{ij}}{L_{Ni}}\left[-\theta_{ijC}E_{Cj}+\sum_k \theta_{ijk}\phi_{Cjk}A_{Cj}\right] \end{align}

- $A_{Cj}$ is change in China's "export-supply capability" in each industry
- $E_{Cj}$ is change in expenditures *within China* in each industry
- $\theta_{ijC}$ is initial share of output in region $i$ that is shipped to China
- $\theta_{ijk}$ is initial share of output in region $i$ that is shipped to each market $k$
- $\phi_{Cjk}$ is initial share of imports from China in total purchases


## Chinese exports and local labor markets

\begin{align} W_i =& \;\sum_j c_{ij}\frac{L_{ij}}{L_{Ni}}\left[\theta_{ijC}E_{Cj}-\sum_k \theta_{ijk}\phi_{Cjk}A_{Cj}\right] \\
              L_{Ti} =& \;\rho_i\sum_j c_{ij}\frac{L_{ij}}{L_{Ti}}\left[\theta_{ijC}E_{Cj}-\sum_k \theta_{ijk}\phi_{Cjk}A_{Cj}\right] \\
              L_{Ni} =& \;\rho_i\sum_j c_{ij}\frac{L_{ij}}{L_{Ni}}\left[-\theta_{ijC}E_{Cj}+\sum_k \theta_{ijk}\phi_{Cjk}A_{Cj}\right] \end{align}

- "Positive shocks to China's export supply decrease region $i$'s wage and employment in traded goods and increase its employment in non-traded goods. Similarly, positive shocks to China's import demand increase region $i$'s wage and employment in traded goods and decrease its employment in non-traded goods." (p. 2127)


## What is endogenous here?

. . .

- Initial share is certainly endogenous
- The change for a specific region is also certainly endogenous!

. . .

- "our main measure of local labor market exposure to import competition is the change in Chinese import exposure per worker in a region, where imports are apportioned to the region according to its share of national industry employment:" (p. 2128)

\begin{gather} \Delta IPW_{uit} = \sum_j\frac{L_{ijt}}{L_{ujt}}\frac{\Delta M_{ucjt}}{L_{it}} \end{gather}

- $\Delta M_{ucjt}$ is change in US imports from China in industry $j$


## The change is endogenous, too!

::: {.callout-note appearance="minimal"}
## p. 2128-2129
"A concern for our subsequent estimation is that realized US imports from China... may be correlated with industry import demand shocks, in which case the OLS estimate of how increased imports from China affect US manufacturing employment may understate the true impact, as both US employment and imports may be positively correlated with unobserved shocks to US product demand."
:::

. . .

- The solution?

. . .

- "[W]e instrument for growth in Chinese imports to the United States using the contemporaneous composition and growth of Chinese imports in eight other developed countries. Specifically, we instrument the measured import exposure variable $\Delta IPW_{uit}$ with a non-US exposure variable $\Delta IPW_{oit}$ that is constructed using data on contemporaneous industry-level growth of Chinese exports to other high-income markets:" (p. 2129)

\begin{gather} \Delta IPW_{oit} = \sum_j \frac{L_{ijt-1}}{L_{ujt-1}}\frac{\Delta M_{ocjt}}{L_{it-1}} \end{gather}


## Back to Hull's notes

- Same paper, different syntax. Instrument is

\begin{gather} z_\ell = \sum_n s_{\ell n}g_n \end{gather}

for the model

\begin{gather} y_\ell = \beta x_\ell + w_\ell'\gamma + \varepsilon_\ell \end{gather}

- $x_\ell$: growth of Chinese import competition in location $\ell$
- $y_\ell$: growth of outcome of interest
- $w_\ell$: control variables
- $g_n$: growth of Chinese exports in industry $n$ to non-US countries
- $s_{\ell n}$: initial share of employment (well, 10-year lags)
- $z_\ell$: instrument for $x_\ell$ (predicted growth of Chinese import competition)
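

## Building the instrument, a toy example

The construction is just a matrix product of shares and shocks. A toy sketch with four hypothetical locations and three hypothetical industries (all numbers made up):

```r
# Hypothetical toy example: 4 locations, 3 industries.
# Rows of `shares` are locations, columns are industries (initial shares).
shares <- matrix(c(0.5, 0.3, 0.2,
                   0.1, 0.6, 0.3,
                   0.3, 0.3, 0.4,
                   0.7, 0.2, 0.1),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("loc", 1:4), paste0("ind", 1:3)))

# Industry-level shocks g_n (made-up growth rates)
g <- c(ind1 = 0.10, ind2 = -0.05, ind3 = 0.20)

# z_l = sum_n s_ln * g_n: a share-weighted sum of the shocks
z <- drop(shares %*% g)
z
```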


## What do we need?

Following Borusyak et al. (2024):

- "Quasi-random shock assignment": In our example, this is true when "expected growth of Chinese imports $g_n$ is the same across industries with high vs. low [shock-level unobservables] $\bar{\varepsilon}_n$ (and [average exposure] $s_n$)"
- "Many uncorrelated shocks": In our example, "imposes many uncorrelated industry growth rates and sufficiently different industry specialization across locations"
  - Hull notes that this is basically a "shock-level law of large numbers"
  - Essentially, the expected value of $\sum_n s_n g_n \bar{\varepsilon}_n$ is zero


## What do we need?

- Important change: incomplete shares
  - Initial assumption is "constant sum-of-shares": $S_\ell=\sum_n s_{\ell n}=1\;\forall\;\ell$

- In our example, this is not true!
  - In practice, we can control for the sum-of-shares $S_\ell$
  - In panels, control for interaction between sum-of-shares and the year fixed effect (period effects)
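

## Incomplete shares, a sketch

A hypothetical illustration of why the sum of shares matters. When shares do not sum to one and the shocks have a nonzero mean, the instrument mechanically tracks $S_\ell$. All numbers below are made up.

```r
# Made-up example: manufacturing shares that do NOT sum to one, because
# non-manufacturing employment is left out of the instrument.
set.seed(10)
n_loc <- 500
n_ind <- 50
mfg_share <- runif(n_loc, 0.1, 0.5)         # size of manufacturing by location
w <- matrix(rexp(n_loc * n_ind), n_loc, n_ind)
w <- w / rowSums(w)                         # within-manufacturing shares
shares <- w * mfg_share                     # row l now sums to mfg_share[l]

g <- rnorm(n_ind, mean = 2, sd = 1)         # shocks with a nonzero mean
z <- drop(shares %*% g)
S <- rowSums(shares)                        # sum of shares, S_l

# z mechanically picks up the size of the manufacturing sector ...
cor_raw <- cor(z, S)
# ... but not once the sum of shares is partialled out
z_resid   <- resid(lm(z ~ S))
cor_resid <- cor(z_resid, S)
c(cor_raw = cor_raw, cor_resid = cor_resid)
```

Including $S_\ell$ as a control in the IV regression does the same partialling out.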


## Back to the paper

- SSIV in the paper is $z_{\ell t}=\sum_n s_{\ell nt}g_{nt}$
  - $n$: 397 different industries $\times$ two periods
  - $g_{nt}$: growth of Chinese imports in non-US economies per US worker
  - $s_{\ell nt}$: lagged share of mfg. industry $n$ in total employment of location $\ell$


- In practice, Borusyak et al. (2024) suggest clustering by industry (since that is essentially the level of treatment)


## Check "balance"

- Can regress industry covariates on the shock. We expect null results.


- Borusyak et al. (Table 3):


```{r}
#| echo: false
#| eval: true

mat <- matrix(c("Production workers’ share of employment, 1991", "-0.011", "(0.012)",
  "Ratio of capital to value-added, 1991", "-0.007", "(0.019)",
  "Log real wage (2007 USD), 1991", "-0.005", "(0.022)",
  "Computer investment as share of total, 1990", "0.750", "(0.465)",
  "High-tech equipment as share of total investment, 1990", "0.532", "(0.296)"), 
  ncol = 3, byrow = TRUE)
colnames(mat) <- c("Balance variable", "Coef.", "SE")

kable(mat, format = "html", booktabs = TRUE,
  align = "lcc", row.names = FALSE) |>
  column_spec(1,width = "16cm") |>
  column_spec(c(2:3),width = "3.5cm") |>
  footnote("The table is Panel A of Table 3 in Borusyak et al. (2024).", general_title = "",
    footnote_as_chunk = TRUE,
    escape = FALSE
    ) |>
  kable_styling(font_size = 24, full_width = F)
```


- Key: "Shocks do not predict industry-level observables controlling for period FE"
  - (Can also check location-level characteristics, as Borusyak et al. do)


## What are we identifying?

- [Goldsmith-Pinkham, Sorkin, and Swift (2020)](https://www.aeaweb.org/articles?id=10.1257/aer.20181047)
  - See paper for more details
  
- Big takeaway: they show that the SSIV estimator is equivalent to using many different IVs, one for each industry/market
  - You can derive the weights! (They call these Rotemberg weights)

- SSIV puts more weight on:
  - Share instruments with more extreme shocks $g_n$
  - Largest first stages


## Requirement: "share exogeneity"

- Share exogeneity means something a little different here: "all relevant unobservables are unforecastable from the shares" (Hull's notes)

- Key: Goldsmith-Pinkham, Sorkin, and Swift (2020) show that you can test it!
  - Check $n$ with high weights
  - Can do balance and pre-trend tests


# Recentered IV

## Recentered IV

- [Borusyak and Hull (2023)](https://onlinelibrary.wiley.com/doi/full/10.3982/ECTA19367)

- The idea:
  - Imagine a policy that rolls out over many years, like the building of roads
  - The location of roads might be endogenous, but maybe the exact completion date is not!
  - If the date of completion is somewhat random, we may be able to create an IV
  
- Example I'll use: roads in India


## Roads in India, by wave of NSS

- NSS has three waves of interest:
  - 2004-2005 (wave 61)
  - 2007-2008 (wave 64)
  - 2011-2012 (wave 68)


## We might be interested in the following

\begin{gather} y_{it} = \alpha_i + \delta_t + \beta roads_{it} + X_{it}'\gamma + \varepsilon_{it} \end{gather}

- $y_{it}$ is some outcome of interest
- $\alpha_i$ is district FE
- $\delta_t$ is time FE
- $roads_{it}$ is the length of roads in the district, $\beta$ is the coefficient of interest
- $X_{it}$ is a vector of controls
- But there is a concern... what?

. . .

- Perhaps roads are built in places that are trending in certain ways


## Roads in India, by wave of NSS

```{r}
#| echo: false
#| eval: true
#| crop: true
#| fig.align: center
library(terra)
# district boundaries and a state-district identifier
districts <- vect("week7files/india_2011_district.shp")
districts$distfe <- paste0(districts$st_cen_cd, "-", districts$dt_cen_cd)
# road lengths by district and NSS wave
roads <- read_csv("week7files/actualroads.csv")
districts <- districts |>
  left_join(roads, by = "distfe") |>
  filter(!is.na(wave))
# keep only districts observed in all three waves
districts <- districts |>
  group_by(distfe) |>
  mutate(obs = n()) |>
  filter(obs==3) |>
  ungroup()
g1 <- ggplot() +
  geom_spatvector(data = districts, aes(fill = log(length))) +
  scale_fill_distiller("Length (log)", palette = "Spectral", direction = 1) +
  facet_wrap(~wave) +
  theme_bw() +
  labs(subtitle = "Actual roads") +
  theme(plot.background = element_rect(fill = "#f0f1eb", color = "#f0f1eb")) + 
  theme(legend.background = element_rect(fill = "#f0f1eb", color = "#f0f1eb"))
g2 <- ggplot() +
  geom_spatvector(data = districts, aes(fill = log(expectedlength))) +
  scale_fill_distiller("Length (log)", palette = "Spectral", direction = 1) +
  facet_wrap(~wave) +
  theme_bw() +
  labs(subtitle = "Expected roads") +
  theme(plot.background = element_rect(fill = "#f0f1eb", color = "#f0f1eb")) + 
  theme(legend.background = element_rect(fill = "#f0f1eb", color = "#f0f1eb"))
plot_grid(g1, g2, ncol = 1) +
  theme(plot.background = element_rect(fill = "#f0f1eb", color = "#f0f1eb")) + 
  theme(legend.background = element_rect(fill = "#f0f1eb", color = "#f0f1eb"))
```


## Roads in India, by wave of NSS

```{r}
#| echo: false
#| eval: true
#| crop: true
#| fig.align: center
ggplot() +
  geom_spatvector(data = districts, aes(fill = log(length) - log(expectedlength))) +
  scale_fill_distiller("Actual - expected (log)", palette = "Spectral", direction = 1) +
  facet_wrap(~wave) +
  theme_bw() +
  labs(subtitle = "Deviation from expected") +
  theme(legend.position = "bottom") +
  theme(plot.background = element_rect(fill = "#f0f1eb", color = "#f0f1eb")) + 
  theme(legend.background = element_rect(fill = "#f0f1eb", color = "#f0f1eb"))
```


## The basic idea

- The basic idea is similar to randomization inference

- Find "expected" value based on randomized completion date
  - Instrument is actual minus expected
  - This is the recentered IV
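

## The basic idea, a simulated sketch

A stylized sketch of recentering, not the construction used for the India data. It assumes each hypothetical district has one planned road whose length is endogenous, while the completion year is as good as random. The counterfactuals reshuffle completion years across districts. All numbers are made up.

```r
# Made-up setup: each district has one planned road of length `len`.
# Planned length is endogenous (related to an unobserved trend), but the
# completion year is assumed to be as good as random.
set.seed(2012)
n     <- 2000
trend <- rnorm(n)                                    # unobserved district trend
len   <- pmax(30 + 8 * trend + rnorm(n, sd = 5), 1)  # longer roads where trend is high
year  <- sample(2005:2012, n, replace = TRUE)        # completion year

cutoff <- 2008                                       # survey wave of interest
actual <- len * (year <= cutoff)                     # road length in place by then

# Counterfactuals: reshuffle completion years across districts, holding the
# planned lengths fixed, and average the implied road length.
sims     <- replicate(500, len * (sample(year) <= cutoff))
expected <- rowMeans(sims)

recentered <- actual - expected                      # the recentered instrument

cor_actual     <- cor(actual, trend)
cor_recentered <- cor(recentered, trend)
c(cor_actual_trend = cor_actual, cor_recentered_trend = cor_recentered)
```

Actual road length is correlated with the unobserved trend, while the recentered version should be close to uncorrelated with it, because only the timing is left.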