Model-free Treatment Effect Estimators

The didnpreg command contains tools for computing both heterogeneous and average treatment effects on the treated in a model-free difference-in-differences framework.
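The following is a standard conditional DiD formulation, written in terms of the four groups that didnpreg uses. Here \(D\) is the treated indicator, \(P\) is the treatment_period indicator, and \(\hat m_{dp}\) is a nonparametric regression of the outcome on the regressors within group \((d, p)\). The formulas summarize the idea under the usual conditional parallel-trends assumption. They are not a transcription of the estimator in the package or in Henderson and Sperlich (2023), so consult that reference for the exact construction.

```latex
\[
\widehat{\mathrm{TT}}(x) = \bigl[\hat m_{11}(x) - \hat m_{10}(x)\bigr] - \bigl[\hat m_{01}(x) - \hat m_{00}(x)\bigr],
\qquad m_{dp}(x) = E[Y \mid X = x,\, D = d,\, P = p],
\]
\[
\mathrm{TTa} = \frac{1}{n_{11}} \sum_{i:\, D_i = 1,\, P_i = 1} \widehat{\mathrm{TT}}(x_i),
\qquad
\mathrm{TTb} = \frac{1}{n_{1}} \sum_{i:\, D_i = 1} \widehat{\mathrm{TT}}(x_i),
\qquad n_1 = n_{11} + n_{10}.
\]
```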

Usage

didnpreg(...)

# S3 method for class 'formula'
didnpreg(
  formula,
  data = stop("argument 'data' is missing"),
  subset,
  bws = NULL,
  bwmethod = "opt",
  boot.num = 399,
  TTx = "TTa",
  level = 95,
  print.level = 1,
  digits = 4,
  cores = 1,
  seed = 17345168,
  ...
)

# Default S3 method
didnpreg(
  outcome,
  regressors,
  time,
  treated,
  treatment_period,
  weights = NULL,
  bws = NULL,
  bwmethod = "opt",
  boot.num = 399,
  TTx = "TTa",
  level = 95,
  print.level = 1,
  digits = 4,
  cores = 1,
  seed = 17345168,
  ...
)

Arguments

formula: an object of class formula (or one that can be coerced to that class): a symbolic description of the model. The details of model specification are given under 'Details'.
data: name of the data frame; must be specified if the 'formula' method is used
subset: optional; a logical vector selecting a subsample of 'data'.
bws: a bandwidth specification. A vector of bandwidths of length corresponding to the number of regressors.
bwmethod: bandwidth type. Two options are available: "opt" (the default) uses rule-of-thumb plug-in bandwidths for continuous regressors and basic bandwidths for categorical regressors; "CV" computes cross-validated bandwidths.
boot.num: a single value specifying the number of bootstrap replications. Default is 399.
TTx: either 'TTa' or 'TTb'. 'TTa' estimates the Treatment Effect on the Treated by averaging over the treated observations after the treatment. 'TTb' estimates the Unconditional Treatment Effect on the Treated by averaging over all treated observations (before and after the treatment). Depending on the sample, calculating TTb may take some time. Default is 'TTa'.
print.level: the level of printing; a larger number prints more output. Default is 1; 0 suppresses all printing.
cores: an integer specifying the number of cores to be used for parallel computation.
seed: an integer used to seed the random number generator for replication purposes. Default is 17345168.
outcome: a vector, matrix, or data frame of length \(NT\). The outcome can be a continuous or dummy variable.
regressors: a data frame with \(NT\) rows that contains regressors. A data frame class is required to identify the type/class of each regressor.
time: a vector, matrix, or data frame of length \(NT\) that specifies the time period of each observation.
treated: a vector, matrix, or data frame of length \(NT\) with zeros for the control and ones for the treated observations.
treatment_period: a vector, matrix, or data frame of length \(NT\) with zeros for the period before treatment and ones for the period of treatment and after.
weights: an optional vector of length \(NT\) of observation weights. Default is NULL (no weights).
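The regressors argument depends on column classes, and bwmethod = "opt" depends on a rule of thumb for continuous regressors. The base-R sketch below shows both ideas on made-up data. Numeric columns count as continuous, factor columns as unordered categorical, and ordered factors as ordered categorical, in the same order as the regressor.type element. The helpers regressor_types() and rot_bw() are hypothetical. The 1.06 * sd * n^(-1/5) constant is Silverman's normal-reference rule, which is assumed here only for illustration. didnp may use a different rule of thumb, and the sketch does not reproduce its "basic" bandwidths for categorical regressors.

```r
# Hypothetical helpers for illustration only; not part of didnp.

# Count regressors by class, in the order used by 'regressor.type':
# continuous, unordered categorical, ordered categorical.
regressor_types <- function(regressors) {
  stopifnot(is.data.frame(regressors))
  is_cont  <- vapply(regressors, is.numeric, logical(1))
  is_ord   <- vapply(regressors, is.ordered, logical(1))
  is_unord <- vapply(regressors, function(v) is.factor(v) && !is.ordered(v),
                     logical(1))
  c(continuous = sum(is_cont), unordered = sum(is_unord),
    ordered = sum(is_ord))
}

# Silverman's normal-reference rule of thumb for a continuous regressor
# (assumed constant; the package's own rule may differ)
rot_bw <- function(x) 1.06 * sd(x) * length(x)^(-1/5)

# Toy inputs in the shape expected by the default method (made-up data)
set.seed(1)
n <- 200
regressors <- data.frame(
  age  = rnorm(n, mean = 30, sd = 5),                            # continuous
  race = factor(sample(c("a", "b", "c"), n, replace = TRUE)),    # unordered
  educ = factor(sample(1:3, n, replace = TRUE), ordered = TRUE)  # ordered
)
year             <- sample(2008:2014, n, replace = TRUE)
treatment_period <- ifelse(year > 2011, 1, 0)  # 1 from 2012 onward

types  <- regressor_types(regressors)  # continuous = 1, unordered = 1, ordered = 1
bw_age <- rot_bw(regressors$age)
```

Because the class of each column determines its type, converting a categorical variable with factor() or factor(..., ordered = TRUE) before the call is what tells the estimator how to treat it.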

Value

didnpreg returns a list containing:

`NT`	Total number of observations

`esample`	A vector of TRUE/FALSE values identifying observations used in estimation. Relevant mainly for the 'formula' method, although complete cases are also checked in the default (matrix) method.

`sample1`	A vector of TRUE/FALSE values identifying treated observations.

`sample11`	A vector of TRUE/FALSE values identifying treated observations right after the treatment.

`sample10`	A vector of TRUE/FALSE values identifying treated observations just before the treatment.

`sample01`	A vector of TRUE/FALSE values identifying observations in the control group right after the treatment.

`sample00`	A vector of TRUE/FALSE values identifying observations in the control group just before the treatment.

`n11`	Number of treated observations right after the treatment.

`n10`	Number of treated observations just before the treatment.

`n01`	Number of observations in the control group right after the treatment.

`n00`	Number of observations in the control group just before the treatment.

`regressor.type`	A vector of length 3 with number of continuous, unordered categorical, and ordered categorical regressors.

`bwmethod`	bandwidth type

`bw.time`	Time in seconds it took to calculate bandwidths; 0 for bandwidth type "opt".

`bws`	Data frame with variable names, type of the regressor and bandwidths.

`boot.time`	Time in seconds it took to bootstrap the standard errors.

`boot.num`	Number of bootstrap replications.

`bw11`	Bandwidths calculated for the sample of treated right after the treatment.

`bw10`	Bandwidths calculated for the sample of treated just before the treatment.

`bw01`	Bandwidths calculated for the sample of observations in the control group right after the treatment.

`bw00`	Bandwidths calculated for the sample of observations in the control group just before the treatment.

`do.TTb`	TRUE/FALSE indicating whether TTb was computed.

`TTa.positions.in.TTb`	Positions of the TTa observations among the TTb observations. Returned only if `do.TTb` is TRUE.

`TTa`	the DiD estimator of the average unconditional TT, averaging over the treated observations after the treatment

`TTa.i`	the observation-level DiD estimates underlying `TTa`

`TTb`	the DiD estimator of the average unconditional TT, averaging over all treated observations (before and after the treatment)

`TTb.i`	the observation-level DiD estimates underlying `TTb`

`TTa.se`	the standard error of `TTa`

`TTb.se`	the standard error of `TTb`

`TTx`	the DiD estimators of the conditional TT (also known as CATET)

`TTa.i.boot`	Matrix of size \(n_{11} \times \text{boot.num}\)

`TTb.i.boot`	Matrix of size \(n_{1} \times \text{boot.num}\)
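The base-R sketch below shows how the main output elements relate to each other. It is deliberately simplified and hypothetical, and it is not the didnp implementation. It uses a single continuous regressor and fits a Gaussian-kernel local-constant (Nadaraya-Watson) regression separately in each of the four groups. Each group gets its own rule-of-thumb bandwidth, and the standard error comes from a naive nonparametric bootstrap. The names nw_fit(), did_tt() and boot_se_tta() are invented, and the kernel, the bandwidth rule and the resampling scheme are all assumptions. The returned elements carry the names of the corresponding list elements above (n11, TTa, TTa.i, TTb, TTb.i).

```r
# Hypothetical, simplified sketch (not the didnp implementation):
# one continuous regressor, Gaussian-kernel local-constant regressions
# fitted separately in the four treated/period groups.

# Nadaraya-Watson estimate of E[y | x = x0] at each point in x_eval
nw_fit <- function(x_train, y_train, x_eval, h) {
  vapply(x_eval, function(x0) {
    k <- dnorm((x_train - x0) / h)
    sum(k * y_train) / sum(k)
  }, numeric(1))
}

did_tt <- function(y, x, treated, treatment_period) {
  # Group indicators, named after sample11, sample10, sample01, sample00
  s11 <- treated == 1 & treatment_period == 1
  s10 <- treated == 1 & treatment_period == 0
  s01 <- treated == 0 & treatment_period == 1
  s00 <- treated == 0 & treatment_period == 0
  s1  <- treated == 1
  # Assumed rule-of-thumb bandwidth, computed per group (cf. bw11 ... bw00)
  bw <- function(s) 1.06 * sd(x[s]) * sum(s)^(-1/5)
  # Fit within group s and evaluate at every treated observation
  m <- function(s) nw_fit(x[s], y[s], x[s1], bw(s))
  # Conditional DiD at each treated point: [m11 - m10] - [m01 - m00]
  tt_i <- (m(s11) - m(s10)) - (m(s01) - m(s00))
  after <- s11[s1]  # which treated observations are after the treatment
  list(n11 = sum(s11), n10 = sum(s10), n01 = sum(s01), n00 = sum(s00),
       TTa.i = tt_i[after], TTa = mean(tt_i[after]),
       TTb.i = tt_i,        TTb = mean(tt_i))
}

# Naive bootstrap standard error of TTa (assumed resampling scheme)
boot_se_tta <- function(y, x, treated, treatment_period, boot_num = 49) {
  reps <- replicate(boot_num, {
    idx <- sample.int(length(y), replace = TRUE)
    did_tt(y[idx], x[idx], treated[idx], treatment_period[idx])$TTa
  })
  sd(reps)
}

# Toy data with a true treatment effect on the treated of 2
set.seed(42)
n <- 400
treated          <- rbinom(n, 1, 0.5)
treatment_period <- rbinom(n, 1, 0.5)
x <- runif(n)
y <- x + 0.5 * treated + 0.3 * treatment_period +
  2 * treated * treatment_period + rnorm(n, sd = 0.1)

fit <- did_tt(y, x, treated, treatment_period)
se  <- boot_se_tta(y, x, treated, treatment_period)
```

In this sketch, `fit$TTa` and `fit$TTb` correspond to the TTa and TTb elements, and `se` corresponds to TTa.se. The toy data build in a constant effect of 2, so both averages should land close to 2.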

Details

The formula must contain multiple parts separated by '|'. For example:

form1 <- y ~ x1 + x2 | time | treated | treatment_period | weights

The weights part can be omitted if weights are not available:

form1 <- y ~ x1 + x2 | time | treated | treatment_period
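A small base-R parser can show how such a formula breaks into its parts. R parses `a | b | c` as `(a | b) | c`, and `~` binds more loosely than `|`, so the right-hand side can be split by walking the call tree. The sketch only illustrates that decomposition. parse_did_formula() and split_bar() are invented names, and the package itself may rely on a dedicated formula-handling package instead.

```r
# Hypothetical helper (not part of didnp): split a formula of the form
#   y ~ x1 + x2 | time | treated | treatment_period [| weights]
# into its '|'-separated parts using only base R.
split_bar <- function(expr) {
  # '|' is left-associative, so a | b | c parses as (a | b) | c
  if (is.call(expr) && identical(expr[[1]], as.name("|"))) {
    c(split_bar(expr[[2]]), list(expr[[3]]))
  } else {
    list(expr)
  }
}

parse_did_formula <- function(formula) {
  stopifnot(inherits(formula, "formula"), length(formula) == 3)
  parts <- split_bar(formula[[3]])
  if (!length(parts) %in% c(4, 5))
    stop("expected 4 or 5 '|'-separated parts on the right-hand side")
  list(
    outcome          = deparse(formula[[2]]),
    regressors       = all.vars(parts[[1]]),
    time             = deparse(parts[[2]]),
    treated          = deparse(parts[[3]]),
    treatment_period = deparse(parts[[4]]),
    weights          = if (length(parts) == 5) deparse(parts[[5]]) else NULL
  )
}

form1  <- y ~ x1 + x2 | time | treated | treatment_period | weights
form11 <- y ~ x1 + x2 | time | treated | treatment_period
p1  <- parse_did_formula(form1)
p11 <- parse_did_formula(form11)
```

With four parts, the weights entry is NULL. This matches the rule above that the weights part may be left out.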

References

Daniel J. Henderson and Stefan Sperlich (2023). A Complete Framework for Model-Free Difference-in-Differences Estimation. Foundations and Trends in Econometrics, 12(3), 232-323 http://dx.doi.org/10.1561/0800000046.

Author

Oleg Badunenko oleg.badunenko@brunel.ac.uk,

Daniel J. Henderson djhender@cba.ua.edu,

Stefan Sperlich stefan.sperlich@unige.ch

Examples

if (FALSE) { # \dontrun{
  data(DACAsub, package = "didnp")
  # will get a data frame 'DACAsub' with 330106 rows and 18 columns

  # get the subsample
  DACAsub$mysmpl <- mysmpl <-
    DACAsub$a1922==1 & !is.na(DACAsub$a1922) &
    DACAsub$htus==1 & !is.na(DACAsub$htus)

  # generate 'treatment_period'
  DACAsub$treatment_period <- ifelse(DACAsub[,"year"]>2011,1,0)

  # define formula with the weight
  form1 <- inschool ~ fem + race + var.bpl + state + age + yrimmig +
    ageimmig | year | elig | treatment_period | perwt

  # or without the weight
  form11 <- inschool ~ fem + race + var.bpl + state + age + yrimmig +
    ageimmig | year | elig | treatment_period

  ## Syntax using formula
  # suppress output
  tym1a <- didnpreg(
    form1,
    data = DACAsub,
    subset = mysmpl,
    bwmethod = "opt",
    boot.num = 399,
    TTx = "TTa",
    print.level = 0,
    cores = 4)

  # Print the summary
  summary(tym1a)

  ## Use CV bandwidths
  tym1aCV <- didnpreg(
    form1,
    data = DACAsub,
    subset = mysmpl,
    bwmethod = "CV",
    boot.num = 399,
    TTx = "TTa",
    print.level = 1,
    cores = 4)

  # Print the summary
  summary(tym1aCV)

  ## Calculate also TTb (will take longer)
  tym1bCV <- didnpreg(
    form1,
    data = DACAsub,
    subset = mysmpl,
    bwmethod = "CV",
    boot.num = 399,
    TTx = "TTb",
    print.level = 1,
    cores = 4)

  # Print the summary
  summary(tym1bCV)

  ## Syntax using matrices

  tym1aM <- didnpreg(
    outcome = DACAsub[mysmpl,"inschool"],
    regressors = DACAsub[mysmpl,c("fem", "race", "var.bpl", "state", "age", "yrimmig", "ageimmig")],
    time = DACAsub[mysmpl,"year"],
    treated = DACAsub[mysmpl,"elig"],
    treatment_period = ifelse(DACAsub[mysmpl,"year"]>2011,1,0),
    weights = DACAsub[mysmpl,"perwt"],
    bwmethod = "opt",
    boot.num = 399,
    TTx = "TTa",
    print.level = 1,
    cores = 4)

  # Print the summary
  summary(tym1aM)

} # }