Model-free Treatment Effect Estimators
The didnpreg command contains tools for computing both heterogeneous and average treatment effects for the treated in a model-free difference-in-differences framework.
Usage
didnpreg(...)
# S3 method for class 'formula'
didnpreg(
formula,
data = stop("argument 'data' is missing"),
subset,
bws = NULL,
bwmethod = "opt",
boot.num = 399,
TTx = "TTa",
level = 95,
print.level = 1,
digits = 4,
cores = 1,
seed = 17345168,
...
)
# Default S3 method
didnpreg(
outcome,
regressors,
time,
treated,
treatment_period,
weights = NULL,
bws = NULL,
bwmethod = "opt",
boot.num = 399,
TTx = "TTa",
level = 95,
print.level = 1,
digits = 4,
cores = 1,
seed = 17345168,
...
)
Arguments
- formula
an object of class formula (or one that can be coerced to that class): a symbolic description of the model. The details of model specification are given under 'Details'.
- data
name of the data frame; must be specified if the 'formula' method is used
- subset
NULL, optional subsample of 'data'
- bws
a bandwidth specification. A vector of bandwidths of length corresponding to the number of regressors.
- bwmethod
bandwidth type. Two options can be specified. "opt" (the default) uses plug-in bandwidths: a rule of thumb for continuous regressors and a basic rule for categorical regressors. "CV" calculates cross-validated bandwidths.
- boot.num
a single value specifying the number of bootstrap replications. Default is 399.
- TTx
Can take values 'TTa' or 'TTb'. 'TTa' estimates the Treatment Effect on the Treated by averaging over the treated after the treatment. 'TTb' estimates the Unconditional Treatment Effect on the Treated by averaging over all treated (before and after the treatment). Depending on the sample, calculating TTb may take some time. Default is 'TTa'.
- print.level
the level of printing; larger number implies more output is printed. Default is 1. 0 suppresses all printing.
- cores
an integer specifying the number of cores to be used for parallel computation.
- seed
an integer used to seed the random number generator for replication purposes. Default is 17345168.
- outcome
a vector, matrix, or data frame of length \(NT\). The outcome can be a continuous or dummy variable.
- regressors
a data frame with \(NT\) rows that contains regressors. A data frame class is required to identify the type/class of each regressor.
- time
a vector, matrix, or data frame of length \(NT\) that specifies in which period id is observed.
- treated
a vector, matrix, or data frame of length \(NT\) with zeros for the control and ones for the treated observations.
- treatment_period
a vector, matrix, or data frame of length \(NT\) with zeros for the period before treatment and ones for the period of treatment and after.
- weights
NULL,
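The regressors argument must be a data frame so that the class of each column can be used to tell continuous, unordered categorical, and ordered categorical regressors apart. The sketch below is only an illustration of that idea in base R, with made-up column names and values; how didnpreg classifies columns internally is an assumption here, not something taken from the package source.

```r
# Illustrative data frame (hypothetical values): one column of each type
regressors <- data.frame(
  age   = c(19, 20, 21, 22),                      # numeric: continuous
  state = factor(c("AL", "NY", "AL", "TX")),       # factor: unordered categorical
  educ  = factor(c("low", "mid", "high", "mid"),
                 levels = c("low", "mid", "high"),
                 ordered = TRUE)                    # ordered factor: ordered categorical
)

# Classify each column by its class (assumed rule, for illustration only)
is_continuous <- vapply(regressors, is.numeric, logical(1))
is_unordered  <- vapply(regressors,
                        function(x) is.factor(x) && !is.ordered(x),
                        logical(1))
is_ordered    <- vapply(regressors, is.ordered, logical(1))

# Counts in the same order as the 'regressor.type' element described under Value
regressor_type <- c(continuous = sum(is_continuous),
                    unordered  = sum(is_unordered),
                    ordered    = sum(is_ordered))
print(regressor_type)
```

Storing a categorical variable as a plain numeric column would make it look continuous under this rule, which is why converting such columns with factor() before the call is worth checking.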
Value
didnpreg
returns a list containing:
NT | Total number of observations |
esample | A vector of TRUE/FALSE values identifying observations used in estimation. Relevant for the 'formula' method but complete cases will also be checked in the matrix method |
sample1 | A vector of TRUE/FALSE values identifying treated observations. |
sample11 | A vector of TRUE/FALSE values identifying treated observations right after the treatment |
sample10 | A vector of TRUE/FALSE values identifying treated observations just before the treatment |
sample01 | A vector of TRUE/FALSE values identifying observations in control group right after the treatment |
sample00 | A vector of TRUE/FALSE values identifying observations in control group just before the treatment |
n11 | Number of treated observations right after the treatment |
n10 | Number of treated observations just before the treatment |
n01 | Number of observations in control group right after the treatment |
n00 | Number of observations in control group just before the treatment |
regressor.type | A vector of length 3 with number of continuous, unordered categorical, and ordered categorical regressors. |
bwmethod | bandwidth type |
bw.time | Time in seconds it took to calculate bandwidths. It is 0 for bandwidth type "opt". |
bws | Data frame with variable names, type of the regressor and bandwidths. |
boot.time | Time in seconds it took to bootstrap the standard errors. |
boot.num | Number of bootstrap replications. |
bw11 | Bandwidths calculated for the sample of treated right after the treatment. |
bw10 | Bandwidths calculated for the sample of treated just before the treatment. |
bw01 | Bandwidths calculated for the sample of observations in control group right after the treatment. |
bw00 | Bandwidths calculated for the sample of observations in control group just before the treatment |
do.TTb | TRUE/FALSE whether to perform TTb |
TTa.positions.in.TTb | Positions of TTa observations in TTb. Only if do.TTb |
TTa | the DiD estimator of the average unconditional TT |
TTa.i | the DiD estimators of the unconditional TT |
TTb | the DiD estimator of the average unconditional TT |
TTb.i | the DiD estimators of the unconditional TT |
TTa.se | the standard error of the DiD estimator of the average unconditional TT |
TTb.se | the standard error of the DiD estimator of the average unconditional TT |
TTx | the DiD estimators of the conditional TT (also known as CATET) |
TTa.i.boot | Matrix of the size \(n_{11} \times boot.num\) |
TTb.i.boot | Matrix of the size \(n_{1} \times boot.num\) |
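The four subsamples listed above (sample11, sample10, sample01, sample00) can be pictured as the cells of a two-by-two split on treated and treatment_period. The following base R sketch uses a tiny made-up data set and assumes that "after" and "before" correspond to treatment_period equal to 1 and 0; it is not the package's internal code. The last lines compute the textbook difference-in-differences of raw cell means purely for intuition; that contrast is not the model-free estimator returned by didnpreg.

```r
# Hypothetical toy data: 8 observations
treated          <- c(1, 1, 1, 1, 0, 0, 0, 0)
treatment_period <- c(0, 0, 1, 1, 0, 0, 1, 1)
y                <- c(1, 2, 5, 6, 1, 1, 2, 2)

# Logical indicators for the four cells (assumed definitions)
sample11 <- treated == 1 & treatment_period == 1  # treated, after
sample10 <- treated == 1 & treatment_period == 0  # treated, before
sample01 <- treated == 0 & treatment_period == 1  # control, after
sample00 <- treated == 0 & treatment_period == 0  # control, before

# Cell sizes, named as in the returned list
n11 <- sum(sample11); n10 <- sum(sample10)
n01 <- sum(sample01); n00 <- sum(sample00)

# Textbook DiD of raw means, shown for intuition only
did_means <- (mean(y[sample11]) - mean(y[sample10])) -
             (mean(y[sample01]) - mean(y[sample00]))
print(c(n11 = n11, n10 = n10, n01 = n01, n00 = n00))
print(did_means)
```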
Details
The formula shall contain multiple parts separated by '|'. An example is
form1 <- y ~ x1 + x2 | time | treated | treatment_period | weights
The weights part can be omitted if weights are not available:
form1 <- y ~ x1 + x2 | time | treated | treatment_period
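To see how such a multi-part formula is laid out, it can help to split its right-hand side on the '|' separators. The helper split_rhs below is hypothetical and written only for this illustration in base R; it is not a didnp function and does not claim to reproduce how didnpreg parses the formula.

```r
# Hypothetical helper (not part of didnp): return the '|'-separated parts
# of a two-sided formula's right-hand side as character strings.
split_rhs <- function(f) {
  parts <- list()
  rhs <- f[[3]]  # right-hand side of a two-sided formula
  # '|' is left-associative, so peel parts off from the right
  while (is.call(rhs) && identical(rhs[[1]], as.name("|"))) {
    parts <- c(list(rhs[[3]]), parts)
    rhs <- rhs[[2]]
  }
  parts <- c(list(rhs), parts)
  vapply(parts, function(p) paste(deparse(p), collapse = " "), character(1))
}

form_w  <- y ~ x1 + x2 | time | treated | treatment_period | weights
form_nw <- y ~ x1 + x2 | time | treated | treatment_period

parts_w  <- split_rhs(form_w)   # regressors, time, treated, period, weights
parts_nw <- split_rhs(form_nw)  # same without the weights part
print(parts_w)
print(parts_nw)
```

The first part holds the regressors, and the remaining parts name one variable each, so dropping the weights simply shortens the formula by one part.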
References
Daniel J. Henderson and Stefan Sperlich (2023). A Complete Framework for Model-Free Difference-in-Differences Estimation. Foundations and Trends in Econometrics, 12(3), 232-323 http://dx.doi.org/10.1561/0800000046.
Author
Oleg Badunenko oleg.badunenko@brunel.ac.uk,
Daniel J. Henderson djhender@cba.ua.edu,
Stefan Sperlich stefan.sperlich@unige.ch
Examples
if (FALSE) { # \dontrun{
data(DACAsub, package = "didnp")
# will get a data frame 'DACAsub' with 330106 rows and 18 columns
# get the subsample
DACAsub$mysmpl <- mysmpl <-
DACAsub$a1922==1 & !is.na(DACAsub$a1922) &
DACAsub$htus==1 & !is.na(DACAsub$htus)
# generate 'treatment_period'
DACAsub$treatment_period <- ifelse(DACAsub[,"year"]>2011,1,0)
# define formula with the weight
form1 <- inschool ~ fem + race + var.bpl + state + age + yrimmig +
ageimmig | inschool | year | elig | treatment_period | perwt
# or without the weight
form11 <- inschool ~ fem + race + var.bpl + state + age + yrimmig +
ageimmig | inschool | year | elig | treatment_period
## Syntax using formula
# suppress output
tym1a <- didnpreg(
form1,
data = DACAsub,
subset = mysmpl,
bwmethod = "opt",
boot.num = 399,
TTx = "TTa",
print.level = 0,
cores = 4)
# Print the summary
summary(tym1a)
## Use CV bandwidths
tym1aCV <- didnpreg(
form1,
data = DACAsub,
subset = mysmpl,
bwmethod = "CV",
boot.num = 399,
TTx = "TTa",
print.level = 1,
cores = 4)
# Print the summary
summary(tym1aCV)
## Calculate also TTb (will take longer)
tym1bCV <- didnpreg(
form1,
data = DACAsub,
subset = mysmpl,
bwmethod = "CV",
boot.num = 399,
TTx = "TTb",
print.level = 1,
cores = 4)
# Print the summary
summary(tym1bCV)
## Syntax using matrices
tym1aM <- didnpreg(
outcome = DACAsub[mysmpl,"inschool"],
regressors = DACAsub[mysmpl,c("fem", "race", "var.bpl", "state", "age", "yrimmig", "ageimmig")],
id = DACAsub[mysmpl,"inschool"],
time = DACAsub[mysmpl,"year"],
treated = DACAsub[mysmpl,"elig"],
treatment_period = ifelse(DACAsub[mysmpl,"year"]>2011,1,0),
weights = DACAsub[mysmpl,"perwt"],
bwmethod = "opt",
boot.num = 399,
TTx = "TTa",
print.level = 1,
cores = 4)
# Print the summary
summary(tym1aM)
} # }