Machine Learning for Set-Identified Linear Models (Job Market Paper; Chapter 1 of MIT Doctoral Thesis)
Abstract: Set-identified models often restrict the number of covariates, leading to wide identified sets in practice. This paper provides estimation and inference methods for set-identified linear models with high-dimensional covariates, where model selection is based on modern machine learning tools. I characterize the support function of the identified set using a semiparametric moment condition. Combining Neyman-orthogonality and sample-splitting ideas, I construct a root-N consistent, uniformly asymptotically Gaussian estimator of the support function. I also prove the validity of the Bayesian bootstrap for conducting inference about the identified set. I provide a general method to construct a Neyman-orthogonal moment condition for the support function. I apply this result to estimate sharp nonparametric bounds on the average treatment effect in Lee (2008)'s model of endogenous selection and substantially tighten the bounds on this parameter in Angrist et al. (2006)'s empirical setting. I also apply this result to estimate sharp identified sets for two other parameters: a new parameter, called a partially linear predictor, and the average partial derivative when the outcome variable is interval-censored.
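In the baseline case without covariates, the Lee (2008) bounds referenced above have a simple sample analogue. The sketch below (a minimal numpy illustration with hypothetical names; it omits the paper's covariate-based sharpening and orthogonal estimation) shows the trimming idea:

```python
import numpy as np

def lee_bounds(y, d, s):
    """Basic Lee (2008) trimming bounds on the treatment effect for the
    always-observed subpopulation. Illustrative only: no covariates, and it
    assumes selection is (weakly) higher in the treated arm."""
    y, d, s = map(np.asarray, (y, d, s))
    s1, s0 = s[d == 1].mean(), s[d == 0].mean()
    p = (s1 - s0) / s1            # share of observed treated outcomes to trim
    y1 = y[(d == 1) & (s == 1)]   # observed treated outcomes
    y0 = y[(d == 0) & (s == 1)]   # observed control outcomes
    q_lo, q_hi = np.quantile(y1, [p, 1 - p])
    lower = y1[y1 <= q_hi].mean() - y0.mean()  # trim the top p share
    upper = y1[y1 >= q_lo].mean() - y0.mean()  # trim the bottom p share
    return lower, upper
```

When selection rates coincide across arms, p = 0 and the bounds collapse to the difference in observed means; conditioning on covariates, as in the paper, is what tightens these bounds.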
Journal submitted version: Semenova_Machine_Learning_Set_Identified_Models.pdf
Code for the Angrist et al. (2006) replication can be downloaded here: https://drive.google.com/open?id=1_9L_N-bj5nYoTBZZHYEEl-4XpzRK-TSY
Machine Learning for Dynamic Discrete Choice (Chapter 2 of MIT Doctoral Thesis)
Abstract: Dynamic discrete choice models often discretize the state vector and restrict its dimension in order to achieve valid inference. I propose a novel two-stage estimator for the set-identified structural parameter that incorporates a high-dimensional state space into the dynamic model of imperfect competition. In the first stage, I estimate the state variable's law of motion and the equilibrium policy function using machine learning tools. In the second stage, I plug the first-stage estimates into a moment inequality and solve for the structural parameter. The moment function is the sum of two components: the first embodies the equilibrium assumption, and the second is a bias-correction term that makes the sum insensitive (i.e., Neyman-orthogonal) to first-stage bias. The proposed estimator converges uniformly at the root-N rate and is used to construct confidence regions. The results developed here can be used to incorporate a high-dimensional state space into classic dynamic discrete choice models, for example those considered in Rust (1987), Bajari et al. (2007), and Scott (2013).
Orthogonal Machine Learning for Demand Estimation: High Dimensional Causal Inference in Dynamic Panels (with Matt Goldman, Victor Chernozhukov, and Matt Taddy) (Chapter 3 of MIT Doctoral Thesis)
Abstract: This paper provides estimation and inference methods for a large number of heterogeneous effects in the presence of a high-dimensional vector of control variables in a panel data setting. We allow the number of heterogeneous groups and the number of controls to exceed the sample size. To make informative inference possible, we assume that the treatment effect has a sparse representation and estimate it in two stages. In the first stage, we estimate the reduced forms for the treatment and the outcome on an auxiliary sample. In the second stage, we estimate the treatment effect by an l1-regularized least squares regression of the outcome residuals on the treatment residuals. The proposed estimator achieves an improved convergence rate over a one-stage alternative in which the treatment effect and the control function are estimated jointly. In addition, we provide inference methods for single and multiple coefficients of the heterogeneous effects. We use a correlated random effects approach to model unobserved unit heterogeneity, extending the approaches of Mundlak (1978) and Chamberlain (1982) to a high-dimensional setting. Using data from a major food distributor, we apply our method to estimate price elasticities for a large number of heterogeneous products and validate our estimates using experimental price variation.
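The two-stage residual-on-residual idea described above can be caricatured in a few lines. The sketch below is an assumption-laden simplification: a single treatment, plain OLS reduced forms standing in for the paper's machine-learning first stage, and ordinary least squares standing in for the l1-regularized second stage; the function name and setup are hypothetical, not taken from the paper's code.

```python
import numpy as np

def two_stage_effect(y, t, X, rng):
    """Two-stage 'residual-on-residual' sketch: reduced forms are estimated on
    an auxiliary half-sample, then the effect is recovered by regressing
    outcome residuals on treatment residuals on the main half-sample."""
    n = len(y)
    idx = rng.permutation(n)
    aux, main = idx[: n // 2], idx[n // 2:]
    # First stage: reduced forms for treatment and outcome on the auxiliary half.
    b_t, *_ = np.linalg.lstsq(X[aux], t[aux], rcond=None)
    b_y, *_ = np.linalg.lstsq(X[aux], y[aux], rcond=None)
    # Second stage: residual-on-residual regression on the main half.
    t_res = t[main] - X[main] @ b_t
    y_res = y[main] - X[main] @ b_y
    return (t_res @ y_res) / (t_res @ t_res)
```

Splitting the sample keeps the first-stage estimation error independent of the data used in the second stage, which is what makes the two-stage route better behaved than joint estimation.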
Simultaneous Inference for Best Linear Predictor of the Conditional Average Treatment Effect and Other Structural Functions (with Victor Chernozhukov)
Abstract: We propose estimation and inference methods for the best linear approximation to the conditional average treatment effect, the conditional average partial derivative, and other structural functions in the presence of a large number of controls. We select the most important controls using modern machine learning tools. We approximate the structural function by a linear form in basis functions (e.g., polynomial series or splines) and characterize the least squares parameter as the solution to a semiparametric moment equation. Combining Neyman-orthogonality and sample-splitting ideas, we propose an estimator of the best linear approximation that features (a) pointwise Gaussian approximation, (b) simultaneous Gaussian approximation, and (c) the optimal rate of simultaneous convergence. When the approximation error of the linear form decays sufficiently fast, the Gaussian approximations automatically hold for the target structural function itself. Using our method, we estimate the average price elasticity conditional on income using the Yatchew and No (2001) data and provide uniform confidence bands for the conditional average price elasticity.
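One standard Neyman-orthogonal construction for the best linear predictor of the conditional average treatment effect builds a doubly robust (AIPW-type) pseudo-outcome and regresses it on the basis functions. The sketch below illustrates that idea under simplifying assumptions not in the abstract: a known propensity p (randomized treatment), cross-fit OLS nuisance estimates in place of machine learning, and hypothetical names.

```python
import numpy as np

def blp_of_cate(y, d, X, basis, p=0.5, seed=0):
    """Illustrative BLP-of-CATE sketch: form a doubly robust pseudo-outcome
    with cross-fit OLS nuisance estimates, then regress it on basis(X)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    idx = rng.permutation(n)
    halves = (idx[: n // 2], idx[n // 2:])
    psi = np.empty(n)
    for fit, use in (halves, halves[::-1]):
        # Cross-fitting: nuisances fit on one half, pseudo-outcomes on the other.
        b1, *_ = np.linalg.lstsq(X[fit][d[fit] == 1], y[fit][d[fit] == 1], rcond=None)
        b0, *_ = np.linalg.lstsq(X[fit][d[fit] == 0], y[fit][d[fit] == 0], rcond=None)
        mu1, mu0 = X[use] @ b1, X[use] @ b0
        mu_d = np.where(d[use] == 1, mu1, mu0)
        w = (d[use] - p) / (p * (1 - p))
        psi[use] = mu1 - mu0 + w * (y[use] - mu_d)  # orthogonal pseudo-outcome
    B = basis(X)
    beta, *_ = np.linalg.lstsq(B, psi, rcond=None)
    return beta  # least squares coefficients of the BLP of the CATE
```

The correction term w * (y - mu_d) is what makes the moment insensitive to first-order errors in the estimated regression functions, so the least squares step behaves as if the nuisances were known.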
Journal submission version: Chernozhukov_Semenova_JoE_Submission.pdf
Code is available at https://drive.google.com/open?id=1jhXPPddusytejcREns6qxVJxilqIEhn7
Plug-in Regularized Estimation of High-Dimensional Parameters in Nonlinear Semiparametric Models (with Victor Chernozhukov, Denis Nekipelov, and Vasilis Syrgkanis)
Abstract: We develop a theory for a two-stage estimator of a high-dimensional sparse parameter defined as a minimizer of a population loss function, which, in addition to the target, can depend on a potentially infinite-dimensional nuisance parameter. In the first stage, we estimate the nuisance parameter on an auxiliary sample. In the second stage, we plug the estimated nuisance parameter into an l1-regularized sample loss function and solve for its minimizer. We define a population loss to be Neyman-orthogonal if its gradient has zero derivative with respect to the nuisance parameter when evaluated at the true values of the target and nuisance parameters. By virtue of orthogonality, the first-stage nuisance error has only a second-order impact on the convergence rate of the target parameter. Our result enables oracle convergence rates for the target parameter assuming that the first-stage rate is of order o(n^{-1/4}). We develop a general method to construct an orthogonal loss starting from a model defined by conditional moment restrictions. We apply our theory to high-dimensional versions of standard estimation problems in statistics and econometrics, such as estimating conditional moment models with missing data, estimating structural utilities in games of incomplete information, and estimating treatment effects in regression models with nonlinear link functions.
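The orthogonality definition in the abstract can be written compactly. In the notation below (symbols assumed, following the abstract's description), ℓ is the population loss, θ₀ the true target, and η₀ the true nuisance; the Gateaux derivative of the gradient with respect to the nuisance vanishes at the truth:

```latex
% Neyman-orthogonality of the population loss \ell(\theta, \eta):
% the gradient in \theta has zero (Gateaux) derivative in the nuisance
% direction \eta - \eta_0 when evaluated at (\theta_0, \eta_0).
\partial_{\eta}\, \mathbb{E}\big[\nabla_{\theta}\, \ell(\theta_0, \eta_0)\big][\eta - \eta_0] = 0
```

This is the condition that reduces the first-stage nuisance error to a second-order effect on the target parameter's convergence rate.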