Selecting Informative Moments via LASSO (Job Market Paper)
Abstract: The traditional optimal GMM estimator is biased when many moment conditions are used. This paper considers the case in which information is sparse, i.e., only a small subset of the moment conditions is informative. We propose a LASSO-based selection procedure that chooses the informative moments and then, using the selected moments, conducts optimal GMM or other procedures such as GEL. Our method can significantly reduce the second-order bias of the optimal GMM estimator while retaining most of the information in the full set of moments. The LASSO selection procedure allows the number of moments to be much larger than the sample size. We establish bounds and asymptotics for the LASSO and post-LASSO estimators. Under our approximate sparsity condition, both estimators are asymptotically normal and nearly efficient when the number of strongly informative moments grows slowly relative to the sample size. Because the LASSO is formulated as a convex optimization problem, its computational cost is low compared with other moment selection procedures such as those described in Donald, Imbens and Newey (2008). In the IV setting with homoskedastic errors, our procedure reduces to the IV-LASSO method of Belloni, Chen, Chernozhukov and Hansen (2012). We propose data-driven penalty terms. Since the construction of the penalties requires knowledge of unknown parameters, we propose adaptive algorithms that perform well globally; these algorithms can be applied to more general L1-penalization problems in which the penalties depend on the parameter to be estimated. In simulation experiments, the LASSO-based GMM estimator performs well compared to the optimal GMM estimator and the CUE.
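In the homoskedastic IV special case mentioned in the abstract, the basic idea can be illustrated by a first-stage LASSO that picks out the strong instruments, followed by 2SLS on the selected set. The sketch below is ours, not the paper's estimator: the simulation design, the penalty level, and the textbook coordinate-descent LASSO are all illustrative assumptions.

```python
import numpy as np

# Illustrative sketch (not the paper's procedure): LASSO first stage
# to select informative instruments, then post-LASSO 2SLS.
rng = np.random.default_rng(0)
n, p = 500, 50
Z = rng.standard_normal((n, p))          # many candidate instruments
v = rng.standard_normal(n)
u = 0.5 * v + rng.standard_normal(n)     # endogeneity: u correlated with v
x = Z[:, :3] @ np.ones(3) + v            # only 3 instruments are informative
y = 1.0 * x + u                          # true coefficient is 1.0

def lasso_cd(X, t, lam, n_iter=200):
    """Coordinate-descent LASSO for 0.5*||t - Xb||^2 + lam*||b||_1."""
    b = np.zeros(X.shape[1])
    ss = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            r = t - X @ b + X[:, j] * b[j]           # partial residual
            rho = X[:, j] @ r
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / ss[j]
    return b

b1 = lasso_cd(Z, x, lam=100.0)           # first-stage selection
selected = np.flatnonzero(b1)            # indices of selected instruments
Zs = Z[:, selected]                      # post-LASSO: refit on the selected set
xhat = Zs @ np.linalg.lstsq(Zs, x, rcond=None)[0]
beta_iv = (xhat @ y) / (xhat @ x)        # 2SLS with the fitted first stage
```

The fixed penalty above stands in for the data-driven, adaptive penalties the paper develops.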
(with Victor Chernozhukov)
Abstract: For partial effects under consideration, the usual modus operandi is to report the average partial effect. In this paper, instead of the average, we sort the effects into an increasing function. We derive the asymptotic properties of the sorted curve of partial effects for any probability measure. In particular, we prove the asymptotic properties of the rearranged curve for the empirical measure, thereby providing inference for each quantile of the partial effect with i.i.d. data. Because the analytical confidence interval is often difficult to compute, we propose a bootstrap method that we show is valid for inference. The technique developed in this paper reveals the entire distribution of the partial effects rather than simply the average, as is usual in econometrics. We present Monte Carlo examples to show how the procedure works, and we apply our method to study the returns to education using 1980 PUMS data.
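The sorting step and the bootstrap can be sketched in a few lines. Everything below (the linear-interaction design, the quantile grid, the bootstrap size) is an illustrative toy, not the paper's empirical specification.

```python
import numpy as np

# Toy sketch of sorted partial effects with a bootstrap confidence interval.
rng = np.random.default_rng(1)
n = 2000
x = rng.standard_normal(n)
effects = 1.0 + 0.5 * x                    # heterogeneous partial effects

taus = np.linspace(0.05, 0.95, 19)
sorted_curve = np.quantile(effects, taus)  # the rearranged (increasing) curve

# percentile bootstrap for the tau = 0.5 quantile of the effect distribution
B = 300
boot = np.empty(B)
for b in range(B):
    idx = rng.integers(0, n, n)
    boot[b] = np.quantile(effects[idx], 0.5)
lo, hi = np.quantile(boot, [0.025, 0.975])
```

Reporting the whole sorted curve with such bands, rather than the single average, is the object of the paper.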
Shape Regularization In Econometrics (with Victor Chernozhukov and Ivan Fernandez-Val)
Abstract: In many economic applications, one encounters functions with known shape properties. We study a group of functional operators, including the Legendre transformation, rearrangement, and projection, which enforce shapes such as convexity and monotonicity on functions. Suppose we have an estimator of a true function with a certain shape property. By applying the corresponding functional operator to the estimator, one can construct a function of the desired shape that is closer to the true function under a suitable distance. We show that inference can be conducted analogously by applying the corresponding operator to the boundaries of a uniform confidence band for the true function (with slight modifications), and we provide conditions under which this simple inference strategy is valid.
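The rearrangement operator is the simplest instance of this improvement property: sorting a noisy estimate of a monotone function never increases its L2 distance to the truth (a consequence of the Hardy-Littlewood-Polya inequality). The example function and noise level below are our own illustrative choices.

```python
import numpy as np

# Sketch: rearrangement (sorting) enforces monotonicity and weakly
# improves the estimate when the true function is increasing.
rng = np.random.default_rng(2)
grid = np.linspace(0, 1, 200)
truth = grid ** 2                                        # increasing truth
estimate = truth + 0.1 * rng.standard_normal(grid.size)  # non-monotone estimate
rearranged = np.sort(estimate)                           # monotone by construction

err_raw = np.mean((estimate - truth) ** 2)
err_sorted = np.mean((rearranged - truth) ** 2)          # never larger than err_raw
```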
(with Jerry Hausman and Christopher Palmer)
Abstract: The usual quantile regression estimator of Koenker and Bassett (1978) is biased if there is an additive error term in the dependent variable. We analyze this problem as an errors-in-variables problem in which the dependent variable suffers from classical measurement error, and we develop a sieve maximum-likelihood approach that is robust to left-hand-side measurement error. After describing sufficient conditions for identification, we employ global optimization algorithms (specifically, a genetic algorithm) to solve the MLE optimization problem. We also propose a computationally tractable EM-type algorithm that accelerates computation in the neighborhood of the optimum. When the number of knots in the quantile grid grows at an adequate rate, the sieve maximum-likelihood estimator is asymptotically normal. We verify our theoretical results with Monte Carlo simulations and illustrate our estimator with an application to the returns to education, highlighting important changes over time in the returns to education that have been obscured in previous work by measurement-error bias.
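The bias the paper addresses is easy to see in simulation: classical measurement error in the dependent variable compresses the conditional-quantile "slopes" toward one another, masking heterogeneity across quantiles. The design below (a binary regressor, with slopes read off directly from empirical conditional quantiles) is our own illustration, not the paper's estimator.

```python
import numpy as np

# Simulation sketch: LHS measurement error compresses quantile slopes.
rng = np.random.default_rng(3)
n = 50_000
x = rng.integers(0, 2, n)                  # binary regressor
u = rng.uniform(0, 1, n)
y = u * x                                  # true quantile slope at tau is tau
eps = 0.3 * rng.standard_normal(n)
y_noisy = y + eps                          # classical measurement error in y

def slope(yv, tau):
    # conditional-quantile "slope": Q_tau(y | x=1) - Q_tau(y | x=0)
    return np.quantile(yv[x == 1], tau) - np.quantile(yv[x == 0], tau)

spread_clean = slope(y, 0.9) - slope(y, 0.1)              # close to 0.9 - 0.1 = 0.8
spread_noisy = slope(y_noisy, 0.9) - slope(y_noisy, 0.1)  # visibly compressed
```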
Core Determining Class: Construction, Approximation and Inference (with Hai Wang)
Abstract: The relations between unobserved events and observed outcomes in partially identified models can be characterized by a bipartite graph. We estimate the probability measure on the events given observations of the outcomes based on the graph. The feasible set of probability measures on the events is defined by a set of linear inequality constraints, and the number of inequalities is often much larger than the number of observations. The set of irredundant inequalities is known as the Core Determining Class. We propose an algorithm that exploits the structure of the graph to construct the exact Core Determining Class when data noise is not taken into consideration. We prove that if the graph and the measure on the observed outcomes are non-degenerate, the Core Determining Class does not depend on the probability measure of the outcomes but only on the structure of the graph. For the more general problem of selecting linear inequalities under noise, we investigate sparsity assumptions on the full set of inequalities, i.e., that only a few inequalities are truly binding. We show that these sparsity assumptions are equivalent to certain sparsity conditions on the dual problems. We propose a procedure similar to the Dantzig Selector to select the truly informative constraints, analyze its properties, and show that the feasible set defined by the selected constraints is a nearly sharp estimator of the true feasible set. Under our sparsity assumptions, we prove that the procedure can significantly reduce the number of inequalities without discarding much information. We apply the procedure to the Core Determining Class problem and obtain a stronger theorem by exploiting the structure of the bipartite graph. We design Monte Carlo experiments to demonstrate the good performance of our selection procedure in settings where traditional CHT inference is difficult to apply.
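The selection problem can be pictured with a deliberately crude heuristic: keep only the constraints whose slack at a preliminary estimate falls below a threshold. This slack-screening toy is ours and is far simpler than the paper's Dantzig-selector-style procedure; the polygon and threshold are arbitrary illustrative choices.

```python
import numpy as np

# Toy slack-screening sketch (much cruder than the paper's procedure):
# keep constraints of A p <= b that are nearly binding at an estimate p_hat.
A = np.array([[ 1.0,  0.0],
              [ 0.0,  1.0],
              [ 1.0,  1.0],
              [-1.0,  0.0],
              [ 0.0, -1.0]])
b = np.array([1.0, 1.0, 1.2, 0.0, 0.0])   # feasible set: a small polygon
p_hat = np.array([0.6, 0.6])              # preliminary estimate on the boundary

slack = b - A @ p_hat                     # slack of 0 means binding at p_hat
threshold = 0.1                           # would be noise-calibrated in practice
selected = np.flatnonzero(slack <= threshold)   # only constraint 2 survives
```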
Semi-Parametric Partially Linear Models with Interval Data
(with Victor Chernozhukov and Denis Chetverikov)
Abstract: Conditional moment inequalities induce infinitely many inequality constraints on the parameter of interest. One important case arises when the observed dependent variables are intervals in the linear regression model, which is often referred to as interval regression. We develop a new geometric characterization of the identification region for interval regression and for semi-parametric partially linear models. Using the Legendre transformation, we establish a simple duality between the sharp identification region of the parameter and the convex hulls of the conditional mean functions of the upper and lower bounds of the dependent variable. The order-preserving property of the Legendre transformation provides an additional duality between the confidence set for the parameter and the confidence set for the convex hulls of these conditional mean functions. The resulting inference procedure builds on traditional non-parametric uniform inference and is computationally tractable and easy to implement. We propose a similar procedure to estimate and conduct inference on the identified set when the conditional moment inequalities are semi-parametric and partially linear in the parameter of interest. We demonstrate the performance of our procedure in Monte Carlo experiments.
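The Legendre transformation underlying the duality can be computed on a grid: the biconjugate f** is a convex minorant of f, and on a fine slope grid it approximates the convex envelope that the convex hulls above refer to. The grids and the example function below are illustrative choices of ours.

```python
import numpy as np

# Sketch of the discrete Legendre transform: f*(s) = max_x (s*x - f(x)),
# and the biconjugate f**(x) = max_s (x*s - f*(s)) is a convex minorant of f.
xg = np.linspace(-1, 1, 101)
f = xg ** 2 + 0.3 * np.sin(6 * xg)        # non-convex example function
sg = np.linspace(-4, 4, 401)              # grid of slopes

fstar = np.max(sg[:, None] * xg[None, :] - f[None, :], axis=1)   # f*(s)
fss = np.max(xg[:, None] * sg[None, :] - fstar[None, :], axis=1)  # f**(x)
```

As a max of affine functions, f** is convex, and f** <= f everywhere on the grid.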
Research in Progress
Functional Inequalities: Optimal Discretization, Estimation and Inference (with Hai Wang)
Abstract: This paper extends the Core Determining Class problem to the case of a continuous correspondence mapping instead of a discrete one. This setting allows partially identified models to be estimated non-parametrically under certain smoothness assumptions. We discuss optimal discretization of the correspondence mapping and derive the optimal number of discretization cells. Our method adaptively adjusts this number to balance the approximation error and the estimation error for a given model and the observed data.
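The trade-off has the familiar bias-variance form. As a purely generic illustration (not the paper's derivation), suppose the approximation error shrinks like a/K^2 in the number of cells K while the estimation error grows like b*K/n; minimizing the sum gives an optimal K growing like n^(1/3).

```python
import numpy as np

# Generic bias-variance sketch of the discretization trade-off
# (illustrative rates, not the paper's result).
a, b, n = 1.0, 1.0, 10_000
K = np.arange(1, 200)
risk = a / K ** 2 + b * K / n             # approximation + estimation error
K_star = K[np.argmin(risk)]               # close to (2 * a * n / b) ** (1/3)
```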
Adaptive Non-Parametric Smoothing on Discrete Variables (with Martin Spindler)
Abstract: Non-parametric smoothing as proposed in Li and Racine (2001) may substantially improve the performance of non-parametric methods such as kernel estimation when discrete variables are present. While Li and Racine (2001) use the same smoothing weight across different values of the discrete variables, we develop a method with data-driven weights based on the test statistics proposed in Härdle and Mammen (1993) and Dette and Neumeyer (2003). Preliminary results show that our procedure outperforms Li and Racine (2001) and traditional non-parametric methods, especially when the sample size is relatively small.
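The baseline Li-Racine idea can be sketched in a few lines: a discrete-variable kernel puts weight 1 on observations with a matching level and weight lambda on the rest, with lambda chosen by leave-one-out cross-validation. The design below, and the choice of CV rather than the test-statistic-based weights of the paper, are our own simplifications for illustration.

```python
import numpy as np

# Minimal sketch of Li-Racine-style smoothing over a discrete regressor,
# with the smoothing weight lambda chosen by leave-one-out cross-validation.
rng = np.random.default_rng(4)
n = 600
x = rng.integers(0, 3, n)                 # discrete regressor with 3 levels
m = np.array([0.0, 1.0, 2.0])             # true cell means
y = m[x] + 0.5 * rng.standard_normal(n)

def loo_cv(lam):
    """Leave-one-out CV error for the kernel: weight 1 if levels match, else lam."""
    err = 0.0
    for c in range(3):
        same = x == c
        w = np.where(same, 1.0, lam)
        num, den = w @ y, w.sum()
        fit_i = (num - y[same]) / (den - 1.0)   # drop each own observation
        err += np.sum((y[same] - fit_i) ** 2)
    return err / n

lams = np.linspace(0.0, 1.0, 21)
best_lam = min(lams, key=loo_cv)          # with well-separated cells, near 0
```

When the cell means differ strongly, as here, the data-driven weight correctly shrinks toward zero (no borrowing across cells); smoothing pays off mainly when cells are similar or sparsely populated.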