Maimonides Rule Redux (with Josh Angrist, Victor Lavy, and Adi Shany)
NBER Working Paper No. 23486, June 2017
Abstract: We use the discontinuous function of enrollment known as Maimonides Rule as an instrument for class size in large Israeli samples from 2002-2011. As in the 1991 data analyzed by Angrist and Lavy (1999), Maimonides Rule still has a strong first stage. In contrast with the earlier Israeli estimates, however, Maimonides-based instrumental variables estimates using more recent data show no effect of class size on achievement. The new data also reveal substantial enrollment sorting near Maimonides cutoffs, with too many schools having enrollment values that just barely produce an extra class. A modified rule that uses data on students’ birthdays to compute statutory enrollment in the absence of enrollment manipulation also generates a precisely estimated zero. In older data, the original Maimonides Rule is unrelated to socioeconomic characteristics, while in more recent data, the original rule is unrelated to socioeconomic characteristics conditional on a few controls. Enrollment manipulation therefore appears to be innocuous: neither the original negative effects nor the recent data zeros seem likely to be manipulation artifacts.
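For readers unfamiliar with the rule, the discontinuity can be illustrated with a minimal sketch. It assumes the 40-student class cap described in Angrist and Lavy (1999); the cap is not stated in the abstract above, and the function name is hypothetical. It is not the paper's estimation code, only the predicted-class-size function that serves as the instrument.

```python
def maimonides_class_size(enrollment: int, cap: int = 40) -> float:
    """Predicted average class size when a cohort is split at a class cap.

    Assumption: cap=40, following the description in Angrist and Lavy (1999).
    """
    if enrollment <= 0:
        raise ValueError("enrollment must be positive")
    # A new class is added each time enrollment exceeds a multiple of the cap.
    n_classes = (enrollment - 1) // cap + 1
    return enrollment / n_classes


if __name__ == "__main__":
    # Predicted class size drops sharply just past each cutoff:
    # 40 -> 40.0, 41 -> 20.5, 80 -> 40.0, 81 -> 27.0, 120 -> 40.0, 121 -> 30.25
    for e in (40, 41, 80, 81, 120, 121):
        print(e, maimonides_class_size(e))
```

The jumps at 41, 81, and 121 are the cutoffs near which the abstract reports enrollment sorting.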
Measuring Corruption: Unmasking Strategic Data Manipulation (with Jean Ensminger, Caltech)
Abstract: Efficient measurement and identification of corruption are important for combatting it. We develop new tests that uncover the strategic nature of intent to defraud with data manipulation, and we apply these methods to a World Bank project in Africa. Digit analysis exploits the fact that humanly produced data follow different patterns than naturally occurring data. Our tests are based on Benford’s Law of natural digit distributions, and include new statistical techniques to accommodate smaller sample sizes, capture deviations based upon their monetary profitability, and expose efforts to subvert detection. A forensic audit of the same project by the World Bank provides external validity.
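As a rough illustration of the baseline idea only, the sketch below computes the first-digit distribution that Benford's Law predicts, P(d) = log10(1 + 1/d), and a standard Pearson chi-square statistic against it. This is a textbook check, not the new tests developed in the paper; the function names are hypothetical.

```python
import math
from collections import Counter


def benford_expected(d: int) -> float:
    """Benford's Law probability that the leading digit equals d (1..9)."""
    return math.log10(1 + 1 / d)


def first_digit(x: float) -> int:
    """Leading significant digit of a nonzero number."""
    # In scientific notation the first character is the leading digit.
    return int(f"{abs(x):.15e}"[0])


def benford_chi_square(values) -> float:
    """Pearson chi-square statistic of observed first digits vs. Benford."""
    digits = [first_digit(v) for v in values if v != 0]
    n = len(digits)
    if n == 0:
        raise ValueError("need at least one nonzero value")
    counts = Counter(digits)
    return sum(
        (counts.get(d, 0) - n * benford_expected(d)) ** 2
        / (n * benford_expected(d))
        for d in range(1, 10)
    )


if __name__ == "__main__":
    natural = [2 ** k for k in range(1, 200)]  # powers of 2 track Benford closely
    uniform = list(range(100, 1000))           # equal leading digits, far from Benford
    print(benford_chi_square(natural), benford_chi_square(uniform))
```

With 8 degrees of freedom, a statistic above about 15.51 rejects conformity to Benford's Law at the 5% level. Small samples make this plain version unreliable, which is one of the limitations the abstract says the new techniques address.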
Computer-Assisted Reading and Discovery for Student-Generated Text in Massive Open Online Courses (with Reich, Tingley, Roberts, and Stewart)
Journal of Learning Analytics, 2(1), 2015: 156–184.
Abstract: Dealing with the vast quantities of text that students generate in Massive Open Online Courses (MOOCs) and other large-scale online learning environments is a daunting challenge. Computational tools are needed to help instructional teams uncover themes and patterns as students write in forums, assignments, and surveys. This paper introduces to the learning analytics community the Structural Topic Model, an approach to language processing that can 1) find syntactic patterns with semantic meaning in unstructured text, 2) identify variation in those patterns across covariates, and 3) uncover archetypal texts that exemplify the documents within a topical pattern. We show examples of computationally aided discovery and reading in three MOOC settings: mapping students’ self-reported motivations, identifying themes in discussion forums, and uncovering patterns of feedback in course evaluations.
Structural Topic Models for Open Ended Survey Responses (with Roberts, Stewart, Tingley, Lucas, Gadarian, Albertson, and Rand)
American Journal of Political Science, 58(4), October 2014: 1064–1082
Abstract: Collection and especially analysis of open-ended survey responses are relatively rare in the discipline and when conducted are almost exclusively done through human coding. We present an alternative, semiautomated approach, the structural topic model (STM) (Roberts, Stewart, and Airoldi 2013; Roberts et al. 2013), that draws on recent developments in machine-learning-based analysis of textual data. A crucial contribution of the method is that it incorporates information about the document, such as the author’s gender, political affiliation, and treatment assignment (if an experimental study). This article focuses on how the STM is helpful for survey researchers and experimentalists. The STM makes analyzing open-ended responses easier, more revealing, and capable of being used to estimate treatment effects. We illustrate these innovations with analysis of text from surveys and experiments.