Angrist Data Archive

Follow these links to data sets and programs from a number of my papers. In some cases, I've taken advantage of the opportunity to make minor corrections. Some old SAS programs have been converted to Stata (but SAS is good for you, so I've left some that way). Feel free to use these files for teaching or research (with attribution!).

Many of these replication files are also available in my IQSS Dataverse; use the Dataverse for online variable selection, automatic subsetting, and to download files in alternative formats.

DATA NEWS: January 2019

We've posted the .do file from Angrist and Rokkanen (2015).

DATA NEWS: June 2017

We've published in JBES. The latest paper has been posted: Angrist, Jordà and Kuersteiner (2016).

DATA NEWS: February 2016

We've posted data for Angrist, Oreopoulos and Williams (2014).

DATA NEWS: NOVEMBER 2015

We've posted data for Angrist and Lavy (2001).

DATA NEWS: DECEMBER 2014

I've uploaded a do file to replicate Tables 1 and 2 in Angrist and Fernandez-Val (2010).  For Tables 3 and 4, see Ivan Fernandez-Val's data archive.

DATA NEWS: JULY 2011

I've uploaded data and programs to replicate the cross-district analysis of Metco in Angrist and Lang (2004).

DATA NEWS: FEBRUARY 2011

Adriana Kugler and I have posted the Colombian rural household survey data for our Rural Windfall paper (May 2008 ReStat).  Download 'em quick, before we get busted!

DATA NEWS: FEBRUARY 2009

Data sets and programs from many of the papers by other authors referenced in Mostly Harmless Econometrics are posted in the MHE Data Archive.

DATA NEWS: JUNE 2008

The posting for Angrist and Krueger (1991) (below, but not in the dataverse) now includes all of the 1970 and 1980 census cohorts used in the paper, including all covariates.  The smaller 1980 census extract for men born 1930-39 is also still available as an ASCII file.

Causal Effects of Monetary Shocks: Semiparametric Conditional Independence Tests with a Multinomial Propensity Score

Notes: The zip and .pdf files below contain data, programs, and documentation for "Causal Effects of Monetary Shocks: Semiparametric Conditional Independence Tests with a Multinomial Propensity Score", forthcoming in the Review of Economics and Statistics. An Auxiliary Appendix containing proofs for the results presented in the main part of the paper is also included.

Programs and Data: Click the link to download the zip file containing the data and programs:

AK2010_Code_Data.zip

Documentation: Click the link to download the instructions to create the tables in the paper:

AK2010_Instructions.pdf

Auxiliary Appendix: Click the link to download the Auxiliary Appendix:

TScausalMV_Final_Auxiliary Appendix_20100511.pdf

The zip and Word files below contain data, programs, and documentation for "The Effects of High Stakes High School Achievement Awards: Evidence from a Randomized Trial", in the AER, September 2009.

readme.doc

AngristLavy_AERdata.zip

Incentives and Services for College Achievement: Evidence from a Randomized Trial

Notes: The following .zip file reproduces all figures and tables 1-8 of the published paper. The file also contains the public use data set.

Rural Windfall or a New Resource Curse? Coca, Income and Civil Conflict in Colombia

Aggregate statistics and effects on mortality

Notes: The following programs and data files recreate Tables 1, 2, 7, 8, and 9 and Figures 3, 4, and 5 in the published paper.

Programs: Click on the links to download the program files:

Data: Click on the links to download the data:

  • Colombia_Paper_Data.zip zipped file; contains all data needed to construct the tables.
  • GSP.dta contains data on GSP by department; used to create column 6 in Table 2.
  • data00.dta used to create Table 7, and Figures 3, 4, and 5.
  • data_urban_alt.dta used to create Tables 8 and 9.
  • pop_sex.dta used to create Tables 7, 8, and 9, and Figures 3, 4, and 5.

Estimates using the rural survey

Data and programs to replicate Tables 4, 5, and 6: drugsdata.zip.

Is Spanish-Only Schooling Responsible for the Puerto Rican Language Gap?

Notes: The following programs and datasets create all statistical tables in the paper.

Programs: Click the links to download program files:

Data: Click on the link to download the data:

Instrumental Variables Methods in Experimental Criminological Research: What, Why and How

Notes: The data in this paper are from the Berk and Sherman Minneapolis Domestic Violence Experiment, available on ICPSR, supplemented by published statistics from Berk and Sherman (1988, Tables 4 and 6).

Programs: Click the links to download the program files:

Data: Click the links to download the data sets:

  • file1.txt contains the raw data used by the SAS program posted above.
  • formats.sas7bcat contains the SAS formats used by the SAS program posted above. When running the program, you may get an error message stating that the formats were created on another operating system. If you have this problem, simply comment out all format-related statements and the program will run properly.

Long-Term Consequences of Secondary School Vouchers: Evidence from Administrative Records in Colombia

Notes: The data used in this paper are administrative records from Colombia's PACES program. Below you can find the data and programs used to generate Figures 1 and 2 and Tables 1 through 5.

Programs and Data: Click here to download the Stata data set and programs. The data set contains the following variables:

  • Dummy for winning voucher
  • Dummy for male name
  • Dummy for ID match
  • Dummy for ID and City match
  • Dummy for ID and 7-letter match
  • age
  • Dummy for having a phone
  • Dummy for having valid ID
  • Language score on ICFES
  • Dummy for ID and 7-letter and City match
  • Reading score censored at 10th percentile
  • Math score censored at 1st percentile
  • Math score censored at 10th percentile
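For readers unfamiliar with censored score variables, the sketch below shows one way a score can be censored from below at a percentile. It is an illustration only: the nearest-rank percentile rule, the function names, and the example scores are assumptions, and the posted programs remain the authority on how the variables in this data set were actually built.

```python
def nearest_rank_percentile(values, pct):
    """Nearest-rank percentile (pct is an integer from 1 to 100).

    Assumption: nearest-rank is one of several percentile conventions
    and may not match the one used for the posted data.
    """
    ordered = sorted(values)
    rank = -(-pct * len(ordered) // 100)  # ceiling of pct * n / 100
    return ordered[max(rank, 1) - 1]


def censor_from_below(values, pct):
    """Replace every value below the pct-th percentile with that percentile."""
    floor = nearest_rank_percentile(values, pct)
    return [max(v, floor) for v in values]


# Hypothetical scores, not ICFES data.
scores = list(range(1, 21))
print(censor_from_below(scores, 10)[:3])
```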

Quantile Regression under Misspecification, with an Application to the U.S. Wage Structure

Notes: Click here for added technical details related to the data, variable definitions, and estimation. The paper has two empirical components: estimation of quantile regression weighting schemes and robust inference on the quantile regression process for earnings equations. Both rely on Census microdata for 1980, 1990, and 2000. The original raw data are available from the Integrated Public Use Microdata Series (IPUMS) web site and our Stata extracts are available here. In addition to a description of the data and variables, this supplement includes all Stata and R (version 2.0.1) command files used to construct Figures 1 and 2, and Table I.

Programs: Click here to download the Stata and R (version 2.0.1) command files used to construct Figures 1 and 2, and Table I.

Data: Click here to download the data sets used in this paper. This file contains Stata data sets from the 1980, 1990, and 2000 censuses, with the following variables:

  • age
  • years of schooling
  • log weekly wage
  • individual sampling weight
  • years of potential experience
  • potential experience squared
  • dummy for reported race of black
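The two potential-experience variables in the list above are usually derived from age and schooling. A minimal sketch, assuming the conventional Mincer definition (age minus years of schooling minus 6, floored at zero here); the construction actually used for these extracts is described in the technical details linked in the notes and may differ. Function names and example values are hypothetical.

```python
def potential_experience(age, years_of_schooling):
    """Conventional Mincer potential experience: age - schooling - 6.

    Assumption: floored at zero; not confirmed as the rule used
    to build the posted extracts.
    """
    return max(age - years_of_schooling - 6, 0)


def experience_terms(age, years_of_schooling):
    """Return (experience, experience squared) for an earnings equation."""
    exper = potential_experience(age, years_of_schooling)
    return exper, exper ** 2


# Illustrative values only, not taken from the census extracts.
print(experience_terms(40, 12))
```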

Does School Integration Generate Peer Effects? Evidence from Boston's Metco Program

Notes: Data and programs for Tables 1 and 2 (cross-district analysis) can be found here.

Protective or Counter-Productive? Labour Market Institutions and the Effect of Immigration on EU Natives

Notes: The following programs and data reproduce Tables 1-6 from the paper. Note that the SAS and STATA datasets have the same content.

Programs:

Data:

  • pristina.dta contains data on distance from Pristina by country. It is used by t3new.do and t4new.do to make Tables 3 and 4.
  • rsa8399n.dta contains most variables used in the paper. It is used by table1.do, t3new.do, and t4new.do to make Tables 1, 3 and 4.
  • kugcob.dta.gz is used by table2.do to construct Table 2. This file must be unzipped before using.
  • pristina.sas7bdat is used by t5new.sas and t6new.sas to make Tables 5 and 6.
  • rsa8399n.sas7bdat is used by t5new.sas and t6new.sas to make Tables 5 and 6.
  • bar_ent.sas7bdat is used by t6new.sas to make Table 6.
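If you do not have a gunzip utility handy, kugcob.dta.gz can be unpacked with a few lines of Python. This is a convenience sketch using only the standard library; the helper name `gunzip` is hypothetical, and it assumes the file has already been downloaded to the working directory.

```python
import gzip
import shutil
from pathlib import Path


def gunzip(src, dest=None):
    """Decompress a .gz file and return the path of the output file."""
    src = Path(src)
    # Default output name drops the trailing ".gz" (kugcob.dta.gz -> kugcob.dta).
    dest = Path(dest) if dest is not None else src.with_suffix("")
    with gzip.open(src, "rb") as f_in, open(dest, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)
    return dest


if __name__ == "__main__":
    # Assumes kugcob.dta.gz sits in the current directory.
    if Path("kugcob.dta.gz").exists():
        print(gunzip("kugcob.dta.gz"))
```

The resulting kugcob.dta can then be read by table2.do as usual.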

Instrumental Variables Estimates of the Effect of Subsidized Training on the Quantiles of Trainee Earnings

Notes: The following MATLAB files and ASCII data recreate the quantile estimates in Tables 2 and 3. You must have the files qregkb.m and quantlsf.m in the same working directory as the data. You then run qeffectsfinal.m to obtain the published estimates.

Programs: Click the links to download the program files:

  • qeffectsfinal.m creates the quantile estimates for Tables 2 and 3
  • qregkb.m must be placed in the same working directory as the data
  • quantlsf.m must be placed in the same working directory as the data

Data: Click the links to download the data:

  • jtpa.raw is used to construct all the estimates.

How Do Sex Ratios Affect Marriage and Labor Markets? Evidence from America's Second Generation

Notes: The following programs create Tables 1 and 2 from the paper.

Programs: Click the links to download the program files:

Data: Click on the links to obtain the data.

  • data11b.zip is used by all of the programs posted above.
  • irates3.sas7bdat is used to create Tables 4-6. You must also download the core CPS dataset from the FTP site to run the relevant programs.

Vouchers for Private Schooling in Colombia: Evidence from a Randomized Natural Experiment

Notes: The following programs and data files recreate Tables 1 through 7 in the published paper. Some variables were updated or corrected slightly for Angrist, Bettinger, Kremer (2006), so these files do not always produce an exact match to the 2002 results.

Programs: Click the links to download the program files:

  • table1_final.sas creates Table 1. Note that panel B for test takers is unavailable.
  • table2_final.sas creates Table 2.
  • table3_final.sas creates Table 3. The results for "ever used a scholarship" are created by the program that makes Table 7.
  • table4_final.sas creates Table 4.
  • table5_final.sas creates Table 5.
  • table6_final.sas creates Table 6. Note that in order to obtain the marginal effects for the probit, you must multiply the coefficient estimate by the phi1 estimate and switch the sign.
  • table7_final.sas creates Table 7 and the "ever used a scholarship" estimates in Table 3.
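The Table 6 note above amounts to a one-line calculation once the SAS output is in hand. A minimal sketch of that arithmetic; the function name and the numbers are hypothetical, and the actual coefficient and phi1 values must be read off the output of table6_final.sas.

```python
def probit_marginal_effect(coef, phi1):
    """Marginal effect as described in the Table 6 note.

    Multiply the probit coefficient by the phi1 estimate,
    then switch the sign.
    """
    return -coef * phi1


# Hypothetical inputs for illustration only.
print(probit_marginal_effect(-0.5, 0.25))
```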

Data: Click the links to download the data:

  • aerdat4.sas7bdat is the core dataset. This file is used by the programs that create Tables 1, 2, 3, 4, and 6.
  • tab5v1.sas7bdat is used by the programs that create Tables 2 and 5.
  • tab7.sas7bdat is used by table7_final.sas to create Table 7.
  • tab7test.sas7bdat is used by table7_final.sas to create Table 7.

Consequences of Employment Protection? The Case of the Americans with Disabilities Act

Notes: The linked programs recreate Table 1, Table 2, and Table 3 in the published paper.

Programs: Click the links below to download program files:

  • table1.sas creates Table 1
  • table2.sas creates Table 2.
  • AngAceADA_table3.sas creates Table 3. Note that the replicated coefficients differ slightly from the published values. This is likely due to the way in which SAS drops dummy variables when the matrix of dummies and a constant is not of full rank. This file also contains a minor correction to the original results, which takes account of small cell sizes.
  • AngAceADA_table4.sas creates Table 4, columns 1-3.

Data: Click the links below to download the data:

Estimation of Limited Dependent Variable Models with Dummy Endogenous Regressors: Simple Strategies for Empirical Practice

Notes: The programs and data sets available here will produce the results in Tables 1-3. There is code available for all estimates except the standard errors in column 4 of Table 2. Note that many of the standard errors in this paper as well as some coefficient estimates were constructed with a stochastic element that cannot be reproduced exactly (recoding 1% of twins; bootstrap samples).

Programs: Click the links to download the program files:

  • table1.zip contains the Stata .do file and log files that produce the results in Table 1
  • table2_post.rar contains the Stata .do files and SAS files that produce the results in Table 2. See enclosed program key for further detail.
  • table3.rar contains the Stata .do files that produce Table 3
  • Standard errors.rar contains programs that create standard errors for Table 2 columns 3,5,6 and Table 3 columns 5,6,8. See enclosed "TABLE KEY" file for additional detail.

Data: Click the links to download the data sets:

  • pums80m.dta.Z contains the 1980 census file read by most of the programs
  • abadie5.rar contains data for coefficients in Table 2, column 4
  • abadie.dta.Z contains data for coefficients in Table 3
  • bstrap.dta.Z contains data to create bootstrapped standard errors

How Large are Human Capital Externalities? Evidence from Compulsory Schooling Laws

Notes: The following programs create the estimates reported in two tables. Estimates other than those in the appendix were constructed similarly using the same data (three.rar). Coding and documentation for compulsory attendance laws are in a separate archive.

Note also that the compulsory attendance (CA) first stage is mislabeled in Tables 4 and 5.  The effects reported in the tables are for CA9, CA10, and CA11.  (Thanks to Steve Pischke for pointing this out.)

Programs: Click the links below to download program files:

Data: Click the links below to download data:

  • CompSchoolLaws.rar contains data and documentation on compulsory attendance laws by state. CSLDOCs.txt explains the contents of the two data files
  • three.rar is the micro data set used by the programs posted above. This is a large file
  • average4.sas7bdat has the average schooling data used by table6_0.sas; average4.dta is a STATA version
  • moulton3.sas clusters standard errors (a legacy program, to be sure, but you have to use this one to get our exact numbers)

Jackknife Instrumental Variables Estimation

Notes:
The ASCII data set posted here includes the 1980 census extract used by Angrist, Imbens, and Krueger without covariates. This is the same data set used in Angrist and Krueger (1991). The SAS program produces the estimates in Table II, row 1 (30 instrument case).

Programs: Click on the links below to download the program files:

  • Click here to download the SAS program for Table II, row 1
  • Jive180.sas, a few more examples of JIVE and SSIV estimation using the quarter-of-birth data

Data: Click here to download the data.

Using Maimonides' Rule To Estimate the Effect of Class Size on Scholastic Achievement

Notes: These programs produce Tables II-V in the published paper. The program mmoulton_post.do implements a Moulton (1986) clustering adjustment for OLS and 2SLS and is used by the other .do files. 

These are STATA translations of the original SAS programs. The switch in software generates slightly different RMSEs.
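For intuition about what a Moulton-type adjustment does, here is a sketch of the textbook Moulton factor in its simplest case: equal-sized clusters and a regressor that is fixed within clusters. This is not a translation of mmoulton_post.do, which handles both OLS and 2SLS; the function name and the example numbers are hypothetical.

```python
import math


def moulton_se_inflation(cluster_size, intraclass_corr):
    """Ratio of cluster-corrected to conventional standard errors.

    Textbook special case: equal cluster sizes and a regressor that
    does not vary within clusters, so the variance ratio is
    1 + (n - 1) * rho.
    """
    variance_ratio = 1 + (cluster_size - 1) * intraclass_corr
    return math.sqrt(variance_ratio)


# Illustrative: classes of 30 pupils and an intraclass correlation of 0.1.
print(round(moulton_se_inflation(30, 0.1), 3))
```

Even a modest within-class correlation roughly doubles the standard errors in this example, which is why the adjustment matters for class-level regressors.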

Programs: 

Data:

Estimating the Labor Market Impact of Voluntary Military Service Using Social Security Data on Military Applicants

Notes: The following programs and data files reproduce Tables 2 and 5 in the published paper.

NOVEMBER 2009: HEADS-UP DATA USERS! I recently learned that the CSV file is truncated at year-of-earnings=82 and Stat Transfer does not do a good job on my old SAS data set.  If you really want to avoid SAS, go to my IQSS dataverse and do it the easy way!

Programs: Click the links below to download program files:

  • Angrist1998_Table 2.zip contains programs used to create the estimates in Table 2. The programs tabl4rev.sas and tabl4zer.sas create columns 1, 2, 5, and 6 in the paper. The programs covalpha.sas and covalphz.sas create the estimates in columns 3, 4, 7, and 8.
  • Angrist1998_Table5core.zip contains programs used to create the estimates in Table 5. The README file included in the .zip file explains the file structure and outlines how to recreate the table.

Data: Click the links below to download the data:

Children and Their Parents’ Labor Supply: Evidence from Exogenous Variation in Family Size

Notes: The 1980 and 1990 Census extracts used in the paper are posted here.

Programs: Click the links below to download the program files:

  • Click here to download program files for 1980 results. This program produces the 1980 panel in Tables 2 and 6, and columns (1) - (6) in Table 7.
  • Click here to download program files for 1990 results. This program produces the 1990 panel of Table 6, column (1).

Data: Click the links to download the data:

  • AngEv98.zip is used by the programs posted above to construct all the estimates.

Short-Run Demand for Palestinian Labor

Notes: The following programs create Table 1 and Table 4 from the paper. Note that the sample used to create Table 4 has one more observation than the sample from the original paper, so in some cases, the estimates differ very slightly from published values.

Programs: Click the links below to download the program files.

Data: Click the link below to download the data file.

  • data8191.rar contains the core STATA data extract.
  • index1.dta contains additional data needed to create Table 4.

The Economic Returns to Schooling in the West Bank and Gaza Strip

Notes: The following programs create Tables 1, 2, and 3 from the paper. The sample produced by this version of the Table 2 program contains 5 more observations than the sample that generated the published table. This leads to some very slight differences relative to the published version (time passes . . .).

Programs: Click the links below to download the program files.

Data: Click the link below to download the data file.

Split-Sample Instrumental Variables Estimates of the Return to Schooling

Notes: This paper uses two data sets:

1. A 1980 census extract, also used in Angrist and Krueger (1991). Below you can download the ASCII file containing 329,509 observations on the following variables:

  • log weekly wage
  • quarter of birth (1-4)
  • year of birth (30-39)
  • place of birth (1980 census state codes)
  • education (highest grade completed)

2. A CPS extract which contains 30,967 observations on men born 1944-53 from the 1979 and 1981-85 March CPS, matched to lottery number dummies for groups of 25 lottery numbers. Below you can download the Stata data set. There are 72 variables including all covariates used in the JBES article and in our NBER working paper. Follow the sample selection rules in the notes to the tables to reproduce the 25,781-observation working sample. The CPS data were first used in Angrist and Krueger's unpublished 1992 NBER working paper. These data were also used in Alberto Abadie's (2002) JASA paper.
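To make "lottery number dummies for groups of 25 lottery numbers" concrete, the sketch below maps a draft lottery number to its group and to a set of 0/1 indicators. It assumes the groups are consecutive blocks starting at 1 (1-25, 26-50, and so on, with a short final block); the CPS extract already contains the dummies, so check its variable list for the grouping actually used. Function names are hypothetical.

```python
def lottery_group(lottery_number, group_size=25):
    """Zero-based index of the block of consecutive lottery numbers."""
    if not 1 <= lottery_number <= 366:
        raise ValueError("draft lottery numbers run from 1 to 366")
    return (lottery_number - 1) // group_size


def lottery_dummies(lottery_number, group_size=25):
    """One 0/1 indicator per group; exactly one entry equals 1."""
    n_groups = -(-366 // group_size)  # ceiling division: 15 groups when size is 25
    group = lottery_group(lottery_number, group_size)
    return [1 if g == group else 0 for g in range(n_groups)]


# Lottery number 26 falls in the second block (index 1).
print(lottery_group(26), lottery_dummies(26))
```

In a regression, one group would be omitted as the reference category.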

Programs: Click the links to download the program files:

  • samplcps.do, a sample Stata program that analyzes the CPS data set.
  • samplcps.log, the log file produced by running samplcps.do using the CPS extract.
  • ssivex1.sas, a sample SAS program that uses the 1980 census extract (which you can download below) to produce some example SSIV estimates in the spirit of Tables 1 and 2. The regressions in the paper include covariates which are not in the data set posted here, so this will not replicate exactly the results in the paper.

Data: Click the links to download the data sets:

Data summary:

The Effect of Veterans Benefits on Education and Earnings

Notes: This paper uses the 1987 National Survey of Veterans. The variable listing should be self-explanatory (see below). Variables are named after question numbers and items on the survey.

Click here to view a list of variable names and summary statistics.

Programs: Click the links to download the program files:

Data: Click the links to download the data sets:

  • soviii_ang93b.zip is used to create all the tables. This is an extract of the 1987 Survey of Veterans, known as SOV-III. The program creates the extract described in the footnote to Table 1 in the paper. The raw data set includes all 3,337 veterans in SOV-III with Vietnam-era or later service (excludes all Korean War veterans). The extract contains 2,388 Vietnam-era and AVF veterans.

Does Compulsory School Attendance Affect Schooling and Earnings?

Notes: This posting includes three data sets. The first is a minimal 1980 ASCII extract without covariates.  This data set was used in Angrist and Krueger (1995) and Angrist, Imbens, and Krueger (1999). Second, an ASCII data set in the file QOB.rar, which contains the 1980 census extract from Angrist and Krueger (1991) with covariates (men born 1930-39 and 1940-49). Third, NEW7080.rar, a larger Stata data set with the complete original Angrist and Krueger 1970 and 1980 extracts, and all cohorts (men born 1920-29 in 1970, men born 1930-39 in 1980, and men born 1940-49 in 1980).

Data: Click on the link to download the data:

  • asciiqob.zip contains a minimal 1980 ASCII file; you can use this file - ak91.sas - to read it
  • QOB.rar includes an ASCII file with the 1980 census extract. This file - Descriptive Statistics QOB.txt - shows what's in it.
  • NEW7080.rar includes a Stata file with the original 1970 and 1980 census data including all cohorts and covariates

Programs: The Stata programs below produce published tables from Angrist and Krueger (1991) using the data in NEW7080.rar

Lifetime Earnings and the Vietnam Era Draft Lottery: Evidence from Social Security Administrative Records

Notes: The programs and data sets available here will produce the results in Tables 1 through 4, as well as Figure 3. The code for Tables 1, 3, and 4 also contains a correction that affects standard errors. Correcting this error reduces the standard errors somewhat relative to those published. Uncommenting the correction as indicated in the programs reproduces the published results. There is also a very minor discrepancy in Table 3, probably due to the use of a different CPI than was used for the published paper. The figures in the original paper were not typeset in the proper order. Here is the published correction.

Programs: Click the links to download the program files:

Data: Click the links to download the data sets:

  • cwhsa.dta is used by Angrist1990_Table1.do to create Table 1 and Angrist1990_Table2DMDC.do to create part of Table 2
  • cwhsb.dta is used by Angrist1990_Table1.do to create Table 1
  • sipp2.dta is used by mysipp2.do to generate the SIPP(84) panel of Table 2
  • dmdcdat.dta is used by Angrist1990_Table2DMDC.do to create part of Table 2
  • cpi_angrist1990.dta is used by Angrist1990_Table3.do to obtain current dollar values from real values contained on cwhsc_new.dta
  • cwhsc_new.dta is used by Angrist1990_Table3.do to create Table 3 and Angrist1990_Table4.do to create Table 4
  • Draft_Lottery_Numbers.xls contains draft lottery numbers by birthdate for the 1969-1972 lotteries
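The CPI file listed above is used to turn real values into current dollars. A minimal sketch of that conversion, assuming the usual rule of scaling by the ratio of the year's price index to the base-period index; the function name and index values are hypothetical, and Angrist1990_Table3.do is the authority on the exact base period and variable names.

```python
def real_to_nominal(real_value, cpi_year, cpi_base):
    """Convert a base-period (real) dollar amount to current dollars.

    Assumption: nominal = real * CPI in the earnings year / CPI in the base period.
    """
    return real_value * cpi_year / cpi_base


# Hypothetical index values, not taken from cpi_angrist1990.dta.
print(real_to_nominal(1000.0, 50.0, 100.0))
```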