Brandon Stewart, Princeton University

CCPR Seminar Room, 4240 Public Affairs Building, Los Angeles, CA, 90095, United States

How to Make Causal Inferences Using Texts

Texts are increasingly used to make causal inferences, with the document serving as either the treatment or the outcome. We introduce a new conceptual framework for understanding all text-based causal inferences, demonstrate fundamental problems that arise when manual or computational approaches to text are used for causal inference, and provide solutions to the problems we raise. We demonstrate that all text-based causal inferences depend upon a latent representation of the text, and we provide a framework for learning that representation. Estimating this latent representation, however, creates new risks: we may unintentionally create a dependency across observations or create opportunities to fish for large effects. To address these risks, we introduce a train/test split framework and apply it to estimate causal effects from an experiment on immigration attitudes and a study on bureaucratic responsiveness. Our work provides a rigorous foundation for text-based causal inferences, connecting two previously disparate literatures. (Joint work with Egami, Fong, Grimmer and Roberts)
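The train/test idea in this abstract can be illustrated with a toy sketch. Everything here is an assumption for illustration, not the authors' procedure: documents are sets of tokens, the latent representation is replaced by a single keyword indicator chosen on the training split, and the effect is the difference in keyword rates between treatment arms on the held-out split. The helper names (`presence_rate`, `discover_word`, `split_and_estimate`) are hypothetical.

```python
import random

# Hypothetical illustration of a train/test split for text-based causal
# inference; a keyword indicator stands in for a learned latent representation.


def presence_rate(docs, word):
    """Share of documents (each a set of tokens) that contain `word`."""
    return sum(word in doc for doc in docs) / len(docs)


def discover_word(docs, treat):
    """Discovery step, run on training data only: pick the word whose
    presence rate differs most between treated and control documents."""
    treated = [d for d, t in zip(docs, treat) if t == 1]
    control = [d for d, t in zip(docs, treat) if t == 0]
    vocab = sorted(set().union(*docs))  # sorted so ties break deterministically
    return max(vocab, key=lambda w: abs(presence_rate(treated, w)
                                        - presence_rate(control, w)))


def split_and_estimate(docs, treat, train_frac=0.5, seed=0):
    """Randomly split each arm into train/test, discover the keyword on the
    training split, and estimate the effect on the test split only."""
    rng = random.Random(seed)
    train, test = [], []
    for arm in (0, 1):  # split within arms so both splits contain both arms
        idx = [i for i, t in enumerate(treat) if t == arm]
        rng.shuffle(idx)
        cut = int(train_frac * len(idx))
        if cut == 0 or cut == len(idx):
            raise ValueError("each arm needs documents in both splits")
        train += idx[:cut]
        test += idx[cut:]
    word = discover_word([docs[i] for i in train], [treat[i] for i in train])
    test_t = [docs[i] for i in test if treat[i] == 1]
    test_c = [docs[i] for i in test if treat[i] == 0]
    return word, presence_rate(test_t, word) - presence_rate(test_c, word)
```

Because the keyword is chosen without looking at the test documents, the held-out estimate cannot be tuned to them, which is roughly the safeguard against fishing that the abstract describes.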

Susan Athey, Stanford University

CCPR Seminar Room, 4240 Public Affairs Building, Los Angeles, CA, 90095, United States

Estimating Heterogeneous Treatment Effects and Optimal Treatment Assignment Policies

Abstract: This talk will review recently developed methods for estimating conditional average treatment effects and optimal treatment assignment policies in experimental and observational studies, including settings with unconfoundedness or instrumental variables. Multi-armed bandits for learning treatment assignment policies will also be considered.
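As a deliberately simple illustration of the two estimands named in this abstract (not of the methods the talk reviews, which handle many covariates, observational data, and instruments), the sketch below assumes a randomized experiment with one discrete covariate, estimates each conditional average treatment effect by a within-stratum difference in means, and turns those estimates into a threshold assignment policy. The function names and the per-unit `cost` parameter are hypothetical.

```python
from collections import defaultdict

# Hypothetical sketch: stratified CATE estimates from a randomized experiment
# with one discrete covariate, plus a simple threshold treatment policy.


def stratified_cate(records):
    """Difference in mean outcomes between treated and control units within
    each level of a discrete covariate x (valid under randomization).

    records: iterable of (x, w, y) with treatment w in {0, 1}.
    Returns {x: estimated CATE} for strata where both arms are observed."""
    # per stratum: [sum of treated y, treated count, sum of control y, control count]
    totals = defaultdict(lambda: [0.0, 0, 0.0, 0])
    for x, w, y in records:
        arm = 0 if w == 1 else 2
        totals[x][arm] += y
        totals[x][arm + 1] += 1
    return {
        x: y1 / n1 - y0 / n0
        for x, (y1, n1, y0, n0) in totals.items()
        if n1 > 0 and n0 > 0
    }


def threshold_policy(cate, cost=0.0):
    """Assign treatment to a stratum when its estimated CATE exceeds the cost."""
    return {x: tau > cost for x, tau in cate.items()}
```

Strata missing one of the two arms are dropped rather than extrapolated, since a difference in means is undefined there.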

Eloise Kaizar, Ohio State University

1434A Physics and Astronomy, Los Angeles, CA, United States

Randomized controlled trials are often thought to provide definitive evidence on the magnitude of treatment effects. But because treatment modifiers may have a different distribution […]

Lan Liu, University of Minnesota at Twin Cities

Parsimonious Regressions for Repeated Measure Analysis

Abstract: Longitudinal data with repeated measures frequently arises in various disciplines. The standard methods typically impose […]

Adeline Lo, Princeton University

Abstract: High-dimensional (HD) data, where the number of covariates and/or meaningful covariate interactions might exceed the number of observations, is increasingly used in prediction in […]

Kosuke Imai, Harvard University

CCPR Seminar Room, 4240 Public Affairs Building, Los Angeles, CA, 90095, United States

Matching Methods for Causal Inference with Time-Series Cross-Section Data

Abstract: Matching methods aim to improve the validity of causal inference in observational studies by reducing model dependence and offering […]
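The abstract above is truncated, so the following is only one plausible reading of matching with time-series cross-section data, stated as an assumption: each unit that switches into treatment at period t is matched to units that remain untreated at t and share its treatment history over the previous `lags` periods, and the effect is the treated unit's outcome change from t-1 to t minus the average change among its matches. All names are hypothetical, and the sketch omits refinements such as further adjusting matched sets for covariates.

```python
# Hypothetical sketch of matching on treatment history in panel data;
# an illustration of the general idea, not the speaker's method.


def matched_sets(treat, lags):
    """treat: {unit: list of 0/1 treatment indicators, one per period}.
    For each unit-period (i, t) where unit i switches into treatment
    (untreated at t-1, treated at t), list the other units that are
    untreated at t and share i's treatment history over t-lags .. t-1."""
    if lags < 1:
        raise ValueError("lags must be at least 1")
    units = sorted(treat)
    n_periods = len(treat[units[0]])
    sets = {}
    for i in units:
        for t in range(lags, n_periods):
            if treat[i][t] == 1 and treat[i][t - 1] == 0:
                history = treat[i][t - lags:t]
                sets[(i, t)] = [
                    j for j in units
                    if j != i and treat[j][t] == 0 and treat[j][t - lags:t] == history
                ]
    return sets


def att_estimate(sets, outcome):
    """Average, over treated unit-periods with at least one match, of the
    treated unit's outcome change from t-1 to t minus the mean change
    among its matched controls."""
    diffs = []
    for (i, t), controls in sets.items():
        if not controls:
            continue  # no comparable control unit: drop this observation
        treated_change = outcome[i][t] - outcome[i][t - 1]
        control_change = sum(outcome[j][t] - outcome[j][t - 1]
                             for j in controls) / len(controls)
        diffs.append(treated_change - control_change)
    return sum(diffs) / len(diffs) if diffs else float("nan")
```

Matching only on exact treatment histories keeps the comparison transparent, at the cost of discarding treated observations that have no comparable control.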

Rocio Titiunik, University of Michigan

4240 Public Affairs Building, Los Angeles, CA, United States

Internal vs. External Validity in Studies with Incomplete Populations

Researchers working with administrative data rarely have access to the entire universe of units they need to estimate effects and make statistical inferences. Examples are varied and come from different disciplines. In social program evaluation, it is common to have data on all households who received the program, but only partial information on the universe of households who applied or could have applied for the program. In studies of voter turnout, information on the total number of citizens who voted is usually complete, but data on the total number of voting-eligible citizens is unavailable at low levels of aggregation. In criminology, information on arrests by race is available, but the overall population that could have potentially been arrested is typically unavailable. And in studies of drug overdose deaths, we lack complete information about the full population of drug users.

In all these cases, a reasonable strategy is to study treatment effects and descriptive statistics using the information that is available. This strategy may lack the generality of a full-population study, but it may nonetheless yield valuable information for the included units if it has sufficient internal validity. However, the distinction between internal and external validity becomes complex when the subpopulation of units for which information is available is not defined according to a reproducible criterion and/or when this subpopulation is itself defined by the treatment of interest. When this happens, a useful approach is to consider the full range of conclusions that would be obtained under different possible scenarios regarding the missing information. I discuss a general strategy based on partial identification ideas that may help assess the sensitivity of the partial-population study under weak (nonparametric) assumptions when the outcome variable is known with certainty for a subset of the units. I also discuss extensions, such as the inclusion of covariates in the estimation model and different strategies for statistical inference.
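A minimal sketch of the partial-identification idea, assuming the simplest setting rather than the talk's own strategy: a binary outcome observed for some units, a known number of units whose outcome is missing, and no assumption at all about the missing outcomes. These are the classic worst-case (Manski-style) bounds; the function names are hypothetical.

```python
from fractions import Fraction

# Hypothetical sketch: worst-case bounds for a binary outcome when some
# units' outcomes are missing and nothing is assumed about them.


def mean_bounds(observed, n_missing):
    """Worst-case bounds on the population mean of a 0/1 outcome that is
    observed for len(observed) units and unknown for n_missing units."""
    n = len(observed) + n_missing
    if n == 0:
        raise ValueError("empty population")
    s = sum(observed)
    lower = Fraction(s, n)              # every missing outcome is 0
    upper = Fraction(s + n_missing, n)  # every missing outcome is 1
    return lower, upper


def difference_bounds(treated, n_missing_t, control, n_missing_c):
    """Bounds on mean(treated) - mean(control): the extremes pair the
    lowest treated mean with the highest control mean, and vice versa."""
    lo_t, hi_t = mean_bounds(treated, n_missing_t)
    lo_c, hi_c = mean_bounds(control, n_missing_c)
    return lo_t - hi_c, hi_t - lo_c
```

For example, `difference_bounds([1, 1, 0, 1], 1, [0, 1, 0, 0], 1)` gives the interval from 1/5 to 3/5, which excludes zero: the sign of the difference is identified even though its size is not. With more missing units the interval widens and may come to include zero.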

Co-sponsored with the Political Science Department, the Statistics Department, and the Center for Social Statistics.

Erin Hartman, University of California Los Angeles

CCPR Seminar Room, 4240 Public Affairs Building, Los Angeles, CA, 90095, United States

Covariate Selection for Generalizing Experimental Results

Abstract: Researchers are often interested in generalizing the average treatment effect (ATE) estimated in a randomized experiment to non-experimental target populations. Researchers can estimate the […]
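The abstract is truncated, but one standard way to carry an experimental estimate to a target population, assumed here purely for illustration, is post-stratification: estimate the effect within strata of effect-modifying covariates, then average those stratum effects using the target population's stratum shares. Which covariates define the strata is the kind of selection question the title raises. The function name and the numbers in the example are hypothetical.

```python
# Hypothetical sketch: post-stratified ATE for a target population, built
# from stratum-specific experimental effects and target-population shares.


def transported_ate(stratum_effects, target_shares):
    """Post-stratified ATE for a target population.

    stratum_effects: {stratum: effect estimated in the experiment}
    target_shares:   {stratum: share (or count) of the target population}
    Assumes the strata capture all effect modification and that every
    target stratum also appears in the experiment."""
    missing = set(target_shares) - set(stratum_effects)
    if missing:
        raise ValueError(f"no experimental estimate for strata: {sorted(missing)}")
    total = sum(target_shares.values())
    return sum(share * stratum_effects[s] for s, share in target_shares.items()) / total
```

For example, with stratum effects of 2.0 for "young" and 0.0 for "old", an experiment split evenly between the strata has an ATE of 1.0, while a target population that is 25% young has a transported ATE of 0.5. If age were left out of the strata, that gap would go unnoticed.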