An Open Discussion on Promoting Transparency in Social Science Research

By Edward Miguel (Economics, UC Berkeley)

This CEGA Blog Forum builds on a seminal research meeting held at the University of California, Berkeley on December 7, 2012. The goal was to bring together a select interdisciplinary group of scholars – from biostatistics, economics, political science and psychology – with a shared interest in promoting transparency in empirical social science research.

There has been a flurry of activity regarding research transparency in recent years, within the academy and among research funders, driven by a recognition that too many influential research findings are fragile at best, if not entirely spurious or even fraudulent.  But the increasingly heated debates on these critical issues have until now been “siloed” within individual academic disciplines, limiting their synergy and broader impacts. The December meeting (see presentations and discussions) drove home the point that there is a remarkable degree of commonality in the interests, goals and challenges facing scholars across the social science disciplines.

This inaugural CEGA Blog Forum aims to bring the fascinating conversations that took place at the Berkeley meeting to a wider audience, and to spark a public dialogue on these critical issues with the goal of clarifying the most productive ways forward.   This is an especially timely debate, given: the American Economic Association’s formal decision in 2012 to establish an online registry for experimental studies; the new “design registry” established by the Experiments in Governance and Politics, or EGAP, group; serious discussion about a similar registry in the American Political Science Association’s Experimental Research section; and the emergence of the Open Science Framework, developed by psychologists, as a plausible platform for registering pre-analysis plans and documenting other aspects of the research process. Yet there remains limited consensus regarding how exactly study registration will work in practice, and about the norms that could or should emerge around it. For example, is it possible – or even desirable – for all empirical social science studies to be registered? When and how should study registration be considered by funders and journals?


Bayes’ Rule and the Paradox of Pre-Registration of RCTs

By Donald P. Green (Political Science, Columbia)

Not long ago, I attended a talk at which the presenter described the results of a large, well-crafted experiment.  His results indicated that the average treatment effect was close to zero, with a small standard error.  Later in the talk, however, the speaker revealed that when he partitioned the data into subgroups (men and women), the findings became “more interesting.”  Evidently, the treatment interacts significantly with gender.  The treatment has positive effects on men and negative effects on women.

A bit skeptical, I raised my hand to ask whether this treatment-by-covariate interaction had been anticipated by a planning document prior to the launch of the experiment.  The author said that it had.  The reported interaction now seemed quite convincing.  Impressed both by the results and the prescient planning document, I exclaimed “Really?”  The author replied, “No, not really.”  The audience chuckled, and the speaker moved on.  The reported interaction again struck me as rather unconvincing.

Why did the credibility of this experimental finding hinge on pre-registration?  Let’s take a step back and use Bayes’ Rule to analyze the process by which prior beliefs were updated in light of new evidence.  In order to keep the algebra to a bare minimum, consider a stylized example that makes use of Bayes’ Rule in its simplest form.
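The stylized updating Green describes can be sketched numerically. In the sketch below, the prior, the test's power, and the number of unplanned subgroup cuts are all invented for illustration (they are not numbers from the talk): a "significant" interaction is far more persuasive when it survives a single pre-registered test than when it could have emerged from fishing across many post hoc subgroups.

```python
def posterior(prior, power, alpha):
    """P(effect is real | significant result), by Bayes' Rule."""
    return prior * power / (prior * power + (1 - prior) * alpha)

# One pre-registered test: the nominal false-positive rate applies.
pre_registered = posterior(prior=0.1, power=0.8, alpha=0.05)

# Fishing across, say, 20 unplanned subgroup cuts inflates the chance
# of at least one spurious "significant" finding under the null.
alpha_fished = 1 - (1 - 0.05) ** 20
post_hoc = posterior(prior=0.1, power=0.8, alpha=alpha_fished)

print(f"posterior if pre-registered: {pre_registered:.2f}")
print(f"posterior if fished:         {post_hoc:.2f}")
```

With these assumed numbers, the same nominally significant result moves the posterior to roughly 0.64 if pre-registered but barely above the prior if it could have come from unrestricted subgroup fishing, which is exactly why the speaker's answer changed the finding's credibility.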


Targeted Learning from Data: Valid Statistical Inference Using Data Adaptive Methods

By Maya Petersen, Alan Hubbard, and Mark van der Laan (Public Health, UC Berkeley)

Statistics provides a powerful tool for learning about the world, in part because it allows us to quantify uncertainty and control how often we falsely reject null hypotheses. Pre-specified study designs, including analysis plans, ensure that we understand the full process, or “experiment”, that resulted in a study’s findings. Such understanding is essential for valid statistical inference.

The theoretical arguments in favor of pre-specified plans are clear. However, the practical challenges to implementing such plans can be formidable. It is often difficult, if not impossible, to generate a priori the full universe of interesting questions that a given study could be used to investigate. New research, external events, or data generated by the study itself may all suggest new hypotheses. Further, huge amounts of data are increasingly being generated outside the context of formal studies. Such data provide both a tremendous opportunity and a challenge to statistical inference.

Even when a hypothesis is pre-specified, pre-specifying an analysis plan to test the hypothesis is often challenging. For example, investigation of the effect of compliance with a randomly assigned intervention forces us to specify how we will contend with confounding.  What identification strategy should we use? Which covariates should we adjust for? How should we adjust for them? The number of analytic decisions and the impact of these decisions on conclusions are further multiplied when losses to follow-up, biased sampling, and missing data are considered.


Transparency and Pre-Analysis Plans: Lessons from Public Health

By David Laitin (Political Science, Stanford)

My claim in this blog entry is that political science will remain principally an observation-based discipline and that our core principles of establishing findings as significant should consequently be based upon best practices in observational research. This is not to deny that there is an expanding branch of experimental studies which may demand a different set of principles; but those principles add little to confidence in observational work. As I have argued elsewhere (“Fisheries Management” in Political Analysis 2012), our model for best practices is closer to the standards of epidemiology than to those of drug trials. Here, through a review of the research program of Michael Marmot (The Status Syndrome, New York: Owl Books, 2004), I evoke the methodological affinity of political science and epidemiology, and suggest the implications of this affinity for evolving principles of transparency in the social sciences.

Two factors drive political science into the observational mode. First, as with the Centers for Disease Control, which gets an emergency call describing an outbreak of some hideous virus in a remote corner of the world, political scientists see it as core to their domain to account for anomalous outbreaks (e.g. that of democracy in the early 1990s) wherever they occur. Not unlike epidemiologists seeking to model the hazard of SARS or AIDS, political scientists cannot randomly assign secular authoritarian governments to some countries and orthodox authoritarian governments to others to get an estimate of the hazard rate into democracy.  Rather, they merge datasets looking for patterns, theorize about them, and then put the implications of the theory to the test with other observational data. Accounting for outcomes in the real world drives political scientists into the observational mode.


Freedom! Pre-Analysis Plans and Complex Analysis

By Gabriel Lenz (Political Science, UC Berkeley)

Like many researchers, I worry constantly about whether findings are true or merely the result of a process variously called data mining, fishing, capitalizing on chance, or p-hacking. Since academics face extraordinary incentives to produce novel results, many suspect that “torturing the data until it speaks” is a common practice, a suspicion reinforced by worrisome replication results (1,2).

Data torturing likely slows down the accumulation of knowledge, filling journals with false positives. Pre-analysis plans can help solve this problem. They may also help with another perverse consequence that has received less attention: a preference among many researchers for very simple approaches to analysis.

This preference has developed, I think, as a defense against data mining. For example, one of the many ways researchers can torture their data is with control variables. They can try different sets of control variables, they can recode them in various ways, and they can interact them with each other until the analysis produces the desired result. Since we almost never know exactly which control variables really do influence the outcome, researchers can usually tell themselves a story about why they chose the set or sets they publish. Since control variables could be “instruments of torture,” I’ve learned to secure my wallet whenever I see results presented with controls. Even though the goal of control variables is to rule out alternative explanations, I often find bivariate results more convincing. My sense is that many of my colleagues share these views, preferring approaches that avoid control variables, such as difference-in-differences estimators. In a sense, avoiding controls partially disarms the torturer.
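The torture Lenz describes can be made concrete with a small simulation. Everything below is an illustrative sketch with invented parameters: the treatment has no true effect, yet a researcher who runs one regression per subset of candidate controls and reports any specification where the treatment is "significant" will find one far more often than the nominal 5% of the time.

```python
import itertools
import numpy as np

def t_stat_first(y, X):
    """OLS t-statistic on the first column of X (the treatment)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[0] / np.sqrt(cov[0, 0])

rng = np.random.default_rng(0)
n, n_controls, sims = 100, 6, 200
hits = 0
for _ in range(sims):
    d = rng.normal(size=n)                # "treatment": truly unrelated to y
    Z = rng.normal(size=(n, n_controls))  # candidate control variables
    # Outcome depends on the controls but NOT on the treatment (the null holds).
    y = Z @ np.full(n_controls, 0.5) + rng.normal(size=n)
    # Fish: one regression per subset of controls, keep any "significant" one.
    found = False
    for r in range(n_controls + 1):
        for cols in itertools.combinations(range(n_controls), r):
            X = np.column_stack([d, np.ones(n)] + [Z[:, c] for c in cols])
            if abs(t_stat_first(y, X)) > 1.96:
                found = True
    hits += found
print(f"share of null datasets with a 'significant' specification: {hits / sims:.2f}")
```

With only six candidate controls there are already 64 specifications per dataset, and the share of pure-noise datasets yielding at least one "significant" treatment effect rises well above the nominal 5% — which is why a pre-analysis plan that fixes the specification in advance, or a reported bivariate result, can be more convincing than a heavily controlled one.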


The Need for Pre-Analysis: First Things First

By Richard Sedlmayr (Philanthropic Advisor)

When we picture a desperate student running endless tests on his dataset until some feeble point finally meets statistical reporting conventions, we are quick to dismiss the results. But the underlying issue is ubiquitous: it is hard to analyze data without getting caught in hypothesis drift, and if you do not seriously consider the repercussions for statistical inference, you too are susceptible to picking up spurious correlations. This is also true for randomized trials that otherwise go to great lengths to ensure clean causal attribution. But experimental (and other prospective) research has a trick up its sleeve: the pre-analysis plan (PAP) can credibly overcome the problem by spelling out subgroups, statistical specifications, and virtually every other detail of the analysis before the data is in. This way, it can clearly establish that tests are not a function of outcomes – in other words, that results are what they are.

So should PAPs become the new reality for experimental research? Not so fast, say some, because there are costs involved. Obviously, it takes a lot of time and effort to define the meaningful analysis of a dataset that isn’t even in yet. But more importantly, there is a risk that following a PAP backfires and actually reduces the value we get out of research: perhaps one reason why hypothesis drift is so widespread is that it is a cost-effective way of learning, and by tying our hands, we might stifle valuable processes that can only take place once the data are in. Clearly, many powerful insights that came out of experimental work – both in social and biomedical research – have been serendipitous. So are we stuck in limbo, “without a theory of learning” that might provide some guidance on PAPs?


Transparency-Inducing Institutions and Legitimacy

By Kevin M. Esterling (Political Science, UC Riverside)

Whenever I discuss the idea of hypothesis preregistration with colleagues in political science and in psychology, the reactions I get typically range from resistance to outright hostility.  These colleagues obviously understand the limitations of research founded on false-positives and data over-fitting.  They are even more concerned, however, that instituting a preregistry would create norms that would privilege prospective, deductive research over exploratory inductive and descriptive research.  For example, such norms might lead researchers to neglect problems or complications in their data so as to retain the ability to state their study “conformed” to their original registered design.

If a study registry were to become widely used in the discipline, however, it would be much better if it were embraced and seen as constructive and legitimate.  One way I think we can do this is by shifting the focus away from monitoring our colleagues’ compliance with registration norms, which implicitly privileges prospective research, and instead towards creating institutions that promote transparency in all styles of research, with preregistration being just one element of the new institutions for transparency.

Transparency solves the same problems that preregistration is intended to address, in that transparency helps other researchers to understand the provenance of results and enables researchers to value contributions for what they are.  If scholars genuinely share the belief that data-driven research has scientific merit, then there really should be no stigma in indicating that is how one reached one’s conclusions.  Indeed, creating transparency should enable principled inductive research, since it creates legitimacy for this research and removes the awkward need to present inductive research as if it had been deductive.