For those following our Berkeley Initiative for Transparency in the Social Sciences (BITSS), and others who appreciate good scientific humor, the following XKCD comic cleverly illustrates the problem that CEGA researchers and our concerned colleagues seek to address:
“Significant,” courtesy of XKCD
And for those who love green jelly beans and are prone to acne: it’s probably safe to keep eating them in large quantities.
CEGA is hosting a series of opinion pieces by leading scholars, all addressing issues of transparency in empirical social sciences. CLICK HERE to read and comment.
By Donald P. Green (Political Science, Columbia)
Not long ago, I attended a talk at which the presenter described the results of a large, well-crafted experiment. His results indicated that the average treatment effect was close to zero, with a small standard error. Later in the talk, however, the speaker revealed that when he partitioned the data into subgroups (men and women), the findings became “more interesting.” Evidently, the treatment interacts significantly with gender. The treatment has positive effects on men and negative effects on women.
A bit skeptical, I raised my hand to ask whether this treatment-by-covariate interaction had been anticipated by a planning document prior to the launch of the experiment. The author said that it had. The reported interaction now seemed quite convincing. Impressed both by the results and the prescient planning document, I exclaimed “Really?” The author replied, “No, not really.” The audience chuckled, and the speaker moved on. The reported interaction again struck me as rather unconvincing.
Why did the credibility of this experimental finding hinge on pre-registration? Let’s take a step back and use Bayes’ Rule to analyze the process by which prior beliefs were updated in light of new evidence. In order to keep the algebra to a bare minimum, consider a stylized example that makes use of Bayes’ Rule in its simplest form.
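To make the role of Bayes’ Rule concrete, here is a minimal sketch of the kind of updating described above. It is not the author’s own stylized example: the function name and every number in it (the prior, the power, and the two false-positive rates) are hypothetical, chosen only to illustrate why a planned test moves beliefs more than an unplanned one.

```python
def posterior_true_effect(prior, power, false_positive_rate):
    """P(interaction is real | significant result), by Bayes' Rule."""
    evidence = power * prior + false_positive_rate * (1 - prior)
    return power * prior / evidence


# Hypothetical inputs, for illustration only.
prior = 0.10  # belief, before seeing the data, that the interaction is real
power = 0.80  # chance of a significant result if the interaction is real

# A pre-registered test keeps the false-positive rate at the nominal 5%.
planned = posterior_true_effect(prior, power, false_positive_rate=0.05)

# An unplanned search across subgroups is assumed here to turn up
# "something significant" half the time even when nothing is real.
fished = posterior_true_effect(prior, power, false_positive_rate=0.50)

print(f"posterior after a pre-registered test: {planned:.2f}")
print(f"posterior after an unplanned search:   {fished:.2f}")
```

Under these assumed inputs, the same significant result lifts the posterior to about 0.64 when the test was planned, but only to about 0.15 when it was not, which mirrors the swing in credibility the anecdote describes.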
By Maya Petersen, Alan Hubbard, and Mark van der Laan (Public Health, UC Berkeley)
Statistics provide a powerful tool for learning about the world, in part because they allow us to quantify uncertainty and control how often we falsely reject null hypotheses. Pre-specified study designs, including analysis plans, ensure that we understand the full process, or “experiment”, that resulted in a study’s findings. Such understanding is essential for valid statistical inference.
The theoretical arguments in favor of pre-specified plans are clear. However, the practical challenges to implementing such plans can be formidable. It is often difficult, if not impossible, to generate a priori the full universe of interesting questions that a given study could be used to investigate. New research, external events, or data generated by the study itself may all suggest new hypotheses. Further, huge amounts of data are increasingly being generated outside the context of formal studies. Such data provide both a tremendous opportunity and a challenge to statistical inference.
Even when a hypothesis is pre-specified, pre-specifying an analysis plan to test the hypothesis is often challenging. For example, investigation of the effect of compliance with a randomly assigned intervention forces us to specify how we will contend with confounding. What identification strategy should we use? Which covariates should we adjust for? How should we adjust for them? The number of analytic decisions, and the impact of these decisions on conclusions, multiplies further when losses to follow-up, biased sampling, and missing data are considered.
By David Laitin (Political Science, Stanford)
My claim in this blog entry is that political science will remain principally an observation-based discipline and that our core principles of establishing findings as significant should consequently be based upon best practices in observational research. This is not to deny that there is an expanding branch of experimental studies which may demand a different set of principles; but those principles add little to confidence in observational work. As I have argued elsewhere (“Fisheries Management” in Political Analysis 2012), our model for best practices is closer to the standards of epidemiology than to those of drug trials. Here, through a review of the research program of Michael Marmot (The Status Syndrome, New York: Owl Books, 2004), I evoke the methodological affinity of political science and epidemiology, and suggest the implications of this affinity for evolving principles of transparency in the social sciences.
Two factors drive political science into the observational mode. First, as with the Centers for Disease Control, which gets an emergency call describing an outbreak of some hideous virus in a remote corner of the world, political scientists see it as core to their domain to account for anomalous outbreaks (e.g. that of democracy in the early 1990s) wherever they occur. Not unlike epidemiologists seeking to model the hazard of SARS or AIDS, political scientists cannot randomly assign secular authoritarian governments to some countries and orthodox authoritarian governments to others to get an estimate of the hazard rate into democracy. Rather, they merge datasets looking for patterns, theorize about them, and then put the implications of the theory to the test with other observational data. Accounting for outcomes in the real world drives political scientists into the observational mode.
By Richard Sedlmayr (Philanthropic Advisor)
When we picture a desperate student running endless tests on his dataset until some feeble point finally meets statistical reporting conventions, we are quick to dismiss the results. But the underlying issue is ubiquitous: it is hard to analyze data without getting caught in a hypothesis drift, and if you do not seriously consider the repercussions on statistical inference, you too are susceptible to picking up spurious correlations. This is also true for randomized trials that otherwise go to great lengths to ensure clean causal attribution. But experimental (and other prospective) research has a trick up its sleeve: the pre-analysis plan (PAP) can credibly overcome the problem by spelling out subgroups, statistical specifications, and virtually every other detail of the analysis before the data is in. This way, it can clearly establish that tests are not a function of outcomes – in other words, that results are what they are.
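The arithmetic behind the student’s “endless tests” can be sketched in a few lines. This is an illustration under simplifying assumptions that are not in the original text: the tests are independent, every null hypothesis is true, and each test uses the conventional 5% threshold. The function name is hypothetical.

```python
def familywise_error_rate(num_tests, alpha=0.05):
    """Chance of at least one false positive across independent tests
    when every null hypothesis is true (an assumption of this sketch)."""
    return 1 - (1 - alpha) ** num_tests


for num_tests in (1, 5, 20, 100):
    rate = familywise_error_rate(num_tests)
    print(f"{num_tests:>3} tests: {rate:.3f}")
```

With 20 such tests, the chance of at least one spuriously “significant” result is roughly 64%. A pre-analysis plan fixes the number of tests in advance, so a reader can tell which of these rows applies.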
So should PAPs become the new reality for experimental research? Not so fast, say some, because there are costs involved. Obviously, it takes a lot of time and effort to define the meaningful analysis of a dataset that isn’t even in yet. But more importantly, there is a risk that following a PAP backfires and actually reduces the value we get out of research: perhaps one reason why hypothesis drift is so widespread is that it is a cost-effective way of learning, and by tying our hands, we might stifle the valuable processes that can only take place once data is in. Clearly, powerful insights that came out of experimental work – both in social and biomedical research – have been serendipitous. So are we stuck in limbo, “without a theory of learning” that might provide some guidance on PAPs?
By Kevin M. Esterling (Political Science, UC Riverside)
Whenever I discuss the idea of hypothesis preregistration with colleagues in political science and in psychology, the reactions I get typically range from resistance to outright hostility. These colleagues obviously understand the limitations of research founded on false positives and data over-fitting. They are even more concerned, however, that instituting a preregistry would create norms that would privilege prospective, deductive research over exploratory, inductive, and descriptive research. For example, such norms might lead researchers to neglect problems or complications in their data so as to retain the ability to state that their study “conformed” to their original registered design.
If a study registry were to become widely used in the discipline, however, it would be much better if it were embraced and seen as constructive and legitimate. One way I think we can do this is by shifting the focus away from monitoring our colleagues’ compliance with registration norms, which implicitly privileges prospective research, and instead towards creating institutions that promote transparency in all styles of research, with preregistration being just one element of the new institutions for transparency.
Transparency solves the same problems that preregistration is intended to address, in that transparency helps other researchers to understand the provenance of results and enables researchers to value contributions for what they are. If scholars genuinely share the belief that data-driven research has scientific merit, then there really should be no stigma in indicating that this is how one reached one’s conclusions. Indeed, creating transparency should enable principled inductive research, since it creates legitimacy for this research and removes the awkward need to present inductive research as if it had been deductive.