The Need for Pre-Analysis: First Things First

By Richard Sedlmayr (Philanthropic Advisor)

When we picture a desperate student running endless tests on his dataset until some feeble point finally meets statistical reporting conventions, we are quick to dismiss the results. But the underlying issue is ubiquitous: it is hard to analyze data without getting caught in hypothesis drift, and if you do not seriously consider the repercussions for statistical inference, you too are susceptible to picking up spurious correlations. This is also true for randomized trials that otherwise go to great lengths to ensure clean causal attribution. But experimental (and other prospective) research has a trick up its sleeve: the pre-analysis plan (PAP) can credibly overcome the problem by spelling out subgroups, statistical specifications, and virtually every other detail of the analysis before the data is in. This way, it can clearly establish that tests are not a function of outcomes – in other words, that results are what they are.
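To make the point about spurious correlations concrete, here is a minimal simulation sketch. It is an illustration under stated assumptions, not anything taken from the article: outcomes are pure noise (no true treatment effect anywhere), each experiment has 20 independent subgroups, and each subgroup is checked with a two-sided z-test at the 5% level. The function names and all parameter values are hypothetical. It compares an analyst who commits to one subgroup in advance with one who reports the best-looking subgroup after the fact.

```python
import random
from statistics import NormalDist, mean


def p_value_two_sample(a, b):
    """Two-sided z-test p-value for a difference in means.

    Assumes both samples have known unit variance (true by construction here).
    """
    se = (1 / len(a) + 1 / len(b)) ** 0.5
    z = (mean(a) - mean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulate(n_experiments=1000, n_subgroups=20, n_per_arm=30, alpha=0.05, seed=0):
    """Return (pre-specified, post hoc) false positive rates on null data."""
    rng = random.Random(seed)
    prespecified_hits = 0
    post_hoc_hits = 0
    for _ in range(n_experiments):
        p_values = []
        for _ in range(n_subgroups):
            # No true effect: both arms are drawn from the same distribution.
            treated = [rng.gauss(0, 1) for _ in range(n_per_arm)]
            control = [rng.gauss(0, 1) for _ in range(n_per_arm)]
            p_values.append(p_value_two_sample(treated, control))
        # Pre-specified analysis: only the subgroup named in advance counts.
        prespecified_hits += p_values[0] < alpha
        # Hypothesis drift: report whichever subgroup happens to look best.
        post_hoc_hits += min(p_values) < alpha
    return prespecified_hits / n_experiments, post_hoc_hits / n_experiments


if __name__ == "__main__":
    prespecified, post_hoc = simulate()
    print(f"pre-specified subgroup: {prespecified:.2f}")
    print(f"best of 20 subgroups:   {post_hoc:.2f}")
```

Under these assumptions, the pre-specified test should flag a "finding" in roughly 5% of experiments, while the best-of-20 search should do so in roughly 1 - 0.95^20, or about 64%, even though there is nothing to find. Real analyses involve correlated subgroups and more flexible specification choices, so the actual inflation will differ.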

So should PAPs become the new reality for experimental research? Not so fast, say some, because there are costs involved. Obviously, it takes a lot of time and effort to define a meaningful analysis of a dataset that isn’t even in yet. But more importantly, there is a risk that following a PAP backfires and actually reduces the value we get out of research: perhaps one reason why hypothesis drift is so widespread is that it is a cost-effective way of learning, and by tying our hands, we might stifle the valuable processes that can only take place once the data is in. Clearly, some powerful insights that came out of experimental work – both in social and biomedical research – have been serendipitous. So are we stuck in limbo, “without a theory of learning” that might provide some guidance on PAPs?