The Replicability Crisis As It Applies To Big Datasets

The hallmark of science is that its results can be reproduced. If I shine light through a prism, and I describe what I see, you should be able to read what I wrote and do the experiment again, yourself, and get the same results. That’s why science isn’t opinion, and why it labels itself empirical. Unfortunately, reproducibility rates aren’t the 95% we hoped for; they’re nowhere close.

People like to talk about this. It’s a very important topic. And a series of solutions has been developed that seek to improve replicability, or at least to mark the studies that are more likely to replicate (preregistration, registered replication reports, the 21-word solution). But, unfortunately for people like me who do personality development research with large, already-collected archival datasets, these discussions don’t apply very well. Often, near-term replication is impossible, because no other datasets have collected the same variables as the ones you are examining (much of this is a function of personality development being a relatively new science; there aren’t too many long-running studies with Big Five variables collected at multiple time points). Researchers who have found that some personality traits codevelop with obesity over time (Sutin et al., 2009) don’t have another dataset to replicate this finding with.

The easy solution to this problem is to pre-register. If a person reports their intended procedure and their theories ahead of time, there is no wiggle room for them to flip thousands of coins and only report the string of four heads in a row. Unfortunately, this solution isn’t viable for most personality development researchers, for two reasons. First, we don’t know what we’re working with until we peek at the data, and after we peek at the data, we can’t pre-register any longer, because we know what’s in the data to some extent. A friend of mine was interested in looking at schizoid tendencies and personality, until she looked at the data and discovered that too few participants actually reported schizoid tendencies, precluding any further statistical tests. Pre-registration, for her, would have been a huge waste of time. Second, we have so few datasets to work with that we often re-use datasets in future studies. As part of one study, I analyzed openness. In the process, I learned how that scale functioned in the dataset: older people were less open, and everyone answered the questions similarly across the lifespan. So now, I can’t pre-register any study using that openness variable and that dataset, because I know to some extent how openness functioned, and that will influence my subsequent analytic decisions and bias them towards significance.
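To put a rough number on the coin-flipping worry, here is a minimal sketch in Python. It is not from any study mentioned here; the function names are made up for illustration, and it assumes every test is independent and that there is truly nothing to find, so each p-value is uniform between 0 and 1. Real analyses on one archival dataset are correlated, so treat the output as a ballpark rather than a precise rate.

```python
import random


def familywise_error_rate(n_tests, alpha=0.05):
    """Chance of at least one 'significant' result across n_tests
    independent tests when every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** n_tests


def simulated_rate(n_tests, alpha=0.05, n_studies=20000, seed=0):
    """Monte Carlo check of the formula above.

    Assumption: under a true null, each p-value is a uniform draw
    on [0, 1], so a draw below alpha counts as a false positive.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        # One hypothetical "study" that runs n_tests analyses and
        # reports a finding if any single one of them comes up significant.
        if any(rng.random() < alpha for _ in range(n_tests)):
            hits += 1
    return hits / n_studies


if __name__ == "__main__":
    for k in (1, 5, 20, 100):
        print(k, round(familywise_error_rate(k), 3), round(simulated_rate(k), 3))
```

With 20 unreported analytic forks, the formula gives roughly a 64% chance of at least one spurious hit, which is why reporting only the string of heads is so misleading.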

So, what do I do if I can’t replicate because there aren’t any longitudinal datasets for me to replicate with, and I can’t preregister because I’ve peeked at the data? I can tailor the strength of my conclusions to try to account for the inevitable p-hacking I’ve done. Thankfully, I already try to do that, and so do other personality researchers for the most part. Everything is explicitly labeled as exploratory, and all causal language is preceded by the words “may” or “potentially.” But we all lie to ourselves when we say we don’t p-hack. We definitely tailor our analytical decisions to what we see in the data. I found that openness and time spent reading books weren’t nearly as related as I thought they would be, and I explained this by pointing to the bestselling Fifty Shades of Grey that everyone is reading, which isn’t doing anything for their openness. But, had I found something, I would probably have continued analyses. And that’s something that I can admit to myself; I probably made countless other decisions like this that biased my findings towards significance, and all I can do is walk it back with my words.

Long story short, if there is a replication crisis in personality development, it’ll be a long time before we even know. So let’s do the research as well as possible the first time: build analysis plans for our data after checking the initial descriptive statistics, stow ’em in our labs somewhere, and don’t deviate from them. Report all tests and all data exclusions. And, if other datasets show up with the variables you’ve analyzed, give ’em a shot, and cross your fingers!