There’s nothing to get a scientist’s heart pumping like a good, old-fashioned statistical debate. When it comes to topics like finding Earth analogues or hints of a biosignature in an atmosphere, those statistical debates can have real-world consequences, both for the assignment of additional observational resources and for humanity’s general understanding of its place in the Universe. A new paper from two prominent exoplanet hunters, David Kipping from Columbia and Björn Benneke from UCLA, argues that their colleagues in the field of exoplanet detection have been doing statistics all wrong for decades, and makes the case for a better way to present their results to the public.
While statistics might seem an arcane part of the overall process of space exploration, it is absolutely critical for the advancement of science. Claiming that a phenomenon (or a planet) exists requires the data to reach a certain level of “statistical significance”. Bayesian statistics, built on Bayes’ theorem, offers one mathematical way to weigh such evidence, but there is also a basic human sense of how convincing a result is, and the confusion seems to lie in translating the math into something the public can understand and accept about a scientific finding.
The translation detailed in the paper is between Bayesian statistics (roughly, how much better one explanation of the data fits than another, such as a planet versus no planet) and “frequentist” statistics (roughly, how surprising the data would be if nothing were there). In frequentist terms, the result is commonly expressed as a “sigma” value, after the Greek letter σ used for the standard deviation. That sigma value lies at the core of the conflict, according to the paper.
Fraser discusses what a finding of DMS means for our understanding of exoplanets.
Sigma values gained prominence through their role in the discovery of the Higgs boson at the Large Hadron Collider in 2012. That discovery’s “five sigma” significance introduced frequentist statistics into public discussion of science, and has served as an anchoring point for those conversations ever since.
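To make “five sigma” a little more concrete, here is a short Python sketch that converts a sigma value into the probability that pure noise would produce a fluctuation at least that large, assuming normally distributed errors. Particle physicists quote the one-sided tail; the two-sided version is included for comparison. The function name is illustrative only.

```python
from statistics import NormalDist

def tail_probability(sigma: float, two_sided: bool = False) -> float:
    """Chance that Gaussian noise alone lands at least `sigma` standard deviations out."""
    # Area beyond +sigma, which by symmetry equals the area below -sigma.
    one_sided = NormalDist().cdf(-sigma)
    return 2.0 * one_sided if two_sided else one_sided

for s in (1, 2, 3, 5):
    p = tail_probability(s)
    print(f"{s} sigma: one-sided p = {p:.3g} (about 1 in {1 / p:,.0f})")
```

At five sigma the one-sided probability comes out at roughly 2.9 × 10^-7, or about 1 in 3.5 million.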
There is a mathematical formula for translating Bayesian evidence into frequentist terms, and the method usually adopted by exoplanet hunters was laid out in a 2001 paper by a group of statisticians. A follow-up paper from 2013, adapted more specifically to the needs of exoplanet hunters (and co-authored by Benneke, one of the authors of the new paper), further cemented the use of this conversion in the academic literature. However, a typographical error in the 2013 paper may have contributed to a misinterpretation of the statistics: it described the converted result as “at least” a given sigma value, when it should have said “at most”.
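For readers curious how such a conversion works in practice, here is a minimal Python sketch. It assumes the 2001 calibration in question is the widely cited bound from Sellke, Bayarri and Berger, which caps the Bayes factor in favour of a detection at B ≤ -1/(e p ln p) for a p-value p below 1/e, and it assumes a two-sided Gaussian p-value when quoting sigma. The function names are hypothetical, and this is not the code used in either paper.

```python
import math
from statistics import NormalDist

# Hypothetical helpers; assumes the Sellke, Bayarri & Berger (2001)
# calibration B <= -1 / (e * p * ln p), valid only for 0 < p < 1/e.

def max_bayes_factor(p: float) -> float:
    """Largest Bayes factor (detection vs. no detection) the calibration allows for p-value p."""
    if not 0.0 < p < 1.0 / math.e:
        raise ValueError("calibration is only defined for 0 < p < 1/e")
    return -1.0 / (math.e * p * math.log(p))

def p_value_from_bayes_factor(b: float) -> float:
    """Invert max_bayes_factor by bisection; it falls monotonically toward 1 as p rises to 1/e."""
    if b <= 1.0:
        raise ValueError("the calibration needs a Bayes factor greater than 1")
    lo, hi = 1e-300, 1.0 / math.e
    for _ in range(200):
        mid = math.sqrt(lo * hi)  # bisect in log space, since p can span many decades
        if mid <= lo or mid >= hi:
            break  # lo and hi are adjacent floats: converged
        if max_bayes_factor(mid) > b:
            lo = mid  # bound at mid still exceeds b, so the matching p is larger
        else:
            hi = mid
    return math.sqrt(lo * hi)

def sigma_from_p_value(p: float) -> float:
    """Convert a two-sided p-value into a Gaussian 'sigma' (assumed convention)."""
    return -NormalDist().inv_cdf(p / 2.0)

def bayes_factor_to_sigma(b: float) -> float:
    # Per Kipping & Benneke, read this as an upper limit ("at most"), not a floor.
    return sigma_from_p_value(p_value_from_bayes_factor(b))

if __name__ == "__main__":
    for b in (2.5, 20.0, 150.0):
        print(f"Bayes factor {b:>6}: about {bayes_factor_to_sigma(b):.1f} sigma")
```

Fed a Bayes factor of 150, the sketch gives roughly 3.6 sigma. Following Kipping and Benneke’s argument, a number produced this way should be read as “at most” that significance, not “at least”.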
Whatever the cause of the disconnect, the authors argue that, since the earliest days of exoplanet hunting, its practitioners have been overstating the significance of their discoveries by misreading the Bayesian-to-frequentist conversion. One example they highlight is the recent (admittedly already controversial) detection of dimethyl sulfide (DMS) in the atmosphere of the exoplanet K2-18b. Given the limits of that conversion, they argue, the title of the paper presenting evidence for the detection should have claimed “less than 3-sigma” significance.
Paper author David Kipping discusses the statistics of finding alien life. Credit - Cool Worlds YouTube Channel
While that might seem like a minor quibble, part of the point is that the true significance could be well below three sigma, which would call the whole finding into question. That may not turn out to be the case for this particular detection, but sloppy statistical methodology could lead to confusing claims in the future.
So what to do? There are several more rigorous statistical methods for converting between Bayesian and frequentist statistics, but the authors think it is much simpler to report the Bayes factors themselves. The premise that the public is unaccustomed to them doesn’t hold up: gambling has traditionally used the same idea, though it is described as “odds” in that context. If exoplanet scientists start to use that familiar language, maybe their results will be more widely accepted. Or maybe a rival camp of exoplanet hunters will publish a meme-filled journal article about the need for frequentist statistics. Either way, science will continue to progress as more data comes in, and there will continue to be debates about what that data means as long as there are scientists to argue about it.
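Strictly speaking, a Bayes factor is the factor by which the data shift the odds between two hypotheses; if both start out equally plausible (even prior odds), it is simply the betting odds in favour. A minimal Python sketch of that arithmetic, with illustrative function names and an example Bayes factor chosen only for demonstration:

```python
def posterior_odds(bayes_factor: float, prior_odds: float = 1.0) -> float:
    """Odds after the data = Bayes factor x odds before the data."""
    return bayes_factor * prior_odds

def odds_to_probability(odds: float) -> float:
    """Turn odds of N:1 in favour into a probability N / (N + 1)."""
    return odds / (1.0 + odds)

b = 150.0  # example Bayes factor, chosen for illustration
odds = posterior_odds(b)
print(f"{odds:.0f}:1 in favour, probability {odds_to_probability(odds):.3f}")
```

The odds framing also makes the role of prior belief explicit: someone who starts at 1:100 against a detection would end up at only 1.5:1 in favour after the same evidence.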
Learn More:
D Kipping & B Benneke - Exoplaneteers Keep Overestimating Sigma Significances
UT - Is There Life on an Alien Planet? Fresh Findings Revive the Debate
UT - Did You Hear Webb Found Life on an Exoplanet? Not so Fast…
UT - There are Many Ways to Interpret the Atmosphere of K2-18 b