One of my students sent me this article because we spend some time in class covering Type 1 and Type 2 errors.
All the .05 threshold means is that, when the null hypothesis is true, you'll get a false positive 1 time in 20. A .005 threshold would mean a false positive 1 time in 200. So by moving to a .005 threshold, you're less likely to get a false positive. That's good, right? In common parlance, we'd be less likely to send an innocent person to jail.
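To make the 1-in-20 vs. 1-in-200 arithmetic concrete, here's a quick simulation of my own (not from the article): many studies where the null hypothesis is actually true, so every "significant" result is a false positive. It assumes a one-sample t-test on normally distributed data; the sample size of 30 is an arbitrary choice for illustration.

```python
# Simulate many studies where the null hypothesis is actually true,
# and count how often each threshold flags a "significant" result.
# (Toy illustration: one-sample t-test, normal data, n = 30 per study.)
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_studies, n = 100_000, 30

# Every study samples from a population with true mean 0 and tests
# H0: mean = 0, so every rejection is a false positive.
samples = rng.normal(loc=0.0, scale=1.0, size=(n_studies, n))
pvals = stats.ttest_1samp(samples, popmean=0.0, axis=1).pvalue

print(f"false positive rate at .05:  {np.mean(pvals < 0.05):.3f}")
print(f"false positive rate at .005: {np.mean(pvals < 0.005):.4f}")
```

Run it and the two rates come out near .05 and .005, just as the thresholds promise.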
Well, that depends. At the .005 threshold, we're more likely to get a false negative than we would at the .05 level. That means we'd be more likely to let a guilty person go free. (Indeed, the only guaranteed way to send no innocent people to jail would be to send nobody to jail. I, for one, am happy that folks like Charles Manson are behind bars.)
It isn't as easy as saying, oh, we should just switch to .005. When you adjust the p-value threshold, you're making a trade-off between Type 1 and Type 2 error. With a lower threshold you're going to get a lot more false negatives along with fewer false positives. What we always need to be cognizant of when we're doing policy is that significance isn't everything; we also have to think about what the damage is if the information turns out to be incorrect. For example, doctors recommend that pregnant women heat up cold cuts if they're worried about listeria, which is a very low-probability event but horrible if it happens. It's pretty easy to avoid room-temperature cold cuts for 9 months, so unless there are other difficulties attached to diet, women will probably follow this recommendation. (And if one accidentally eats room-temperature cold cuts while pregnant, one shouldn't freak out, because the probability of getting listeria is very low!) But if we're talking about something like chemotherapy or surgery, that's a much more onerous action, and we might want to be more sure we need it before going ahead with it.
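Here's the other side of that trade-off, in the same toy simulation setup as before: now the effect is real, so every non-significant result is a false negative. The effect size of 0.3 and the 30 subjects per study are my assumptions for illustration, not numbers from the article.

```python
# Same toy setup, but now the true mean really is 0.3 (the effect exists),
# so any study that fails to reach significance is a false negative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_studies, n = 100_000, 30

samples = rng.normal(loc=0.3, scale=1.0, size=(n_studies, n))
pvals = stats.ttest_1samp(samples, popmean=0.0, axis=1).pvalue

power_05 = float(np.mean(pvals < 0.05))    # share of real effects detected
power_005 = float(np.mean(pvals < 0.005))
print(f"power at .05:  {power_05:.2f}  (false negative rate {1 - power_05:.2f})")
print(f"power at .005: {power_005:.2f}  (false negative rate {1 - power_005:.2f})")
```

Under these assumptions, tightening the threshold to .005 detects far fewer of the real effects: the false negative rate climbs substantially, which is exactly the cost the .05-to-.005 proposal has to pay.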
Another thing to note is that the article talks about how physics and genetics have already made this switch, while most social sciences haven't. One big difference between the fields that have made the switch and the fields that have not is how easy it is to get large samples. A larger sample size makes your sample behave more and more like the population you're trying to study. We can reduce both Type 1 and Type 2 errors simply by increasing the sample size. So why don't we do that? Well, it turns out that increasing the sample size can be very, very expensive when you're dealing with people and behavior. Sometimes doing the study with a large enough sample to get 80% power at an alpha of .005 might cost more than just throwing that same money at the intervention you're trying to decide about, whether or not it actually works. There's probably some resistance because people in these fields want to be able to publish their 5% results, but that's not the main or only reason we haven't yet made the switch. Research is complicated and expensive, and we have to make trade-offs.
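To put a rough number on that expense, here's a back-of-the-envelope calculation using the standard normal-approximation sample-size formula for a one-sample, two-sided test at 80% power. The effect size of 0.3 is my own assumed value for illustration.

```python
# Normal-approximation sample size for a one-sample two-sided test:
#   n ~ ((z_{1-alpha/2} + z_{power}) / d)^2
# (Back-of-the-envelope only; assumed effect size d = 0.3.)
from scipy import stats

def n_needed(alpha, power=0.80, d=0.3):
    """Approximate n for the given alpha, power, and effect size d."""
    z_alpha = stats.norm.ppf(1 - alpha / 2)
    z_power = stats.norm.ppf(power)
    return ((z_alpha + z_power) / d) ** 2

print(f"n per study at alpha .05:  ~{n_needed(0.05):.0f}")
print(f"n per study at alpha .005: ~{n_needed(0.005):.0f}")
```

Under these toy assumptions, the .005 threshold needs roughly 70% more subjects per study than .05 does for the same power, and when each subject costs real money, that adds up fast.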
The context for these decisions really does matter, and you shouldn't necessarily put off making policy choices just because your sample size is too small to reach significance (or make policy changes just because you have significance). You always have to weigh the costs and the benefits.
(Incidentally, in case he comes across this, Hi Dan! I’m assuming that the reporter greatly simplified your arguments here because I know you must know this stuff.)