Just because some research says X is good and some says X is bad doesn’t mean we don’t know whether X is good or bad.

Research quality is also important.

Correlation is easy to measure. When X and Y are related, there are many methods we can use to figure out how much they’re related, how much they covary. Causation is not so easy. Is it X causing Y, Y causing X, or some third factor Z that causes both?

The gold standard of getting at causation is the randomized controlled experiment.  When done well, randomized controlled experiments are internally valid.  In the setting tested, we can say that X causes Y if when X is varied, Y varies as well.

Randomized controlled experiments may not be externally valid.  The subject pool, often undergraduate psychology majors, may not act the same as everyone else.  The general equilibrium effects may be different if adding money for one intervention takes away money from another intervention, rather than leaving everything else the same.  Additionally, an intervention may work great on a small set of people but may flounder with a much larger set (ex. training out-of-work people to be welders: great when it’s a small number of people, not so good when every unemployed person can now weld).

We can’t always do a randomized controlled experiment.  Sometimes the interventions would be illegal, unethical, inappropriate for a lab, or just too expensive.  Social scientists have a number of ways to get at causality when that’s the case.  Notably, economists use “natural experiments”: exogenous shocks to the treatment that, with some fancy math, can be used to isolate the causal mechanism from what is correlational but not causal.  Popular methods include “difference-in-differences,” a way to subtract out bias by comparing two (or more) imperfect treatment and control groups (changing state laws over time are popular), and “instrumental variables,” in which you use a Z variable that is related to your X variable but is only related to your Y variable through X, so you know that the Z part of X is causally affecting Y.  There are other techniques, such as regression discontinuity design or propensity score matching, each with its own strengths and drawbacks.
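The difference-in-differences logic is simple enough to sketch in a few lines. The numbers below are entirely hypothetical; the point is only the subtraction:

```python
# Toy difference-in-differences with made-up outcome means.
# The "treated" state changes its law between the two periods;
# the "control" state does not. Subtracting the control state's
# change removes the shared time trend, under the key assumption
# that both states would otherwise have trended in parallel.
treated_pre, treated_post = 10.0, 15.0
control_pre, control_post = 9.0, 11.0

did_estimate = (treated_post - treated_pre) - (control_post - control_pre)
print(did_estimate)  # 3.0: the change in the treated state net of the trend
```

The treated state improved by 5, but 2 of that was a trend both states shared, so the estimate attributes only 3 to the law change.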

It doesn’t matter if 20 published education papers find that X and Y are related and then make the claim that X causes Y.  That doesn’t mean that X causes Y.  Standards of publication for causal claims are different in different fields.  But if the same claim is published in a high quality psychology journal, then you can be pretty sure that they did a randomized controlled experiment to figure out causation, and they probably got it right, at least from an internal validity standpoint.

If the same claim is published in a high quality economics journal, then they may not have done a randomized controlled experiment, but they probably did the best that can be done with a high quality quasi-experiment or natural experiment.  (Ignoring the subset of theory papers that can prove anything and are still published in high quality journals…)  These economics findings may be more likely to be externally valid than the psychology findings, but it will depend on what kind of natural experiment the authors exploited.  If they only studied teen moms, then the findings may not be relevant to single men over the age of 50.

So just because research is mixed on a topic doesn’t mean we don’t actually know the answer.  If some of the research is crap, and some of it is good, then you can ignore the crap part and just focus on what is good.  How can you tell what is good?  Well, that’s a bit harder, but keeping in mind that correlation is not causation and looking hard for what the authors are actually measuring is a good first step.

Do you get frustrated when reporters report on research without having any idea about the quality of the research?  How do you winnow out the wheat from the chaff?

34 Responses to “Good vs. bad research”

1. Liz Says:

In the fields that I am most familiar with, a publication will only ever become a news report if it is published in a very high impact journal in the field, as those are the publications that the institution’s media office will typically promote, and that is their approach to separating wheat from chaff. With that level of journal, I would hope that it is very clear in the original publication that the authors are not concluding that X causes Y if there hasn’t been the most rigorous assessment of causation. If this is not the case, I think there is a failing of the peer review process to let overgeneralized conclusions be made. If the lack of certain causation is clear in the scientific publication but becomes ignored in the news report, then that is a failing of scientists communicating with journalists and, in my experience, the failing could have been with either side, depending on the individuals involved.

• nicoleandmaggie Says:

Sometimes the journalists don’t communicate with the scientists. I’ve found myself quoted in print by people I’ve never talked to. It’s hard to communicate when you have no idea that you’re having a conversation!

I have my students bring in examples of news articles assuming that correlation means causation each year. Probably 70-80% of these articles are either about medicine/health or education.

2. Cloud Says:

I get very frustrated by reporting on studies, particularly ones on health/environmental risks. It drives me nuts that people suspect all sorts of terrible motives out of just about any company, but will believe a study from a “public policy group” that they’ve never heard of without question. I have ranted on this repeatedly on my blog.

A related, but less rant-inducing problem, is the fact that people don’t understand that our knowledge is almost always incomplete. This is particularly annoying in the parenting arena, where people will point to a study and say “SEE. This is how we’re supposed to do it!” without regard to the limits of that study. In the past, there were some truly awful child-rearing practices that were based on the best science of the day. It just happens that the science was incomplete, and later studies made it clear that other practices would be better. Do we really think we’re that much better at the science now? I don’t. That’s why I’ll take the research as one component of my decision about what to do, but will also think it through for myself, based on what it seems my child needs.

• nicoleandmaggie Says:

Of course a lot of parenting “science” is not based on anything scientific at all, just random unproven theories. Book writers just talk as if they’re experts when in reality many of them are just crackpots with an agenda to sell books by harnessing parental guilt.

Did I say parental guilt? Silly me. Maternal guilt.

3. bogart Says:

First let me say that I think @Cloud’s “I have ranted on this repeatedly on my blog” should probably become a t-shirt. Or an index. Maybe both.

As for me, where this drives me nuttiest (by far) is clinicians, e.g. the conversation I had with my GP (who for the record I like and trust, overall) pre- and post- screening mammogram (as a woman in her 40s with no identified risk factors, I didn’t want one and she thought I should have one. As evidenced by the “post-,” I ended up yielding). Her point was that she has seen women like me have breast cancer be diagnosed and treated that way, and of course mine was that diagnosis and treatment are not necessarily good things — that they may damage quality of life while failing to prolong (or even shortening) it.

(Then there were the women on the online mom’s board telling me I was nuts to consider not getting a mammogram because it is important to use available preventive measures. Mammograms. Prevent. Nothing.)

I don’t expect much of the press (other than that they give me enough information to find the original study). Said expectations (with the possible exception of the parenthetical one) are frequently met.

• chacha1 Says:

I am noncompliant re: mammograms too. :-) I have no family history, no risk factors, and no reason to believe that THAT is what’s gonna kill me. Once every 5 years is plenty.

And I also don’t expect much from the “press”. A lot of people working as journalists today, *especially* on the web news services, seem to be functionally illiterate. Their work is poorly (if at all) edited, and given that, I have no reason to believe there is any fact-checking at all going on.

Dan at Casual Kitchen wrote on this very topic recently. :-)

• bogart Says:

Yeah, I guess I think of myself as semi-compliant. I allowed myself to be talked into it this time partly to acquire a baseline, and I imagine I might start going every 3-5 years or so, for now (and obviously barring new and alarming information).

I’ll admit that by virtue of being a mom to a small child, I am sympathetic to the thought that if, heaven forbid, I did develop breast cancer, even if mammograms wouldn’t actually have made any difference to the final outcome, I might still prefer to know I’d “done everything I could.” And as I was being annoyed about the uncertainty (should I or shouldn’t I), I realized that if we’re to the point where the last woman in line as a candidate for a mammogram doesn’t know whether she should or shouldn’t, then we’re actually at the right point (there will never be a clear cut point). Still …

• nicoleandmaggie Says:

That would make an awesome t-shirt.

Jon Gruber gave a really good explanation about Type 1 and Type 2 error and how preventative diagnostics don’t save money in the Q&A for a talk he did at College of the Holy Cross. It’s available on CSpan on the internet.

• nicoleandmaggie Says:

p.s. I have risk factors so will be doing mammograms on the recommended schedule.

• I am still pondering a post on the different types of errors… The police responded to my home alarm going off the other morning. We hadn’t tripped it so I had them come look around. They told me they get thousands of false alarms per year. I imagine that responding to these alarms drains resources from preventing more probable crimes. But people feel safe with alarms…even though in aggregate with distracted police we might be less safe. Hmmm….

• nicoleandmaggie Says:

Go for it!

The mammogram is a pretty common example in statistics classes for the cost of false positives. I think someone has run the actual numbers on it for different ages (which is why they changed the guidelines) and in addition there’s the anxiety and fear from a false positive. Of course, false negatives aren’t much fun either.
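The false-positive arithmetic can be sketched with Bayes’ rule. The three rates below are assumed round numbers for illustration, not actual mammography statistics:

```python
# Illustrative base-rate arithmetic for a screening test.
# All three rates are made up for the sketch.
prevalence = 0.005          # P(disease) in the screened population
sensitivity = 0.90          # P(positive | disease)
false_positive_rate = 0.07  # P(positive | no disease)

# Total probability of a positive result, then Bayes' rule.
p_positive = (sensitivity * prevalence
              + false_positive_rate * (1 - prevalence))
p_disease_given_positive = sensitivity * prevalence / p_positive
print(round(p_disease_given_positive, 3))  # about 6% under these made-up rates
```

With a rare condition, even a fairly accurate test means most positives are false, which is the crux of the cost (financial and emotional) of screening everyone.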

4. First Gen American Says:

Word.

I can’t even elaborate because it’ll lead to a multi-page rant about how many stupid consumers will blindly follow some latest “health or safety trend” like sheep without knowing the facts behind the research.

5. Debbie M Says:

Yes, I can’t even stand to read pop social science or nutrition articles anymore. Even when they do explain what they did, it usually makes me angry. (For example, since when is “hot chocolate” made of nothing but cocoa and water? I do not care about any research involving the health effects of that!)

After getting degrees in social science and then becoming a typist for zoology professors I even occasionally wanted to shake them. “The results were not statistically significant, but …” I told them I did not want to type the part starting with “but.” The difference is not statistically significant–therefore you have no evidence that there is a real difference, so don’t conclude that your difference is just small. Maybe it is, maybe it isn’t. Grr!

• nicoleandmaggie Says:

I have to admit that I am guilty of “suggestive results”… because sometimes your sample size is just too small to really say anything conclusive but at least the magnitude is the right direction and merits further research. (Never with main results, only with side results, usually when looking at sub-populations.) Statistical significance isn’t everything!

• bogart Says:

… yeah, personally I’m good with that as long as the “… but … ” is along the lines of ” … but further research is needed …” or, sometimes, “… but given the difficulty [or impossibility] of conducting further research in this area, we recommend that …” or whatever.

• Debbie M Says:

I can be cool with suggestive conclusions for side results or preliminary studies. But if your sample size is too small for your main results, then a) you shouldn’t have wasted your resources on the study to begin with or b) you should admit that the difference is smaller than you originally thought you would care about.

You will (virtually) always see differences between groups; the whole point of statistics is to have a better guess at whether these differences are real or flukes. So when stats say “probably a fluke” … (I also have a problem with people changing their minds about which type of error they care about AFTER they have seen the results.)
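The point that you will virtually always see differences can be sketched with a quick simulation (pure noise, made-up parameters):

```python
import random
import statistics

# Draw two groups from the SAME distribution and compare their means.
# Any observed gap is a fluke by construction; statistics exists to
# judge whether a gap of a given size is plausible under noise alone.
random.seed(42)
diffs = []
for _ in range(1000):
    a = [random.gauss(0, 1) for _ in range(20)]
    b = [random.gauss(0, 1) for _ in range(20)]
    diffs.append(abs(statistics.mean(a) - statistics.mean(b)))

exactly_zero = sum(d == 0 for d in diffs)
print(exactly_zero)          # essentially never exactly zero
print(round(max(diffs), 2))  # some gaps look sizable purely by chance
```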

• nicoleandmaggie Says:

With the kind of research we do, we don’t get to choose the sample size, especially of subgroups. We mostly have secondary data. And sometimes the sample size is too small to say anything conclusive… but it’s large enough to say something suggestive. It’s not like we can do a power test and pick that number of subjects.

• Debbie M Says:

PS, I’m not a real scientist, just a bureaucrat, so these beliefs do not affect my livelihood or anything at all. (Except that everything came out statistically insignificant in my master’s thesis except two odd factors that didn’t mean much.)

6. chacha1 Says:

This may be one reason I prefer reading history to other forms of nonfiction. Historians may speculate about when, how, and particularly *why* a certain event happened, but there is generally no getting around the fact that something DID happen. :-)

Generally speaking, in my field of interest (health, fitness & nutrition) I think anything reported in a publication that is on the corner newsstand is suspect. I think anything reported in a non-peer-reviewed newsletter supported by sales of the author’s own miracle product is even more suspect. I am a cynic and a skeptic, and I think if a prescription drug is being advertised on television, that is a drug I never want to take. … There is some good solid research out there but very little of it is presented for the lowest-common-denominator reader.

Men’s Health magazine does the best of any mass-market publication I’ve looked at (much better than their sister publication Women’s Health, which is insultingly fluffy by comparison), but people who want to get digests of the original studies really have to read the professional journals, and let’s face it … they are no fun to read.

• nicoleandmaggie Says:

A lot of the health stuff published in peer-reviewed medical journals is garbage too. You would not believe some of the stuff we’ve seen. One of my friends begged her student’s MD dad not to thank her on a truly dreadful paper he got published in a medical journal because she didn’t want her name associated with his ridiculously terrible methodology. I can think of other examples from medical journals related to my own research where you’re just like… really? When they ask this same question in bio or social science they actually do more than just speculation.

• chacha1 Says:

Maybe every MD doing research is just trying to get tenure (or a raise) by publishing *something,* no matter how crappy it is? It really is pretty bad out there.

I kind of just assume that every health study is funded by a pharmaceutical company or supplement manufacturer. Since my goal in life is to avoid taking drugs of any kind, and since the science behind *that* is largely anecdotal, and since each person’s health is 100% unique, I rely on mindful attention to my own body and don’t go looking for trouble. (In fact that may be part of the problem with health reporting. People are always looking for answers but they are always looking OUTSIDE.)

Even in the realm of anecdote, it seems to be very rare that a serious illness is entirely asymptomatic. Therefore, as long as I continue to BE entirely asymptomatic, I will blunder along on my whole-food-eating, wine-drinking, yoga-posing way. :-)

• nicoleandmaggie Says:

Many MDs don’t actually have to take statistics classes in med school, and the quality of the classes they take prior to med school for the stats requirement can vary quite a bit. (Think high school level stats at a community college.) So even if they’re not being funded by industry, there are some pretty terrible studies out there.

7. Do you get frustrated when reporters report on research without having any idea about the quality of the research?

No, I don’t. This is because my expectations of the debased journalistic profession are zero, and I would sooner drive nails through my motherf*cken dicke than read anything some shittehead science journalist has to say about actual scientific research.

Honestly, I’ve simply given up. I just can’t care anymore about the wholesale rejection of reality that all of Western society is currently stricken by. I am keeping my head down and directing my own research program for as long as it is still possible.

8. The abuse of statistics and study results is sad but I do think some reporters try to get it right. People are also up against the problem that certain narratives make coherent sense, and so people prefer them even if they’re not, technically, true (that heuristic…). So we like the narrative of someone getting a mammogram, catching their breast cancer early and hence getting treated and then going into remission. When the guidelines came down that in the aggregate perhaps lives weren’t being saved by the recommended schedule, you could see people sputtering. And plenty of physicians practice medicine by anecdote.

• nicoleandmaggie Says:

Yup, one of the main findings of the Dartmouth Atlas project is that many physicians practice medicine by anecdote (specifically what everyone else was doing during their residency)… and that accounts for a chunk of increased medical costs in the US (some estimates up to 1/3 of medical spending could be cut if all physicians used “best practice” rather than regional differences).

• gavinpandion Says:

I was starting to suspect something like that was going on. It’s really startling how big the disconnect between research and practice is even in a field where you would naturally expect best practices to be a moving target.

9. gavinpandion Says:

I appreciate the need to vent about people being too quick to assume that all research findings are equally valid, but I did want to chime in and add that I have seen more than one horrifyingly dumb argument play out in the letters section of Lancet, and I suspect that this is one leading journal that is taken very, very seriously yet is not above entertaining a false controversy, once interest groups get entrenched on either side of a non-issue and become noticeably adversarial towards each other, beyond simply having conflicting results. I don’t browse the ranking journals enough to have more of an opinion about it than that, but I have misgivings about the argument that the more highly esteemed the journal is, the less you have to worry about evaluating the quality of the research being presented in its pages. I think ultimately people are going to have to get better at evaluating the appropriateness of the methods and integrity of the arguments themselves, because there will always be pandering, corruption, and undue influence for various reasons compromising the value of authority within academia.

10. Revanche Says:

A little off topic but an interesting thing I learned about the psychology journals: In recent years, there’s a movement to note whether a highly cited article making a particular claim has been replicated within a certain number of years, perhaps particularly by the original authors – I wasn’t clear on this point. There’s a site where those in the field can nominate and perhaps vote (the name slips my mind) on an annual basis for the most egregious offenders, and the value of those articles is penalized for lack of rigor. I think there may be safeguards, or there should be, against salami slicing, but it should be a step in the right direction to make sure that if a controversial or newish branch of research has been explored and a claim is posited as supported, then it must be replicated within a reasonable timeframe thereafter or the single claim by a single group cannot continually be held up as the single point of reference for that argument.

I liked that – and think it should be more widespread.

And to speak to gavinpandion’s point: there is a fair amount of politics played out in academia, which is a small enough world in many specialties, and there are flaws in the assumption that a “high ranking” journal is the measure by which one should judge the value of the material published within. To simplify it greatly, the ranking of the journal itself (when you’re talking about the Impact Factor) is based on the number of citations, at its core, against the number of papers published (the actual algorithm is more complicated), and if the journal requires a high ranking to be elite, that immediately introduces an inherent need to eliminate papers that won’t garner high interest and citations.

It happens to be the biggest acknowledged game in town, but people are quite interested in other forms of metrics to challenge the Impact Factor because it can be manipulated, and journals do so both legitimately and illegitimately.