Data and bias

This tweet recently made the rounds on Twitter:

Justin Wolfers has since deleted his defense.

But… here’s my 2 cents as someone who isn’t bringing in over half a million per year in salary from the University of Michigan (as state employees, Justin Wolfers and Betsey Stevenson have their salary info available online):

1.  100K is a lot, and if you don’t think it’s a lot, there’s a problem.  To speak in terms that the top 2% can understand, that’s a whole new personal assistant.

2.  The motivation of 100K is not really as big a deal as getting stunning data. The data by themselves are incentive enough not to bite the hand that feeds the researcher (in this case Uber).  And the Uber data are stunning.  They’ve helped us learn a lot about human behavior and contingent labor markets, and probably lots of other stuff that’s more industrial organization.

Does that mean that you can’t trust anything that comes out of the Uber data, or any other study where the company has generously provided data?

No.

But it does mean that you need to think really hard about the studies that do come out of the data (and the studies that don’t come out as well).

Ask yourself:

Does the company (or in some cases, government agency) benefit from the study results?  If not, then it’s probably ok.

There are plenty of amazing studies using the Uber data that tell us about the type of employee who uses the contingent labor market and what their preferences are.  Uber neither benefits from this information nor has any reason to suppress it.  The studies are orthogonal to any influence that Uber might exert (purposefully or not) on grateful researchers.  These results are probably trustworthy; that is, they can be evaluated on the merits of their own internal validity.

If the company stands to benefit from the results, then you should be more cautious.  Not that a good economist would purposefully fudge data or results.  They don’t need to.  With any research project there are a lot of decisions that need to be made about specifications and samples and data cleaning.  Researchers just have to unconsciously feel grateful to the company to bias themselves with these choices, particularly if they don’t have a pre-analysis plan.  (And even if they do have a pre-analysis plan, they might still choose what they unconsciously think will benefit, or at least not hurt, the company.)

On top of that, there’s selection bias in the choice of research question.  Even excellent economists will choose to just not go places that might make the company look bad when said company has provided data.

Similarly, negative results can be suppressed by the data provider.  I know of a case where the US government, after providing a colleague of mine with data, suppressed his research findings because they made the agency look bad (though they did allow someone else to publish the same negative findings later under a new, less fascist, government regime).  Any time that clearance is required to share results, that can be a problem.

To sum:

Data provision alone is enough to bias research results.  If a company provides data, then results that show the company in a positive light are more likely to be shown to the public, and results that show the company in a negative light are more likely not to be.  Results that don’t affect the company one way or the other are probably fine and can be evaluated on their own merits.
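For the quantitatively inclined, here is a toy simulation of that selection effect. It is only a sketch under made-up assumptions, not a model of any real study: the true effect is zero, each study's estimate is the truth plus random noise, and only estimates that flatter the data provider (here, anything above zero) get released. The function name, the cutoff, and all the parameters are hypothetical.

```python
import random
import statistics


def simulate_published_mean(n_studies=10_000, true_effect=0.0, noise_sd=1.0, seed=0):
    """Return (mean of all estimates, mean of the 'published' estimates)."""
    rng = random.Random(seed)
    # Each study's estimate is the true effect plus sampling noise.
    estimates = [rng.gauss(true_effect, noise_sd) for _ in range(n_studies)]
    # Assumed filter: only results favorable to the data provider (> 0) are released.
    published = [e for e in estimates if e > 0]
    return statistics.mean(estimates), statistics.mean(published)


all_mean, published_mean = simulate_published_mean()
print(f"all studies: {all_mean:.2f}, published only: {published_mean:.2f}")
```

Nobody in this simulation fudges anything. Every individual estimate is honest, yet the average of what the public gets to see sits well above the true effect of zero, purely because of which results were released.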

There’s a lot to be said for data that come from legal requirements (e.g., FOIA), are available from third parties, or come from internet leaks.

It is important to know who provided the data, not just who provided the funding, when doing disclosures.

8 Responses to “Data and bias”

  1. teresa Says:

    I know exactly nothing about this story but the tweets feel so much like “No way, we’re not influenced by the pharmaceutical wining&dining/funding/etc at all, we just really love this med/device/whatever.” In my field at least there’s been a growing push to make all data available on request *and* to publish negative results *and* to be extremely explicit about what aspects of data generation, study design, analysis, and paper writing/editing a company was involved with, all of which is great and also far from perfect for all the same reasons you described.

  2. CG Says:

    This makes me think, not for the first time, that I entered the wrong branch of social science! No one has ever offered to pay me any dollars to do a study about them!

  3. undine Says:

    $100,000? (laughs in humanities, where I know I will never see a six-figure salary).

  4. Revanche Says:

    I’ll never not be amazed when researchers act appalled at the idea that they might have a bias that should be declared, as if their pinky promise that they have actual data, or that it’s good clean not fiddled with in any way data, should be good enough and that goes ten times for data collected by a *company* that has anything to do with the data in any substantial way.

    I don’t even think raw data available upon request is going far enough, since even researchers have a version of the dog ate my homework (hi Retraction Watch). I think it should be required at the time of publication as a baseline standard. Doesn’t change anything about the data itself, one hopes, but maybe it’ll be a minor deterrent for those acting not in good faith.

