Tuesday, June 14, 2011

Getting smart about Wise Crowds, or Some stuff even really smart people don't know about science

I found a fascinating online discussion among scientists, science writers, and journalists, and I thought I would share some thoughts here. It relates to a few common themes of mine: that scientific thinking is unnatural, that statistical thinking is unnatural, that content knowledge plays a role in critical thinking, and that we should have some role for expertise in interpreting scientific results.

The discussion began with a column by neuroscience writer Jonah Lehrer, about a phenomenon called "the wisdom of crowds." Basically, the idea is that when estimating something very uncertain, sometimes the average crowd response can be "wiser," or more accurate, than even most expert responses. The classic example is from Francis Galton, who observed that the average crowd response for estimating the weight of a steer was better than the guesses of most of the butchers. Lehrer's column was about a paper which showed that this wisdom of crowds phenomenon can be reduced (i.e. crowds get less wise) by letting the members of the crowd interact with each other.

Ok, so we're fine so far. The brouhaha begins when Peter Freed, M.D., an actual neuroscientist, writes a ranty blog post entitled "Jonah Lehrer is Not a Neuroscientist." He takes Lehrer to task for using "for example" when cherry-picking a certain description of data from the paper. In other words, Lehrer acted as if the number he was citing was representative, when it was not. Freed uses this as a way of highlighting the difference between scientists, who look at data all day, and science writers, who do not. But Freed himself makes an error (which he confesses, and makes part of his blog post) in confusing the median and the mean as measures of central tendency in this case.

Ok, now, maybe I have lost some of you, and I am going to back up just a second, because this is where I think it gets interesting. It depends on a (relatively) nuanced understanding of the three words I used above: median, mean and central tendency.

The vast majority of science that I am aware of takes a set of observations, and looks to describe those observations using some sort of quantitative measure. We don't want to compare apples and oranges, but once we are measuring all apples, we have a set of numbers, and we summarize those numbers to describe the group as a whole. Most of us are familiar with the word "average," and we don't give it any thought. It seems as if an average (adding up all of the numbers and dividing by the number of observations) is a natural, real, description of the set of apples we have in front of us.

But there are different ways to describe a group of numbers. We can report the most frequent response (called the mode, as in, "9 out of 10 dentists ranked it first") or the middle response (called the median: when you take the SAT three times and get a 2000, a 2100, and a 1000, you want to count the median response, 2000, not the mean, which would be 1700). The most common is the mean, which most of us consider synonymous with average, as in, "When Bill Gates is in the room, everyone in that room is, on average, a millionaire." This is not even getting into the difference between the geometric mean and the arithmetic mean.
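If it helps to see the arithmetic, here is a minimal Python sketch of those three summaries using the standard library's statistics module. The SAT scores are the ones from the paragraph above; the dentist rankings and the dollar figures for the room are made up purely for illustration.

```python
from statistics import mean, median, mode

# The three SAT scores from the example above.
sat_scores = [2000, 2100, 1000]
sat_mean = mean(sat_scores)      # (2000 + 2100 + 1000) / 3 = 1700
sat_median = median(sat_scores)  # middle value once sorted = 2000

# Hypothetical survey: nine dentists rank brand "A" first, one picks "B".
dentist_picks = ["A"] * 9 + ["B"]
top_pick = mode(dentist_picks)   # most frequent response = "A"

# Hypothetical room: nine people with $50,000 each, plus one person
# with $50 billion (an invented figure, not anyone's real net worth).
room = [50_000] * 9 + [50_000_000_000]
room_mean = mean(room)           # dragged far upward by one extreme value
room_median = median(room)       # still describes the typical person

print(f"SAT mean={sat_mean}, median={sat_median}")
print(f"Most common dentist pick: {top_pick}")
print(f"Room mean=${room_mean:,.0f}, median=${room_median:,.0f}")
```

All three summaries are computed from the same numbers; they simply answer different questions about them.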

So first, a key point here is that none of these descriptions is a more "true" or accurate description of a set of data; each is simply one of many possible descriptions. This is a mistake made by Nicholas Carr in his blog post discussing Lehrer's column, as he argues that
As soon as you start massaging the answers of a crowd in a way that gives more weight to some answers and less weight to other answers, you're no longer dealing with a true crowd, a real writhing mass of humanity. You're dealing with a statistical fiction. You're dealing, in other words, not with the wisdom of crowds, but with the wisdom of statisticians. There's absolutely nothing wrong with that - from a purely statistical perspective, it's the right thing to do - but you shouldn't then pretend that you're documenting a real-world phenomenon.

Carr draws the line between the real world itself (which he identifies with the arithmetic mean) and a statistical fiction. None other than Kevin Kelly (among others) takes issue with Carr on this point in the comments, which I think are really worth a read.

Given that we accept that all descriptions of a set of data are to some degree interpretation, not simply observation, where do we go from here? In a follow-up post, Freed describes the lambasting he received from some of his critics, and relates a funny (fictional) story about his third-grade class as it relates to the choice between median and mean. The critical moment is when the principal writes to the statistical research firm: "You may know a lot about statistics, but you don’t know anything about third graders. You get an F, for Fired"

Which leads me to my main point. Statistics is not a purely "scientific" position or a purely aesthetic decision, as Freed claims in a comment on another blog post, by the physicist Chad Orzel. The statistics that scientists use reflect a basic understanding of distributions of data (what it means when there are a lot of extreme responses, leading to skew, or other non-normal conditions). But they also depend critically on some knowledge of the phenomenon itself. When cognitive psychologists analyze reaction time data, they not only look at the distribution of responses (which will just about always be right skewed) but also consider the task that people are reacting to. What does it mean when most people take 1 second to respond, but a few take 2 minutes? Is that "real" data, or did they fall asleep, or answer their cell phone?
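To make the reaction time point concrete, here is a rough Python sketch with invented numbers. The trial times and the 10-second cutoff are assumptions for illustration only, not values from any study. The code can show what each choice does to the summary, but it cannot say whether the slow trials should count; that takes knowledge of the task.

```python
from statistics import mean, median

# Hypothetical reaction times in seconds: most trials take about
# 1 second, and two trials take about 2 minutes.
reaction_times = [0.8, 0.9, 1.0, 1.0, 1.1, 1.2, 1.3, 120.0, 125.0]

# Hypothetical cutoff. Picking this number is a judgment about the task
# (did the person fall asleep?), not something the data can decide.
CUTOFF_SECONDS = 10.0


def summarize(times):
    """Return the count, mean, and median of a list of times."""
    return {"n": len(times), "mean": mean(times), "median": median(times)}


all_trials = summarize(reaction_times)
trimmed = summarize([t for t in reaction_times if t <= CUTOFF_SECONDS])

print("all trials:", all_trials)
print("trimmed:   ", trimmed)
```

With all trials included, the mean lands near 28 seconds, a value no single trial came close to, while the median barely moves whether or not the slow trials are dropped.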

As Orzel points out, there are valid criticisms that one can make about the use of statistics in pop science, but at some point, you have to engage with the actual science of the article. In this case, there is a rich literature on decision making under uncertainty, and on the phenomenon of the wisdom of crowds. To criticize the pop science writing, you need to know something about that science, which Freed seems not to. Not only that, Freed celebrates it:
Here’s my deep point. I don’t care about straight psychology – straight psychology is, not to pull punches, over.  I care about neuroscience. And Lehrer was not trying to be a neuroscientist in this article.  This was a straight-up psychology article.  But modern neuroscience, his chosen wheelhouse – particularly the subfields of behavioral and affective and cognitive and social neuroscience – is radically more complex than straight psychology.

Where are the lessons in all of this?
Lesson #1: Even really smart people make mistakes about science. Even scientists make mistakes about science, especially when not directly in their area of expertise. We should be wary of treading into another field of science, proclaiming it "over" and declaring that our domain is "radically more complex." The scientists working in that field are smart people who have found it to be plenty complex. 
Lesson #2: When smart authors and their critics engage in a well-moderated comment section, it can be an amazing way to learn. The commenters on the posts above are generally articulate and educated about the topics. Lehrer responds to Freed in his comment section. The comment thread on Nicholas Carr's post in particular features honest and thoughtful exploration and argument. To me, this supports the recent theory that people are more reasonable when they argue because human reason exists not to discover truth, but to persuade other people.
Lesson #3: You can learn a lot by reading a few good posts and comment sections, but there is still great value in subject matter expertise. In this case, as Orzel entreats us, actually read the article by the scientists who did the study in the first place. But we should also recognize that many of the sentences in that article are the result of thousands upon thousands of hours of work by many experts in this field.

Oh, and also, Jonah Lehrer pretty much had it right in the first place. Which he does most of the time, in the limited amount of space he has for a newspaper column. I remain a fan.

A little postscript: Someone pointed out that I was too hard on Freed, who was gracious in engaging his commenters, and offered a good model for how a scientist reads a paper. He also pointed out that Lehrer brought up the wisdom of crowds paper as a way to join the whole "internet is making us stupid" crowd, a claim I agree is not true. I am not an Eagleman/Kelly/Shirky, "the internet is going to save civilization" optimist either. But I find many of the traditional journalists decrying the internet (Twitter, blogs, etc.) mostly unconvincing. He also pointed out that this was old news (like two weeks ago!) and that there was a lot to it that I missed. So, now I am on Twitter. We'll see how that goes.
