Some Bad Math
Mark Chu-Carroll runs a blog called Good Math, Bad Math, where he generally exposes instances of bad math. However, when it comes to the use of statistics, Mark gives an example of some bad math himself in two posts. It all started with a post about the Duke rape case, in which a commenter on another blog was using some dubious crime data to argue that there was a good chance the accuser was lying. Mark persuasively argued why there are problems with that data (see the first post linked above). The problem is that Mark then went on to make some inaccurate statements about how we can use statistics. Namely,
Looks pretty reasonable to most people, which is why I'm writing this post. The above is simply false, at least if we take a Bayesian view of statistics. Consider the following information on breast cancer and mammograms, and then a question.
Percentage of women over 40 who have breast cancer: 1%
80% of women age forty with breast cancer who get a mammogram will get a positive result.
9.6% of women age forty without breast cancer who get a mammogram will also get a positive result.
Your wife, who just so happens to be forty, has gotten a positive mammogram test for breast cancer; what is the probability that she has breast cancer?
The answer is 7.8%. The result follows from an application of Bayes' theorem, which Bayesian statisticians view as an important way of evaluating data and making inferences. Mark's position is that we can't apply the population number (1%) to an individual and come up with the new answer of 7.8%.
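The arithmetic is easy to check. Here is a short sketch that applies Bayes' theorem directly, using only the three numbers given above:

```python
# Bayes' theorem applied to the mammogram example.
prior = 0.01        # P(cancer): 1% of women over 40 have breast cancer
sensitivity = 0.80  # P(positive test | cancer)
false_pos = 0.096   # P(positive test | no cancer)

# P(cancer | positive) = P(positive | cancer) P(cancer) / P(positive)
evidence = sensitivity * prior + false_pos * (1 - prior)
posterior = sensitivity * prior / evidence

print(f"P(cancer | positive test) = {posterior:.1%}")  # prints 7.8%
```

Note how the low base rate (1%) keeps the posterior low even though the test is fairly sensitive; most positive results come from the much larger cancer-free group.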
In comments, Mark has argued that we can only make probability statements like the above if we have a measure of uncertainty. Frankly, I'm not even sure what that is. The probabilities above are statements of uncertainty, and we don't need some extraneous measure hanging around gumming up the works.
Here is another way of looking at this. Suppose Mark and I are out for a walk in New York City and come across a couple of games of chance. Suppose further that we know such games are almost always unfair. We have plenty of experience with this, enough to set an initial probability that the game is unfair at 80%. Then we observe the following sequence of outcomes of the game (W = win, L = loss) for the supposed "Mark":
Based on this and our population statistics (with the additional assumption that if the game is rigged the "Mark" will only win 25% of the time), we can conclude with 97.9% probability that the game is actually fair. Even if that initial probability is wildly inaccurate, it isn't nearly as devastating for the Bayesian as one might think: enough data will swamp the initial probability assessment. For example, take the same situation, but this time with the initial probability that only 10% of such games are rigged and a sequence of events,
Our final probability assessment would be 97.2% that the game is actually rigged. So even though I might have a wildly optimistic (and wrong) initial probability assessment that such games are fair, enough (accurate) data can overcome this initial "bias" and lead us to a much more accurate estimate.
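The updating in both scenarios works the same way: after each win or loss, multiply the prior by the likelihood of the outcome under each hypothesis and renormalize. Since the actual win/loss sequences aren't reproduced here, the sequence in the sketch below is hypothetical; the 80% prior and the 25% win rate for a rigged game come from the example above, and a fair game is assumed to pay off 50% of the time.

```python
# Sequential Bayesian updating for the rigged-game example.
P_WIN_RIGGED = 0.25  # from the example: a "Mark" wins 25% of the time if rigged
P_WIN_FAIR = 0.50    # assumption: a fair game is a coin flip

def p_rigged_after(prior_rigged, outcomes):
    """Update P(rigged) after each observed outcome ('W' or 'L')."""
    p = prior_rigged
    for o in outcomes:
        like_rigged = P_WIN_RIGGED if o == "W" else 1 - P_WIN_RIGGED
        like_fair = P_WIN_FAIR if o == "W" else 1 - P_WIN_FAIR
        p = like_rigged * p / (like_rigged * p + like_fair * (1 - p))
    return p

# Start 80% sure the game is rigged; a hypothetical run of eight
# straight wins pushes the posterior strongly toward "fair".
posterior = p_rigged_after(0.80, "WWWWWWWW")
print(f"P(rigged) = {posterior:.3f}, P(fair) = {1 - posterior:.3f}")
# prints P(rigged) = 0.015, P(fair) = 0.985
```

Each win multiplies the odds of "rigged" by 0.25/0.50 = 0.5, which is why even a strong initial prior gets swamped quickly by consistent data.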
One commenter put it in a much more amusing manner. Suppose we want to know the probability that a randomly selected individual has a penis. A probability of 50% would be a good starting point. If we then find out that the randomly selected person is a male, we'd revise our initial probability assessment to 100% (of course, using Bayes' theorem for this is like driving a thumbtack with a sledgehammer).
So the bottom line is yes, we can use statistics about a population to make probability statements about individuals of that population. The more data we get about the individual we are interested in, the better our statements can be. And since probability statements about individuals are a form of reasoning, Mark Chu-Carroll is simply wrong on this point, at least from the Bayesian viewpoint. Since his blog is about good math and bad math, he should ideally point this out.