Bayes’ law, Nate Silver and voodoo economics

Written by Michael Roberts

Thursday, 22 November 2012 02:54

Nate Silver is the new hero of the liberal left in the US. This mathematician and statistician correctly forecast Obama’s victory in the presidential election, the result in the Senate and the outcome for the electoral college in all 50 states. On the morning of 6 November 2012, the final update of Silver’s model gave President Barack Obama a 90.9% chance of winning a majority of the 538 electoral votes. In both summary tables and an electoral map, Silver forecast the winner of each state, and his model got every one of the 50 states right. Individual pollsters were less successful. For example, Rasmussen Reports, widely quoted by the right, “missed on six of its nine swing-state polls”.

Silver has now published a new book that is already a best seller, and he regularly appears on TV talk shows. Silver brilliantly exposed the biased commentaries of right-wing TV channels and papers, whose pundits regularly appeared on screen or in print to say that they ‘had a hunch’ that Romney would win or that the polls were ‘biased’ against the Republican candidates. Meanwhile, Silver quietly presented a statistical analysis of the polls and concluded that the probability of Obama winning was over 80% and rising. His forecast was dead right. On November 12th, his new book, The Signal and the Noise (print edition), was named Amazon’s Best Book of the Year for 2012.

The evidence is that statistical analysis is far better at forecasting than ‘hunches’ or human intuition. Indeed, out of one hundred studies comparing the accuracy of actuarial statistics (probability analysis) and intuition, there has not been one case of humans doing better (Stuart Sutherland, Irrationality, p200); in most studies, actuarial analysis was way better. Take bank loans. Nowadays 90% of loan applications are reviewed by computers, which weigh a client’s details against aggregate evidence on bank accounts, jobs etc to gauge risk. Loans granted by computer using statistical probabilities turn out to have far fewer defaults than loans to borrowers chosen by bankers on their own judgement. Insurance companies have applied statistical methods to risk in life expectancy and accidents for many years. So when somebody tells you that their intuition delivers better results, they are talking out of their hats. Why would you not choose statistical methods to raise your chances of getting things right, even if nothing is 100% certain?

Take the stock market. Expensive investment advisers continually tell us in their adverts that they can make your money work harder than simply tracking a stock index like the S&P-500. In other words, they claim they can beat the market. But a host of statistical studies show the opposite. Sure, some advisers can do better than the index for a few years, but eventually they all come a cropper. It’s just so much snake-oil, voodoo investing.

But not everything is random. If you were to read Nassim Nicholas Taleb’s book, The Black Swan (see my book, The Great Recession, chapter 31), you would think that it was. Or, to be more exact, you would think that even the most unlikely event can happen under the law of chance. It was assumed that there were only white swans until Europeans got to Australia and found black ones. These were the ‘unknown unknowns’, to quote Bush’s neo-con Secretary of Defense, Donald Rumsfeld. The most unlikely can happen, but you cannot know everything. For Taleb, the Great Recession was one such event: it could not have been predicted, and therefore bankers, politicians and, above all, economists are not at fault. This was the excuse used by bankers when giving evidence to the US Congress and to the UK parliament.

But modern statistical methods do have predictive power: not all is random. In his book, Silver offers detailed case studies from baseball, elections, climate change, the financial crash, poker and weather forecasting. Using as much data as possible, statistical techniques can provide degrees of probability, like “the probability of Obama winning the electoral college is 83% and the probability of him winning the popular vote is 50.1%”. This differs from much of the statistical method taught in colleges and universities today, which relies on idealized modelling assumptions that rarely hold true. Often such models reduce complex questions to overly simple “hypothesis tests” using arbitrary “significance levels” to “accept or reject” a single parameter value. In contrast, the practical statistician needs a sound understanding of how baseball, poker, elections or other uncertain processes work, which measures are reliable and which are not, and what scales of aggregation are useful, and then must use the statistical tool kit as well as possible. You need extensive data sets, preferably collected over long periods of time, from which you can use statistical techniques to incrementally revise probabilities up or down relative to prior data.

This is the modern form of what is called the Bayesian approach, named after the 18th century minister Thomas Bayes who discovered a simple formula for updating probabilities using new data. The essence of the Bayesian approach is to provide a mathematical rule explaining how you should change your existing beliefs in the light of new evidence. In other words, it allows scientists to combine new data with their existing knowledge or expertise.
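In modern notation (the standard textbook form, not Bayes’ own wording), the rule for a hypothesis H and a piece of evidence E can be written as follows:

```latex
P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E)},
\qquad
P(E) = P(E \mid H)\,P(H) + P(E \mid \neg H)\,P(\neg H)
```

Here P(H) is the prior probability, P(E | H) and P(E | ¬H) are the two conditional probabilities, and P(H | E) is the revised or posterior probability. The second equation simply adds up every way the evidence can arise, whether or not the hypothesis is true.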

What constitutes the Bayesian approach that led to Nate Silver’s accurate forecasts? Let me try to explain as best I can, with the help of examples provided by Eliezer Yudkowsky in his excellent blog (http://yudkowsky.net/).

Suppose it is an established fact, from other studies, that 1% of women at age forty who participate in routine screening have breast cancer; that 80% of women with breast cancer will get positive mammographies; and that 9.6% of women without breast cancer will also get positive mammographies. A woman in this age group has a positive mammography in a routine screening. What is the probability that she actually has breast cancer? The correct answer is 7.8%, obtained as follows. Out of 10,000 women, 100 have breast cancer, and 80 of those 100 have positive mammographies. The other 9,900 women do not have breast cancer, and about 950 of them (9.6% of 9,900) will also get positive mammographies. This makes the total number of women with positive mammographies 80 + 950, or 1,030. Of those 1,030 women, 80 have cancer. Expressed as a proportion, this is 80/1,030, or 0.07767, or 7.8%. So the answer is neither the 1% of women who have cancer nor the 80% of cancer patients who test positive.
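The head count above can be checked with a few lines of Python. This is a minimal sketch for illustration, not code from Yudkowsky or Silver; the constant names are hypothetical and simply hold the three percentages given in the example.

```python
# Natural-frequency version of the mammography example:
# count women in a notional population of 10,000 instead of multiplying probabilities.
POPULATION = 10_000
PREVALENCE = 0.01            # 1% of screened women have breast cancer
TRUE_POSITIVE_RATE = 0.80    # 80% of women with cancer test positive
FALSE_POSITIVE_RATE = 0.096  # 9.6% of women without cancer also test positive

with_cancer = POPULATION * PREVALENCE                            # 100 women
without_cancer = POPULATION - with_cancer                        # 9,900 women
positive_with_cancer = with_cancer * TRUE_POSITIVE_RATE          # 80 women
positive_without_cancer = without_cancer * FALSE_POSITIVE_RATE   # 950.4, i.e. about 950
all_positive = positive_with_cancer + positive_without_cancer    # about 1,030

# Share of positive tests that belong to women who really have cancer
share = positive_with_cancer / all_positive
print(f"{positive_with_cancer:.0f} of {all_positive:.0f} positives have cancer: {share:.1%}")
# prints "80 of 1030 positives have cancer: 7.8%"
```

Counting people rather than multiplying probabilities makes it easy to see why the answer is so low: the false positives from the large healthy group swamp the true positives from the small sick group.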

The original proportion of patients with breast cancer is known as the prior probability. The chance that a patient with breast cancer gets a positive mammography and the chance that a patient without breast cancer gets a positive mammography are known as the two conditional probabilities. Collectively, this initial information is known as the priors. The final answer – the estimated probability that a patient has breast cancer, given that we know she has a positive result on her mammography – is known as the revised probability or the posterior probability. The mammography doesn’t increase the probability that a positive-testing woman has breast cancer by increasing the number of women with breast cancer – of course not; if mammography increased the number of women with breast cancer, no one would ever take the test! However, requiring a positive mammography is a membership test that eliminates many more women without breast cancer than women with cancer. The number of women without breast cancer diminishes by a factor of more than ten, from 9,900 to 950, while the number of women with breast cancer is diminished only from 100 to 80. Thus, the proportion of 80 within 1,030 is much larger than the proportion of 100 within 10,000. The evidence of the positive mammography slides the prior probability of 1% to the posterior probability of 7.8%.
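The same posterior can be computed directly from the prior and the two conditional probabilities, which is Bayes’ theorem itself. A minimal sketch; the function `posterior` and its parameter names are hypothetical, chosen for this illustration.

```python
def posterior(prior: float, p_pos_given_h: float, p_pos_given_not_h: float) -> float:
    """Probability of the hypothesis after a positive result, via Bayes' theorem."""
    # Total probability of a positive result, from women with and without cancer
    p_pos = p_pos_given_h * prior + p_pos_given_not_h * (1 - prior)
    # Fraction of that total contributed by women who have cancer
    return p_pos_given_h * prior / p_pos


# Prior 1%, conditional probabilities 80% and 9.6%, as in the mammography example
print(f"{posterior(0.01, 0.80, 0.096):.4f}")  # prints 0.0776
```

It returns about 0.0776 rather than 0.07767 because the exact number of false positives is 950.4 (9.6% of 9,900), not the rounded 950; both round to 7.8%.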

Actually, priors are true or false just like the final answer – they reflect reality and can be judged by comparing them against reality. For example, if you think that 920 out of 10,000 women in a sample have breast cancer and the actual number is 100 out of 10,000, then your priors are wrong. In this case, the priors might have been established by three studies – a study on the case histories of women with breast cancer to see how many of them tested positive on a mammography, a study on women without breast cancer to see how many of them test positive on a mammography, and an epidemiological study on the prevalence of breast cancer in some specific demographic.
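To see why a wrong prior matters, here is a hypothetical comparison (a sketch, not taken from the source) that feeds the mistaken 920-in-10,000 prior from the example into the same calculation, keeping the 80% and 9.6% conditional probabilities unchanged. The helper name `posterior` is illustrative.

```python
def posterior(prior: float, sensitivity: float, false_positive_rate: float) -> float:
    # Bayes' theorem for a single positive test result
    true_pos = sensitivity * prior
    false_pos = false_positive_rate * (1 - prior)
    return true_pos / (true_pos + false_pos)


for label, prior in [("correct prior (100 in 10,000)", 0.01),
                     ("wrong prior (920 in 10,000)", 0.092)]:
    print(f"{label}: posterior = {posterior(prior, 0.80, 0.096):.1%}")
# prints 7.8% for the correct prior and 45.8% for the wrong one
```

The evidence is identical in both cases, yet the inflated prior turns a 7.8% chance into roughly 46%. A posterior can only be as good as the priors that go into it.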

Let’s say you’re a woman who’s just undergone a mammography. Previously, you figured that you had a very small chance of having breast cancer; we’ll suppose that you read the statistics somewhere and so you know the chance is 1%. When the positive mammography comes in, your estimated chance should now shift to 7.8%. There is no room to say something like, “Oh, well, a positive mammography isn’t definite evidence, some healthy women get positive mammographies too. I don’t want to despair too early, and I’m not going to revise my probability until more evidence comes in. Why? Because I’m an optimist.” And there is similarly no room for saying, “Well, a positive mammography may not be definite evidence, but I’m going to assume the worst until I find otherwise. Why? Because I’m a pessimist.” Your revised probability should go to 7.8%, no more, no less.
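Because the revised probability is fixed by the evidence, updating can also be repeated: today’s posterior becomes the prior for the next piece of evidence, which is what incrementally revising probabilities means in practice. The sketch below assumes, purely hypothetically, a second positive mammography that is independent of the first (given whether or not the woman has cancer) and has the same 80% and 9.6% rates; the source does not discuss a second test.

```python
def update(prior: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """One Bayesian update: returns P(H | E) from P(H) and the two likelihoods."""
    numerator = p_e_given_h * prior
    return numerator / (numerator + p_e_given_not_h * (1 - prior))


belief = 0.01  # prior: 1% before any test
# ASSUMPTION (hypothetical): both tests are conditionally independent
# and share the same 80% / 9.6% rates.
for test in (1, 2):
    belief = update(belief, 0.80, 0.096)  # the last posterior is the new prior
    print(f"after positive test {test}: {belief:.1%}")
# prints 7.8% after the first test and 41.2% after the second
```

Each new piece of evidence moves the probability by exactly the amount Bayes’ rule dictates, no more and no less.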

What’s so great about Bayes’ theorem is that it can be used for reasoning about the physical universe. But I think Bayes’ law also shows two other things that are worth remembering in economic analysis.

The first is the power of data or facts over theory and models. Neoclassical mainstream economics is voodoo economics not just because it is ideologically biased, an apology for the capitalist mode of production. In making assumptions about individual consumer behaviour, about the inherent equilibrium of capitalist production etc, it also relies on theoretical models that bear no relation to reality: the known facts or priors. In contrast, a scientific approach would aim to test theory against the evidence on a continual basis, not just to falsify it (as Karl Popper would have it) but also to strengthen its explanatory power, unless a better explanation of the facts comes along. Newton’s theory of gravity explained a great deal about the universe and was tested by the evidence, but then Einstein’s theory of relativity came along and explained the facts better (or widened our understanding to things that Newton’s laws could not explain). Testing theory against evidence with statistical methods like Bayes’ law is what mainstream economics does not do.

The second thing we can glean from Bayes’ law and Nate Silver’s results is the power of the aggregate. The best economic theory and explanation comes from looking at the aggregate, the average and its outliers. Data based on a few studies or data points provide no explanatory power. That may sound obvious, but it seems that many political pundits were prepared to forecast the result of the US election on virtually no aggregated evidence. It’s the same with much of economic forecasting. Sure, what happened in the past is no certain guide to what may happen in the future, but aggregated evidence over time is a hell of a sight better than ignoring history.
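One reason aggregation helps can be shown with the textbook margin of error for a polled proportion. This is a minimal sketch under stated assumptions, not Silver’s model: it assumes hypothetical polls of 800 voters each, a 50/50 race, independent simple random samples and the usual 95% normal approximation. Pooling shrinks only random sampling error; it does nothing about a bias shared by all the polls.

```python
import math


def margin_of_error(share: float, sample_size: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a polled proportion (simple random sample)."""
    return z * math.sqrt(share * (1 - share) / sample_size)


single_poll = margin_of_error(0.5, 800)   # one hypothetical poll of 800 voters
pooled = margin_of_error(0.5, 10 * 800)   # ten such polls pooled together
print(f"one poll: ±{single_poll:.1%}, ten polls pooled: ±{pooled:.1%}")
# prints "one poll: ±3.5%, ten polls pooled: ±1.1%"
```

Pooling ten polls cuts the sampling error by a factor of √10, roughly 3.2, which is why an average of many polls is far more informative than any single one.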
