Sunday, July 07, 2013

The Premium on Prediction

The following is a back-of-the-envelope analysis that attempts to shed some light on why theories that make correct predictions enhance their chances of being right.

Before I proceed, however, I must add this caveat. It is not always possible to use our theoretical constructions in a way that makes predictions; historical theories in particular are not easy to test at will, and sometimes we have little choice but to come to terms with the post-facto fitting of a theory to the data samples we have in hand. In fact, with grand theories that attempt to embrace the whole of life in a world-view synthesis, abduction and retrospective “best fit” analysis may be the only epistemic option available. If we are dealing with objects whose complexity and level of accessibility make prediction impossible, then this has much less to do with “bad science” than it has with an ontology that is not readily amenable to the scientific epistemic.

However, in this post I'm going to look at the case where predictive testing is assumed to be possible and show why there is a scientific premium on it. To this end I'm going to use a simple illustrative model: credit card numbers. Imagine that valid credit card numbers are created with an algorithm that generates a very small fraction of the numbers available to, say, a twenty-digit string. Let us imagine that someone claims to know this algorithm. This person’s claim could be put to the test by asking him/her to predict a valid credit card number or, better, a series of numbers. If this person repeatedly gets the prediction right then we will intuitively feel that (s)he is likely to be in the know. But why do we feel that? Is there a sound basis for this feeling?

I’m going to use Bayes' theorem to see if it throws any light on the result we are expecting – that is, that there is a probabilistic mathematical basis for the intuition that a set of correct predictions increases the likelihood that we are dealing with an agent who knows the valid set of numbers.

In an earlier post I derived Bayes' theorem from frequency concepts. In a second post I considered an example taken from the book “Reason and Faith” by Forster and Marsden, where they use Bayes' theorem to derive the probability of God. There are certainly issues with the interpretation of the terms used by Forster and Marsden which, as noted in my post, compromised the meaningfulness of their result. The current problem has many similarities with F&M's “probability of God” calculation, but in this more mundane application of Bayes' theorem the terms are less cloudy in meaning. Both the Venn diagram and the mathematics used in my previous posts can be taken off the peg, and Forster and Marsden’s terms reinterpreted.

Being a visual person I like Venn diagrams: as soon as we draw a Venn diagram representing a situation it becomes a lot clearer, at least in my opinion. The Venn diagram I'll be using is taken from my "probability of God" post and is shown below:

The outer circle represents all the agents who potentially could attempt to predict the correct numbers. The set G is the subset of agents who genuinely (hence “G”) know these numbers, say via an algorithm in their possession. The set H is the set of agents who actually make correct predictive hits (hence “H”). Using a frequentist interpretation of probability we can turn the frequency values represented by the above Venn diagram into probabilities simply by turning them into fractions. For example, we are interested in the probability P(G|H), that is, the probability of G given H. If the Venn diagram is drawn to scale then this probability will be equal to the fraction of points in the area H that also have the property G.
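As a rough illustration of this frequency reading, the sketch below treats the Venn diagram as head counts and takes P(G|H) to be the share of hitting agents who also fall inside G. The function name and the counts are hypothetical and chosen only for the example; they are not taken from the diagram.

```python
from fractions import Fraction


def p_g_given_h(n_g_and_h: int, n_h: int) -> Fraction:
    """Frequency estimate of P(G|H): agents in both G and H, divided by agents in H."""
    if n_h <= 0:
        raise ValueError("H must contain at least one agent")
    if not 0 <= n_g_and_h <= n_h:
        raise ValueError("the overlap of G and H cannot exceed the size of H")
    return Fraction(n_g_and_h, n_h)


# Hypothetical counts (assumptions for illustration only):
# 12 agents made correct hits, and 9 of those genuinely know the algorithm.
print(p_g_given_h(9, 12))
```

Using exact fractions rather than floats keeps the link to "counting points in an area" visible.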

Below is a derivation from Bayes' theorem of an equation I will be using.
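A plausible reconstruction of that derivation, assuming the standard statement of Bayes' theorem and the law of total probability, runs as follows. The intermediate steps and their numbering (1) to (4) are an assumption; only the final form, equation (5), is fixed by the discussion that follows. Here \lnot G stands for the complement written !G in the text.

```latex
\begin{align}
P(G \mid H)\,P(H) &= P(G \cap H) = P(H \mid G)\,P(G) \tag{1} \\
P(G \mid H) &= \frac{P(H \mid G)\,P(G)}{P(H)} \tag{2} \\
P(H) &= P(H \cap G) + P(H \cap \lnot G) \tag{3} \\
P(H) &= P(H \mid G)\,P(G) + P(H \mid \lnot G)\,P(\lnot G) \tag{4} \\
P(G \mid H) &= \frac{P(H \mid G)\,P(G)}{P(H \mid G)\,P(G) + P(H \mid \lnot G)\,P(\lnot G)} \tag{5}
\end{align}
```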

The left-hand side of this last equation (5) gives us the probability we are interested in, P(G|H); that is, the probability that we have an agent who knows the credit card algorithm given that this agent has hit the right number or a set of right numbers. As you can see from equation (5), just how close P(G|H) is to unity depends entirely on the relative magnitudes of the two terms in the denominator on the right-hand side. If the second term in the denominator is very small in relation to the first term, P(G|H) becomes nearly unity; in this case there is a high probability that our agent is in possession of the facts! So, expressed mathematically, the condition that interests us is:

P(H|G) P(G)  >> P(H|!G) P(!G)        (6)

Under what conditions might the foregoing expression hold? Comparing similar terms on respective sides of the above inequality, it seems that P(G) is very likely going to be smaller than P(!G); in most circumstances agents in the know are going to be in a minority. That is:

P(G) < P(!G)        (7)

But it is when we compare P(H|G) and P(H|!G) that we find a basis of support for the intuition that an agent who gets hits is likely to be an agent in the know. In many predictive cases P(H|!G) is exceedingly small; for example, if the set of valid credit card numbers is small, the chance of an unknowing agent hitting several right numbers in succession is negligible, whereas in comparison the chance of a knowing agent who pro-actively makes predictions getting a succession of hits may be near unity. It doesn't follow from the latter comment that P(H|G) is itself near unity, because some knowing agents may choose not to use their predictive powers. Even so, it is likely that a knowing agent is going to make use of his knowledge and therefore P(H|G) could be significant, especially when compared to P(H|!G). Therefore in this instance we conclude:

P(H|G)  >> P(H|!G)        (8)

…in which case inequality (6) is a likely scenario, provided the disparity in (8) is large enough to outweigh the opposing disparity in (7). This in turn implies that P(G|H) is near unity, justifying the feeling that a succession of successful predictions implies a knowing agent.
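To put rough numbers on this argument, here is a minimal sketch that evaluates P(G|H) from the prior and the two likelihoods for one, two and three successive hits. Every figure in it is an assumption made for illustration: a prior of one knowing agent per thousand, a 50% chance that a knowing agent produces the hits, one valid number per million candidates, and independent guesses by the unknowing agent.

```python
def posterior_knowing(p_g: float, p_h_given_g: float, p_h_given_not_g: float) -> float:
    """P(G|H) = P(H|G)P(G) / (P(H|G)P(G) + P(H|!G)P(!G))."""
    numerator = p_h_given_g * p_g
    denominator = numerator + p_h_given_not_g * (1.0 - p_g)
    if denominator == 0.0:
        raise ValueError("P(H) is zero, so P(G|H) is undefined")
    return numerator / denominator


# Hypothetical inputs (assumptions, not values from the post):
VALID_FRACTION = 1e-6    # share of candidate numbers that are valid
PRIOR_KNOWING = 1e-3     # P(G): knowing agents are a small minority
HIT_RATE_KNOWING = 0.5   # P(H|G): not every knowing agent chooses to predict

for hits in (1, 2, 3):
    # An unknowing agent must guess every number correctly; guesses assumed independent.
    p_lucky = VALID_FRACTION ** hits
    posterior = posterior_knowing(PRIOR_KNOWING, HIT_RATE_KNOWING, p_lucky)
    print(f"{hits} hit(s): P(G|H) = {posterior:.12f}")
```

Even with the prior weighted heavily against G, the tiny chance of a lucky guess dominates, and each further hit pushes the posterior closer to unity.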

Caveat: It is possible, however, to imagine circumstances where the inference that a series of hits implies a knowing agent is false. For example, knowing agents may, for some reason, choose to make very few predictions, thus implying a small value of P(H|G). Or perhaps P(H|!G) is not negligible because nearly all the possible numbers are valid. So, our intuition that a successful predictor implies a knowing agent has more the character of a heuristic than an absolutely true rule. This heuristic is based on the likely scenario embodied in inequalities (7) and (8). However, drawing up a version of Bayes' theorem that embraces all possible scenarios is probably futile, because that would include a maximum-disorder cosmos, and in such a cosmos prediction is not going to work.
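The two failure cases in this caveat can be seen by restating Bayes' theorem in odds form: the posterior odds of G are the prior odds multiplied by the likelihood ratio P(H|G)/P(H|!G). The sketch below uses that form; the function name and all input values are hypothetical.

```python
def posterior_from_odds(p_g: float, p_h_given_g: float, p_h_given_not_g: float) -> float:
    """P(G|H) via posterior odds = likelihood ratio * prior odds.

    Assumes p_g < 1 and p_h_given_not_g > 0 so both ratios are defined.
    """
    prior_odds = p_g / (1.0 - p_g)
    likelihood_ratio = p_h_given_g / p_h_given_not_g
    posterior_odds = likelihood_ratio * prior_odds
    return posterior_odds / (1.0 + posterior_odds)


PRIOR_KNOWING = 0.01  # hypothetical P(G)

# Case 1: nearly all numbers are valid, so lucky hits are common.
mostly_valid = posterior_from_odds(PRIOR_KNOWING, 0.95, 0.90)

# Case 2: knowing agents almost never predict, so P(H|G) is as small as P(H|!G).
reticent = posterior_from_odds(PRIOR_KNOWING, 1e-6, 1e-6)

print(f"nearly all numbers valid: P(G|H) = {mostly_valid:.4f}")
print(f"reticent knowing agents:  P(G|H) = {reticent:.4f}")
```

In both cases the likelihood ratio is close to one, so a hit barely moves the posterior away from the prior.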

An example of a situation where, in spite of successful hits, it is not clear whether we are dealing with a genuine predictive agent is Bode’s law. This law predicted the asteroid belt and the orbit of Uranus, but broke down for Neptune. The orbits of Mercury, Venus, Earth, Mars, Jupiter and Saturn were fitted retrospectively by juggling the variables of the equation. Reading the corresponding Wiki article, it seems that even today there is no consensus as to whether or not “Bode’s Law” is just an accident.
