The following is a back-of-the-envelope
analysis that attempts to shed some light on why theories that make
correct predictions enhance their chances of being right.
Before I proceed, however, I
must add this caveat. It is not always possible to use our theoretical constructions in a way that makes predictions; historical theories in particular
are not easy to test at will and sometimes we have little choice but to come to
terms with the post-facto fitting of a theory to the data samples we have in
hand. In fact, with grand theories that attempt to embrace the whole of life with a world view synthesis, abduction and retrospective “best fit” analysis may be
the only epistemic option available. If we are dealing with objects whose
complexity and level of accessibility make prediction impossible, then this has
much less to do with “bad science” than it has with an ontology that is not
readily amenable to the scientific epistemic.
However, in this post I'm going
to look at the case where predictive testing is assumed to be possible and show
why there is a scientific premium on it. To this end I'm going to use a simple
illustrative model: credit card numbers. Imagine that valid credit card numbers
are created with an algorithm that generates a very small fraction of the
numbers available to, say, a twenty-digit string. Let us imagine that someone
claims to know this algorithm. This person’s claim could be put to the test by
asking him/her to predict a valid credit card number or, better, a series of
numbers. If this person repeatedly gets the prediction right then we will
intuitively feel that (s)he is likely to be in the know. But why do we feel
that? Is there a sound basis for this feeling?
I’m going to use Bayes' theorem
to see if it throws any light on the result we are expecting – that is, that
there is a probabilistic mathematical basis for the intuition that a set of correct
predictions increases the likelihood that we are dealing with an agent who knows
the valid set of numbers.
In an earlier post I derived Bayes' theorem from
frequency concepts. In a second post I considered an example
taken from the book “Reason and Faith” by Forster and Marsden where they use
Bayes' theorem to derive the probability
of God. There are certainly issues with the interpretation of the terms used by
Forster and Marsden which, as noted in my post, compromised the meaningfulness
of their result. The current problem has many similarities with F&M's “probability
of God” calculation, but in this more mundane application of Bayes' theorem the terms are
less cloudy in meaning. Both the Venn diagram and the mathematics used in my
previous posts can be taken off the peg, and Forster and Marsden’s terms
reinterpreted.
Being a visual person I like
Venn diagrams: As soon as we draw a Venn diagram representing a situation it
becomes a lot clearer, at least in my opinion. The Venn diagram I'll be using is taken from my “probability
of God” post and is shown below:
The outer circle represents all
the agents who potentially could attempt to predict the correct numbers. The
set G is the subset of agents who genuinely (hence “G”) know these numbers, say
via an algorithm in their possession.
The set H is the set of agents who actually make correct predictive
hits (hence “H”). Using a frequentist
interpretation of probability we can turn the frequency values represented by
the above Venn diagram into probabilities simply by turning them into
fractions. For example, we are interested in the probability P(G|H), that is the
probability of G given H. If the Venn diagram is drawn to scale
then this probability will be equal to the fraction of points in the area H that have the property G.
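For reference, here is the form of Bayes' theorem that fits this diagram, with P(H) expanded over the agents in G and the agents not in G (written !G). This is the standard textbook form, and I am assuming it is what the numbering (5) used below refers to, since it has the two denominator terms discussed in the next paragraph:

```latex
P(G \mid H) \;=\; \frac{P(H \mid G)\,P(G)}{P(H \mid G)\,P(G) \;+\; P(H \mid {!G})\,P({!G})} \tag{5}
```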
The left-hand side of this last
equation (5) gives us the probability we are interested in, P(G|H); that is, the probability we
have an agent who knows the credit card
algorithm given that this agent has hit the right number or a set of right
numbers. As you can see from equation 5, just how close P(G|H) is to unity depends entirely on the relative magnitudes of the two terms in the
denominator on the right-hand side of equation 5. If the second term in the
denominator is very small in relation to the first term, P(G|H) becomes nearly unity; in this case there is then a high probability that our agent is in
possession of the facts! So, expressed mathematically the condition that interests
us is:
P(H|G) P(G) >> P(H|!G) P(!G)    (6)
Under what
conditions might the foregoing expression hold? Comparing similar terms on
the respective sides of the above inequality, it seems that P(G) is
very likely going to be smaller than P(!G); in most circumstances agents in the know are
going to be in a minority. That is:
P(G) < P(!G)    (7)
But it is when we compare P(H|G)
and P(H|!G) that we find a
basis of support for the intuition that an agent who gets hits is likely to be
an agent in the know. In many predictive cases P(H|!G) is exceedingly
small; for example, if the set of valid credit card numbers is small the chance
of an unknowing agent hitting several right numbers in succession is negligible, whereas in comparison the chance of a knowing agent who proactively makes
predictions getting a succession of hits may be near unity. It doesn't follow
from the latter comment that P(H|G) is itself near unity,
because some knowing agents may choose not to use their predictive powers. Even so, it is likely that
a knowing agent is going to make use of his or her knowledge and therefore P(H|G)
could be significant, especially when compared to P(H|!G). Therefore in this instance we
conclude:
P(H|G) >> P(H|!G)    (8)
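…in which case inequality (6) is
a likely scenario, provided the disparity in (8) is large enough to outweigh the minority status of knowing agents expressed in (7). This implies that P(G|H) is near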
unity, justifying the feeling that a succession of successful predictions
implies a knowing agent.
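To put rough numbers on this, here is a minimal Python sketch of the two-term Bayes calculation for P(G|H). Every figure in it is hypothetical and chosen only for illustration: the prior P(G), the value of P(H|G), the fraction of valid numbers, and the assumption that an unknowing agent's guesses are independent, so that P(H|!G) is the valid fraction raised to the number of hits.

```python
def posterior_knowing(prior_g, p_hit_given_g, p_hit_given_not_g):
    """Return P(G|H) from P(G), P(H|G) and P(H|!G) using Bayes' theorem."""
    knowing_term = p_hit_given_g * prior_g                  # P(H|G) P(G)
    guessing_term = p_hit_given_not_g * (1.0 - prior_g)     # P(H|!G) P(!G)
    return knowing_term / (knowing_term + guessing_term)


# Hypothetical figures, not taken from any real card scheme.
PRIOR_G = 1e-6          # assume one agent in a million knows the algorithm
P_HIT_GIVEN_G = 0.5     # assume half of the knowing agents use their knowledge
VALID_FRACTION = 1e-4   # assumed fraction of twenty-digit strings that are valid

for n_hits in (1, 2, 3):
    # Assumption: an unknowing agent guesses each number independently.
    p_hit_given_not_g = VALID_FRACTION ** n_hits
    p = posterior_knowing(PRIOR_G, P_HIT_GIVEN_G, p_hit_given_not_g)
    print(f"{n_hits} hit(s): P(G|H) = {p:.6f}")
```

With these made-up figures a single hit leaves P(G|H) well below one percent, because knowing agents are so rare, whereas two or three successive hits push it close to unity. That is why a series of predictions is a better test than a single one.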
Caveat: It is possible, however,
to imagine circumstances where the inference that a series of hits implies a
knowing agent is false. For example, knowing agents may, for some reason, choose to make very
few predictions thus implying a small value of P(H|G). Or perhaps P(H|!G) is not
negligible because nearly all the possible numbers are valid. So, our intuition that a successful predictor
implies a knowing agent has more the character of a heuristic than an
absolutely true rule. This heuristic is based on the likely scenario embodied
in inequalities (7) and (8). However, drawing up
a version of Bayes' theorem that embraces all possible scenarios is probably
futile because that would be inclusive of a maximum disorder cosmos and in such
a cosmos prediction is not going to work.
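The two failure cases just described can be made concrete with a small Python sketch. It uses the ratio r = P(H|!G) P(!G) / (P(H|G) P(G)), in terms of which Bayes' theorem becomes P(G|H) = 1 / (1 + r). The scenario names and all the numbers are hypothetical and picked only to show the heuristic holding in one case and failing in the other two.

```python
def posterior_from_ratio(prior_g, p_hit_given_g, p_hit_given_not_g):
    """Return P(G|H) as 1 / (1 + r), with r the ratio of the two denominator terms."""
    r = (p_hit_given_not_g * (1.0 - prior_g)) / (p_hit_given_g * prior_g)
    return 1.0 / (1.0 + r)


# Hypothetical scenarios: (P(G), P(H|G), P(H|!G)).
scenarios = {
    "valid numbers rare, knowing agents predict": (0.01, 0.5, 1e-8),
    "knowing agents rarely predict": (0.01, 1e-9, 1e-8),
    "nearly all numbers valid": (0.01, 0.5, 0.9),
}

for name, (prior_g, p_hit_g, p_hit_not_g) in scenarios.items():
    p = posterior_from_ratio(prior_g, p_hit_g, p_hit_not_g)
    print(f"{name}: P(G|H) = {p:.6f}")
```

Only the first scenario satisfies inequality (6), and only there does P(G|H) come out near unity. In the other two the ratio r is large and a run of hits tells us very little.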
An example of a situation where,
in spite of successful hits, it is not clear whether we are dealing with a
genuine predictive agent is Bode’s law: this law predicted the
asteroid belt and the orbit of Uranus, but broke down for Neptune. The orbits
of Mercury, Venus, Earth, Mars, Jupiter and Saturn were fitted retrospectively
by juggling the variables of the equation. Reading the corresponding Wikipedia
article, it seems that even today there
is no consensus as to whether or not “Bode’s Law” is just an accident.