Wednesday, February 05, 2020

Breaking Through the Information Barrier in Natural History Part 3



(See here for part 1 and here for part 2)

The Limitations of Shannon Information.

In this series I am looking at the claim by de facto Intelligent Design proponents that information is conserved; or more specifically that something called "Algorithmic Specified Complexity" is conserved.

If we use Shannon's definition of information, I = −log2(p(x)), where x is a configuration such as a string of characters, then we saw in part 1 that a conservation of information does arise in situations where computational resources are conserved; that is, when parallel processing power is fixed.
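Since the argument below turns on the fact that surprisal values add when probabilities multiply, here is a minimal Python sketch of the surprisal calculation. The figures for q and r are purely illustrative stand-ins for the conditional probability of life and the probability of a life-friendly regime; they are not estimates of anything real.

```python
import math

def surprisal_bits(p: float) -> float:
    """Shannon surprisal I = -log2(p) of an outcome with probability p."""
    return -math.log2(p)

# Purely illustrative numbers (assumed for the sake of the arithmetic):
q = 1e-3    # conditional probability of life given a life-friendly physical regime
r = 1e-37   # absolute probability of that physical regime arising
p = q * r   # absolute probability of life

print(surprisal_bits(q))                       # ~10 bits: life looks "cheap" given the regime
print(surprisal_bits(r))                       # ~123 bits: but the regime itself is costly
print(surprisal_bits(p))                       # ~133 bits
print(surprisal_bits(q) + surprisal_bits(r))   # same total: I(p) = I(q) + I(r)
```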


The conservation of information under fixed parallel processing becomes apparent when we deal with conditional probabilities, such as the probability of life, q, given the context of a physical regime which favours life. We assume that this physical regime favours the generation of life to such an extent that q is relatively large, thus implying life has a relatively low Shannon information value. But it turns out that in order to return this low information value, the probability of the physical regime must be very low, implying that a low information value for life is purchased at the cost of a high information physical regime. In part 2 I found at least three issues with this beguilingly simple argument about the conservation of information. Viz:

1. The Shannon information function is a poor measure of complex configurational information: a simple outcome, like say finding a hydrogen atom at a given point in the high vacuum of space, is highly improbable and therefore carries a large information value, and yet the simplicity of this event belies that value. Hence, Shannon information is not a good measure of information if we want it to reflect object complexity.

2. Shannon information is a measure of subjective information; hence once an outcome is known, however complex it may be, it loses all its Shannon information and is therefore not conserved.

3. The Shannon information varies with number of parallel trials; that is, it varies with the availability of parallel processing resources and therefore on this second count Shannon Information is not conserved. This fact is exploited by multiverse enthusiasts who attempt to bring down the surprisal value of life by positing a multiverse of such great size that an outcome like life is no longer a surprise. (Although they might remain surprised that there is such an extravagant entity as the multiverse in the first place!)


Algorithmic Specified Complexity

In part 2 I introduced the concept of "Algorithmic Specified Complexity" (ASC), which is defined by the IDists Nemati and Holloway (N&H) as:

ASC(x, C, p) := I(x) − K(x|C)
1.0

Where: 
1. x is a bit string generated by some stochastic process, 
2. I(x) is the Shannon surprisal of x, also known as the complexity of x, and 
3. K(x|C) is the conditional algorithmic information of x, also known as the specification.

And where: 
I(x) = −log2(p(x)), where p(x) is the probability of string x.
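To make the two terms of 1.0 concrete, here is a toy Python sketch. It is emphatically not the N&H formalism: the true algorithmic information K is uncomputable, so the sketch uses zlib compression with the library C supplied as a preset dictionary as a crude upper bound on K(x|C), and it assumes a naive uniform probability model for I(x). The strings are made up for illustration only.

```python
import zlib

def surprisal_bits(x: bytes) -> float:
    """I(x) under a naive uniform model: 8 bits per byte of x."""
    return 8.0 * len(x)

def conditional_K_bits(x: bytes, library: bytes) -> float:
    """Crude upper bound on K(x|C): deflate x with the library C as a preset dictionary."""
    comp = zlib.compressobj(level=9, zdict=library)
    return 8.0 * len(comp.compress(x) + comp.flush())

def toy_asc_bits(x: bytes, library: bytes) -> float:
    """Toy version of 1.0: ASC(x, C) = I(x) - K(x|C), both terms in bits."""
    return surprisal_bits(x) - conditional_K_bits(x, library)

library = b"the quick brown fox jumps over the lazy dog " * 4   # a stand-in for C
x = b"the quick brown fox jumps over the lazy dog"              # easily specified given C

print(toy_asc_bits(x, library))   # large and positive: improbable, yet short to specify from C
```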

As we shall see, ASC has failings of its own and shares with Shannon information a dependence on computational resources. As we saw in part 2, definition 1.0 is proposed by N&H in an attempt to provide a definition of information that quantifies meaningful information. For example, a random sequence of coin tosses contains a lot of Shannon information but it is meaningless. As we saw, N&H try to get round this by introducing the second term on the right-hand side of 1.0, a conditional algorithmic information term; the "meaning" behind x is supposed to lie in the library C, a library which effectively provides the esoteric context and conditions needed for K(x|C) to work and give "meaning" to x. "Esoteric" is the operative word here: de facto ID has a tendency to consecrate "intelligence" by consigning it to a sacred realm almost beyond analysis. But they've missed a trick: as we operators of intelligence well know, meaning is to be found in purpose, that is, in teleology. As we shall see, once again de facto ID has missed the obvious; in this case by failing to incorporate teleology into their understanding of intelligence, instead vaguely referring it to some "context" C which they feel they needn't investigate further.

Ostensibly, as we saw, the definition of ASC appears to work at first: bland, highly ordered configurations have little ASC and neither do random sequences. But as we will see, this definition also fails because we can invent high-ASC situations that are meaningless.




The limitations of ASC

As we saw in part 2, ASC does give us the desired result for meaningful information, but it also throws up some counterintuitive outcomes along with some false positives. As we will see, it is not a very robust measure of meaningful information. Below I've listed some of the situations where the definition of ASC fails to deliver sensible results, although in this post my chief aim is to get a handle on the conservation question.

1. ASC doesn't always register information complexity

A simple single binary yes/no event may be highly improbable and will therefore return a high Shannon complexity. That is, the first term on the right-hand side of 1.0 will be high. But because x is a single-bit string it is going to have very low conditional algorithmic complexity; therefore the second term on the right-hand side of 1.0 will be low. Ergo, 1.0 will return a high value of ASC in spite of the simplicity of the object in question. This is basically a repeat of the complexity issue with plain Shannon information.

I suppose it is possible to overcome this problem by insisting that ASC is really only defined for large configurations.
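To put rough numbers on this first failure mode, here is a tiny sketch; the probability is an arbitrary illustrative figure, and the one-bit allowance for K(x|C) is simply a generous stand-in for "almost nothing to describe".

```python
import math

p = 1e-9                # arbitrary small probability for a single yes/no event
I = -math.log2(p)       # ~29.9 bits of Shannon surprisal
K_given_C = 1.0         # a one-bit outcome needs next to nothing to specify
print(I - K_given_C)    # ~28.9 bits of "ASC" for a trivially simple object
```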


2. ASC doesn't always register meaningfulness 

Imagine a case where the contents of a book of random numbers are read out to an uninitiated observer in the sequence in which the numbers appear in the book. For this uninitiated observer, who subjectively sees a probabilistic output, the number string x will return a high value of I(x). But now imagine that C in K(x|C) contains in its library a second book of random numbers and that K measures the length of an algorithm which carries out a simple encryption of this second book into a superficially different form, a form which in fact comprises the first book of random numbers as read out to the observer. In this case K would be relatively small and therefore 1.0 will return a high value of ASC in spite of the fact that the string x is derived from something that is random and meaningless.

We can symbolise this situation as follows:


ASC(xₙ, xₙ₊₁, p) := I(xₙ) − K(xₙ|xₙ₊₁)
2.0

Here the library consists of the second book of random numbers and is represented by xₙ₊₁. This second book is used to derive the first book of random numbers, xₙ, via a simple encryption algorithm. Hence K(xₙ|xₙ₊₁) will be relatively small. Because, from the point of view of the observer concerned, I(xₙ) is very high, it follows that an otherwise meaningless string has a high ASC.
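A rough numerical version of this two-books scenario, again using a compressor as a stand-in for the uncomputable K: the "simple encryption" is here just a one-byte rotation of the second book, which is enough to make the point.

```python
import os
import zlib

def cond_K_bits(x: bytes, library: bytes) -> float:
    """Crude upper bound on K(x|C): deflate x with C supplied as a preset dictionary."""
    comp = zlib.compressobj(level=9, zdict=library)
    return 8.0 * len(comp.compress(x) + comp.flush())

book2 = os.urandom(4096)            # the second book of random numbers: the library C
book1 = book2[1:] + book2[:1]       # a trivial re-coding of book2: the string x read out

I_x = 8.0 * len(book1)              # surprisal to the uninitiated observer: 8 bits/byte
K_x_given_C = cond_K_bits(book1, book2)

print(I_x, K_x_given_C)             # ~32768 bits versus a small fraction of that
print(I_x - K_x_given_C)            # a high "ASC" for a string that is pure noise
```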


3. ASC isn't conserved under changing subjective conditions
In 1.0 I(x), as I've already remarked, is sensitive to the knowledge of the observer; conceivably then it could go from a high value to zero once the observer knows the string.

This particular issue raises a question: should I and K be mixed as they are in 1.0? They are both expressed as bit counts but they actually measure very different things. K(x|C) measures an unambiguous and objective configurational property of the string x, whereas I measures a property of a string which is dependent on changeable observer information.


4. Unexpected high order returns high ASC.
Take the case where we have a random source and just by chance it generates a long uniform string (e.g. a long sequence of heads). This string would clearly have a very high surprisal value (that is, a high I) and yet such a highly ordered sequence has the potential for a very low value of conditional algorithmic complexity, thus resulting in a high value of ASC. However, this wouldn't happen often because of the rarity of simple order being generated by a random source. To be fair, N&H acknowledge the possibility of this situation arising (rarely) and cover it by saying that ASC can only be assigned probabilistically. Fair comment!
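As a back-of-envelope illustration of this case, take a run of 100 heads from a fair coin; the allowance for the conditional algorithmic complexity below is just a rough guess at the size of a "print N heads" program, not a rigorous bound.

```python
import math

N = 100                          # length of the all-heads run
p = 2.0 ** -N                    # probability that a fair-coin source produces it
I = -math.log2(p)                # = N = 100 bits of surprisal
K_given_C = math.log2(N) + 10    # rough guess: "print 'H' N times" ~ log2(N) bits plus overhead
print(I - K_given_C)             # ~83 bits of ASC... but expect this only once in 2**100 trials
```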

5. ASC is only conserved under conserved computational resources 
We will briefly look at this issue after the following analysis, an analysis which sketches out the reasons why under the right conditions ASC is conserved.


The "Conservation" of ASC

In spite of its inadequacies as a measure of meaningful (purposeful) information, ASC, like Shannon information, is conserved, but only under the right subset of conditions. Below is a sketchy analysis of how I myself would express this limited conservation.

We start with the definition of ASC  for a living configuration L. Equation 1.0 becomes:


ASC(L, C) := I(L) − K(L|C)
3.0

If we take a living organism then it is clear that the absolute probability of this configuration (that is, the probability of it arising by chance in a single trial) will be very low. Therefore I(L) will be high. But for L to register with a high ASC, the conditional algorithmic complexity term on the right-hand side of 3.0 must be low. This can only happen if the algorithmic complexity of life is largely embedded in C; conceivably C could be a library, perhaps even some kind of DNA code, and the algorithm whose complexity is quantified by K(L|C) is then the shortest algorithm needed to define living configurations from this library. Thus the high ASC value of life depends on the existence of C. But C in turn will have an algorithmic complexity whose smallest value is limited by:


K(L) ≤ K(L|C) + K(C)


4.0

...where K(L) is the absolute algorithmic complexity of life, K(L|C) is the conditional algorithmic complexity of life and K(C) is the absolute algorithmic complexity of the library C. This inequality holds because, were the opposite true, the library C together with the conditional algorithm would constitute a shorter way of generating L, meaning that K(L) was not the length of the shortest possible algorithm for L. Notice that in 4.0 I'm allowing C to be expressed in algorithmic terms; but I have to allow that, as de facto IDists prefer to hallow the concept of intelligence and place it beyond intellectual reach, they might actually object to this manoeuvre!
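Inequality 4.0 can be illustrated numerically, with the caveat that a general-purpose compressor is only a crude stand-in for K, so this is a heuristic demonstration rather than a proof; the "library" and "organism" strings below are entirely made up.

```python
import zlib

def K_bits(x: bytes) -> float:
    """Crude upper bound on absolute algorithmic complexity: size of x compressed alone."""
    return 8.0 * len(zlib.compress(x, 9))

def K_cond_bits(x: bytes, library: bytes) -> float:
    """Crude upper bound on K(x|C): size of x compressed with C as a preset dictionary."""
    comp = zlib.compressobj(level=9, zdict=library)
    return 8.0 * len(comp.compress(x) + comp.flush())

# Made-up stand-ins: a "library" C and a configuration L largely built from it.
C = b"ribosome polymerase membrane chaperone kinase " * 20
L = (b"membrane kinase ribosome chaperone polymerase " * 20)[:500]

lhs = K_bits(L)
rhs = K_cond_bits(L, C) + K_bits(C)
print(lhs, rhs, lhs <= rhs)   # the compressed sizes respect the pattern of 4.0 here
```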

The library C could contain redundant information. But if we take out the redundancy and include only what is needed for a living configuration, then we expect 4.0 to become, at best, an equality:



K(L) = K(L|C) + K(C)
5.0


Equation 5.0 tells us that the absolute algorithmic complexity of L is equal to the sum of the conditional algorithmic complexity of L and the absolute algorithmic complexity of the minimal library C. Therefore, since K(L) is high and K(L|C) is low, it follows that K(C) is high. From 5.0 it is apparent that the low conditional algorithmic complexity of life is bought at the price of the high absolute algorithmic complexity of C. The high absolute algorithmic complexity of L is a constant, and this constant value is spread over the sum on the right-hand side of 5.0. The more "heavy lifting" done by the library C, the greater the ASC value and hence, according to N&H, the more meaningful the configuration.

Relationship 5.0 is analogous to the equivalent Shannon relationship, 4.0, in part 1 where we had:


I(p) = I(q) + I(r)
5.1

In this equation the very small absolute probability of life, p, means that its information value I(p) is high. Hence, if the conditional probability of life, q, is relatively high, this can only be bought at the price of a very improbable r, the absolute probability of the physical regime. That is, if I(q) is relatively low then I(r) will be high.

Relationship 5.0 is the nearest thing I can find to an ASC "conservation law". Rearranging 5.0 gives:


K(L|C) = K(L) − K(C)


6.0
Therefore 3.0 becomes:

ASC(L, C) = I(L) − K(L) + K(C)


7.0


From this expression we see that ASC increases as we put more algorithmic complexity into C. That is, it becomes more meaningful according to N&H. ASC is maximised when C holds all the algorithmic complexity of L; under these conditions the term −K(L) + K(C) cancels to zero and the ASC equals I(L). Realistic organic configurations are certainly complex and yet at the same time far from maximum disorder. Therefore we expect K(L) to be a lot less than I(L). Because life is neither random disorder nor crystalline order, K(L) will have a value intermediate between the maximum algorithmic complexity of a highly disordered random configuration and the algorithmic simplicity of simple crystal structures. In order to safeguard C from simply being a complex but meaningless random sequence - which as we have seen is one way of foiling the definition of ASC - we could insist that for meaningful living configurations we must have:

 K(L) ~ K(C)
8.0

...from which, when combined with 6.0, it follows that K(L|C) in 5.0 is relatively small; that is, K(L|C) returns a relatively small bit length and represents a relatively simple algorithm. This looks a little bit like the role of the cell in its interpretation of the genetic code: most of the information for building an organism is contained in its genetics, and this code uses the highly organised regime of a cell to generate a life form. Therefore from 7.0 it follows that the bulk of the ASC is embodied in the conserved value of I(L) (but let's keep in mind that I(L) is conserved only under a subset of circumstances).
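The arithmetic of 6.0, 7.0 and 8.0 can be made explicit with some assumed bit counts; these figures are invented purely to show how the terms trade off, not measurements of any real organism.

```python
# Invented bit counts, chosen only to illustrate relationships 6.0 to 8.0
I_L = 1_000_000       # Shannon surprisal of a living configuration: huge
K_L = 200_000         # absolute algorithmic complexity: complex, but well below I(L)
K_C = 195_000         # a library carrying nearly all of that complexity, as in 8.0

K_L_given_C = K_L - K_C        # 6.0: a relatively short "read the library" algorithm
ASC = I_L - K_L + K_C          # 7.0

print(K_L_given_C)             # 5,000 bits: the conditional term is small
print(ASC)                     # 995,000 bits: the ASC is dominated by I(L)
```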



***

For a random configuration R, equation 1.0 becomes:


ASC(R, C) := I(R) − K(R|C)
9.0
For R the equivalent of relationship 4.0 is:





K(R) ≤ K(R|C) + K(C)


10.0
If C holds the minimum information needed for generating R  then:


 K(R) = K(R|C) + K(C)


11.0
Therefore:
  K(R|C) = K(R) − K(C)


12.0

Therefore substituting 12.0 into 9.0 gives

ASC(R) = I(R) − K(R) + K(C)

13.0

Since a random configuration is not considered meaningful, the minimal library C must be empty and hence K(C) = 0. Therefore 13.0 becomes:

ASC(R) = I(R) − K(R)

14.0

Now, for the overwhelming majority of configurations generated by a random source the value of K(R) is at a maximum and approximately equal to the length of R, unless by a remote chance R just happens to be ordered - which is a possible, albeit highly improbable, outcome for a random source. Since for the overwhelming majority of randomly sourced configurations I(R) ≈ K(R), it follows that ASC(R) will very probably be close to zero. Hence configurations sourced randomly will likely produce an ASC of zero and will likely stay conserved at this value.
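A quick sanity check of this near-zero result, once more using compression as a rough proxy for K(R) and a uniform 8-bits-per-byte model for I(R):

```python
import os
import zlib

R = os.urandom(4096)                    # a typical output of a random source
I_R = 8.0 * len(R)                      # surprisal under a uniform model: 8 bits per byte
K_R = 8.0 * len(zlib.compress(R, 9))    # compression proxy: random data barely compresses

print(I_R, K_R)                         # the two figures are almost identical
print(I_R - K_R)                        # so the toy ASC is ~0 (slightly negative here,
                                        # because the compressor adds a little overhead)
```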



Conclusion: Information can be created

The value of the function K(x) is a mathematically defined configurational property that is an unambiguous constant given a particular configuration; it also obeys the conservation law for minimised libraries. Viz:

K(x) = K(x|C) + K(C)

15.0
...here the algorithmic complexity K(x) is shared between K(x|C) and K(C), although not necessarily equally of course. A similar conservation law (see 5.1 above) also holds for I(x), but as we saw in part 2 this conservation depends on the computational resources available; if there is parallelism in these resources, the extent of that parallelism will change the probability of x. The foregoing analysis of ASC very much depended on the assumption that I(x), with its implicit reference to the absolute probability of a configuration, is a fixed quantity, an assumption that may not be true. So although K(x) is a fixed value for a given configuration, this is not generally true of I(x).

But, the trouble is, as we saw in part 1, the introduction of massively parallel computational resources raises the ugly spectre of the multiverse. As I hope will become clear, the multiverse is a consequence of the current intellectual discomfort with teleology, given a contemporary world view which is unlikely to interpret expanding parallelism as an aspect of a cosmos where a purposeful declarative computational paradigm is a better fit than a procedural one. A declarative computation creates fields of "candidates", or if you like a temporary "multiverse" of tentative trials, but in due course clears away whatever is outside the purpose of the computation; it thereby creates information. In a straight multiverse paradigm nothing ever gets cleared away.

Using ASC as an attempt to detect an intelligent designer is, it seems, far from robust. If used carefully, with an eye on its fragility, ASC can be used to detect the region between the high order of monotony and the high disorder of randomness; this is the region of complex organisation which, admittedly, is often associated with intelligent action. But really N&H are missing a trick: intelligence is overwhelmingly associated with purposeful goal-seeking behaviour; in fact this is a necessary condition of intelligence and needs to be incorporated into our "intelligence detection" methodology.

Intelligence classifies as a complex adaptive system, a system which is selectively seeking and settling on a wide class of very general goals and is also rejecting outcomes that don't fulfill those goals. For me the notion of "specification" only makes sense in this teleological context and that's why, I suggest, the IDists have failed to make much sense of "specification"; in their efforts to define "specification" they are endeavouring to keep within the current "procedural" tradition of science by not acknowledging the role of a "declarative" teleology in intelligent action, in spite of the fact that it is clear that "intelligence" is all about selective seeking. To be fair, perhaps this is a consequence of de facto ID's general policy of restricting their terms of reference to that of intelligence detection and keeping away from exploring the nature of intelligence itself. When IDists do comment on the nature of intelligence they have a taste for highfalutin notions like incomputability and this only has the effect of making the subject of "intelligence" seem even more intractable and esoteric; but perhaps that is exactly their intention!

It is ironic that the atheist Joe Felsenstein, with his emphasis on population breeding and selection, is to my mind closer to the truth than many an IDist. The standard view of evolution, however, is that it is a biological trial-and-error breeding system: the information associated with failures will by and large get cleared from the system, a system which is working toward the goals defined by the spongeam (if it exists!), an object which in turn is thought to be a product of procedural physics. In one sense the spongeam fulfils a role similar to that of the library C in the definition of conditional algorithmic complexity.

The IDists are wrong; information can be created. This becomes clear in human thinking, computers and standard evolution. However, the sticking point with the latter is that, as it stands, it is required to start from the ground up and probably (in my view) simply isn't a powerful enough system for generating life... unless one accepts the existence of the huge burden of up-front information implicit in the spongeam (whose existence I actually doubt). An alternative solution is to employ massive parallelism in order to solve the probability problem. But then the subject of the multiverse rears its ugly head... unless, like evolution, the system clears its huge field of failed trials and selects according to some teleological criterion, thus collapsing the field of trials in favour of a small set of successful results. This is exactly what intelligent systems do, and at the high level there is no real mystery here: intelligent systems work by trial and error as they seek to move toward goals.

All this is very much a departure from the current computational paradigm which directs science: this paradigm sees the processes of physics as procedural non-halting algorithms that continue purposelessly forever. In such a goalless context "specification" is either meaningless or difficult to define. Without some a priori notion of purpose, "specification" is a very elusive idea.

I hope to continue to develop these themes in further posts and in particular develop the notion of "specification" that I started to define toward the end of the paper I've linked to in part 4 of my Thinknet project (See section 11). I also hope to be looking at the two blog posts by Joe Felsenstein on this subject (See here and here) which flatly contradict the ID contention that you can't create information without the presence of some esoteric object called "intelligence". And it looks as though he's probably right. 

Note:
Much of the conceptual background for this post has its foundations in my book on Disorder and Randomness
