Friday, January 24, 2020

5 way Mexican Standoff: Dawkins vs Weinstein vs Myers vs Nelson vs.....

(This post is still undergoing correction and enhancement)

I was fascinated by the complex mix of irreconcilable intellectual positions alluded to in this post on Pharyngula, the blog of evangelical atheist PZ Myers. He tells of a debate between uncompromising evolutionary aficionados (and atheists) Dick Dawkins and Bret Weinstein. The big event was convened by Travis Pangburn, himself  a conceited  egotist - at least according to PZ Myers. Myers is also no fan of Dawkins and even less so of Weinstein who is part of the "Dark Web" (or "Dork Web" as Myers calls it). The Dark Web (which includes Jordan Peterson) is a loosely affiliated group of intellectuals who are pressing forward with their evolutionary logic, particularly in evolutionary psychology, and drawing highly politically incorrect and nihilistic conclusions. This nihilism has even led to a figure like Peterson commenting positively on the social advantages of Christian theism, although this falls well short of outright belief.

Myers tells us that Intelligent Design guru Paul Nelson reviewed the debate here. Nelson's take on the event is that Weinstein "out-Darwined" Dawkins inasmuch as Dawkins baulked at following Weinstein in working out the socio-biological 'ethics' of the ruthless adaptionist logic of evolution. In fact Weinstein suggests that World War II can be explained in adaptionist and selectionist terms. Dawkins was loath to be led by Weinstein into this territory because, according to Nelson, that would go against the grain of today's milieu, which is sensitive to anything which smacks of the nihilism of political incorrectness. That's ironic because Dawkins himself has clashed with this prevailing milieu.

Not that I know much about the ramifications of adaptionist selectionism, but within the constraints it imposes it is likely that it envelops a vast range of apparently random/chaotic outcomes which have no rationale in terms of adaptation and selection: To me WWI and WWII look to be of that meaningless ilk. This chaotic meaninglessness gives plenty of room for atheists like Dawkins and Myers to back off from Weinstein's conclusions and claim that something like WWII has nothing to do with evolution but is epiphenomenal to human nature; a foible permitted, but not necessarily promoted, within a Darwinian envelope. This is the permissive 'will' of evolution at work and it is therefore not 'responsible'!

However, among these protagonists I would probably agree with the general drift of Nelson's thoughts: Whether it's down to adaptionist logic or plain randomness and/or chaos, a purely secular picture of the cosmos conjures up a world of utter ruthless indifference to human affairs, one that has no necessary reason to favour morality of any kind no matter how strenuously espoused by atheists with moral sensibilities such as Dawkins and Myers. Weinstein, in pushing through with secularist logic and drawing very politically incorrect conclusions (even if his adaptionist "rationale" is fallacious), is nevertheless being guided by atheism's dangerous assignation with the nihilist abyss: Evolution, at least in the long term, is not a custodian with human interests at heart, as H. G. Wells' book "The Time Machine" makes clear. This is an uncomfortable conclusion for secular humanism: As PZ Myers himself once said, nature doesn't care about us, or for political correctness!

Wednesday, January 08, 2020

Breaking Through the Information Barrier in Natural History Part 2

(See here for part 1)

In this post on Panda's Thumb mathematical evolutionist Joe Felsenstein discusses the latest attempts by the de facto Intelligent Design community to cast doubt on standard evolution and to bolster their belief that "Intelligence" is the mysterious and "unnatural" a priori ingredient needed to bring about organic configurations. In this second part I will introduce the subject of "Algorithmic Specified Complexity" (ASC). The definition of this ID concept can be found in an ID paper linked to in Joe's post.  Joe's point is that whatever the merits or demerits of ASC it is irrelevant to evolution. That may be the case, but more about that in part 3. In this part I want to get to grips with the concept of ASC.

The definition of "Algorithmic Complexity" (i.e. without the "specified") is fairly clear; it is the length of the shortest program which will define an indefinitely long sequential configuration. For example, if we have an indefinitely repeating sequence like 101010101... it is clear that a very short program will define it, e.g. for(ever) { print "1"; print "0" }. We can see that there are obviously relatively few short programs because a short program string admits relatively few permutations of the available character tokens. On the other hand there is obviously an enormous number of indefinitely long output strings and so it follows that the supply of short programs that can be written and mapped to indefinitely long strings soon runs out. Therefore the only way to accommodate all the possible strings is to allow the program length to also increase indefinitely. It turns out that if programs are to define all the possible output strings available then the program strings must be allowed to grow to the length of the output string. Output strings which require a program of the same string length to define them are categorised as the class of random strings, of maximum algorithmic complexity. In my paper on Disorder and Randomness I explain why these random strings are of maximum disorder. However, some output strings can be defined with a program string that is shorter than the output string. When this is the case the output string is said to be randomness-deficient and of less than maximum algorithmic complexity.
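We can make this crudely concrete in Python. Here zlib compression is merely my stand-in for the (uncomputable) shortest-program length, and the string lengths are my own illustrative choices; but the contrast between an ordered and a disordered sequence, and the counting argument about short programs, both come through:

```python
import random
import zlib

def compressed_len(s: str) -> int:
    # Length in bytes of the zlib-compressed string: a crude, computable
    # stand-in for the length of the shortest program defining s.
    return len(zlib.compress(s.encode(), 9))

ordered = "10" * 5000                                  # 101010... (10,000 chars)
rng = random.Random(42)
disordered = "".join(rng.choice("01") for _ in range(10000))

# The ordered string compresses to a tiny fraction of its own length;
# the disordered one resists compression and so sits near maximum
# algorithmic complexity.
print(compressed_len(ordered), compressed_len(disordered))

# The counting argument: there are fewer than 2**n programs shorter than
# n bits, but 2**n output strings of length n, so short programs must
# soon run out.
n = 20
assert sum(2**k for k in range(n)) < 2**n
```

Real compressors only give an upper bound on algorithmic complexity, of course, but the qualitative picture is the one described above.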

So far so good. But it is when we add that vague term "specified" between 'Algorithmic' and 'Complexity' that the sparks start to fly. What does 'specified' mean here? Joe Felsenstein says this of "Algorithmic Specified Complexity" (ASC):

Algorithmic Specified Complexity (ASC) is a use of Kolmogorov/Chaitin/Solomonoff (KCS) Complexity, a measure of how short a computer program can compute a binary string (a binary number). By a simple counting argument, those authors were able to show that binary strings that could be computed by short computer programs were rare, and that binary strings that had no such simple description were common. They argued that that it was those, the binary strings that could not be described by short computer programs, that could be regarded as "random".

ASC reflects shortness of the computer program. In simple cases, the ASC of a binary string is its "randomness deficiency", its length, n, less the length of the shortest program that gives it as its output. That means that to get a genome (or binary string) that has a large amount of ASC, it needs long string that is computed by a short program. To get a moderate amount of ASC, one could have a long string computed by medium-length program, or a medium-length string computed by a short program. Randomness deficiency was invented by information theory researcher Leonid Levin and is discussed by him in a 1984 paper (here). Definitions and explanations of ASC will be found in the papers by Ewert, Marks, and Dembski (2013), and Ewert, Dembski and Marks (2014). Nemati and Holloway have recently published a scientific paper at the Discovery Institute's house journal BIO-Complexity, presenting a proof of conservation of ASC. There has been discussion at The Skeptical Zone of the technical issues with ASC -- is it conserved or is it not? In particular, Tom English (here and here) has presented detailed mathematical argument at The Skeptical Zone showing simple cases which are counterexamples to the claims by Nemati and Holloway, and has identified errors in their proof. See also the comments by English in the discussion on those posts.

As far as my understanding goes Felsenstein has given us a definition of "Algorithmic Complexity"  and not "Algorithmic Specified Complexity", a notion which seems to be proprietary to de facto ID. So, in doubt I reluctantly turned to the scientific paper by Nemati and Holloway (N&H).  They define ASC as:

ASC(x, C, p) := I(x) − K(x|C)        (1.0)

1. x is a bit string generated by some stochastic process, 
2. I(x) is the Shannon surprisal of x, also known as the complexity of x, and 
3. K(x|C) is the conditional algorithmic information of x, also known as the specification

Note: I(x) = −log2(p(x)), where p is the probability of string x.

This definition is somewhat more involved than basic Algorithmic Complexity. In 1.0 ASC has been defined as the difference between I and K. Moreover, whereas with ordinary algorithmic complexity K(x) represents the shortest program that will generate and define string x, N&H have used the quantity K(x|C), which is the shortest program possible given access to a library of programming resources C. These resources could include data and other programs. This library effectively increases the length of the program string, and therefore output strings which are otherwise inaccessible to programs of a given length may then become accessible. But although a program using a library string can define output strings unreachable to similar-length programs without a library, the number of possible strings that can be mapped to is still limited by the length of the program string.
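As it happens Python's zlib has a concrete analogue of this "library" idea: its preset-dictionary feature lets the compressor refer back into a supplied context. The sketch below is my own illustration (the strings are made up, and compressed length is only a stand-in for K), but it shows how access to a library C can shorten the description of x:

```python
import zlib

def k_plain(x: bytes) -> int:
    # Compressed length without any library: a crude stand-in for K(x).
    return len(zlib.compress(x, 9))

def k_given_c(x: bytes, library: bytes) -> int:
    # Compressed length given a preset dictionary: a crude stand-in for K(x|C).
    co = zlib.compressobj(9, zdict=library)
    return len(co.compress(x) + co.flush())

x = b"METHINKS IT IS LIKE A WEASEL"
library = b"SHALL I COMPARE THEE TO A SUMMERS DAY METHINKS IT IS LIKE A WEASEL"

# With the library available, x can be described by a short back-reference
# into C, so the "program" is shorter than it could ever be without C.
print(k_plain(x), k_given_c(x, library))
```

The library helps, but it cannot conjure strings out of nothing: the number of distinct outputs reachable by programs of a given length remains limited, as noted above.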

The motives behind N&H's definition of ASC, and in particular the motive for using conditional algorithmic information, they put like this (my emphases):

We will see that neither Shannon surprisal nor algorithmic information can measure meaningful information. Instead, we need a hybrid of the two, known as a randomness deficiency, that is measured in reference to an external context.....

ASC is capable of measuring meaning by positing a context C to specify an event x. The more concisely the context describes the event, the more meaningful it is. The event must also be unlikely, that is, having high complexity I(x). The complexity is calculated with regard to the chance hypothesis distribution p, which represents the hypothesis that x was generated by a random process described by p, implying any similarity to the meaningful context C is by luck. ASC has been illustrated by its application to measure meaningful information in images and cellular automata .

The use of context distinguishes ASC from Levin’s generalized form of randomness deficiency in (8) and Milosavljević's algorithmic compression approach. The fundamental advantage is that the use of an independent external context allows ASC to measure whether an event refers to something beyond itself, i.e. is meaningful. Without the context, the other randomness deficiencies perhaps can tell us that an event is meaningful, but cannot identify what the meaning is.

Thus, ASC’s use of an independent context enables novel contextual specifications to be derived from problem domain knowledge, and then applied to identify meaningful patterns, such as identifying non-trivial functional patterns in the game of life.

We can perhaps better understand N&H's motives for ASC if we consider the examples they give. Take a process which generates a highly computable sequence like:

AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA...        (2.0)
Presumably this highly ordered sequence reflects a process where the probability of generating an "A" at each 'throw' is 1. Although human beings often like to create such neat sequences, so do the mindless processes of crystallisation; but for N&H crystallisation is hardly mysterious enough to classify as an intelligent agent. N&H would therefore like to eliminate this one from the inquiry. Hence from equation 1.0 it is clear that for a sequence like 2.0 −log2(p(x)) = 0, and therefore, although the program needed to print 2.0 is very short, we will have K(x|C) > 0, and it follows by substitution into 1.0 that the ASC value associated with 2.0 is negative, i.e. low!

Now let's try the following disordered sequence which presumably is generated by a fully random process:

XQJVN KRWPB ZGYTM DHFUC SLOEA IQNXZ BVKWR PMGJT...        (3.0)
Here −log2(p(x)) will be very high; on this count alone 3.0 contains a lot of information. But then the value of K(x|C) will also be high because for a truly random sequence even the limited resources C will be insufficient to successfully track a truly random source. We therefore expect the algorithm needed to generate this kind of randomness to be at least as long as 3.0. Since −log2(p(x)) will return a bit length of similar size to K(x|C), then ASC ~ 0.

It follows then that  both 2.0 and 3.0 return low values of ASC as expected.

Let us now turn to that character string immortalised by those literary giants Bill Shakespeare and Dick Dawkins. Viz:

METHINKS IT IS LIKE A WEASEL        (4.0)
It is very unlikely, of course, that such a configuration as this is generated by a random process. For a start, using my configurational concept of disorder, texts by Shakespeare and Dawkins will not return a disordered profile. However, the probability of 4.0 appearing by chance alone is very small. Hence −log2(p(x)) will be large. But K(x|C) will be small because it taps into a large library of mental processing and data resources. Therefore the net result is that equation 1.0 is then the sum of a large positive and a small negative, and so voilà! it returns a high value of ASC, which is what we want; or shall I say it is what N&H want!
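To make the three cases concrete, here is a toy numerical sketch of equation 1.0. Everything in it is my own crude approximation, not N&H's construction: I(x) is computed from an assumed uniform 27-character 'typewriter' model (or set to zero for the deterministic all-A process), K(x|C) is approximated by zlib's compressed output with a preset dictionary standing in for the context C, and the context for the meaningful text is deliberately, and very favourably, chosen to contain it:

```python
import math
import random
import zlib

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "      # 27 symbols, Dawkins-weasel style

def k_bits(x: bytes, context: bytes) -> int:
    # Crude stand-in for K(x|C): bits in the zlib output, using the
    # context as a preset dictionary when one is supplied.
    co = zlib.compressobj(9, zdict=context) if context else zlib.compressobj(9)
    return 8 * len(co.compress(x) + co.flush())

def toy_asc(i_bits: float, x: bytes, context: bytes = b"") -> float:
    # ASC(x, C, p) = I(x) - K(x|C), with crude stand-ins throughout.
    return i_bits - k_bits(x, context)

bits_per_char = math.log2(len(ALPHABET))

# 2.0: generated with probability 1, so I(x) = 0 and ASC comes out negative.
ordered = b"A" * 1000
asc_ordered = toy_asc(0, ordered)

# 3.0: a random sequence; I(x) is huge but so is K(x|C), so ASC stays small
# relative to the sizes involved.
rng = random.Random(1)
rand_str = "".join(rng.choice(ALPHABET) for _ in range(2000)).encode()
i_rand = bits_per_char * len(rand_str)
asc_random = toy_asc(i_rand, rand_str)

# 4.0-style: meaningful text, with a context that (favourably!) contains it.
text = (b"TO BE OR NOT TO BE THAT IS THE QUESTION WHETHER TIS NOBLER IN THE "
        b"MIND TO SUFFER THE SLINGS AND ARROWS OF OUTRAGEOUS FORTUNE")
i_text = bits_per_char * len(text)
asc_text = toy_asc(i_text, text, context=text)

print(asc_ordered, asc_random, asc_text)
```

As expected, the all-A and random sequences return low (negative or near-zero) toy-ASC values while the meaningful text returns a clearly positive one; but notice how much work the favourable choice of context C is doing in that last case.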


So, having assumed they have arrived at a suitable definition of ASC, N&H then go on to show that ASC is conserved. But according to Joe Felsenstein:

There has been discussion at The Skeptical Zone of the technical issues with ASC -- is it conserved or is it not? In particular, Tom English (here and here) has presented detailed mathematical argument at The Skeptical Zone showing simple cases which are counterexamples to the claims by Nemati and Holloway, and has identified errors in their proof. See also the comments by English in the discussion on those posts.

I suspect however that in spite of the errors N&H have muddled through to the right mathematical conclusion.  But in any case Joe Felsenstein thinks the question is probably irrelevant because: 

...the real question is not whether the randomness deficiency is conserved, or whether the shortness of the program is conserved, but whether that implies that an evolutionary process in a population of genomes is thereby somehow constrained. Do the theorems about ASC somehow show us that ordinary evolutionary processes cannot achieve high levels of adaptation?

As a rule the de facto IDists have two motives: Firstly they want to stop evolution in its tracks because it is classified by them as a banal "natural" process and secondly they want to place the origins of complex adaptive organisms in the mysteries of mind which I suspect they believe to be a mysterious incomputable process. In a sense they seek a return to the mystique of vitalism.

But like Joe Felsenstein, although for different reasons, I find the question of the conservation of ASC irrelevant: At the moment it's looking to me as though ASC is both trivial and irrelevant for my purposes and simply leaves us with the same old questions. In the final analysis we find that information is not absolutely conserved because the so-called "conservation of information" is not logically obliged but is a probabilistic result brought about by assuming a parallel processing paradigm; more about that in part 3. Yes, there's clearly a lot of good will and expertise in the de facto ID movement and they are a lot better in their attitudes than the didactarian Genesis literalists. But although I'm a Christian there's no chance of me joining de facto ID's Christian following. Many Christians will be wowed by their work and become followers, but count me well and truly out of anything like that! The church needs independent minds, not followers.

Friday, December 20, 2019

Breaking Through the Information Barrier In Natural History. Part I

(This post is still undergoing correction and enhancement)

Propeller technology was always going to have a problem
breaking the sound barrier whereas jet technology didn't.

I was fascinated to read this post on Panda's Thumb by mathematical evolutionist Joe Felsenstein. The post is about the application of Algorithmic Information theory by Intelligent Design theorists to the evolution question. Felsenstein is rather concerned that these IDists are attempting to use Algorithmic Information to prove (yet again) that evolution is impossible. But once again their attempts go awry: They are over-interpreting a genuine complexity/improbability barrier as an impossibility barrier. But it turns out to be no more an impossibility barrier than the sound barrier was to flight; with the right technology the barrier can be broken. And once again they interpret the barrier as the sign of some kind of information conservation law which prevents so-called "natural forces" bringing about the emergence of life.

But having said that, let me say that my own position occupies a space somewhere between, on the one hand, the IDists who don't abide by evolution because it uses what they believe to be creatively inferior so-called "natural forces" and, on the other, the atheists who are determined to show that evolution is a cinch and more-or-less in the bag. The fact is some evolutionists (although this may not apply to Felsenstein) do not fully appreciate the information barrier that evolution actually presents (see Larry Moran for example). In fact the implicit parallel computational paradigm found in the standard concept of evolution is not going to be up to the task unless it starts with a huge burden of up-front information. This is where I believe the work of IDists like William Dembski is relevant and valid, as I have said before, although Dembski and his ID interpreters have inferred that Dembski's work also implies some kind of "Conservation of Information", whatever that means.

The consequence of IDists over-interpreting evolution's information barrier in terms of an absolute barrier is that they then conclude they have in their hands proof that "natural forces" cannot generate life and that some extra "magic" is needed to inject information into natural history in order to arrive at bio-configurations. They identify that extra magic not as the "supernatural" (that would look too "unscientific"!) but instead as "Intelligent agency".

To a Christian like myself, however, this IDist philosophy raises questions: For although there is a clear creation vs God dualism in Christian theology I find the implicit dualism within creation implied by de facto IDism problematic: If an omniscient and omnipotent God has created so-called "natural forces" then it would seem to be quite within his capabilities to provision creation in such a way that natural history could conceivably include the "natural" emergence of living configurations. The "magic" may already be there for all we know!

Moreover, it is clear that human intelligence, which is one of the processes of the created order, can "create information" and I don't think the IDists would deny that. And yet as far as we know human intelligence appears not to transcend God's created providence. IDists, however, are likely to attempt to get round this observation by trying to maintain that human intelligence has a mysterious and extraordinary ingredient which allows it to create information - for example, I have seen IDists use Roger Penrose's idea that human intelligence involves incomputable processes as the mysterious super-duper magic needed to create information. Penrose's ideas, if correct, imply that human intelligence (and presumably the intelligence that IDists claim on occasions injects information into natural history) cannot be described algorithmically. If this line of argument can be maintained then it would justify the IDists' dualism. But this IDist paradigm can be challenged. For a start I believe Penrose was wrong in his argument about the incomputability of human thinking; see here and here. Yes, human thinking may have some extraordinary ingredient of which we are unaware, but it may have been part of the covenant of creation all along: I don't, however, believe it to be an incomputable process.

This post (and the next post) is my take on the "information barrier" debate between evolutionists and IDists. I will not be going into the minutiae of how the IDists or their antagonists arrive at their conclusions but I will be looking at the conclusions themselves and comparing them with my own conclusions based on three projects of mine which throw light on the subject. Viz:

1. Disorder & Randomness. This project defines randomness algorithmically. 
2. The Melancholia I project: This project discusses the possibility of information creation and/or destruction.
3. The Thinknet project: This project defines the term "specified information" teleologically and notes the parallels with quantum mechanics.

Summarising my position: I can go some of the way with Dembski and the IDists in that there is an issue with standard evolution in so far as it demands some mysterious up-front information in order to work, but there is no such thing as "the conservation of information"; information can be destroyed and created. In any case the arguments used by IDists are unsafe because there is more than one understanding of what "information" actually is. It is likely that the IDists' "conservation of information" results because we are most familiar with linear and parallel computing resources, a computing paradigm that has difficulty breaking through the information barrier. This contrasts, as we shall see, with the exponential resources of expanding parallelism, and the presence of these resources exorcises the dualist's ghost in the machine which haunts IDist philosophy. On the other hand some atheists are unaware that there is an information barrier (probably not true of Joe Felsenstein - see here and here) and are unlikely to see Dembski's work as laying down a serious challenge.

As usual I don't dogmatically push my own ideas from a polemical partisan soap box seeking conversions to my case. For me this is a personal quest, an adventure and journey through the spaces of the mind which quite likely may not lead anywhere. The journey, as I often say, is better than the destination.


In this post I want to introduce the information barrier via William Dembski's work. In the video of a lecture I embedded in my blog post here Dembski introduces his concept of the "conservation of information" via the following simple relationship:

Probability of the physical regime = r < p/q        (1.0)

...where p is the unconditional probability of life and q is the conditional probability of life given a physical regime whose probability is r.

I give an elementary proof of this theorem in the said blog post. Relationship 1.0 will tell us what we are looking for if we rearrange it a bit. Viz:

q < p/r        (2.0)
Now, we expect p, which is the unconditional probability of life, to be very small; that is, if we were using a computational method which involved choosing molecular configurations at random, then it is fairly obvious that the complex organisation of configurations capable of self-replication and self-perpetuation will, by virtue of their rarity in configuration space, have an absolutely tiny probability. If we want to fix this problem of improbability and get a realistic chance of life emerging then conceivably we could contrive some physical regime whereby the chance of life coming about is better than under a random-selection computation; this chance is q, where q >> p and where q is the conditional probability of life; that is, the probability of life given the context of the physical regime. But if q is to be realistic then from 2.0 it follows that we must have r ~ p; that is, the only way of increasing the conditional probability of life is to first select a highly improbable physical regime. As Dembski points out, the improbability has now been shifted onto the probability of the physical regime. Dembski's point becomes clearer if we convert our probabilities to formal "information" values as follows.

Now, the so-called information function I(p), as used by Dembski, is defined for a probability of p using:

I(p) = - ln(p)        (3.0)

...where ln is the natural logarithm function. From this definition it is clear that for very small values of p function 3.0 is going to return a large value of I; that is a large "information" value.

The so-called "conservation of information" becomes clearer if we first take the natural log of expression 2.0, followed by applying definition 3.0 and then with a bit of rearranging we arrive at:

I(q) + I(r) > I(p)        (4.0)

Those looking for "natural explanations" don't expect the emergence of life from a "natural" physical regime to be a surprise but rather a very "natural" outcome given that regime. This is tantamount to requiring that I(q), the "surprisal" value of the conditional emergence of life, to be relatively low. Trouble is, because I(p) is so high, it follows from 4.0 that if I(q) is to be low then I(r), the information value of the physical regime, is necessarily very high. Relationship 4.0 being effectively a "zero sum game" expression  means that something has to soak up the information; either I(q) or I(r) or both. We are therefore always destined to be surprised by the extreme contingency nature flings at us from somewhere within equation 4.0. So, at first sight we seem to have an information conservation law expressed by 4.0.
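A quick numerical check of relationships 1.0 through 4.0 makes the point; the probabilities below are of course made up purely for illustration:

```python
import math

def info(p: float) -> float:
    # Dembski's information measure, definition 3.0: I(p) = -ln(p)
    return -math.log(p)

p = 1e-100        # unconditional probability of life: astronomically small
q = 0.5           # what we'd like: life unsurprising given the regime...
r = p / q         # ...but then 1.0 forces the regime's probability r down to ~p

# Relationship 4.0 at its boundary: the surprisal hasn't gone away, it has
# simply been shifted from I(q) onto I(r), the information of the regime.
print(info(p), info(q), info(r))
assert math.isclose(info(q) + info(r), info(p))
```

Push q up towards certainty and I(r) must soak up virtually all of I(p): the "zero sum game" in action.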

Relationship 4.0 is in fact borne out by a closer look at conventional evolution, a process which somehow generates structures that in an absolute sense are highly improbable. Joe Felsenstein himself implicitly acknowledges equation 4.0 in his suggestion that the information for life is embodied in the physical regime we call the "laws of physics". (see here and here). If so then from 4.0 we infer that these laws must be a very improbable state of affairs and therefore of very high information. Evolution as it is currently conceived requires that this information expresses itself in what I call the "spongeam" about which I say more in this blog post. (Actually, my opinion is that the spongeam doesn't exist and that some other provision applies - more about that another time)

Equation 4.0 is beguiling: It seems to come out of some simple and rigorous mathematics. But it embeds an assumption. That assumption is that I(p) has a very high value because we assume from the outset a computation method which involves a serial "throwing of the die" as it were, a method which is going to require many, many conventional computational serial steps and therefore has a prohibitively high time-complexity as far as practice is concerned*2. But if we have dice rather than just a die we can have more than one trial at a time and the chances of creating life by chance alone increase, although it is clear that there would have to be an enormous number of parallel trials to return a significant probability of generating living configurations in this way. This multi-trials technique is effectively the brute-force resort of the multiverse extremists. It is a fairly trivial conclusion that increasing the number of parallel trials has the effect of "destroying" information, in that increasing trial numbers increases the probability of an outcome and so its information value goes down. Clearly, in the face of huge numbers of parallel trials an outcome, no matter how oddly contingent it might be, is no longer a "surprise" in such a "multiverse". Not surprisingly this concept appeals to anti-theists who feel more at home in a Godless multiverse.

In the paper linked to in this post of the Melancholia I project I looked into the effect of increasing parallel trial numbers and in particular I considered the subject of expanding parallelism in the generation of outcomes. It's a fairly obvious conclusion that increasing parallel trials increases the probability of a result! But it also goes to show, perhaps a little less obviously, that information isn't conserved in such a context; in fact in this context information is effectively destroyed by the increasing trial numbers and in particular by expanding parallelism. This sort of thing is likely to go down well with anti-theists because the "surprisal" value (i.e. -ln(p)) associated with outcomes is eroded, although of course anti-theists may still be surprised that such a multiplying system exists in the first place!
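The erosion of surprisal by parallel trials is easy to see numerically. Here is a sketch (the single-trial probability is a made-up figure, and I use log2 so the answer reads in bits), using a numerically stable form of 1-(1-p)^n:

```python
import math

def surprisal_bits(p: float) -> float:
    return -math.log2(p)

def p_at_least_one(p: float, n: float) -> float:
    # 1 - (1-p)**n, computed stably for tiny p and huge n via log1p/expm1.
    return -math.expm1(n * math.log1p(-p))

p_single = 1e-30          # hypothetical chance of success in a single trial
for n in (1, 1e10, 1e20, 1e30):
    # As the number of parallel trials grows, the outcome's surprisal
    # (its -log2 "information" value) is steadily destroyed.
    print(f"{n:.0e} trials: {surprisal_bits(p_at_least_one(p_single, n)):.1f} bits")
```

By the time the trial count rivals 1/p the once-astronomical surprisal has collapsed to under a bit: information "destroyed" by sheer numbers of trials.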

As I contend in this post, multiverse ideas which posit the number of trials needed to destroy our surprise at the universe's amazing organised contingencies leave us looking out on a cosmos whose aspect is empty, purposeless and anthropologically meaningless. And yes, I say it again; this kind of universe suits the anti-theists down to the ground; it seems to be the sort of universe they eminently prefer. But in spite of that there is something to take away from these multiverse ideas, in particular the idea of expanding parallelism, hinted at by quantum mechanics, which is evidence of the potential availability of huge computational resources. Given the concept of omniscience & omnipotence implicit in the Christian notion of God, positing the existence of these huge computational resources doesn't seem so outrageous. But in a Christian context the computational potential of expanding parallelism has, I suggest, purpose and teleology and is in fact evidence of a declarative seek, reject and select computational paradigm.


Although the -ln(p) concept of information used by Dembski succeeds in quantifying some of our intuitions about information it does have some notable inadequacies and it is these inadequacies which take us on to the subject of Felsenstein's Panda's Thumb post, namely Algorithmic Information theory. Let me explain...

Ironically I called the paper that explores the subject of expanding parallelism "Creating Information" rather than "Destroying Information". This is because my Melancholia I project is really about a concept of information very different to -ln(p). The need for this different concept of information becomes apparent from the following considerations. Although the function -ln(p) adequately quantifies our level of surprisal at outcomes, this definition of information is not good at conveying the idea of configurational information. Take this example: The chance of finding a hydrogen atom at a designated point in the high vacuum of space is very small and therefore we have a high-information event here should it happen. But it is a very elementary event, an event which only conveys one bit of information: 'yes' or 'no' depending on whether a hydrogen atom appears or not. The trouble is that a one-bit configuration is hardly what one would like to call a lot of information! Therefore we need something that is better at conveying quantity of information. From the function -ln(p) it follows that a one-bit configuration can "contain" the same amount of information as a large n-bit configuration. This doesn't feel very intuitive, particularly if we are dealing with potentially large and complex configurations; it seems intuitively disagreeable to classify a complex configuration as possibly having the same level of information as a one-bit configuration.*1
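The point can be made with a two-line calculation (the probabilities here are of course made up, and I use log2 so the answers read in bits):

```python
import math

# A one-bit event: a hydrogen atom turning up at a designated point in a
# hard vacuum, with some hypothetical tiny probability.
p_atom = 2 ** -40

# A 40-bit configuration drawn uniformly at random has exactly the same
# probability, and hence exactly the same -log2(p) "information"...
p_config = 2 ** -40

# ...even though one is a bare yes/no and the other a 40-bit structure.
assert -math.log2(p_atom) == -math.log2(p_config) == 40.0
print(-math.log2(p_atom))
```

Surprisal, in other words, measures improbability, not configurational size; hence the need for the algorithmic measure discussed next.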

Algorithmic information theory attempts to measure the information content of a configuration via its computational complexity and this returns a measure of information which agrees with our intuitive ideas about the quantity of information found in a configuration, something that -ln(p) doesn't necessarily convey. However, using this concept of information we find that once again the IDists think they have stumbled on another information barrier that scuppers any creation of life by those inferior but dreaded "natural forces"! In the next post this contention will take me into the subject of my book on Disorder and Randomness, which also deals with algorithmic information theory. Once again we will find that expanding parallelism bursts through the information barrier. Although Joe Felsenstein and his buddies certainly won't need any help from me to engage the IDists, I will in fact be using my own concept of Algorithmic Information to look into the IDist claims because it provides me with something immediately to hand.

There is one more ingredient that needs to be added to the mix to complete the picture of information creation, and it is an ingredient which will certainly not be to the taste of anti-theists whose world view is one of a purposeless cosmos without teleology. I'm talking of my speculative proposal that the cosmos isn't working through some meaningless and mindless procedural process that just goes on and on and leads nowhere, but rather is operating some kind of purposeful declarative computation that uses expanding parallelism to seek, reject and select outcomes. It is in this context that the notion of specified information suddenly jumps into sharp focus; in fact I touch on this subject toward the end of the paper I've linked to in part 4 of my Thinknet project. (See section 11).

I have to confess that if the cosmos is using a purposeful declarative computational paradigm that makes use of expanding parallelism, I'm far from having all the details: all I have currently is an understanding of the effect that expanding parallelism has on computational complexity, and the metaphor of my Thinknet project which seems to have parallels with quantum mechanics; quantum mechanics looks suspiciously like a seek, reject and select declarative computation which taps into the resources of expanding parallelism. The contrasting alternatives to my conjectures are that we either have the anti-theist's meaningless procedural multiverse or the primitive notion that God did indeed simply utter authoritarian magic words and via brute omnipotence was able to speak stuff into existence! The latter seems very unlikely theologically speaking: David Bump, who is a (nice) young earthist Christian I am currently corresponding with, has kindly and respectfully supplied me with a long document of his thoughts on what it means to be a Christian who sees God as creating things via spoken words "as is" about 6000 years ago. Frankly I doubt it! As I have analysed David's arguments I have found that for me all this leads to huge theological problems, and unless I turn to fideism these problems don't look as though they are going to go away! I will in due course be publishing my response to David.

*1 A bit stream can carry a lot of information in the sense of definition 3.0 because its probability is the product of many probabilities, and this may equate to a very small probability and therefore a correspondingly high information content. But the trouble with -ln(p) is that a one-bit configuration could be equally as information laden. Another problem with -ln(p) is that once a configuration becomes a "happened event" and is recorded, all its information is lost. This is because probability is a measure of subjective knowledge, and therefore once a large configuration becomes known ground, no matter how complex, it loses all its information.... a sure sign that "information" in this subjective sense is easily destroyed and therefore not conserved.
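This evaporation of subjective information is easy to demonstrate with -ln(p) itself: once an outcome is known ground its probability goes to unity and its information to zero. A one-line sketch (the 1-in-a-million probability is just an illustrative number):

```python
import math

def surprisal_bits(p):
    """Information -log2(p), written as log2(1/p), in bits."""
    return math.log2(1 / p)

# Before observation: a 1-in-a-million outcome carries ~19.9 bits...
print(surprisal_bits(1e-6))
# ...after observation it is known ground, p = 1, and the information vanishes.
print(surprisal_bits(1.0))   # 0.0
```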

*2 There is also another assumption here (or perhaps it's a confusion rather than an assumption) that probability and randomness are identical concepts - they are not: See my paper on Disorder and Randomness. Dembski uses the principle of indifference in assigning equal probabilities to outcomes for which no prior knowledge exists as to why one outcome should be preferred over the other and hence the outcomes from this subjective point of view have equal probability. This procedure is correct in my opinion; but two outcomes which subjectively speaking have an equal probability are not necessarily equally random; randomness is an objective quality deriving from confrontational disorder. 

Tuesday, December 10, 2019

Moral Relativism

It is true that atheism doesn't set people up well to resist the intellectual pathologies found in the extremes of postmodernism and nihilism; these philosophies are like corrosive acids liable to eat away not only at one's grasp on rationality and truth, but also at one's morality. The only defence is the deep heartfelt instincts supporting good community which, of course, many atheists feel as strongly as anyone else (see Romans 2:14-16). But other than having the status of being identified as strong social instincts, there is little more these instincts have to commend themselves to the atheist world view other than in these social-relativist terms; any cosmic absoluteness to morality (and even rationality) is in the final analysis completely lost.

In this connection I was intrigued by a post on the de facto intelligent design web site "Uncommon Descent" by its supremo Barry Arrington in which he posted his response to the comments of two atheists named as Ed George and Seversky. These two talk about morality in the context of their atheism.  Here's the first part of Arrington's post: 

ARRINGTON: Ed George asserted that morality is based on societal consensus.  Upright Biped utterly demolished that argument.  See here.  Seversky and Ed tried to respond to UB’s arguments.

Let’s start with Sev:

"I, like everyone else here, would also want [the rape] to stop. Why? I should not have to say this but it is because we can imagine her suffering and know that it is not something we would like to experience nor would we want to see it inflicted on anyone else. It’s called empathy and its derived principle of the Golden Rule which, in my view, is more than sufficient grounds for morality."

MY COMMENT: Well done Seversky! Empathy is the ultimate (God given) rationale for morality as we shall see. It is this rationale which motivates the succinct expression of moral code embodied in the Golden Rule. One can hardly complain if Seversky carries out a thoroughgoing implementation of this rule (which of course no human being, apart from one, can do perfectly). But for a thoroughgoing atheist there is no ultimate reason why this Golden Rule should have any claim to an absolute status; after all, it is quite likely that on the basis of a minimalist survival ethic one can imagine social contexts where putting self first may be a "better" strategy (whatever "better" means in this context). Ironically an unbridled free market may illustrate the potential for moral perversity in a world without moral absolutes: for example, some claim that rampant selfish self-betterment is supposed to lead to a wealth "trickle-down" effect, an effect which from a survival point of view benefits everyone.

Nevertheless Barry has a good starter here for sharing and promoting a common moral rationale and perhaps discussing what the origins of this rationale might be. But unfortunately he blows his chance:

ARRINGTON: This is a muddled mashup of two of the materialists’ favorite dodges.  First Sev appeals to empathy as the basis for morality.  He completely ignores several problems with this argument, including:

1.  Mere feelings are a very flimsy ground for a moral system.

2.  Some people do not have empathy (we call them sociopaths).  If empathy is the basis for morality, a sociopath has no basis for morality.

MY COMMENT: Contrary to what Arrington is claiming here, the existence of conscious cognition (which is the context in which feelings have meaning) is the only ground for a moral system, as we shall see.

Arrington inadvertently acknowledges the crucial moral role played by empathy in his reference to sociopaths; when it's not present things go badly wrong. Sociopaths have something about them which means they have no regard for the feelings bound up with conscious cognition. To get an inkling of what it may be like to be sociopathic, think of some of those realistic "shoot-em-up" computer games: human game players have no compunction in shooting up gaming entities simply because there is no conscious cognition to empathise with! In a sense human beings who live good moral lives outside the games environment turn into "sociopaths" of sorts when they play computer games, in so far as they have no empathy (and rightly so!) for the simulated beings. These simulated entities have no consciousness and therefore no feelings. So consciousness changes everything. Perhaps it is not surprising that some atheists are inclined to deny the reality of the first person perspective of conscious cognition (as Arrington well knows - see here). For some atheists the reality of the first person perspective has just too much mystique; if there really is such a strange thing as a first person perspective inaccessible to third person observation then who knows, perhaps there's even a......

But although empathy is the ultimate rationale for morality there is a difference between empathy and moral systems. Moral systems are there to best serve a society of conscious cognitions, and therefore without conscious cognition moral systems are without meaning, goals and purpose. Moral systems are thus a means to an end rather than an end in themselves; for if human beings, like the facades of computer game entities, are a mere simulacrum with no first person perspective behind them, then moral systems are purposeless and meaningless. A moral system is a code of behaviour that is cognisant of people's feelings in the context of community.

Moral systems, however, can be intellectually taxing: it is difficult for humans to anticipate all the ramifications of their behaviour in a social context and come to a reliable opinion on which moral systems best serve a community of interacting conscious entities. The moral challenge humans face therefore resolves itself into two challenges: firstly the challenge of raising a sufficient empathetic concern for other conscious entities, and secondly the epistemic challenge of having to work out which moral system best serves community interests. Human beings, of course, are only capable of imperfectly responding to both challenges. But even if we assume that a community is composed of perfectly empathetic beings anxious to get a moral system in place that best serves the community (clearly an idealistic assumption), there remains the problem that human epistemic limitations imply they are unlikely to discover a moral code that best serves the community.

The Golden Rule is a neat one-liner which sums up the spirit of moral systems, but the complexity of community means that the devil is going to be in the detail; the system of moral code that best serves a community of conscious beings is going to be fully understood only if one has divine omniscience.

After Arrington's weak start, however, things improve:

ARRINGTON: 3.  Even for those with empathy, Sev offers no reason why they should not suppress their feelings if they believe the pleasure of their act exceeds the cost of the act in pangs of empathy.

Next Sev appeals to the Golden Rule as a ground for morality.  Well, Sev, it certainly is.  Yet, materialism offers no ground on which to adhere to the Golden Rule as opposed to any other rule such as “might makes right” or “if it feels good do it.”  Sev demonstrates yet again that no sane person actually acts as if materialism is true.

Sev, if you have to act as if your most deeply held metaphysical commitment is false as you live your everyday life, perhaps you should reexamine your metaphysical commitments.

MY COMMENT: Arrington's third point above does make headway: on what basis, other than ephemeral instinct, should anyone be troubled by the consciousness of other human beings rather than simply live for self? For if, as some atheists maintain, consciousness is just an illusion constructed from a complex social interface, why bother with it and not just play as if one is in a computer game? But contrariwise I suppose, all said and done, an atheist could still claim that whilst moral instincts and code have no real absolute cosmic significance, this doesn't stop people behaving instinctively with empathy and using the Golden Rule. Let's hope that remains the case..... there is an unfortunate human history of principles based on bad ideology overruling compassion, all the way from the Nazis, through Christian fundamentalists, to the French and October revolutions.

ARRINGTON: Now let’s go to Ed, who writes:

. . . UB’s question is not worth responding to

Ed states that a person who lives by himself has no moral obligation to anyone who venture near him.  UB points out that if that is true, Ed has just given said loner a license to rape any woman who ventures too near without breaking any moral injunction.  Instead of abandoning his screamingly stupid assertion, Ed pretends UB’s extension of Ed’s premises to their logical conclusion is “not worth responding to”. Ed is not only stupid.  He is a coward.

MY COMMENT: ... or alternatively, what does the loner do if he sees someone who desperately needs help? (For example a child drowning in a pond - assuming the loner can swim.) Here we have an example of how the futility and purposelessness implicit in atheism can have a corrosive effect on one's sense of what is right.

But don't let anyone go away thinking that I'm suggesting that it is only atheists whose morality is subject to corruption: As we well know those who think they have a moral code sanctioned by divine authority and go on to implement it without cognizance of the first person perspective, are also liable to corruption; especially so if they think their reading of scripture provides an all but direct, easy and utterly certain revelation of the divine will. Whether a moral system is arrived at from first principles or based on an interpretation of Holy Writ, the fact is you can't trust human beings to get it all right!

Relevant Link

Friday, November 22, 2019

Thinknet, Alexa and The Shopping List. Part I

In a "Computerphile" video my son Stuart Reeves explains in high level functional terms the stages involved in "Alexa" parsing, processing and responding to verbal requests. In the video he starts by asking Alexa what may seem a fairly simple question:

Alexa, how do I add something to my shopping list? 

Alexa responds (helpfully, may I say) by regurgitating the data on "Wikihow". But Stuart complains "This is not what I meant! It's a very useful thing if you didn't know how to make a shopping list, but it's not to do with my shopping list!". It seems that poor Alexa didn't twig the subtle differences between these two readings: a) Alexa, how do I add something to my shopping list?  and b) Alexa, how do I add something to a shopping list? Naturally enough Alexa opts for the second generic answer as 'she' has no information on the particularities of the construction of Stuart's shopping list. 

Important differences in meaning may be connoted by a single word, in this case the word "my". Moreover, whether or not this word actually impacts the meaning is subject to probabilities; somebody who wants a generic answer on how to construct a shopping list may have an idiosyncratic way of talking which causes them to slip in the word "my" rather than "a". If the question had been put to me I might at first have responded as Alexa did and missed the subtlety connoted by "my". However, depending on context, "my" could get more stress: for example, if I was dealing with a person with learning difficulties who was known to need a lot of help, this context might set me up to understand that the question is about a proprietary situation and that the generic answer is inadequate.

Stuart's off-screen assistant is invited to put a question to Alexa and he asks this: "Alexa, what is 'Computerphile'?". Alexa responds with an explanation of  "computer file"!  It is clear from this that poor old Alexa often isn't party to essential contextual information which can throw her off course completely. In fact before I saw this video I had never myself heard of "Computerphile" and outside the context of the "Computerphile" videos I would have heard the question, as did Alexa, as a question about "computer file" and responded accordingly. But when one becomes aware that one is looking at a video in the "Computerphile" series this alerts one to the contextualised meaning of "Computerphile" and this shifts the semantic goal posts completely. 

On balance I have to say that I came away from this video by these two computer buffs, who seem to get great pleasure in belittling Alexa, feeling sorry for her! This only goes to show that the human/computer interface has advanced to the extent that it can so pull the wool over one's eyes that one is urged to anthropomorphise a computer, attributing to it consciousness and gender!

Having run down Alexa, Stuart then goes on to break down Alexa's functionality into broad-brush schematic stages using the shopping list request as an example.

It is clear from Stuart's block diagram explanation of Alexa's operation that we are dealing with a very complex algorithm with very large data resources available to it. Although some of the general ideas are clear it is apparent that in the programming of Alexa the devil has been very much in the detail. But as we are in broad brush mode we can wave our hands in the direction of functional blocks declaring that "This does this!" and leave the details to be worked out later, preferably by someone else!


The precise example Stuart unpacks is this:

                                             Alexa could you tell me what's on my shopping list?

Although it is clear the "Alexa" application makes use of a huge suite of software and huge data resources, I would like to show that there is in fact a very general theme running through the whole package and that this general theme is based on the "pattern recognition" I sketch out in my "Thinknet" project. 


An elementary Thinknet "recognition" event occurs when two input tokens result in an intersection. The concept is closely related to the intersection between two overlapping sets. For example, items which have the property of being "liquid" and items which have the property of being "red" overlap under the heading of "paint". In its simplest form a Thinknet intersection, D (such as 'paint'), results from two input stimuli A & B (such as 'liquid' and 'red'), and this is what is meant by a Thinknet "recognition" event. We can symbolise this simple recognition process as follows:

[A B] → D        1.0

Here inputs A and B result in the intersecting pattern D. (In principle it is of course possible for an intersection to exist for any number of input stimuli, but for simplicity only two are used here.) If we now input a third stimulating pattern C and combine it with the intersection D we represent this as:

[AB] C → DC        2.0

Since A and B have resulted in the selection of the intersecting pattern D we now have D and C as the stimuli which are candidates for the next level intersection between D and C; if indeed there exists an intersection for D and C. If this intersection exists (let's call it E) then the full sequence of intersections can be represented thus:

[AB]C → [DC] → E        3.0

As an example the stimulus C might be "tin", and together with "red" and "liquid" this might result in the final intersecting pattern E being "tin of red paint".

Expression 3.0 could be equivalently written as:

[[AB]C] → E        4.0

The square brackets are used to represent a successful intersection operation, and in 4.0 the bracketing is applied twice: first to A and B, and then to the intersection of A and B together with the stimulus C. The simple construction in 4.0 is now enough to give us indefinitely nested structures. For example, if we have the general pattern:

A B C D E F G        5.0

Let us assume 5.0 has strong intersections which result in a full recognition event. To exemplify this full recognition event using square bracket notation we may have for example:

[[[A B C] D] E [F G]] → J        6.0

This nested structure can alternatively be represented as a sequence of intersections: The first layer intersections are:

[ABC] → K  and  [FG] → M        7.0


The second layer intersection (only one in fact) is:

[K D] → L        7.1

The third layer intersection combines the output of 7.0 and 7.1 with the residue E in 6.0 as follows:

[LEM] → J


...this would mean that pattern 5.0, if the intersections are strong enough, implies the likely presence of the pattern J.

The operation of forming intersections is not unlike that of searching for pages on the web, where tokens are input and these tokens then define a list of pages which contain them: if the combination of input tokens has sufficient specification value it will narrow down the search to just a few pages. However, there is a difference between web searches and Thinknet in that in Thinknet the intersection itself is then submitted along with other patterns to the next layer of intersection problem solving, resulting in the nested structures we have seen above.
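This set-theoretic flavour can be made concrete. In the toy sketch below (the memory contents and function names are mine, purely illustrative) each stored pattern is linked to a set of tokens, and a pattern "fires" as an intersection when its tokens cover all the input stimuli; the earlier 'liquid'/'red'/'paint' example then layers exactly as described:

```python
# Toy Thinknet memory: each stored pattern is linked to the tokens that
# stimulate it (names here are illustrative, not from a real system).
MEMORY = {
    "paint":            {"liquid", "red"},
    "fire engine":      {"red", "vehicle"},
    "tin of red paint": {"paint", "tin"},
}

def intersect(stimuli):
    """Return every stored pattern whose linked tokens cover ALL the stimuli."""
    return {name for name, tokens in MEMORY.items() if set(stimuli) <= tokens}

# First layer, [A B] -> D:  'liquid' and 'red' intersect at 'paint'
print(intersect({"liquid", "red"}))   # {'paint'}
# Next layer, [[A B] C] -> E:  feed the intersection forward with 'tin'
print(intersect({"paint", "tin"}))    # {'tin of red paint'}
```

Note the Thinknet twist: the first result is itself resubmitted as a stimulus, which is what produces the nesting a plain web search lacks.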

For advanced intersections to occur it is clear that Thinknet must have available a large data resource which effectively embodies the category information required to reach intersections. This large data resource actually takes the form of a weighted association network, and this network is a way of storing category information. How this network is formed in the first place is another story.

The foregoing examples give us a framework for discussing pattern recognition, Thinknet style. But as we are in broad-brush mode we can ignore the tricky details needed to get something to work really well and instead only talk about the general idea.


If we take a context where we have the input of an unordered collection of stimuli like A, B, C, and D, we may find that at first attempt these fail to form an all embracing intersection and therefore Thinknet fails to recognise the whole context. For example:

[A D]  [C B]        9.0

Or  expressed in more explicit terms:

[A D] → E  and  [C B] → F        10.0

Here [A D] and [C B] have generated intersections E and F. But intersection formation on E and F has failed to go any further. There are, however, at least two ways in which this "intellectual blockage" may be circumvented. The high-level goal seeking programmed into Thinknet means that it lives to understand, and for Thinknet "to understand" means forming intersections. It is good, therefore, if there is more than one way of reaching an understanding.


One way to relieve the deadlock expressed by 9.0 may be to "contextualize" the problem by adding another stimulus; let us call this stimulus G. Hence, the input becomes A, B, C, D, G. Adding G may result in a complete solution as represented by the bracketing below:

[[[A D] G]  [C B]]        11.0

Expressed explicitly in terms of intersection layers:

First layer of intersections:

[A D] → E  and  [C B] → F as before - see 10.0        12.0

Second layer intersection:

[E G] → H        13.0

In 13.0 the introduction of G means that E and G combine to generate the new intersection H.
The third layer of intersection generation results in full resolution:

[H F] → J

Thus the result J represents a complete recognition of the inputs A, B, C, D and the contextualising input G.
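Carrying over the toy set-based idiom (a sketch of mine with illustrative pattern names, not the real Thinknet code), the deadlock-then-resolution sequence looks like this:

```python
# Toy memory in which E and F have no common intersection, but the
# contextualising stimulus G opens a route through H to the full solution J.
MEMORY = {
    "E": {"A", "D"},
    "F": {"C", "B"},
    "H": {"E", "G"},   # H is only reachable once G is supplied
    "J": {"H", "F"},
}

def intersect(stimuli):
    """Stored patterns whose linked tokens cover all the input stimuli."""
    return {name for name, tokens in MEMORY.items() if set(stimuli) <= tokens}

print(intersect({"A", "D"}))   # {'E'}   first layer
print(intersect({"C", "B"}))   # {'F'}
print(intersect({"E", "F"}))   # set()   deadlock: nothing covers E and F
print(intersect({"E", "G"}))   # {'H'}   context G added
print(intersect({"H", "F"}))   # {'J'}   full resolution
```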


A second way which may relieve the deadlock requires a bit more sophistication. In Edward De Bono's book The Mechanism of Mind, a book on which much of my Thinknet thinking was based, we find a way of getting round this "mental blockage"; I call it a "mental blockage" because on re-submission of problem 9.0 Thinknet as it stands would simply generate the same result. But it wouldn't necessarily generate the same result if the "state" of Thinknet changed slightly every time a problem was submitted. This is achieved by ensuring that when a pattern is activated as a result of an intersection, that pattern subsequently needs a greater signal to activate it next time. This means that it may fail to become an intersection on the second attempt, and this may open the way for other patterns with a lower activation threshold to become intersections instead.

For example, let us suppose that E and F as in 12.0 fail to become intersections on a second attempt (or it may take a third or even fourth attempt) as a result of their thresholds being raised, and instead we find a complete intersection solution forms as follows:

[[A D C]  B]

Or in terms of intersection layers:

First layer:
[A D C] → G 
Second layer:
 [G B] → H

The deadlock expressed by 9.0 has been broken by the threshold increases on E and F, preventing them from becoming intersections on later tries; it's a bit like raising land to prevent water pooling and stagnating in the same place, forcing it to pool elsewhere. The important point is that because the path of least resistance has become blocked by increasing thresholds, Thinknet has found a new route through to a complete solution. Another way of thinking of the raising of thresholds with use is as a kind of "boredom" factor which encourages Thinknet to move on and try elsewhere.
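A minimal sketch of this "boredom" mechanism (the class, names and numbers are all hypothetical): each time a pattern fires its threshold is raised, so on a later submission the habitual winner can drop out and leave room for a rival:

```python
# Hypothetical illustration of De Bono-style threshold raising: a pattern that
# fires as an intersection becomes harder to fire next time.
class Pattern:
    def __init__(self, name, tokens, threshold=1.0):
        self.name = name
        self.tokens = set(tokens)
        self.threshold = threshold

def submit(patterns, stimuli, signal=1.0):
    """One submission: fire every matching pattern the signal can reach,
    then raise the threshold of each fired pattern ("boredom")."""
    fired = []
    for p in patterns:
        if set(stimuli) <= p.tokens and signal >= p.threshold:
            fired.append(p.name)
            p.threshold += 0.6   # harder to fire on the next attempt
    return fired

patterns = [Pattern("E", {"A", "D"}),                  # the habitual winner
            Pattern("E2", {"A", "D"}, threshold=1.4)]  # a rival, initially too sluggish

print(submit(patterns, {"A", "D"}))              # ['E']
print(submit(patterns, {"A", "D"}, signal=1.5))  # ['E2'] - E is now "bored out"
```

The second submission takes a different route even though the stimuli are identical, which is the whole point of the mechanism.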


When I worked on the Thinknet software I got as far as programming nested intersections, but what I didn't do was add threshold changes as a function of use; this would effectively have given Thinknet a feedback loop in so far as outcomes would affect thresholds and the thresholds would affect future outcomes. Adding such functionality would open up a vista of possible devilish detail: in particular, making the feedback non-linear would introduce the potential for complex chaotic behaviour. If we think of a Thinknet "thinking session" as a session involving repeated re-submission of an intersection problem, a nonlinear Thinknet would never quite return to its starting state. This would turn Thinknet into a system which searches for intersections by changing its state chaotically: in so far as chaotic evolution is a way of ringing-the-changes (chaotically), Thinknet becomes an intersection search engine. Thus, the more time Thinknet spends "thinking" the more chance that a complete intersection solution pops out of the chaotic regime that thinking entails. But I must add a caution here. A chaotic Thinknet is far from an exhaustive and systematic search engine; its driving energy is chaos, a process which approximates to a random search - it is therefore not an efficient search. But one thing it is: a chaotic Thinknet is "creative" in the sense that it has the potential to come up with unique solutions where the search space is too large, open ended or ill defined to embark on a search via a systematic and exhaustive ringing-of-the-changes.

I will be using my sketch of Thinknet principles (and how it could work if the details were sufficiently worked out) to discuss if and how Alexa's task, at all stages, can be formulated in terms of Thinknet style pattern recognition. The general nature of this style of pattern recognition encapsulates a simple idea: Thinknet uses a few input clues which, like a Web search engine, narrow down the field by "set fractionating"; that is, where multiple "Venn diagram" sets overlap, the resulting subset may be very narrow. However, where Thinknet differs from this kind of simple search is in problem re-submission and its non-linear "thinking" process. But I must concede that this underlying chaotic searching may not be suitable for an Alexa type application because there is probably a demand for predictable and controllable output results when it comes to slave systems like Alexa. In contrast a fully fledged Thinknet system has strong proprietary goal seeking behaviour and is orientated towards seeking idiosyncratic, creative and unpredictable intersections rather than finding servile answers through an exhaustive systematic search. In short, it's too human to be of use!