In a "Computerphile" video my son Stuart Reeves explains in high level functional terms the stages involved in "Alexa" parsing, processing and responding to verbal requests. In the video he starts by asking Alexa what may seem a fairly simple question:
Alexa, how do I add something to my shopping list?
Alexa responds (helpfully, may I say) by regurgitating the data on "Wikihow". But Stuart complains "This is not what I meant! It's a very useful thing if you didn't know how to make a shopping list, but it's not to do with my shopping list!". It seems that poor Alexa didn't twig the subtle differences between these two readings: a) Alexa, how do I add something to my shopping list? and b) Alexa, how do I add something to a shopping list? Naturally enough Alexa opts for the second generic answer as 'she' has no information on the particularities of the construction of Stuart's shopping list.
Important differences in meaning may be connoted by a single word, in this case the word "my". Moreover, whether or not this word actually impacts the meaning is subject to probabilities; somebody who wants a generic answer on how to construct a shopping list may have an idiosyncratic way of talking which causes them to slip in the word "my" rather than an "a". If the question had been put to me I might at first responded as Alexa did and miss the subtlety connoted by "my". However, depending on context "my" could get more stress: For example if I was dealing with a learning difficulties person who was known to need a lot of help this context might set me up to understand that the question is about a proprietary situation and that the generic answer is inadequate.
Stuart's off-screen assistant is invited to put a question to Alexa and he asks this: "Alexa, what is 'Computerphile'?". Alexa responds with an explanation of "computer file"! It is clear from this that poor old Alexa often isn't party to essential contextual information which can throw her off course completely. In fact before I saw this video I had never myself heard of "Computerphile" and outside the context of the "Computerphile" videos I would have heard the question, as did Alexa, as a question about "computer file" and responded accordingly. But when one becomes aware that one is looking at a video in the "Computerphile" series this alerts one to the contextualised meaning of "Computerphile" and this shifts the semantic goal posts completely.
On balance I have to say that I came away from this video by these two computer buffs, who seem to get great pleasure in belittling Alexa, feeling sorry for her! This only goes to show that the human/computer interface has advanced to the extent that it can so pull the wool over one's eyes that one is urged to anthropomorphise a computer, attributing to it consciousness and gender!
Having run down Alexa Stuart then goes on to break down Alexa's functionality into broad-brush schematic stages using the shopping list request as an example.
Stuart's off-screen assistant is invited to put a question to Alexa and he asks this: "Alexa, what is 'Computerphile'?". Alexa responds with an explanation of "computer file"! It is clear from this that poor old Alexa often isn't party to essential contextual information which can throw her off course completely. In fact before I saw this video I had never myself heard of "Computerphile" and outside the context of the "Computerphile" videos I would have heard the question, as did Alexa, as a question about "computer file" and responded accordingly. But when one becomes aware that one is looking at a video in the "Computerphile" series this alerts one to the contextualised meaning of "Computerphile" and this shifts the semantic goal posts completely.
On balance I have to say that I came away from this video by these two computer buffs, who seem to get great pleasure in belittling Alexa, feeling sorry for her! This only goes to show that the human/computer interface has advanced to the extent that it can so pull the wool over one's eyes that one is urged to anthropomorphise a computer, attributing to it consciousness and gender!
Having run down Alexa Stuart then goes on to break down Alexa's functionality into broad-brush schematic stages using the shopping list request as an example.
It is clear from Stuart's block diagram explanation of Alexa's operation that we are dealing with a very complex algorithm with very large data resources available to it. Although some of the general ideas are clear it is apparent that in the programming of Alexa the devil has been very much in the detail. But as we are in broad brush mode we can wave our hands in the direction of functional blocks declaring that "This does this!" and leave the details to be worked out later, preferably by someone else!
***
The precise example Stuart unpacks is this:
Alexa could you tell me what's on my shopping list?
Although it is clear the "Alexa" application makes use of a huge suite of software and huge data resources, I would like to show that there is in fact a very general theme running through the whole package and that this general theme is based on the "pattern recognition" I sketch out in my "Thinknet" project.
***
An elementary Thinknet "recognition" event occurs when two input tokens results in an intersection. The concept of an intersection is closely related to the concept of the intersection between two overlapping sets. For example, items which have the property of being "liquid" and items which have the property of being "red", overlap under the heading of "paint". In its simplest form a Thinknet intersection, D, (such as 'paint') results from two input stimuli A & B (such as 'liquid' and 'red') and this is what is meant by a Thinknet "recognition" event. We can symbolise this simple recognition process as follows:
[A B] → D
1.0
Here inputs A and B result in the intersecting pattern D. (In principle it is of course possible for an intersection to exist for any number of input stimuli, but for simplicity only two are used here) If we now input a third stimulating pattern C and combine it with the intersection D we represent this as:
[AB] C → DC
2.0
Since A
and B have resulted in the selection of the intersecting pattern D we
now have D and C as the stimuli which are candidates for the next level intersection between D and C; if indeed there exists an intersection for D and C. If this intersection exists (let's call it E) then the full sequence of intersections can be represented thus:
[AB]C →
[DC] → E
3.0
As an example the stimuli D might be "tin" and together with "red" and "liquid" this might result in the final intersecting pattern E being"tin of red paint".
Expression
3.0 could be equivalently written as:
[[AB]C] → E
4.0
The square brackets are used to represent a successful intersection operation and in 4.0 the bracketing is applied twice: First to AB and then to the intersection of A and B and the stimuli C. The simple construction in 4.0 is now enough to give us indefinitely nested structures.
For example, if we have the general pattern:
ABCDEFG
5.0
Let us assume 5.0 has strong intersections which result in a full recognition event. To exemplify this full recognition event using square bracket notation we may have for example:
[[[ABC]D]E[FG]]
6.0
This nested structure can alternatively be represented as a sequence of intersections: The first layer intersections are:
[ABC] →
K and [FG] → M
7.0
The second layer intersection (only one in fact) is:
[K D] → L
7.1
The third layer intersection combines the output of 7.0 and 7.1 with the residue E in 6.0 as follows:
[LEM] →
J
8.0
...this would mean that pattern 5.0, if the intersections are strong enough, implies the likely presence of the pattern J.
The operation of forming intersections is not unlike that of searching for pages on the web where tokens are input and these tokens then define a list of pages which contain these tokens: If the combination of input token has sufficient specification value it will narrow down the search to just a few pages. However, there is a difference between web searches and Thinknet in that in Thinknet the intersection itself is then submitted along with other patterns to the next layer of intersection problem solving resulting in the nested structures we have seen above.
For advanced intersections to occur it is clear that Thinknet must have available a large data resource which effectively embodies the category information required to reach intersections. This large data resource actually takes the form of a weighted association network and this network is way of storing category information. How this network is formed in the first place is another story.
The forgoing examples give us a framework for discussing pattern recognition, Thinknet style. But as we are in broad-brush mode we can ignore the tricky details needed to get something to work really well and instead only talk about the general idea.
***
If we take a context where we have the input of an unordered collection of stimuli like A, B, C, and D, we may find that at first attempt these fail to form an all embracing intersection and therefore Thinknet fails to recognise the whole context. For example:
[A, D] [C B]
9.0
Or expressed in more explicit terms:
[A D] → E and [C B] → F
10.0
***
[[[A, D] G] [C B]]
11.0
Expressed explicitly in terms of intersection layers:
First layer of intersections:
First layer of intersections:
[A D] → E and [C B] → F .....as before - see 10.0
12.0
[E G] → H
13.0
...here. the introduction of G means that E and G combine to generate the new intersection H.
The third layer of intersection generation results in full resolution:
[H F] → J
14.0
Thus the result J represents a complete recognition of the inputs A, B, C, D and the contextualising input G.
***
A second way which may relieve the deadlock requires a bit more sophistication. In Edward De Bono's book The Mechanism of Mind, a book on which much of my Thinknet thinking was based, we find a way of getting round this "mental blockage"; I call it a "mental blockage" because on re-submission of problem 9.0 Thinknet as it stands would simply generate the same result. But it wouldn't necessarily generate the same result if the "state" of Thinknet changed slightly every time a problem was submitted. This is achieved by ensuring that when a pattern is activated as a result of an intersection that pattern subsequently needs a greater signal threshold to activate it next time. This means that it may fail to become an intersection on the second attempt. and this may open the way for other patterns with a lower activation threshold to become intersections instead.
For example, let us suppose that E and F as in 12.0 fail to become intersections on second attempt (or it may take third or even forth attempts) as a result of their thresholds being raised and instead we find a complete intersection solution forms as follows:
[[A D C] B]
15.0
Or in terms of intersection layers:
First layer:
[A D C] → G
16.0
Second layer:
[G B] → H
17.0
The dead lock expressed by 9.0 has been broken by the threshold increases on E and F, preventing them becoming intersections on later tries; it's a bit like raising land to prevent water pooling and stagnating at the same place by forcing it to pool elsewhere. The important point is that because the path of least resistance has become blocked by increasing thresholds Thinknet has found a new route through to a complete solution. Another way of thinking of the raising of thresholds with use is as a kind of "boredom" factor which encourages Thinknet to move on and try elsewhere.
***
When I worked on the Thinknet software I got as far as programming nested intersections, but what I didn't do was add threshold changes as a function of use; this would effectively have given Thinknet a feedback loop in so far as outcomes would effect thresholds and the thresholds would effect future outcomes. Adding such a functionality would open up a vista of possible devilish detail: In particular, making the feedback non-linear would introduce the potential for complex chaotic behaviour. If we can think of a Thinknet "thinking session" as a session involving repeated re-submission of an intersection problem, a nonlinear Thinknet would never quite return to its starting state. This would turn Thinknet into a system which searched for intersections by changing its state chaotically: In so far as chaotic evolution is a way of ringing-the-changes (chaotically) Thinknet becomes an intersection search engine. Thus, the more time Thinknet spends "thinking" the more chance that a complete intersection solution pops out of the chaotic regime that thinking entails. But I must add a caution here. A chaotic Thinknet is far from an exhaustive and systematic search engine; its driving energy is chaos, a process which approximates to a random search - it is therefore not an efficient search. But one thing it is: A chaotic Thinknet is "creative" in the sense that it has the potential to come up with unique solutions where the search space is too large, open ended or ill defined to embark on a search via a systematic and exhaustive ringing-of-the-changes.
I will be using my sketch of Thinknet principles (and how it could work if the details were sufficiently worked out) to discuss if and how Alexa's task, at all stages, can be formulated in terms of Thinknet style pattern recognition. The general nature of this style of pattern recognition encapsulates a simple idea: Thinknet uses a few input clues which, like a Web search engine, narrows down the field by "set fractionating"; that is, where multiple "Venn diagram" sets overlap,the resulting subset may be very narrow. However, where Thinknet differs from this kind of simple search is in problem re-submission and its non-linear "thinking" process. But I must concede that this underlying chaotic searching may not be suitable for an Alexa type application because there is probably a demand for predictable and controllable output results when it comes to slave systems like Alexa. In contrast a fully fledged Thinknet system has strong proprietary goal seeking behaviour and is orientated towards seeking idiosyncratic, creative and unpredictable intersections rather than finding servile answers through an exhaustive systematic search. In short, it's too human to be of use!