Why is processing language hard for software?


Linguistic knowledge doesn’t exist in isolation. The human ability to understand and produce natural language is only possible when it is connected with world knowledge. Could a person, for example, understand or write a text describing a traffic accident without knowing anything about traffic regulations, about how fast pedestrians, cyclists or cars move, or about driving conditions on wet streets?

A short example illustrates this point. Imagine how a human or a computer goes about understanding the following texts:

1. Peter was hungry. He looked into the refrigerator.
2. Peter was hungry. He looked into the Yellow Pages.

For a human reader, the connection between the two sentences is immediately obvious. The unpleasant feeling called “hunger” in English triggers two quite different kinds of action, each of which could equally well relieve it. It is just as natural for a human to answer a question like “What happened next?” with “Peter ate something that was in the refrigerator” in the first case, or with “Peter looked up a restaurant and called it or went there” in the second. This ability to draw, without any conscious effort, conclusions that are not explicit in the text is hard to explain unless world knowledge is taken into account. World knowledge, or common sense, is essential to our ability to understand language.
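Nothing in the two sentences themselves links hunger to eating; a program can only make that link if someone has written it down. A minimal sketch in Python, assuming a tiny hypothetical table of hand-written “scripts” (`SCRIPTS` and `what_happened_next` are made up for illustration and do not come from any real system):

```python
# Hypothetical hand-written world knowledge: (state, thing consulted) -> likely next event.
SCRIPTS = {
    ("hungry", "refrigerator"): "Peter ate something that was in the refrigerator.",
    ("hungry", "Yellow Pages"): "Peter looked up a restaurant and called it or went there.",
}


def what_happened_next(state: str, looked_into: str) -> str:
    """Answer 'What happened next?' only when someone has entered a matching script."""
    return SCRIPTS.get(
        (state, looked_into),
        "No idea: this situation is not in the knowledge base.",
    )


for thing in ["refrigerator", "Yellow Pages", "cookbook"]:
    print(f"Peter was hungry. He looked into the {thing}. ->", what_happened_next("hungry", thing))
```

The program answers the first two questions only because their answers were typed in; for the cookbook it has nothing to say. Every new situation needs another hand-made entry, which is the scaling problem any hand-built store of common sense runs into.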

Consider a second pair of sentences:

1. Call me a taxi.
2. Call me John.

Again you use common sense. Knowing that people are not usually named “Taxi”, you conclude that the first sentence is a request to contact a taxi company, probably by phone. None of this is stated in the sentence, yet any competent speaker can infer it. When we use language, we rely on a wealth of world knowledge and assume that the other person shares it.
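The same point applies to choosing between the two readings of “Call me X”. A minimal sketch, assuming a hypothetical lexicon that simply lists which words are personal names and which are services (`KNOWN_PERSONAL_NAMES`, `KNOWN_SERVICES` and `interpret_call_me` are illustrative names, not a real API):

```python
# Hypothetical hand-coded knowledge: which words name people and which name services.
KNOWN_PERSONAL_NAMES = {"John", "Mary", "Peter"}
KNOWN_SERVICES = {"taxi": "a taxi company"}


def interpret_call_me(sentence: str) -> str:
    """Pick a reading of 'Call me X.' using only the hand-coded sets above."""
    words = sentence.rstrip(".").split()
    if len(words) < 3 or words[:2] != ["Call", "me"]:
        return "not a 'Call me ...' sentence"
    last = words[-1]
    if last in KNOWN_PERSONAL_NAMES:
        return f"naming: address the speaker as {last}"
    if last in KNOWN_SERVICES:
        return f"request: phone {KNOWN_SERVICES[last]} for the speaker"
    return "ambiguous: no world knowledge about this word"


for s in ["Call me a taxi.", "Call me John.", "Call me Ishmael."]:
    print(s, "->", interpret_call_me(s))
```

The grammar of all three sentences is trivial to analyse; what decides the reading is the knowledge stored in the two sets, and “Ishmael” falls through simply because nobody entered it.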

There have been many attempts to teach computers common sense. The most notable is the Cyc project, led by researcher Doug Lenat, which set out to build a database of human common-sense knowledge by hand. Lenat and his team of knowledge engineers worked for nearly two decades, at a cost of many tens of millions of dollars, painstakingly assembling a database of 1.5 million pieces of common-sense knowledge. Despite this extensive effort, 1.5 million pieces is still far short of what is needed, by an estimated several hundred million. The Cyc project faces a challenge at which no single team could expect to succeed, and indeed its members admit that they are still one or two orders of magnitude away from what is needed. One of the project’s principal scientists, Ramanathan V. Guha, moved to Apple Computer and was quoted in Fortune magazine as saying that “the goal of creating a system that would exhibit real common sense failed”. In summary, parsing the grammar of natural language is relatively easy, but teaching computers common sense, the background knowledge humans take for granted, is a problem that has not been solved to date.



Language


As you are reading these words, you are taking part in one of the wonders of the natural world. For you and I belong to a species with a remarkable ability: we can shape events in each other’s brains with exquisite precision. I am not referring to telepathy or mind control; the ability is language. What is truly arresting about our kind is perhaps best captured in the story of the Tower of Babel, in which humanity, speaking a single language, came so close to reaching heaven that God himself felt threatened. Language is so tightly woven into human experience that it is scarcely possible to imagine life without it. Chances are that if you find two or more people together anywhere on earth, they will soon be exchanging words. When there is no one to talk with, people talk to themselves, to their dogs, even to their plants.

In the 1950s the social sciences were dominated by behaviourism, the school of thought popularized by John Watson and B.F. Skinner. Behaviour was explained by laws of stimulus-response learning that could be studied with rats pressing bars and dogs salivating to tones. Beginning in the late 1950s, Noam Chomsky called attention to two fundamental facts about language. First, virtually every sentence that a person utters or understands is a brand-new combination of words, appearing for the first time in the history of the universe. Therefore, a language cannot be a repertoire of responses; the brain must contain a recipe or program, a mental grammar, that can build an unlimited set of sentences out of a finite list of words. Second, children develop these complex grammars rapidly and without formal instruction, and grow up to give consistent interpretations to novel sentence constructions that they have never encountered before. Therefore, Chomsky argued, children must be innately equipped with a plan common to the grammars of all languages, a Universal Grammar, that tells them how to distil the syntactic patterns out of the speech of their parents. Language is not a cultural artefact that we learn the way we learn to tell time or tie our shoelaces. Instead, it is a distinct piece of the biological makeup of our brains: a complex, specialized skill that develops in the child spontaneously, without conscious effort or formal instruction, and is deployed without awareness of its underlying logic. Language is qualitatively the same in every individual.
People know how to talk in more or less the way spiders know how to spin webs.
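Chomsky’s first fact, a finite program producing an unlimited set of sentences, rests on recursion: a rule can reuse the very category it defines. A minimal sketch, assuming a made-up two-rule grammar over three noun phrases (the vocabulary and the `sentences` function are illustrative only, not a grammar of English):

```python
from itertools import product

# Hypothetical finite vocabulary of ready-made noun phrases.
NOUN_PHRASES = ["Peter", "the tall gentleman", "my dog"]


def sentences(depth: int) -> list[str]:
    """All sentences of a two-rule toy grammar with at most `depth` embeddings.

    S -> NP "sleeps"            (no embedding)
    S -> NP "thinks that" S     (the rule reuses S, which makes it recursive)
    """
    simple = [f"{np} sleeps" for np in NOUN_PHRASES]
    if depth == 0:
        return simple
    embedded = [
        f"{np} thinks that {s}"
        for np, s in product(NOUN_PHRASES, sentences(depth - 1))
    ]
    return simple + embedded


for d in range(4):
    print(d, len(sentences(d)))
```

The counts grow as 3, 12, 39 and 120 for depths 0 to 3, and nothing in the two rules caps the depth, so this finite grammar covers an unbounded number of sentences.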

In a human grammar, words are grouped into phrases, like twigs joined in a branch. Little phrases can be joined into bigger ones. Take the sentence “the tall gentleman wears a hat”. It begins with three words that hang together as a unit, the noun phrase “the tall gentleman”. In English a noun phrase (NP) is composed of a noun, sometimes preceded by an article or other determiner and any number of adjectives. All this can be captured in a rule that defines what English noun phrases look like in general.

NP -> (Determiner) (Adjective*) Noun
A noun phrase consists of an optional determiner, followed by any number of adjectives, followed by a noun.
() means optional
* means one or more (so (Adjective*) allows zero or more adjectives)
-> means consists of
The rule defines an upside-down tree branch, with NP at the top and the determiner, the adjectives and the noun hanging beneath it.
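Read as a pattern over word categories, the rule is easy to check mechanically. A minimal sketch, assuming a hypothetical toy lexicon that gives each word exactly one category (real words are often ambiguous; `LEXICON` and `is_noun_phrase` are invented for illustration):

```python
# Hypothetical toy lexicon mapping each word to a single category.
LEXICON = {
    "the": "Det", "a": "Det",
    "tall": "Adj", "old": "Adj",
    "gentleman": "Noun", "hat": "Noun",
    "wears": "Verb",
}


def is_noun_phrase(words: list[str]) -> bool:
    """Check NP -> (Det) (Adj*) Noun: optional determiner, any adjectives, then one noun."""
    tags = [LEXICON.get(w) for w in words]
    i = 0
    if i < len(tags) and tags[i] == "Det":      # (Det): at most one determiner
        i += 1
    while i < len(tags) and tags[i] == "Adj":   # (Adj*): zero or more adjectives
        i += 1
    # Exactly one word must remain, and it must be a noun.
    return i == len(tags) - 1 and tags[i] == "Noun"


for phrase in ["the tall gentleman", "gentleman", "a tall old hat", "the tall", "hat the"]:
    print(phrase, "->", is_noun_phrase(phrase.split()))
```

The optional determiner becomes an `if`, the starred adjectives become a `while` loop, and the obligatory noun is the final check.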

Here are two other rules, one defining the English sentence (S), the other defining the verb phrase (VP):
S -> NP VP
A sentence consists of a noun phrase followed by a verb phrase.
VP -> Verb NP
A verb phrase consists of a verb followed by a noun phrase.
A set of rules like these, called a “phrase structure grammar”, defines a sentence by linking the words to the branches of an inverted tree.

The key insight is that a tree is modular, like Lego blocks. It allows one block or phrase to snap into any of several positions inside other phrases.
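The three rules translate almost line by line into a small recursive-descent parser that builds the inverted tree for “the tall gentleman wears a hat”. A minimal sketch, assuming a hypothetical one-category-per-word lexicon and a nested-tuple tree format (`parse_np`, `parse_vp` and `parse_sentence` are invented names); it accepts only sentences these three rules allow:

```python
# Hypothetical toy lexicon: one category per word.
LEXICON = {
    "the": "Det", "a": "Det",
    "tall": "Adj", "old": "Adj",
    "gentleman": "Noun", "hat": "Noun",
    "wears": "Verb",
}


def parse_np(words, i):
    """NP -> (Det) (Adj*) Noun. Returns (tree, index after the phrase)."""
    children = []
    if i < len(words) and LEXICON.get(words[i]) == "Det":
        children.append(("Det", words[i]))
        i += 1
    while i < len(words) and LEXICON.get(words[i]) == "Adj":
        children.append(("Adj", words[i]))
        i += 1
    if i >= len(words) or LEXICON.get(words[i]) != "Noun":
        raise ValueError(f"expected a noun at position {i}")
    children.append(("Noun", words[i]))
    return ("NP", *children), i + 1


def parse_vp(words, i):
    """VP -> Verb NP. Returns (tree, index after the phrase)."""
    if i >= len(words) or LEXICON.get(words[i]) != "Verb":
        raise ValueError(f"expected a verb at position {i}")
    verb = ("Verb", words[i])
    np, i = parse_np(words, i + 1)
    return ("VP", verb, np), i


def parse_sentence(text):
    """S -> NP VP, and no words may be left over."""
    words = text.split()
    np, i = parse_np(words, 0)
    vp, i = parse_vp(words, i)
    if i != len(words):
        raise ValueError("extra words after the sentence")
    return ("S", np, vp)


print(parse_sentence("the tall gentleman wears a hat"))
```

Because `parse_np` is called once for the subject and again, inside `parse_vp`, for the object, a single definition of the noun phrase snaps into both positions, as the Lego comparison suggests.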

Notice that noun phrases and verb phrases have a lot in common:
1. A head, which gives the phrase its name and determines what it is about.
2. Some role players, which are grouped with the head inside a sub-phrase.
3. Modifiers, which appear outside that sub-phrase.
4. A subject.

The orderings inside a noun phrase and inside a verb phrase are the same: the noun comes before its role players, and the verb comes before its role players. In both cases the modifiers go to the right and the subject to the left. The design is the same in prepositional phrases and adjective phrases. With this common design, there is no need to write out a long list of rules to capture what is inside a speaker’s head. There may be just one pair of super rules for the entire language, in which the distinctions among nouns, verbs, prepositions and adjectives are collapsed and all four are specified with a variable such as “X”. We can simply call the result an “X phrase”.

X Phrase -> (SPEC) X’ (Y Phrase*)
An X phrase consists of an optional subject (SPEC), followed by an X’ (read “X-bar”), followed by any number of modifiers.
X’ -> X (Z Phrase*)
An X’ consists of a head word, followed by any number of role players.
X can be a noun, verb, adjective, adverb or preposition.
Y and Z stand for phrases of any of these categories; they need not differ from X or from each other.
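Because the two super rules mention only a category variable, a single function can build a noun phrase, a verb phrase or a prepositional phrase. A minimal sketch, assuming a nested-tuple tree format and the hypothetical helpers `x_phrase` and `words` (neither comes from any library); the phrases “student of physics in Boston” and “study physics in Boston” are made-up examples chosen to show that a noun phrase and a verb phrase come out with the same shape:

```python
def x_phrase(x, head, role_players=(), modifiers=(), spec=None):
    """Build a phrase of category x using the two super rules.

    X'  -> X (Z Phrase*)          : the head word, then its role players
    XP  -> (SPEC) X' (Y Phrase*)  : optional subject, then X', then modifiers
    Trees are nested tuples: (label, child, child, ...).
    """
    x_bar = (x + "'", (x, head), *role_players)
    children = ([spec] if spec is not None else []) + [x_bar, *modifiers]
    return (x + "P", *children)


def words(tree):
    """Read the words off a tree from left to right."""
    _label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return [children[0]]  # a head word such as ("N", "physics")
    return [w for child in children for w in words(child)]


physics = x_phrase("N", "physics")
in_boston = x_phrase("P", "in", role_players=[x_phrase("N", "Boston")])

# One function, two categories, the same internal design.
student_np = x_phrase("N", "student",
                      role_players=[x_phrase("P", "of", role_players=[physics])],
                      modifiers=[in_boston])
study_vp = x_phrase("V", "study", role_players=[physics], modifiers=[in_boston])

print(" ".join(words(student_np)), "|", student_np)
print(" ".join(words(study_vp)), "|", study_vp)
```

Both trees keep the head and its role player (physics) together inside the X’ and attach the modifier (in Boston) outside it. Determiners and adjectives are left out to keep the sketch short.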

These rules extend to all languages. In English, the head of a phrase comes before its role players. In many languages, such as Japanese, it is the other way around, but consistently so, across all the kinds of phrases in the language. This is a remarkable discovery. It means that the super rules suffice not only for all phrases in English but for all phrases in all languages, with one modification: the left-to-right order is removed from each super rule and fixed separately by each language. The rules themselves are the universal principles; the head-first or head-last order is a parameter. This is the first leg of the “principles and parameters” theory proposed by Chomsky.
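The ordering parameter can be pictured as one switch applied whenever a tree is turned into a string of words. A minimal sketch, assuming a simplified phrase format of (category, head word, role players) and a hypothetical `linearize` function; it models only the head and role-player order, uses English words as stand-ins, and ignores modifiers and subjects:

```python
def linearize(phrase, head_first=True):
    """Turn (category, head_word, [role_player_phrases]) into a list of words.

    head_first=True puts every head before its role players (English-like);
    head_first=False puts every head after them (Japanese-like).
    The same setting is applied to every kind of phrase.
    """
    _category, head, role_players = phrase
    dependents = [w for rp in role_players for w in linearize(rp, head_first)]
    return [head] + dependents if head_first else dependents + [head]


tokyo = ("N", "Tokyo", [])
to_tokyo = ("P", "to", [tokyo])
go_to_tokyo = ("V", "go", [to_tokyo])

print(" ".join(linearize(go_to_tokyo, head_first=True)))   # go to Tokyo
print(" ".join(linearize(go_to_tokyo, head_first=False)))  # Tokyo to go
```

The head-last output, “Tokyo to go”, mirrors the word order of the Japanese “Tōkyō ni iku”, where ni is a postposition rather than a preposition. One switch reorders the verb phrase and the prepositional phrase together, rather than one rule per phrase type.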