Why is processing language hard for software?


Linguistic knowledge doesn’t exist in isolation. The human ability to understand and produce natural language depends on world knowledge. Could a person, for example, understand or produce a text describing a traffic accident without knowing anything about traffic regulations, about the speed of pedestrians, cyclists or cars, or about driving conditions on wet streets?

A short example may illustrate this point. Imagine how a human or a computer goes about understanding the following texts:

1. Peter was hungry. He looked into the refrigerator.
2. Peter was hungry. He looked into the Yellow Pages.

For a human reader the connection between the two sentences is obvious at once. The unpleasant feeling called “hunger” in English triggers two quite different actions, either of which could relieve it. It is just as natural for a human to answer a question like “What happened next?” with “Peter ate something that was in the refrigerator” in the first case, or “Peter looked up a restaurant and called them or went there” in the second. This ability to draw conclusions that are not explicit in the text, without any conscious effort, is hard to explain unless world knowledge is taken into account. World knowledge, or common sense, is essential to our ability to understand language.

A second pair of sentences makes the same point:

1. Call me a taxi.
2. Call me John.

Again you rely on common sense: people are not usually named “Taxi”, so the first sentence must be a request to summon a taxi, probably by phoning a taxi company. None of this is stated explicitly in the sentence, but any reasonable person can infer it. When we use language, we rely on a wealth of world knowledge and assume that the other person shares it.
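The point can be made concrete with a toy sketch in Python. Both sentences have the same surface pattern, "Call me X", so the program below can only tell them apart by consulting a small table of facts. The word lists and the function name `interpret_call_me` are invented for this illustration and do not come from any real system; the sketch only shows that the deciding information lives outside the grammar.

```python
# Toy "world knowledge": hand-picked words for this illustration only.
# A real system would need vastly more than this.
PERSON_NAMES = {"john", "peter"}
HIREABLE_SERVICES = {"taxi", "cab", "plumber"}


def interpret_call_me(sentence: str) -> str:
    """Guess the meaning of a 'Call me X.' sentence from toy world knowledge."""
    words = sentence.lower().rstrip(".!").split()
    if words[:2] != ["call", "me"] or len(words) < 3:
        return "unknown"
    # Drop articles so that "a taxi" and "taxi" are treated alike.
    rest = [w for w in words[2:] if w not in {"a", "an", "the"}]
    if not rest:
        return "unknown"
    target = rest[-1]
    if target in PERSON_NAMES:
        return "naming"      # address the speaker by this name
    if target in HIREABLE_SERVICES:
        return "summoning"   # phone for this service on the speaker's behalf
    return "unknown"         # the sentence pattern alone cannot decide


print(interpret_call_me("Call me a taxi."))
print(interpret_call_me("Call me John."))
print(interpret_call_me("Call me Ishmael."))
```

The third call falls through to "unknown" because "Ishmael" is missing from the table, which is exactly the difficulty: every fact a human takes for granted has to be entered by someone.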

There have been many attempts to teach computers common sense. The most notable is the Cyc project, led by researcher Doug Lenat, which set out to build a database of human commonsense knowledge by hand. Lenat and his team of knowledge engineers have worked for nearly two decades, at a cost of many tens of millions of dollars, painstakingly building up what is now a database of 1.5 million pieces of commonsense knowledge. Despite all this effort, 1.5 million pieces of knowledge still falls short of what is needed by something like several hundred million. The Cyc project faces a challenge that no single team could expect to meet, and indeed its members themselves admit they are still one or two orders of magnitude away from what is needed. One of the principal scientists, Ramanathan V. Guha, moved to Apple Computer and was quoted in Fortune magazine as saying that “the goal of creating a system that would exhibit real common sense failed”. In summary, parsing the grammar of natural language is the comparatively easy part; teaching computers common sense or background knowledge is a problem that has not been solved to date.

