The Turing Test 65 Years On: What We’ve Learnt


There has been renewed interest in Artificial Intelligence (AI) and Natural Language Processing (NLP) as a means of humanising the complex technological landscape that we encounter in our day-to-day lives. This has not escaped the attention of industry analysts: Gartner has recently released its top industry trends, naming the smart machines era as the ‘most disruptive in the history of IT.’

It has also captured the imagination of the public: the recent film ‘The Imitation Game’ tells the story of Alan Turing, one of the most significant mathematicians, logicians, and cryptanalysts of the modern era, credited with cracking Nazi Germany’s Enigma code, laying the foundations of computer science, and being the first to seriously ask whether machines could be made to think.

In the context of the latter, he proposed the ‘imitation game’, now referred to as the Turing Test: a human interrogator engages in free-form verbal interaction with both a computer and another human, without knowing which is which. If the interrogator cannot tell the human and the computer apart, then the program can be said to be intelligent.

Recently, a software program modelled after a 13-year-old boy and named ‘Eugene Goostman’ by its creators took the Turing Test and was widely reported to have passed it. Very soon afterwards, however, it was observed that Eugene Goostman relied on trickery to fool the judges; a closer examination revealed that it suffered from the same problems as past entrants. The test had not, in fact, been passed.

As the voices critical of Eugene’s achievement grew louder, research scientists at Nuance had already been looking toward an alternative test of intelligence: the Winograd Schema Challenge. Proposed by Hector Levesque at the University of Toronto, it is intended as a more accurate measure of a machine’s common-sense reasoning abilities and one more directly connected to research and advances in AI and NLP, two well-known deficiencies of the Turing Test.

Reframing The Turing Test

While the Turing Test is based on free-form conversation, the Winograd Schema Challenge poses a set of multiple-choice questions that implicitly draw on a human’s or a machine’s common-sense knowledge of the world. An example: ‘The town councilors refused to give the angry demonstrators a permit because they advocated violence.’ Who advocated violence?

  • Answer A: the town councilors
  • Answer B: the angry demonstrators

For humans, the answer is clearly B. Most AI systems, however, lack the everyday common-sense knowledge needed to support even such simple conclusions. Nuance is currently sponsoring a yearly competition to motivate the research and development of intelligent systems that can pass the Winograd Schema Challenge. At the upcoming AAAI 2015 national conference on AI, it will be the first alternative to the Turing Test to be pursued at the workshop on ‘Beyond the Turing Test’.
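The structure of a Winograd schema can be sketched in a few lines of code. The representation below is our own illustration, not the challenge’s official format; the key property it captures is that swapping a single ‘special’ word flips which answer is correct, which is what defeats purely statistical guessing.

```python
# Illustrative sketch of a Winograd schema; field names are our own
# invention, not part of the official challenge format.
from dataclasses import dataclass

@dataclass
class WinogradSchema:
    sentence: str   # contains a {word} slot and an ambiguous pronoun
    question: str
    answers: tuple  # (answer A, answer B)
    correct: dict   # special word -> index of the correct answer

schema = WinogradSchema(
    sentence="The town councilors refused to give the angry demonstrators "
             "a permit because they {word} violence.",
    question="Who {word} violence?",
    answers=("the town councilors", "the angry demonstrators"),
    # Changing one word flips the referent of 'they':
    correct={"advocated": 1, "feared": 0},
)

for word, idx in schema.correct.items():
    print(schema.question.format(word=word), "->", schema.answers[idx])
```

A system that resolves the pronoun by word co-occurrence alone will tend to answer both variants the same way; common-sense knowledge about councilors, demonstrators, and permits is what separates the two cases.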

At a time in which public interest in AI is growing, however, expectations must be tempered to reflect current technological limitations. For instance, last year’s film ‘Her’ focused on the relationship between a man and his operating system, Samantha. Samantha was conversational, fully understood the context of a conversation, and, as she evolved, became more intelligent in ways that far exceed the current state of the art. She could take on complex tasks, engage in flexible reasoning without any obviously predetermined responses, and even experience emotion.

What is currently possible, however, are virtual personal assistants that interact with us in natural ways through language and simplify our daily lives by removing some everyday tedium in well-defined domains. Current speech understanding systems are fairly robust at handling individual queries, and they are becoming more adept at joining us in extended conversations.

The Amplification Of Human Intelligence

At Nuance, we are developing next generation conversational interfaces that exploit the efficient properties of language: by maintaining a context of the past conversation, such systems can correctly interpret user queries that are more compact. Conventional ‘one-shot’ systems of the sort we routinely use for web search require that a user be explicit regarding all of the information relevant to a query; and successive queries cannot be related in any way. This makes interactions more tedious than they need to be. We are also seeing the lexicon and range of understanding for such virtual personal assistants continuing to expand at an exponential rate, leveraging Big Knowledge as well as Big Data.
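The context-carrying idea can be made concrete with a toy sketch (our own illustration, not Nuance’s implementation): a dialogue state that remembers slot values from earlier turns, so a compact follow-up query need only supply what has changed.

```python
# Toy dialogue-context sketch: follow-up queries inherit slot values
# from earlier turns instead of restating everything, in contrast to
# 'one-shot' systems where each query must be fully explicit.

class DialogueContext:
    def __init__(self):
        self.slots = {}  # values remembered from earlier turns

    def interpret(self, query):
        """Complete a query by filling omitted slots from context."""
        parsed = dict(self.slots)  # start from what we remember
        parsed.update(query)       # new values override old ones
        self.slots = parsed        # carry forward for the next turn
        return parsed

ctx = DialogueContext()
# Turn 1: fully explicit, as a one-shot query would have to be.
print(ctx.interpret({"intent": "weather", "city": "London", "day": "today"}))
# Turn 2: "what about tomorrow?" -- compact, completed from context.
print(ctx.interpret({"day": "tomorrow"}))
```

Real systems must also decide *when* context still applies (a topic change should clear it), but even this crude carry-over shows why the second query can be so much shorter than the first.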

Systems are becoming more knowledgeable about both the social and the physical world, so that their behaviour is more humanlike and less likely to frustrate users with stilted, unnatural conversation or to end in failure. The more they experience and interact with humans, the faster they learn and the more intelligent they become. By expanding the more cognitive elements of their reasoning and by maintaining contextual awareness, systems no longer merely react to a command; they can anticipate the needs of their users and behave in a more responsive fashion.

Our research in NLP has also centred on merging data-driven and linguistically motivated approaches to enable machines and applications that not only recognise the spoken word but can understand it, derive meaning from it and, ultimately, act on it in service of a human user. Ongoing investment is taking place at Nuance’s research laboratory dedicated to the advancement of NLP and AI technologies, and at Natural Language research sites around the globe.

Nuance is developing additional improvements to these systems along a number of dimensions, including the introduction of more sophisticated noise processing, the use of multi-microphone arrays capable of forming directional beams that focus in on the user, and the application of voice biometrics to tell the intended user apart from interfering voices. Additionally, the idea of associating an intelligent assistant with a single device is already a thing of the past; instead, our assistants are already manifesting themselves through a variety of hardware that we utilise throughout the day.

These AI-driven virtual assistants will simplify the often-overwhelming spectrum of content, services and capabilities that we have access to through phones, PCs, tablets, TVs, cars, apps – and now watches, thermostats, and an expanding array of consumer electronics.

For example, we recently worked with international pop star Will.i.am to create ‘AneedA’, a virtual personal assistant that will be embedded into his new smart band range. AneedA is designed to simplify a user’s day: scheduling appointments, making calls, posting on Facebook and more, all actioned through natural speech. Moving forward, to make these devices even more capable, intelligent and useful, it is critical that these attributes be matched by the ability to reason and to understand what you and I want, need and expect from them.

The areas for future technical collaboration in NLP and AI are potentially far reaching, ranging from speech enabled human-robot collaborative systems to intelligent user interfaces to very large scale knowledge management. Ultimately, the real promise of AI — at least as we see it — is not the creation of artificial companions, but an Amplification of Intelligence (ours) through the creation of amazing and transformative tools.

Charles Ortiz is Senior Principal Manager of the Artificial Intelligence and Reasoning Group at the Nuance Natural Language and AI Laboratory in Sunnyvale, CA.