Forget the Turing test. This is a better way to measure artificial intelligence.

How can we tell if this robot has artificial intelligence?
Shutterstock

Ever since 1950, one of the most popular measuring sticks of artificial intelligence has been the Turing test — named after mathematician Alan Turing. The idea is that a program with some kind of artificial intelligence should be able to use text-based chatting to convince more than 30 percent of people that it’s a human being. In June 2014, researchers claimed that a chatbot named Eugene Goostman did just that.

Nowadays, however, many experts are questioning whether the Turing test is really the best test. Eugene Goostman posed as a 13-year-old boy, and tricking people into believing that is definitely an achievement. But it's not necessarily the ideal display of true, humanlike thought.

So what would be a better test for artificial intelligence? One front-runner is an exam that relies on common sense. Specifically, the test would be built from something called Winograd schemas. Because Winograd schemas rely on cultural knowledge, they're super easy for people and difficult for computers.

How to test computers for common sense


The test would take the form of a multiple-choice reading-comprehension quiz. But the text itself would have some very specific features. It would consist of Winograd schemas: pairs of sentences whose intended meaning can be flipped by changing just one word. They generally hinge on an ambiguous pronoun or possessive. A famous example comes from Stanford computer scientist Terry Winograd:

  • “The city councilmen refused the demonstrators a permit because they feared violence. Who feared violence?” 1) The city councilmen 2) The demonstrators

And:

  • “The city councilmen refused the demonstrators a permit because they advocated violence. Who advocated violence?” 1) The city councilmen 2) The demonstrators

Most human beings can easily answer these questions. We use our common sense to figure out what “they” refers to in each case, and that common sense combines extensive cultural background knowledge with analytical skills. (In the first question, we can deduce that the city councilmen feared violence. In the second, the demonstrators advocated violence.)

For computers, however, these questions can be quite difficult. Grammatically, “they” is ambiguous: in both sentences, it could refer to either the councilmen or the demonstrators.

A computer could have access to all of Google and still not really be able to grasp that city councilmen are probably less likely to advocate violence than demonstrators. It’s simply less culturally appropriate for councilmen to do so. But you’re not going to find that in the dictionary under “city councilmen.”
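To make the structure concrete, here is a minimal sketch of how one schema could be stored as data: one sentence template, two swappable words, two candidate referents, and which candidate is correct for each word. The class and field names are hypothetical, chosen for illustration, and are not the format of any official Winograd schema collection.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WinogradSchema:
    """A sentence with one slot, two swappable words, and two candidate referents.

    Hypothetical illustrative format, not an official dataset schema.
    """
    template: str                # sentence with "{}" where the special word goes
    question_template: str       # question with "{}" for the same word
    candidates: tuple[str, str]  # the two things the pronoun could refer to
    words: tuple[str, str]       # the special word and its alternate
    answers: tuple[int, int]     # index into candidates, one per word

    def instances(self):
        """Yield (sentence, question, correct_candidate) for each word variant."""
        for word, answer_idx in zip(self.words, self.answers):
            yield (
                self.template.format(word),
                self.question_template.format(word),
                self.candidates[answer_idx],
            )


councilmen = WinogradSchema(
    template="The city councilmen refused the demonstrators a permit because they {} violence.",
    question_template="Who {} violence?",
    candidates=("The city councilmen", "The demonstrators"),
    words=("feared", "advocated"),
    answers=(0, 1),  # swapping the word flips the correct referent
)

for sentence, question, answer in councilmen.instances():
    print(sentence, question, "->", answer)
```

The point the structure captures is that the two questions are identical except for one word, yet their correct answers differ.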

Here are some more Winograd schemas, from a growing, open collection of more than 100:

  • The trophy doesn’t fit into the brown suitcase because it’s too [small/large]. What is too [small/large]? Answers: The suitcase/the trophy.
  • Jane gave Joan candy because she [was/wasn’t] hungry. Who [was/wasn’t] hungry? Answers: Joan/Jane.
  • The woman held the girl against her [chest/will]. Whose [chest/will]? Answers: The woman’s/the girl’s.
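The bracket notation in these examples packs two questions into one line: pick the first word in every bracket for one version, the second word for the other. A minimal sketch of that expansion, assuming the simple “[a/b]” format shown here (the function names are hypothetical):

```python
import re

# Matches a "[first/second]" alternation like the ones in the examples.
ALTERNATION = re.compile(r"\[([^/\]]+)/([^\]]+)\]")


def expand(text: str) -> tuple[str, str]:
    """Return both concrete versions of a bracketed schema.

    Every "[a/b]" becomes "a" in the first version and "b" in the second,
    so the sentence and its question always use the same word.
    """
    first = ALTERNATION.sub(lambda m: m.group(1), text)
    second = ALTERNATION.sub(lambda m: m.group(2), text)
    return first, second


def split_answers(answers: str) -> tuple[str, str]:
    """Split an answer key such as "The suitcase/the trophy" into its two parts."""
    a, b = answers.split("/", 1)
    return a.strip(), b.strip()


schema = ("The trophy doesn't fit into the brown suitcase because it's too "
          "[small/large]. What is too [small/large]?")
for version, answer in zip(expand(schema), split_answers("The suitcase/the trophy")):
    print(version, "->", answer)
```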

In 2011, University of Toronto computer scientist Hector Levesque proposed using a bunch of multiple-choice Winograd schemas as an alternative to the Turing test.

Levesque said the test should use Winograd schemas that are simple for humans to solve, and that they shouldn’t be “Google-hackable.” Basically, a computer shouldn’t be able to answer a question just by analyzing how often certain words appear together in a large collection of English-language texts (aka the Internet).

For instance, Levesque gives the example “The racecar zoomed by the school bus because it was going so [fast/slow].” There’s probably a much stronger statistical association between “fast” and “racecar” than between “fast” and “school bus,” so a computer could answer correctly from word statistics alone. That makes the question not hard enough. Instead, he suggests using a similar question that compares the school bus with a delivery truck.
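Here is a minimal sketch of the kind of shortcut Levesque wants to rule out: ignore the sentence entirely and pick whichever candidate co-occurs most often with the special word. The counts below are invented purely for illustration; a real system would estimate them from a large text corpus.

```python
# Invented co-occurrence counts, for illustration only: how often each
# (noun, word) pair might show up near each other in a big pile of text.
COOCCURRENCE = {
    ("racecar", "fast"): 900,
    ("racecar", "slow"): 40,
    ("school bus", "fast"): 60,
    ("school bus", "slow"): 300,
}


def guess_by_association(candidates, word, counts):
    """Pick the candidate that co-occurs most often with the special word.

    This never reads the sentence; it only looks up counts. With no data
    (all counts zero), max() falls back to the first candidate.
    """
    return max(candidates, key=lambda noun: counts.get((noun, word), 0))


candidates = ("racecar", "school bus")
for word in ("fast", "slow"):
    print(word, "->", guess_by_association(candidates, word, COOCCURRENCE))
```

On the racecar question, this shortcut gets both versions right without any understanding. Swap the racecar for a delivery truck and there is no obvious word association to exploit, which is exactly why Levesque prefers that version.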

Levesque has laid out several reasons why a Winograd schema test could be better than a Turing test. “A machine should be able to show us that it is thinking without having to pretend to be somebody,” he writes. “Our WS challenge does not allow a subject to hide behind a smokescreen of verbal tricks, playfulness, or canned responses.” And, unlike the Turing test, which is scored by a panel of human judges, he notes that grading a Winograd schema test is completely non-subjective.
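Because every question has exactly one right answer, grading reduces to comparing a program’s choices against an answer key. A minimal sketch, assuming answers are recorded as option numbers (1 or 2) keyed by a question ID; the IDs here are hypothetical:

```python
def score(responses: dict[str, int], answer_key: dict[str, int]) -> float:
    """Return the fraction of questions answered correctly.

    Both dicts map a question ID to an option number (1 or 2). A question
    missing from responses counts as wrong. No judgment calls are involved.
    """
    if not answer_key:
        raise ValueError("answer key is empty")
    correct = sum(1 for qid, answer in answer_key.items()
                  if responses.get(qid) == answer)
    return correct / len(answer_key)


key = {"councilmen-feared": 1, "councilmen-advocated": 2}
print(score({"councilmen-feared": 1, "councilmen-advocated": 1}, key))  # 0.5
```

Unlike a panel of Turing-test judges, two people running this on the same responses will always get the same number.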

Will computers ever pass this new Winograd schema test?

In the past, programmers have often tried to devise computers that could pass a Turing test. The Loebner Prize, for example, offers a top award of $100,000 for a chatbot that can convince judges it’s human during a five-minute period involving both text and audio.

And now there’s a new competition on the scene. The Winograd Schema Challenge will have its first annual competition in 2015 and will offer $25,000 for any computer program that can reach human levels of performance on a test of at least 40 such puzzles that the computer has never seen before. The competition is organized by computer-science nonprofit Commonsense Reasoning and funded by computer-software company Nuance Communications.

And how far away are computers from achieving this goal? Altaf Rahman and Vincent Ng, of the University of Texas at Dallas, used machine-learning techniques on 30 similar questions and got to an accuracy of 73 percent. Not bad. But any reasonably intelligent person should get 100 percent correct, so there’s still a ways to go.
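It helps to put the 73 percent figure next to two reference points: pure guessing and a perfect human score. A quick back-of-the-envelope sketch, taking the article’s figures (30 questions, 73 percent) at face value and assuming two answer choices per question:

```python
QUESTIONS = 30            # number of questions, as reported in the article
REPORTED_ACCURACY = 0.73  # 73 percent, as reported in the article

# With two answer choices per question, random guessing is right
# about half the time on average.
chance_accuracy = 1 / 2

# 73 percent of 30 questions, rounded to a whole number of questions.
approx_correct = round(REPORTED_ACCURACY * QUESTIONS)

print(f"chance baseline: {chance_accuracy:.0%}")
print(f"reported: about {approx_correct} of {QUESTIONS} correct "
      f"({approx_correct / QUESTIONS:.1%})")
print(f"human target: {QUESTIONS} of {QUESTIONS} correct")
```

So the system does clearly better than a coin flip (roughly 22 of 30 versus about 15), but it still misses around 8 questions a person would get right.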

Further watching:

Alan Turing did many other (arguably more) important things in his life than come up with the Turing test, including building an early computer that he used to break encrypted German messages during World War II. That work is the focus of the movie The Imitation Game, starring Benedict Cumberbatch, which was released in theaters on November 28.

Further reading:

A computer just passed the Turing test — but no, robots aren’t about to take over

8 things you didn’t know about Alan Turing

Code-breaker: The life and death of Alan Turing
