KEN JENNINGS AND BRAD RUTTER WERE TRICKED BY WATSON — THEY SHOULD DEMAND A REMATCH

Published in codeburst · 53 min read · Dec 27, 2017

This article is going to be a bit lengthy, as I’m going through each of the questions (sorry, answers) given in the Jeopardy match between Ken Jennings, Brad Rutter and Watson.

I show here that a simple algorithm using some primitive word manipulation and Google search can pretty much equal Watson’s performance. After analyzing all the questions, one can see that Watson is only really good at fast lookup, and if all the questions had required actual understanding it would have scored zero.

I want to point out that I’m not saying these kinds of machines are not useful; they are, and will be even more so in the future. However, I am emphasizing that these are just machines and they are not really intelligent in the sense of having understanding of the situation and being able to make deductions and conclusions based on context, knowledge and common sense. They are just machines that mechanically arrive at an answer whether it makes sense or not.

I have read articles where people have been gushing over Watson’s performance, saying it understood nuance and metaphor. I think these people have been deceived. As we shall see it is nothing like that. It is just a clever mechanical process. A party trick.

Just for a start let me say that I’m not going to go along with the Jeopardy trope of them giving the answer and you supplying the question. It just gets too confusing when you are trying to discuss these questions in detail. So I’m calling them questions, and what the contestants respond with are answers. I’m also dropping the ‘Who is’ and ‘Where is’ or whatever is needed to twist the answer into a question. None of this really changes anything.

I describe here an algorithm (a step-by-step procedure) to answer Jeopardy questions. There is nothing intelligent here; it is just mechanical. This kind of algorithm just processes groups of characters separated by spaces and punctuation marks, i.e. groups of characters that we humans call words. The algorithm processes the question words and attempts to come up with a list of possible answers (e.g. the list could be people, which of course means famous people, including fictional characters). The algorithm then goes through each item in the list and calculates a numeric score for it. After the whole list has been processed the scores are compared to see if one of them is significantly better than the others.

Now to show how mechanical Watson is and how mechanical its answers were we are going to simulate such an algorithm using simple Google searches instead of specialized high performance databases that Watson undoubtedly uses. We could actually build software to implement it, but that’s a lot of work and doesn’t really gain us anything. It is enough to show how the algorithm works and be convinced that given a bit of programming and the right hardware it would be able to compete in a real Jeopardy game.

We are going to base our simulated algorithm on Google search results and the score will be based on the number of times the list item we are evaluating occurs in the first page of search results. We award bonus points for different ways it can appear in the search results. Each occurrence of the item in the search results gets a number of points. For just being anywhere on the search page it gets a point, it gets bonus points for being in the actual large font underlined text that is a link to a website, and it gets added bonus points for being close to actual words from the question. No understanding here, just taking a page of search results and doing a calculation using the words in that page and the possible answer we are evaluating. (For you programming types think curl piped into grep — that kinda stuff).

For example consider one of the actual Watson match questions, “From 1911 to 1917, this Romantic Russian composed “Etudes-Tableaux” for piano”

If you search for “1911 1917 composed “Etudes-Tableaux” piano” you see that Rachmaninov occurs 9 times in the first page of Google search results. Five of those 9 occur in the actual titles of the search results. 4 of those title occurrences are the first words in the title. One of them is the full name Sergei Rachmaninov, and there are others that give an alternate spelling i.e. Rachmaninoff. We assume that in our list are any variant spellings and they are all treated equally.

Because we are going to be using Google searches for our algorithm, let’s name it GoogleGuy, or GG for short. Giving Watson a person’s name is a way to anthropomorphize the algorithm and trick us into using common thinking patterns that we use for people. That helps us view it as a sentient being with reasoning power. So, being just as shameless as IBM, we will do the same and see how GG compares to Watson.

So first, how do we come up with a list? This is done by looking for what we will call collection words. (I’m avoiding the word category to avoid confusion with the Jeopardy categories that group the questions.) These are words that describe something that can be listed. For example, words for occupations: architect, rapper, painter, poet, author etc. There are collections of things like paintings, frescoes, common phrases, cities, rivers etc. There are more abstract collections like art periods, eras (think bronze age, stone age) and philosophies/movements. All of these words can be associated with a collection. We remove possessives and plurals so that man, man’s, and men all lead to the list of people. We assume that GG has a list of such collection words, and for each of these words it has an associated collection of things they describe. So for example if the collection word is ‘lakes’ or ‘lake’ then we have an associated list of all the well-known lakes in the world. If we wanted to actually generate such collections we could get them via a Google search. If you search for ‘list of lakes’ you get a Wikipedia article for a list of lakes that gives exactly the kind of collection we want.
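For the programming types, here is a minimal Python sketch of the lookup just described: strip possessives and plurals, then map the word to a collection. The table `COLLECTIONS`, the `IRREGULAR` plurals and the function names are all made up for illustration, and the suffix stripping is deliberately crude; a real GG would need a far bigger table.

```python
import re
from typing import Optional

# Hypothetical, tiny stand-in for GG's table of collection words.
COLLECTIONS = {
    "man": "people",
    "woman": "people",
    "guy": "people",
    "gal": "people",
    "city": "cities",
    "lake": "lakes",
    "fresco": "art works",
    "instrument": "instruments",
}

# Plurals that simple suffix stripping gets wrong (assumed list).
IRREGULAR = {"men": "man", "women": "woman", "frescoes": "fresco"}


def normalize(word: str) -> str:
    """Lower-case a word and strip possessive and plural endings."""
    w = word.lower().replace("\u2019", "'")  # treat curly apostrophes like straight ones
    w = re.sub(r"'s$|'$", "", w)             # man's -> man, lakes' -> lakes
    if w in IRREGULAR:
        return IRREGULAR[w]
    if w.endswith("ies") and len(w) > 4:
        return w[:-3] + "y"                  # cities -> city
    if w.endswith("s") and not w.endswith("ss"):
        return w[:-1]                        # lakes -> lake
    return w


def collection_for(word: str) -> Optional[str]:
    """Return the name of the collection a word points to, or None."""
    return COLLECTIONS.get(normalize(word))
```

With this table, `collection_for("man’s")` and `collection_for("men")` both give `"people"`, while a word like "Olympics" gives `None`.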

So let us describe GG’s algorithm in detail. You see I don’t want to wave my hands here and say magic happens — because it doesn’t. This is the opposite of magic. What I want to describe here is a pedestrian step-by-step procedure of how to get the Jeopardy answer by doing some simple fiddling with words, generating a bunch of Google searches and counting words in the search results. When you look at these steps you will be amazed at the simplicity.

Step 1: GG compares the words in the Jeopardy category to its list of collection words. For example the very first category in the Watson match was “LITERARY CHARACTER APB.” For an algorithm designed for Jeopardy, ‘literary characters’ would be one of the main collection words. In this case we would use the two words ‘literary character(s)’ together, as the word ‘character’ on its own has many meanings, which as we will see can easily confuse algorithms. Most collection words are a single word, e.g. mountain, planet, mineral, etc. But this would be a special case, as literary characters are a mainstay of Jeopardy.

If we come up with a collection word GG keeps it as a secondary collection. Below it tries to derive a collection from the words in the question but this one is the backup if that fails.

Step 2: The next step is for GG to process the words in the question to derive a collection. If we determine a collection from the question then this overrides the collection determined from the category. This is done by a mechanical process:

a) Look for an occurrence of the word ‘this’ or ‘these’. Take the words after that word up to a punctuation mark (which includes quote marks) or a preposition (assume we have a list of prepositions) such as ‘of’, ‘from’, ‘about’, ‘over’, ‘under’ etc. Now take those words and see if any is a collection word. If one is, then that’s the collection we work with.

For example consider one of the questions asked in the match:

Milorad Cavic almost upset this man’s perfect 2008 Olympics, losing to him by one hundredth of a second

So for the question above the words after ‘this’ and before the comma are “man’s perfect 2008 Olympics.” None of the words ‘perfect’, ‘2008’ or ‘Olympics’ are a collection but “man’s” is. So this question leads us to our collection of famous people.
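Step 2 a) is easy to sketch in Python. What follows is only an illustration under assumptions: the preposition list and the little `COLLECTION_WORDS` table are invented stand-ins, and periods are deliberately not treated as stopping punctuation so that abbreviations like "U.S." survive.

```python
import re
from typing import List, Optional

# Assumed, incomplete list of prepositions that end the word run.
PREPOSITIONS = {"of", "from", "about", "over", "under", "to", "for", "on", "in", "by"}

# Hypothetical mini-table of collection words.
COLLECTION_WORDS = {
    "man": "people",
    "woman": "people",
    "guy": "people",
    "gal": "people",
    "city": "cities",
    "instrument": "instruments",
}


def words_after_this(question: str) -> List[str]:
    """Words following the first 'this'/'these', up to punctuation or a preposition."""
    match = re.search(r"\b(?:this|these)\b(.*)", question, flags=re.IGNORECASE)
    if match is None:
        return []
    # Cut at the first punctuation mark, counting double quote marks as punctuation.
    tail = re.split(r'[,;:!?"“”]', match.group(1), maxsplit=1)[0]
    words = []
    for word in tail.split():
        if word.lower() in PREPOSITIONS:
            break
        words.append(word)
    return words


def find_collection(question: str) -> Optional[str]:
    """Return the collection named by the first collection word found, if any."""
    for word in words_after_this(question):
        key = re.sub(r"[’']s$", "", word.lower().strip("."))  # drop possessive
        if key in COLLECTION_WORDS:
            return COLLECTION_WORDS[key]
    return None
```

Run on the Milorad Cavic question, `words_after_this` returns the four words “man’s”, “perfect”, “2008” and “Olympics”, and `find_collection` returns `"people"`.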

b) If there is a quote directly before the ‘this’, directly after it, or both, keep an alternate strategy of treating this question as a fill in the blank.

For example consider this other question taken from the Watson match:

“And anytime you feel the pain, hey” this guy “refrain, don’t carry the world upon your shoulders”

The word after the ‘this’ is ‘guy’ so that gives it the people collection again. There are quotes before and after the ‘this’.

So we can treat this as : “And anytime you feel the pain, hey ___ refrain, don’t carry the world upon your shoulders”

Part b) is a backup to part a). If we can’t resolve an answer using step 2 a) and step 3, we take the quote from 2 b) and run it through step 3, but in that instance we search the search results for occurrences of the sentence, ignoring the missing word(s). If we get hits with the same missing word(s) in them, we return the missing word(s) as the answer.

In the example above we would find several occurrences of the words “And anytime you feel the pain, hey Jude refrain, don’t carry the world upon your shoulders” The word ‘Jude’ is the missing word in all of those occurrences. So ‘Jude’ is GG’s answer.
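The fill-in-the-blank fallback is little more than a regular expression. Here is a rough Python sketch for the case where there is quoted text on both sides of the blank. The function name, the `max_words` and `min_hits` parameters, and the idea of passing in search snippets as plain strings are my assumptions for illustration.

```python
import re
from collections import Counter
from typing import List, Optional


def fill_blank(before: str, after: str, snippets: List[str],
               max_words: int = 3, min_hits: int = 2) -> Optional[str]:
    """Find the word(s) that most often sit between `before` and `after`."""
    pattern = re.compile(
        re.escape(before) + r"\s+(.+?)\s+" + re.escape(after),
        flags=re.IGNORECASE,
    )
    counts = Counter()
    for snippet in snippets:
        for match in pattern.finditer(snippet):
            missing = match.group(1).strip(",.;:!?")
            # Only short gaps count as a plausible missing word or name.
            if missing and len(missing.split()) <= max_words:
                counts[missing] += 1
    if not counts:
        return None
    answer, hits = counts.most_common(1)[0]
    return answer if hits >= min_hits else None
```

Given a handful of snippets that contain the full lyric, calling it with the text before and after the blank hands back the word that fills the gap, with no idea that it is a name or a song.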

c) If the question starts with “It’s” or “It is” or ‘A’ then treat this as a ‘this’ or ‘these’. Only use this if a) and b) don’t work.

Step 3:

If GG has derived a collection using step 2 then it does a Google search on all combinations of the words in the question. It treats each quote as a single word and puts it into searches with the quotation marks intact. It omits simple connective words like ‘the’, ‘of’, ‘and’.
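Generating the searches is pure bookkeeping. The sketch below, in Python, splits a question into terms (keeping each quoted phrase as one term), drops connective words, and lists every combination of two or more terms. The stop-word list goes beyond the three words named above and is my own guess, as are the function names and the `min_terms` parameter.

```python
import re
from itertools import combinations
from typing import List

# 'the', 'of', 'and' come from the text; the rest are assumed additions.
STOPWORDS = {"the", "of", "and", "a", "an", "to", "for", "from", "this", "these"}


def search_terms(question: str) -> List[str]:
    """Split a question into terms, keeping each quoted phrase as one term."""
    tokens = re.findall(r'“[^”]*”|"[^"]*"|[^\s,;:!?]+', question)
    return [t for t in tokens if t.lower() not in STOPWORDS]


def all_queries(question: str, min_terms: int = 2) -> List[str]:
    """Every combination of at least `min_terms` terms, in question order."""
    terms = search_terms(question)
    queries = []
    for size in range(min_terms, len(terms) + 1):
        for combo in combinations(terms, size):
            queries.append(" ".join(combo))
    return queries
```

For the Etudes-Tableaux question this leaves seven terms and 120 queries, one of which is exactly `1911 1917 composed “Etudes-Tableaux” piano`. The count doubles with every extra term, so a real implementation would want to cap it.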

GG then takes every member of the collection list (except for any that explicitly occur in the question as Jeopardy never puts answers in the question) and calculates a score for that member of the list.

The score is calculated as follows:

If the answer appears as the first part of an actual result title, i.e. one of the headings listed in a larger font by Google that you can click on to go to the referenced article, then that occurrence counts as 5. If it is in that link but not the first word(s) then it counts as 3. Otherwise it counts as 1. If the answer you are testing occurs in the search results next to some of the words in the question, give it an extra point for each such occurrence. The score is the sum of the counts for all occurrences.
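As a rough Python sketch of this scoring rule: I assume each search result has been reduced to a title and a snippet (the `Result` class is hypothetical), and I read "next to" as "the word immediately before or after the occurrence". Both are my assumptions, not anything a search engine hands you directly.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List

PUNCT = '.,;:!?"“”()'


@dataclass
class Result:
    """One search hit, reduced to its clickable title and its text snippet."""
    title: str
    snippet: str


def _next_to_question_word(text: str, match: re.Match, qwords: set) -> int:
    """1 if the word just before or just after the occurrence is a question word."""
    before = text[:match.start()].split()
    after = text[match.end():].split()
    neighbours = before[-1:] + after[:1]
    return 1 if any(w.strip(PUNCT) in qwords for w in neighbours) else 0


def score(candidate: str, results: List[Result], question_words: Iterable[str]) -> int:
    """Sum the points for every occurrence of `candidate` in the results."""
    cand = re.escape(candidate.lower())
    qwords = {w.lower() for w in question_words}
    total = 0
    for result in results:
        title = result.title.lower()
        for m in re.finditer(cand, title):
            total += 5 if m.start() == 0 else 3   # first in title: 5, elsewhere in title: 3
            total += _next_to_question_word(title, m, qwords)
        snippet = result.snippet.lower()
        for m in re.finditer(cand, snippet):
            total += 1                            # anywhere else on the page: 1
            total += _next_to_question_word(snippet, m, qwords)
    return total
```

There is nothing here but string matching and addition; the candidate could be a composer or a brand of soap and the arithmetic would be the same.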

If the answer has a score that is significantly higher than any other then GG will choose it; otherwise it doesn’t answer. (I’m being purposefully vague on the definition of significantly higher as I’m not sure what it really should be. As we shall see, in most cases it’s pretty obvious: one of them scores 13 and all the others that even register are around 1 or 2. But I don’t want to say it has to be 8 times bigger, or twice, or whatever. Let’s just be reasonable about it. If we were serious about actually implementing GG as software we could experiment and come up with a significance test that works in most circumstances.)

As an example of a score, consider the following question from the match: “From 1911 to 1917, this Romantic Russian composed “Etudes-Tableaux” for piano”. GG would, as discussed above, get the collection word ‘Russian’. This collection word leads to GG’s list of famous people. Running them all by the search results for “1911 1917 composed “Etudes-Tableaux” piano” gives Rachmaninov a score of 29. (There was a bonus point for Rachmaninov occurring next to ‘composed’.) There are mentions of other people: Sergei Bertensson, Jay Leyda and Sophia Satina, who are biographers, and “Biret”, a pianist, but they don’t come close to matching Rachmaninov’s score. I’m not sure these names would even be in GG’s collection, as I doubt that even the most knowledgeable Jeopardy contestant has heard of them. However, if they were, their score would be a small fraction of the score for the answer ‘Rachmaninov’, which consequently would be chosen. Of course if GG were a little more sophisticated it would be sieving the collections to come down to just composers.

Step 4:

This is just a modification for the case of Final Jeopardy. In Final Jeopardy there is no downside to guessing. As you have already wagered an amount, leaving the question unanswered is an automatic loss. So no matter how low the probability of your answer being right, it is better to give one than none. So in this case we throw away the ‘threshold’ part that says one answer’s score has to be significantly better than the others. That was to minimize wrong answers so we didn’t get penalized all the time. In Final Jeopardy you just go with whatever you’ve got.
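The selection rule from Step 3 and the Final Jeopardy exception fit in one small Python function. Since the text deliberately leaves "significantly higher" undefined, the `ratio=2.0` default here is purely a placeholder of mine, as are the function and parameter names.

```python
from typing import Dict, Optional


def choose_answer(scores: Dict[str, int], ratio: float = 2.0,
                  final: bool = False) -> Optional[str]:
    """Pick the top-scoring candidate, or None to stay silent.

    `ratio` is an assumed stand-in for "significantly higher": the best
    score must be at least `ratio` times the runner-up's score.
    """
    if not scores:
        return None
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    best, best_score = ranked[0]
    if final:
        return best                      # Final Jeopardy: always answer
    runner_up = ranked[1][1] if len(ranked) > 1 else 0
    # max(..., 1) means a lone candidate still needs some evidence to be chosen.
    if best_score >= ratio * max(runner_up, 1):
        return best
    return None
```

So `choose_answer({"Rachmaninov": 29, "Biret": 1})` answers, while two candidates scoring 7 and 6 produce silence unless `final=True` is passed.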

Now before we begin looking at all the questions I want to point out again that this process is completely mechanical. It has absolutely no understanding of what is going on. It’s just a way of manipulating the words and the patterns of how often they occur in close vicinity to other words to try to determine the answer.

Now we can go through all the questions that were in the jeopardy games played between Watson, Ken Jennings and Brad Rutter and see how our algorithm compares to Watson and see how mechanical its answers seem to be.

I didn’t actually do all the Google searching and count the hits in the search results for each answer. For most of them I tried a few key words from the question to convince myself that the right answer was clearly indicated by the hit count. I could have written a program to do this by generating all the word combinations for the searches and for each doing a command line Google search and piping the words from the search page through a program to figure out the score for a possible answer. Generating collections would be a lot of work if I wanted to cover the whole Jeopardy possible answer space. However, as interesting as that project looked, it didn’t seem like it would add that much to this discussion. So I left it as an exercise for the reader if you happen to be a programmer and want to fool around with this.

I should mention that if you are fooling around with Google searches to test this out there is, unfortunately, another step to the processing of the search results that is a bit artificial. There has been considerable pollution of the search results by reporting on the games between Watson, Jennings and Rutter. If you search for complete questions you will more often than not get a link to the actual Watson Jeopardy game. Even the key words from some questions trigger a flood of such results, especially the Final Jeopardy question that Watson got wrong. So any hits in the search page that relate to the actual game are removed, as they are not real hits but artifacts of the intense interest this match has produced.
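If you wanted to automate that clean-up, one crude way is to throw out any hit that mentions the match itself. The Python sketch below treats each hit as a plain string and drops it when at least two marker words appear; the marker list and the two-marker rule are my own guesses, chosen so that a page about Dr. Watson and Sherlock Holmes is not thrown out by mistake.

```python
from typing import List

# Assumed marker words for coverage of the Watson match.
MATCH_MARKERS = ("watson", "jennings", "rutter", "jeopardy")


def is_match_coverage(text: str, min_markers: int = 2) -> bool:
    """True if the text mentions at least `min_markers` of the marker words."""
    lowered = text.lower()
    return sum(marker in lowered for marker in MATCH_MARKERS) >= min_markers


def drop_match_coverage(hits: List[str]) -> List[str]:
    """Keep only the hits that do not look like reporting on the match."""
    return [hit for hit in hits if not is_match_coverage(hit)]
```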

What follows is a complete list of all the questions that were asked in the Watson matches. The answer is given after the question. If the answer has an asterisk that means that Watson got it right. If there is no asterisk then Watson didn’t answer. And if the correct answer is followed by two asterisks and another answer, then Watson responded with that other answer, which was wrong.
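If you want to check my tallies by machine, the notation is simple enough to parse. This Python sketch assumes each line has the shape `$value : question : answer`, with a wrong response written as the correct answer, two asterisks, then what Watson said, as in the listing that follows. The function names and the dictionary keys are made up for illustration.

```python
from typing import Dict, List, Tuple


def parse_clue(line: str) -> Dict[str, object]:
    """Split one '$value : question : answer' line into its parts."""
    value, rest = line.split(" : ", 1)
    question, answer = rest.rsplit(" : ", 1)
    answer = answer.strip()
    if "**" in answer:
        correct, given = answer.split("**", 1)
        outcome = "wrong"                 # Watson answered, incorrectly
    elif answer.endswith("*"):
        correct = given = answer[:-1]
        outcome = "right"                 # Watson answered correctly
    else:
        correct, given, outcome = answer, None, "no answer"
    return {
        "value": int(value.strip().lstrip("$")),
        "question": question.strip(),
        "correct": correct,
        "watson": given,
        "outcome": outcome,
    }


def watson_tally(lines: List[str]) -> Tuple[int, int]:
    """Return (number right, number wrong) for Watson over the given lines."""
    outcomes = [parse_clue(line)["outcome"] for line in lines]
    return outcomes.count("right"), outcomes.count("wrong")
```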

I’m just scoring the number of right and wrong answers. The dollar value of the question is ignored. I suppose one could become more cautious with higher value questions, as the amount is subtracted from your score if you get it wrong, but I couldn’t see any real understanding being gained by treating them differently, as we are really interested in how Watson compares to GG.

So let’s start. The first category is:

LITERARY CHARACTER APB

This is a category where Watson got all but one. Using step 1 we get a collection of ‘literary characters’ which leads to a people list — a list that includes all fictional characters. We remove Beowulf and Hrothgar from consideration as they occur in the question.

$200 : Wanted for a 12-year crime spree of eating King Hrothgar’s warriors; officer Beowulf has been assigned the case : Grendel*

When you do a search with King Hrothgar and Beowulf, the literary character Grendel appears 3 times on the first page.

GG :1 Watson: 1

$400 : His victims include Charity Burbage, Mad Eye Moody & Severus Snape; he’d be easier to catch if you’d just name him! : Voldemort

You might think that this question would be as easy as the first, but remember Harry Potter has a ton of characters, all of them somewhat major. So if we do all the word combinations of search terms, a lot of the characters in Harry Potter will occur on the first page. So Voldemort will not trigger a significantly higher score than a lot of other characters. So our algorithm is not able to determine an answer for this, nor was Watson. Getting this answer depends on understanding the allusion in ‘you’d just name him’, which of course we know means Voldemort because in the book it is dangerous to say his name. Understanding is not something Watson does.

GG : 1 Watson: 1

$600 : Wanted for general evil-ness; last seen at the tower of Barad-dur; it’s a giant eye, folks. Kinda hard to miss : Sauron*

In this one, if you Google “Barad-dur”, the literary character Sauron appears several times on the first page.

GG : 2 Watson: 2

$800 : Daily Double Wanted for killing Sir Danvers Carew; Appearance — Pale & Dwarfish; Seems to have a split personality : Hyde*

Sir Danvers Carew is one of those minor characters with a distinctive name. It is unique in that it immediately identifies the literary work as “Jekyll and Hyde” by Robert Louis Stevenson. However our algorithm, while processing all characters, gets to the characters Jekyll and Hyde and computes a score for each. Hyde wins because it occurs close to the words “Pale and Dwarfish” while Jekyll does not. No other literary character really registers.

GG :3 Watson: 3

$1000 : Wanted for stealing a loaf of bread in “Les Miserables”; really, really wanted, for other thefts too : Jean Valjean*

Running the literary characters leads to Jean Valjean. He just has more mentions when you search for the other words than any other literary character.

GG :4 Watson: 4

BEATLES PEOPLE

The word ‘People’ means that the answer could come from the collection of people which, of course, includes names from songs.

$200 : “And anytime you feel the pain, hey” this guy “refrain, don’t carry the world upon your shoulders” : Jude*

There is a ‘this’ in this question. Following our process you get the word guy which leads to the list of names. This overrides the collection from the category, but it is the same one. Jude is a song name, which comes up all over the search page when you put in the quoted text.

GG :5 Watson: 5

$400 : This title gal, “Children at your feet, wonder how you manage to make ends meet” : Lady Madonna*

Following our algorithm, the words after ‘this’ are ‘title gal’. ‘Gal’ leads us to the list of people. Lady Madonna has lots of hits when you search for the quoted text.

GG :6 Watson: 6

$600 : “Bang Bang” his “Silver hammer came down upon her head” : Maxwell’s Silver Hammer*

I think this is probably a wrong answer and I’m not sure why Jeopardy gave Watson credit for it. The character is Maxwell not his hammer. His name is actually Maxwell Edison, from the line “Maxwell Edison, majoring in medicine.” If you follow the fill in the blanks formula you find the missing bit is “Maxwell’s Silver Hammer”. So that is what Watson answered. However, this question is a bit different in that it put the word ‘his’ in there and that indicates that the answer should be the owner of the thing that came down upon her head rather than the thing itself. Watson just returned the missing words as it has no understanding of what is really wanted.

Step 1 of GG would have produced a collection of “people”. Step 2 doesn’t work as there is no ‘this’ or ‘these’. So the collection from step 1 is used. When you examine all the people in the collection against a search on the other words which are basically the two quoted phrases the name “Maxwell Edison” appears 4 times in the first page. So GG would have returned the answer ‘Maxwell Edison’ which is the correct one.

GG : 7 Watson: 6

$800 : She “Died in the church and was buried along with her name. Nobody came” : Eleanor Rigby*

When GG gets to Eleanor Rigby in the list then there are all kinds of hits with the search for the quoted lyrics.

GG: 8 Watson: 7

$1000 : “So I sing a song of love” this woman, also the name of John’s mother : Julia

Watson missed this one and I’m not sure why. GG would have picked out the word ‘this’ and derived the collection word ‘woman’. This leads to the list of people. Now “Julia” would be in the list as a name in a song title. So I’m going to give GG credit for this one.

GG: 9 Watson: 7

OLYMPIC ODDITIES

Not really a collection to be got from the category title. Neither word leads to one.

$200 : Milorad Cavic almost upset this man’s perfect 2008 Olympics, losing to him by one hundredth of a second : Michael Phelps*

GG extracts the words after ‘this’, i.e. ‘man’s perfect 2008 Olympics’. The only thing that leads to a list is ‘man’s’, which gives the list of people again. If you Google Milorad Cavic, the one famous person he comes up with most is Michael Phelps. In fact Michael Phelps is the only famous person he appears close to in the search results.

GG: 10 Watson: 8

$400 : In 1908 in this city U.S. flag-bearer Ralph Rose caused controversy by not lowering the flag when passing the king : London*

Now here the words after ‘this’ are ‘city U.S. flag-bearer Ralph Rose’. Then there is a verb. Taking those words one at a time, the only one that leads to a list is ‘city’. If you take the list of world cities and look at the search pages from words in the question, London comes up more than any other city.

GG: 11 Watson: 9

$600 : A 1976 entrant in the “Modern” this was kicked out for wiring his epee to score points without touching his foe : Pentathlon*

There are no words after ‘this’ and before the verb ‘was’. However there is a quote before the word ‘this’, so GG goes for a fill in the blanks. The phrase is “Modern ___”. GG looks for word combinations of that form. The phrase ‘Modern Pentathlon’ occurs many times.

GG: 12 Watson: 10

$800 : In the 2004 opening ceremonies a sole member of this team opened the Parade of Nations; the rest of his team closed it : Greece

The word after ‘this’ but before a verb is ‘team’. That’s kind of tough to get a list from. Watson is more sophisticated than GG and could have combined it with Olympic, i.e. ‘Olympic team’. Trouble is there are many kinds of Olympic teams. There are the country ones but also soccer teams, hockey teams etc. There aren’t many really stand-out words in the question either. Watson probably missed this because it requires more than just facts; it requires understanding. We understand that Greece always enters first and the host country always enters last. So when Greece was the host country it created the question’s scenario. But that is understanding neither Watson nor GG has. So both GG and Watson missed this.

GG: 12 Watson: 10

$1000 : It was the anatomical oddity of U.S. gymnast George Eyser, who won a gold medal on the parallel bars in 1904 : Missing a leg**leg

There is no ‘this’ but the question starts with “It was”, hence the list should be in the words ‘the anatomical oddity’, stopping at the ‘of’. There isn’t a collection of ‘anatomical oddities’ that makes sense for Jeopardy, so GG is lost and can’t answer. Watson clearly had a list associated with ‘anatomical’, probably parts of the body. Again it is a lack of any real understanding. A database will have all kinds of descriptions of George Eyser’s condition, but for Watson the only word that correlated with ‘anatomical’ was ‘leg’. The understanding that an ‘anatomical oddity’ was needed was beyond it.

GG: 12 Watson: 10

NAME THE DECADE

This is a category that Watson got none of. The title has the word ‘Name’ in it, which would lead us to think the list might be people’s names. GG would take the list of people and go through all the search results of words from each question. Nothing would really rise above the noise. So both GG and Watson miss all of these.

$200 : Disneyland opens & the Peace symbol is created : 1950's
$400 : The Empire State Building opens & the “War of the Worlds” radio broadcast causes a panic : 1930's
$600 : Klaus Barbie is sentenced to life in prison & DNA is first used to convict a criminal : 1980's
$800 : The first flight takes place at Kitty Hawk & baseball’s first World Series is played : 1900's
$1000 : The first modern crossword puzzle is published & Oreo cookies are introduced : 1910's**1920's

It is interesting in that Watson hazarded an answer for the last one. I’m not sure if it factors in the answers for previous questions. But it could have been seeing that all the answers to previous questions were decade starting years in the 20th century. That should have led to the right answer as all the dates that come up when you search for “The first modern crossword puzzle is published” are in the 1910’s. Not sure why Watson missed this, but it may have been on another logical tangent entirely. Both GG and Watson tanked on this category.

GG: 12 Watson: 10(1) (I put the number of wrongs in brackets beside the number of rights.)

FINAL FRONTIERS

The category title doesn’t really help. There is no collection to get from it. Frontiers has too many meanings.

$200 : It’s Michelangelo’s fresco on the wall of the Sistine Chapel, Depicting the saved and the damned : The Last Judgement*

GG detects the starting ‘It is’ and so applies 2 c). The words before the preposition ‘on’ are ‘Michelangelo’s fresco’. ‘Fresco’ leads to a list of art works which includes all famous frescoes. That leads to evaluating “The Last Judgement”, which, when you look at the search results for the question words, turns up many times.

GG: 13 Watson: 11(1)

$400 : From the Latin for “End”, this is where trains can also originate : Terminal**Finis

There are no words after ‘this’ and before the verb ‘is’. So GG will take the quoted term “End” and combine it with a blank. So it is looking for “End ___”, which is total confusion and leads nowhere.

This is one that Watson also got wrong. I think Watson is programmed to look for translations like ‘Latin for …” or “French for …”. The translations are quite often the answer. So it appears that Watson took the translation in Latin for ‘end’ and came up with ‘Finis’. It’s kind of a shot in the dark that often works. This time it didn’t. Both Watson and GG got it wrong.

GG: 13 Watson: 11(1)

$600 : To push one of these paper products is to stretch established limits : Envelope

There is a ‘these’ in the question. The words after it but before the verb ‘is’ are “paper products”. Can you really list all paper products? Is a book a paper product? Is a newspaper? GG just stalls here.

This is Watson’s worst kind of question. Something metaphoric that requires understanding. We have the metaphor ‘to push the envelope’ which comes from test pilots who were referring to the performance of an airplane when there were boundaries to its performance, maximum sharpness of a turn, maximum altitude etc. To relate this to a paper envelope is a very human thing and totally not something that you can really figure out in an algorithm.

GG: 13 Watson: 11(1)

$800 : It’s a 4-letter term for a summit; the first 3 letters mean a type of simian : Apex

GG picks up the starting “It is”; the words following it, before the preposition ‘for’, are “a 4-letter term”. Does GG associate ‘term’ with words? Probably not. ‘Term’ has so many meanings. GG is not clever enough to figure out ‘4-letter’, but even running all 4-letter words by the search results would not lead to any definite answer.

For Watson it was probably the word term. It’s a very ambiguous word that requires context to determine what particular meaning is being used. Also I’m not sure how much natural language parsing Watson can do to figure out what is wanted and also there is a conjunctive clause which seems to confuse Watson most of the time.

GG: 13 Watson: 11(1)

$1000 : Tickets aren’t needed for this “event”, a black hole’s boundary from which matter can’t escape : Event horizon*

GG would take the quoted “event” and make a fill in the blank “___ event”. Which is the reverse of what is wanted i.e. “Event ___”. So GG fails on this one.

Watson gets this one. The last phrase is a definition, and I think it looks for definitions to see if it can come up with a term that relates to the other words. So “Event Horizon” relates to ‘event’ and is defined by the last phrase.

GG: 13 Watson: 12(1)

ALTERNATE MEANINGS

None of the words in the category lead to a collection. Both are far too general. GG would miss this whole category. There is no ‘this’ and no list from the category.

$200 : 4-letter word for a vantage point or a belief : View

Watson missed this one because if you look up ‘vantage point’ there are too many words. I’m not sure if Watson was able to figure out that it should be 4 letters. Examples of that kind of thing in other questions it missed indicate that it cannot. It is also hard to figure out what the ‘or’ is connecting. Is it part of one definition or an alternative definition? The simplest approach is to treat it as two definitions, which is right in this case. However, even having two definitions is not much help in this case, as the phrases ‘vantage point’ and ‘belief’ have all kinds of meanings. This question just required too much understanding.

GG is just totally confused as it is beyond its simple processing.

GG: 13 Watson: 12(1)

$400 : 4-Letter word for the iron fitting on the hoof of a horse or a card-dealing box in a casino : Shoe*

If you take the or to mean two definitions then the second is just a ‘shoe’. We don’t say ‘card shoe’ or ‘dealer’s shoe’ just ‘shoe’, so that specifies the answer as you get all kinds of hits when you look for ‘iron fitting on the hoof…’ So I think Watson’s use of definitions probably got it this one. GG was confused as there was no ‘this’ or ‘these’ and no collection from the category.

GG: 13 Watson:13(1)

$600 : A piece of wood from a tree, or to puncture with something pointed : Stick*

Again just split on the or and Google the two definitions. The second definition is pretty specific. If you Google it Stick appears quickly and when you Google the first there is a high correlation to stick. So Watson got this one. GG tanked.

GG: 13 Watson: 14(1)

$800 : Stylish elegance, or students who all graduated in the same year : Class**Chic

Watson went for the definition here: ‘Chic’ is stylish elegance but has nothing really to do with students. I think here the definition was a short word, and on a calculated probability Watson took the algorithmic branch of giving it as the answer. GG was just lost.

GG: 13 Watson: 14(2)

$1000 : A thief, or the bent part of an arm : Crook

This is a tough one for Watson. It probably has a whole database of parts of the body, but ‘crook of the arm’ is very idiomatic and not in medical databases. Again GG was lost.

GG: 13 Watson: 14 (2)

DON’T WORRY ABOUT IT

The category doesn’t lead to any lists. So it doesn’t really help GG or Watson.

$400 : It’s just a bloody nose! You don’t have this hereditary disorder once endemic to European royalty : Hemophilia

The words after ‘this’ are “hereditary disorder once endemic to European royalty”. The list word is ‘disorder’, but it is questionable whether that would be related to a list of diseases. If it is then GG would get them all. If it isn’t then it gets none.

I was surprised Watson didn’t get this one. If you Google ‘disorder once endemic to European royalty’ the word Hemophilia appears. I think however the word ‘disorder’ might be confusing. Again it has many definitions. If the question had used a word like ‘disease’ Watson would probably have got it. However, you can see in the next answer that the word ‘disorder’ is used again and Watson answers correctly. I would speculate that Watson takes into account that the previous answer was the name of a disease and so tries the assumption that the collection word means ‘diseases’. So from here on it’s just a matter of trying out every disease and choosing the one with the highest correlation to the words in the clue. There seems to be some merit in this speculation because after missing the first question Watson runs the table. GG has no mechanism for using information from prior answers and hence continues to be mute.

$800 : You just need a nap! You don’t have this sleep disorder that can make sufferers nod off while standing up : Narcolepsy*
$1200 : You just need a little more sun! You don’t have this hereditary lack of pigment : Albinism*
$1600 : You’re just a little stiff! You don’t have this painful mosquito-borne joint illness with a Swahili name : Dengue Fever*
$2000 : It’s just acne! You don’t have this skin infection also known as Hansen’s Disease : Leprosy*

GG: 13 Watson: 18(2)

ETUDE, BRUTE

There is not much info in the category to figure out a collection.

$400 : An étude is a composition that explores a technical musical problem; the name is French for this : Study*

There is nothing after the ‘this’ and no quote before. GG would fail at this. Watson goes for the translation “French for this” and can detect that the only French word in the question is ‘étude’. That leads to the English word ‘study’. Again I think Watson goes for translations as they are usually the answer.

GG: 13 Watson: 19(2)

$800 : Heitor Villa-Lobos dedicated his “12 Etudes” for this instrument to Andres Segovia : Guitar*

GG finds the ‘this’ and the following word is ‘instrument’, as we stop at ‘to’ or ‘for’. That leads to a list of instruments. I would think that in the context of Jeopardy we could assume that instruments includes musical instruments. So all GG has to do is take a list of all musical instruments and see which one comes up associated with Heitor Villa-Lobos and “12 Etudes”. The word Guitar pops up in the search results.

GG: 14 Watson: 20(2)
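
To make the ‘this’ clause step concrete, here is a minimal Python sketch of how GG might pull out the words that follow ‘this’ or ‘these’. It is only an illustration under stated assumptions: the function name this_clause, the STOP_WORDS set and the punctuation rules are made up for this example and do not come from Watson or from any real library.

```python
STOP_WORDS = {"to", "for"}   # the walkthrough stops the clause at 'to' or 'for'
PUNCT = ",;:.!?"             # punctuation that ends a clause
QUOTES = "\"'“‘"             # an opening quote also ends the clause


def this_clause(clue):
    """Return the words that follow the first 'this' or 'these' in a clue."""
    words = clue.split()
    for i, word in enumerate(words):
        if word.strip(PUNCT).lower() not in ("this", "these"):
            continue
        if word[-1] in PUNCT:        # e.g. "this," means the clause is empty
            return []
        clause = []
        for nxt in words[i + 1:]:
            if nxt.lower() in STOP_WORDS or nxt[0] in QUOTES:
                break
            clause.append(nxt.strip(PUNCT))
            if nxt[-1] in PUNCT:     # a trailing comma etc. closes the clause
                break
        return clause
    return []
```

On the Villa-Lobos clue this returns ['instrument'], on the hedgehog cactus clue it returns ['plant'], and on the postcard clue (where a comma follows ‘this’ directly) it returns an empty list. It does not try to detect verbs, so clues where the clause runs straight into a verb would need more rules than this sketch has.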

$1200 : Paganini’s “24 Capricci” set the standard for etudes for this instrument : Violin*

Same approach as the last question.

GG: 15 Watson: 21(2)

$1600 : Music fans wax rhapsodic about this Hungarian’s “Transcendental Etudes” : Franz Liszt*

GG: 16 Watson: 22(2)

The word after ‘this’ and before the quote marks is “Hungarian” which is a word for a nationality which leads to the list of people. Hence we look at human names associated with the title. Liszt is the one that comes up most often in search results.

GG: 17 Watson: 23(2)
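
The "comes up most often in search results" test can be sketched in a few lines of Python. This is a rough illustration, not Watson's method: the names score_candidates and pick_answer, the margin value, and the three snippet strings are all invented placeholders standing in for whatever text a real search would return.

```python
def score_candidates(candidates, snippets):
    """Count how many snippets mention each candidate (case-insensitive)."""
    scores = {}
    for name in candidates:
        needle = name.lower()
        scores[name] = sum(1 for text in snippets if needle in text.lower())
    return scores


def pick_answer(scores, margin=2.0):
    """Return the top candidate only if it clearly beats the runner-up."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    if not ranked or ranked[0][1] == 0:
        return None                      # nothing scored at all
    if len(ranked) > 1 and ranked[0][1] < margin * ranked[1][1]:
        return None                      # no clear spike, so stay silent
    return ranked[0][0]


# Placeholder snippets written for this example, not real search output.
snippets = [
    "Liszt composed the Transcendental Etudes for piano",
    "The Transcendental Etudes by Franz Liszt",
    "Chopin also wrote etudes",
]
scores = score_candidates(["Liszt", "Chopin", "Bartok"], snippets)
answer = pick_answer(scores)
```

With these placeholder snippets Liszt scores 2, Chopin 1 and Bartok 0, so pick_answer returns 'Liszt'. When two candidates tie it returns None, which is the "no answer" case seen in several of the clues.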

$2000 : From 1911 to 1917, this Romantic Russian composed “Etudes-Tableaux” for piano : Rachmaninoff*

Same as the last question.

GG: 18 Watson: 24(2)

HEDGEHOG-PODGE

There is an animal name in the category ‘hedgehog’. That doesn’t really lead to a collection.

$400 : Some hedgehogs enter periods of torpor; the Western European species spends the winter in this dormant condition : Hibernation*

The ‘this’ clause again: ‘dormant condition’. If you Google it, Hibernation is right up there. It associates better than any other word with the other phrases because there is a spike with ‘periods of torpor’.

GG: 19 Watson: 25(2)

$800 : There are about 50 species of the hedgehog type of this plant, so named for its spiny fruit : Cactus*

The word after ‘this’ and before the comma is ‘plant’. So GG looks at the list of plants. The plant hedgehog cactus of course gets hits against the search results because of the shared word ‘hedgehog’.

GG: 20 Watson: 26(2)

$1200 : “The Hedgehog and the Fox” is an essay on this Russian count’s view of history : Leo Tolstoy*

Looking at the ‘this’ clause we see “Russian count’s view”. Count is a title and hence means someone on the famous person list. When you Google the title the famous person name Leo Tolstoy has the most hits.

GG: 21 Watson: 27(2)

$1600 : Hedgehogs are covered with quills or spines, which are hollow hairs made stiff by this protein : Keratin*

The word after ‘this’ is ‘protein’. If we take the collection of all named proteins and run it by hedgehog and the rest of the words in the clue, we see a spike with Keratin against hedgehog quills.

GG: 22 Watson: 28(2)

$2000 : A recent bestseller by Muriel Barbery is called this “of the hedgehog” : The Elegance of the Hedgehog*

This is a case of ‘this’ followed by a quote, which means fill in the blank: “___ of the hedgehog”. Looking up the quoted text returns a book title. The book title is longer than the quoted phrase so just return the whole title.

GG: 23 Watson: 29(2)
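
The fill-in-the-blank rule is simple enough to sketch in Python. The function name fill_blank is hypothetical, and the list of titles stands in for whatever titles a search on the quoted text would return. It is an assumption of this example rather than a real search API.

```python
def fill_blank(fragment, titles):
    """Return the first title that contains the quoted fragment and is longer than it."""
    needle = fragment.lower()
    for title in titles:
        if needle in title.lower() and len(title) > len(fragment):
            return title                 # return the whole title, as described above
    return None


# The candidate titles here are placeholders standing in for search results.
found = fill_blank(
    "of the hedgehog",
    ["The Hedgehog and the Fox", "The Elegance of the Hedgehog"],
)
```

Here the first title does not contain the quoted phrase, so the sketch returns "The Elegance of the Hedgehog". Nothing in it knows what a book or a hedgehog is. It is purely string matching.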

THE ART OF THE STEAL

Watson no doubt didn’t get the category’s play on Donald Trump’s “The Art of the Deal”. The keywords however are ‘Art’ and ‘Steal’. Steal has too many meanings to be useful, but Art points to the database of artworks.

$400 : Rembrandt’s Biblical Scene “Storm on the Sea of” this was stolen from a Boston museum in 1990 : Galilee*

Again this is a quote before the ‘this’. There is a verb directly after so we go to 2b). “Storm on the Sea of ____”. A Google search on the quoted phrase pops up the full title.

GG: 24 Watson: 30(2)

$800 : A Goya stolen (but recovered) in 2006 belonged to a museum in this city (Ohio, not Spain) : Toledo

Watson missed this one. The ‘this’ clause said city, so Watson would have wanted to return an entry in its city database. But there is no strong correlation between Toledo Ohio and the words in the clue. GG has the same problem, the collection is cities but there is not a clear spike in occurrences of the first page of a search.

GG: 24 Watson: 30(2)

$1200 : Daily Double The ancient “Lion of Nimrud” went missing from this city’s national museum in 2003 (along with a lot of other stuff) : Baghdad*

Again the ‘this’ clause indicates a city. If you search the ‘Lion of Nimrud’ it correlates highly with the city of Baghdad.

GG: 25 Watson: 31(2)

$1600 : In May 2010 5 paintings worth $125 million by Braque, Matisse & 3 others left Paris’ Museum of this art period : Modern Art**Picasso

The ‘this’ clause is ‘art period’. It seems unclear why Watson came up with a person’s name instead of an art period. Probably the concept of an art period is too abstract and it had high correlations for Picasso which tipped it over the point where it is willing to guess. Of course this is a mechanical guess with no understanding that the answer is totally inappropriate for what was asked for.

For GG it is a question of whether ‘art periods’ is a collection. If it is then all the art periods would be in it including Modern Art. I’m torn whether to credit this to GG, but in the interest of fairness I won’t.

GG: 25 Watson: 31(3)

$2000 : A Titian portrait of this Spanish king was stolen at gunpoint from an Argentine museum in 1987 : Philip II**unanswered

In this case Watson started to answer but didn’t come up with anything. I would have thought that this would be one of the easy ones. The ‘this’ clause is ‘Spanish king’, and ‘king’ would be listed as a human title. So the answer is a famous human. One would think that Philip II would correlate highly with the other words in the clue, but perhaps other Spanish kings also came up, putting the correlation below the threshold for an answer.

For GG the words after ‘this’ are ‘Spanish king’. That certainly leads to the list of people. However once you get rid of all the answers that relate to this actual Jeopardy question in the Watson match, you get all kinds of hits for other Spanish Kings. So GG would fail also.

GG: 25 Watson: 31(3)

CAMBRIDGE

Watson would be able to look up Cambridge and see it as a city as well as a University. GG gets nothing from this as Cambridge doesn’t really lead to any collection.

$400 : With much “Gravity”, this young fellow of Trinity became the Lucasian Professor of Mathematics in 1669 : Isaac Newton*

The ‘this’ clause says ‘young fellow’. So it is a person’s name that is wanted. The Lucasian Professor of Mathematics in 1669 only really correlates to one person.

GG: 26 Watson: 32(3)

$800 : In 1626 it was “Paradise Lost” when he was “Rusticated” (temporarily expelled) from Christ’s College : John Milton

Watson missed this one. If it had started from a list of people John Milton would have correlated highly with the question words. However the subject is a bit more obscure being ‘he was..’ indicating that ‘he’ is the answer. Too subtle it would appear.

GG is also totally confused. There was no ‘this’ or ‘these’ and the category doesn’t produce a collection. So it is stumped.

GG: 26 Watson: 32(3)

$1200 : In 1546 this king founded Trinity College, the largest of Cambridge’s colleges : Henry VIII*

The ‘this’ clause is ‘king’ so a list of famous people is what is needed. If you run Henry VIII by all the possible parts of this question there are lots of hits.

GG: 27 Watson: 33(3)

$1600 : Daily Double The chapels at Pembroke & Emmanuel Colleges were designed by this architect : Sir Christopher Wren*

The ‘this’ clause is ‘architect’, so it is a person we are looking for. We can run a list of all famous people listed as being an architect by the different parts of this question and certainly Sir Christopher Wren gets a high spike of correlation.

GG: 28 Watson: 34(3)

$2000 : This “Narnia” author went from teaching at Magdalen College, Oxford to teaching at Magdalene College, Cambridge : C.S. Lewis*

The subject is ‘author’. We see from ‘author’ that this will be in our authors list. We run all the authors by the words and C.S. Lewis is the only one with any significant correlation with “Narnia”.

GG: 29 Watson: 35(3)

“CHURCH” & “STATE”

Watson can get a keyword of ‘church’. State is less helpful having so many meanings. For GG there is no real collection word.

$400 : A Dana Carvey character on “Saturday Night Live”; Isn’t that special… : The Church Lady*

The ‘A’ at the start is equivalent to ‘this’ (2c). We can look at our list of characters from TV, movies, books, songs etc. When we run “The Church Lady” by the parts of this question it definitely has a large number of hits in the search results when searching for “Saturday Night Live” and “Dana Carvey”.

GG: 30 Watson: 36(3)

$800 : To bring back someone to his original function or position : Reinstate

Watson missed this one. It’s just not something you can look up easily. The ‘or’ is confusing as it is part of the definition and not an alternative definition. So Watson didn’t answer.

GG is totally stumped as there isn’t any collection that can be determined.

GG: 30 Watson: 36(3)

$1200 : A can opener with a triangular pointed end : Church key

This is more general knowledge of common idioms. It also doesn’t give a clear list to search. One might have thought that the definition would come up with the answer but it depends on what definitions Watson has on file.

GG sees the ‘A’ at the start and comes up with a collection word of ‘opener’. That doesn’t really lead to a collection. It stalls.

GG: 30 Watson: 36(3)

$1600 : It can mean to develop gradually in the mind or to carry during pregnancy : Gestate*

Again an implied ‘this’ clause, signalled by starting with ‘It’. Both halves of the clue, joined by ‘or’, are definitions. If you look them up and compare answers Gestate is about the only choice. However GG is less sophisticated than Watson and doesn’t have steps to cover definitions.

GG: 30 Watson: 37(3)

$2000 : It’s New Zealand’s second-largest city : Christchurch*

Again an implied ‘this’. Taking all the words between the “it’s” and the colon we get “New Zealand’s second-largest city”. Taking all those words one by one the only collection word we see is ‘city’. Running all cities by the rest of the question will show a correlation of Christchurch with ‘New Zealand’s second-largest city’. No problem.

GG: 31 Watson: 37(3)
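
Scanning a clause word by word for a collection word could look something like the Python sketch below. The COLLECTIONS table and the function name collection_for are assumptions made up for this illustration. A real table would be far larger, and which words count as collection words is exactly the judgment call I keep having to make in this article.

```python
# Hypothetical lookup table: collection word -> name of the list it leads to.
COLLECTIONS = {
    "city": "cities",
    "language": "languages",
    "instrument": "musical instruments",
    "protein": "proteins",
}

TRIM = ",.;:!?\"'“”‘’"   # punctuation and quote marks to trim from each word


def collection_for(phrase):
    """Return the list named by the first collection word in the phrase, if any."""
    for word in phrase.split():
        key = word.strip(TRIM).lower()
        if key in COLLECTIONS:
            return COLLECTIONS[key]
    return None
```

For “New Zealand’s second-largest city” this returns 'cities', while for a phrase like “can opener” it returns None, which matches the cases where GG stalls for lack of a collection.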

FINAL JEOPARDY — U.S. CITIES

Its largest airport is named for a World War II Hero; its second largest for a World War II Battle : Chicago**Toronto

The category says cities. So Watson is ready with its city list. But the clues are a bit vague. If you ran every possible city by the search results of the question words, there is no real correlation to Chicago. The words are not specific enough.

This is very revealing about the capabilities of algorithms as compared to people. A person would think that there aren’t many U.S. cities that are likely to be answers. It would almost certainly be one of the major ones: New York, Chicago, LA, Miami, Detroit, Denver. If he/she didn’t know, they would guess one of those.

However, an algorithm doesn’t even understand that the word U.S. modifies the city so that it is restricted to certain cities. This is final jeopardy so the restraints on guessing are removed for GG (step 4) and no doubt for Watson. Watson probably just ran cities past the words looking for correlations. Why Toronto got a larger score than Chicago depends on the databases Watson is using and the associations set up in them. No doubt Toronto got a slightly higher score than other cities so that’s what Watson answered with. The fact that the answer makes no sense is not relevant because an algorithm has no sense. It is just a machine and doesn’t understand anything that is going on.

Unfortunately it seems near impossible to do these Google search experiments now. All you get are results about the Jeopardy question. There is intense interest in why Watson got this particular question wrong. To which I would say: this article is explaining why. Normally (without the Jeopardy articles polluting the search results, a problem that Watson’s internal fast lookup databases obviously predated), when you search for combinations of the words in the question no city would come up, because there is nothing in there about cities. You would get a flood of results on the first page about battles and WWII and heroes. Not a chance a city would occur in those search results. Watson would have had the same result. There are no real correlations between “airport largest second hero WWII battle” and a city. So Watson’s algorithm would have come to a similar result. The score was probably zero in correlation to a city. But Watson was programmed so that it had to give an answer on final jeopardy, as logically any answer is better than no answer. So it picked one on some backup rule or weighting or a random number generation, and came up with Toronto. To a machine, when you pick randomly from a list and have no understanding of the meaning of the words you are processing, what’s the problem? It’s on the list.

GG would take the word cities in the category and start with the collection of cities. There is no ‘this’ or ‘these’ in the question and it does not start with ‘A’ or “It’s”. (It starts with ‘its’ which is the possessive form rather than a contraction.) So it takes the collection of cities. Now, just like Watson, when GG searches against words like “airport largest second hero WWII battle” none of the cities are going to get a high score, if any. I have no idea what city would come up. I think that no city would score any mentions on the first page, except maybe if it had some WWII memorial or ceremony. Of course that has nothing to do with the question, but this is an algorithm running here; it is following the instructions that it has been given. That means GG has to decide something as this is final jeopardy. The scores that cities are getting against those search results are basically random noise when you relate them to the question. They could all get zero. I don’t have a rule to cover this so let’s say GG just picks a city at random. If we are looking at a list of a thousand cities there is a 0.1% chance that GG may get lucky and come up with the right one, but I’ve got to give GG a thumbs down on this one.

GG: 31 Watson: 37(4)
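
The "must answer in final jeopardy" fallback, and the one-in-a-thousand arithmetic, can be written out as a short Python sketch. The function name final_jeopardy_pick and the placeholder city names are invented for this example. I have no knowledge of what backup rule Watson actually used, so the random pick here is just the rule I assumed for GG.

```python
import random


def final_jeopardy_pick(scores, rng=random):
    """Always answer: take the top scorer, or a random candidate if nothing scored."""
    if max(scores.values()) > 0:
        return max(scores, key=scores.get)
    return rng.choice(sorted(scores))        # no signal at all, so guess blindly


# Placeholder names, not a real city list.
cities = ["city_%d" % i for i in range(1000)]
no_signal = {city: 0 for city in cities}     # every city scored zero

guess = final_jeopardy_pick(no_signal, random.Random(0))
chance = 1 / len(cities)                     # 0.001, i.e. 0.1%
```

The point of the sketch is that the guess is always a member of the list, so from the algorithm's side nothing has gone wrong, even though the chance of it being the right city is only one in a thousand.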

February 16, 2011’s Match

EU, THE EUROPEAN UNION

The category gives Watson something to help select subject lists.

$200 : Each year the EU selects capitals of culture; one of the 2010 cities was this Turkish “meeting place of cultures” : Istanbul*

The ‘this’ clause is ‘Turkish “meeting place of cultures”’. Meeting place means location i.e. country, city or something like that. Running the list of all known locations by the words in the sentence gets the most correlation for Istanbul.

GG: 32 Watson: 38(4)

$400 : The Schengen Agreement removes any controls at these between most EU neighbors : National borders

There is a ‘these’ clause but it is ‘between most EU neighbors’, which is kind of confusing and doesn’t really lead GG to a collection of things to check. So no answer.

Watson was similarly confused and also didn’t answer.

GG: 32 Watson: 38(4)

$600 : A controversial EU subsidy program is called CAP, short for “Common” this “Policy” : Agricultural

This is an implied ‘this’ of ‘A controversial EU subsidy program’. It would be possible to have a list of EU programs but the other words there make it difficult to figure out. It seems that Watson couldn’t figure out a list to choose from, same as GG.

GG: 32 Watson: 38(4)

$800 : Elected every 5 years, it has 736 members from 7 parties : Parliament*

The implied ‘this’ clause is ‘elected every 5 years’. If you search on all of these terms you get a massive correlation with EU Parliament. However, GG’s algorithm doesn’t lead it there.

GG: 32 Watson: 39(4)

$1000 : As of 2010, Croatia & Macedonia are candidates but this is the only former Yugoslav republic in the EU : Slovenia**Serbia

The ‘this’ clause says ‘the only former Yugoslav republic in the EU’. It seems that Watson’s database leads from republic to country. So running a list of countries by the parts of the sentence evidently came up with more correlation to Serbia than Slovenia. That is probably because the front part of the question lists Croatia and Macedonia, which are more closely correlated to Serbia than Slovenia. To really answer this you have to understand it, something that Watson doesn’t do. Then again neither does GG. So it would fail on this one as well.

GG: 32 Watson: 39(4)

ACTORS WHO DIRECT

This is a category that has a key word Actors, but the word ‘direct’ has a lot of different meanings. So it requires understanding to realize that direct in this context means they are the director.

$200 : “Rocky II”, “III” & “IV” : Sylvester Stallone
$400 : “Million Dollar Baby” & “Unforgiven” : Clint Eastwood
$600 : “The Pledge” & “Into the Wild” : Sean Penn
$800 : “The Great Debaters” : Denzel Washington
$1000 : “A Bronx Tale” : Robert De Niro

Watson gets none of these and doesn’t even venture an answer. When it looks up the quoted text it would be led to the movie entries in its database, but there is no algorithmic way to move to the actual director of the movie. You have to understand what the category is saying.

GG would probably come up with a collection of actors from the category. All of these movies have many actors so there is no real correlation, and GG would also miss the whole point.

GG: 32 Watson: 39(4)

DIALING FOR DIALECTS

The one keyword here that leads to a list is ‘dialects’. That leads to languages which leads to a list of the world’s languages and dialects.

$200 : Sprechen Sie Plattdeutsch? If you do, you speak the low variety of this language : German*

Not too much of a struggle here. The collection is language. When you correlate each of the languages with the words from the question you get overwhelming spikes for German. Both Watson and GG get this.

GG: 33 Watson: 40(4)

$400 : Dialects of this language include Wu, Yue & Hakka : Chinese

Watson missed this one. Again the ‘this’ clause was ‘language’, so there was a list to work from. I would have thought that the correlation of Chinese to Wu, Yue and Hakka would have been high enough to trigger an answer, but it didn’t seem to be.

I think that GG gets credit for this one. When you search for “Wu, Yue & Hakka” Chinese lights up the search results.

GG: 34 Watson: 40(4)

$600 : Vedic, dating back at least 4,000 years, is the earliest dialect of this classical language of India : Sanskrit*

The ‘this’ clause is ‘classical language’, again leading to the list of languages. Sanskrit is highly correlated with Vedic and so clears the answer threshold.

GG: 35 Watson: 41(4)

$800 : While Maltese borrows many words from Italian, it developed from a dialect of this semitic language : Arabic*

The ‘this’ clause is ‘language’ which leads to the list of languages. When you correlate Arabic with Maltese and Italian it has a higher correlation than other languages.

GG: 36 Watson: 42(4)

$1000 : Aeolic, spoken in ancient times, was a dialect of this : Ancient Greek*

GG would have to use the category collection of dialects, which is languages. From what I can tell it would come up with “Greek” not “Ancient Greek”. I think the Jeopardy judges would give credit for that so I’m scoring it as correct for GG.

GG: 37 Watson: 43(4)

BREAKING NEWS

There’s not a lot of info in the category words for Watson or GG. Both words have lots of definitions.

$200 : Before this hotel mogul’s elbow broke through it, a Picasso he owned was worth $139 million; after, $85 million : Steve Wynn*

The ‘this’ clause is ‘hotel mogul’s elbow’. For GG there is a confusion between collection words as both ‘hotel’ and ‘mogul’ could be one. If you process both collections you get a high correlation between ‘Steve Wynn’ and Picasso, and if you throw the word ‘elbow’ in there is a definite answer threshold spike.

GG: 38 Watson: 44(4)

$400 : It was 103 degrees in July 2010 & Con Ed’s command center in this N.Y. borough showed 12,963 megawatts consumed at 1 time : Manhattan

The ‘this’ clause is ‘N.Y. borough’ which leads to a relatively short list of NY boroughs by just filtering the place names. But the rest just doesn’t correlate. There are no terms in there that are going to get correlations with Manhattan above the answer threshold. That is because the answer rests on knowing that Con Ed’s command center is in Manhattan; all the other info is not really that useful and reduces correlations down to non-answer territory.

So this leaves both GG and Watson without an answer.

GG: 38 Watson: 44(4)

$600 : Daily Double Senator Obama attended the 2006 groundbreaking for this man’s memorial, 1/2 mile from Lincoln’s : Martin Luther King

The ‘this’ clause is ‘man’. So there is a list of famous people to pick from. But when you try and correlate famous names to the rest of the clue Martin Luther King doesn’t correlate any better than other names. To answer you have to understand what is being described. It is a groundbreaking for someone’s memorial — ah Martin Luther King — however, if you just try and look for correlation to words there is no traction to be gained.

GG: 38 Watson: 44(4)

$800 : Gambler Charles Wells is believed to have inspired the song “The Man Who” did this “at Monte Carlo” : Broke the bank*

The ‘this’ clause is one of those quoted ones in this case before and after. “The Man Who ___ at Monte Carlo”. The word song leads to checking the list of all songs and the match is obvious.

GG: 39 Watson: 45(4)

$1000 : Nearly 10 million Youtubers saw Dave Carroll’s clip called this “friendly skies” airline “breaks guitars” : United Airlines*

The collection word is ‘airline’. When you take the list of all the airlines and run it past the question words the correlation between United and “friendly skies” is way above the threshold.

GG: 40 Watson: 46(4)

ONE BUCK OR LESS

Not a helpful clue for Watson. Nothing there to really help narrow down alternatives. The word ‘buck’ has tons of meanings. Same for GG just no collection word here.

$200 : On December 8, 2008 this national newspaper raised its newsstand price by 25 cents to $1 : USA Today

The ‘this’ clause is ‘newspaper’. So there is a list of newspapers to test out. However the clue is one of those where you have to understand what is being said rather than be triggered by key words. None of those words or combinations correlate higher with USA Today than with others.

GG: 40 Watson: 46(4)

$400 : The USPS cost for mailing this, a minimum of 3 1/2 x 5 inches, is 28 cents; wish you were here! : Postcard

The ‘this’ clause is empty as there is a comma immediately after the ‘this’. So there is no collection to lead to and really no way to come up with the answer unless you know that the things of the given size that USPS handles are postcards.

GG: 40 Watson: 46(4)

$600 : In 2002 Eminem signed this rapper to a 7-figure deal, obviously worth a lot more than his name implies : 50 Cent

The ‘this’ clause is rapper. So this leads to a list of famous people who are rappers. But again it is the circumstance described that is identifying not just the words. There was probably a fairly high correlation with 50 Cent, but not so much higher than other rappers to put it above the threshold for Watson. For GG it’s the same. It would process 50 Cent as part of the collection but there is no real spike of hits in the search results.

GG: 40 Watson: 46(4)

$800 : 99 cents got me a 4-pack of Ytterlig coasters from this Swedish chain : IKEA

The ‘this’ clause is ‘Swedish chain’. That is a bit ambiguous. I suspect that’s where Watson stalled. If it had ‘chain’ as a synonym for retail company it would have led to a list of retailers and IKEA probably would have correlated highly with ‘Ytterlig’ and ‘Swedish’.

For GG it’s again a matter of whether ‘chain’ leads to a collection. It seems to be ambiguous, so I rule no.

GG: 40 Watson: 46(4)

$1000 : A 15-ounce VO5 moisture milks conditioner from this manufacturer averages a buck online : Alberto

The ‘this’ clause is ‘manufacturer’ which leads to a list of company names. When you search for VO5 the manufacturer Alberto is all over the search results. Not sure why Watson didn’t get this. GG would.

GG: 41 Watson: 46(4)

ALSO ON YOUR COMPUTER KEYS

The category here has a keyword ‘computer’, but the questions are puns and allusions to specific keys. Not a good category for an algorithm. Watson misses all of these as would GG. It requires an understanding that what is wanted is the name of a computer key that relates to the question. An algorithm has no understanding hence a general wipe out.

$200 : Proverbially, it’s “where the heart is” : Home
$400 : A loose-fitting dress hanging straight from the shoulders to below the waist : Shift**Chemise

Here there is an implied ‘this’ clause at the beginning: ‘loose-fitting dress’. So that leads to the list of clothing. Chemise seemed to correlate, but there was no understanding of the basic premise of the category that it should be the name of a computer key. A person would never give this answer as Chemise is just not associated with a computer keyboard. Watson, just being an algorithm, doesn’t care and goes for it. GG would be totally lost.

$600 : Football position that can be split or tight : End
$800 : It’s an abbreviation for Grand Prix auto racing : F1
$1000 : An additional section placed within the folds of a newspaper : Insert

GG: 41 Watson: 46(5)

NONFICTION

This category is a collection word. It indicates nonfiction books or stories.

$400 : In 2010 this former first lady published the memoir “Spoken From the Heart” : Laura Bush*

The ‘this’ clause is ‘former first lady’. First lady is a definite human position and leads to a list of famous people. When you correlate Laura Bush with “Spoken From the Heart” you get your answer threshold. So GG is good with this one as was Watson.

GG: 42 Watson: 47(5)

$800 : This book by Michael Lewis subtitled “Evolution of a Game” focused on left tackle prodigy Michael Oher : The Blind Side

The ‘this’ clause is ‘book’, which definitely leads to a list of books. I would have thought that there would be a strong correlation between ‘The Blind Side’ and the other words, especially the author’s name, but there didn’t seem to be for Watson. For GG the book’s title is the first title in the search results and occurs throughout.

GG: 43 Watson: 47(5)

$1200 : Daily Double The New Yorker’s 1959 review of this said in its brevity & clarity it is “unlike most such manuals, a book as well as a tool” : Elements of Style**Dorothy Parker

The subject is described by saying that it was reviewed by the New Yorker in 1959, and the clue goes on to describe it as a book. It is hard in this case to figure out the subject and come up with a list. Watson obviously was confused. It decided the answer was a person. I suppose Dorothy Parker, who published in the New Yorker and was known for brevity and clarity, cleared the answer threshold. The fact that answering this with a person’s name makes no sense has no impact on the algorithm; the only thing that matters is the answer threshold.

GG would take the collection from the category as there is no way to get one via its steps from the question. If you run all the nonfiction books past the words you get “Elements of Style”. I’m giving GG a win on this one.

GG: 44 Watson: 47(6)

$1600 : Dave Eggers not-so-modestly titled his memoir “A Heartbreaking Work of” this : Staggering Genius*

This is more standard. For GG there is a quote before the ‘this’ so we follow 2b and fill in the blank “A Heartbreaking Work of ___”. A general search shows that there is a book with a title that matches that pattern and it is highly correlated with Dave Eggers.

GG: 45 Watson: 48(6)

$2000 : HBO’s miniseries “John Adams” was based on this author’s Pulitzer Prize-winning biography : David McCullough*

The ‘this’ clause is author. So running the author list by the other words gets a real hit for HBO miniseries “John Adams” for David McCullough.

GG: 46 Watson: 49(6)

LEGAL “E”s

$400 : In English law, it’s a title above a gentleman & below a knight; in the U.S., it’s usually added to the name of an attorney : Esquire*

The ‘this’ clause is ‘title’, which leads to a list of titles. Esquire correlates with ‘attorney’ far more highly than any other title.

GG: 47 Watson: 50(6)

$800 : One definition of this is entering a private place with the intent of listening secretly to private conversation : Eavesdropping

The ‘this’ clause is defined by being something that can be defined. I think that Watson just didn’t understand and couldn’t really proceed. GG cannot find a collection word after ‘this’ so it too fails.

GG: 47 Watson: 50(6)

$1200 : This person is appointed by a testator to carry out the directions & requests in his will : Executor*

The ‘this’ clause is ‘person’. That leads to a list of people and also occupations and titles as they describe people as well. Executor correlates to testator ‘directions and requests in his will’.

GG: 48 Watson: 51(6)

$1600 : Daily Double This 2-word phrase means the power to take private property for public use; it’s ok as long as there is just compensation : Eminent domain*

The ‘this’ clause is ‘2-word phrase’. Now it is possible to have a list of all common phrases in English. All the way from “Apple a day” to “Zebra crossing” (common in U.K.). This would include legal terms, medical terms and other technical terms. “Eminent domain” would correlate highly with the sentence words.

GG: 49 Watson: 52(6)

$2000 : This clause in a union contract says that wages will rise or fall depending on a standard such as cost of living : Escalator*

The subject is ‘clause’. The correlation to “Escalator clause” or “Escalating clause” is high.

GG: 50 Watson: 53(6)

WHAT TO WEAR

To an English speaker the category is clearly asking for something that a person wears, i.e. clothing. However, just looking at the words individually they have many meanings and don’t really lead to a collection for GG.

$400 : This plain-weave, sheer fabric made with tightly twisted yarn is also used to describe a pie or a cake : Chiffon

The list is fabrics but passing all the fabrics won’t produce a spike of search hits for ‘Chiffon’. There are a lot of plain-weave sheer fabrics. If you throw pie and cake in, the search results just don’t show lots of hits with Chiffon.

GG: 50 Watson: 53(6)

$800 : A bit longer than a cocktail dress, one hemmed to end at the shins is this beverage “length” : Tea

So the subject is beverage but it has a quote. Not the usual question pattern. Probably Watson was just confused. GG would be looking for “_ length” and there is no spike of hits for “tea length”.

GG: 50 Watson: 53(6)

$1200 : Also the name of a rope for leading cattle, this women’s backless top has a strap that loops around the neck : Halter*

The subject is ‘top’. If you put in ‘name of a rope for leading cattle’, Halter pops right up. It correlates highly with the word for a backless top. GG would have the collection word ‘top’ but would we have that as leading to a list of women’s clothing? I’m going to be conservative and say no. So GG would not get this.

GG: 50 Watson: 54(6)

$1600 : If you’re wearing wellingtons at Wimbledon, you’re wearing these : Galoshes

Again this is a bit of a non-standard form. If you look for a collection to select from, it is hard to find. Even if you arrive at items of clothing, boots will correlate as well as Galoshes. You have to know that Wellingtons are a brand name that English people commonly use for what Americans term Galoshes.

GG is totally out of luck. The ‘these’ occurs at the end with no collection word after it. There is no quote before it to give a fill in the blanks.

GG: 50 Watson: 54(6)

$2000 : Throw on an outfit from the “Marc by” this designer line : Marc Jacobs*

The subject is designer line. If you Google “Marc by”, Marc Jacobs’s picture pops up, so there is a massive correlation to Marc Jacobs. I didn’t give GG any bonuses for this, but if you’re looking for a name and a person’s picture pops up you can be pretty sure that’s the person you’re after.

GG: 51 Watson: 55(6)

U.S. GEOGRAPHIC NICKNAMES

GG would probably not get a collection out of the category. Nicknames is a collection but many things have nicknames. Would we have a list of nicknames of all kinds of things — possibly but probably not. So I’m saying no. Any collection to work with will have to be derived from the question.

$400 : Cape Hatteras is known as this cemetery synonym “of the Atlantic” : Graveyard

This is one of those fill in the blank variants. The words to replace are ‘cemetery synonym’ but that is confusing. So Watson missed this. GG would too: cemetery could be a collection, as could synonym, but neither really gives a list of things that will produce a spike of hits when searching the other words.

GG: 51 Watson: 55(6)

$800 : Appropriately enough, this New York metropolis is “Bison City” : Buffalo*

The subject is metropolis which is another word for city. If you take a list of cities then Buffalo correlates highly to ‘Bison City’

GG: 52 Watson: 56(6)

$1200 : This town is known as “sin city” & its downtown is “glitter gulch” : Las Vegas*

Again town is another word for city, so these quoted nicknames correlate highly with Las Vegas.

GG: 53 Watson: 57(6)

$1600 : It’s known as both “the steel city” & “the iron city” : Pittsburgh

Not sure why Watson missed this one. Probably because the question is not quite as clear. If Watson had processed a list of cities, surely Pittsburgh would have correlated highly with these nicknames. GG would have similar problems. If under 2b the ‘It’s’ is taken as a ‘this’, there is no real collection word that follows.

GG: 53 Watson: 57(6)

$2000 : “The Coyote State” is an unofficial nickname of this 75,885-square-mile state : South Dakota*

The subject is State, if you pass all the US states by the Google search results you get a lot of hits with South Dakota.
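The "pass every candidate by the search results and look for a spike" step can be sketched roughly as below. This is a minimal illustration, not GG itself: the function names `count_hits` and `pick_spike`, the `ratio` threshold, and the short candidate list are all assumptions, and the snippets are invented stand-ins for search results rather than real Google output.

```python
import re

def count_hits(candidates, snippets):
    """Count how many snippets mention each candidate as a whole phrase."""
    hits = {}
    for name in candidates:
        pattern = re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
        hits[name] = sum(1 for s in snippets if pattern.search(s))
    return hits

def pick_spike(hits, ratio=2.0):
    """Return the top candidate only if it clearly beats the runner-up.

    `ratio` is an assumed threshold: the leader needs at least `ratio`
    times the hits of the second-best candidate, otherwise no answer.
    """
    ranked = sorted(hits.items(), key=lambda kv: kv[1], reverse=True)
    if not ranked or ranked[0][1] == 0:
        return None
    if len(ranked) > 1 and ranked[0][1] < ratio * ranked[1][1]:
        return None
    return ranked[0][0]

# Hypothetical data: a few states and made-up snippets standing in for
# the text a search on the question words might return.
STATES = ["North Dakota", "South Dakota", "Nebraska", "Wyoming"]
SNIPPETS = [
    "South Dakota is sometimes called the Coyote State.",
    "The Coyote State: facts about South Dakota.",
    "Coyotes are common in Wyoming and South Dakota.",
    "State nicknames of the Great Plains.",
]

hits = count_hits(STATES, SNIPPETS)
answer = pick_spike(hits)
print(hits, answer)
```

When no candidate stands out, `pick_spike` returns `None`, which corresponds to the cases in this article where neither machine offers an answer.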

GG: 54 Watson: 58(6)

MAGICAL MOUSE-TERY TOUR

The category here is a word play. It’s hard to speculate on what Watson may have got from it. GG doesn’t get anything.

$400 : Itchy (the mouse) & Scratchy (the cat) starred in “Skinless in Seattle” on a show within this Fox show : The Simpsons*

The subject is “Fox show”. That should lead to a short list. When you search for words in the question, “The Simpsons” is the only one to get lots of hits in the search results.

GG: 55 Watson: 59(6)

$800 : In 1939’s cartoon “The Pointer”, this guy got a new, more pear-shaped body & pupils were added to his eyes : Mickey*

The collection word is guy which means person but also includes literary, movie and cartoon characters. Mickey Mouse is all over the search results of the words in this question.

GG: 56 Watson: 60(6)

$1200 : This 1959 Daniel Keyes novella about Charlie Gordon & a smarter-than-average lab mouse won a Hugo Award : Flowers for Algernon*

Again the collection word is novella. When you run all the possible book titles against the search results = Bingo!

GG: 57 Watson: 61(6)

$1600 : The samplefest “The Grey Album” & the band Gnarls Barkley are 2 projects of Brian Burton, AKA this : Danger Mouse

Watson isn’t good at subjects that are somewhat obscured. The question says ‘AKA this’, which is clear to a human, but unless you understand that it is asking for another name for Brian Burton there is no way to figure out what the subject list is. GG has the same problem. There is no collection word after the ‘this’ and no quote before it. The Jeopardy category gave nothing either.

GG: 57 Watson: 61(6)

$2000 : Maurice LaMarche found his inner Orson Welles to voice this rodent whose simple goal was to take over the world : Brain

The subject is ‘rodent’. You can run all the rodents past the search results but none of them stand out. Even if you get to a list of rodents in film, books etc., the name Brain is just too generic to really correlate well.

GG: 57 Watson: 61(6)

FAMILIAR SAYINGS

This category leads to a list of sayings. There aren’t that many common English sayings, probably a few thousand.

$400 : Familiarity is said to breed this, from the Latin for “Despise” : Contempt*

This is a fill in the blanks kind of question: “Familiarity is said to breed ____”. Also there is the Latin for “Despise”. Watson does well with translations as they are just look ups. The translations come back with ‘contempt’ as one option, and it comes up often in the search results of the question words.

For GG there is no collection word after ‘this’ and before the comma. So it would use ‘familiar sayings’ from the Jeopardy category. The saying “Familiarity breeds contempt” would light up the search results, but the whole saying is what GG would return. Remember it is just an algorithm; that it is illogical to return the whole saying rather than the word asked for is not a consideration. The instructions say do this and then return that. So that is what happens. Like any machine it just operates; there is no understanding, just mechanics. So GG gets it wrong.

GG: 57 Watson: 62(6)

$800 : Even a broken one of these on your wall is right twice a day : Clock*

Just running a list of sayings by the search results will correlate with the word Clock. The saying is “Even a broken clock is right twice a day”. There is only one word that is in the saying but not in the question, so that is the answer. However GG is not clever enough. Following ‘these’ is “on your wall”. Nothing there gives a collection, so going to the category leads it into returning a saying, which is not right.
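The "one word in the saying but not in the question" check can be sketched in a few lines. This is only an illustration of that set-difference idea, not a step of GG as described; the helper names `words` and `missing_words` are assumptions.

```python
import re

def words(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def missing_words(saying, question):
    """Return the words of the saying that never appear in the question,
    in order of first appearance and without repeats."""
    asked = set(words(question))
    missing = []
    for w in words(saying):
        if w not in asked and w not in missing:
            missing.append(w)
    return missing

SAYING = "Even a broken clock is right twice a day"
QUESTION = "Even a broken one of these on your wall is right twice a day"

print(missing_words(SAYING, QUESTION))
```

If exactly one word comes back, that word is the candidate answer. If several come back, the check is inconclusive and something else would have to choose between them.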

GG: 57 Watson: 63(6)

$1200 : If you’re one of these capable fellows, you’re unfortunately “master of none” : Jack of all trades

The subject is ‘fellow’ which is a person. If you run all the common sayings by the search result you should get a correlation to “Jack of all trades but master of none”. However it’s not really clear that Jack of all trades is the answer. It’s not one word like the previous and next questions, so Watson cannot pick out the answer, but neither can GG.

GG: 57 Watson: 63(6)

$1600 : A camel is a horse designed by this : Committee

There is really no subject list. This is one of those more complicated forms that confuse Watson and GG.

GG: 57 Watson: 63(6)

$2000 : It’s a poor workman who blames these : Tools*

If you run the sayings by the search results the right one certainly correlates. Also there is only one word that is missing from the question ‘Tools’. However GG would be confused and probably return the whole saying.

GG: 57 Watson: 64(6)

FINAL JEOPARDY — 19th CENTURY NOVELISTS

William Wilkinson’s “An Account of the Principalities of Wallachia and Moldavia” inspired this author’s most famous novel : Who is Bram Stoker?*. Watson’s wager was $17,973. Note: Ken Jennings’s written comment: “I for one welcome our new computer overlords”.

The category gives the subject as novelists. When you run Bram Stoker by the search results you get all kinds of hits for his name. So both Watson and GG get it.

GG: 58 Watson: 65(6)

There were two games, each with 12 categories of 5 questions each plus final jeopardy. That is a total of 122 questions.

Just looking at the right answers we see that Watson got 65 right for a score of 53.3%. GG got 58 right for a score of 47.5%. They differed on 14 questions, which means they got the same results for 88.5% of the questions.
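The figures can be rechecked directly from the final tallies. The short sketch below recomputes the question total and the three percentages; the variable names and the `percent` helper are assumptions, and the inputs (65, 58 and 14) are the counts given in the text.

```python
GAMES = 2
CATEGORIES_PER_GAME = 12
QUESTIONS_PER_CATEGORY = 5
FINALS_PER_GAME = 1

# Each game has 12 categories of 5 questions plus one Final Jeopardy.
total = GAMES * (CATEGORIES_PER_GAME * QUESTIONS_PER_CATEGORY + FINALS_PER_GAME)

def percent(part, whole):
    """Share of `whole` as a percentage, rounded to one decimal place."""
    return round(100 * part / whole, 1)

watson_pct = percent(65, total)             # Watson's final tally of right answers
gg_pct = percent(58, total)                 # GG's final tally of right answers
agreement_pct = percent(total - 14, total)  # questions where both had the same result

print(total, watson_pct, gg_pct, agreement_pct)
```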

So GG pretty much did the same as Watson. The questions Watson got right GG generally got right. Watson did better than GG, with 53% of questions correct versus 48%, but remember it is a multi-million dollar machine with scores of programmers and the full weight of IBM behind it. GG is a primitive algorithm cobbled together with some elementary dissection of words and Google search, yet it performs 88% the same as Watson. There is not much difference in capability, just speed.

This tells us that Watson is just an algorithm. It isn’t the coming of a new age of machine dominance i.e. a ‘Computer Overlord’ as Ken Jennings said. It is just a machine that pretty much implements a fast lookup that you could achieve with standard Google searches.

When you look at all the questions Watson missed, you can see a pattern. They are the ones that required some understanding and couldn’t be answered by a mechanical process that had no awareness of what it was doing.

So I think Ken Jennings and Brad Rutter should have a rematch. This time ask questions that require understanding and aren’t just easy look ups of voluminous information. We know that computers look up information very quickly and humans cannot compete with that kind of speed. But humans understand the questions and provide answers that make sense. You never saw Ken Jennings or Brad Rutter answering with a Canadian city when the question asked for a U.S. one, or a person like Dorothy Parker when the question clearly asked for the name of a book, or a person like Picasso when the question asked for an art period. If a human did that we would think that one or two occurrences were a brain freeze; any more and we would start to worry about the person’s mental capacity.

If the questions all required understanding then Watson would score 0. I don’t think IBM would want a rematch with these kinds of real questions. It is quite happy to have Watson win the game by exploiting an ability for fast look up of information. It can then bask in all the headlines that Watson wiped the floor with the two human contestants, even if this, upon examination, turns out to be far from the case. After all, Watson is being pitched as a new advance that businesses can take advantage of, and there is money in that.

That is not to say that there is no value in something like Watson. These kinds of algorithms can be very helpful in a large number of cases. There are many situations that can be addressed by a machine doing fast look ups of information and processing patterns in huge data sets. Think image recognition, automated driving and analysis of medical data. These are tremendously useful, as humans are not good at finding the patterns in large masses of data, and many situations like driving and language translation turn out to be more patterned than we might have realized, making them amenable to this kind of processing. This can give the appearance of intelligence but it is absolutely not that. No one would think that GG is intelligent. It is just a simplistic algorithm on top of Google search. Yet it still pretty much matched Watson’s performance.

As I have said, if there was a rematch with the right questions then Watson would score zero. That doesn’t sound like ‘computer overlord’ to me.

Written by Lance Gutteridge
