Proverb: Candidate Generation
In the first stage of the system, candidates for each clue are
generated independently of grid information. Clues are separated from
the grid and passed to a collection of expert modules, each of which
generates candidates for a clue given the target length.
Each module returns a confidence score (how sure it is that the answer
lies in its list) and a weighted list of possible answers.
For example, given the clue:
<Farrow of "Peyton Place" [3]: MIA>
the movie module returns:
1.0: 0.909091 MIA, 0.010101 TOM, 0.010101 KIP, ...
0.010101 BEN, 0.010101 PEG, 0.010101 RAY
The module is fully confident (1.0) that the answer is in its list. It
gives most of the weight to the cast member whose last name matches
the clue, and a small, equal weight to each of the other cast members.
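The shape of a module's output can be sketched in a few lines of Python. This is only an illustration, not the original implementation: the names `ModuleOutput` and `movie_module_sketch`, the `boost` value of 90, and the placeholder entry "Example" are all assumptions made for the example. Because the listing above elides some cast members, the weights produced here will not match it exactly.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ModuleOutput:
    confidence: float             # how sure the module is that the answer is in its list
    candidates: Dict[str, float]  # answer -> weight; weights sum to 1


def movie_module_sketch(first_names: List[str], matching: str,
                        length: int, boost: float = 90.0) -> ModuleOutput:
    """Hypothetical stand-in for a database module such as Movie."""
    # Keep only names that fit the target length from the clue.
    fits = [name.upper() for name in first_names if len(name) == length]
    # Assumed weighting: the name matching the clue gets `boost`, the rest get 1.
    raw = {name: (boost if name == matching.upper() else 1.0) for name in fits}
    total = sum(raw.values())
    if total == 0:
        # Nothing fits: return an empty list with zero confidence.
        return ModuleOutput(0.0, {})
    # Normalize so the weights form a probability distribution.
    return ModuleOutput(1.0, {name: w / total for name, w in raw.items()})


# "Example" is a made-up entry that only demonstrates the length filter.
cast = ["Mia", "Tom", "Kip", "Ben", "Peg", "Ray", "Example"]
out = movie_module_sketch(cast, "Mia", 3)
for answer, weight in sorted(out.candidates.items(), key=lambda kv: -kv[1]):
    print(f"{weight:.6f} {answer}")
```

Keeping the confidence separate from the normalized weights means a later stage could combine lists from several modules without treating every module as equally reliable.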
Partial List of Expert Modules
- Bigram: All letter strings of the correct length are given
non-zero probability based on the pattern of letters ("st" is more
likely than "zq"). Other "implicit distribution" modules include
word sequences and 4-grams.
- Word lists (WordList-Big, WordList, WordListCWDB): Ignore the clue
and return all valid words. WordList-Big contains over 2.1 million
entries.
- CWDB-specific (ExactMatch, Transformations, Partial Match):
Measure similarity between the clue and clues in the CWDB. Return the
targets of the best-matching clues, in information-retrieval (IR) style.
- Lexical distance (Dijkstra[1-4], Encyclopedia, LSI-Ency,
LSI-CWDB): Using a word-word similarity metric, compare the words in
the clue to a database of possible targets. Return the best-matching
words, in information-retrieval (IR) style.
- Database modules (Movie, Music, Geography, Writers, Compass,
Myth, WordNet, WordNetSyns, RogetSyns, MobySyns): Each uses a
specific mechanism for transforming clues into queries against
databases collected from the web.
- Syntactic (Blanks-[Books,Geo,Movies,Music,Quotes], KindOf): Fill
in blanks in the clue (KindOf is a variation on this) using different
databases of text.
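As an illustration of the Bigram entry in the list above, the following Python sketch scores letter strings with a smoothed letter-bigram model. It is a minimal sketch under stated assumptions, not the module's actual code: the function names, the add-alpha smoothing, the `^` and `$` boundary markers, and the toy training words are all chosen for the example. Smoothing is what gives every letter string a non-zero probability; the scores would still need to be normalized over strings of the target length to form a proper distribution.

```python
import math
import string
from collections import Counter


def train_bigram(words, alpha=1.0):
    """Return a function giving log P(next | prev) with add-alpha smoothing."""
    pair_counts = Counter()
    prev_counts = Counter()
    for word in words:
        padded = "^" + word.upper() + "$"  # assumed start and end markers
        for a, b in zip(padded, padded[1:]):
            pair_counts[(a, b)] += 1
            prev_counts[a] += 1
    # Possible next symbols: 26 letters plus the end marker.
    vocab = len(string.ascii_uppercase) + 1

    def logp(a, b):
        # Unseen pairs still get alpha / (count + alpha * vocab) > 0.
        return math.log((pair_counts[(a, b)] + alpha) /
                        (prev_counts[a] + alpha * vocab))

    return logp


def score(candidate, logp):
    """Sum of bigram log-probabilities over the padded candidate."""
    padded = "^" + candidate.upper() + "$"
    return sum(logp(a, b) for a, b in zip(padded, padded[1:]))


# Toy training data, chosen only for illustration.
toy_words = ["STAR", "STOP", "REST", "MIST", "LAST", "STEM"]
logp = train_bigram(toy_words)
print(score("ST", logp) > score("ZQ", logp))
```

With this toy data, "ST" scores higher than "ZQ", and "ZQ" still receives a finite log-probability rather than being ruled out.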
Other Module Info
- Expert modules were written by approximately ten people.
- Components were written in at least four languages (Java, C, C++, Perl).
- Expert modules differ in coverage (clues they attempt to answer),
accuracy, and candidate-list length.
Michael Littman (mlittman@cs.rutgers.edu)
Last modified: Thu Apr 29 09:19:38 EDT 1999