Friday April 19 Ellen Riloff University of Utah Learning Multiple Semantic Categories Simultaneously using Collective Evidence over Extraction Patterns We will present a bootstrapping algorithm called Basilisk that learns semantic lexicons for multiple categories simultaneously. Basilisk begins with an unannotated text corpus and a small set of seed words for each semantic category, which are then bootstrapped to learn new words for each category. Basilisk uses two original ideas that distinguish it from previous bootstrapping techniques for semantic lexicon induction. (1) Basilisk gathers collective statistical information over a large body of extraction patterns. (2) Basilisk learns multiple semantic classes simultaneously, which constrains the bootstrapping process. We evaluate Basilisk on six semantic categories using the MUC-4 text corpus. Our results show that the semantic lexicons produced by Basilisk have higher precision than those produced by previous techniques, with several categories showing substantial improvement. We also present results demonstrating that both of Basilisk's contributions (collective statistics over extraction patterns, and simultaneous bootstrapping) produce performance gains independently.