Automatic Construction of Multilingual Name Dictionaries
Machine translation and other natural language processing systems often lose performance when they process texts containing unknown words, such as proper names. Proper name dictionaries are rare and can never be complete, because new names are coined all the time. One solution is to recognize and mark named entities in a text before translating it, and to carry them over untranslated. This chapter presents a method and a system that recognize named entities of the types “person” and, to some extent, “organization” in multilingual text collections, and that automatically identify which newly found names are variants of a known name. By applying this to nineteen languages over the course of several years, a multilingual name dictionary has been developed that contains over 630,000 names and over 135,000 known variants, with up to 170 multilingual variants for a single name. The automatically generated name dictionary is used daily in NewsExplorer, a publicly accessible multilingual news aggregation and analysis system.
Keywords: machine translation, name recognition, multilingual recognition, proper names, language processing systems, NewsExplorer
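The idea of deciding whether a newly found name is a variant of a known name can be illustrated with a small sketch. This is not the chapter's method, which the abstract does not describe in detail; it is a generic approach that strips diacritics, lowercases the name, and compares strings with a similarity ratio. The names `normalize`, `similarity`, `NameDictionary`, and the threshold of 0.85 are all hypothetical choices made for illustration.

```python
import unicodedata
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase, strip diacritics, and collapse whitespace (illustrative only)."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(stripped.lower().split())


def similarity(a: str, b: str) -> float:
    """Similarity ratio between two normalized names, in the range 0.0 to 1.0."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


class NameDictionary:
    """Toy dictionary mapping a canonical name to its known variants.

    Assumption: the first spelling seen becomes the canonical form, and a
    new name is merged into the best-matching entry if it is similar enough.
    """

    def __init__(self, threshold: float = 0.85) -> None:
        self.threshold = threshold  # hypothetical cut-off, not from the source
        self.variants: dict[str, set[str]] = {}

    def add(self, name: str) -> str:
        """Add a name and return the canonical entry it was filed under."""
        best, best_score = None, 0.0
        for canonical in self.variants:
            score = similarity(name, canonical)
            if score > best_score:
                best, best_score = canonical, score
        if best is not None and best_score >= self.threshold:
            self.variants[best].add(name)
            return best
        self.variants[name] = {name}
        return name


names = NameDictionary()
names.add("Angela Merkel")
names.add("Angéla Merkel")   # filed under "Angela Merkel"
names.add("Jacques Chirac")  # too dissimilar, becomes a new entry
```

A real multilingual system would also need transliteration between scripts and language-specific rules, which plain string similarity on Latin characters does not cover.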