Reranking for Large-Scale Statistical Machine Translation
Statistical machine translation (SMT) systems, which are trained on parallel corpora of bilingual text (e.g., French and English), typically work as follows: for each sentence to be translated, they generate a plethora of possible translations, from which they keep a smaller n-best list of the most likely translations. Even though the typical n-best list contains mostly high-quality candidates, the actual ranking is far from accurate. This chapter presents a novel approach to reranking the n-best list produced by an SMT system. It uses an ensemble of perceptrons that are trained in parallel, each of them on just a fraction of the available data. Experiments were performed on two large-scale commercial systems: a Chinese-to-English system trained on 80 million words and a French-to-English system trained on 1.1 billion words. The reranker obtained statistically significant improvements of about 0.5 and 0.2 BLEU points on the Chinese-to-English and the French-to-English system, respectively.
Keywords: reranking, n-best list, statistical machine translation, perceptrons, Chinese-to-English system, French-to-English system
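To make the setup concrete, here is a minimal Python sketch of n-best reranking with an ensemble of perceptrons, each trained on a separate shard of the training data. It is an illustration under assumed conventions, not the chapter's implementation: the feature representation, the oracle defined by a per-sentence quality score (e.g., sentence-level BLEU), the round-robin sharding, and the score averaging at test time are all assumptions made for this example.

import numpy as np

def train_perceptron(shard, dim, epochs=10):
    """Train one perceptron on a shard: a list of n-best lists,
    where each candidate is a (feature_vector, quality_score) pair."""
    w = np.zeros(dim)
    for _ in range(epochs):
        for nbest in shard:
            feats = np.array([f for f, _ in nbest])        # shape: (n, dim)
            predicted = int(np.argmax(feats @ w))           # model's current top choice
            # Oracle = candidate with the best quality score (assumed: sentence BLEU).
            oracle = int(np.argmax([q for _, q in nbest]))
            if predicted != oracle:
                # Standard perceptron update toward the oracle candidate.
                w += feats[oracle] - feats[predicted]
    return w

def train_ensemble(data, dim, n_shards=4):
    """Split the data into shards and train one perceptron per shard.
    The shards are independent, so they could be trained in parallel;
    here they are simply trained one after another."""
    shards = [data[i::n_shards] for i in range(n_shards)]
    return [train_perceptron(shard, dim) for shard in shards]

def rerank(nbest_feats, ensemble):
    """Rerank one n-best list by averaging the scores of all ensemble members
    and returning the index of the highest-scoring candidate."""
    feats = np.array(nbest_feats)
    avg_scores = np.mean([feats @ w for w in ensemble], axis=0)
    return int(np.argmax(avg_scores))

# Example usage with toy data (purely illustrative):
# data = [[(np.random.rand(5), np.random.rand()) for _ in range(10)] for _ in range(100)]
# ensemble = train_ensemble(data, dim=5)
# best_index = rerank([f for f, _ in data[0]], ensemble)

Training each shard independently is what makes the parallelization straightforward in this sketch: the perceptrons never need to communicate during training, and their scores are only combined when an n-best list is reranked.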