Mining Patents for Parallel Corpora
Large-scale parallel corpora are indispensable language resources for machine translation (MT), yet only a few are publicly available. This chapter describes a Japanese-English patent parallel corpus built from patent families filed in Japan and the United States. The corpus contains about 2 million automatically aligned sentence pairs. It is the largest Japanese-English parallel corpus and will be made available to the public after the NTCIR-7 workshop meeting.
Keywords: large-scale parallel corpora, machine translation, Japanese-English patent parallel corpus, patent families