Semi-Supervised Learning

Olivier Chapelle, Bernhard Schölkopf, and Alexander Zien

Print publication date: 2006

Print ISBN-13: 9780262033589

Published to MIT Press Scholarship Online: August 2013

DOI: 10.7551/mitpress/9780262033589.001.0001


Semi-Supervised Text Classification Using EM

Chapter:
3 Semi-Supervised Text Classification Using EM
Source:
Semi-Supervised Learning
Author(s):

Kamal Nigam

Andrew McCallum

Tom Mitchell

Publisher:
The MIT Press
DOI: 10.7551/mitpress/9780262033589.003.0003

This chapter explores the use of generative models for semi-supervised learning from labeled and unlabeled data in text classification domains. The widely used naive Bayes classifier for supervised learning corresponds to a mixture-of-multinomials generative model. In some domains, model likelihood and classification accuracy are strongly correlated despite the overly simplified generative model; here, expectation-maximization (EM) finds more likely models and improves classification accuracy. In other domains, likelihood and accuracy are not well correlated under the naive Bayes model; here, a more expressive generative model that allows multiple mixture components per class restores a moderate correlation between model likelihood and classification accuracy, and again EM finds more accurate models. Finally, even with a well-correlated generative model, local maxima are a significant hindrance to EM. Here, deterministic annealing does find models of much higher likelihood, but it often loses the correspondence between mixture components and class labels. When this correspondence is easily corrected, high-accuracy models result.
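
As a rough sketch of the basic algorithm the chapter builds on, the code below runs EM over a one-mixture-component-per-class multinomial naive Bayes model using only NumPy. The function and variable names (em_naive_bayes, X_l, X_u, and so on) are illustrative choices, not the chapter's code; the chapter additionally considers down-weighting the unlabeled data, multiple mixture components per class, and a deterministic-annealing variant of EM, none of which is shown here.

    import numpy as np

    def em_naive_bayes(X_l, y_l, X_u, n_classes, n_iter=20, alpha=1.0):
        # X_l: (n_labeled, V) word-count matrix for labeled documents
        # y_l: (n_labeled,) integer class labels in {0, ..., n_classes - 1}
        # X_u: (n_unlabeled, V) word-count matrix for unlabeled documents
        # alpha: Laplace smoothing pseudo-count
        V = X_l.shape[1]

        # Fixed "hard" responsibilities for the labeled documents.
        R_l = np.zeros((X_l.shape[0], n_classes))
        R_l[np.arange(X_l.shape[0]), y_l] = 1.0

        # Unlabeled responsibilities start at zero, so the first M-step
        # estimates parameters from the labeled data alone.
        R_u = np.zeros((X_u.shape[0], n_classes))

        for _ in range(n_iter):
            # M-step: re-estimate class priors and word probabilities from
            # the labeled counts plus the expected counts of the unlabeled data.
            doc_counts = R_l.sum(axis=0) + R_u.sum(axis=0)
            word_counts = R_l.T @ X_l + R_u.T @ X_u
            log_prior = np.log(doc_counts + alpha) - np.log(doc_counts.sum() + alpha * n_classes)
            log_theta = (np.log(word_counts + alpha)
                         - np.log(word_counts.sum(axis=1, keepdims=True) + alpha * V))

            # E-step: soft class assignments for the unlabeled documents.
            log_joint = X_u @ log_theta.T + log_prior
            log_joint -= log_joint.max(axis=1, keepdims=True)   # numerical stability
            R_u = np.exp(log_joint)
            R_u /= R_u.sum(axis=1, keepdims=True)

        return log_prior, log_theta

A new document with count vector x would then be classified by taking the argmax of log_prior + x @ log_theta.T; the sketch simply alternates a smoothed maximum-likelihood M-step with a soft-assignment E-step over the unlabeled documents.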

Keywords: generative models, semi-supervised learning, text classification, naive Bayes classifier, mixture of multinomials models, well-correlated generative model, EM
