- Title Pages
- Series Foreword
- Preface
- 1 Introduction to Semi-Supervised Learning
- 2 A Taxonomy for Semi-Supervised Learning Methods
- 3 Semi-Supervised Text Classification Using EM
- 4 Risks of Semi-Supervised Learning: How Unlabeled Data Can Degrade Performance of Generative Classifiers
- 5 Probabilistic Semi-Supervised Clustering with Constraints
- 6 Transductive Support Vector Machines
- 7 Semi-Supervised Learning Using Semi-Definite Programming
- 8 Gaussian Processes and the Null-Category Noise Model
- 9 Entropy Regularization
- 10 Data-Dependent Regularization
- 11 Label Propagation and Quadratic Criterion
- 12 The Geometric Basis of Semi-Supervised Learning
- 13 Discrete Regularization
- 14 Semi-Supervised Learning with Conditional Harmonic Mixing
- 15 Graph Kernels by Spectral Transforms
- 16 Spectral Methods for Dimensionality Reduction
- 17 Modifying Distances
- 18 Large-Scale Algorithms
- 19 Semi-Supervised Protein Classification Using Cluster Kernels
- 20 Prediction of Protein Function from Networks
- 21 Analysis of Benchmarks
- 22 An Augmented PAC Model for Semi-Supervised Learning
- 23 Metric-Based Approaches for Semi-Supervised Regression and Classification
- 24 Transductive Inference and Semi-Supervised Learning
- 25 A Discussion of Semi-Supervised Learning and Transduction
- References
- Notation and Symbols
- Contributors
- Index
Risks of Semi-Supervised Learning: How Unlabeled Data Can Degrade Performance of Generative Classifiers
- Chapter:
- 4 Risks of Semi-Supervised Learning: How Unlabeled Data Can Degrade Performance of Generative Classifiers
- Source:
- Semi-Supervised Learning
- Author(s):
- Fabio Cozman
- Ira Cohen
- Publisher:
- The MIT Press
This chapter presents several conclusions. First, under maximum-likelihood estimation, both labeled and unlabeled data contribute to a reduction in estimation variance. Second, when the model is “correct,” maximum-likelihood estimates are asymptotically unbiased whether obtained from labeled or unlabeled data. Third, when the model is “incorrect,” the asymptotic bias may differ for different values of λ, the weight placed on the unlabeled data in the likelihood. The asymptotic classification error may likewise vary with λ: increasing the number of unlabeled samples can increase the asymptotic estimation bias and, with it, the classification error. Consequently, if the performance obtained from a given set of labeled data is better than the performance attainable with infinitely many unlabeled samples, then at some point the addition of unlabeled data must degrade performance.
Keywords: semi-supervised learning, maximum-likelihood estimation, asymptotic classification error, estimation asymptotic bias, unlabeled data
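The λ-weighted maximum-likelihood setup the abstract refers to can be sketched in a toy form. The following is an illustrative example, not the authors' code: a one-dimensional, two-class Gaussian generative classifier with known unit variance, fitted by EM on a likelihood that weights the labeled part by (1 − λ) and the unlabeled part by λ. The function name `semisup_mle` and all details of the model are assumptions made for this sketch.

```python
import numpy as np

def semisup_mle(Xl, yl, Xu, lam, n_iter=50):
    """Toy lambda-weighted ML estimation for a 1-D two-class Gaussian
    generative model with known unit variance.

    Maximizes (1 - lam) * loglik(labeled) + lam * loglik(unlabeled) via EM.
    lam = 0 uses labeled data only; lam = 1 uses unlabeled data only.
    Returns estimated class priors and class means.
    """
    # Initialize priors and means from the labeled data.
    pi = np.array([np.mean(yl == 0), np.mean(yl == 1)])
    mu = np.array([Xl[yl == 0].mean(), Xl[yl == 1].mean()])
    for _ in range(n_iter):
        # E-step: posterior class responsibilities for unlabeled points.
        dens = np.exp(-0.5 * (Xu[:, None] - mu[None, :]) ** 2) * pi
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: lambda-weighted sufficient statistics.
        w0 = (1 - lam) * np.sum(yl == 0) + lam * resp[:, 0].sum()
        w1 = (1 - lam) * np.sum(yl == 1) + lam * resp[:, 1].sum()
        mu = np.array([
            ((1 - lam) * Xl[yl == 0].sum() + lam * (resp[:, 0] * Xu).sum()) / w0,
            ((1 - lam) * Xl[yl == 1].sum() + lam * (resp[:, 1] * Xu).sum()) / w1,
        ])
        pi = np.array([w0, w1]) / (w0 + w1)
    return pi, mu
```

When the generative model matches the data, unlabeled points sharpen the estimates (variance reduction); the chapter's point is that when the model is misspecified, raising λ can pull the estimates toward a different, more biased asymptotic solution, which is how additional unlabeled data can hurt classification.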