Experimental Comparison and Results
Our data consist of seven sets from the UCI Machine Learning Repository, with different levels of class imbalance. In the experiments we study the accuracy and robustness of different semi-supervised classifiers as the ratio of positive-class examples in the whole set varies. Comparing the methods across data sets with different degrees of class imbalance should give a reasonably complete picture of their accuracy and robustness in this setting. Based on our literature review of semi-supervised learning, we chose the following three kinds of semi-supervised classifiers for this project.
For each data set, we test the following labeled set sizes: {5, 10, 20, 30, 40, 60, 80, 100}. For each labeled set size l, we perform 10 trials. In each trial, we randomly sample the labeled data from the entire data set and use a fixed number of the remaining items as unlabeled data (see the data sets for details). If any class is absent from the sampled labeled set, we redo the sampling. We measure performance with the error rate, the balanced error rate, and the area under the ROC curve (AUC), each averaged over the 10 trials. Note that all features in our data sets are numerical.
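The sampling protocol above can be sketched as follows. This is a minimal illustration, not the code used in the experiments: the function name `sample_split`, its parameters, and the choice to draw the unlabeled items at random from the remainder are all assumptions.

```python
import random

def sample_split(labels, n_labeled, n_unlabeled, rng):
    """Return (labeled, unlabeled) index lists for one trial.

    Hypothetical helper: labeled indices are drawn at random and redrawn
    until every class is present; a fixed number of the remaining items
    is then taken as unlabeled data (chosen at random here, an assumption).
    """
    classes = set(labels)
    indices = list(range(len(labels)))
    while True:
        labeled = rng.sample(indices, n_labeled)
        if {labels[i] for i in labeled} == classes:
            break  # every class appears at least once
    labeled_set = set(labeled)
    rest = [i for i in indices if i not in labeled_set]
    unlabeled = rng.sample(rest, n_unlabeled)
    return labeled, unlabeled

# Toy imbalanced data set: 5 positives, 45 negatives.
labels = [1] * 5 + [0] * 45
rng = random.Random(0)
for trial in range(10):
    labeled, unlabeled = sample_split(labels, 5, 20, rng)
```

With a very small labeled set and a rare positive class, the redraw loop may repeat many times, which is one reason to fix the random seed per trial when reproducing results.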
· Balanced error rate (BER): the average of the error rate on positive class examples and the error rate on negative class examples. If there are fewer positive examples, the errors on positive examples will count more.
· Error rate: errors on positive and negative examples are penalized in the same way.
· The area under the ROC curve (AUC).
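As an illustration of the three measures, the sketch below computes them for binary labels in plain Python. The function names and the pairwise formulation of AUC (the fraction of positive/negative pairs ranked correctly, with ties counted as one half) are choices made for this example, not necessarily how the reported numbers were computed.

```python
def error_rate(y_true, y_pred):
    """Fraction of all examples that are misclassified."""
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)

def balanced_error_rate(y_true, y_pred, positive=1):
    """Average of the error rate on positives and the error rate on negatives."""
    pos = [p for t, p in zip(y_true, y_pred) if t == positive]
    neg = [(t, p) for t, p in zip(y_true, y_pred) if t != positive]
    pos_err = sum(1 for p in pos if p != positive) / len(pos)
    neg_err = sum(1 for t, p in neg if p != t) / len(neg)
    return 0.5 * (pos_err + neg_err)

def auc(y_true, scores, positive=1):
    """Fraction of (positive, negative) pairs where the positive scores higher."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))

# 2 positives, 8 negatives; one positive is misclassified.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
print(error_rate(y_true, y_pred))           # 0.1
print(balanced_error_rate(y_true, y_pred))  # 0.25
```

The example shows why BER is the more informative measure under imbalance: a single missed positive raises the plain error rate to only 0.1 but the balanced error rate to 0.25.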
· T. Joachims (1999). Transductive inference for text classification using support vector machines. Proceedings of the 16th International Conference on Machine Learning, 1999.
Transductive SVM attempts to maximize the classification margin on both labeled and unlabeled data while classifying the labeled data as correctly as possible. As a discriminative method, it imposes fewer restrictions on the data model than Gaussian mixture models do. The intuition is that decision boundaries are assumed to lie between classes, in low-density regions of the instance space, and that unlabeled examples help locate these regions.
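For reference, the soft-margin transductive SVM problem in Joachims (1999) can be written as below. The notation is chosen for this report and is an assumption: (x_i, y_i), i = 1..l, are the labeled examples, x*_j, j = 1..u, are the unlabeled examples with unknown labels y*_j, and C and C* weight the slack on labeled and unlabeled data respectively.

```latex
\begin{aligned}
\min_{y^*_1,\dots,y^*_u,\; w,\; b,\; \xi,\; \xi^*} \quad
  & \tfrac{1}{2}\lVert w \rVert^2
    + C \sum_{i=1}^{l} \xi_i
    + C^* \sum_{j=1}^{u} \xi^*_j \\
\text{subject to} \quad
  & y_i \,(w \cdot x_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1,\dots,l, \\
  & y^*_j \,(w \cdot x^*_j + b) \ge 1 - \xi^*_j, \quad \xi^*_j \ge 0, \quad j = 1,\dots,u, \\
  & y^*_j \in \{-1, +1\}.
\end{aligned}
```

The first sum penalizes margin violations on labeled data; the second does the same for unlabeled data under their assigned labels, which is what pushes the boundary toward low-density regions.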
· We first normalize each dimension of the data into [-1, 1] (hard normalization).
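A minimal sketch of hard normalization, assuming it means a per-feature linear rescaling whose minimum maps to -1 and maximum to 1. The function name `hard_normalize`, the handling of constant features, and computing the range over all rows passed in are assumptions; the text does not say whether the range is taken over labeled data, unlabeled data, or both.

```python
def hard_normalize(rows):
    """Linearly rescale each column of `rows` into [-1, 1]."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    scaled = []
    for row in rows:
        new_row = []
        for x, lo, hi in zip(row, lows, highs):
            if hi == lo:
                new_row.append(0.0)  # constant feature: map to 0 (assumption)
            else:
                new_row.append(2.0 * (x - lo) / (hi - lo) - 1.0)
        scaled.append(new_row)
    return scaled

print(hard_normalize([[0, 10], [5, 20], [10, 30]]))
# [[-1.0, -1.0], [0.0, 0.0], [1.0, 1.0]]
```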
· Semi-supervised learning using Gaussian fields and harmonic functions
i. X. Zhu, et al. (2003). Semi-supervised learning using Gaussian fields and harmonic functions. ICML 2003.
· We first soft normalize each dimension of the data to mean 0 and standard deviation 1.
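A minimal sketch of soft normalization, reading it as standardizing each feature to zero mean and unit standard deviation. The function name `soft_normalize`, the use of the population standard deviation (dividing by n), and the handling of constant features are assumptions made for this example.

```python
import math

def soft_normalize(rows):
    """Standardize each column of `rows` to mean 0 and standard deviation 1."""
    n = len(rows)
    columns = list(zip(*rows))
    means = [sum(c) / n for c in columns]
    # Population standard deviation (divide by n); an assumption.
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in c) / n)
        for c, m in zip(columns, means)
    ]
    return [
        [(x - m) / s if s > 0 else 0.0  # constant feature: map to 0
         for x, m, s in zip(row, means, stds)]
        for row in rows
    ]

z = soft_normalize([[1.0, 10.0], [2.0, 10.0], [3.0, 10.0]])
```

Unlike hard normalization, this scaling does not bound the values, so a single outlier shifts the other values less than it would under a min/max rescaling.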
· David J. Miller, et al. (1996). A Mixture of Experts Classifier with Learning Based on Both Labelled and Unlabelled Data. NIPS 1996.
· We hard normalize the features into the range [-1, 1] before applying the algorithm (this is not essential).
· Summary