We downloaded our data from the UCI data repository; seven of the data sets come from the UCI Machine Learning Repository. The data sets, listed in the following table, have different degrees of class imbalance. For each data set, various labeled set sizes are tested: {5, 10, 20, 30, 40, 60, 80, 100}.
Table 1. Description of the data sets
| # | Dataset | % Minority Examples | Dataset Size | Features / Classes | Classes Used | Unlabeled Data Size in Each Experimental Run |
|---|---|---|---|---|---|---|
| 1 | Letter-a | 3.9 | 20000 | 16 numeric (integer) features, 17 classes | Letter A against all other letters | 2000 |
| 2 | Pendigits | 8.3 | 7494 | 16 attributes (all input attributes are integers 0..100), 10 classes | Digit 0 against all other digits | 2000 |
| 3 | Letter-a-subset | 17.0 | 4639 | 16 numeric (integer) features, 17 classes | Letter A against letters B, C, D, E, F | 2000 |
| 5 | Yeast | 28.9 | 1484 | 8 attributes (numerical), 10 classes | NUC against all the other localizations (429 positive) | 1350 |
| 6 | Pima | 34.7 | 768 | 8 attributes (numerical), 2 classes | (268 positive) | 650 |
| 7 | Bupa | 42.0 | 345 | 6 attributes (numerical), 2 classes | (145 positive) | 240 |
| 8 | Pendigits-Subset | 50.0 | 1438 | 16 numeric (integer) features, 17 classes | Digit 3 against digit 9 (719 positive) | 1300 |
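
To make the setup concrete, the following is a minimal sketch of how one experimental run could be prepared from a row of the table: the multi-class labels are reduced to a two-class problem, the minority percentage is computed, and a labeled set and an unlabeled pool are drawn. The function names (`binarize`, `minority_percentage`, `split_run`) are hypothetical, and the uniform random, disjoint sampling is an assumption, since the text does not specify how the sets are drawn. The toy labels only reproduce the class counts of the Yeast row; they are not the actual data.

```python
import random

# Labeled set sizes tested for each data set (taken from the text).
LABELED_SIZES = [5, 10, 20, 30, 40, 60, 80, 100]


def binarize(labels, positive_classes):
    """Map multi-class labels to 1 (positive class) or 0 (all other classes)."""
    positive = set(positive_classes)
    return [1 if y in positive else 0 for y in labels]


def minority_percentage(binary_labels):
    """Percentage of examples that belong to the smaller of the two classes."""
    positives = sum(binary_labels)
    minority = min(positives, len(binary_labels) - positives)
    return 100.0 * minority / len(binary_labels)


def split_run(n_examples, n_labeled, n_unlabeled, seed=0):
    """Draw disjoint labeled and unlabeled index sets for one run.

    Assumption: uniform sampling without replacement; the source does not
    describe the sampling procedure.
    """
    rng = random.Random(seed)
    idx = rng.sample(range(n_examples), n_labeled + n_unlabeled)
    return idx[:n_labeled], idx[n_labeled:]


# Toy stand-in with the Yeast row's counts: 429 NUC examples out of 1484.
labels = ["NUC"] * 429 + ["OTHER"] * (1484 - 429)
y = binarize(labels, ["NUC"])
print(round(minority_percentage(y), 1))  # 28.9

# One run with the smallest labeled set size and the Yeast unlabeled pool size.
labeled_idx, unlabeled_idx = split_run(len(y), LABELED_SIZES[0], 1350)
print(len(labeled_idx), len(unlabeled_idx))  # 5 1350
```

Repeating `split_run` with a different `seed` for each labeled set size would give one possible way to generate multiple runs per configuration.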