Classification, Language Analysis and Information Retrieval
  Language Technologies Institute, School of Computer Science  
 

Carnegie Mellon University

 

Datasets


All downloads are free for scientific use only. Please contact Dr. Yiming Yang for any other uses. Software must not be further distributed without prior permission. If you use our software in your work, please give us credit.

Datasets

Reuters-21578: Our copy of the Reuters-21578 corpus
RCV1: The RCV1 corpus (aka Reuters-2001)
Hoovers: The Hoovers-28 and Hoovers-225 datasets
Tandem MS datasets: Sigma49, Mark12 and PPK datasets

Toolkits

EvaluateOutput2: Tools we use for thresholding, threshold calibration, and evaluating performance.



Please do not contact us with problems, as we do not have the resources to provide individualized help for everybody.
All downloads are provided on an as-is basis. We are not responsible for any errors in these downloads or for any problems they may cause.

Classification, Language Analysis and Information Retrieval (CLAIR)
Language Technologies Institute (LTI), School of Computer Science (SCS)
Carnegie Mellon University (CMU), Pittsburgh, PA 15213, USA