| Fan Li
Language Technologies Institute, Carnegie Mellon University, September 2003 |
Advanced IR Lab, 11-743
Instructors: Jamie Callan, Yiming Yang |
| This project fulfills the course requirement of the Advanced IR Lab.
As email becomes more and more important in people's daily lives, automatically classifying email into hierarchical folders becomes increasingly valuable.
This work is closely related not only to hierarchical text categorization but also to the properties of email data. In this project, we try to answer three questions. First, can user-specific hierarchical folders be learned effectively by classifiers? Second, what kind of classifier is most suitable for this task? Third, how can the special properties of email data be used to improve classification performance? |
| Hierarchical email classification is a practical and interesting problem.
It is closely related to hierarchical text categorization (a relatively well-studied area), but not limited to it. Since we are dealing with email data, we need to consider the special properties of email. The first question is whether the task is feasible; in other words, is a user-defined hierarchical folder structure learnable? Intuitively, if a user defines folders by email topic, the classification task may be relatively easy. However, if a user defines folders by priority, the task may be much harder. It is also interesting to consider what kinds of classifiers can do a good job.
|
| This is the algorithm: |
| Algorithm ...... |
| Results ...... |