Email Prioritization

Fan Li
Language Technologies Institute
Carnegie Mellon University
September 2003
Advanced IR Lab, 11-743
Instructors: Jamie Callan, Yiming Yang

 

Contents

  • Initial Project Presentation

    Abstract

    This project fulfills the course requirement of the Advanced IR Lab. As email becomes more and more important in people's daily lives, automatically classifying email into hierarchical folders becomes increasingly valuable.
    This work is closely related not only to hierarchical text categorization
    but also to the properties of email data. In this project, we try to answer three questions. First, can user-specific hierarchical folders be learned effectively by classifiers? Second, which kind of classifier is most suitable for this task? Third, how can the special properties of email data be used to improve classification performance?
     
     

     

    Introduction

    Hierarchical email classification is a practical and interesting problem.
    It is closely related to hierarchical text categorization (a relatively well
    studied area), but is not limited to it. Since we are dealing with
    email data, we need to consider the special properties of email.
    The first question is whether this task is feasible. In other words,
    is a user-defined hierarchical folder structure
    learnable? Intuitively, if a user defines folders based
    on email topics, the classification task may be relatively easy.
    However, if a user defines folders based only on the priority of the email,
    the task may be much harder.

    It is also interesting to consider what kind of classifier can do a good job
    in hierarchical email classification. The key point is that different users may define folders according to very different strategies. Some users may
    define folders according to senders; others may define folders according to
    dates or email contents. Thus it is natural to investigate
    which kind of classifier (rule-based or bag-of-words-based) does a better job, and how different classifiers can be combined in an effective way.
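
    To make the contrast between rule-based and bag-of-words-based classifiers concrete, the following is a minimal sketch and not the method used in this project. It assumes a hypothetical email representation (a dict with "sender" and "body" fields), made-up folder names and training messages, and one simple combination strategy: trust a sender rule when the sender has been seen before, otherwise fall back to a multinomial naive Bayes classifier over the words of the body.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Simplifying assumption: lowercase, whitespace-separated tokens.
    return text.lower().split()


class SenderRule:
    """Rule-based: predict the folder a sender's mail most often went to."""

    def fit(self, emails, folders):
        counts = defaultdict(Counter)
        for email, folder in zip(emails, folders):
            counts[email["sender"]][folder] += 1
        self.best = {s: c.most_common(1)[0][0] for s, c in counts.items()}
        return self

    def predict(self, email):
        # Returns None when the sender never appeared in training.
        return self.best.get(email["sender"])


class NaiveBayes:
    """Bag-of-words multinomial naive Bayes with add-one smoothing."""

    def fit(self, emails, folders):
        self.folder_counts = Counter(folders)
        self.word_counts = defaultdict(Counter)
        for email, folder in zip(emails, folders):
            self.word_counts[folder].update(tokenize(email["body"]))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, email):
        total = sum(self.folder_counts.values())

        def log_score(folder):
            wc = self.word_counts[folder]
            denom = sum(wc.values()) + len(self.vocab)
            score = math.log(self.folder_counts[folder] / total)
            for w in tokenize(email["body"]):
                if w in self.vocab:  # ignore words never seen in training
                    score += math.log((wc[w] + 1) / denom)
            return score

        return max(self.folder_counts, key=log_score)


def combined_predict(rule, bayes, email):
    # Hypothetical combination: sender rule first, naive Bayes as fallback.
    folder = rule.predict(email)
    return folder if folder is not None else bayes.predict(email)


# Toy, made-up training data for illustration only.
train = [
    ({"sender": "advisor@example.edu", "body": "draft of the paper attached"}, "research"),
    ({"sender": "advisor@example.edu", "body": "meeting about experiments tomorrow"}, "research"),
    ({"sender": "friend@example.com", "body": "dinner and movie this weekend"}, "personal"),
    ({"sender": "friend@example.com", "body": "weekend hiking trip photos"}, "personal"),
]
emails, folders = zip(*train)
rule = SenderRule().fit(emails, folders)
bayes = NaiveBayes().fit(emails, folders)

known = {"sender": "advisor@example.edu", "body": "dinner this weekend"}
unseen = {"sender": "stranger@example.org", "body": "photos from the weekend trip"}
print(combined_predict(rule, bayes, known))
print(combined_predict(rule, bayes, unseen))
```

    In this sketch the first message is filed by its sender even though its words resemble another folder, while the second message, from an unseen sender, is filed by its content. Other combinations (for example, voting or using the rule output as an extra feature) are equally plausible; which works best for a given user's folder strategy is the open question raised above.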

     

    Algorithm

    This is the algorithm: 

     

    Dataset

    Algorithm ...... 

     

    Experimental Results

    Results ...... 

     
