
User-centric, Adaptive and Collaborative Information Filtering

An NSF-funded Collaborative Project # III-COR 0704628 & 0704689

 

Project Personnel:

PI at Carnegie Mellon University: Yiming Yang

Students involved fully or partially (Current): Abhay Harpale, Abhimanyu Lad, Konstantin Salomatin, Siddharth Gopal

Students involved fully or partially (Past): Henry Shu, Subramaniam Ganapathy

Co-PI at University of Pittsburgh: Daqing He (Project Website)

 

Project Goals and Objectives

The goal of this project is to develop new and advanced technologies for adaptive filtering -- the problem of learning and adapting to a user's information needs "on-the-fly". We propose a new framework called "Enriched Vector Space Model" (EVSM) that allows a rich representation of a user's interests in terms of queries, entities (person names, locations, dates), topical categories (politics, crime, economics), and implicit and explicit feedback received from the user. Such user profiles can be used to perform more intelligent and personalized information filtering for each user. The joint representation of multiple user profiles in EVSM enables the discovery of intra- and inter-object similarities among users, queries, entities, and categories, based on their content as well as their interrelationships (see the attached figure). Thus, the notion of relevant information can be shared among users with similar information needs. A matrix representation of multi-user profiles also allows the application of standard dimensionality reduction techniques to discover latent clusters of users or queries, as well as the application of link analysis to identify important users and authoritative sources of information.
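As a rough illustration of the joint representation described above, the following minimal sketch places queries, entities, categories, and feedback terms in one sparse vector per user and compares users by cosine similarity. It is not the project's EVSM implementation: the function names (build_profile, cosine), the typed feature keys, the uniform weights, and the example users are all assumptions made for illustration.

```python
import math
from collections import Counter


def build_profile(queries, entities, categories, feedback_terms):
    """Combine typed features into one sparse vector (feature -> weight).

    Hypothetical helper: each feature is keyed by (type, value) so that a
    query term and an entity with the same spelling stay distinct.
    """
    profile = Counter()
    for term in queries:
        profile[("query", term)] += 1.0
    for entity in entities:
        profile[("entity", entity)] += 1.0
    for category in categories:
        profile[("category", category)] += 1.0
    for term, weight in feedback_terms.items():
        profile[("feedback", term)] += weight
    return profile


def cosine(a, b):
    """Cosine similarity between two sparse profiles."""
    dot = sum(w * b.get(f, 0.0) for f, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Illustrative users (assumed data, not from the project).
alice = build_profile(["election"], ["Clinton"], ["politics"], {"vote": 0.5})
bob = build_profile(["election", "poll"], ["Clinton"], ["politics"], {})
carol = build_profile(["playoffs"], [], ["sports"], {"score": 1.0})

sim_ab = cosine(alice, bob)    # shared query, entity, and category
sim_ac = cosine(alice, carol)  # no shared features
```

Stacking such profiles row by row gives the multi-user matrix on which dimensionality reduction or link analysis could then be run; those steps are omitted here.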

Research Challenges

Challenge 1: How to bridge the gap between adaptive filtering (AF) and collaborative filtering (CF)?

Current AF research, while focusing on incremental learning of topics from sparse training examples, does not take into account the possibility of information sharing among multiple users and cannot leverage parallel, multi-user relevance feedback. Current work in CF, on the other hand, focuses on optimal use of multi-user information in item search, but the solutions are primarily designed for batch learning with large collections of training examples, a condition that is difficult to meet in AF applications. Bridging the technical gap between CF and AF requires the development of new algorithms that can learn incrementally and efficiently with extremely sparse training examples, and that can effectively "borrow" information from similar users when predicting the needs of a particular user.
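One way to picture the combination of incremental learning and "borrowing" is the sketch below, which updates each user's term profile one judgment at a time and adds a similarity-weighted contribution from other users when scoring a document. This is only an assumed, simplified scheme (a Rocchio-style additive update plus cosine-weighted neighbors); the class name SharedFilter, the borrow and rate parameters, and the example data are hypothetical and do not describe the project's algorithms.

```python
from collections import defaultdict


class SharedFilter:
    """Toy filter: per-user incremental profiles plus borrowing from neighbors."""

    def __init__(self, borrow=0.5, rate=1.0):
        self.profiles = defaultdict(lambda: defaultdict(float))
        self.borrow = borrow  # weight given to other users' evidence (assumed)
        self.rate = rate      # step size of each feedback update (assumed)

    def feedback(self, user, doc_terms, relevant):
        """Incrementally update one profile from a single relevance judgment."""
        sign = 1.0 if relevant else -1.0
        for term in doc_terms:
            self.profiles[user][term] += sign * self.rate

    @staticmethod
    def _dot(profile, doc_terms):
        return sum(profile.get(term, 0.0) for term in doc_terms)

    def _similarity(self, u, v):
        """Cosine similarity between two users' profiles."""
        pu, pv = self.profiles[u], self.profiles[v]
        dot = sum(w * pv.get(t, 0.0) for t, w in pu.items())
        norm_u = sum(w * w for w in pu.values()) ** 0.5
        norm_v = sum(w * w for w in pv.values()) ** 0.5
        return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

    def score(self, user, doc_terms):
        """Own evidence plus similarity-weighted evidence from other users."""
        own = self._dot(self.profiles[user], doc_terms)
        borrowed = 0.0
        for other in list(self.profiles):
            if other == user:
                continue
            sim = self._similarity(user, other)
            if sim > 0.0:
                borrowed += sim * self._dot(self.profiles[other], doc_terms)
        return own + self.borrow * borrowed


# Illustrative feedback stream (assumed data).
f = SharedFilter()
f.feedback("u1", {"gaza", "talks"}, relevant=True)
f.feedback("u2", {"gaza", "talks"}, relevant=True)
f.feedback("u2", {"gaza", "ceasefire"}, relevant=True)
f.feedback("u3", {"football"}, relevant=True)

# u1 has never seen "ceasefire", but the similar user u2 has.
ceasefire_score = f.score("u1", {"ceasefire"})
football_score = f.score("u1", {"football"})
```

In this toy setting u1 receives a positive score for an unseen term only through the similar user u2, while the dissimilar user u3 contributes nothing.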

Challenge 2: How to develop a new framework for leveraging multi-type relevance feedback from different users?

A user can express his or her interest using any combination of a few keywords (as a query), a list of Named Entities (as the clues for tracking related events), a category or several categories in a domain-specific classification hierarchy (as the scope of navigation), and relevance judgments on system-selected documents (as on-topic and off-topic examples). Moreover, a user's interest is subject to change, depending on context.

Challenge 3: How to enable multi-level adaptive filtering by using hierarchical text categorization?

Categories (or topics) have been commonly used by humans and by systems to organize documents and retrieved information. Some categories are generic, stable and relatively easy to identify, such as "Sports" and "Politics", the common subjects of newswire stories and TV broadcast news. Other topics are more specific, short-lived or fast-evolving, such as "Clinton's Gaza trip" and "Operation screaming eagle" (in Iraq). From the user's point of view, automatic topic spotting of both types would be useful: broader topics are useful for discarding big chunks of irrelevant documents, and narrower topics are useful for focused tracking of event-level interests. Independent learning of such topics, while common in current AF systems, is suboptimal since the domain knowledge reflected in the taxonomy is ignored. This problem is exacerbated when topics are sparsely populated with positive labeled examples, which is often the case in adaptive filtering.

Challenge 4: How to develop an evaluation framework for testing user-centric adaptive and collaborative filtering?

Existing evaluation frameworks are designed either for adaptive filtering or for collaborative filtering, but there is no single framework suitable for testing both. In addition, no real users are represented in these frameworks. The new evaluation framework should possess several key features. It should contain an explicit, detailed representation of an adequate number of real users and their interests. It should also represent the temporal aspect of the users' interests and relevance judgments. The content of the document collection in the framework should be of interest for people to access.

Publications

2009

·         Abhimanyu Lad, Yiming Yang, Rayid Ghani, Bryan Kisiel, "Toward Optimal Ordering of Prediction Tasks", SIAM International Conference on Data Mining (SDM09)

·         Konstantin Salomatin, Yiming Yang, Abhimanyu Lad, "Multi-field Correlated Topic Modeling", SIAM International Conference on Data Mining (SDM09)

·         Yiming Yang, Abhimanyu Lad, Henry Shu, Bryan Kisiel, Chad Cumby, Rayid Ghani, Katharina Probst, "Graph Structure Learning for Task Ordering", In Proceedings of the 11th International Conference on Enterprise Information Systems (ICEIS), 2009

2008

·         Abhay Harpale and Yiming Yang, "Personalized Active Learning for Collaborative Filtering". In Proceedings of the 31st Annual International ACM SIGIR Conference (2008).

·         Jian Zhang, Zoubin Ghahramani and Yiming Yang, "Flexible Latent Variable Models for Multi-Task Learning", In Journal of Machine Learning (2008).

·         Daqing He, Peter Brusilovsky, Jaewook Ahn, Jonathan Grady, Rosta Farzan, Yefei Peng, Yiming Yang, Monica Rogati, "An Evaluation of Adaptive Filtering in the Context of Realistic Task-based Information Exploration", Information Processing and Management, Vol. 44(2), 2008.

2007

·         Abhimanyu Lad, Yiming Yang, "Generalizing from Relevance Feedback using Named Entity Wildcards", Conference on Information and Knowledge Management (CIKM), 2007.

 

Broader Impacts

Our new approach goes substantially beyond current approaches to adaptive filtering. If successful, it will make a substantial contribution to the fundamental basis of AF technology and strongly impact practical applications. It could also augment the capabilities of web-based and enterprise search engines, giving them a major adaptive and personalization dimension. This project will also play a valuable role in education, by funding and training both graduate and undergraduate students in a field of study that brings together information retrieval, machine learning, software engineering and scientific experimentation methodology.

Point of Contact

For further information, please contact Yiming Yang.

Latest Quarterly Reports

Date of Last Update

July 1, 2009