
An NSF-funded Collaborative Project # III-COR 0704628 & 0704689
Project Personnel:
PI at Carnegie Mellon University: Yiming Yang
Students involved fully or partially (Current): Abhay Harpale, Abhimanyu Lad, Konstantin Salomatin, Siddharth Gopal
Students involved fully or partially (Past): Henry Shu, Subramaniam Ganapathy
Co-PI at University of Pittsburgh: Daqing He (Project Website)
Project Goals and Objectives
The goal of this project is to develop new and advanced technologies for adaptive filtering -- the problem of learning and adapting to a user's information needs "on-the-fly". We propose a new framework called "Enriched Vector Space Model" (EVSM) that allows a rich representation of a user's interests in terms of queries, entities (person names, locations, dates), topical categories (politics, crime, economics), and the implicit and explicit feedback received from the user. Such user profiles can be used to perform more intelligent and personalized information filtering for each user. The joint representation of multiple user profiles in EVSM enables the discovery of intra- and inter-object similarities among users, queries, entities, and categories, based on their content as well as their interrelationships (see the attached figure). Thus, the notion of relevant information can be shared among users with similar information needs. A matrix representation of multi-user profiles also allows the application of standard dimensionality reduction techniques to discover latent clusters of users or queries, as well as the application of link analysis to identify important users and authoritative sources of information.
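As a rough illustration of the joint representation described above, the following minimal sketch places queries, entities, and categories in one sparse vector per user and compares users by cosine similarity. It is not the project's EVSM implementation: the function names (`build_profile`, `cosine`), the type-prefixed feature keys, the unit weights, and the three example users are all assumptions made for this example.

```python
import math
from collections import Counter


def build_profile(queries, entities, categories):
    """Build one sparse profile vector; each key is prefixed by its feature type."""
    profile = Counter()
    for query in queries:
        for term in query.lower().split():
            profile["term:" + term] += 1.0
    for entity in entities:
        profile["entity:" + entity.lower()] += 1.0
    for category in categories:
        profile["category:" + category.lower()] += 1.0
    return profile


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(key, 0.0) for key, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical users; each row of the multi-user "matrix" is one profile.
profiles = {
    "alice": build_profile(["gaza peace talks"], ["Clinton", "Gaza"], ["politics"]),
    "bob": build_profile(["clinton gaza trip"], ["Clinton"], ["politics"]),
    "carol": build_profile(["stock market crash"], [], ["economics"]),
}

sim_alice_bob = cosine(profiles["alice"], profiles["bob"])
sim_alice_carol = cosine(profiles["alice"], profiles["carol"])
```

In this toy setting, users who share an entity and a category come out as similar even when their query wording differs, which is the kind of inter-user similarity a shared representation is meant to expose.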
Research Challenges
Challenge 1: How to bridge the gap between adaptive filtering (AF) and collaborative filtering (CF)?
Current AF research, while focusing on incremental learning of topics from sparse training examples, does not take into account the possibility of information sharing among multiple users and cannot leverage parallel, multi-user relevance feedback. Current work in CF, on the other hand, focuses on optimal use of multi-user information in item search but the solutions are primarily designed for batch learning with large collections of training examples, a condition that is difficult to meet in AF applications. Bridging the technical gap between CF and AF requires the development of new algorithms that can learn incrementally and efficiently with extremely sparse training examples, and that can effectively "borrow" information from similar users when predicting the need of a particular user.
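One simple way to picture the combination of incremental learning and "borrowing" is sketched below. It uses a Rocchio-style profile update for the adaptive part and a similarity-weighted average over other users' scores for the collaborative part. Both are standard techniques substituted here for illustration, not the algorithms developed in this project; the names `update_profile` and `borrowed_score` and the parameters `lr` and `borrow` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(key, 0.0) for key, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def update_profile(profile, doc, relevant, lr=0.5):
    """Rocchio-style incremental update from a single relevance judgment."""
    sign = 1.0 if relevant else -1.0
    updated = dict(profile)
    for term, weight in doc.items():
        updated[term] = updated.get(term, 0.0) + sign * lr * weight
    return updated


def borrowed_score(user, doc, profiles, borrow=0.3):
    """Mix the user's own score with a similarity-weighted score from other users."""
    own = cosine(profiles[user], doc)
    weighted_sum = 0.0
    total_sim = 0.0
    for other, other_profile in profiles.items():
        if other == user:
            continue
        sim = cosine(profiles[user], other_profile)
        if sim > 0.0:  # only borrow from users with overlapping interests
            weighted_sum += sim * cosine(other_profile, doc)
            total_sim += sim
    neighbor = weighted_sum / total_sim if total_sim > 0.0 else 0.0
    return (1.0 - borrow) * own + borrow * neighbor


# Hypothetical profiles: u1 has never seen the term "summit", but the similar user u2 has.
profiles = {
    "u1": {"gaza": 1.0, "clinton": 1.0},
    "u2": {"gaza": 1.0, "clinton": 1.0, "summit": 1.0},
    "u3": {"stocks": 1.0},
}
doc = {"summit": 1.0}

own_only = cosine(profiles["u1"], doc)
with_borrowing = borrowed_score("u1", doc, profiles)
after_feedback = update_profile(profiles["u1"], doc, relevant=True)
```

Here `own_only` is zero because u1 has no training example mentioning "summit", while `with_borrowing` is positive because a similar user's profile covers it. A single positive judgment then moves u1's own profile toward the document without retraining from scratch.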
Challenge 2: How to develop a new framework for leveraging multi-type relevance feedback from different users?
A user can express his or her interest using any combination of a few keywords (as a query), a list of Named Entities (as the clues for tracking related events), a category or several categories in a domain-specific classification hierarchy (as the scope of navigation), and relevance judgments on system-selected documents (as on-topic and off-topic examples). Moreover, a user's interest is subject to change, depending on context.
Challenge 3: How to enable multi-level adaptive filtering by using hierarchical text categorization?
Categories (or topics) have been commonly used by humans and by systems to organize documents and retrieved information. Some categories are generic, stable and relatively easy to identify, such as "Sports" and "Politics", the common subjects of newswire stories and TV broadcast news. Some other topics are more specific, short-lasting or fast-evolving, such as "Clinton's Gaza trip" and "Operation screaming eagle" (in Iraq). From the user's point of view, automatic topic spotting of both types would be useful: broader topics are useful for discarding big chunks of irrelevant documents, and narrower topics are useful for focused tracking of event-level interests. Independent learning of such topics, while common in current AF systems, is suboptimal since the domain knowledge reflected in the taxonomy is ignored. This problem is exacerbated when topics are sparsely populated with positive labeled examples, which is often the case in adaptive filtering.
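To make the sparsity argument concrete, the sketch below shows one generic way a taxonomy can help: a narrow topic's centroid is shrunk toward its parent category's centroid, with the parent dominating when the child has few positive examples. This is an illustrative assumption, not the hierarchical method proposed in the project; `centroid`, `shrink_toward_parent`, and `prior_strength` are hypothetical names.

```python
def centroid(docs):
    """Average a list of sparse term vectors; returns an empty dict for no docs."""
    if not docs:
        return {}
    total = {}
    for doc in docs:
        for term, weight in doc.items():
            total[term] = total.get(term, 0.0) + weight
    return {term: weight / len(docs) for term, weight in total.items()}


def shrink_toward_parent(child_docs, parent_centroid, prior_strength=2.0):
    """Blend the child centroid with its parent's; fewer child docs means more parent."""
    n = len(child_docs)
    trust_child = n / (n + prior_strength)
    child = centroid(child_docs)
    terms = set(child) | set(parent_centroid)
    return {
        term: trust_child * child.get(term, 0.0)
        + (1.0 - trust_child) * parent_centroid.get(term, 0.0)
        for term in terms
    }


# Hypothetical parent topic "Politics" and a narrow child topic with one positive example.
politics = {"election": 0.5, "government": 0.5}
gaza_trip_docs = [{"gaza": 1.0, "clinton": 1.0}]

narrow_topic = shrink_toward_parent(gaza_trip_docs, politics)
no_examples_yet = shrink_toward_parent([], politics)
```

With a single positive example the narrow topic keeps some weight on the parent's vocabulary, and with no examples it falls back to the parent entirely. As more examples arrive, `trust_child` approaches one and the topic is learned almost independently.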
Challenge 4: How to develop an evaluation framework for testing user-centric adaptive and collaborative filtering?
Existing evaluation frameworks are designed either for adaptive filtering or for collaborative filtering, but there is no single framework suitable for testing both. In addition, no real users are represented in these frameworks. The new evaluation framework should possess several key features. It should contain an explicit representation of an adequate number of real users and their interests in detail. It should also represent the temporal aspect of the users' interests and relevance judgments. The content of the document collection in the framework should be of interest for people to access.
Publications
2009
· Abhimanyu Lad, Yiming Yang, Rayid Ghani, Bryan Kisiel, "Toward Optimal Ordering of Prediction Tasks", SIAM International Conference on Data Mining (SDM09)
· Konstantin Salomatin, Yiming Yang, Abhimanyu Lad, "Multi-field Correlated Topic Modeling", SIAM International Conference on Data Mining (SDM09)
· Yiming Yang, Abhimanyu Lad, Henry Shu, Bryan Kisiel, Chad Cumby, Rayid Ghani, Katharina Probst, "Graph Structure Learning for Task Ordering", In Proceedings of the 11th International Conference on Enterprise Information Systems (ICEIS), 2009.
2008
· Abhay Harpale and Yiming Yang, "Personalized Active Learning for Collaborative Filtering", In Proceedings of the 31st Annual International ACM SIGIR Conference (2008).
· Jian Zhang, Zoubin Ghahramani and Yiming Yang, "Flexible Latent Variable Models for Multi-Task Learning", In Journal of Machine Learning (2008).
· Daqing He, Peter Brusilovsky, Jaewook Ahn, Jonathan Grady, Rosta Farzan, Yefei Peng, Yiming Yang, Monica Rogati, "An Evaluation of Adaptive Filtering in the Context of Realistic Task-based Information Exploration", Information Processing and Management, Vol. 44(2), 2008.
2007
· Abhimanyu Lad, Yiming Yang, "Generalizing from Relevance Feedback using Named Entity Wildcards", Conference on Information and Knowledge Management (CIKM), 2007.
Broader Impacts
Our new approach goes substantially beyond current approaches to adaptive filtering. If successful, it will make a substantial contribution to the fundamental basis of AF technology and strongly impact practical applications. It could also augment the capabilities of web-based and enterprise search engines, giving them a major adaptive and personalization dimension. This project will also play a valuable role in education, by funding and training both graduate and undergraduate students in a field of study that brings together information retrieval, machine learning, software engineering and scientific experimentation methodology.
Point of Contact
For further information, please contact Yiming Yang.
Latest Quarterly Reports
Date of Last Update
July 1, 2009