User-centric, Adaptive and Collaborative Information Filtering

An NSF-funded Collaborative Project # III-COR 0704628 & 0704689

Quarterly Report: 2nd Year, 3rd Quarter

Title: Multi-Field Correlated Topic Modeling

Summary: Many practical applications require analyzing and maintaining data collections in which each entity (an object or event) consists of multiple fields with different but interrelated contents. For example, in a troubleshooting scenario each record may contain several free-text fields, such as a brief problem description by a user, an initial analysis of the problem by a technical specialist, and a detailed technical description by the expert(s) who fixed the problem. Other fields in the record may hold related information in the form of nominal, categorical, ordinal, and numerical attributes, such as who reported the problem, what level of urgency was specified, and which expert(s) were assigned. For each new troubleshooting case, multiple interrelated tasks must be solved, such as finding similar past cases (retrieval) or automatically determining the severity and category of the problem and the right experts to handle it (prediction). The main challenge in this scenario is to model the dependencies among the multiple fields so that the rich connections among these tasks can be effectively leveraged.
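To make the record structure described above concrete, the following Python sketch represents one troubleshooting record as a set of named free-text fields plus typed attributes. The class `TroubleshootingRecord`, its field names, and the example values are hypothetical illustrations, not part of our system; the `split` method mirrors the prediction setting, in which some fields are observed (solved tasks) and the rest must be inferred (unsolved tasks).

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TroubleshootingRecord:
    """One entity with several interrelated fields (hypothetical layout)."""
    # Free-text fields, each stored as a list of tokens.
    text_fields: Dict[str, List[str]] = field(default_factory=dict)
    # Nominal, categorical, ordinal, or numerical attributes.
    attributes: Dict[str, object] = field(default_factory=dict)

    def split(self, observed: List[str]):
        """Split all fields into observed (solved) and unobserved (unsolved) parts."""
        all_fields = {**self.text_fields, **self.attributes}
        seen = {name: value for name, value in all_fields.items() if name in observed}
        unseen = {name: value for name, value in all_fields.items() if name not in observed}
        return seen, unseen


record = TroubleshootingRecord(
    text_fields={
        "user_description": "printer not responding after update".split(),
        "initial_analysis": "driver conflict suspected".split(),
        "expert_resolution": "rolled back driver and reinstalled spooler".split(),
    },
    attributes={"reported_by": "user_17", "urgency": 2, "assigned_expert": "expert_3"},
)
seen, unseen = record.split(observed=["user_description", "reported_by", "urgency"])
print(sorted(unseen))  # ['assigned_expert', 'expert_resolution', 'initial_analysis']
```

Keeping text and attributes in one record object reflects the point of the summary: retrieval and prediction tasks operate on the same entity, so a model of the whole record can share information across them.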

1) We have developed a new multi-field correlated topic modeling approach that models such multi-field data within a global Bayesian graphical structure.

2) We have developed a variant of the mean-field variational algorithm as the approximation procedure for inference and parameter estimation.

3) We have evaluated our approach on real troubleshooting data. It outperforms the state-of-the-art Correlated Topic Model (CTM) in terms of likelihood (Figure 1) and predictive perplexity (Figure 2).
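This report does not spell out the structure of mf-CTM, so the sketch below is only one plausible illustration of how a correlated topic model can couple several fields. It assumes that all text fields of a record share a single logistic-normal topic-proportion vector (as in the standard CTM), while each field has its own topic-word distributions. The function `sample_record`, the field names, and all dimensions are hypothetical; attribute fields, as well as the variational inference and parameter estimation procedure, are not shown.

```python
import numpy as np


def sample_record(mu, sigma, betas, lengths, rng):
    """Draw one multi-field record from a hypothetical shared-topic CTM.

    mu, sigma : mean (K,) and covariance (K, K) of the logistic-normal prior.
    betas     : dict field -> (K, V_f) topic-word matrix for that field.
    lengths   : dict field -> number of tokens to draw for that field.
    """
    # Correlated topic proportions, as in CTM: eta ~ N(mu, sigma), theta = softmax(eta).
    eta = rng.multivariate_normal(mu, sigma)
    theta = np.exp(eta - eta.max())
    theta /= theta.sum()
    record = {}
    for name, beta in betas.items():
        # Every field draws its topics from the same theta, which couples the fields.
        z = rng.choice(len(theta), size=lengths[name], p=theta)
        # Each token is then drawn from its topic's word distribution in this field.
        record[name] = np.array([rng.choice(beta.shape[1], p=beta[k]) for k in z])
    return theta, record


rng = np.random.default_rng(0)
K = 3
mu = np.zeros(K)
# Topics 0 and 1 are positively correlated; topic 2 is independent of both.
sigma = np.array([[1.0, 0.8, 0.0], [0.8, 1.0, 0.0], [0.0, 0.0, 1.0]])
betas = {
    "problem_description": rng.dirichlet(np.ones(50), size=K),
    "expert_resolution": rng.dirichlet(np.ones(80), size=K),
}
lengths = {"problem_description": 12, "expert_resolution": 30}
theta, record = sample_record(mu, sigma, betas, lengths, rng)
print(theta.round(3), {name: len(tokens) for name, tokens in record.items()})
```

Because the fields share theta, observing the words of one field shifts the posterior over theta and therefore the predictions for the other fields; this is the kind of cross-field dependency the summary identifies as the main challenge.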

Figure 1 shows the likelihood of two variants of our approach (mf-CTM.dt and mf-CTM.ct) and the state-of-the-art baseline (CTM) as a function of the number of latent topics, a parameter of the algorithm that must be tuned.

Figure 2 shows the predictive perplexity (lower is better) of two variants of our approach (mf-CTM.dt and mf-CTM.ct) and the state-of-the-art baseline (CTM). Perplexity reflects how well a model predicts the unseen fields of a record (unsolved tasks) given its observed fields (solved tasks).
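The report does not state the exact perplexity formula used for Figure 2. A common definition, assumed here, is perplexity = exp(-(sum over records d of log p(unseen fields of d | observed fields of d)) / (sum over records d of N_d)), where N_d is the number of tokens in the unseen fields of record d. The hypothetical helper below computes this quantity from per-record held-out log-likelihoods, which a fitted model would have to supply.

```python
import math
from typing import Sequence


def predictive_perplexity(log_likelihoods: Sequence[float], token_counts: Sequence[int]) -> float:
    """exp(-total log-likelihood of held-out tokens / total number of held-out tokens).

    log_likelihoods[d] : log p(unseen fields of record d | observed fields of record d)
    token_counts[d]    : number of tokens in the unseen fields of record d
    """
    if len(log_likelihoods) != len(token_counts):
        raise ValueError("need one log-likelihood and one token count per record")
    total_tokens = sum(token_counts)
    if total_tokens == 0:
        raise ValueError("no held-out tokens")
    return math.exp(-sum(log_likelihoods) / total_tokens)


# Sanity check: a model that assigns probability 1/100 to every held-out token
# has perplexity 100, i.e. it is as uncertain as a uniform choice among 100 words.
counts = [5, 3]
lls = [n * math.log(1 / 100) for n in counts]
print(predictive_perplexity(lls, counts))  # approximately 100.0
```

Normalizing by the total token count rather than averaging per record keeps long and short records on the same per-token scale, which is why lower perplexity can be read directly as better per-word prediction of the unseen fields.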