Information Retrieval Lab Project


Entity-Tagged Language Models for Question Classification in a QA System

Jonathan Brown (jonbrown @ cs.cmu.edu)




Introduction

Question classification is an important task in most question answering systems: the entire passage retrieval and answer extraction process relies on having the correct answer type for a given question. Much research in this area has focused on regular expressions, hand-written grammar rules, and more advanced natural language processing techniques that parse the question and determine what is being asked for. Although these approaches have been successful, their results are not perfect, and extending them to new question types is difficult and time-consuming. I apply language modeling techniques to this problem, using entity-tagged language models to classify questions according to answer type. This technique is fully automatic and requires no hand-written rules: a language model is built for each question class, and each new question is classified using these models. I compare my results with those of a Support Vector Machine (SVM) question classification baseline, and I also compare the language modeling technique with others presented in the literature. Throughout, I examine the effects of using very specific answer types.
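The per-class language modeling idea can be sketched in a few lines. This is a minimal illustration, not the project's implementation: it assumes unigram models with add-one (Laplace) smoothing and picks the class whose model assigns the question the highest likelihood. The class name ClassLanguageModels, the training questions, and the labels PERSON and LOCATION are hypothetical.

```python
import math
from collections import Counter


class ClassLanguageModels:
    """One unigram language model per answer type (illustrative sketch).

    Assumption: unigram models with add-one (Laplace) smoothing. The
    n-gram order and smoothing method used in the report are not stated here.
    """

    def __init__(self):
        self.counts = {}    # answer type -> Counter of token frequencies
        self.totals = {}    # answer type -> total number of training tokens
        self.vocab = set()  # every token seen in any class

    def train(self, labeled_questions):
        """labeled_questions: iterable of (token_list, answer_type) pairs."""
        for tokens, answer_type in labeled_questions:
            self.counts.setdefault(answer_type, Counter()).update(tokens)
            self.vocab.update(tokens)
        self.totals = {c: sum(cnt.values()) for c, cnt in self.counts.items()}

    def log_prob(self, tokens, answer_type):
        """Smoothed log P(question | answer type) under the class's unigram model."""
        counter = self.counts[answer_type]
        # The extra +1 in the denominator reserves probability mass for a
        # single unknown-token type, so an unseen word gets count 0 + 1.
        denom = self.totals[answer_type] + len(self.vocab) + 1
        return sum(math.log((counter[t] + 1) / denom) for t in tokens)

    def classify(self, tokens):
        """Return the answer type whose model gives the question the highest likelihood."""
        return max(self.counts, key=lambda c: self.log_prob(tokens, c))


# Usage with made-up training questions and hypothetical labels.
models = ClassLanguageModels()
models.train([
    ("who wrote hamlet".split(), "PERSON"),
    ("who invented the telephone".split(), "PERSON"),
    ("where is the eiffel tower".split(), "LOCATION"),
    ("where is mount everest".split(), "LOCATION"),
])
print(models.classify("who discovered penicillin".split()))  # PERSON
print(models.classify("where is paris".split()))             # LOCATION
```

Choosing the class by likelihood alone is equivalent to assuming a uniform prior over answer types; a class prior could be added to the score if the classes are very unbalanced.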




Overall Tasks




Specific Tasks Completed




Results

To test the effects of using named entities, I built three classifiers of each type (language modeling and SVM). The first classifier used only the words of the questions. The second used the words of the questions plus their named entity tags. The third replaced each word tagged as a named entity with its entity tag, so its input was a mixture of the original untagged words and named entity tags. Although the previous language modeling classifiers did not test the second approach, I included it because it is how machine learning classifiers normally use named entities: the words are not replaced, and the named entities are simply added as additional features. Words belonging to named entities could, of course, be filtered out during feature extraction if they occur infrequently, but this is not done explicitly.
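The three input representations can be illustrated with a short sketch. It assumes the named entity tagger's output is available as a list of (span_text, entity_tag) pairs, with a multi-word entity as a single span and None for ordinary words; this format, the function names, and the PERSON tag are hypothetical. Where the added tag goes in the second representation is also an assumption: it does not matter for bag-of-words SVM features, but it does for n-gram language models.

```python
# Each question is a list of (span_text, entity_tag) pairs, where a
# multi-word entity is one span and entity_tag is None for ordinary words.
# The tags and the span format are hypothetical stand-ins for tagger output.


def words_only(spans):
    """Representation 1: the question's words, ignoring entity tags."""
    return [w for text, _ in spans for w in text.split()]


def words_plus_entities(spans):
    """Representation 2: keep every word and add each entity tag as an extra token."""
    tokens = []
    for text, tag in spans:
        tokens.extend(text.split())
        if tag is not None:
            tokens.append(tag)  # placed right after the entity's words (an assumption)
    return tokens


def entities_replace_words(spans):
    """Representation 3: substitute the entity tag for the words of each entity."""
    tokens = []
    for text, tag in spans:
        if tag is None:
            tokens.extend(text.split())
        else:
            tokens.append(tag)
    return tokens


question = [("when", None), ("was", None), ("Abraham Lincoln", "PERSON"), ("born", None)]
print(words_only(question))              # ['when', 'was', 'Abraham', 'Lincoln', 'born']
print(words_plus_entities(question))     # ['when', 'was', 'Abraham', 'Lincoln', 'PERSON', 'born']
print(entities_replace_words(question))  # ['when', 'was', 'PERSON', 'born']
```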

Using words only, the first SVM classifier performed worse than the first language modeling classifier: on the fine-grained answer set of 50 types, the SVM classifier had 44% precision and the equivalent language modeling classifier had 48%. Adding named entities brought the SVM classifier up to 51% precision, and the third SVM classifier, which replaced words with their named entities, also reached 51%; it correctly classified the same number of questions in both cases. The SVM classifier clearly benefited from the named entities, but keeping the original words for those entities neither helped nor hurt its performance. The language modeling classifier, on the other hand, showed a small difference between the second and third methods: adding named entities gave 53% precision, and replacing words with named entities gave 54%. See Table 2 in the report for all of these results.

The language modeling classifier seems to perform slightly better than the SVM classifier. However, the performance of these language modeling classifiers is lower than that of the language modeling classifiers reported earlier (Pinto, 2002). Those earlier classifiers used a coarser answer set with only seven types, so a fairer comparison evaluates my language modeling classifiers on a coarser version of this answer set. When any predicted fine-grained type that falls under the correct coarse type is counted as correct, precision rises to 62% when named entities are added and to 65% when words are replaced with named entities. The performance of the SVM classifier also improves under this evaluation. See Table 3 in the report for all of these results.
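The coarse-grained scoring described above can be sketched as mapping both the gold and the predicted fine type to their parent coarse type before comparing. This sketch assumes each question receives exactly one predicted label, so precision is the fraction of questions labeled correctly. The PARENT mapping, its labels, and the example predictions are hypothetical; the project's actual answer type hierarchy is not reproduced on this page.

```python
# Hypothetical fine-type -> coarse-type mapping (not the project's real hierarchy).
PARENT = {
    "city": "LOCATION", "country": "LOCATION",
    "individual": "PERSON", "group": "PERSON",
    "date": "NUMERIC", "count": "NUMERIC",
}


def precision(gold, predicted, to_coarse=None):
    """Fraction of questions whose single predicted label is correct.

    If to_coarse is given, both labels are first mapped to their coarse type,
    so any fine type under the correct coarse type counts as correct.
    """
    pairs = list(zip(gold, predicted))
    if to_coarse is not None:
        pairs = [(to_coarse[g], to_coarse[p]) for g, p in pairs]
    return sum(g == p for g, p in pairs) / len(pairs)


gold = ["city", "individual", "date", "country"]
pred = ["country", "individual", "individual", "country"]
print(precision(gold, pred))          # 0.5  (fine-grained)
print(precision(gold, pred, PARENT))  # 0.75 (city vs. country now counts as correct)
```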

Although the precision of my language modeling classifiers did not quite reach the roughly 74% of the previously discussed language modeling classifiers, it is still reasonable: because I am using a different data set and a different set of answer types, identical performance is not expected. The report does, however, describe possibilities for increasing performance. The SVM classifiers also do not do as well as those described earlier, which reached around 80%, and the gap between the roughly 60% here and the 80% there is significant. However, the classifiers that reach 80% usually use extra features such as part-of-speech and chunking information, and these features can be added straightforwardly to the classifiers here to improve performance.

Overall, the language modeling classifiers performed well relative to the SVM baseline, and their precision was close to that previously achieved on a different data set and answer type set.

The language modeling question classification scheme also extended well to the finer answer taxonomy of 50 types. Performance was lower than with the coarse answer types, but it is evident that the language modeling scheme has the potential to use more detailed answer taxonomies.

The Future Work section of the report also describes a number of items that should lead to additional improvements.




Report

Entity-Tagged Language Models for Question Classification in a QA System




Primary References

See report for full list.




Other

Data set and answer type hierarchy

Data manipulation tools




© 2004, Jonathan Brown