Question classification is an important task in most question answering systems. The entire passage retrieval and answer extraction process relies on having the correct answer type for a given question. Much research in this area has focused on employing regular expressions, hand-written grammar rules, and more advanced natural language processing techniques to parse the question and determine what is being asked for. Although these approaches have been successful, their results are not perfect, and extending them to new question types is difficult and time-consuming. I apply language modeling techniques to this problem, using entity-tagged language models to classify questions according to answer type. This technique is fully automatic and requires no hand-written rules: a language model is built for each question class, and new questions are classified using these models. I compare my results with those of a Support Vector Machine (SVM) question classification baseline, and I also compare the language modeling technique with others presented in the literature. Throughout, I examine the effects of using very specific answer types.
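The per-class idea described above can be illustrated with a small sketch. It assumes unigram models with add-one (Laplace) smoothing and a class prior estimated from training counts; the n-gram order and smoothing actually used in the report are not stated here. The class name `LanguageModelClassifier`, the answer-type labels, and the toy questions are hypothetical, and questions are assumed to arrive already tokenized and entity-tagged.

```python
import math
from collections import Counter, defaultdict


class LanguageModelClassifier:
    """One smoothed unigram language model per answer type (a sketch).

    Assumed, not taken from the report: unigram models, add-one (Laplace)
    smoothing with a single unknown-token slot, and a class prior estimated
    from the training counts.
    """

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # answer type -> token counts
        self.class_counts = Counter()            # answer type -> number of questions
        self.vocab = set()

    def train(self, examples):
        """examples: iterable of (tokens, answer_type) pairs."""
        for tokens, label in examples:
            self.class_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)

    def log_score(self, tokens, label):
        """log P(label) + sum of log P(token | label) under add-one smoothing."""
        counts = self.word_counts[label]
        total = sum(counts.values())
        vocab_size = len(self.vocab) + 1  # +1 reserves mass for unseen tokens
        n_questions = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / n_questions)
        for tok in tokens:
            score += math.log((counts[tok] + 1) / (total + vocab_size))
        return score

    def classify(self, tokens):
        """Return the answer type whose model scores the question highest."""
        return max(self.class_counts, key=lambda label: self.log_score(tokens, label))


# Toy, entity-tagged training questions with hypothetical labels.
clf = LanguageModelClassifier()
clf.train([
    (["who", "is", "<person>", "?"], "HUM:ind"),
    (["who", "founded", "<organization>", "?"], "HUM:ind"),
    (["where", "is", "<location>", "?"], "LOC:other"),
    (["where", "was", "<person>", "born", "?"], "LOC:other"),
])
print(clf.classify(["who", "invented", "the", "telephone", "?"]))  # HUM:ind
```

Each class model assigns a log-probability to the question and the highest-scoring class wins; working in log space avoids underflow on longer questions.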
Perform literature survey
Determine answer types to use
Create or find questions and classifications for use in training and testing
Determine correct answer type for each question in training and test set
Determine features to be used in classification
Implement SVM baseline for the study
Implement entity-tagged language model classification
Run baseline and experimental procedures on training set
Evaluate performance on test set
Compare to other existing methods
Write report on methods and results
Literature survey completed. See below for references. See report for summary of previous work.
Chose to use the answer types used in JAVELIN.
Chose questions from the TREC Question Answering task for training and testing.
Hand classified 500 questions according to their answer type.
Implemented SVM baseline.
Implemented entity-tagged language model classification technique.
Ran baseline and experimental procedures on training data.
Determined that 500 questions were too few to train these algorithms.
Began search for larger data set.
Since the larger data set would need to come pre-classified, I looked for one that used more detailed answer types.
Decided on a 5500 question training set and 500 question test set classified using a hierarchical answer set of 50 types.
Re-ran baseline and experimental procedures.
Evaluated performance on test set. See results.
Compared to other existing methods. See report.
Wrote report on methods and results.
In order to test the effects of using named entities, I built three classifiers of each type (SVM and language model). The first classifier used only the words of the questions. The second used the words of the questions plus their named entities. The third replaced words with their named entities, so it used a mixture of the original words and named-entity tags. I included the second classifier because, although this approach was not tested with the previous language modeling classifiers, it is how machine learning classifiers typically use named entities: words are not replaced with their named entities; the named entities are simply added as additional features. Words that carry named entities could, of course, be filtered out during feature extraction if they do not occur often, but this is not done explicitly.
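The three input representations can be sketched as follows, assuming the entity tagger's output arrives as (text, entity type) chunks in which a multi-word entity such as "George Bush" is a single chunk. The function names, the chunk format, and the entity type names are hypothetical. Appending entity types at the end of the token list (classifier 2) is one arbitrary choice; it makes no difference to a bag-of-features SVM but would change the n-grams a language model sees.

```python
from typing import List, Optional, Tuple

# A question as a list of (text, entity_type) chunks. A multi-word entity
# such as "George Bush" is one chunk; entity_type is None for plain words.
# The entity type names used below are hypothetical tagger output.
Chunks = List[Tuple[str, Optional[str]]]


def words_only(chunks: Chunks) -> List[str]:
    """Classifier 1: the words of the question; entity types are ignored."""
    return [word for text, _ in chunks for word in text.split()]


def words_plus_entities(chunks: Chunks) -> List[str]:
    """Classifier 2: every word is kept and entity types are added as extra features."""
    return words_only(chunks) + [f"<{etype}>" for _, etype in chunks if etype]


def replace_with_entities(chunks: Chunks) -> List[str]:
    """Classifier 3: each entity is replaced by its type; other words are kept."""
    tokens: List[str] = []
    for text, etype in chunks:
        if etype:
            tokens.append(f"<{etype}>")
        else:
            tokens.extend(text.split())
    return tokens


question = [("Who", None), ("is", None), ("George Bush", "person"), ("?", None)]
print(words_only(question))             # ['Who', 'is', 'George', 'Bush', '?']
print(words_plus_entities(question))    # ['Who', 'is', 'George', 'Bush', '?', '<person>']
print(replace_with_entities(question))  # ['Who', 'is', '<person>', '?']
```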
The first SVM classifier, using words only, performed worse than the first language modeling classifier: on the fine-grained answer set of 50 types, the SVM classifier had 44% precision and the equivalent language modeling classifier had 48%. Adding named entities brought the SVM classifier up to 51% precision, and the third classifier, which replaced words with their named entities, also had 51% precision; in fact, the SVM classifier correctly classified the same number of questions in both cases. It clearly benefited from the named entities, but keeping the original words for those entities neither helped nor hurt performance. The language modeling classifier, on the other hand, showed a small difference between the second and third methods: adding named entities resulted in 53% precision, and replacing words with named entities resulted in 54%. See Table 2 in the report for all of these results.
The language modeling classifier seems to perform slightly better than the SVM classifier. However, its performance is lower than that of the language modeling classifiers reported earlier (Pinto, 2002), which used a coarser answer set with only seven types. It is therefore useful to compare my language modeling classifiers with the earlier ones using the coarser version of this answer set. When a prediction counts as correct if it is any subtype of the correct coarse type, the precision of the language modeling classifiers rises to 62% for the addition of named entities and 65% for the replacement of words with named entities. The performance of the SVM classifier also improves in this case. See Table 3 in the report for all of these results.
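The fine versus coarse scoring described above can be sketched like this, assuming each label is written as COARSE:fine (for example HUM:ind), in the style of the Li and Roth answer types. The function names, label strings, and predictions below are made up for illustration and are not results from the report.

```python
from typing import Sequence


def coarse(label: str) -> str:
    """Map a fine label such as 'HUM:ind' to its coarse type 'HUM'."""
    return label.split(":", 1)[0]


def precision(gold: Sequence[str], predicted: Sequence[str], coarse_only: bool = False) -> float:
    """Fraction of questions whose predicted answer type is correct.

    With coarse_only=True, a prediction counts as correct whenever it is
    any subtype of the gold coarse type.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("no questions to evaluate")
    key = coarse if coarse_only else (lambda label: label)
    correct = sum(key(g) == key(p) for g, p in zip(gold, predicted))
    return correct / len(gold)


gold = ["HUM:ind", "LOC:city", "NUM:date", "ENTY:animal"]
pred = ["HUM:ind", "LOC:country", "NUM:count", "HUM:ind"]
print(precision(gold, pred))                    # 0.25
print(precision(gold, pred, coarse_only=True))  # 0.75
```

Because every question receives exactly one label, this "precision" is the same quantity as accuracy over the test set.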
Although the precision of my language modeling classifiers did not quite reach the roughly 74% of the previously discussed language modeling classifiers, it is still reasonable: because I am using a different data set and a different set of answer types, identical performance is not expected. The report does, however, describe possibilities for increasing performance.
The SVM classifiers also do not perform as well as those described in earlier work, which reached around 80%. The difference between the 60% here and the 80% there is significant. However, the classifiers that reach 80% usually use extra features such as part-of-speech and chunking information, which can be added to these classifiers in a straightforward way to improve performance.
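Adding such features to a feature-based classifier can be as simple as adding more prefixed entries to each question's sparse feature vector. The sketch below assumes the part-of-speech tags come from an external tagger (here they are written by hand in Penn Treebank style); the function name and prefix scheme are hypothetical, and chunk labels could be added the same way under another prefix.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Sequence


def feature_vector(
    words: Sequence[str],
    entities: Iterable[str] = (),
    pos_tags: Optional[Sequence[str]] = None,
) -> Dict[str, int]:
    """Sparse bag-of-features vector for a linear classifier such as an SVM.

    Each feature family gets its own prefix, so a word and a tag that
    happen to share the same string stay distinct features.
    """
    feats = Counter(f"w={w.lower()}" for w in words)
    feats.update(f"ne={e}" for e in entities)
    if pos_tags is not None:
        if len(pos_tags) != len(words):
            raise ValueError("expected exactly one POS tag per word")
        feats.update(f"pos={t}" for t in pos_tags)
    return dict(feats)


fv = feature_vector(
    ["Who", "is", "George", "Bush", "?"],
    entities=["person"],
    pos_tags=["WP", "VBZ", "NNP", "NNP", "."],
)
print(fv["pos=NNP"], len(fv))  # 2 10
```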
Overall, the language modeling classifiers performed well relative to the SVM baseline, and their precision approached that previously achieved on a different data set and answer type set.
The language modeling question classification scheme also extended well to the finer answer taxonomy with 50 types. Performance degraded, but the results suggest that the language modeling scheme can make use of more detailed answer taxonomies.
I have also mentioned a number of items in the Future Work section of the report that should lead to additional improvements.
Entity-Tagged Language Models for Question Classification in a QA System
See report for full list.
Nyberg, E., T. Mitamura, J. Carbonell, J. Callan, K. Collins-Thompson, K. Czuba, M. Duggan, L. Hiyakumoto, N. Hu, Y. Huang, J. Ko, L. Lita, S. Murtagh, V. Pedro and D. Svoboda. "The JAVELIN Question-Answering System at TREC 2002", Proceedings of TREC 11, 2002.
Nyberg, E., T. Mitamura, J. Callan, J. Carbonell, R. Frederking, K. Collins-Thompson, L. Hiyakumoto, Y. Huang, C. Huttenhower, S. Judy, J. Ko, A. Kupsc, L. V. Lita, V. Pedro, D. Svoboda, and B. Van Durme. "The JAVELIN Question-Answering System at TREC 2003: A Multi-Strategy Approach with Dynamic Planning", Proceedings of TREC 12, 2003.
Van Durme, B., Y. Huang, A. Kupsc and E. Nyberg, "Towards Light Semantic Processing for Question Answering," HLT/NAACL 2003 Workshop on Text Meaning.
These three papers describe the JAVELIN Question-Answering System, which uses hand-written rules for question classification. The first paper introduces the JAVELIN architecture, and the second and third papers go into more detail on its NLP-based question classification methods. These papers also discuss the answer types used in JAVELIN.
Pinto, D., Branstein, M., Coleman, R., King, M., Li, W., Wei, X. and Croft, W.B. "QuASM: A System for Question Answering Using Semi-Structured Data", Proceedings of the JCDL 2002 Joint Conference on Digital Libraries, 2002.
The authors describe the QuASM question answering system, which combines simple regular expressions with language modeling to improve question classification accuracy. The resulting method is more accurate than either the language modeling or the regular expression method alone. In addition, the language models are constructed on text in which named entities have been replaced with their respective types. For instance, a phrase such as "I bought the software from Microsoft" would be changed to "I bought the software from <company>." I plan to use these models, replacing the named entities in questions with their types, to change questions such as "Who is George Bush?" to "Who is <person>?", which can then be better matched to similar questions with the language models.
Li, X. and D. Roth. "Learning Question Classifiers", Proceedings of the 19th International Conference on Computational Linguistics, 2002.
The authors introduce a set of answer categories, which I use. They also describe a question classification method based on their own SNoW learning algorithm. The data used by this algorithm is created semi-automatically.
Zhang, D. and W. S. Lee. "Question Classification using Support Vector Machines", Proceedings of SIGIR 2003.
Hacioglu, K. and W. Ward. "Question Classification with Support Vector Machines and Error Correcting Codes", Proceedings of HLT-NAACL 2003.
Both of these papers detail methods of using Support Vector Machines for question classification. They also both use the same dataset and answer categories that I am using.
Data set and answer type hierarchy
© 2004, Jonathan Brown