English/Chinese Named Entity Translation

 


Abstract

Named entity translation is important for multilingual natural language processing tasks such as cross-lingual information retrieval, statistical machine translation, and cross-lingual question answering. This course project focuses on English/Chinese named entity translation. I plan to use features specific to the Chinese language to improve named entity translation quality.

 

Introduction

Named entity translation is an important problem in several research areas, including cross-lingual information retrieval, machine translation, and cross-lingual question answering. Currently, researchers tend to propose a general algorithm for the problem. However, each language has its own features. For example, Chinese named entities are usually rendered in English by their Pinyin romanization, as in Beijing and Zemin Jiang. In addition, the most frequent Chinese surnames number only about 100. Language features such as these may offer a way to improve named entity translation quality.
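The Pinyin observation above can be illustrated with a minimal sketch. It is only an illustration under stated assumptions, not the project's algorithm: the PINYIN table is a tiny hypothetical stand-in for a real dictionary resource, the function names are invented for this example, tones are dropped, and characters with more than one reading are ignored.

```python
# Hypothetical character-to-Pinyin table (illustration only). A real system
# would load this from a dictionary resource that lists Pinyin readings.
PINYIN = {
    "北": "bei", "京": "jing",
    "江": "jiang", "泽": "ze", "民": "min",
}

def to_pinyin(chinese):
    """Return the toneless Pinyin string for a Chinese string, or None
    if any character is missing from the table."""
    syllables = []
    for ch in chinese:
        if ch not in PINYIN:
            return None
        syllables.append(PINYIN[ch])
    return "".join(syllables)

def english_variants(english):
    """Normalized spellings of an English name: the original token order,
    and the order with the last token (often the surname) moved first."""
    tokens = english.lower().replace("-", " ").split()
    variants = {"".join(tokens)}
    if len(tokens) > 1:
        variants.add("".join(tokens[-1:] + tokens[:-1]))
    return variants

def is_pinyin_match(english, chinese):
    """True if the English string is a Pinyin spelling of the Chinese string."""
    pinyin = to_pinyin(chinese)
    return pinyin is not None and pinyin in english_variants(english)
```

With this table, is_pinyin_match("Beijing", "北京") and is_pinyin_match("Zemin Jiang", "江泽民") both hold. The second case relies on the reordering step, since the Chinese name puts the surname first.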

Most research has also focused on English, for which commercial named entity identification tools such as BBN's IdentiFinder exist. We do not have comparable tools for Chinese named entity extraction, so this project will also put some effort into Chinese named entity extraction.
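As a rough sketch of how a small surname list might help Chinese named entity extraction, the code below proposes person-name candidates that start with a known surname. It is a simple heuristic written for illustration, not the method this project will use: the SURNAMES set is a hypothetical sample rather than the full list of about 100, the function names are invented, and the output deliberately over-generates, so a real identifier would still need to score or filter the candidates.

```python
# Hypothetical sample of frequent surnames (illustration only); a fuller
# list of roughly 100 surnames would be loaded from a resource file.
SURNAMES = {"王", "李", "张", "刘", "陈", "江"}

def is_cjk(ch):
    """True if ch lies in the basic CJK Unified Ideographs block."""
    return "\u4e00" <= ch <= "\u9fff"

def person_candidates(text, max_given=2):
    """Return (start_index, candidate) pairs: each known surname followed
    by a given name of 1..max_given Chinese characters."""
    candidates = []
    for i, ch in enumerate(text):
        if ch not in SURNAMES:
            continue
        for n in range(1, max_given + 1):
            given = text[i + 1 : i + 1 + n]
            # Skip if the text ends too early or a non-Chinese character follows.
            if len(given) == n and all(is_cjk(c) for c in given):
                candidates.append((i, ch + given))
    return candidates
```

For the string "主席江泽民访问北京", the sketch proposes both "江泽" and "江泽民" at position 2. Choosing between such overlapping candidates is the part a statistical model would have to handle.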

 

Algorithm

Working on …

 

 

Available Resource

· BBN Named Entity Tool

· LDC Chinese Dictionary with Pinyin

· English/Chinese parallel corpus: Hong Kong News

 

 

Schedule

· Sep 15 – Oct 15
    · Chinese named entity identification

· Oct 16 – Nov 10
    · English/Chinese named entity alignment
    · Result: a translation dictionary
    · Check the queries that include named entities

· Nov 11 – Dec
    · Experiments
    · Comparison of performance with other methods
    · Report

 

 

Bibliography

[1] Learning Translations of Named-Entity Phrases from Parallel Corpora. Robert C. Moore.

[2] Improved Named Entity Translation and Bilingual Named Entity Extraction. Fei Huang and Stephan Vogel.

[3] Chinese Named Entity Identification Using Class-based Language Model. Jian Sun, Jianfeng Gao, Lei Zhang, Ming Zhou, Changning Huang.