ChineseNLP

Chinese Transliteration

Background

Transliteration converts proper names and technical terms from one language into another that uses a different writing system and sound system, rendering them by pronunciation rather than by meaning.

Example input/output

Input:

约翰伍兹 (yue han wu zi)

Output:

John Woods

Standard Metrics

NEWS 2018 Dataset_03.

The Named Entities Workshop (NEWS) runs a long-standing transliteration evaluation campaign, and Chinese/English is one of its most widely used language pairs. The NEWS 2018 test sets are:

Test set name                Source   Target   Test set size (phrase pairs)
NEWS 2018 Dataset_03 T-EnCh  English  Chinese  1000
NEWS 2018 Dataset_03 B-ChEn  Chinese  English  1000

Results

English-Chinese

System                          ACC    F-score  MRR     MAP
He, Cohen (2020)                0.299  0.6799   -       -
EDI (University of Edinburgh)   0.304  0.6791   0.4364  0.304

Chinese-English

System                          ACC    F-score  MRR    MAP
UALB (University of Alberta)    0.3    0.8      0.374  0.3
EDI (University of Edinburgh)   0.276  0.83     0.386  0.276
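The four columns are the usual NEWS metrics. ACC is top-1 exact-match accuracy. F-score is the character-level fuzziness of the top candidate. MRR is the mean reciprocal rank of the first correct candidate. MAP is MAP_ref, the precision over the top n_i candidates, where n_i is the number of references for a name. This minimal sketch computes them from ranked n-best lists, following the definitions in the NEWS shared-task reports as commonly stated. It is not the official NEWS scorer. The function names `lcs_length`, `f_score` and `evaluate` are hypothetical. The F-score here takes the best-scoring reference rather than the reference closest by edit distance, so its numbers may differ slightly from official results.

```python
from typing import Dict, List, Sequence


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def f_score(candidate: str, references: Sequence[str]) -> float:
    """Character-level F-score of one candidate, using the best-scoring reference.

    Assumption/simplification: the NEWS scorer instead picks the reference
    closest to the candidate by edit distance, so results can differ slightly.
    """
    best = 0.0
    for ref in references:
        lcs = lcs_length(candidate, ref)
        if lcs == 0:
            continue
        precision = lcs / len(candidate)
        recall = lcs / len(ref)
        best = max(best, 2 * precision * recall / (precision + recall))
    return best


def evaluate(candidates: List[List[str]], references: List[List[str]]) -> Dict[str, float]:
    """Return ACC, F-score, MRR and MAP (MAP_ref) averaged over all source names.

    candidates[i] is the ranked n-best list for source name i;
    references[i] lists every accepted transliteration of that name.
    """
    totals = {"ACC": 0.0, "F-score": 0.0, "MRR": 0.0, "MAP": 0.0}
    for cands, refs in zip(candidates, references):
        ref_set = set(refs)
        hits = [c in ref_set for c in cands]  # True where a candidate is correct
        if cands:
            totals["ACC"] += float(hits[0])  # top-1 exact match
            totals["F-score"] += f_score(cands[0], refs)
        # MRR: reciprocal rank of the first correct candidate, 0 if none is correct.
        for rank, hit in enumerate(hits, start=1):
            if hit:
                totals["MRR"] += 1.0 / rank
                break
        # MAP_ref: mean over k = 1..n_i of precision in the top k (n_i = number of references).
        n_i = len(refs)
        totals["MAP"] += sum(sum(hits[:k]) / k for k in range(1, n_i + 1)) / n_i
    n = len(candidates)
    return {name: value / n for name, value in totals.items()}


if __name__ == "__main__":
    # Toy data (made up for illustration): two source names, one reference each.
    scores = evaluate(
        candidates=[["john woods", "jon woods"], ["yohan woods", "john woods"]],
        references=[["john woods"], ["john woods"]],
    )
    print(scores)
```

On this toy input the sketch gives ACC 0.5, MRR 0.75, MAP 0.5 and an F-score of about 0.93. With a single reference per name, MAP_ref reduces to ACC. That would be consistent with the identical ACC and MAP values in the results tables.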

Resources

Train set name               Source   Target   Train set size (phrase pairs)
NEWS 2018 Dataset_03 T-EnCh  English  Chinese  41318
NEWS 2018 Dataset_03 B-ChEn  Chinese  English  32002


Suggestions? Changes? Please send email to chinesenlp.xyz@gmail.com