
Chinese Question Answering

Background

Question answering (QA) is the task of automatically answering questions posed in natural language. Answers may be drawn from structured databases or unstructured text collections.

Example

Input:

世界上最大的国家是什么? (What is the largest country in the world?)

Output:

俄国 (Russia)

Standard Metrics

NLPCC KBQA shared task.

The KBQA shared task at NLPCC 2017 asks systems to retrieve answers to natural-language questions from a provided knowledge base (KB) of factual triples. The KB contains 8.7 million entities and 47.9 million triples.
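To make the setup concrete, here is a minimal sketch of the lookup step, assuming the KB is held as nested dictionaries. The hard-coded `link_question` function is a hypothetical stand-in for the entity linking and predicate matching that real systems perform, not any participating system's method.

```python
# Minimal sketch of the KBQA answering step: once a question is linked to a
# (subject, predicate) pair, the answer is the object of the matching triple.
KB = {
    # toy knowledge base: subject -> predicate -> object
    "俄罗斯": {"面积": "17,098,246 平方公里", "首都": "莫斯科"},
}

def link_question(question):
    # Hypothetical linker: real systems score candidate triples against the
    # question; here a single pattern is hard-coded for illustration.
    if "俄罗斯" in question and "首都" in question:
        return "俄罗斯", "首都"
    return None

def answer(question):
    linked = link_question(question)
    if linked is None:
        return None
    subject, predicate = linked
    return KB.get(subject, {}).get(predicate)

print(answer("俄罗斯的首都是哪里?"))  # -> 莫斯科
```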

The test set was built by human annotators who selected triples from the KB; for each triple, the annotator wrote a natural-language question whose answer is the object of the triple. Only the Q/A pairs are released; the source triples are not.

| Test set | Size (Q/A pairs) | Genre |
| --- | --- | --- |
| NLPCC-ICCPOL KBQA 2016 | 9,870 | Open domain |
| NLPCC KBQA 2017 | 7,631 | Open domain |

Metric

Averaged F1.
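For each question, F1 is computed between the set of predicted answers and the set of gold answers, then averaged over all questions. A minimal sketch of this computation follows; the official scorer's exact answer normalization may differ.

```python
# Averaged F1 over questions: per-question F1 between the predicted answer
# set and the gold answer set, averaged across the test set.
def question_f1(predicted, gold):
    predicted, gold = set(predicted), set(gold)
    if not predicted or not gold:
        return 0.0
    overlap = len(predicted & gold)
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)

def averaged_f1(all_predicted, all_gold):
    scores = [question_f1(p, g) for p, g in zip(all_predicted, all_gold)]
    return sum(scores) / len(scores)

print(averaged_f1([["俄国"], ["北京"]], [["俄国", "俄罗斯"], ["上海"]]))
# first question: P=1.0, R=0.5 -> F1≈0.667; second: 0.0 -> average ≈ 0.333
```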

Results

14 teams participated.

| System | Averaged F1 |
| --- | --- |
| Best anonymous score reported | 0.47 |

Resources

| Train set | Size (Q/A pairs) | Genre |
| --- | --- | --- |
| NLPCC KBQA 2016/2017 | 14,609 | Open domain |

NLPCC DBQA shared task.

The DBQA (document-based question answering) shared task at NLPCC 2017 asks systems to answer a natural-language question by selecting the sentence in a given document that contains the answer.

The test set was formed by human annotators who were given documents. For each document, an annotator selected a sentence, then constructed a natural-language question whose answer is that sentence.
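For illustration only, the sketch below ranks a document's sentences by character overlap with the question. Actual shared-task systems use learned question-sentence matching models; the character-set overlap score here is a simplistic assumption.

```python
# Naive DBQA baseline sketch: rank each sentence of the document by character
# overlap with the question and return candidates best-first.
def rank_sentences(question, sentences):
    q_chars = set(question)
    scored = [(len(q_chars & set(s)) / (len(set(s)) or 1), s) for s in sentences]
    return sorted(scored, key=lambda x: x[0], reverse=True)

doc = ["俄罗斯是世界上面积最大的国家。", "它横跨欧亚两洲。"]
print(rank_sentences("世界上最大的国家是什么?", doc)[0][1])
# -> 俄罗斯是世界上面积最大的国家。
```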

| Test set | Size (document/sentence pairs) | Genre |
| --- | --- | --- |
| NLPCC-ICCPOL DBQA 2016 | 5,779 | Open domain |
| NLPCC DBQA 2017 | 2,500 | Open domain |

Metrics

MRR, MAP, and Accuracy@1; F1 is also reported on the 2016 set.
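A minimal sketch of these ranking metrics, assuming each question comes with its candidate sentences in ranked order and binary relevance labels (1 = answer sentence):

```python
# Ranking metrics for DBQA over per-question candidate rankings.
def mrr(rankings):
    # rankings: list of per-question label lists, in ranked order.
    total = 0.0
    for labels in rankings:
        for i, rel in enumerate(labels, start=1):
            if rel:
                total += 1.0 / i
                break
    return total / len(rankings)

def average_precision(labels):
    hits, score = 0, 0.0
    for i, rel in enumerate(labels, start=1):
        if rel:
            hits += 1
            score += hits / i
    return score / hits if hits else 0.0

def map_score(rankings):
    return sum(average_precision(labels) for labels in rankings) / len(rankings)

def accuracy_at_1(rankings):
    return sum(1.0 for labels in rankings if labels and labels[0]) / len(rankings)

rankings = [[0, 1, 0], [1, 0]]  # two questions, ranked candidates
print(mrr(rankings), map_score(rankings), accuracy_at_1(rankings))
# -> 0.75 0.75 0.5
```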

Results

NLPCC DBQA 2016

| System | MRR | F1 |
| --- | --- | --- |
| ERNIE 2.0 | 95.8 | 85.8 |
| Meng et al. (2019) (Glyce + BERT) | - | 83.4 |
| ERNIE (Baidu) | 95.1 | 82.7 |
| BERT | 94.6 | 80.8 |

NLPCC DBQA 2017

| System | MRR | MAP | Accuracy@1 |
| --- | --- | --- | --- |
| Best anonymous score reported | 72.0 | 71.7 | 59.2 |

Resources

| Train set | Size (document/sentence pairs) | Genre |
| --- | --- | --- |
| NLPCC DBQA 2016/2017 | 8,772 | Open domain |

Machine Reading Comprehension (MRC) tasks from the CLUE benchmark.

CLUE is a Chinese Language Understanding Evaluation benchmark. Machine Reading Comprehension (MRC) is the task of reading and understanding unstructured text and then answering questions about it. The MRC portion of CLUE consists of three datasets: CMRC 2018 (Cui et al.), ChID (Zheng et al.), and C3 (Sun et al.).

Metrics

CMRC 2018 is a span-extraction task scored by exact match (EM); ChID and C3 are multiple-choice tasks scored by accuracy.
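A minimal sketch of both score types; the whitespace/punctuation normalization below is an assumption, and the official CLUE scorer may normalize differently.

```python
# Exact match (EM) for the span-extraction dataset, plain accuracy for the
# multiple-choice ones. Scores are reported on a 0-100 scale.
import re

def normalize(text):
    # Drop whitespace and common punctuation before comparing spans
    # (simplified assumption about the official normalization).
    return re.sub(r"[\s,。，?？!！、]", "", text)

def exact_match(predictions, references):
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return 100.0 * hits / len(references)

def accuracy(predictions, references):
    hits = sum(p == r for p, r in zip(predictions, references))
    return 100.0 * hits / len(references)

print(exact_match(["俄国"], ["俄 国"]))  # -> 100.0
print(accuracy(["B", "C"], ["B", "A"]))  # -> 50.0
```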

Results

| System | CMRC 2018 | ChID | C3 |
| --- | --- | --- | --- |
| HUMAN (CLUE origin) | 92.40 | 87.10 | 96.00 |
| RoBERTa-wwm-ext-large (CLUE origin) | 76.58 | 85.37 | 72.32 |
| BERT-base (CLUE origin) | 69.72 | 82.04 | 64.50 |

Resources

CLUE benchmark: https://github.com/CLUEbenchmark/CLUE

Other resources.


Suggestions? Changes? Please send email to chinesenlp.xyz@gmail.com