Chinese Sentiment Analysis
Background
Sentiment analysis detects, identifies, and extracts subjective information from text.
Example
Input:
总的感觉这台机器还不错,实用的有:阴阳历显示,时间与日期快速转换, 记事本等。
Output:
Standard Metrics
Accuracy
- The percentage of correctly classified samples on test set.
F1-score
- Combination of precision and recall.
- Wiki Page
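The two metrics above can be sketched in a few lines of plain Python. This is a minimal illustration, not the scoring script of any benchmark listed here: the function names, the label strings, and the toy `gold`/`pred` lists are assumptions made for the example, and F1 is computed for a single target class.

```python
def accuracy(gold, pred):
    """Fraction of samples whose predicted label equals the gold label."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def f1_score(gold, pred, positive="positive"):
    """Harmonic mean of precision and recall for one target class."""
    pairs = list(zip(gold, pred))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy labels, invented for illustration only.
gold = ["positive", "negative", "positive", "positive", "negative"]
pred = ["positive", "positive", "positive", "negative", "negative"]

print(accuracy(gold, pred))            # 0.6
print(round(f1_score(gold, pred), 4))  # 0.6667
```

For multi-class test sets, a per-class F1 like this one is usually averaged across classes (macro-F1); which averaging a given paper uses should be checked against that paper.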
SemEval-2016 Task 5.
SemEval-2016 Task 5 contains two test sets with over 5,000 reviews in total, drawn from the digital camera and mobile phone domains.
| Source | Genre | # Classes | Size (sentences) | Size (words) |
|---|---|---|---|---|
| SemEval 2016 Task 5 – CAM Test | Digital Camera reviews (Chinese) | 3 | 2256 | ~25k |
| SemEval 2016 Task 5 – PHNS Test | Mobile Phone reviews (Chinese) | 3 | 3191 | ~34k |
Metrics
Results
| System | Accuracy (PHNS Test) | Accuracy (CAM Test) |
|---|---|---|
| SenHint | 0.7958 | 0.8711 |
Resources
| Source | Genre | # Classes | Size (sentences) | Size (words) |
|---|---|---|---|---|
| SemEval 2016 Task 5 – CAM Train | Digital Camera reviews (Chinese) | 3 | 5784 | ~61k |
| SemEval 2016 Task 5 – PHNS Train | Mobile Phone reviews (Chinese) | 3 | 6330 | ~62k |
NLP&CC 2012.
NLP&CC 2012 Test: Chinese Weibo sentiment analysis evaluation data.
| Source | Genre | # Classes | Size (sentences) | Topics |
|---|---|---|---|---|
| NLP&CC 2012 Test | Weibo reviews | 2 | 1908 | 10 |
Metrics
Results
Resources
| Source | Genre | # Classes | Size (sentences) | Size (words) |
|---|---|---|---|---|
| NLP&CC 2012 Train | Weibo reviews (Chinese) | 2 | 1765 | ~116k |
ChnSentiCorp.
ChnSentiCorp: It contains 1021 documents in three domains: education, movie and house.
| Source | Genre | # Classes | Size (sentences) | Size (words) |
|---|---|---|---|---|
| ChnSentiCorp Test | Hotel reviews (Chinese) | 2 | 1999 | ~725k |
Metrics
Results
*BERT accuracy result is cited from the ERNIE paper.
**fastText accuracy result is cited from the MCCNN paper.
Resources
| Source | Genre | # Classes | Size (sentences) | Size (words) |
|---|---|---|---|---|
| ChnSentiCorp Train | Hotel reviews (Chinese) | 2 | 8000 | ~2.9M |
IT168TEST.
IT168TEST: A product review dataset presented by Zagibalov and Carroll. It contains over 20,000 reviews, of which 78% were manually labeled as positive and 22% as negative.
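Because the labels are imbalanced, accuracy on this dataset is easier to interpret against a trivial baseline. The sketch below uses only the 78% / 22% split stated above and assumes, purely for illustration, that an evaluation set has the same class proportions; the function names are hypothetical.

```python
# Class proportions as stated in the dataset description.
POSITIVE_SHARE = 0.78
NEGATIVE_SHARE = 0.22


def majority_baseline_accuracy(class_shares):
    """Accuracy obtained by always predicting the most frequent class."""
    return max(class_shares.values())


def always_positive_f1(positive_share):
    """F1 on the positive class when every sample is predicted positive."""
    precision = positive_share  # share of predictions that are truly positive
    recall = 1.0                # every true positive is retrieved
    return 2 * precision * recall / (precision + recall)


shares = {"positive": POSITIVE_SHARE, "negative": NEGATIVE_SHARE}
print(round(majority_baseline_accuracy(shares), 4))  # 0.78
print(round(always_positive_f1(POSITIVE_SHARE), 4))  # 0.8764
```

Under that assumption, a reported accuracy is only informative to the extent that it exceeds roughly 0.78.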
| Source | Genre | # Classes | Size (sentences) |
|---|---|---|---|
| IT168Test | Product review | 2 | 29531 |
Metrics
Results
*Accuracy result is cited from the MCCNN paper.
Dianping.
Dianping: Chinese restaurant reviews, evenly split between two classes: 4- and 5-star reviews were assigned to the positive class, while 1- to 3-star reviews were assigned to the negative class.
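The star-to-class rule described above can be written as a small mapping function. This is a sketch of the rule as stated, not the dataset authors' preprocessing code; the function name and the sample ratings are hypothetical.

```python
def dianping_label(stars):
    """Map a 1-5 star rating to a binary sentiment class."""
    if stars not in (1, 2, 3, 4, 5):
        raise ValueError(f"unexpected star rating: {stars!r}")
    # 4 and 5 stars are positive; 1, 2 and 3 stars are negative.
    return "positive" if stars >= 4 else "negative"


# Hypothetical ratings, for illustration only.
ratings = [5, 3, 1, 4, 2]
labels = [dianping_label(s) for s in ratings]
print(labels)  # ['positive', 'negative', 'negative', 'positive', 'negative']
```

Note that 3-star reviews count as negative here, unlike schemes that discard them.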
| Source | Genre | # Classes | Size (sentences) |
|---|---|---|---|
| Dianping | restaurant reviews | 2 | 500,000 |
Metrics
Results
Resources
| Source | Genre | # Classes | Size (sentences) |
|---|---|---|---|
| Dianping | restaurant reviews | 2 | 2,000,000 |
JD Full.
JD Full: Chinese shopping reviews, evenly split across classes, for predicting the full five-star rating.
| Source | Genre | # Classes | Size (sentences) |
|---|---|---|---|
| JD Full | shopping reviews | 5 | 250,000 |
Metrics
Results
Resources
| Source | Genre | # Classes | Size (sentences) |
|---|---|---|---|
| JD Full | shopping reviews | 5 | 3,000,000 |
JD Binary.
JD Binary: Chinese shopping reviews are evenly split into positive (4- and 5-star reviews) and negative (1- and 2-star reviews) sentiments, ignoring 3-star reviews.
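A minimal sketch of this labeling scheme follows. The star mapping is taken from the description above; the balancing step (downsampling the larger class) is an assumption about how an even split could be obtained, not the dataset authors' documented procedure. All names and the sample reviews are hypothetical.

```python
import random


def jd_binary_label(stars):
    """Return 'positive', 'negative', or None for ignored 3-star reviews."""
    if stars in (4, 5):
        return "positive"
    if stars in (1, 2):
        return "negative"
    if stars == 3:
        return None
    raise ValueError(f"unexpected star rating: {stars!r}")


def build_balanced(reviews, seed=0):
    """Label reviews, drop 3-star ones, and downsample to equal class sizes."""
    by_label = {"positive": [], "negative": []}
    for text, stars in reviews:
        label = jd_binary_label(stars)
        if label is not None:
            by_label[label].append((text, label))
    n = min(len(items) for items in by_label.values())
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    balanced = rng.sample(by_label["positive"], n) + rng.sample(by_label["negative"], n)
    rng.shuffle(balanced)
    return balanced


# Placeholder review texts with star ratings, invented for illustration.
reviews = [("r1", 5), ("r2", 4), ("r3", 3), ("r4", 1), ("r5", 5), ("r6", 2)]
balanced = build_balanced(reviews)
print(len(balanced))  # 4
```

With the sample above, the 3-star review is dropped and one of the three positive reviews is discarded so that both classes contain two items.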
| Source | Genre | # Classes | Size (sentences) |
|---|---|---|---|
| JD Binary | shopping reviews | 2 | 360,000 |
Metrics
Results
Resources
| Source | Genre | # Classes | Size (sentences) |
|---|---|---|---|
| JD Binary | shopping reviews | 2 | 4,000,000 |
Other Resources
- Overview paper in this area:
- An incomplete list of new corpora (as of 2020)
| Name | Description | Domain / Source | Size (positive / negative where applicable) | Accuracy | F1 | Link |
|---|---|---|---|---|---|---|
| Chinese Sarcasm Dataset | Text manually labelled as sarcastic or not | news | 2500 / 90 000 | 0.7611 | 0.7368 | Gong et al., 2020 |
| CH-SIMS | Individually labelled multi-modal (text, video, audio) | movies, TV shows | 2281 video segments | - | 0.827 | Yu et al., 2020 |
| FiTSA | Aspect-based sentiment analysis for financial news | news | 8314 sentences, 647 000 characters | - | 0.798 | Yuan et al., 2020 |
| MPDD | Emotion in multi-party dialogs | TV shows | 25 500 utterances | 0.595 | - | Cheng et al., 2020 |
| MIMN | Multimodal (text, image) and aspect-based analysis | zol.com (shopping site) | 5200 reviews | 0.616 | 0.605 | github |
Suggestions? Changes? Please send email to chinesenlp.xyz@gmail.com