Glyce is an open-source toolkit built on top of PyTorch and is developed by Shannon.AI.
To appear in NeurIPS 2019.
Glyce: Glyph-vectors for Chinese Character Representations
(Yuxian Meng*, Wei Wu*, Fei Wang*, Xiaoya Li*, Ping Nie, Fan Yin, Muyu Li, Qinghong Han, Xiaofei Sun and Jiwei Li, 2019)
@article{meng2019glyce,
title={Glyce: Glyph-vectors for Chinese Character Representations},
author={Meng, Yuxian and Wu, Wei and Wang, Fei and Li, Xiaoya and Nie, Ping and Yin, Fan and Li, Muyu and Han, Qinghong and Sun, Xiaofei and Li, Jiwei},
journal={arXiv preprint arXiv:1901.10125},
year={2019}
}
- What is Glyce ?
- Experiment Results
- Requirements
- Installation
- Quick Start of Glyce
- Folder Description
- Implementation
- Download Task Data
- Welcome Contributions to Glyce Open Source Project
- Acknowledgement
- Contact
Glyce is a Chinese char representation based on Chinese glyph information. Glyce Chinese char embeddings are composed by two parts: (1) glyph-embeddings and (2) char-ID embeddings. The two parts are combined using concatenation, a highway network or a fully connected layer. Glyce word embeddings are also composed by two parts: (1) word glyph-embeddings and (2) word-ID embeddings. Glyph word embedding is the output of forwarding glyph embeddings for each char in the word through the max-pooling layer. Then glyph word embedding combine word-ID embedding using concatenation to get glyce word embedding.
Here are the some highlights in glyce:
Glyce utilizes useful logographic information by encoding images of historical and contemporary scripts, along with the scripts of different writing styles.
We combine Glyce with Pre-trained Chinese BERT model and adopt specific layer to downstream tasks. The Glyce-BERT model outperforms BERT and sets new SOTA results for tagging (NER, CWS, POS), sentence pair classification, single sentence classification tasks.
We propose the Tianzige-CNN(田字格) structure, which is tailored to Chinese character modeling. Tianzige-CNN(田字格) tackle the issue of small number of Chinese characters and small sacle of image compared to Imagenet.
During training process, image-classification loss performs as an auxiliary training objective with the purpose of preventing overfitting and promiting the model's ability to generalize.
MSRA(Levow, 2006), OntoNotes 4.0(Weischedel et al., 2011), Resume(Zhang et al., 2018).
Model | P(onto) | R(onto) | F1(onto) | P(msra) | R(msra) | F1(msra) |
---|---|---|---|---|---|---|
CRF-LSTM | 74.36 | 69.43 | 71.81 | 92.97 | 90.62 | 90.95 |
Lattice-LSTM | 76.35 | 71.56 | 73.88 | 93.57 | 92.79 | 93.18 |
Lattice-LSTM+Glyce | 82.06 | 68.74 | 74.81 | 93.86 | 93.92 | 93.89 |
(+0.93) | (+0.71) | |||||
BERT | 78.01 | 80.35 | 79.16 | 94.97 | 94.62 | 94.80 |
ERNIE | - | - | - | - | - | 93.8 |
Glyce+BERT | 81.87 | 81.40 | 81.63 | 95.57 | 95.51 | 95.54 |
(+2.47) | (+0.74) |
Model | P(resume) | R(resume) | F1(resume) | P(weibo) | R(weibo) | F1(weibo) |
---|---|---|---|---|---|---|
CRF-LSTM | 94.53 | 94.29 | 94.41 | 51.16 | 51.07 | 50.95 |
Lattice-LSTM | 94.81 | 94.11 | 94.46 | 52.71 | 53.92 | 53.13 |
Lattice-LSTM+Glyce | 95.72 | 95.63 | 95.67 | 53.69 | 55.30 | 54.32 |
(+1.21) | (+1.19) | |||||
BERT | 96.12 | 95.45 | 95.78 | 67.12 | 66.88 | 67.33 |
Glyce+BERT | 96.62 | 96.48 | 96.54 | 67.68 | 67.71 | 67.60 |
(+0.76) | (+0.76) |
CTB5, CTB6, CTB9 and UD1
Model | P(ctb5) | R(ctb5) | F1(ctb5) | P(ctb6) | R(ctb6) | F1(ctb6) |
---|---|---|---|---|---|---|
(Shao, 2017) (sig) | 93.68 | 94.47 | 94.07 | - | - | 90.81 |
(Shao, 2017) (ens) | 93.95 | 94.81 | 94.38 | - | - | |
Lattice-LSTM | 94.77 | 95.51 | 95.14 | 92.00 | 90.86 | 91.43 |
Glyce+Lattice-LSTM | 95.49 | 95.72 | 95.61 | 92.72 | 91.14 | 91.92 |
(+0.47) | (+0.49) | |||||
BERT | 95.86 | 96.26 | 96.06 | 94.91 | 94.63 | 94.77 |
Glyce+BERT | 96.50 | 96.74 | 96.61 | 95.56 | 95.26 | 95.41 |
(+0.55) | (+0.55) |
Model | P(ctb9) | R(ctb9) | F1(ctb9) | P(ud1) | R(ud1) | F1(ud1) |
---|---|---|---|---|---|---|
(Shao, 2017) (sig) | 91.81 | 94.47 | 91.89 | 89.28 | 89.54 | 89.41 |
(Shao, 2017) (ens) | 92.28 | 92.40 | 92.34 | 89.67 | 89.86 | 89.75 |
Lattice-LSTM | 92.53 | 91.73 | 92.13 | 90.47 | 89.70 | 90.09 |
Glyce+Lattice-LSTM | 92.28 | 92.85 | 92.38 | 91.57 | 90.19 | 90.87 |
(+0.25) | (+0.78) | |||||
BERT | 92.43 | 92.15 | 92.29 | 95.42 | 94.97 | 95.19 |
Glyce+BERT | 93.49 | 92.84 | 93.15 | 96.19 | 96.10 | 96.14 |
(+0.86) | (+1.35) |
PKU, CITYU, MSR and AS.
Model | P(pku) | R(pku) | F1(pku) | P(cityu) | R(cityu) | F1(cityu) |
---|---|---|---|---|---|---|
BERT | 96.8 | 96.3 | 96.5 | 97.5 | 97.7 | 97.6 |
Glyce+BERT | 97.1 | 96.4 | 96.7 | 97.9 | 98.0 | 97.9 |
(+0.2) | (+0.3) |
Model | P(msr) | R(msr) | F1(msr) | P(as) | R(as) | F1(as) |
---|---|---|---|---|---|---|
BERT | 98.1 | 98.2 | 98.1 | 96.7 | 96.4 | 96.5 |
Glyce+BERT | 98.2 | 98.3 | 98.3 | 96.6 | 96.8 | 96.7 |
(+0.2) | (+0.2) |
The BQ corpus, XNLI, LCQMC, NLPCC-DBQA
Model | P(bq) | R(bq) | F1(bq) | Acc(bq) | P(lcqmc) | R(lcqmc) | F1(lcqmc) | Acc(lcqmc) |
---|---|---|---|---|---|---|---|---|
BiMPM | 82.3 | 81.2 | 81.7 | 81.9 | 77.6 | 93.9 | 85.0 | 83.4 |
BiMPM+Glyce | 81.9 | 85.5 | 83.7 | 83.3 | 80.4 | 93.4 | 86.4 | 85.3 |
(+2.0) | (+1.4) | (+1.4) | (+1.9) | |||||
BERT | 83.5 | 85.7 | 84.6 | 84.8 | 83.2 | 94.2 | 88.2 | 87.5 |
ERNIE | - | - | - | - | - | - | - | 87.4 |
Glyce+BERT | 84.2 | 86.9 | 85.5 | 85.8 | 86.8 | 91.2 | 88.8 | 88.7 |
(+0.9) | (+1.0) | (+0.6) | (+1.2) |
Model | Acc(xnli) | P(nlpcc-dbqa) | R(nlpcc-dbqa) | F1(nlpcc-dbqa) |
---|---|---|---|---|
BIMPM | 67.5 | 78.8 | 56.5 | 65.8 |
BIMPM+Glyce | 67.7 | 76.3 | 59.9 | 67.1 |
(+0.2) | (+1.3) | |||
BERT | 78.4 | 79.6 | 86.0 | 82.7 |
ERNIE | 78.4 | - | - | 82.7 |
Glyce+BERT | 79.2 | 81.1 | 85.8 | 83.4 |
(+0.8) | (+0.7) |
ChnSentiCorp, Fudan and Ifeng.
Model | ChnSentiCorp | Fudan | iFeng |
---|---|---|---|
LSTM | 91.7 | 95.8 | 84.9 |
LSTM+Glyce | 93.1 | 96.3 | 85.8 |
(+1.4) | (+0.5) | (+0.9) | |
BERT | 95.4 | 99.5 | 87.1 |
ERNIE | 95.4 | - | - |
Glyce+BERT | 95.9 | 99.8 | 87.5 |
(+0.5) | (+0.3) | (+0.4) |
CoNLL-2009
Model | Precision | Recall | F1 |
---|---|---|---|
Roth and Lapata (2016) | 76.9 | 73.8 | 75.3 |
Marcheggiani and Titov (2017) | 84.6 | 80.4 | 82.5 |
K-order pruning (He et al., 2018) | 84.2 | 81.5 | 82.8 |
K-order pruning + Glyce-word | 85.4 | 82.1 | 83.7 |
(+0.8) | (+0.6) | (+0.9) |
Chinese Penn TreeBank 5.1. Dataset splits follows (Dozat and Manning, 2016).
Model | UAS | LAS |
---|---|---|
Ballesteros et al. (2016) | 87.7 | 86.2 |
Kiperwasser and Goldberg (2016) | 87.6 | 86.1 |
Cheng et al. (2016) | 88.1 | 85.7 |
Biaffine | 89.3 | 88.2 |
Biaffine+Glyce-word | 90.2 | 89.0 |
(+0.9) | (+0.8) |
- Python Version >= 3.6
- GPU, Use NVIDIA TITAN Xp with 12G RAM
- Chinese scripts could be found in Google Drive. Please refer to the description and download scripts files to
glyce/glyce/fonts/
.
NOTE: Some experimental results are obtained by training on multi-GPU machines. May use DIFFERENT PyTorch versions refer to previous open-source SOTA models. Experiment environment for Glyce-BERT is Python 3.6 and PyTorch 1.10.
# Clone glyce
git clone [email protected]:ShannonAI/glyce.git
cd glyce
python3.6 setup.py develop
# Install package dependency
pip install -r requirements.txt
import torch
import torch.nn as nn
from torch.nn import CrossEntropyLoss
from glyce import GlyceConfig
from glyce import CharGlyceEmbedding, WordGlyceEmbedding
# load glyce hyper-params
glyce_config = GlyceConfig()
# if input ids is char-level
glyce_embedding = CharGlyceEmbedding(glyce_config)
# elif input ids is word-level
glyce_embedding = WordGlyceEmbedding(glyce_config)
# forward input_ids into embedding layer
glyce_embedding, glyph_loss = glyce_embedding(input_ids)
# utilize image classifiation loss as the auxiliary loss
glyph_decay = 0.1
glyph_ratio = 0.01 # the proportion of image classification loss to total loss
current_epoch = 3
# compute loss
loss_fct = CrossEntropyLoss()
task_loss = loss_fct(logits, labels)
total_loss = task_loss * (1 - glyce_ratio) + glyph_ratio * glyph_decay ** (current_epoch+1) * glyph_loss
-
Download and unzip
BERT-Base, Chinese
pretrained model. -
Install PyTorch pretrained bert by pip as follows:
pip install pytorch-pretrained-bert
- Convert TF checkpoint into PyTorch
export BERT_BASE_DIR=/path/to/bert/chinese_L-12_H-768_A-12
pytorch_pretrained_bert convert_tf_checkpoint_to_pytorch \
$BERT_BASE_DIR/bert_model.ckpt \
$BERT_BASE_DIR/bert_config.json \
$BERT_BASE_DIR/pytorch_model.bin
- Sequence Labeling Task
模型的输入: [CLS] 我 爱 北 京 天 安 门
模型的输出: ‘我爱北京天安门' 对应的命名实体标签 O O B-LOC M-LOC M-LOC M-LOC E-LOC
Model Input: [CLS] I like Bei Jing Tian An Men
Model Output: labels corresponds to 'I like BeiJing Tian An men' are O O B-LOC M-LOC M-LOC M-LOC E-LOC
- Sentence Pair Classification
模型的输入: [CLS] 我 爱 北 京 天 安 门 [SEP] 北 京 欢 迎 您
模型的输出: [CLS] 位置对应的输出,假如是语意匹配任务则为 0,代表两句话的语意不相同
Model Input: [CLS] I like Bei Jing Tian An Men [SEP] Bei Jing Wel ##come you
Model Output: the output in [CLS] position. 0 if task sign is semantic matching. It means that the semantic of two sentences are different.
- Single Sentence Classification
模型的输入: [CLS] 我 爱 北 京 天 安 门
模型的输出: [CLS] 位置对应的输出,假如是情感分类任务则为1,代表情感为积极
Model Input: [CLS] I like Bei Jing Tian An Men.
Model Output: the output in [CLS] position. 1 if task is sentiment analysis. It means the sentiment polarity is positive.
scritps/*_bert.sh
are the commands we used to finetune BERT.scripts/*_glyce_bert.sh
are the commands we used to obtained the results of Glyce-BERT.scripts/ctb5_binaffine.sh
is the command that we used to reimplement PREVIOUS SOTA result on CTB5 for dependency parsing.scripts/ctb5_glyce_binaffine.sh
is the command that we used to obtain the SOTA result on CTB5 for dependency parsing.
For example, training command of Glyce-BERT for sentence pair dataset BQ is included in scripts/glyce_bert/bq_glyce_bert.sh
.
Start train and evaluate Glyce-BERT on BQ by,
bash scripts/glyce_bert/bq_glyce_bert.sh
Notes:
repo_path
refer to work directory of glyce.config_path
refer to the path of configuration file.bert_model
refer to the the directory of the pre-trained Chinese BERT model.task_name
refer to the task signiature.data_dir
andoutput_dir
refer to the directories of the "raw data" and "intermediate checkpoints" respectively.
Glyce toolkit provides implementations of previous SOTA models incorporated with Glyce embeddings.
- Glyce: Glyph-vectors for Chinese Character Representations. Refer to ./glyce/models/glyce_bert
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Refer to ./glyce/models/bert
- Chinese NER Using Lattice LSTM. Refer to ./glyce/models/latticeLSTM
- Character-based Joint Segmentation and POS Tagging for Chinese using Bidirectional RNN-CRF. Refer to ./glyce/models/latticeLSTM/model
- Syntax for Semantic Role Labeling, To Be, Or Not To Be. Refer to ./glyce/models/srl
- Bilateral Multi-Perspective Matching for Natural Language Sentences. Refer to ./glyce/models/bimpm
- Deep Biaffine Attention for Neural Dependency Parsing. Refer to ./glyce/models/biaffine
We actively welcome researchers and practitioners to contribute to Glyce open source project. Please read this Guide and submit your Pull Request.
Acknowledgement. Vanilla Glyce is developed based on the previous SOTA model. Glyce-BERT is developed based on PyTorch implementation by HuggingFace. And pretrained BERT model is released by Google's pre-trained models.
Please feel free to discuss paper/code through issues or emails.