Transferability of Syntax-Aware Graph Neural Networks in Zero-Shot Cross-Lingual Semantic Role Labeling
This is the implementation code for our paper: "Transferability of Syntax-Aware Graph Neural Networks in Zero-Shot Cross-Lingual Semantic Role Labeling". Feel free to use this repository to reproduce our results for research purposes. We do not provide any data in this repository, so please prepare the data in advance. We referred to the following repositories when implementing the models for our experiments:
- diegma/neural-dep-srl for Syntactic Graph Convolutional Networks (SGCNs)
- dmlc/dgl for Relational Graph Convolutional Networks (RGCNs)
- AnWang-AI/towe-eacl for Attention-Based Graph Convolutional Networks (ARGCNs)
- gordicaleksa/pytorch-GAT for Graph Attention Networks (GATs)
- shenwzh3/RGAT-ABSA for Relational Graph Attention Networks (RGATs)
- thudm/hgb for Simple Heterogeneous Graph Neural Networks (SHGNs)
- deepakn97/relationPrediction for Knowledge-Based Graph Attention Networks (KBGATs)
- wasiahmad/GATE for Graph Attention Transformer Encoders (GATEs)
We provide the versions of the libraries that we use:
- huggingface-hub: 0.15.1
- matplotlib: 3.7.1
- numpy: 1.24.3
- pandas: 2.0.2
- prettytable: 3.7.0
- python: 3.8.16
- scipy: 1.10.1
- seaborn: 0.12.2
- tokenizers: 0.13.3
- torch: 1.13.0+cu116
- torch-scatter: 2.1.1+pt113cu116
- tqdm: 4.65.0
- transformers: 4.29.2
- Clone the SRL annotations for English and for the target languages from Universal Proposition Bank (UPB) v2.
- Run `helper_scripts/fix_upb_2.py` to fix the shifted-annotation problem caused by enhanced dependency tree annotations in the treebanks. Make sure the paths written in the Python script are correct. The treebanks that contain enhanced dependency tree annotations are:
  - Czech-CAC
  - Czech-FicTree
  - Czech-PDT
  - Dutch-Alpino
  - Dutch-LassySmall
  - Finnish-TDT
  - Italian-ISDT
  - Spanish-AnCora
  - Ukrainian-IU
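The shifting comes from the "empty nodes" that enhanced dependency annotations add to CoNLL-U files. As a hedged sketch of the kind of cleanup involved (the actual `helper_scripts/fix_upb_2.py` may work differently), one can drop the empty-node lines, which carry decimal token IDs:

```python
# Hedged sketch only -- NOT the actual helper_scripts/fix_upb_2.py.
# Enhanced dependency annotations add empty nodes with decimal IDs
# (e.g. "8.1") to CoNLL-U files, which shift token indices when the
# UPB and UD layers are aligned. A simple normalization drops them.
def drop_empty_nodes(conllu_lines):
    """Remove empty-node lines (decimal token IDs such as '8.1')."""
    kept = []
    for line in conllu_lines:
        if line and not line.startswith("#"):
            token_id = line.split("\t", 1)[0]
            if "." in token_id:  # empty node from the enhanced graph
                continue
        kept.append(line)
    return kept
```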
- Download annotations from Universal Dependencies (UD) v2.9.
- Clone the UPB tools to merge the UPB annotations with the UD annotations.
- Set up the UPB tools.
- Run this command from the UPB tools to merge the UPB and UD annotations and obtain the complete SRL annotations for a given treebank:

```
python3 up2/merge_ud_up.py --input_ud=<ud-treebank> --input_up=<up-treebank> --output=<merged-treebank>
```
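Since the command has to be repeated per treebank, a small batch driver can help. The directory layout below (`UD-2.9/`, `UP-2.0/`, `merged/`) and the treebank names are assumptions; adjust them to your setup:

```python
# Hypothetical batch driver around up2/merge_ud_up.py; the directory
# layout is an assumption -- adjust the paths to your environment.
import subprocess

def merge_command(ud_dir, up_dir, out_dir, treebank):
    """Build the merge_ud_up.py invocation for a single treebank."""
    return [
        "python3", "up2/merge_ud_up.py",
        f"--input_ud={ud_dir}/{treebank}",
        f"--input_up={up_dir}/{treebank}",
        f"--output={out_dir}/{treebank}",
    ]

def merge_all(ud_dir, up_dir, out_dir, treebanks):
    """Run the merge for each treebank; raises if any merge fails."""
    for tb in treebanks:
        subprocess.run(merge_command(ud_dir, up_dir, out_dir, tb), check=True)
```

For example, `merge_all("UD-2.9", "UP-2.0", "merged", ["Spanish-AnCora", "Finnish-TDT"])` would merge those two treebanks in sequence.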
- Change the paths in the `metadata_by_version_to_lang_to_treebank` variable inside the `constants/dataset.py` file to point to the correct locations. `UP-2.0` must point to the complete SRL annotations using gold POS tags and dependency trees; `UP-2.0-predicted` must point to the complete SRL annotations using predicted POS tags and dependency trees.
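Illustrative only: the real structure of `metadata_by_version_to_lang_to_treebank` in `constants/dataset.py` may differ, but the mapping to edit is of this general shape (all paths below are placeholders):

```python
# Illustrative placeholder -- the actual keys and nesting in
# constants/dataset.py may differ; only the two version keys are
# taken from the README above.
metadata_by_version_to_lang_to_treebank = {
    "UP-2.0": {  # gold POS tags and dependency trees
        "es": {"AnCora": "/data/UP-2.0/Spanish-AnCora"},
    },
    "UP-2.0-predicted": {  # predicted POS tags and dependency trees
        "es": {"AnCora": "/data/UP-2.0-predicted/Spanish-AnCora"},
    },
}
```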
- We provide a Python script, `helper_scripts/create_script.py`, to generate a script that runs the training and evaluation for each model. The codenames for the models are as follows:
  - `gcn_syn` for SGCNs
  - `gcn_ar` for ARGCNs
  - `gat_plain` for GATs
  - `lstm` for BiLSTMs
  - `gat_het` for SHGNs
  - `gat_two_att` for RGATs
  - `gat_kb` for KBGATs
  - `gcn_r` for RGCNs
  - `trans_rpr` for Self-Attention with Relative Position Representations (SAN-RPRs)
  - `gate` for GATEs
  - `trans` for Transformers
  - `trans_spr` for Self-Attention with Structural Absolute Position Representations (SAN-SAPRs)
  - `trans_spr_rel` for Self-Attention with Structural Relative Position Representations (SAN-SRPRs)
  - `trans_x_spr_rel` for Transformers with SRPRs (Trans-SRPRs)
  - `trans_rpr_x_spr` for SAN-SAPRs with SAN-RPRs (SAPR-RPRs)
  - `trans_x_spr_rel_x_dr` for Trans-SRPRs with DR (Trans-SRPR-DRs)
  - `trans_rpr_x_spr_x_dr` for SAPR-RPRs with DR (SAPR-RPR-DRs)
  - `trans_rpr_x_spr_x_ldp` for SAPR-RPRs with LDPs (SAPR-RPR-LDPs)
  - `trans_rpr_x_spr_x_sdp` for SAPR-RPRs with SDPs (SAPR-RPR-SDPs)
- Fill in the codename of the desired model in the `model_list` variable to generate the training script for that model.
- Fill in the desired parameters for the model in the `params` variable. The explanation of each parameter can be found in the Appendix of the paper.
- Run the Python script `helper_scripts/create_script.py`. Make sure the paths written in the script are correct.
- The script to train and evaluate the model will be available at `scripts/tests`.
- Run the script with `bash <script-name>` to start the training process.
- After training and evaluation finish, the logs and models will be available in directories with `_logs` and `_models` suffixes.
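As an illustration, the two variables to edit inside `helper_scripts/create_script.py` might look as follows; the parameter names here are placeholders, since the real ones are documented in the paper's Appendix:

```python
# Illustrative placeholder -- only the variable names model_list and
# params come from the README; the values are invented examples.
model_list = ["gcn_syn", "gate"]  # codenames from the table above
params = {
    "num_layers": 2,    # hypothetical hyperparameter
    "hidden_dim": 300,  # hypothetical hyperparameter
}
```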