
WSD Model Based on Single-Twin Hybrid Tower Structure

Undergraduate Graduation Project.

Completed independently by Yue Han under the guidance of Prof. Yun Chen, from 2021.11 to 2022.4.

Introduction

For the Word Sense Disambiguation (WSD) task, the traditional twin-tower (bi-encoder) model encodes the context and the gloss independently, so there is no interaction between them. This project addresses that limitation by proposing a single-twin-tower hybrid model. Inspired by ALBEF, I concatenate the context and gloss together and input them into the Transformer encoders, so that the newly obtained context and gloss representations learn the interaction between each other. The original loss function is then used to update the gradients, under two loss settings: one uses only the single-tower output to compute the loss, while the other uses the outputs of the single tower and the twin tower together. Finally, the hybrid model using both the single-tower and twin-tower outputs reaches an F1 score of 78.2 on the Senseval-2 dataset, 0.4 points higher than the SOTA model, wsd-biencoders.
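The two scoring paths and the combined loss can be sketched as follows. This is a minimal, self-contained illustration with toy numbers standing in for the Transformer encoder outputs; all function and variable names here are illustrative and do not come from the repository's code:

```python
import math

def dot(u, v):
    # Dot-product similarity between two representations.
    return sum(a * b for a, b in zip(u, v))

def cross_entropy(scores, gold):
    # Negative log-softmax probability of the gold sense.
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - scores[gold]

# Twin tower: context and each gloss are encoded independently,
# and each candidate sense is scored by a dot product.
context_rep = [0.2, 0.9]               # stand-in for the context encoder output
gloss_reps = [[0.1, 0.8], [0.9, 0.1]]  # one stand-in vector per candidate gloss
twin_scores = [dot(context_rep, g) for g in gloss_reps]

# Single tower: context and gloss are concatenated and encoded jointly,
# so each candidate gets a score that already reflects their interaction.
single_scores = [0.7, 0.2]             # stand-in for the joint encoder's scores

gold = 0  # index of the correct sense
loss_twin = cross_entropy(twin_scores, gold)
loss_single = cross_entropy(single_scores, gold)

# Hybrid setting: both losses drive the gradient update.
hybrid = loss_single + loss_twin
```

The `singleloss.py` setting corresponds to optimizing `loss_single` alone, while the `doubleloss*` settings correspond to optimizing the sum.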

The structure illustration is below: model_structure.

Dependencies

To run this code, you'll need the following libraries:

  • Python 3
  • Pytorch 1.2.0
  • Pytorch Transformers 1.1.0
  • Numpy 1.17.2
  • NLTK 3.4.5
  • tqdm
  • and the WSD Evaluation Framework, which is the standard evaluation framework for the WSD task. Due to GitHub's storage space limit, please download WSD Evaluation Framework.zip to use it.
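The pinned versions above can be captured in a requirements file. This is an assumed `requirements.txt` fragment based on the list (note that at version 1.1.0 the Transformers library was distributed under the package name `pytorch-transformers`):

```text
torch==1.2.0
pytorch-transformers==1.1.0
numpy==1.17.2
nltk==3.4.5
tqdm
```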

Architecture

  • wsd-biencoders-main # wsd package dir
    • yuanmodel.py # original wsd-biencoders model for comparison
    • singleloss.py # uses only the single-tower output for training and evaluating
    • doublelosssingleeval.py # uses both the single-tower and biencoder losses for training, and only the single-tower output for evaluating
    • doublelossdoubleeval.py # uses both the single-tower and biencoder losses for training and evaluating
    • wsd_models
      • models.py # context, gloss, biencoder encoders
      • utils.py # tokenizer, data processing, data loader, etc.

How to Run

First, compile Scorer.java of the WSD Evaluation Framework on your server (sudo authorization may be needed) with:

javac ./WSD_Evaluation_Framework/Evaluation_Datasets/Scorer.java

Train model with:

python xxx.py --data-path $path_to_wsd_data --ckpt $path_to_checkpoint

where $path_to_wsd_data is the directory of WSD_Evaluation_Framework.

Evaluate model with:

python xxx.py --data-path $path_to_wsd_data --ckpt $path_to_model_checkpoint --eval --split $wsd_eval_set

where $wsd_eval_set is the name of one of the evaluation datasets (e.g. senseval2, senseval3, semeval2007, semeval2013, semeval2015, or ALL).

Here xxx.py is the training script you choose for this run (e.g. singleloss.py or doublelossdoubleeval.py).
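For reference, the command-line interface the scripts expose can be sketched with argparse. Only the flag names come from the commands above; the types, defaults, and help strings are assumptions:

```python
import argparse

def build_parser():
    # Flags mirror the README's training/evaluation commands.
    p = argparse.ArgumentParser(description="Single-twin hybrid WSD model runner")
    p.add_argument("--data-path", required=True,
                   help="directory of WSD_Evaluation_Framework")
    p.add_argument("--ckpt", required=True,
                   help="path to save/load the model checkpoint")
    p.add_argument("--eval", action="store_true",
                   help="run evaluation instead of training")
    p.add_argument("--split", default="semeval2007",
                   help="name of the evaluation dataset")
    return p

# Example: parse an evaluation invocation.
args = build_parser().parse_args(
    ["--data-path", "./WSD_Evaluation_Framework",
     "--ckpt", "./ckpt", "--eval", "--split", "senseval2"])
```

Note that argparse exposes `--data-path` as `args.data_path` (dashes become underscores).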

Notice: Running Time

Because of the use of Transformer blocks, training for 20 epochs will take several days depending on your GPU, so please use a tool like tmux to keep the connection alive.
