
WSD Model Based on Single-Twin Hybrid Tower Structure

Undergraduate Graduation Project.

Completed independently by Yue Han under the guidance of Prof. Yun Chen, from 2021.11 to 2022.4.

Introduction

For the Word Sense Disambiguation (WSD) task, the traditional twin-tower (bi-encoder) model encodes the context and the gloss independently, so there is no interaction between them. This project addresses that limitation by proposing a single-twin-tower hybrid model. Inspired by ALBEF, I concatenate the context and gloss together and input them into the Transformer encoders, so that the newly obtained context and gloss representations learn the interaction between each other. The original loss function is then used to update the gradients, under two loss settings: one uses only the single-tower output to compute the loss, while the other uses the outputs of the single tower and the twin tower together. Finally, the hybrid model using both the single-tower and twin-tower outputs reaches an F1 score of 78.2 on the Senseval-2 dataset, 0.4 points higher than the SOTA model, wsd-biencoders.
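The two scoring paths and the combined loss can be sketched as follows. This is a minimal, self-contained illustration with toy numbers standing in for the Transformer encoder outputs; all function and variable names here are illustrative and do not come from the repository's code:

```python
import math

def dot(u, v):
    # Dot-product similarity between two representations.
    return sum(a * b for a, b in zip(u, v))

def cross_entropy(scores, gold):
    # Negative log-softmax probability of the gold sense.
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - scores[gold]

# Twin tower: context and each gloss are encoded independently,
# and each candidate sense is scored by a dot product.
context_rep = [0.2, 0.9]               # stand-in for the context encoder output
gloss_reps = [[0.1, 0.8], [0.9, 0.1]]  # one stand-in vector per candidate gloss
twin_scores = [dot(context_rep, g) for g in gloss_reps]

# Single tower: context and gloss are concatenated and encoded jointly,
# so each candidate gets a score that already reflects their interaction.
single_scores = [0.7, 0.2]             # stand-in for the joint encoder's scores

gold = 0  # index of the correct sense
loss_twin = cross_entropy(twin_scores, gold)
loss_single = cross_entropy(single_scores, gold)

# Hybrid setting: both losses drive the gradient update.
hybrid = loss_single + loss_twin
```

The `singleloss.py` setting corresponds to optimizing `loss_single` alone, while the `doubleloss*` settings correspond to optimizing the sum.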

The structure illustration is below: model_structure.

Dependencies

To run this code, you'll need the following libraries:

  • Python 3
  • Pytorch 1.2.0
  • Pytorch Transformers 1.1.0
  • Numpy 1.17.2
  • NLTK 3.4.5
  • tqdm
  • and the WSD Evaluation Framework, which is the standard evaluation framework for the WSD task. Due to GitHub's storage space limit, please download WSD Evaluation Framework.zip to use it.
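The pinned versions above can be captured in a requirements file. This is an assumed `requirements.txt` fragment based on the list (note that at version 1.1.0 the Transformers library was distributed under the package name `pytorch-transformers`):

```text
torch==1.2.0
pytorch-transformers==1.1.0
numpy==1.17.2
nltk==3.4.5
tqdm
```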

Architecture

  • wsd-biencoders-main # wsd package dir
    • yuanmodel.py # original wsd-biencoders model for comparison
    • singleloss.py # uses only the single-tower output for training and evaluating
    • doublelosssingleeval.py # uses both the single-tower and biencoder losses for training, and only the single-tower output for evaluating
    • doublelossdoubleeval.py # uses both the single-tower and biencoder losses for training and evaluating
    • wsd_models
      • models.py # context, gloss, biencoder encoders
      • utils.py # tokenizer, data processing, data loader, etc.

How to Run

First, compile Scorer.java of the WSD Evaluation Framework on your server (sudo authorization may be needed) with:

javac ./WSD_Evaluation_Framework/Evaluation_Datasets/Scorer.java

Train model with:

python xxx.py --data-path $path_to_wsd_data --ckpt $path_to_checkpoint

where $path_to_wsd_data is the directory of WSD_Evaluation_Framework.

Evaluate model with:

python xxx.py --data-path $path_to_wsd_data --ckpt $path_to_model_checkpoint --eval --split $wsd_eval_set

where $wsd_eval_set is the name of one of the evaluation datasets (e.g. senseval2, senseval3, semeval2007, semeval2013, semeval2015, or ALL).

Here xxx.py is the training script you choose for this run (e.g. singleloss.py or doublelossdoubleeval.py).
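For reference, the command-line interface the scripts expose can be sketched with argparse. Only the flag names come from the commands above; the types, defaults, and help strings are assumptions:

```python
import argparse

def build_parser():
    # Flags mirror the README's training/evaluation commands.
    p = argparse.ArgumentParser(description="Single-twin hybrid WSD model runner")
    p.add_argument("--data-path", required=True,
                   help="directory of WSD_Evaluation_Framework")
    p.add_argument("--ckpt", required=True,
                   help="path to save/load the model checkpoint")
    p.add_argument("--eval", action="store_true",
                   help="run evaluation instead of training")
    p.add_argument("--split", default="semeval2007",
                   help="name of the evaluation dataset")
    return p

# Example: parse an evaluation invocation.
args = build_parser().parse_args(
    ["--data-path", "./WSD_Evaluation_Framework",
     "--ckpt", "./ckpt", "--eval", "--split", "senseval2"])
```

Note that argparse exposes `--data-path` as `args.data_path` (dashes become underscores).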

Notice: Running Time

Because of the use of Transformer blocks, training for 20 epochs will take several days depending on your GPU, so please use a tool like tmux to keep the connection alive.
