GPT-2 Fine-Tuning for Summarization on the ARTeLab dataset. The dataset is first converted into tokenized .json files, which are then passed to the training step.
The following technologies, frameworks and libraries have been used:
We strongly suggest creating a virtual environment (e.g. 'GPT-2_Summarizer') and specifying the Python version; otherwise the libraries listed above may fail to install:
conda create -n GPT-2_Summarizer python=3.8.9
conda activate GPT-2_Summarizer
If you want to run the project without conda, you need Python 3.8.9 (or a later version) installed on your machine.
- Install all the libraries using the requirements.txt file that can be found in the main repository:
pip install -r requirements.txt
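To sanity-check the installation, you can confirm that the core dependencies import correctly (assuming PyTorch and Hugging Face Transformers are among the listed requirements, as is typical for GPT-2 fine-tuning):

python -c "import torch, transformers; print(torch.__version__, transformers.__version__)"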
- Dataset Creation
The dataset can be created by passing a .csv file to the "dataset_creation.py" script; the file must contain two columns, text and summary, in that order.
python dataset_creation.py --path_csv "./path_to_csv" --path_directory "./path_to_directory" --model "model_used_for_tokenization"
The script will create tokenized .json files that can be fed to the "train.py" script.
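For orientation, the sketch below shows roughly what a script like "dataset_creation.py" might do with those arguments. It is not the actual implementation: the JSON field names, file layout, and use of GPT2Tokenizer are assumptions; only the CSV column names (text, summary) come from the README.

import argparse
import json
import os

import pandas as pd
from transformers import GPT2Tokenizer

parser = argparse.ArgumentParser()
parser.add_argument("--path_csv", required=True)        # input .csv with "text" and "summary" columns
parser.add_argument("--path_directory", required=True)  # output directory for tokenized .json files
parser.add_argument("--model", required=True)           # tokenizer/model name used for tokenization
args = parser.parse_args()

tokenizer = GPT2Tokenizer.from_pretrained(args.model)
df = pd.read_csv(args.path_csv)
os.makedirs(args.path_directory, exist_ok=True)

for idx, row in df.iterrows():
    # Assumed record layout: one .json file per article with token-id lists
    record = {
        "text": tokenizer.encode(row["text"]),
        "summary": tokenizer.encode(row["summary"]),
    }
    with open(os.path.join(args.path_directory, f"{idx}.json"), "w") as f:
        json.dump(record, f)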
- Training
To run the training process on a GPU, follow the provided Google Colab notebook:
bash_train_GPT2.ipynb
Remember to change the runtime type to GPU!
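The checkpoint name used later ("only_sum_loss_ignr_pad") suggests that the training loss is computed only on the summary tokens, with padding ignored. The snippet below is a minimal sketch of that idea, not the actual train.py; the function name, tensor shapes, and how the summary start position is obtained are assumptions.

import torch
import torch.nn.functional as F

def summary_only_loss(logits, labels, summary_start, pad_token_id):
    # Standard next-token shift for language modeling
    shift_logits = logits[:, :-1, :].contiguous()
    shift_labels = labels[:, 1:].clone()
    # Mask out the article prefix so only summary tokens contribute to the loss
    for i, start in enumerate(summary_start):
        shift_labels[i, : max(start - 1, 0)] = -100
    # Mask out padding positions as well
    shift_labels[shift_labels == pad_token_id] = -100
    return F.cross_entropy(
        shift_logits.view(-1, shift_logits.size(-1)),
        shift_labels.view(-1),
        ignore_index=-100,
    )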
- Loading and using the model
Once training has finished and the best model has been saved, you can load it and use it for summarization by running the command below in your terminal.
python loading_saved_model.py --text "Sarebbe stato molto facile per l'uomo estrarre la freccia dalla carne del malcapitato, eppure questo si rivelò complicato e fatale. La freccia aveva infatti penetrato troppo a fondo nella gamba e aveva provocato una terribile emorragia." --saved_model "./model_O0_trained_after_50_epochs_only_sum_loss_ignr_pad.bin"
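Internally, "loading_saved_model.py" presumably does something along these lines. This is only a sketch: the base checkpoint name, generation settings, and state-dict format are assumptions, not the repository's actual code.

import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

base_model = "gpt2"  # placeholder: use the same model passed to dataset_creation.py
tokenizer = GPT2Tokenizer.from_pretrained(base_model)
model = GPT2LMHeadModel.from_pretrained(base_model)

# Load the fine-tuned weights saved during training
state_dict = torch.load(
    "./model_O0_trained_after_50_epochs_only_sum_loss_ignr_pad.bin",
    map_location="cpu",
)
model.load_state_dict(state_dict)
model.eval()

text = "Article text to summarize goes here."
input_ids = tokenizer.encode(text, return_tensors="pt")
with torch.no_grad():
    output = model.generate(
        input_ids,
        max_length=input_ids.shape[1] + 60,
        num_beams=5,
        early_stopping=True,
        pad_token_id=tokenizer.eos_token_id,
    )
# Keep only the tokens generated after the input article
summary = tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True)
print(summary)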