Releases: RUCAIBox/TextBox
TextBox 2.0 Release
TextBox 2.0 is an up-to-date text generation library based on Python and PyTorch. It focuses on building a unified and standardized pipeline for applying pre-trained language models (PLMs) to text generation:
- From a task perspective, we consider 13 common text generation tasks, such as translation, story generation, and style transfer, along with 83 corresponding widely used datasets.
- From a model perspective, we incorporate 47 pre-trained language models/modules covering the categories of general, translation, Chinese, dialogue, controllable, distilled, prompting, and lightweight models (modules).
- From a training perspective, we support 4 pre-training objectives and 4 efficient and robust training strategies, such as distributed data parallel and efficient generation.
Compared with the previous version of TextBox, this extension mainly focuses on building a unified, flexible, and standardized framework for better supporting PLM-based text generation models. There are three advantages of TextBox 2.0:
- It is a significant innovation, offering comprehensive coverage of text generation tasks and PLMs.
- It is designed to be unified in implementation and interface.
- It can faithfully reproduce the results reported in existing work.
TextBox v0.2.1
TextBox v0.2.1 Release Notes
The TextBox v0.2.1 release includes a number of new features, some bug fixes, and code refactoring. A few of the highlights include:
- We add 6 new models: HRED, CVAE, T5, ProphetNet, Context2Seq and Attribute2Seq.
- We add 3 new datasets: Persona Chat for dialogue systems, Amazon Electronic for attribute-to-text generation, and Chinese Classical Poetry Corpus for poem generation.
- We support Distributed Data Parallel (DDP) for convenient training on multiple GPUs.
- We refactor the code of pretrained language models (PLMs) to improve performance.
- We refactor the `dataset` and `dataloader` to provide a unified and convenient interface.
- We unify and simplify the `generate` function for each model.
- We unify the config parameters of different models and datasets.
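The idea behind a unified `generate` function is that calling code depends only on one shared method signature, never on a particular model class. Below is a minimal sketch of that pattern, assuming a batch is a dict of lists. It is not TextBox's actual code: `AbstractGenerationModel`, `EchoModel`, `UpperModel`, `run_generation`, and the `"source_text"` key are all hypothetical names chosen for illustration.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, List


class AbstractGenerationModel(ABC):
    """Hypothetical base class: every model exposes the same generate() signature."""

    @abstractmethod
    def generate(self, batch: Dict[str, List[str]]) -> List[str]:
        """Return one generated string per source text in the batch."""


class EchoModel(AbstractGenerationModel):
    """Toy model that copies its input (a stand-in for a real seq2seq model)."""

    def generate(self, batch: Dict[str, List[str]]) -> List[str]:
        return list(batch["source_text"])


class UpperModel(AbstractGenerationModel):
    """Second toy model that upper-cases its input."""

    def generate(self, batch: Dict[str, List[str]]) -> List[str]:
        return [text.upper() for text in batch["source_text"]]


def run_generation(
    model: AbstractGenerationModel, batches: Iterable[Dict[str, List[str]]]
) -> List[str]:
    # The evaluation loop relies only on the shared interface,
    # so any model subclass can be swapped in without changes here.
    outputs: List[str] = []
    for batch in batches:
        outputs.extend(model.generate(batch))
    return outputs
```

For example, `run_generation(UpperModel(), [{"source_text": ["hi", "there"]}])` returns `["HI", "THERE"]`, and swapping in `EchoModel()` requires no change to the loop.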
TextBox v0.1.5
TextBox is an open-source library for building text generation systems. It is developed based on Python and PyTorch.