webis-de/small-text

Active Learning for Text Classification in Python.

Installation | Quick Start | Contribution | Changelog | Docs

Small-Text provides state-of-the-art Active Learning for Text Classification. Several pre-implemented Query Strategies, Initialization Strategies, and Stopping Criteria are provided, which can be easily mixed and matched to build active learning experiments or applications.
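Initialization strategies decide which samples get labeled before the first query, which matters when no labeled data exists yet. As a rough illustration (not small-text's implementation or API), the following sketch draws a class-balanced random initial sample. It assumes a simulated experiment in which gold labels `y` for the pool are already known; the function name `balanced_random_initialization` is hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence


def balanced_random_initialization(y: Sequence[int], n_samples: int, seed: int = 0) -> List[int]:
    """Hypothetical helper: pick up to `n_samples` pool indices, spread as evenly as possible over classes.

    Assumes gold labels `y` are available, as in a simulated active learning experiment.
    """
    rng = random.Random(seed)
    by_class: Dict[int, List[int]] = defaultdict(list)
    for index, label in enumerate(y):
        by_class[label].append(index)
    for indices in by_class.values():
        rng.shuffle(indices)  # randomize which samples of each class are drawn

    selected: List[int] = []
    classes = sorted(by_class)
    # Round-robin over classes until the budget is spent or every class is exhausted.
    while len(selected) < n_samples and any(by_class[c] for c in classes):
        for c in classes:
            if by_class[c] and len(selected) < n_samples:
                selected.append(by_class[c].pop())
    return selected


y = [0, 0, 0, 0, 1, 1, 2]
print(balanced_random_initialization(y, n_samples=3))  # one index per class
```

Round-robin selection keeps rare classes from being missed entirely in a small initial sample; once a class runs out, the remaining budget goes to the other classes.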

Features

Provides unified interfaces for Active Learning so that you can easily mix and match query strategies with classifiers provided by scikit-learn, PyTorch, or transformers.
Supports GPU-based PyTorch models and integrates transformers so that you can use state-of-the-art Text Classification models for Active Learning.
GPU is supported but not required. For CPU-only use, a lightweight installation requires only a minimal set of dependencies.
Multiple scientifically evaluated components are pre-implemented and ready to use (Query Strategies, Initialization Strategies, and Stopping Criteria).
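To illustrate what a unified interface buys you, here is a minimal sketch in which every query strategy consumes the same input (class probabilities for the unlabeled pool) and returns the indices to label next, so strategies can be swapped freely. The `QueryStrategy` protocol and the three classes below are hypothetical simplifications named after well-known techniques; they do not reproduce small-text's actual signatures, which also receive the classifier and the dataset.

```python
import random
from typing import List, Protocol, Sequence

Probas = Sequence[Sequence[float]]  # one class distribution per unlabeled sample


class QueryStrategy(Protocol):
    def query(self, probas: Probas, n: int) -> List[int]:
        """Return the indices of the `n` samples to label next."""
        ...


class RandomSampling:
    """Baseline: ignore the model and pick samples uniformly at random."""

    def __init__(self, seed: int = 0) -> None:
        self.rng = random.Random(seed)

    def query(self, probas: Probas, n: int) -> List[int]:
        return self.rng.sample(range(len(probas)), n)


class LeastConfidence:
    """Pick samples whose most likely class has the lowest probability."""

    def query(self, probas: Probas, n: int) -> List[int]:
        return sorted(range(len(probas)), key=lambda i: max(probas[i]))[:n]


class BreakingTies:
    """Pick samples with the smallest margin between the two most likely classes."""

    def query(self, probas: Probas, n: int) -> List[int]:
        def margin(p: Sequence[float]) -> float:
            top = sorted(p, reverse=True)
            return top[0] - top[1]

        return sorted(range(len(probas)), key=lambda i: margin(probas[i]))[:n]


probas = [[0.9, 0.1], [0.55, 0.45], [0.6, 0.4], [0.99, 0.01]]
strategies: List[QueryStrategy] = [RandomSampling(), LeastConfidence(), BreakingTies()]
for strategy in strategies:
    print(type(strategy).__name__, strategy.query(probas, n=2))
```

In the binary case least confidence and breaking ties rank samples identically; with three or more classes they can disagree, which is exactly why being able to swap them without touching the rest of the experiment is useful.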

What is Active Learning?

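Active Learning allows you to efficiently label training data for supervised learning when you have little to no labeled data. Instead of labeling samples at random, a query strategy repeatedly selects the unlabeled samples that are expected to be most informative to the current model, and only those are passed to an annotator.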
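The basic pool-based loop can be sketched in a few lines: train on what is labeled, score the unlabeled pool, ask the annotator for the most uncertain samples, and repeat. This is a generic illustration using prediction entropy as the uncertainty measure; it is not small-text's API, and `active_learning_loop`, `train_word_counts`, and the toy pool and oracle are hypothetical stand-ins for a real classifier and a human annotator.

```python
import math
from typing import Callable, Dict, List, Sequence

PredictProba = Callable[[str], List[float]]


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy of a predicted class distribution (higher means more uncertain)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def active_learning_loop(
    pool: List[str],
    oracle: Callable[[str], int],
    train: Callable[[List[str], List[int]], PredictProba],
    initial: List[int],
    batch_size: int,
    iterations: int,
) -> Dict[int, int]:
    """Pool-based active learning with entropy-based uncertainty sampling.

    Returns a mapping from pool index to the label supplied by the oracle.
    """
    labeled = {i: oracle(pool[i]) for i in initial}
    for _ in range(iterations):
        unlabeled = [i for i in range(len(pool)) if i not in labeled]
        if not unlabeled:
            break  # nothing left to query
        # Retrain on all labels collected so far.
        predict_proba = train([pool[i] for i in labeled], list(labeled.values()))
        # Query the samples the current model is most uncertain about.
        unlabeled.sort(key=lambda i: entropy(predict_proba(pool[i])), reverse=True)
        for i in unlabeled[:batch_size]:
            labeled[i] = oracle(pool[i])  # ask the (simulated) annotator
    return labeled


def train_word_counts(texts: List[str], labels: List[int]) -> PredictProba:
    """Toy binary classifier: smoothed per-word label counts (illustration only)."""
    counts: Dict[str, List[int]] = {}
    for text, label in zip(texts, labels):
        for word in text.split():
            counts.setdefault(word, [1, 1])[label] += 1

    def predict_proba(text: str) -> List[float]:
        totals = [1, 1]
        for word in text.split():
            word_counts = counts.get(word, [1, 1])
            totals[0] += word_counts[0]
            totals[1] += word_counts[1]
        s = totals[0] + totals[1]
        return [totals[0] / s, totals[1] / s]

    return predict_proba


def oracle(text: str) -> int:
    """Simulated annotator for the toy pool."""
    return int("good" in text or "great" in text)


pool = ["good movie", "great film", "bad movie", "awful film", "good plot", "bad acting"]
labels = active_learning_loop(pool, oracle, train_word_counts, initial=[0, 2], batch_size=2, iterations=2)
print(len(labels))  # 6: two initial labels plus two queries of two samples each
```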

News

Version 1.4.1 (v1.4.1) - August 18th, 2024
- Bugfix release.
Version 1.4.0 (v1.4.0) - June 9th, 2024
- New query strategy: AnchorSubsampling (aka AnchorAL).
  Special thanks to Pietro Lesci for the correspondence and code review.
Paper published at EACL 2023 🎉
- The paper introducing small-text has been accepted at EACL 2023. Meet us at the conference in May!
- Update: the paper received the EACL Best System Demonstration award. Thank you for your support!

For a complete list of changes, see the change log.
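The idea behind anchor-based subsampling, as described for AnchorAL, is to avoid scoring the entire pool in every iteration: class-specific labeled instances (anchors) are selected, the most similar unlabeled instances are retrieved, and the query strategy then runs only on this smaller subpool. The sketch below is a simplified illustration under our own assumptions (random anchor choice, cosine similarity on precomputed embeddings, and scoring each candidate by its best match to any anchor); it is not small-text's `AnchorSubsampling` implementation, and the function name `anchor_subpool` is hypothetical.

```python
import math
import random
from collections import defaultdict
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def anchor_subpool(
    embeddings: Sequence[Vector],
    labeled: Dict[int, int],
    unlabeled: List[int],
    anchors_per_class: int,
    subpool_size: int,
    seed: int = 0,
) -> List[int]:
    """Return up to `subpool_size` unlabeled indices that lie closest to class-specific anchors.

    Assumes at least one labeled instance exists (otherwise there are no anchors).
    """
    rng = random.Random(seed)
    by_class: Dict[int, List[int]] = defaultdict(list)
    for index, label in labeled.items():
        by_class[label].append(index)
    # Draw a few labeled anchors from every class so that minority classes are represented.
    anchors = [
        a
        for indices in by_class.values()
        for a in rng.sample(indices, min(anchors_per_class, len(indices)))
    ]
    # Score each unlabeled instance by its highest similarity to any anchor.
    scores = {u: max(cosine(embeddings[u], embeddings[a]) for a in anchors) for u in unlabeled}
    # Highest score first; ties broken by index for determinism.
    return sorted(unlabeled, key=lambda u: (-scores[u], u))[:subpool_size]


embeddings = [[1, 0], [0, 1], [0.9, 0.1], [0.1, 0.9], [-1, 0], [0, -1]]
subpool = anchor_subpool(embeddings, labeled={0: 0, 1: 1}, unlabeled=[2, 3, 4, 5],
                         anchors_per_class=1, subpool_size=2)
print(subpool)  # [2, 3]
```

A base query strategy (for example, one of the uncertainty measures used elsewhere in this README) would then be applied only to `subpool` rather than to the full unlabeled pool.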

Installation

Small-Text can be easily installed via pip (or conda):

pip install small-text

The command results in a slim installation with only the necessary dependencies. For a full installation via pip, you just need to include the transformers extra requirement:

pip install "small-text[transformers]"

For conda, which lacks the extra requirements feature, a full installation can be achieved as follows:

conda install -c conda-forge "torch>=1.6.0" "torchtext>=0.7.0" transformers small-text

The library requires Python 3.7 or newer. To use the GPU, CUDA 10.1 or newer is required. More information on installation can be found in the documentation.
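To check which variant you ended up with, you can test whether the optional packages are importable. This is a minimal sketch using only the standard library; it assumes that a full installation is recognizable by the presence of `torch` and `transformers` (the packages listed in the conda command), and the helper names `installed_packages` and `gpu_available` are hypothetical. `torch.cuda.is_available()` is PyTorch's standard check for a usable CUDA device.

```python
import importlib.util
from typing import Dict, Iterable


def installed_packages(names: Iterable[str] = ("small_text", "torch", "transformers")) -> Dict[str, bool]:
    """Report, per top-level module name, whether it can be imported in this environment."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


def gpu_available() -> bool:
    """True only if PyTorch is installed and reports a usable CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return False  # slim installation without PyTorch: CPU only
    import torch

    return torch.cuda.is_available()


status = installed_packages()
print(status)
print("full installation" if status["torch"] and status["transformers"] else "slim installation")
print("GPU available:", gpu_available())
```

`find_spec` only locates the module without importing it, so the check stays cheap even when heavy dependencies are present.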

Quick Start

For a quick start, see the provided examples for binary classification, PyTorch multi-class classification, and transformer-based multi-class classification, or check out the notebooks.

Notebooks

#	Notebook
1	Intro: Active Learning for Text Classification with Small-Text
2	Using Stopping Criteria for Active Learning
3	Active Learning using SetFit
4	Using SetFit's Zero Shot Capabilities for Cold Start Initialization
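Stopping criteria decide when further labeling is unlikely to pay off. As a rough sketch of one well-known idea, stabilizing predictions, the code below tracks Cohen's kappa between the model's predictions on a fixed stop set in consecutive iterations and stops once the average agreement over a recent window is high enough. The class name `StabilizingPredictions` and the default window and threshold are illustrative assumptions, not small-text's API; the second notebook covers the criteria the library actually ships.

```python
from typing import List, Optional, Sequence


def cohen_kappa(a: Sequence[int], b: Sequence[int]) -> float:
    """Cohen's kappa: agreement between two label sequences, corrected for chance."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    expected = sum((a.count(c) / n) * (b.count(c) / n) for c in set(a) | set(b))
    if expected == 1.0:
        return 1.0  # both sequences predict one identical class everywhere
    return (observed - expected) / (1.0 - expected)


class StabilizingPredictions:
    """Stop once predictions on a fixed stop set stop changing between iterations."""

    def __init__(self, window: int = 3, threshold: float = 0.99) -> None:
        self.window = window
        self.threshold = threshold
        self.previous: Optional[List[int]] = None
        self.kappas: List[float] = []

    def stop(self, predictions: Sequence[int]) -> bool:
        """Record this iteration's stop-set predictions and decide whether to stop."""
        if self.previous is not None:
            self.kappas.append(cohen_kappa(self.previous, predictions))
        self.previous = list(predictions)
        recent = self.kappas[-self.window:]
        return len(recent) == self.window and sum(recent) / self.window >= self.threshold


criterion = StabilizingPredictions(window=2, threshold=0.99)
history = [[0, 1, 0, 1], [1, 1, 0, 0], [0, 1, 0, 1], [0, 1, 0, 1], [0, 1, 0, 1]]
print([criterion.stop(p) for p in history])  # [False, False, False, False, True]
```

Averaging over a window rather than reacting to a single high kappa guards against stopping on one lucky iteration in which the predictions happen to coincide.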

Showcase

Tutorial: 👂 Active learning for text classification with small-text (use small-text conveniently from the Argilla UI).

A full list of showcases can be found in the docs.

🎀 Would you like to share your use case? Whether it is a paper, an experiment, a practical application, a thesis, a dataset, or something else, let us know and we will add you to the showcase section or even here.

Documentation

Read the latest documentation here. Noteworthy pages include:

Alternatives

modAL, ALiPy, libact, ALToolbox

Contribution

Contributions are welcome. Details can be found in CONTRIBUTING.md.

Acknowledgments

This software was created by Christopher Schröder (@chschroeder) at Leipzig University's NLP group, which is part of the Webis research network. The encompassing project was funded by the Development Bank of Saxony (SAB) under project number 100335729.

Citation

Small-Text was introduced in detail in the EACL 2023 System Demonstration paper "Small-Text: Active Learning for Text Classification in Python", which can be cited as follows:

@inproceedings{schroeder2023small-text,
    title = "Small-Text: Active Learning for Text Classification in Python",
    author = {Schr{\"o}der, Christopher  and  M{\"u}ller, Lydia  and  Niekler, Andreas  and  Potthast, Martin},
    booktitle = "Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations",
    month = may,
    year = "2023",
    address = "Dubrovnik, Croatia",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.eacl-demo.11",
    pages = "84--95"
}

License

MIT License
