Active Learning for Text Classification in Python.
Small-Text provides state-of-the-art Active Learning for Text Classification. It ships several pre-implemented Query Strategies, Initialization Strategies, and Stopping Criteria, which can be easily mixed and matched to build active learning experiments or applications.
- Provides unified interfaces for Active Learning so that you can easily mix and match query strategies with classifiers provided by scikit-learn, PyTorch, or transformers.
- Supports GPU-based PyTorch models and integrates transformers so that you can use state-of-the-art Text Classification models for Active Learning.
- GPU is supported but not required. For CPU-only use, a lightweight installation requires only a minimal set of dependencies.
- Multiple scientifically evaluated components are pre-implemented and ready to use (Query Strategies, Initialization Strategies, and Stopping Criteria).
Active Learning allows you to efficiently label training data for supervised learning in a scenario where you have little to no labeled data.
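To make the idea concrete, here is a minimal, library-independent sketch of a pool-based active learning loop. It deliberately does not use the small-text API: the toy word-count model, the `least_confidence` strategy, the `active_learning_loop` function, and the example data are all hypothetical names invented for this illustration. It assumes binary labels and an oracle (normally a human annotator) that returns the true label for a queried index.

```python
import math
from collections import Counter

# Hypothetical toy data, invented for this sketch.
POOL = [
    "great movie loved it",
    "terrible movie hated it",
    "loved the acting great cast",
    "hated the plot terrible pacing",
    "great soundtrack",
    "terrible ending",
    "loved it",
    "hated it",
]
TRUE_LABELS = [1, 0, 1, 0, 1, 0, 1, 0]


def train(texts, labels):
    """Fit a toy model: one Counter of word frequencies per class (0 and 1)."""
    counts = {0: Counter(), 1: Counter()}
    for text, label in zip(texts, labels):
        counts[label].update(text.lower().split())
    return counts


def predict_proba(model, text):
    """Return P(label == 1) from add-one-smoothed per-word log-odds."""
    total0 = sum(model[0].values())
    total1 = sum(model[1].values())
    vocab = len(set(model[0]) | set(model[1])) + 1
    log_odds = 0.0
    for word in text.lower().split():
        p1 = (model[1][word] + 1) / (total1 + vocab)
        p0 = (model[0][word] + 1) / (total0 + vocab)
        log_odds += math.log(p1 / p0)
    return 1.0 / (1.0 + math.exp(-log_odds))


def least_confidence(model, pool, unlabeled, n):
    """Query strategy: pick the n unlabeled indices with probability closest to 0.5."""
    return sorted(unlabeled, key=lambda i: abs(predict_proba(model, pool[i]) - 0.5))[:n]


def active_learning_loop(pool, oracle, initial, query_strategy, iterations=2, batch_size=2):
    """Alternate between querying labels from the oracle and retraining the model."""
    labeled = {i: oracle(i) for i in initial}
    model = train([pool[i] for i in labeled], list(labeled.values()))
    for _ in range(iterations):
        unlabeled = [i for i in range(len(pool)) if i not in labeled]
        if not unlabeled:
            break
        for i in query_strategy(model, pool, unlabeled, batch_size):
            labeled[i] = oracle(i)
        model = train([pool[i] for i in labeled], list(labeled.values()))
    return model, labeled


if __name__ == "__main__":
    model, labeled = active_learning_loop(
        POOL, lambda i: TRUE_LABELS[i], initial=[0, 1], query_strategy=least_confidence
    )
    print(sorted(labeled), predict_proba(model, "great cast"))
```

Because the query strategy is passed in as a plain function, it can be swapped for another one (for example, random sampling) without touching the loop, which is the kind of mix-and-match design the feature list describes.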
- Version 1.4.1 (v1.4.1) - August 18th, 2024
  - Bugfix release.
- Version 1.4.0 (v1.4.0) - June 9th, 2024
  - New query strategy: AnchorSubsampling (aka AnchorAL). Special thanks to Pietro Lesci for the correspondence and code review.
- Paper published at EACL 2023 🎉
  - The paper introducing small-text has been accepted at EACL 2023. Meet us at the conference in May!
  - Update: the paper was awarded EACL Best System Demonstration. Thank you for your support!
For a complete list of changes, see the change log.
Small-Text can be easily installed via pip (or conda):
pip install small-text
The command results in a slim installation with only the necessary dependencies.
For a full installation via pip, include the transformers extra requirement:
pip install small-text[transformers]
For conda, which lacks the extra requirements feature, a full installation can be achieved as follows:
conda install -c conda-forge "torch>=1.6.0" "torchtext>=0.7.0" transformers small-text
The library requires Python 3.7 or newer. GPU use requires CUDA 10.1 or newer. More information on the installation can be found in the documentation.
For a quick start, see the provided examples for binary classification, PyTorch multi-class classification, and transformer-based multi-class classification, or check out the notebooks.
- Tutorial: 👂 Active learning for text classification with small-text (Use small-text conveniently from the argilla UI.)
A full list of showcases can be found in the docs.
🎀 Would you like to share your use case? Whether it is a paper, an experiment, a practical application, a thesis, a dataset, or something else, let us know and we will add you to the showcase section or even here.
Read the latest documentation here.
Alternative active learning libraries include modAL, ALiPy, libact, and ALToolbox.
Contributions are welcome. Details can be found in CONTRIBUTING.md.
This software was created by Christopher Schröder (@chschroeder) at Leipzig University's NLP group, which is part of the Webis research network. The encompassing project was funded by the Development Bank of Saxony (SAB) under project number 100335729.
Small-Text is described in detail in the EACL 2023 System Demonstration paper "Small-Text: Active Learning for Text Classification in Python", which can be cited as follows:
@inproceedings{schroeder2023small-text,
title = "Small-Text: Active Learning for Text Classification in Python",
author = {Schr{\"o}der, Christopher and M{\"u}ller, Lydia and Niekler, Andreas and Potthast, Martin},
booktitle = "Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations",
month = may,
year = "2023",
address = "Dubrovnik, Croatia",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2023.eacl-demo.11",
pages = "84--95"
}