- Fixed an out-of-bounds error that occurred when DiscriminativeActiveLearning queried all remaining unlabeled data.
- Fixed typos and wording in the PoolBasedActiveLearner docstrings.
- Pinned SetFit version in notebook example. (#64)
- Fixed an out-of-bounds error in SetFitClassification that could occur on 32-bit systems and on Windows. (#66)
- Fixed errors in the notebook examples that occurred with more recent seaborn / matplotlib versions.
- Documentation: added links to bibliography. (#65)
- New query strategy: AnchorSubsampling.
- Changed how the seed is controlled in SetFitClassification: previously, the seed was fixed unless it was explicitly set via the respective trainer keyword argument.
- Documentation: added a section that lists compatible transformer models.
- Documentation: updated showcase section.
- An errata section was added to the documentation.
- Fixed a deviation from the paper in DeltaFScore, which also took the agreement in predictions of the negative label into account. (#51)
- Fixed a bug in KappaAverage that affected the stopping behavior. (#52)
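To make the DeltaFScore fix concrete, here is a minimal sketch of an agreement-based Delta-F signal for binary predictions. It assumes the common formulation in which the agreement between two successive models is an F-measure computed over positive predictions only; the function name delta_f and the handling of an all-negative case are hypothetical and not small-text's API. Instances on which both models predict the negative label never enter the computation.

```python
import numpy as np


def delta_f(prev_pred, curr_pred, positive_label=1):
    """Sketch: 1 - F-measure agreement between two successive models' predictions.

    a = both predict positive, b = only the previous model does,
    c = only the current model does. Agreement on the negative label
    (both predict negative) is deliberately ignored.
    """
    prev_pos = np.asarray(prev_pred) == positive_label
    curr_pos = np.asarray(curr_pred) == positive_label
    a = np.sum(prev_pos & curr_pos)
    b = np.sum(prev_pos & ~curr_pos)
    c = np.sum(~prev_pos & curr_pos)
    denom = 2 * a + b + c
    if denom == 0:
        # neither model predicts the positive label: treated as "no change" (assumption)
        return 0.0
    return float(1.0 - 2 * a / denom)
```

Because negative agreements are excluded, appending any number of instances that both models label negative leaves the value unchanged.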
- Fixed a bug in TransformerBasedClassification where validations_per_epoch>=2 left the model in eval mode. (#40)
- Fixed a bug where parameter groups were omitted when using TransformerBasedClassification's layer-specific fine-tuning functionality. (#36, #38)
- Fixed a bug where class weighting resulted in nan values. (#39)
- Added dropout sampling to SetFitClassification (https://github.com/webis-de/small-text/blob/v1.3.0/small_text/integrations/transformers/classifiers/setfit.py).
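Dropout sampling, commonly known as Monte Carlo dropout, keeps dropout active at prediction time and runs several stochastic forward passes, which yields a distribution of predictions per instance. The following is a generic NumPy sketch under simplifying assumptions: a single hypothetical linear head applied to precomputed embeddings stands in for the SetFit model, and the functions dropout_sampling and bald_scores are illustrative, not part of small-text. The BALD score (entropy of the mean prediction minus the mean entropy of the sampled predictions) is shown as one consumer of such samples.

```python
import numpy as np


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def dropout_sampling(x, weights, num_samples=10, p=0.1, seed=0):
    """Return probabilities of shape (num_samples, num_instances, num_classes).

    x: embeddings (num_instances, dim); weights: hypothetical linear head (dim, num_classes).
    A fresh dropout mask is drawn for every pass, so each pass differs.
    """
    rng = np.random.default_rng(seed)
    samples = []
    for _ in range(num_samples):
        mask = rng.random(x.shape) >= p       # keep each feature with probability 1 - p
        x_dropped = x * mask / (1.0 - p)      # inverted dropout scaling
        samples.append(softmax(x_dropped @ weights))
    return np.stack(samples)


def bald_scores(proba_samples, eps=1e-12):
    """Mutual information between predictions and model parameters, per instance."""
    mean_proba = proba_samples.mean(axis=0)
    entropy_of_mean = -np.sum(mean_proba * np.log(mean_proba + eps), axis=-1)
    mean_entropy = -np.sum(proba_samples * np.log(proba_samples + eps), axis=-1).mean(axis=0)
    return entropy_of_mean - mean_entropy
```

Instances whose sampled predictions disagree the most receive the highest BALD scores; if all passes agree exactly, the score is zero.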
- Fixed broken link in README.md.
- Fixed typo in README.md. (#26)
- The ClassificationChange (https://github.com/webis-de/small-text/blob/v1.3.0/small_text/stopping_criteria/change.py) stopping criterion now supports multi-label classification.
- Documentation:
- Updated the active learning setup figure.
- The documentation of integrations has been reorganized.
- Added new classifier: SetFitClassification which wraps huggingface/setfit.
- Active Learner:
- PoolBasedActiveLearner now handles keyword arguments passed to the classifier's fit() during the update() step.
- Query Strategies:
- New strategy: BALD.
- SubsamplingQueryStrategy now uses the remaining unlabeled pool when more samples are requested than are available.
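The SubsamplingQueryStrategy change amounts to capping the subsample size at the size of the remaining pool. A minimal sketch, assuming uniform random subsampling without replacement; subsample_unlabeled is a hypothetical helper, not small-text's implementation:

```python
import numpy as np


def subsample_unlabeled(indices_unlabeled, subsample_size, rng=None):
    """Draw a random subsample of the unlabeled pool.

    If more samples are requested than remain, the whole remaining pool
    is returned instead of raising an error.
    """
    rng = np.random.default_rng() if rng is None else rng
    indices_unlabeled = np.asarray(indices_unlabeled)
    if subsample_size >= indices_unlabeled.shape[0]:
        return indices_unlabeled.copy()
    return rng.choice(indices_unlabeled, size=subsample_size, replace=False)
```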
- Notebook Examples:
- Revised both existing notebook examples.
- Added a notebook example for active learning with SetFit classifiers.
- Added a notebook example for cold start initialization with SetFit classifiers.
- Documentation:
- A showcase section has been added to the documentation.
- Distances in lightweight_coreset were not correctly projected onto the [0, 1] interval (but ranking was unaffected).
- Coreset implementations now use the distance-based (as opposed to the similarity-based) formulation.
- Model selection raised an error in cases where no model was available for selection (#21).
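For reference, the distance-based formulation of a coreset query is usually the greedy k-center selection: repeatedly pick the unlabeled instance whose distance to its nearest labeled or already selected instance is largest. The sketch below assumes Euclidean distances on precomputed embeddings and n not exceeding the pool size; greedy_coreset is a hypothetical name, and this is not small-text's code.

```python
import numpy as np


def greedy_coreset(embeddings, indices_unlabeled, indices_labeled, n):
    """Greedy k-center selection over Euclidean distances (sketch)."""
    unlabeled = embeddings[indices_unlabeled]
    if len(indices_labeled) > 0:
        labeled = embeddings[indices_labeled]
        # distance of each unlabeled instance to its nearest labeled instance
        min_dist = np.linalg.norm(
            unlabeled[:, None, :] - labeled[None, :, :], axis=-1
        ).min(axis=1)
    else:
        min_dist = np.full(len(indices_unlabeled), np.inf)

    selected = []
    for _ in range(n):
        pos = int(np.argmax(min_dist))  # farthest point from the current centers
        selected.append(pos)
        # the newly selected point becomes a center: update nearest-center distances
        dist_to_new = np.linalg.norm(unlabeled - unlabeled[pos], axis=1)
        min_dist = np.minimum(min_dist, dist_to_new)
    return np.asarray(indices_unlabeled)[selected]
```

Working with distances keeps the "farthest point" criterion explicit; a similarity-based variant would have to invert the ordering at every step.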
- General:
- Small-Text package is now available via conda-forge.
- Imports have been reorganized. You can import all public classes and methods from the top-level package (small_text), e.g.: from small_text import PoolBasedActiveLearner
- Classification:
- All classifiers now support weighting of training samples.
- Early stopping has been reworked, improved, and documented (#18).
- Model selection has been reworked and documented.
- [!] KimCNNClassifier.__init__(): The default value of the (now deprecated) keyword argument early_stopping_acc has been changed from 0.98 to -1 in order to match TransformerBasedClassification.
- [!] Removed weight renormalization after gradient clipping.
- Datasets:
- Omitting the target_labels keyword argument in __init__() now raises a warning.
- Added from_arrays() to SklearnDataset, PytorchTextClassificationDataset, and TransformersDataset to construct datasets more conveniently.
- Query Strategies:
- New multi-label strategy: CategoryVectorInconsistencyAndRanking.
- Stopping Criteria:
- New stopping criteria: ClassificationChange, OverallUncertainty, and MaxIterations.
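ClassificationChange stops once the predictions on a fixed set of instances no longer change between iterations. A minimal sketch, assuming dense predictions (a label vector for single-label, a 0/1 indicator matrix for multi-label) and a hypothetical threshold parameter on the fraction of changed instances; the function name is illustrative and not small-text's API:

```python
import numpy as np


def classification_change_stop(prev_predictions, curr_predictions, threshold=0.0):
    """Return True if at most `threshold` of the instances changed their prediction."""
    prev = np.asarray(prev_predictions)
    curr = np.asarray(curr_predictions)
    if prev.shape != curr.shape:
        raise ValueError("predictions must refer to the same instances")
    if prev.ndim == 1:
        changed = prev != curr
    else:
        # multi-label indicator matrix: an instance counts as changed if any label differs
        changed = np.any(prev != curr, axis=1)
    return bool(changed.mean() <= threshold)
```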
- small_text.integrations.pytorch.utils.misc.default_tensor_type() is deprecated without replacement (#2).
- TransformerBasedClassification and KimCNNClassifier: The keyword arguments for early stopping (early_stopping / early_stopping_no_improvement, early_stopping_acc) that are passed to __init__() are now deprecated. Use the early_stopping keyword argument in the fit() method instead (#18).
- Classification:
- KimCNNClassifier.fit() and TransformerBasedClassification.fit() now correctly process the scheduler keyword argument (#16).
- Removed the strict check that every target label has to occur in the training data. (This is intended for multi-label settings with many labels; apart from that it is still recommended to make sure that all labels occur.)
Minor bug fix release.
Links to notebooks and code examples will now always point to the latest release instead of the latest main branch.
First stable release.
- Datasets:
- SklearnDataset now checks if the dimensions of the features and labels match.
- Query Strategies:
- ExpectedGradientLengthMaxWord: Cleaned up code and added checks to detect invalid configurations.
- Documentation:
- The documentation is now available in full width.
- Repository:
- Versions of this repository can now be referenced using the respective Zenodo DOI.
- General:
- We now have a concept for optional dependencies, which allows components to rely on soft dependencies, i.e., Python dependencies that can be installed on demand (and only when certain functionality is needed).
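One common way to realize soft dependencies is to check for a module only when the functionality that needs it is actually used. The helper require_optional below is a hypothetical sketch of that idea built on the standard importlib module; it does not reproduce small-text's own mechanism.

```python
import importlib
import importlib.util


def require_optional(module_name, install_hint=None):
    """Import an optional (soft) dependency on demand.

    Raises an informative ImportError if the module is not installed.
    """
    if importlib.util.find_spec(module_name) is None:
        hint = install_hint or module_name
        raise ImportError(
            f"Optional dependency '{module_name}' is required for this functionality. "
            f"Install it, e.g. via `pip install {hint}`."
        )
    return importlib.import_module(module_name)
```

Calling the helper inside the component that needs the dependency, rather than at module import time, keeps the base installation free of that package.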
- Datasets:
- The Dataset interface now has a clone() method that creates an identical copy of the respective dataset.
- Query Strategies:
- New strategies: DiscriminativeActiveLearning and SEALS.
- Datasets:
- Separated the previous DatasetView implementation into interface (DatasetView) and implementation (SklearnDatasetView).
- Added clone() method which creates an identical copy of the dataset.
- Query Strategies:
- EmbeddingBasedQueryStrategy now only embeds instances that are either in the labeled or in the unlabeled pool (and no longer the entire dataset).
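Restricting the embedding step to the labeled and unlabeled pools can be sketched as follows. embed_pool_only and embed_fn are hypothetical placeholders, and features is assumed to be a NumPy array indexable by integer indices; instances that belong to neither pool (for example, ignored ones) are skipped.

```python
import numpy as np


def embed_pool_only(features, indices_unlabeled, indices_labeled, embed_fn):
    """Embed only instances in the labeled or unlabeled pool.

    Returns the sorted indices and their embeddings; row i of the
    embeddings belongs to indices[i].
    """
    indices = np.union1d(indices_labeled, indices_unlabeled)
    return indices, embed_fn(features[indices])
```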
- Code examples:
- Code structure was unified.
- The number of iterations can now be passed via a CLI argument.
- small_text.integrations.pytorch.utils.data:
- Method get_class_weights() now scales the resulting multi-class weights so that the smallest class weight is equal to 1.0.
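The rescaling described for get_class_weights() can be illustrated with a small sketch. The inverse-frequency weighting and the zero weight for classes absent from y are assumptions made for illustration, and get_class_weights_sketch is a hypothetical name; only the final division by the smallest weight reflects the changelog entry.

```python
import numpy as np


def get_class_weights_sketch(y, num_classes):
    """Inverse-frequency class weights, rescaled so that the smallest weight is 1.0."""
    counts = np.bincount(y, minlength=num_classes).astype(float)
    weights = np.zeros(num_classes)
    present = counts > 0
    weights[present] = counts[present].sum() / counts[present]
    # rescale: the most frequent class ends up with weight 1.0
    weights[present] /= weights[present].min()
    return weights
```

For y = [0, 0, 0, 1] this yields weights [1.0, 3.0]: the majority class gets 1.0 and the minority class is weighted by the frequency ratio.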
- New query strategy: ContrastiveActiveLearning.
- Added Reproducibility Notes.
- Cleaned up and unified argument naming: The naming of variables related to datasets and indices has been improved and unified. The naming of datasets had been inconsistent, and the previous x_ notation for indices was a relic of earlier versions of this library and no longer reflected the underlying object.
- PoolBasedActiveLearner:
- attribute x_indices_labeled was renamed to indices_labeled
- attribute x_indices_ignored was renamed to indices_ignored
- attribute queried_indices was renamed to indices_queried
- attribute _x_index_to_position was renamed to _index_to_position
- arguments x_indices_initial, x_indices_ignored, and x_indices_validation were renamed to indices_initial, indices_ignored, and indices_validation. This affects most methods of the PoolBasedActiveLearner.
- QueryStrategy
- old: query(self, clf, x, x_indices_unlabeled, x_indices_labeled, y, n=10)
- new: query(self, clf, dataset, indices_unlabeled, indices_labeled, y, n=10)
- StoppingCriterion
- old: stop(self, active_learner=None, predictions=None, proba=None, x_indices_stopping=None)
- new: stop(self, active_learner=None, predictions=None, proba=None, indices_stopping=None)
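A custom strategy written against the renamed query() signature could look like the following stand-alone sketch. RandomQueryStrategySketch is hypothetical and deliberately does not subclass small-text's QueryStrategy, so that it runs without the library installed; it only mirrors the new argument names.

```python
import numpy as np


class RandomQueryStrategySketch:
    """Random sampling using the renamed query() arguments (not a small-text class)."""

    def __init__(self, seed=0):
        self.rng = np.random.default_rng(seed)

    def query(self, clf, dataset, indices_unlabeled, indices_labeled, y, n=10):
        # clf, dataset, indices_labeled and y are unused by random sampling,
        # but are kept so the signature matches the new naming
        return self.rng.choice(indices_unlabeled, size=n, replace=False)
```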
- Renamed the environment variable that sets the small-text temp folder from ALL_TMP to SMALL_TEXT_TEMP.
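A sketch of how a temp folder could be resolved from the renamed variable. The function resolve_temp_folder is hypothetical, and the fallback to the system temp directory when SMALL_TEXT_TEMP is unset is an assumption, not documented behavior.

```python
import os
import tempfile


def resolve_temp_folder():
    """Return SMALL_TEXT_TEMP if set, else the system temp dir (assumed fallback)."""
    return os.environ.get("SMALL_TEXT_TEMP", tempfile.gettempdir())
```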
Bugfix release.
- Fix links to the documentation in README.md and notebooks.
First beta release with multi-label functionality and stopping criteria.
- Added a changelog.
- All provided classifiers are now capable of multi-label classification.
- Documentation has been overhauled considerably.
- PoolBasedActiveLearner: Renamed the incremental_training kwarg to reuse_model.
- SklearnClassifier: Changed __init__(clf) to __init__(model, num_classes, multi_label=False).
- SklearnClassifierFactory: Changed __init__(clf_template, kwargs={}) to __init__(base_estimator, num_classes, kwargs={}).
- Refactored KimCNNClassifier and TransformerBasedClassification.
- Removed the device kwarg from PytorchDataset.__init__(), PytorchTextClassificationDataset.__init__(), and TransformersDataset.__init__().