This repository has been archived by the owner on Dec 18, 2020. It is now read-only.

Releases: stickeritis/sticker2

Mixed-precision training

01 Oct 13:42

The most important new feature of this release is mixed-precision training 🎉. This speeds up training and lowers memory use on GPUs with Tensor Cores. Mixed-precision training can be enabled using the --mixed-precision option of sticker2 finetune and sticker2 distill.
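For example, a hypothetical finetuning invocation (only the sticker2 finetune subcommand and the --mixed-precision flag come from this release; the remaining arguments are placeholders, see sticker2 finetune --help for the actual interface):

    # Placeholder paths; only --mixed-precision is taken from this release.
    sticker2 finetune --mixed-precision model.conf train.conllu validation.conllu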

Other notable changes:

  • Use fast AVX2 kernels on AMD Zen CPUs, without setting any special environment variables.
  • Update the sentencepiece crate dependency to 0.4. This version compiles the sentencepiece library statically if it is not available, removing the dependency on an external sentencepiece build.
  • The TensorBoard summary writer support that was added in 0.4.2 is now feature-gated (tensorboard). This makes it possible to compile sticker2 without TensorBoard support for quicker compiles and smaller binaries.
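As a sketch of how the feature gate could be used when building from source (assuming a standard Cargo feature setup; whether tensorboard is part of the default feature set is not stated here):

    # Hypothetical builds; check Cargo.toml for the actual default features.
    cargo build --release --features tensorboard    # with TensorBoard support
    cargo build --release --no-default-features     # without, for smaller binaries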

TensorBoard summaries

09 Sep 11:55

The most important new feature in this release is support for writing TensorBoard summaries in sticker2 annotate and sticker2 distill. The --log-prefix option is added to both subcommands; it enables writing TensorBoard summaries to the given log prefix. Losses and accuracies are logged for each layer, as well as the average loss.
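A minimal sketch of enabling summary logging (only the --log-prefix option comes from this release; the remaining arguments are placeholders):

    # Placeholder arguments; see sticker2 distill --help for the real interface.
    sticker2 distill --log-prefix runs/distill-1 teacher.conf student.conf train.conllu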

This release also contains a fix for a bug where variables in finetuned models had a spurious additional encoder prefix.

Support for ALBERT models and update to PyTorch 1.6

13 Aug 07:30
  • Add support for the ALBERT model. This provides two additional features over BERT:
    • The embedding size can differ from the hidden size. A linear transformation is applied to the embeddings to project them to the hidden size. The embedding size is set through the embedding_size option of the model configuration.
    • Multiple layers can share the same weights. The number of hidden layers is specified through num_hidden_layers as before. The additional num_hidden_groups option determines the number of weight groups. E.g., if num_hidden_layers is set to 12 and num_hidden_groups to 3, then each group of 4 consecutive layers shares the same weights.
    • The ALBERT model can be used by setting pretrain_type = "albert" in the sticker2 configuration file (see the configuration sketch after this list).
  • Tokenizer types are separated from model types. Before this change, picking a particular model would select a tokenizer. Now the tokenizer type can be selected separately from the model. The tokenizer is selected through the tokenizer option, which replaces vocab. The possible values are:
    • ALBERT: tokenizer = { albert = { vocab = "vocab.model" } }
    • BERT: tokenizer = { bert = { vocab = "vocab.txt" } }
    • XLM-R: tokenizer = { xlm_roberta = { vocab = "vocab.model" } }
  • Update to sticker-transformers 0.8, tch 0.2, and libtorch 1.6.0.
  • Update to the sentencepiece crate 0.3. This version is compatible with the sentencepiece library 0.1.9x.
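Putting the new options together, a hypothetical configuration fragment (the option names and the tokenizer syntax come from this release; the concrete values and the placement of the keys within the configuration file are assumptions):

    # Model configuration; key placement and values are assumptions.
    pretrain_type = "albert"
    embedding_size = 128     # projected to the hidden size by a linear layer
    num_hidden_layers = 12
    num_hidden_groups = 3    # each group of 4 consecutive layers shares weights

    # Tokenizer selection, replacing the former vocab option:
    tokenizer = { albert = { vocab = "vocab.model" } }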

Switch to CoNLL-U format

19 May 12:01

The most visible change is that from version 0.3.0 onwards, sticker2 uses the CoNLL-U format. Besides that, there were many other improvements:

  • Switch from CoNLL-X to CoNLL-U as the file format.
  • Much-improved error messages.
  • Add TdzLemmaEncoder. This encoder uses the edit tree encoder, but performs the necessary pre- and postprocessing to produce TüBa-D/Z-style lemmas.
  • Add an option to ℓ2-normalize sinusoidal embeddings and make it the default. This improves model convergence (suggested by @twuebi); see the sketch after this list.
  • Support encoding of the full features column as a string (rather than individual attributes/values).
  • Permit setting a default value for features. This is useful for using features that are not annotated on every token.
  • Add the filter-len subcommand. This filters a corpus by sentence length in word pieces or sentence pieces.
  • Improvements to the serialization of encoders: remove phantom data and avoid storing the feature <-> number bijection twice.
  • Update to libtorch 1.5.0.
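As a one-line sketch of the ℓ2-normalization mentioned in the list above, each sinusoidal embedding vector p is rescaled to unit Euclidean length before use:

    p ← p / ‖p‖₂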

Models trained with versions prior to 0.3.0 are not compatible with this version. At the moment, we only guarantee model compatibility within each 0.y release series.

Support for XLM-RoBERTa & distillation improvements

20 Feb 19:01
  • Add support for finetuning XLM-RoBERTa models.
  • Support for distillation with separate teacher/student vocabularies.
  • Make it possible to set the number of PyTorch threads in sticker2 annotate and sticker2 server.
  • Remove word pieces vectorizer (this is now handled by the wordpieces crate).
  • sticker2 0.1.0 models are not fully compatible with 0.2.0, but can be patched to work.