Warning: sticker is succeeded by SyntaxDot, which supports many new features:
- Multi-task learning.
- Pretrained transformer models, such as BERT and XLM-R.
- Biaffine parsing in addition to parsing as sequence labeling.
- Lemmatization.
sticker is a sequence labeler that uses either recurrent neural networks, transformers, or dilated convolution networks. In principle, it can be used to perform any sequence labeling task, but so far the focus has been on:
- Part-of-speech tagging
- Topological field tagging
- Dependency parsing
- Named entity recognition
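Dependency parsing fits this list because each token can be given a label from which its head is recoverable. The sketch below illustrates one such scheme, the relative part-of-speech encoding described by Strzyz et al. (2019), cited in this README: a label such as +1/VERB means "the head is the first VERB to the right". It is a minimal illustration, not sticker's own implementation; the function names encode_relative_pos and decode_relative_pos, the +k/TAG label format and the ROOT label are hypothetical.

```python
from typing import List

# Hypothetical label for tokens attached to the artificial root.
ROOT = "ROOT"


def encode_relative_pos(pos: List[str], heads: List[int]) -> List[str]:
    """Encode dependency heads as relative PoS labels (illustrative sketch).

    pos[i] is the PoS tag of token i+1; heads[i] is the 1-based head of
    token i+1, where 0 denotes the artificial root.
    """
    labels = []
    for i, head in enumerate(heads, start=1):
        if head == 0:
            labels.append(ROOT)
            continue
        head_pos = pos[head - 1]
        if head > i:
            # Head is to the right: count tokens tagged head_pos in (i, head].
            k = sum(1 for j in range(i + 1, head + 1) if pos[j - 1] == head_pos)
            labels.append(f"+{k}/{head_pos}")
        else:
            # Head is to the left: count tokens tagged head_pos in [head, i).
            k = sum(1 for j in range(head, i) if pos[j - 1] == head_pos)
            labels.append(f"-{k}/{head_pos}")
    return labels


def decode_relative_pos(pos: List[str], labels: List[str]) -> List[int]:
    """Recover 1-based heads from relative PoS labels (illustrative sketch)."""
    heads = []
    for i, label in enumerate(labels, start=1):
        if label == ROOT:
            heads.append(0)
            continue
        offset, head_pos = label.split("/", 1)
        k = int(offset)  # int("+2") == 2, int("-1") == -1
        step = 1 if k > 0 else -1
        remaining = abs(k)
        j = i
        # Walk left or right until the k-th token with the head's PoS tag.
        while remaining > 0:
            j += step
            if j < 1 or j > len(pos):
                raise ValueError(f"cannot decode label {label!r} for token {i}")
            if pos[j - 1] == head_pos:
                remaining -= 1
        heads.append(j)
    return heads


if __name__ == "__main__":
    pos = ["DET", "NOUN", "VERB", "ADV"]  # The dog barks loudly
    heads = [2, 3, 0, 3]
    labels = encode_relative_pos(pos, heads)
    print(labels)  # ['+1/NOUN', '+1/VERB', 'ROOT', '-1/VERB']
    assert decode_relative_pos(pos, labels) == heads
```

Predicted labels from a real tagger can be inconsistent, for example pointing past the end of the sentence or producing a cycle. This sketch simply raises a ValueError in the first case and does not check that the decoded heads form a tree; a full parser needs a repair strategy for both.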
sticker's main features are:
- Input representations:
  - finalfusion embeddings with subword units
  - Bidirectional byte LSTMs
- Hidden representations:
  - Bidirectional recurrent neural networks (LSTM or GRU)
  - Transformers
  - Dilated convolutions
- Classification layers:
  - Softmax (best-N)
  - CRF
- Deployment:
  - Standalone binary that links against libtensorflow
  - Very liberal license
  - Docker containers with models
sticker is almost production-ready and we are preparing for release 1.0.0. Graphs and models created with the current version must work with sticker 1.x.y. There may still be breaking API or configuration file changes until 1.0.0 is released.
sticker uses techniques from or was inspired by the following papers:
- Finding Function in Form: Compositional Character Models for Open Vocabulary Word Representation. Wang Ling, Chris Dyer, Alan W Black, Isabel Trancoso, Ramón Fermandez, Silvio Amir, Luís Marujo, Tiago Luís, 2015, Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing
- Transition-based dependency parsing with topological fields. Daniël de Kok, Erhard Hinrichs, 2016, Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics
- Viable Dependency Parsing as Sequence Labeling. Michalina Strzyz, David Vilares, Carlos Gómez-Rodríguez, 2019, Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
You can report bugs and feature requests in the sticker issue tracker.
sticker is licensed under the Blue Oak Model License version 1.0.0. The TensorFlow protocol buffer definitions in tf-proto are licensed under the Apache License version 2.0. The list of contributors is also available.
- sticker is developed by Daniël de Kok & Tobias Pütz.
- The Python precursor to sticker was developed by Erik Schill.
- Sebastian Pütz and Patricia Fischer reviewed a lot of code across the sticker projects.