Commit

*** next steps
gbenson committed May 16, 2024
1 parent e4d4c12 commit 87543c0
Showing 2 changed files with 6 additions and 5 deletions.
7 changes: 4 additions & 3 deletions README.md
@@ -9,8 +9,7 @@
 
 # DOM tokenizers
 
-DOM-aware tokenizers for [🤗 Hugging Face](https://huggingface.co/)
-language models.
+DOM-aware tokenizers for Hugging Face language models.
 
 ## Installation
 
@@ -30,7 +29,9 @@ pip install --upgrade pip
 pip install -e .[dev,train]
 ```
 
-## Train a tokenizer
+## Load a pretrained tokenizer from the Hub
+
+## Train your own
 ```sh
 train-tokenizer gbenson/interesting-dom-snapshots -n 10000
 ```
4 changes: 2 additions & 2 deletions pyproject.toml
@@ -2,9 +2,9 @@
 name = "dom-tokenizers"
 version = "0.0.2"
 authors = [{ name = "Gary Benson" }]
-description = "DOM-aware tokenizers for Hugging Face language models"
+description = "DOM-aware tokenizers for 🤗 Hugging Face language models"
 readme = "README.md"
-license = { text = "Apache Software License (Apache-2.0)" }
+license = { text = "Apache-2.0" }
 requires-python = ">=3.10" # match..case
 classifiers = [
 "Development Status :: 4 - Beta",
