rust-trie

Python Package With A Trie Implementation in Rust

Installation

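Install the package directly from GitHub with pip. Quote the URL so that the shell does not interpret the & character:

pip install -e "git+https://github.com/jamaliki/rust-trie.git#egg=rust-trie&subdirectory=rust_trie"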

Use case

This package is mainly meant to help with tokenization. A simple use case is the following Python code, from the protein-lm-scaling project:

from rust_trie import Trie
from typing import List, Optional


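class Tokenizer:
    def __init__(self, tokens: List[str], unk_token_id: Optional[int] = None):
        # Copy the vocabulary so the caller's list is not mutated below.
        self.ids_to_tokens = list(tokens)
        self.trie = Trie(unk_token_id)
        for token in tokens:
            self.trie.add(token)
        # With no explicit unk id, append "<unk>" to the end of the vocabulary.
        if unk_token_id is None:
            self.ids_to_tokens += ["<unk>"]

    def __call__(self, sequence: str) -> List[int]:
        # Delegate tokenization to the Rust-backed trie.
        return self.trie.tokenize(sequence)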
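
This README does not describe how tokenize matches tokens. As a rough mental model only, the pure-Python sketch below shows greedy longest-match tokenization over a trie, a common approach for this kind of tokenizer. PyTrie is a hypothetical stand-in, not the Rust implementation, and it rests on two assumptions: ids are assigned in insertion order starting at 0, and a missing unk id falls back to the next free id (which would line up with "<unk>" being appended to the end of the vocabulary above).

```python
from typing import Dict, List, Optional


class PyTrie:
    """Hypothetical pure-Python stand-in; not the rust_trie implementation."""

    def __init__(self, unk_token_id: Optional[int] = None):
        self.root: Dict = {}
        self.size = 0  # number of tokens added so far
        self.unk_token_id = unk_token_id

    def add(self, token: str) -> None:
        # Assumption: ids are assigned in insertion order, starting at 0.
        node = self.root
        for ch in token:
            node = node.setdefault(ch, {})
        node[None] = self.size  # the None key marks end-of-token and stores its id
        self.size += 1

    def tokenize(self, sequence: str) -> List[int]:
        # Assumption: an unset unk id falls back to the next free id.
        unk = self.size if self.unk_token_id is None else self.unk_token_id
        ids: List[int] = []
        i = 0
        while i < len(sequence):
            node, best_id, best_end = self.root, None, i
            # Walk the trie as far as the input allows, remembering the longest match.
            for j in range(i, len(sequence)):
                node = node.get(sequence[j])
                if node is None:
                    break
                if None in node:
                    best_id, best_end = node[None], j + 1
            if best_id is None:
                ids.append(unk)  # no token starts here: emit unk, skip one character
                i += 1
            else:
                ids.append(best_id)
                i = best_end
        return ids


trie = PyTrie()
for token in ["A", "C", "<mask>"]:
    trie.add(token)
print(trie.tokenize("AC<mask>X"))  # [0, 1, 2, 3]; "X" is unknown and maps to id 3
```

Because matching is greedy, a vocabulary containing both "A" and "AB" tokenizes "ABA" as "AB" followed by "A" in this sketch. Whether rust_trie resolves overlapping tokens the same way should be checked against its source.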
