This repository has been archived by the owner on Nov 28, 2024. It is now read-only.

Owen1u/Tboard

Tboard

An LLM project for Android keyboards.

Dirs

  • train: training and data processing
  • parse: parsing of raw files
  • data: datasets for training and testing
  • tokenizer: tokenizer models for the LLM (e.g. Llama2 or custom)
  • script: shell (.sh) scripts
  • out: output models and log files
  • inference: C code for inference
  • evaluate: Python code for evaluation

Usage

Download

Download your own TXT file manually and place it under ./data/.

Split data into Train and Test

sh script/dataset_split.sh
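
For readers who want to see the idea behind the split before running the script, here is a minimal Python sketch. It does not reproduce script/dataset_split.sh: the function name `split_lines`, the 10% test ratio, and the fixed seed are assumptions made for illustration only.

```python
import random


def split_lines(lines, test_ratio=0.1, seed=0):
    """Shuffle lines deterministically and split them into (train, test).

    Hypothetical helper: the ratio and seed are illustrative defaults,
    not values taken from script/dataset_split.sh.
    """
    if not 0.0 < test_ratio < 1.0:
        raise ValueError("test_ratio must be strictly between 0 and 1")
    shuffled = list(lines)
    # A private Random instance keeps the split reproducible without
    # touching the global random state.
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


# Example: split 100 dummy queries into 90 train and 10 test lines.
queries = [f"query {i}" for i in range(100)]
train, test = split_lines(queries, test_ratio=0.1, seed=42)
```

Fixing the seed keeps the test set stable across runs, which makes evaluation numbers comparable between training runs.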

Train a custom vocabulary

Caution

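Make sure that the tokenizer encoding is identical in training and inference at all times, especially when customizing for low-resource languages.

Training uses the Python sentencepiece library; inference is implemented in C.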

sh script/train_vocab.sh
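
One way to guard against the mismatch described in the caution is to encode the same text on both sides and compare the token IDs. The sketch below assumes each side can dump its IDs as whitespace-separated integers; that dump format and the helper names `parse_ids` and `first_mismatch` are hypothetical and not part of this repository.

```python
from itertools import zip_longest


def parse_ids(dump):
    """Parse a whitespace-separated dump of token IDs into a list of ints."""
    return [int(tok) for tok in dump.split()]


def first_mismatch(train_ids, infer_ids):
    """Return the index of the first differing token ID, or None if equal.

    A length difference counts as a mismatch at the first missing position.
    """
    for i, (a, b) in enumerate(zip_longest(train_ids, infer_ids)):
        if a != b:
            return i
    return None


# Example with made-up IDs: the sequences diverge at position 2.
python_side = parse_ids("1 523 88 9")
c_side = parse_ids("1 523 87 12 9")
mismatch_at = first_mismatch(python_side, c_side)
```

Running such a check on a handful of sentences per target language, after every vocabulary change, catches encoding drift before it reaches the keyboard.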

Pretokenize queries in dataset

sh script/pretokenize.sh

Train the LLM

Important

Pay attention to your device (CPU/GPU/MPS). Make sure that your GPU supports torch.compile and that your PyTorch version is > 2.0.

sh script/train.sh
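
A quick way to check the version requirement before launching training is to parse the PyTorch version string. This is a sketch, not part of script/train.sh: the helper names are hypothetical, and the (2, 0) threshold is the release in which torch.compile first appeared, so raise it if your setup needs a newer release.

```python
import re


def version_tuple(version):
    """Extract (major, minor) from a version string such as '2.1.0+cu118'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def can_use_torch_compile(version, minimum=(2, 0)):
    """Return True if the given PyTorch version is at least `minimum`.

    `minimum` defaults to (2, 0), the first release with torch.compile;
    it is an assumption and can be raised.
    """
    return version_tuple(version) >= minimum


# Typical use (requires PyTorch to be installed):
#   import torch
#   print(can_use_torch_compile(torch.__version__))
```

This only checks the version; whether torch.compile actually works also depends on the GPU and backend in use.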

Evaluation

Important

Set the top-K value according to your product requirements.

cd evaluate
python eval.py
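
To make the role of K concrete, here is a minimal sketch of top-K accuracy: a prediction counts as correct when the target token is among the K highest-scoring candidates. It is not the code in evaluate/eval.py; the function names and the toy scores are assumptions for illustration.

```python
def top_k_indices(scores, k):
    """Return the indices of the k largest scores, highest first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


def top_k_accuracy(score_rows, targets, k):
    """Fraction of rows whose target index is among the top-k scores."""
    if k <= 0:
        raise ValueError("k must be positive")
    if not targets or len(score_rows) != len(targets):
        raise ValueError("score_rows and targets must be non-empty and equal in length")
    hits = sum(
        1
        for scores, target in zip(score_rows, targets)
        if target in top_k_indices(scores, k)
    )
    return hits / len(targets)


# Toy example: three predictions over a three-token vocabulary.
rows = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
gold = [2, 0, 0]
acc_at_1 = top_k_accuracy(rows, gold, k=1)
acc_at_2 = top_k_accuracy(rows, gold, k=2)
```

For a keyboard, K would usually match the number of suggestions shown to the user, so a larger candidate bar makes the metric more forgiving.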

Inference

cd inference
make run
cd ..
sh script/inference.sh
