Embeddings for Ideographic Description Sequences (IDS)
A blog post can be found here
- antlr4: used to generate the ANTLR4 parser from ids_embed/parse/ids.g4
- pytorch
- For other dependencies, refer to requirements.txt
- Make sure assets/kanjivg.eids exists.
- Run script/prepare.sh. It mainly uses ANTLR to generate the parsing code, by calling this command:
  antlr4 -Dlanguage=Python3 -visitor -o ./antlr ids.g4
- Train the embedding model to find similar words:
./main.py --runner ids_embedding_runner --config ids_embedding.yml
Note that training is optional. The trained model file is included in the repo because it is small (a small model really is enough here).
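To make "finding similar words" concrete, here is a minimal sketch of a nearest-neighbour lookup over embedding vectors. It is not the repo's code: the similarity measure (cosine), the function names `cosine` and `most_similar`, and the toy two-dimensional vectors are all assumptions chosen for illustration, and it is written in plain Python so it runs without PyTorch.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def most_similar(
    query: Sequence[float],
    table: Dict[str, Sequence[float]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the k entries of `table` closest to `query`, best first."""
    scored = [(name, cosine(query, vec)) for name, vec in table.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical embeddings; a trained model would supply real vectors.
toy_table = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
neighbours = most_similar([1.0, 0.1], toy_table, k=2)
```

With a real model, the table would map each character or IDS to the vector the model produces for it, and the query would be the vector for the sequence under test.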
To use your own IDS, edit config/ids_embedding.yml and replace line 7, test_ids: "⿱⿰耳口之", with any sequence you like.
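Before putting a new sequence into the config, it can help to check that it is well formed. The sketch below is a small hand-written recursive-descent parser for the prefix notation that IDS uses. It is an illustration only, not the ANTLR grammar in ids.g4: the names `Node`, `parse_ids`, and `_parse` are hypothetical, and it covers only the twelve original description characters U+2FF0..U+2FFB, not later Unicode additions.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

# Ideographic Description Characters U+2FF0..U+2FFB.
# ⿲ and ⿳ take three operands; the other ten take two.
TERNARY = set("⿲⿳")
BINARY = set("⿰⿱⿴⿵⿶⿷⿸⿹⿺⿻")


@dataclass
class Node:
    op: str                 # the description character
    children: List["Tree"]  # its operands, in reading order


Tree = Union[str, Node]  # a leaf is a single component character


def _parse(ids: str, pos: int) -> Tuple[Tree, int]:
    """Parse one subtree starting at `pos`; return it and the next position."""
    if pos >= len(ids):
        raise ValueError("unexpected end of sequence")
    ch = ids[pos]
    pos += 1
    if ch in BINARY or ch in TERNARY:
        arity = 3 if ch in TERNARY else 2
        children: List[Tree] = []
        for _ in range(arity):
            child, pos = _parse(ids, pos)
            children.append(child)
        return Node(ch, children), pos
    return ch, pos  # any other character is treated as a leaf


def parse_ids(ids: str) -> Tree:
    """Parse a whole IDS string, rejecting leftover characters."""
    tree, pos = _parse(ids, 0)
    if pos != len(ids):
        raise ValueError(f"trailing characters after position {pos}")
    return tree


tree = parse_ids("⿱⿰耳口之")
```

For the default test_ids value, the result is a top-bottom node (⿱) whose first operand is a left-right node (⿰) over 耳 and 口, and whose second operand is 之. A sequence with too few operands, such as "⿱耳", raises ValueError.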