WordDumb: Wrong Translation (Issue #134)
You could click the "Other meanings" button to select the correct definition. This plugin only matches each word with one definition; you could also change the default meaning in the plugin's "Customize Kindle Word Wise" window.
Maybe I could train a machine learning model to match each word to its gloss, and also match person names or locations to Wikipedia summaries or Wikidata items.
It is a super interesting question. I randomly stumbled upon this problem for my thesis and tried using llama.cpp with an instruction-fine-tuned language model such as Wizard-Vicuna-7B. I simply gave it the task in a format along these lines:
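For illustration, a prompt shaped roughly like this (the exact wording and the numbered-gloss format are assumptions, not the prompt from the thesis):

```python
# Illustrative prompt template for asking an instruction-tuned LLM to pick
# a word sense. The wording is a sketch, not the benchmarked prompt.
PROMPT_TEMPLATE = """Sentence: {sentence}

Which of the following definitions of "{word}" fits the sentence?
{numbered_glosses}

Answer with the number only."""

glosses = [
    "a tool or machine with a rotating cutting tip used for making holes",
    "a type of strong cotton cloth",
]
numbered = "\n".join(f"{i + 1}. {g}" for i, g in enumerate(glosses))
prompt = PROMPT_TEMPLATE.format(
    sentence="Uncle Vernon's firm made drills.",
    word="drills",
    numbered_glosses=numbered,
)
print(prompt)
```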
I benchmarked it for Russian (the results are in a work-in-progress graphic). Disclaimer: I benchmarked the association of words with etymologies, not with senses, and the test data has a few mistakes, so the real accuracy is maybe 5 percent higher. In English the results will surely be better. The runtime will probably suck, though; if the users are very patient it might be possible. Of course, training our own model, maybe on synthetic GPT-3.5/GPT-4 data, also looks pretty promising. But no idea. This might also be interesting, though apparently it only works for English (I didn't test it): https://github.com/alvations/pywsd
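For the curious, pywsd's Lesk variants take only a few lines to try. A minimal sketch, assuming the NLTK WordNet data has been downloaded (English only, since it is backed by WordNet):

```python
# pip install pywsd nltk
# python -m nltk.downloader wordnet omw-1.4 punkt averaged_perceptron_tagger
from pywsd.lesk import simple_lesk

# simple_lesk picks the WordNet synset whose gloss overlaps most
# with the words of the context sentence.
sense = simple_lesk("Use the drill to make a hole in the wall", "drill")
print(sense, "-", sense.definition())
# Expected: something like drill.n.01, "a tool with a sharp point
# and cutting edges for making holes in hard materials"
```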
I think I'll need to take a deep learning course first... Using an existing model is easier to start with, but the performance could be bad. Training a model might be unavoidable because the model needs to output customized data (a Kindle Word Wise database id or a Wiktionary gloss). For that same reason, pywsd might not be suitable, or maybe I could replace the default gloss data it uses. The ultimate goal is to find (or build) a model or library that could take a chunk of text and magically mark the words in it with the correct gloss and Wikipedia summary; the output should also include each token's offset, as sketched below.
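A rough sketch of that desired output shape (all names here are hypothetical, not an existing API):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Annotation:
    start: int  # character offset of the token in the input text
    end: int
    lemma: str
    gloss_id: Optional[int]  # e.g. Kindle Word Wise id or Wiktionary sense id
    wikipedia_summary: Optional[str]  # for person or location names

def annotate(text: str) -> list[Annotation]:
    """The 'magic' function this issue is looking for."""
    raise NotImplementedError
```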
I think large language models such as Llama would work out of the box but be extremely slow. For WordDumb they would only be viable (and probably still a bit slow) if the user has a GPU with at least 8 GB of VRAM, which almost nobody has. Compared to English, Llama unfortunately has pretty mediocre multilingual skills. pywsd uses old-school algorithms; if I understood it correctly, they could be applied to the Wiktionary data (a toy version of the idea is sketched below) and might not even be too slow, but the accuracy will likely be garbage. (I don't know a lot about this, though.)
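A toy illustration of the Lesk-style overlap idea applied to a list of Wiktionary glosses (the gloss strings are placeholders; real Lesk variants add stemming, stopword removal, and expanded glosses):

```python
def lesk_pick(context: str, glosses: list[str]) -> int:
    """Return the index of the gloss sharing the most words with the context."""
    ctx = set(context.lower().split())
    overlaps = [len(ctx & set(g.lower().split())) for g in glosses]
    return max(range(len(glosses)), key=overlaps.__getitem__)

glosses = [
    "a tool or machine with a rotating cutting tip used for making holes",
    "a strong durable cotton fabric with a diagonal weave",
]
print(lesk_pick("he used the power tool to bore holes in the wall", glosses))
# prints 0: "tool", "used", and "holes" overlap with the first gloss
```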
True. I tried asking GPT-4 to add a short translation in [brackets] after each word of a specific text, and it did what I asked. It was still a bit buggy, though, and will probably hallucinate a lot and give wrong answers for more exotic languages or rarer words. It might only be a matter of time before something like this gets more viable. 👍
Using a large language model for WSD is maybe a little overkill, IMO. I found this EWISER library: https://github.com/SapienzaNLP/ewiser, and they also have a spaCy plugin. Their paper is more recent, and I'll see how I can integrate their work; looks like I have a lot to learn... The EWISER authors' university also created babelfy.org, which has almost all the features I need, but it has an API limit (1,000 requests per day).
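Going by the EWISER README, the spaCy plugin is wired up roughly like this (the checkpoint filename is a placeholder for one of their downloadable checkpoints, and the extension attribute names are as documented there, so treat them as assumptions):

```python
import spacy
from ewiser.spacy.disambiguate import Disambiguator

# "ewiser.checkpoint.pt" stands in for a downloaded EWISER checkpoint.
wsd = Disambiguator("ewiser.checkpoint.pt", lang="en").eval()
wsd = wsd.to("cpu")  # or "cuda" with a GPU
nlp = spacy.load("en_core_web_sm", disable=["parser", "ner"])
wsd.enable(nlp, "wsd")  # adds the disambiguator as a pipeline component

doc = nlp("Use the drill to make a hole in the wall.")
for token in doc:
    if token._.offset:  # WordNet offset set by the plugin
        print(token.text, token._.offset, token._.synset.definition())
```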
I found the state-of-the-art WSD models here: https://paperswithcode.com/sota/word-sense-disambiguation-on-supervised, and the best model is ConSeC: https://paperswithcode.com/paper/consec-word-sense-disambiguation-as. But I've never trained a model before and don't have a GPU, so this will take some time...
I tried the LLaMA-3-Instruct-8B llamafile. I think the accuracy is good, but performance is ridiculously slow on CPU; I killed the process after waiting four hours. Maybe it's more usable with a powerful GPU? Code is pushed to the wsd branch: https://github.com/xxyzz/WordDumb/tree/wsd
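Not the branch code itself, but a minimal sketch of querying a locally running llamafile, which serves an OpenAI-compatible API on http://localhost:8080 by default (the prompt and the assumption that the server is already running are placeholders):

```python
import json
from urllib.request import Request, urlopen

# Assumes a llamafile (e.g. Meta-Llama-3-8B-Instruct) is already running
# in server mode on the default port.
payload = {
    "messages": [{
        "role": "user",
        "content": 'In "Use the drill to make a hole", does "drill" mean '
                   "(1) a boring tool or (2) a cotton fabric? Answer 1 or 2.",
    }],
    "temperature": 0,
}
req = Request(
    "http://localhost:8080/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
print(json.load(urlopen(req))["choices"][0]["message"]["content"])
```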
Hi guys, I'm trying to use WordDumb to read Harry Potter, but I get a lot of wrong meanings for words. For example, it explains "drills" as "a type of strong cotton cloth" instead of "a hand tool, power tool, or machine with a rotating cutting tip used for making holes". It sometimes chooses a very rare and useless meaning of a word.
How can I adjust the translation settings? Is it related to the dictionary on the Kindle?