mecab-dict-gen crashes after a long time #44

Open
yosato opened this issue Aug 5, 2018 · 0 comments
yosato commented Aug 5, 2018

After learning parameters from a corpus and a dictionary, neither of which is particularly big, I try to generate the dictionary from the trained model (the CRF parameter file) as follows:

F/seed$ mecab-dict-gen -m csj_f.mdl -o ../
csj_f.mdl is not a binary model. reopen it as text mode...
reading ./unk.def ... 36
reading ./csj_dic.csv ... 35243
emitting ../left-id.def/ ../right-id.def
emitting ../unk.def ... 36
emitting ../csj_dic.csv ... 35243
emitting matrix : 3% |#

This does not succeed: the process crashes with only the message 'killed'.

The parameter file is 352M with 5 million lines, while the dictionary is 2M with about 40 thousand entries. Running mecab-dict-gen takes a long time, about 5 minutes per 1% of progress, and, frustratingly, it gets killed at around 50%, i.e. after about 8 hours.

First of all, I wonder what makes it take so long and whether there is a way to investigate or debug this. Perhaps the parameter file is unusually big? Also, if there is any recipe for avoiding this type of problem, please advise. If you need more information, please get back to me.
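
One possible way to investigate is sketched below. It assumes that the 'killed' message comes from the Linux out-of-memory killer, which nothing in this report confirms, and that the machine exposes `/proc/<pid>/status`. The function names (`parse_vmrss_kb`, `read_rss_kb`, `watch`) are hypothetical helpers, not part of MeCab. The sketch simply logs the resident memory of a running process at a fixed interval.

```python
import re
import time
from typing import Optional


def parse_vmrss_kb(status_text: str) -> Optional[int]:
    """Return the VmRSS value (in kB) found in the text of /proc/<pid>/status."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, flags=re.MULTILINE)
    return int(match.group(1)) if match else None


def read_rss_kb(pid: int) -> Optional[int]:
    """Read the resident set size of a process on Linux; None if it cannot be read."""
    try:
        with open(f"/proc/{pid}/status") as f:
            return parse_vmrss_kb(f.read())
    except (FileNotFoundError, ProcessLookupError):
        # The process has exited, or /proc is not available on this system.
        return None


def watch(pid: int, interval_s: float = 60.0) -> None:
    """Print the resident memory of `pid` every `interval_s` seconds until it exits."""
    while True:
        rss_kb = read_rss_kb(pid)
        if rss_kb is None:
            print("process is gone (or its status cannot be read)")
            break
        print(f"{time.strftime('%H:%M:%S')} rss={rss_kb / 1024:.0f} MiB", flush=True)
        time.sleep(interval_s)
```

Calling `watch(pid)` with the process ID of the running mecab-dict-gen would show whether memory keeps growing toward the machine's RAM before the kill. If it does, that would support the out-of-memory hypothesis; on many Linux systems the kernel log (for example via `dmesg`) also records such kills.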
