mgiza++ force alignment: segmentation fault when reloading a big N table #2
Comments
Hi, We are going to load previous N model from giza.ja-en/ja-en.n3.final. I have 787264 entries in the ja-en.n3.final file. When I reduced the N table size, it also worked. Any suggestions on how to solve it? Many thanks
Hello, I think this problem occurs in the file NTables.cpp, more specifically in the following lines of code: while(!inf.eof()){ Maybe an out-of-bounds index access happens at some point. Perhaps MAX_FERTILITY is at fault? I am just speculating. Hope this helps.
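To make the speculation above concrete, here is a minimal sketch of a defensive reader. It is not the actual NTables.cpp code: the line layout (a word id followed by one probability per fertility value), the constant kMaxFertility, and the names parseFertilityRow and loadFertilityTable are all assumptions for illustration. It shows two habits that rule out this class of crash: let the read itself drive the loop instead of `while(!inf.eof())`, which runs the body once more after the last successful read, and reject rows that carry more values than a fixed-size table could hold.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical limit; the real MAX_FERTILITY value in mgiza may differ.
constexpr std::size_t kMaxFertility = 10;

struct FertilityRow {
    unsigned wordId = 0;
    std::vector<double> probs;  // probs[k] = P(fertility == k | wordId)
};

// Parses one line of the assumed form "<wordId> <p0> <p1> ...".
// Returns false if the line is malformed or holds more than kMaxFertility
// values, instead of writing past the end of a fixed-size array.
bool parseFertilityRow(const std::string& line, FertilityRow& row) {
    std::istringstream fields(line);
    row.probs.clear();
    if (!(fields >> row.wordId)) return false;
    double p;
    while (fields >> p) {
        if (row.probs.size() >= kMaxFertility) return false;  // would overflow
        row.probs.push_back(p);
    }
    // eof() is false here if parsing stopped on a non-numeric field.
    return fields.eof() && !row.probs.empty();
}

// Reads every row from `in`; `skipped` counts the lines that were rejected.
std::vector<FertilityRow> loadFertilityTable(std::istream& in, std::size_t& skipped) {
    std::vector<FertilityRow> table;
    std::string line;
    FertilityRow row;
    skipped = 0;
    while (std::getline(in, line)) {  // the loop ends exactly when a read fails
        if (line.empty()) continue;
        if (parseFertilityRow(line, row)) table.push_back(row);
        else ++skipped;
    }
    return table;
}
```

If a loader like this reports skipped rows for the .n3.final file, those rows would be the first place to look. Whether mgiza's real table format and limits match these assumptions would need checking against the source.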
I'm closing this issue 'cos it hasn't been answered for a while. Reopen if u wanna carry on chatting
This is a show-stopper for the force alignment feature and, as it seems, it has not been solved. I would like to keep this open. I would be happy to help with further debugging.
No worries. It might be a good idea to make your data available so people can reproduce it. Otherwise the issue isn't gonna get anywhere
I'm having the same problem with Chinese-English.
I suspect it's the fertility too, but it's rather strange. This unusually high fertility will almost always happen, especially when aligning logographic (Japanese/Chinese) languages to alphabetic ones. But such cases are rather rare: fewer than 200K sentence pairs out of my 10M sample, and most probably some of them are misaligned sentences or non-monotonic sentence alignments.
So the training works when I have fertility set to 5, 6, 7, 8 and even 9. I've double-checked: if the ratio is set to <= 9 when cleaning, this shouldn't occur. I don't know how, but I had rogue lines with ratio > 9 that sneaked in and ...
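A quick way to hunt for such rogue lines is to re-check the length ratio on the cleaned corpus. The sketch below is not the Moses cleaning script. It assumes whitespace-tokenized text (so Chinese or Japanese must already be segmented), and the names countTokens and exceedsLengthRatio are made up for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <sstream>
#include <string>

// Counts whitespace-separated tokens in a sentence.
std::size_t countTokens(const std::string& sentence) {
    std::istringstream words(sentence);
    std::string token;
    std::size_t n = 0;
    while (words >> token) ++n;
    return n;
}

// True if the longer side has more than maxRatio times the tokens of the
// shorter side. An empty side paired with a non-empty one is also flagged.
bool exceedsLengthRatio(const std::string& src, const std::string& tgt,
                        std::size_t maxRatio = 9) {
    const std::size_t a = countTokens(src);
    const std::size_t b = countTokens(tgt);
    const std::size_t shorter = std::min(a, b);
    const std::size_t longer = std::max(a, b);
    if (shorter == 0) return longer > 0;
    return longer > maxRatio * shorter;
}
```

Running a check like this over each source/target line pair and printing the line numbers it flags would show whether any pair above the ratio survived cleaning.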
I am trying to produce word alignments for individual sentences. For this purpose I am using the "force align" functionality of mgiza++. Unfortunately, when I load a big N table (fertility), mgiza crashes with a segmentation fault.
In particular, I have initially run mgiza on the full training parallel corpus using the default settings of the Moses script:
Afterwards, by executing the mgiza force-align script, I run the following command
This runs fine, until I get the following error:
The N table that is failing has about 300k entries. For this reason, I thought I should check whether the size is the problem, so I truncated the table to 60k entries. And it works! But the alignments are not good.
I am struggling to fix this, so any help would be appreciated. I am running a freshly installed mgiza on Ubuntu 12.04.