watcher: blank lines as bad number of sentences #77

Open
Gldkslfmsd opened this issue Jan 17, 2018 · 2 comments

Comments

@Gldkslfmsd

Hello,

I have the following issue:

u-pl0:~/tmp/MT-ComparEval$ bash bin/watcher.sh 
Watcher is watching folder: ./data
[17-Jan-2018 11:19:13]	New experiment called de-cs_BPE_boundary_mark was found
[17-Jan-2018 11:19:13]	source.txt used as a source source.
[17-Jan-2018 11:19:13]	de-cs_BPE_boundary_mark has 3000 source sentences
[17-Jan-2018 11:19:13]	reference.txt used as a reference source.
[17-Jan-2018 11:19:13]	de-cs_BPE_boundary_mark has 3000 reference sentences
[17-Jan-2018 11:19:16]	Experiment de-cs_BPE_boundary_mark uploaded successfully.
[17-Jan-2018 11:19:17]	Importing task: de-cs_BPE_boundary_mark:AZ
[17-Jan-2018 11:19:17]	translation.txt used as a translation source.
[17-Jan-2018 11:19:17]	AZ has 2999 translation sentences
[17-Jan-2018 11:19:17]	AZ has bad number of sentences
[17-Jan-2018 11:19:17]	Parsing of AZ aborted!

Some of my test sentences were translated as blank lines, including the last one in the file. The watcher doesn't count these lines correctly and refuses to load such translations.
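A quick check outside the watcher can confirm whether the files really differ in length when blank lines are counted. The following is a minimal Python sketch, not part of MT-ComparEval; the helper names count_lines and same_length are hypothetical, and it assumes one sentence per line with "\n" line endings.

```python
from pathlib import Path


def count_lines(path: str) -> int:
    """Count the lines in a file, blank ones included."""
    data = Path(path).read_bytes()
    count = data.count(b"\n")
    # A final line without a trailing newline still counts as a line.
    if data and not data.endswith(b"\n"):
        count += 1
    return count


def same_length(reference: str, translation: str) -> bool:
    """Return True if both files hold the same number of lines."""
    return count_lines(reference) == count_lines(translation)
```

Calling same_length("reference.txt", "translation.txt") from inside the task folder would show whether the mismatch is in the files themselves or only in how they are read.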

I tried to work around this problem with sed 's/^$/###EMPTY###/', so no lines are empty anymore. This is what I get when I restart the watcher:

u-pl0:~/tmp/MT-ComparEval$ bash bin/watcher.sh 
Watcher is watching folder: ./data
[17-Jan-2018 11:22:08]	New experiment called de-cs_BPE_boundary_mark was found
[17-Jan-2018 11:22:08]	source.txt used as a source source.
[17-Jan-2018 11:22:08]	de-cs_BPE_boundary_mark has 3000 source sentences
[17-Jan-2018 11:22:08]	reference.txt used as a reference source.
[17-Jan-2018 11:22:08]	de-cs_BPE_boundary_mark has 3000 reference sentences
[17-Jan-2018 11:22:11]	Experiment de-cs_BPE_boundary_mark uploaded successfully.
[17-Jan-2018 11:22:12]	Importing task: de-cs_BPE_boundary_mark:AZ
[17-Jan-2018 11:22:12]	translation.txt used as a translation source.
[17-Jan-2018 11:22:12]	AZ has 3000 translation sentences
[17-Jan-2018 11:22:12]	AZ has bad number of sentences
[17-Jan-2018 11:22:12]	Parsing of AZ aborted!

I think I could avoid this by reinstalling MT-ComparEval or by deleting some temporary files, if I knew which ones, but that is inconvenient and shouldn't be the default behavior.

@ondrejklejch
Owner

Hi,

thank you for the report. Could you share the experiment de-cs_BPE_boundary_mark with me so I can try it myself? Alternatively, you can try to fix the problem by replacing the trim function here: https://github.com/choko/MT-ComparEval/blob/f061f2983bf6579f4d127e1c786636fb5154e542/libs/Iterators/FileSentencesIterator.php#L15

Ondrej
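
To illustrate why a trim call could matter here: if a reader strips all surrounding whitespace from the file content before splitting it into lines, trailing blank lines disappear, which would be consistent with 3000 lines being counted as 2999. The following is a Python sketch of that effect only, not the actual PHP code in FileSentencesIterator.php; both function names are hypothetical.

```python
def split_after_full_trim(content: str) -> list[str]:
    """Strip all surrounding whitespace, then split: trailing blank lines are lost."""
    stripped = content.strip()
    return stripped.split("\n") if stripped else []


def split_keeping_blank_lines(content: str) -> list[str]:
    """Remove only the final line terminator, so blank lines survive."""
    if content == "":
        return []
    if content.endswith("\n"):
        content = content[:-1]
    return content.split("\n")


# Three lines, the last of which is blank.
content = "first\nsecond\n\n"
print(len(split_after_full_trim(content)))      # 2
print(len(split_keeping_blank_lines(content)))  # 3
```

The second variant removes at most one line terminator, so an empty translation on the last line still counts as a sentence.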
