Support for multiple references #64

Open
elliottd opened this issue Jun 20, 2016 · 3 comments

@elliottd

README.md suggests that MT-ComparEval currently supports only one reference translation.

reference.txt - a plain text file with reference translations (in target language).

It would be really useful if MT-ComparEval could support multiple reference translations!

@martinpopel
Collaborator

martinpopel commented Jun 21, 2016

I agree it would be useful. (@chrhad also asked about multi-reference BLEU.) So we will give it a higher priority, but @choko cannot start sooner than July 22 (SLT deadline), so any help (pull requests) is welcome. Here is a task list to coordinate the effort:

  • uploading (and storing) more references
  • computing 4-reference BLEU:
    a) as an external script, in a similar way to how Hjerson is implemented - this may be easier to start with
    b) natively in PHP (taking advantage of precomputed matching ngrams) - this is the preferred way
  • adapting other metrics to multiple references: the default way is, for each sentence, to choose the reference which gives the best sentence-level score.
  • Visualization:
    Showing more references should be simple. But MT-ComparEval also highlights the matching/improving n-grams and underlines diffs. This would be more difficult to adapt. I am not sure what the diff with "the reference" should show, but this is the least important feature, so we can show the diff with the first reference.
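
For anyone picking up the BLEU and "best reference" items in the task list, here is a minimal sketch of the usual multi-reference logic. It is written in Python only for brevity and is not MT-ComparEval code: the function names are hypothetical, tokenization is assumed to be already done, the tie-breaking rule for the reference length is one common convention, and `best_reference_score` assumes a metric where higher is better.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_matches(hypothesis, references, n):
    """Return (matched, total) n-gram counts for one sentence.

    Each hypothesis n-gram is credited at most as many times as it
    occurs in the single reference where it is most frequent.
    """
    hyp_counts = ngrams(hypothesis, n)
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)
    matched = sum(min(count, max_ref_counts[gram])
                  for gram, count in hyp_counts.items())
    return matched, sum(hyp_counts.values())


def closest_ref_length(hypothesis, references):
    """Reference length closest to the hypothesis length.

    Assumption: ties are broken in favour of the shorter reference.
    """
    hyp_len = len(hypothesis)
    return min((len(ref) for ref in references),
               key=lambda ref_len: (abs(ref_len - hyp_len), ref_len))


def best_reference_score(hypothesis, references, metric):
    """Fallback for single-reference metrics: score against every
    reference and keep the best value (assumes higher is better)."""
    return max(metric(hypothesis, ref) for ref in references)


# Toy data, purely illustrative.
hyp = "the the the cat".split()
refs = ["the cat sat".split(), "the the dog".split()]
unigram_stats = clipped_matches(hyp, refs, 1)
bigram_stats = clipped_matches(hyp, refs, 2)
```

Corpus-level BLEU would sum the matched and total counts over all sentences for each n-gram order and use the summed closest reference lengths for the brevity penalty. For error-style metrics where lower is better, `max` in `best_reference_score` would need to become `min`.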

@Gldkslfmsd

I'd also appreciate multiple sources.
This may sound weird in general, but in my research it makes sense: I compare the impact of different preprocessing strategies.

@martinpopel
Collaborator

It's not weird, we all agree multiple references would be useful. There is just no one to work on it. So, as noted above, PRs are welcome.
