VISA

VISA is a dataset that consists of 40k Japanese-English parallel sentence pairs and corresponding video clips with the following key features:

The parallel sentences are subtitles from movies and TV episodes
The source subtitles are ambiguous, which means they have multiple possible translations with different meanings
We divide the dataset into Polysemy and Omission according to the cause of ambiguity

Examples:

Polysemy:

放せ！ --> Let me go!

Omission:

銃を持ってる。 --> I have a gun.

Splits:

Split	Train	Validation	Test
Polysemy	18,666	1,000	1,000
Omission	17,214	1,000	1,000
Combined	35,880	2,000	2,000

Usage:

Read the JSON files to find the mapping from each video clip to its parallel subtitle pair.

JSON File Structure:

video_file_name: {  
    { "ja": Japanese_subtitle },  
    { "en": English_subtitle }  
}
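As a minimal sketch of reading this mapping in Python: the structure above is schematic, so the helper below assumes each entry is either a single object holding both "ja" and "en", or a list of single-key objects. The function names `normalize_entry` and `load_pairs` are illustrative and not part of the dataset; adjust them once you have inspected the actual files.

```python
import json


def normalize_entry(entry):
    """Return a (ja, en) tuple from one entry of the mapping.

    Assumption: an entry is either one object with both "ja" and "en",
    or a list of single-key objects such as [{"ja": ...}, {"en": ...}].
    """
    if isinstance(entry, dict):
        merged = entry
    else:
        merged = {}
        for part in entry:
            merged.update(part)
    return merged["ja"], merged["en"]


def load_pairs(json_path):
    """Map each video file name to its (Japanese, English) subtitle pair."""
    with open(json_path, encoding="utf-8") as f:
        data = json.load(f)
    return {video: normalize_entry(entry) for video, entry in data.items()}
```

Calling `load_pairs` on one of the JSON files should return a dictionary keyed by video file name, which can then be joined with the video clips or the precomputed features.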

Note:

Please note that by downloading the dataset, you agree to the following conditions:

Do not re-distribute the dataset without our permission.
The dataset can only be used for research purposes. Any other use is explicitly prohibited.

Downloadable Features:

If you are interested in the video features of VISA, you can download them from the following links:

The I3D Features of VISA: http://lotus.kuee.kyoto-u.ac.jp/~yihang/dataset/VISA_i3d.zip
The RCNN Features of VISA: http://lotus.kuee.kyoto-u.ac.jp/~yihang/dataset/VISA_rcnn.zip
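The two archives can also be fetched from a script. The sketch below uses only the Python standard library; the helper name `fetch_and_extract` and the destination directory are illustrative, and the internal layout of the archives is not documented here, so the function simply returns the list of extracted member names for inspection.

```python
import urllib.request
import zipfile
from pathlib import Path

# URLs as listed above.
I3D_URL = "http://lotus.kuee.kyoto-u.ac.jp/~yihang/dataset/VISA_i3d.zip"
RCNN_URL = "http://lotus.kuee.kyoto-u.ac.jp/~yihang/dataset/VISA_rcnn.zip"


def fetch_and_extract(url, dest_dir):
    """Download a zip archive into dest_dir (if not already there) and unpack it.

    Returns the names of the archive members so the layout can be inspected.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    archive = dest / url.rsplit("/", 1)[-1]
    if not archive.exists():
        # Skip the download when the archive is already on disk.
        urllib.request.urlretrieve(url, archive)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)
        return zf.namelist()


# Example (hypothetical target directory, not run automatically):
# members = fetch_and_extract(I3D_URL, "visa_features/i3d")
```

The usage conditions in the Note section apply to these downloads as well.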

Citation:

If you find this dataset helpful, please cite our publication "VISA: An Ambiguous Subtitles Dataset for Visual Scene-aware Machine Translation":

@inproceedings{li-etal-2022-visa,
    title = "{VISA}: An Ambiguous Subtitles Dataset for Visual Scene-aware Machine Translation",
    author = "Li, Yihang  and
      Shimizu, Shuichiro  and
      Gu, Weiqi  and
      Chu, Chenhui  and
      Kurohashi, Sadao",
    booktitle = "Proceedings of the Thirteenth Language Resources and Evaluation Conference",
    month = jun,
    year = "2022",
    address = "Marseille, France",
    publisher = "European Language Resources Association",
    url = "https://aclanthology.org/2022.lrec-1.725",
    pages = "6735--6743",
}

Contact:

If you have any questions about this dataset, please contact [email protected].

License:

GNU General Public License v3.0
