The Kaede treebank is a Japanese constituent treebank
with clause-level annotations that carry syntactic function labels
(e.g., syntactic role and clause type), as well as annotations of coordinate constructions.
The treebank is designed to consist of complete binary trees and
currently contains about 10,000 sentences from
the Kyoto University Text Corpus (the Mainichi Shimbun Newspaper).
Because of copyright restrictions, this repository provides only the annotations and does not include the original raw text. To obtain a treebank that includes the original raw text, follow these steps:
1. Buy the Mainichi Shimbun News Data (毎日新聞記事データ集). You can purchase the data from Nichigai Associates.
2. Run auto_conv, specifying the directory of the Mainichi Shimbun News Data as a command-line argument.
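The conversion step can also be scripted. The following is a minimal Python sketch, not part of this repository: the helper names `build_auto_conv_command` and `run_auto_conv` are hypothetical, and the location of the script (`./auto_conv`) is an assumption that may need adjusting for your checkout. The only detail taken from the procedure above is that auto_conv receives the news data directory as its command-line argument.

```python
import subprocess
from pathlib import Path


def build_auto_conv_command(data_dir, auto_conv="./auto_conv"):
    """Return the argument list for running auto_conv on data_dir.

    The default script location ("./auto_conv") is an assumption;
    change it to wherever auto_conv lives in your checkout.
    """
    data_path = Path(data_dir)
    # Fail early with a clear message if the data directory is missing.
    if not data_path.is_dir():
        raise NotADirectoryError(f"not a directory: {data_path}")
    # Per the procedure above, the data directory is the sole argument.
    return [auto_conv, str(data_path)]


def run_auto_conv(data_dir, auto_conv="./auto_conv"):
    """Run auto_conv; raises CalledProcessError on a non-zero exit."""
    subprocess.run(build_auto_conv_command(data_dir, auto_conv), check=True)
```

For example, `run_auto_conv("/path/to/mainichi_data")` would run the conversion, where the path is a placeholder for the directory containing the purchased data.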
Takaaki Tanaka and Masaaki Nagata. Constructing a Practical Constituent Parser from a Japanese Treebank with Function Labels. In Proceedings of the 4th Workshop on Statistical Parsing of Morphologically-Rich Languages (SPMRL 2013), pp. 108-118 (2013).
Sumire Uematsu, Takuya Matsuzaki, Hiroaki Hanaoka, Yusuke Miyao, and Hideki Mima. Integrating Multiple Dependency Corpora for Inducing Wide Coverage Japanese CCG Resources. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL 2013), pp. 1042-1051 (2013).