CoNLL-U metadata validation/cleansing #251
Comments
Thank you for the submission. Your request addresses several issues.
First, the dependency visualizer does not work because the GraphML export sets the node key incorrectly. This is being fixed by #252.
About supporting the data you provided and/or other versions of CoNLL: we suggest sticking to the notation using
Are there any other features of CoNLL-X that you consider necessary?
Thank you, #257 is the best way to deal with that IMHO. As for other features of CoNLL-X, the last two columns have different functions (cf. https://aclanthology.org/W06-2920.pdf). I guess it's not worth supporting them: they were not widely used in the first place, and this pertains only to legacy data, which does not seem to be publicly available anymore (at least not from https://ilk.uvt.nl/conll/post_task_data.html). The format is still used by some older parsers, though, and is sometimes required as input for downstream tasks. So, while I would not advise going for full CoNLL-X support, I would suggest being robust against CoNLL-X input, i.e., check whether CoNLL-X data with
You can synthesize such data from CoNLL-U data by just copying the values from the
version: Annatto 0.8.0 - 2024-06-17
issue: After the conversion of (valid) CoNLL-U v1 data with non-standard metadata (see below), the output could be imported into ANNIS, but only partially visualized (no dependency view).
suggestion
tests/data/import/conll
background: In the CoNLL-U format, CoNLL comments before the sentence can be used to provide metadata, where a metadata attribute (e.g., text) is assigned a value (separated by =). In CoNLL-U v2, there are two obligatory metadata fields, text and sent_id; in CoNLL-U v1, metadata is optional; in CoNLL-X, metadata is treated as a comment. In the following data snippet, an invalid separator is used, causing the ANNIS visualizer to break (p.c. by Thomas Krause). Apparently, this is because the converter tried to quietly recover the invalid metadata.
example
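As a rough illustration of the kind of metadata validation discussed here, below is a minimal Python sketch. It is neither the reporter's data nor Annatto's implementation. The function names, the regular expression, and the use of `:` as the invalid separator are assumptions made for the example. A comment is treated as metadata only if it has the form `# key = value`; anything else is kept as a plain comment rather than being quietly recovered.

```python
import re

# Assumed pattern for a metadata comment: "# key = value",
# where the key contains neither whitespace nor "=".
META_RE = re.compile(r"^#\s*([^\s=]+)\s*=\s*(.*)$")


def parse_comment(line):
    """Return (key, value) for a well-formed metadata comment, else None."""
    match = META_RE.match(line.rstrip("\n"))
    if match is None:
        return None
    return match.group(1), match.group(2).strip()


def check_sentence_comments(lines, required=("sent_id", "text")):
    """Sort the comment lines of one sentence into metadata and plain comments.

    Returns (meta, plain, missing), where `missing` lists the required keys
    (the CoNLL-U v2 obligatory fields by default) that were not found.
    """
    meta, plain = {}, []
    for line in lines:
        if not line.startswith("#"):
            continue  # token lines are not examined in this sketch
        parsed = parse_comment(line)
        if parsed is None:
            plain.append(line.rstrip("\n"))  # keep as comment, do not guess
        else:
            key, value = parsed
            meta[key] = value
    missing = [key for key in required if key not in meta]
    return meta, plain, missing


# Hypothetical input: ":" stands in for an invalid separator.
sentence = ["# sent_id = 1", "# text: Hello world", "1\tHello\t_\t_\t_\t_\t0\troot\t_\t_"]
meta, plain, missing = check_sentence_comments(sentence)
```

With this hypothetical input, the `sent_id` line is accepted as metadata, the `text` line is retained as a plain comment, and `text` is reported as missing, so a converter could warn or reject the sentence instead of exporting partially recovered metadata.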