I’m struggling to convert TEI-encoded parallel corpora with Pepper.
The most straightforward approach proposed by the TEI Guidelines seems to be link groups (<linkGrp>) that connect the aligned linguistic units. This is the approach I found in the Opus-MontenegrinSubs corpus: alongside the English and Montenegrin texts themselves, there is a separate file containing nothing but the alignment links.
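To make concrete what an importer has to reconstruct from such a file, here is a minimal Python sketch (standard library only) that reads a link group into pairs of aligned ids. It assumes TEI P5-style <link target="..."/> elements in the TEI namespace, with space-separated pointers whose fragment after '#' is an xml:id. The file names and ids in the sample are hypothetical, and nothing here reflects how Pepper's TEIImporter itself works.

```python
import xml.etree.ElementTree as ET

# TEI P5 namespace; ElementTree prefixes every tag name with it in braces.
TEI_NS = "http://www.tei-c.org/ns/1.0"

def read_alignment_links(xml_text):
    """Return one tuple of xml:id values per TEI <link> element.

    Assumes links of the form <link target="a.xml#id1 b.xml#id2"/>,
    i.e. space-separated pointers whose fragment (after '#') is an xml:id.
    """
    root = ET.fromstring(xml_text)
    pairs = []
    for link in root.iter(f"{{{TEI_NS}}}link"):
        pointers = link.get("target", "").split()
        # Keep only the fragment part, e.g. "en.xml#en.s1" -> "en.s1".
        pairs.append(tuple(p.split("#", 1)[-1] for p in pointers))
    return pairs

# Hypothetical alignment file: file names and ids are made up for illustration.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <linkGrp type="alignment">
      <link target="en.xml#en.s1 cnr.xml#cnr.s1"/>
      <link target="en.xml#en.s2 cnr.xml#cnr.s2"/>
    </linkGrp>
  </body></text>
</TEI>"""

if __name__ == "__main__":
    print(read_alignment_links(SAMPLE))
    # prints [('en.s1', 'cnr.s1'), ('en.s2', 'cnr.s2')]
```

Only the fragment of each pointer is kept, so local ("#id") and cross-file ("file.xml#id") pointers are treated the same way. A real converter would probably also need to keep the document part to know which text each id belongs to.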
However, the TEI importer fails to process this corpus, with errors like these:
Cannot map 'salt:/0/OpusMonte.TEI/opusmonte_cnr.ana' with module 'TEIImporter', because of a mapping result was 'FAILED'.
Cannot map 'salt:/0/OpusMonte.TEI/opusmonte_en.ana' with module 'TEIImporter', because of a mapping result was 'FAILED'.
An exception was thrown by the mapper threads 'Thread[TEIImporter_mapper(salt:/OpusMonte.TEI/opusmonte_cnr.ana),5,TEIImporter_mapperGroup]'.
org.corpus_tools.pepper.modules.exceptions.PepperModuleXMLResourceException: Cannot read xml-file'file:/D:/Users/k.sipunin/Downloads/OpusMonte.TEI/opusmonte_cnr.ana.xml', because of a nested exception.
at org.corpus_tools.pepper.common.PepperUtil.readXMLResource(PepperUtil.java:661)
at org.corpus_tools.pepper.impl.PepperMapperImpl.readXMLResource(PepperMapperImpl.java:278)
at org.corpus_tools.peppermodules.TEIModules.TEIMapper.mapSDocument(TEIMapper.java:58)
at org.corpus_tools.pepper.impl.PepperMapperControllerImpl.map(PepperMapperControllerImpl.java:251)
at org.corpus_tools.pepper.impl.PepperMapperControllerImpl.run(PepperMapperControllerImpl.java:188)
Caused by: org.corpus_tools.salt.exceptions.SaltInsertionException: Cannot insert object 'lemma=opasni' into container 'SStructureImpl(null)[lemma=opasni], salt::unit=word], ana=mte:Agpfpny]'. Because an id already exists: lemma=opasni.
What might be the problem? And more generally, what is the proper way to encode parallel corpora so that they can be imported into ANNIS? The presence of a sample here suggests that it's doable.
Additionally, every aligned segment has a @corresp attribute pointing to the @xml:id of its translation equivalent.
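Because the alignment is encoded twice (once in the link group, once via @corresp), one thing that may be worth ruling out is an inconsistency between the two directions. The following is a minimal Python sketch of such a check, assuming each @corresp holds one or more space-separated pointers whose fragment after '#' is an xml:id in the other text. The sample documents and ids are hypothetical, and there is no claim that such an inconsistency causes the SaltInsertionException above.

```python
import xml.etree.ElementTree as ET

# ElementTree reports xml:id under the reserved XML namespace URI.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

def corresp_map(xml_text):
    """Map each element's xml:id to the list of ids named in its @corresp."""
    root = ET.fromstring(xml_text)
    mapping = {}
    for el in root.iter():
        xml_id = el.get(XML_ID)
        corresp = el.get("corresp")
        if xml_id and corresp:
            # @corresp may hold several pointers; keep each fragment after '#'.
            mapping[xml_id] = [p.split("#", 1)[-1] for p in corresp.split()]
    return mapping

def non_reciprocal(src_map, tgt_map):
    """Return (source id, target id) pairs whose target does not point back."""
    problems = []
    for src_id, targets in src_map.items():
        for tgt_id in targets:
            if src_id not in tgt_map.get(tgt_id, []):
                problems.append((src_id, tgt_id))
    return problems

# Hypothetical two-sentence documents; the second Montenegrin sentence
# deliberately points at a non-existent English id.
EN = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body><p>
<s xml:id="en.s1" corresp="cnr.xml#cnr.s1">Hello.</s>
<s xml:id="en.s2" corresp="cnr.xml#cnr.s2">Goodbye.</s>
</p></body></text></TEI>"""
CNR = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body><p>
<s xml:id="cnr.s1" corresp="en.xml#en.s1">Zdravo.</s>
<s xml:id="cnr.s2" corresp="en.xml#en.s9">Doviđenja.</s>
</p></body></text></TEI>"""

if __name__ == "__main__":
    print(non_reciprocal(corresp_map(EN), corresp_map(CNR)))
    # prints [('en.s2', 'cnr.s2')]
```

Running the check in both directions (English against Montenegrin and vice versa) catches pointers that dangle on either side.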