You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Hi,
I am using ufal.udpipe for Python for processing files encoded in xml format. There is an example part of my input file:
<p>Звери</p>
<p>***</p>
<p>Посадили утку в курятник. Она ходит себе, оглядывается. Потом спрашивает:</p>
<p>-- Господа, а где здесь пруд?</p>
<p>А сверху, с насеста:</p>
<p>-- Мадам, здесь не прут, здесь топчут!</p>
Is it possible to store xml tags in SpaceAfter and SpaceBefore fields? The manual mentions:
Note that in theory not only spaces, but also other original content can be saved in this way (for example XML tags if the input was encoded in a XML file).
Unfortunately, I couldn't find any directions on how to do that. Please, can anyone give any suggestions?
Thanks in advance!
The text was updated successfully, but these errors were encountered:
Currently there is no tokenizer capable of doing it automatically (i.e., you would need to parse the XML, pass only the raw text through UDPipe tokenizer and then manually add the tags to SpacesBefore/SpacesAfter). The functionality is planned for UDPipe 3 where we overhaul the tokenization completely.
Hi,
I am using ufal.udpipe for Python for processing files encoded in xml format. There is an example part of my input file:
Is it possible to store xml tags in SpaceAfter and SpaceBefore fields? The manual mentions:
Unfortunately, I couldn't find any directions on how to do that. Please, can anyone give any suggestions?
Thanks in advance!
The text was updated successfully, but these errors were encountered: