UdifyTextPredictor fails when output_conllu=true #22

ranjita-naik · 2021-01-14T18:07:38Z

I'm feeding the raw input "Il est assez sûr de lui pour danser et chanter en public ." to predict.py with the --raw_text flag. Since I want the output in CoNLL-U format, I've set output_conllu=True in UdifyTextPredictor.

The dump_line in UdifyPredictor is erroring out.

  File "udify/udify/predictors/text_predictor.py", line 63, in dump_line
    return self.predictor.dump_line(outputs)
  File "udify/udify/predictors/predictor.py", line 82, in dump_line
    multiword_ids = [[id] + [int(x) for x in id.split("-")] for id in outputs["multiword_ids"]]
  File "udify/udify/predictors/predictor.py", line 82, in <listcomp>
    multiword_ids = [[id] + [int(x) for x in id.split("-")] for id in outputs["multiword_ids"]]
  File "udify/udify/predictors/predictor.py", line 82, in <listcomp>
    multiword_ids = [[id] + [int(x) for x in id.split("-")] for id in outputs["multiword_ids"]]
ValueError: invalid literal for int() with base 10: 'N'

Could you please take a look?

Thanks,
Ranjita

Hyperparticle · 2021-02-06T19:30:49Z

Sorry for the late reply. I think there might be a bug in how the multiword IDs are handled. In this case, you don't have any multiword IDs because you input raw text. Can you try commenting out the block starting with if outputs["multiword_ids"]:?

huberemanuel · 2021-06-11T00:28:54Z

I'm hitting the same problem. Even with the suggested fix, the error persists.

gifdog97 · 2021-10-31T08:47:51Z

I also came across this issue.
The problem is that outputs["multiword_ids"] is the string "None", not the None object. Because a non-empty string is truthy, the condition if outputs["multiword_ids"]: is always true, even when there are no multiword IDs.
So even if the predicted tree contains no multiword tokens, the following code block runs and raises an error, because it iterates over the string character by character and applies int() to 'N', the first letter of "None".

udify/udify/predictors/predictor.py

Lines 81 to 84 in 18d63ac

    
    if outputs["multiword_ids"]:
        multiword_ids = [[id] + [int(x) for x in id.split("-")] for id in outputs["multiword_ids"]]
        multiword_forms = outputs["multiword_forms"]
        multiword_map = {start: (id_, form) for (id_, start, end), form in zip(multiword_ids, multiword_forms)}

I think commenting out these four lines should remove the error.
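For anyone who would rather keep the block than comment it out, here is a minimal sketch of the failure and of a guard against it. The helper name parse_multiword_ids is hypothetical and not part of UDify; it only assumes that the value may arrive as None, the string "None", or a list of "start-end" strings.

```python
def parse_multiword_ids(raw):
    """Hypothetical helper (not in UDify): return [[id, start, end], ...].

    Treats None, the string "None", and empty values as "no multiword tokens".
    """
    if raw is None or raw == "None" or not raw:
        return []
    return [[id_] + [int(x) for x in id_.split("-")] for id_ in raw]


# Reproduce the original failure: iterating over the string "None"
# yields single characters, and int("N") raises ValueError.
try:
    [[id_] + [int(x) for x in id_.split("-")] for id_ in "None"]
except ValueError as err:
    message = str(err)

print(message)
print(parse_multiword_ids("None"))
print(parse_multiword_ids(["3-4"]))
```

This only hides the symptom in dump_line; it does not explain why the value is a string in the first place.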

gifdog97 · 2021-10-31T08:52:53Z

But actually I found another problem... outputs["ids"] is also the string "None" somehow, which produces weird CoNLL-U output:

N	Un	uno	DET	_	Definite=Ind|Gender=Masc|Number=Sing|PronType=Art	2	det	_	_
o	oppioide	oppioide	NOUN	_	Gender=Masc|Number=Sing	6	nsubj	_	_
n	è	essere	AUX	_	Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin	6	cop	_	_
e	un	uno	DET	_	Definite=Ind|Gender=Masc|Number=Sing|PronType=Art	6	det	_	_

We can temporarily work around it by using the list [1, 2, ..., n] instead, where n is the sentence length, but I think the underlying issue is that outputs["ids"] maps to an unexpected value.
This might also be related to the issue I posted (not sure). Could you check it?
