UdifyTextPredictor fails when output_conllu=true #22

Open
ranjita-naik opened this issue Jan 14, 2021 · 4 comments

@ranjita-naik

I'm feeding the raw input "Il est assez sûr de lui pour danser et chanter en public ." to predict.py with the --raw_text flag. Since I want the output in CoNLL-U format, I've set output_conllu=True in UdifyTextPredictor.

dump_line in UdifyPredictor then raises an error:

  File "udify/udify/predictors/text_predictor.py", line 63, in dump_line
    return self.predictor.dump_line(outputs)
  File "udify/udify/predictors/predictor.py", line 82, in dump_line
    multiword_ids = [[id] + [int(x) for x in id.split("-")] for id in outputs["multiword_ids"]]
  File "udify/udify/predictors/predictor.py", line 82, in <listcomp>
    multiword_ids = [[id] + [int(x) for x in id.split("-")] for id in outputs["multiword_ids"]]
  File "udify/udify/predictors/predictor.py", line 82, in <listcomp>
    multiword_ids = [[id] + [int(x) for x in id.split("-")] for id in outputs["multiword_ids"]]
ValueError: invalid literal for int() with base 10: 'N'

Could you please take a look?

Thanks,
Ranjita

@Hyperparticle
Owner

Sorry for the late reply. I think there might be a bug in how the multiword IDs are handled. In this case, you don't have any multiword IDs because you input raw text. Can you try commenting out the block starting with if outputs["multiword_ids"]:?

@huberemanuel

I'm running into the same problem. Even with the suggested workaround, the error persists.

@gifdog97

I also came across this issue.
The problem is that outputs["multiword_ids"] is the string "None", not the value None. Because a non-empty string is truthy, the condition if outputs["multiword_ids"]: is always True, even when there are no multiword IDs.
So even when the predicted tree contains no multiword tokens, the following block is executed. Iterating over the string yields single characters, and the error comes from applying int() to 'N', the first letter of "None".

if outputs["multiword_ids"]:
    multiword_ids = [[id] + [int(x) for x in id.split("-")] for id in outputs["multiword_ids"]]
    multiword_forms = outputs["multiword_forms"]
    multiword_map = {start: (id_, form) for (id_, start, end), form in zip(multiword_ids, multiword_forms)}

I think commenting out these four lines should make the error go away.
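
Instead of commenting the block out, a guard could treat the string "None" as "no multiword tokens". The following is only a minimal sketch: parse_multiword_ids is a hypothetical helper, not part of UDify, and it assumes that any bare string (rather than a list of "start-end" strings) means there is nothing to parse. It also reproduces the ValueError from the traceback.

```python
def parse_multiword_ids(raw):
    """Return [[id, start, end], ...], or [] when there are no multiword tokens.

    Hypothetical helper (not in UDify). Assumption: `raw` is a list of
    "start-end" strings, None, or a bare string such as "None".
    """
    # A bare string is never a valid list of IDs, so treat it as "no IDs".
    if not raw or isinstance(raw, str):
        return []
    return [[id_] + [int(x) for x in id_.split("-")] for id_ in raw]


# Reproduce the original failure: iterating over the string "None"
# yields the single character 'N' first, and int('N') raises ValueError.
try:
    [[id_] + [int(x) for x in id_.split("-")] for id_ in "None"]
    error_message = None
except ValueError as err:
    error_message = str(err)

print(error_message)
print(parse_multiword_ids("None"))
print(parse_multiword_ids(["1-2"]))
```

With a guard like this, real multiword IDs such as "1-2" would still be parsed, which commenting out the block would not preserve.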

@gifdog97

gifdog97 commented Oct 31, 2021

But I actually found another problem: outputs["ids"] is also the string "None" for some reason, which produces malformed CoNLL-U output:

N	Un	uno	DET	_	Definite=Ind|Gender=Masc|Number=Sing|PronType=Art	2	det	_	_
o	oppioide	oppioide	NOUN	_	Gender=Masc|Number=Sing	6	nsubj	_	_
n	è	essere	AUX	_	Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin	6	cop	_	_
e	un	uno	DET	_	Definite=Ind|Gender=Masc|Number=Sing|PronType=Art	6	det	_	_

As a temporary fix, we can use the list [1, 2, ..., n] instead, where n is the sentence length. But I think the underlying issue is that outputs['ids'] holds an unexpected value.
This might also be related to the other issue I posted, though I'm not sure. Could you take a look?
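
The temporary fix described above could look roughly like the sketch below. normalize_ids is a hypothetical helper, not part of UDify, and the sketch assumes the IDs are paired with the words positionally (for example via zip), which would explain why the ID column in the output spells out N, o, n, e.

```python
def normalize_ids(raw_ids, words):
    """Return one ID per word, falling back to 1..n when raw_ids is unusable.

    Hypothetical helper (not in UDify). Assumption: a valid raw_ids is a
    list or tuple with exactly one entry per word.
    """
    if isinstance(raw_ids, (list, tuple)) and len(raw_ids) == len(words):
        return list(raw_ids)
    # Fallback: number the tokens 1..n, as suggested in the comment above.
    return list(range(1, len(words) + 1))


words = ["Un", "oppioide", "è", "un"]

# Pairing the string "None" with the words yields its characters as IDs.
broken = list(zip("None", words))

# With the fallback, each word gets a sequential integer ID instead.
fixed = list(zip(normalize_ids("None", words), words))

print(broken)
print(fixed)
```

This only papers over the symptom; it does not explain why outputs['ids'] arrives as a string in the first place.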
