French morphologizer mislabeling future, conditional, imperative #13717
Replies: 1 comment
You could try the pipeline I've trained, which is available here: https://github.com/thjbdvlt/solipCysme
I made it because I had the same issue as you. It's trained mostly on novels (19th to 21st century) and on texts with a lot of interaction, personal pronouns, and different moods. The morphologizer uses HunSpell output ([…]).
It has some limits. First, there is no […].
Of course, I could provide some information if you need it. Let me know!
Hello,
I'm using spaCy to model French conversations, and I see that the morphologizer is not performing as well as I'd expect for unambiguous irrealis forms (specifically future tense, conditional mood, imperatives). I understand the underlying reason is probably that these forms aren't frequent in the training data, but are there any potential updates or recommended workarounds?
For example:
- […] (POS/TAG). This is similar to "[French morphologizer] Mislabelisation of Mood=Imp|Number=Sing|Tense=Present" (#8147), but broader in that "Remplacez" is unambiguously a verb.
- […] (MORPH contains Mood=Imp, Tense=Pres), even though it is unambiguously the future.
- […] (MORPH contains Mood=Ind, Tense=Fut), even though it is unambiguously the conditional.

I'm working with a set of ~160 common French verbs and tested their whole paradigms in this way. 98% of infinitives are recognized correctly, but only 13% of second person plural imperatives (34% even had an incorrect POS, as in the first example above), 37% of future tense forms, and 7% of conditional mood forms. Sure enough, I see that these three categories are uncommon in the UD French Sequoia data.
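For anyone who wants to run a similar paradigm check, here is a minimal sketch of how per-category accuracy could be tallied. It is not the script used for the numbers above. The helper names (`parse_feats`, `accuracy_by_category`) and the three sample records are made up for illustration, and a prediction is assumed to count as correct when every expected UD feature is present with the expected value.

```python
from collections import defaultdict


def parse_feats(feats):
    """Turn a UD FEATS string such as 'Mood=Ind|Tense=Fut' into a dict."""
    if not feats or feats == "_":
        return {}
    return dict(item.split("=", 1) for item in feats.split("|"))


def accuracy_by_category(records):
    """Compute accuracy per category.

    records: iterable of (category, expected_feats, predicted_feats) strings.
    A prediction is a hit when it contains every expected feature with the
    expected value; extra predicted features are ignored (an assumption).
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for category, expected, predicted in records:
        want = parse_feats(expected)
        got = parse_feats(predicted)
        totals[category] += 1
        if all(got.get(key) == value for key, value in want.items()):
            hits[category] += 1
    return {category: hits[category] / totals[category] for category in totals}


# Made-up records for illustration only, not real model output.
records = [
    ("future", "Mood=Ind|Tense=Fut",
     "Mood=Ind|Number=Plur|Person=1|Tense=Fut|VerbForm=Fin"),
    ("future", "Mood=Ind|Tense=Fut", "Mood=Imp|Tense=Pres|VerbForm=Fin"),
    ("conditional", "Mood=Cnd", "Mood=Ind|Tense=Fut|VerbForm=Fin"),
]
print(accuracy_by_category(records))
```

In practice the third element of each record would come from `str(token.morph)` for the verb token in each test sentence.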
How to reproduce the behaviour
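A minimal reproduction sketch, assuming `fr_core_news_sm` is installed. The three sentences are illustrative stand-ins (only the verb "Remplacez" comes from the report above), so the exact labels printed may differ from the ones described.

```python
# Hypothetical test sentences, one per category discussed above.
SENTENCES = {
    "imperative": "Remplacez le beurre par de l'huile.",
    "future": "Nous partirons demain.",
    "conditional": "Je voudrais un café.",
}


def format_tokens(doc):
    """Return one 'text<TAB>POS<TAB>MORPH' line per token.

    Works on a spaCy Doc or any iterable of objects exposing
    .text, .pos_ and .morph attributes.
    """
    return [f"{token.text}\t{token.pos_}\t{token.morph}" for token in doc]


def main():
    # Imported here so format_tokens can be used without spaCy installed.
    import spacy

    nlp = spacy.load("fr_core_news_sm")
    for label, sentence in SENTENCES.items():
        print(f"# {label}: {sentence}")
        print("\n".join(format_tokens(nlp(sentence))))
```

Calling `main()` prints the predicted POS and morphological features for every token, so the verb in each sentence can be compared against the expected mood and tense.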
Info about spaCy
spaCy version: 3.8.2
Python version: 3.11.9
Pipelines: fr_core_news_sm (3.8.0)