Note also that querying '30 Mio.' (the correct German abbreviation for 'million') returns [{"body":"30","start":0,"value":{"value":30,"type":"value"},"end":2,"dim":"number","latent":false}], which is also incorrect: only '30' is matched and 'Mio.' is dropped.
Steps to reproduce
curl -XPOST http://0.0.0.0:8000/parse --data 'locale=de_DE&text=30m'
[{"body":"30m","start":0,"value":{"value":30,"type":"value","unit":"metre"},"end":3,"dim":"distance","latent":false},{"body":"30m","start":0,"value":{"value":30000000,"type":"value"},"end":3,"dim":"number","latent":false}]
Expected result:
[{"body":"30m","start":0,"value":{"value":30,"type":"value","unit":"metre"},"end":3,"dim":"distance","latent":false}]
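The gap between the actual and the expected output can be checked mechanically. The following is a minimal sketch, assuming the response has been captured as a JSON string; the helper name `unexpected_dims` is hypothetical and not part of Duckling.

```python
import json

# Actual response for locale=de_DE&text=30m, copied from the report above.
ACTUAL = (
    '[{"body":"30m","start":0,"value":{"value":30,"type":"value","unit":"metre"},'
    '"end":3,"dim":"distance","latent":false},'
    '{"body":"30m","start":0,"value":{"value":30000000,"type":"value"},'
    '"end":3,"dim":"number","latent":false}]'
)


def unexpected_dims(response_json, allowed):
    """Return the dims found in a parse response that are not in `allowed`.

    Hypothetical helper for illustration only.
    """
    entities = json.loads(response_json)
    return sorted({entity["dim"] for entity in entities} - set(allowed))


# Under de_DE only the distance reading is expected for "30m".
print(unexpected_dims(ACTUAL, {"distance"}))  # ['number']
```

A check like this could serve as a regression test once the spurious "number" entity is gone, at which point the function would return an empty list.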
What's wrong
The second result is incorrect because '30m' cannot mean '30 million' under locale de_DE. In German, 'million' is abbreviated 'Mio.' (see Wikipedia). Unlike in English, it is never abbreviated 'M' (let alone 'm'), as this would be too ambiguous ('Meter' vs. 'Millionen' vs. 'Milliarden').
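To illustrate the locale rule described here, a minimal Python sketch of locale-dependent 'million' suffixes follows. This is not how Duckling implements its rules (those are written in Haskell); the table `MILLION_SUFFIXES` and the function `parse_millions` are hypothetical, and the English entry is an assumption included only for contrast.

```python
import re

# Hypothetical per-language suffix table, for illustration only.
MILLION_SUFFIXES = {
    "en": r"m",        # assumption: English accepts "m" / "M" for million
    "de": r"mio\.?",   # German: "Mio." (trailing dot treated as optional here)
}


def parse_millions(text, lang):
    """Return the integer value if `text` is '<digits><million suffix>', else None."""
    suffix = MILLION_SUFFIXES[lang]
    match = re.fullmatch(rf"\s*(\d+)\s*({suffix})\s*", text, flags=re.IGNORECASE)
    if match is None:
        return None
    return int(match.group(1)) * 1_000_000


print(parse_millions("30m", "de"))      # None
print(parse_millions("30 Mio.", "de"))  # 30000000
```

Keeping the suffix per language means a bare 'm' never produces a 'million' reading in German, which leaves '30m' to the distance rule alone.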
Variation: dim=number
Let's change our request to restrict the dim to 'number':
curl -XPOST http://0.0.0.0:8000/parse --data 'locale=de_DE&dims="["number"]"&text=30m'
[{"body":"30m","start":0,"value":{"value":30000000,"type":"value"},"end":3,"dim":"number","latent":false}]
Now, the expected behavior would be to simply ignore the 'm' and extract the number 30.