Skip to content

Conversation

@kareglazie
Copy link

Issue #324

Copy link
Owner

@pemistahl pemistahl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • There is a copy of the accuracy reports directory in src. Please remove it. This is not where it should be located.
  • Please restrict the number of lines in the test data files to 1000 lines per file. This is totally sufficient to test the accuracy of a language model.

.gitignore Outdated
/pkg/
/target/
**/*.rs.bk
src/lingua_notes.ipynb
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove this line. It refers to a file that is not available in this PR.

Cargo.toml Outdated
[package]
name = "lingua"
version = "1.5.0"
version = "1.5.1"
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New language models satisfy a jump to version 1.6.0.

Cargo.toml Outdated
Comment on lines 142 to 157
lingua-amharic-language-model = { path = "language-models/am", version = "1.1.0", optional = true }
lingua-burmese-language-model = { path = "language-models/my", version = "1.1.0", optional = true }
lingua-chechen-language-model = { path = "language-models/ce", version = "1.1.0", optional = true }
lingua-kyrgyz-language-model = { path = "language-models/ky", version = "1.1.0", optional = true }
lingua-malayalam-language-model = { path = "language-models/ml", version = "1.1.0", optional = true }
lingua-nepali-language-model = { path = "language-models/ne", version = "1.1.0", optional = true }
lingua-pashto-language-model = { path = "language-models/ps", version = "1.1.0", optional = true }
lingua-sanskrit-language-model = { path = "language-models/sa", version = "1.1.0", optional = true }
lingua-sinhala-language-model = { path = "language-models/si", version = "1.1.0", optional = true }
lingua-sindhi-language-model = { path = "language-models/sd", version = "1.1.0", optional = true }
lingua-tatar-language-model = { path = "language-models/tt", version = "1.1.0", optional = true }
lingua-tajik-language-model = { path = "language-models/tg", version = "1.1.0", optional = true }
lingua-turkmen-language-model = { path = "language-models/tk", version = "1.1.0", optional = true }
lingua-uzbek-language-model = { path = "language-models/uz", version = "1.1.0", optional = true }
lingua-lao-language-model = { path = "language-models/lo", version = "1.1.0", optional = true }
lingua-khmer-language-model = { path = "language-models/km", version = "1.1.0", optional = true }
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a language model is completely new, then the initial version should be 1.0.0.

Cargo.toml Outdated
Comment on lines 199 to 201
"welsh", "xhosa", "yoruba", "zulu", "chechen", "amharic", "burmese",
"kyrgyz", "nepali", "malayalam", "pashto", "sanskrit", "sinhala",
"tatar", "tajik", "turkmen", "uzbek", "sindhi", "lao", "khmer"
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please sort all languages alphabetically, here and everywhere else in the project.

Comment on lines 856 to 857
// Some(WhatlangLanguage::Nob) => Some(Language::Bokmal),
Some(WhatlangLanguage::Nob) => Some(Language::Norwegian),
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no entry for Norwegian in the Language enum anymore. So this will probably not compile at all. Please remove this line again and replace it with Bokmal.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello again, I've made some changes according to your review a while ago, was not sure if you'd received the notification

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry, I haven't had time so far to continue working on my open source projects. Been busy with my job and family. But I did receive your notification and will look at it again as soon as I find the time.

@kareglazie
Copy link
Author

Thanks a lot for your review! I've made corrections to comply with your remarks.

@flyingleafe
Copy link

@pemistahl @kareglazie What is the status of that PR? I consider it very useful, especially given that it includes languages with unique writing systems (Amharic, Khmer, Lao, Sinhala, etc.), which are detected with 100% accuracy due to that fact, and it should be a no-brainer to include those in the list of supported languages.

#[cfg(feature = "slovak")]
Language::Slovak => Some("Ĺ弾Ŕŕ"),

#[cfg(feature = "spanish")]
Language::Spanish => Some("¿¡"),

#[cfg(feature = "turkmen")]
Language::Turkmen => Some("ň"),
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ň is not specific only to Turkmen

is a letter in the Czech, Slovak and Turkmen alphabets
https://en.wikipedia.org/wiki/%C5%87

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants