Implementation on bigdata #132
Unanswered · St-mlengineer asked this question in Q&A · 1 comment, 1 reply

I am trying to implement SymSpell on a large dataset consisting of text strings. The UDF runs indefinitely on a Databricks cluster. Is there a way to make it faster?

-
SymSpell can process about 10,000 words per second on a single processor core, and up to 100,000 words per second with multithreading. Of course, throughput depends on which SymSpell port/language you are using: Python is known for its slow execution times as an interpreted language, so for big data you should use C#, C++, Java, or Rust instead of Python. Also, make sure that LoadDictionary() is called only once, when you are initializing SymSpell, and NOT every single time before you do a Lookup() spelling correction.
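Concretely, that means constructing the SymSpell object and loading the dictionary a single time at startup, then reusing that one instance for every lookup. A minimal sketch, assuming the symspellpy Python port and the English frequency dictionary it bundles (substitute your own dictionary file as needed):

```python
import pkg_resources
from symspellpy import SymSpell, Verbosity

# Build the index ONCE at startup: load_dictionary() is the expensive step,
# since it constructs the delete-candidate index for the whole dictionary.
sym_spell = SymSpell(max_dictionary_edit_distance=2, prefix_length=7)
dictionary_path = pkg_resources.resource_filename(
    "symspellpy", "frequency_dictionary_en_82_765.txt"
)
sym_spell.load_dictionary(dictionary_path, term_index=0, count_index=1)

# Lookups against the prebuilt index are cheap and can be repeated freely.
for word in ("memebers", "helo", "quikc"):
    suggestions = sym_spell.lookup(
        word, Verbosity.CLOSEST, max_edit_distance=2, include_unknown=True
    )
    print(word, "->", suggestions[0].term)
```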
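For the Databricks case in the question, the same rule has to hold inside the UDF: the dictionary must be built once per executor process, not once per row, or the job will spend essentially all of its time reloading the dictionary. One way to do that is a lazily initialized singleton inside a Spark pandas UDF; the sketch below assumes symspellpy and Spark 3.x, and the column and function names are illustrative, not from the thread:

```python
import pandas as pd
from pyspark.sql.functions import pandas_udf
from symspellpy import SymSpell, Verbosity

_sym_spell = None  # one instance per executor process, not one per row


def _get_sym_spell():
    # Lazy singleton: build the index the first time a worker needs it,
    # then reuse it for every subsequent batch on that worker.
    global _sym_spell
    if _sym_spell is None:
        import pkg_resources
        _sym_spell = SymSpell(max_dictionary_edit_distance=2, prefix_length=7)
        path = pkg_resources.resource_filename(
            "symspellpy", "frequency_dictionary_en_82_765.txt"
        )
        _sym_spell.load_dictionary(path, term_index=0, count_index=1)
    return _sym_spell


@pandas_udf("string")
def correct_udf(words: pd.Series) -> pd.Series:
    sym_spell = _get_sym_spell()

    def fix(word: str) -> str:
        hits = sym_spell.lookup(
            word, Verbosity.CLOSEST, max_edit_distance=2, include_unknown=True
        )
        return hits[0].term

    return words.map(fix)


# Hypothetical usage on a DataFrame with a "text" column:
# df = df.withColumn("corrected", correct_udf(df["text"]))
```

A pandas (vectorized) UDF also moves data in Arrow batches rather than row by row, which cuts the per-row Python serialization overhead that makes plain udf() calls slow on large datasets.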