Add Swedish stopwords #195

Merged Nov 10, 2023 (3 commits)

RobertMartinis
Contributor

Adds Swedish stop words for the stemmer.

@miso-belica
Owner

miso-belica commented Aug 5, 2023

Hello, thank you for the PR. Can you please explain some words to me? I don't speak Swedish, but after translating those words I think some are suspicious and not suitable as stopwords.

Stopwords

aderton (eighteen, older form of "arton") (Not a real stopword)
adertonde (eighteenth, older form of "artonde") (Not a real stopword)
adjö (goodbye)
aldrig (never)
alla (all)
allas (everyone's)
allt (everything)
alltid (always)
alltså (therefore)
andra (other)
andras (others')
annan (another)
annat (another)
artonde (eighteenth) (Not a real stopword)
arton (eighteen) (Not a real stopword)
att (to)
av (of)
bakom (behind)
bara (only)
behöva (need)
behövas (needed)
behövde (needed)
behövt (needed)
beslut (decision)
beslutat (decided)
beslutit (decided)
bland (among)
blev (became)
bli (become)
blir (becomes)
blivit (become)
bort (away)
borta (away)
bra (good)
bäst (best)
bättre (better)
båda (both)
bådas (both's) (Not a real stopword)
dag (day)
dagar (days)
dagarna (the days)
dagen (the day)
de (they, the)
del (part)
delen (the part)
dem (them)
den (the)
denna (this)
deras (their)
dess (its)
dessa (these)
det (it)
detta (this)
dig (you, object form)
din (your)
dina (your)
dit (there)
ditt (your)
dock (though)
dom (they) (informal)
du (you)
där (there)
därför (therefore)
då (then)
e (colloquial spelling of "är", is) (Not a real stopword)
efter (after)
eftersom (because)
ej (not) (Not a real stopword)
elfte (eleventh) (Not a real stopword)
eller (or)
elva (eleven) (Not a real stopword)
emot (against)
en (a, an, one)
enkel (simple)
enkelt (simply)
enkla (simple) (Not a real stopword)
enligt (according to)
ens (even)
er (your)
era (yours) (Not a real stopword)
ers (yours) (Not a real stopword)
ert (yours) (Not a real stopword)
ett (a, an, one)
ettusen (one thousand) (Not a real stopword)
fanns (was, were) (Not a real stopword)
fem (five) (Not a real stopword)
femte (fifth) (Not a real stopword)
femtio (fifty) (Not a real stopword)
femtionde (fiftieth) (Not a real stopword)
femton (fifteen) (Not a real stopword)
femtonde (fifteenth) (Not a real stopword)
fick (got) (Not a real stopword)
fin (nice) (Not a real stopword)
finnas (exist) (Not a real stopword)
finns (exist) (Not a real stopword)
fjorton (fourteen) (Not a real stopword)
fjortonde (fourteenth) (Not a real stopword)
fjärde (fourth) (Not a real stopword)
fler (more) (Not a real stopword)
flera (several) (Not a real stopword)
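
As a rough illustration of the pruning discussed here, the following Python sketch drops flagged words (numerals, in this example) from a candidate stopword list. The names `CANDIDATES`, `FLAGGED`, and `prune_stopwords` are hypothetical and are not part of the PR or of the library; the word lists are a small excerpt chosen for illustration only.

```python
# Hypothetical excerpt of the candidate list; the real list is much longer.
CANDIDATES = ["alla", "arton", "att", "av", "elva", "en", "fem", "femtio"]

# Illustrative subset of words flagged as carrying meaning (numerals here).
FLAGGED = {"arton", "elva", "fem", "femtio"}


def prune_stopwords(candidates, flagged):
    """Return candidates without flagged words, keeping order and dropping duplicates."""
    seen = set()
    kept = []
    for word in candidates:
        normalized = word.strip().lower()
        # Skip empty entries, flagged words, and words already kept.
        if not normalized or normalized in flagged or normalized in seen:
            continue
        seen.add(normalized)
        kept.append(normalized)
    return kept


print(prune_stopwords(CANDIDATES, FLAGGED))
# prints ['alla', 'att', 'av', 'en']
```

Which words count as "flagged" remains a judgment call for someone who knows the language; the sketch only shows the mechanical step.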

@RobertMartinis
Contributor Author

RobertMartinis commented Oct 14, 2023

Hi!

I used the spaCy package's list of Swedish stop words as my source, and I believe this list is widely used for Swedish language processing.

@miso-belica
Owner

Yes, I know other projects use a lot of stopwords. I have always tried to use only the general ones that carry no meaning at all. I translated some as "eighteen" or "goodbye", which I would omit. But because I have no knowledge of the Swedish language, we can merge. Maybe someone will improve it if needed. Can I also ask you for a test to make sure the language works with all components?

@RobertMartinis
Contributor Author

I see. I've now removed some stop words that I believe were less suitable. I have also added a test for the Swedish stemmer.
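
The test added in the PR is not shown in this thread. As a hedged sketch of the kind of sanity check that could accompany a new stopword list, the Python below validates a list of lines, assuming a plain-text format with one lowercase word per line. The function name `find_stopword_problems` and the sample data are hypothetical and do not come from the PR.

```python
def find_stopword_problems(lines):
    """Return (line_number, problem) pairs for a one-word-per-line stopword list."""
    problems = []
    seen = set()
    for number, raw in enumerate(lines, start=1):
        word = raw.strip()
        if not word:
            problems.append((number, "blank line"))
        elif word != word.lower():
            problems.append((number, "not lowercase"))
        elif any(ch.isspace() for ch in word):
            problems.append((number, "contains whitespace"))
        elif word in seen:
            problems.append((number, "duplicate"))
        else:
            seen.add(word)
    return problems


# Hypothetical sample with three deliberate mistakes.
sample = ["och", "att", "Det", "att", ""]
print(find_stopword_problems(sample))
# prints [(3, 'not lowercase'), (4, 'duplicate'), (5, 'blank line')]
```

A check like this only covers the file's shape; whether the stemmer and the other components accept the new language still needs a test against the library itself.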

@RobertMartinis
Contributor Author

Can we merge, or is there something more that needs to be addressed?

Owner

@miso-belica miso-belica left a comment

Thank you, it is fine like this :)

@miso-belica miso-belica merged commit 208b5d1 into miso-belica:main Nov 10, 2023
12 checks passed
@JakobPaulsson

Great contribution, good job @RobertMartinis!

@RobertMartinis RobertMartinis deleted the add_swedish branch January 20, 2024 20:11
3 participants