Support setting language on posts #579
LGTM You can also update the content_vector_gin to use the new
@pauloxnet interesting. just to make sure that I'm getting this right since the Mastodon languages are stored as 2-char strings (ISO-639-1,
It seems right. In the Django Project code I used a dictionary to map two-character ISO language codes to the language names used for the PostgreSQL config. Maybe there's a similar way to map language codes to config names without an additional field?
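A minimal sketch of the dictionary approach described above, not the actual Django Project code. The names `ISO_TO_PG_CONFIG` and `pg_search_config` are hypothetical, and the mapping assumes the listed text search configurations are installed on the server (they ship with a standard PostgreSQL install). Unknown or missing codes fall back to `simple`.

```python
# Hypothetical mapping from ISO 639-1 codes to PostgreSQL text search
# configuration names; extend it to match the configs the server provides.
ISO_TO_PG_CONFIG = {
    "da": "danish",
    "de": "german",
    "en": "english",
    "es": "spanish",
    "fr": "french",
    "it": "italian",
    "nl": "dutch",
    "pt": "portuguese",
}


def pg_search_config(language_code, default="simple"):
    """Return a text search config name for a language code, or the default."""
    if not language_code:
        return default
    # Accept tags such as "es-MX" or "EN" by keeping only the primary subtag.
    primary = language_code.replace("_", "-").split("-")[0].lower()
    return ISO_TO_PG_CONFIG.get(primary, default)
```

With a lookup like this, the post only needs to store the two-letter code, and the config name can be derived wherever a search vector is built.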
I think the problem with changing the SearchVector language is that it's embedded into the index, is it not? We can't have 20-odd indexes on the content of posts, one for each language, and doing a search query without an index for it sounds painful.
Actually, I think you can have an index based on the language stored in a column in the same table, but I'd leave the change until after this PR is merged.
Did a bit more research on adding the
-- adding a column of type `regconfig` to store each record's tsvector config
ALTER TABLE activities_post ADD COLUMN tsvector_config regconfig DEFAULT 'simple';
-- creating the GIN index
DROP INDEX content_vector_gin;
CREATE INDEX content_vector_gin ON activities_post USING GIN (to_tsvector(tsvector_config, content));
The SQL code is ok for me.
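One point worth keeping in mind with an expression index like this: PostgreSQL only considers it when the query uses the same expression, so a search has to filter on `to_tsvector(tsvector_config, content)` exactly as indexed. Below is a rough sketch of building such a query. The function name `build_search_query`, the selected `id` column, and the DB-API style `%s` placeholders are assumptions for illustration, not code from this PR.

```python
# Must match the expression used in CREATE INDEX, otherwise the planner
# cannot use content_vector_gin for the search.
INDEXED_EXPRESSION = "to_tsvector(tsvector_config, content)"


def build_search_query(terms, query_config="simple"):
    """Return (sql, params) for a full-text search that can use the index.

    query_config is the text search config used to parse the search terms;
    each row's own config comes from its tsvector_config column.
    """
    sql = (
        "SELECT id FROM activities_post "
        f"WHERE {INDEXED_EXPRESSION} @@ plainto_tsquery(%s::regconfig, %s)"
    )
    return sql, [query_config, terms]


# Example: the pair can be handed to a DB-API cursor as cursor.execute(sql, params).
sql, params = build_search_query("hello world", "english")
```

This keeps a single index for all languages, since the per-row config lives in the column rather than in the index definition.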
On the migration - let's get this landed without it, and we'll figure that out separately. The new Django index objects should support this, but if it's complex or weird enough then we'll just do a raw SQL one, since we only support PostgreSQL anyway.
users/views/settings/posting.py
Outdated
[
    (lang.alpha_2, lang.name)
    for lang in pycountry.languages
    if hasattr(lang, "alpha_2")
This is probably good enough for now - I think it's how most of the other clients work - but we might need to add support for separating the various Spanish dialects in future, for example.
Yeah, the Mastodon docs are pretty explicit that it's the two-letter language code. Not sure how other clients would behave if the language code were different. Increasing the max length of the database field now sounds safe enough 👍
df29a43 to b6f9df6
@chdorner since the
@chdorner @AstraLuma this is an absolutely great feature. Any chance of getting this updated / merged? Happy to do anything I can to help.
@@ -1152,7 +1174,6 @@ def to_mastodon_json(self, interactions=None, bookmarks=None, identity=None):
if isinstance(self.type_data, QuestionData)
else None,
"card": None,
"language": None,
Does this need to be updated to forward the stored language?
I think this is everything that's needed to implement Mastodon's post language feature.
`language` key in the request.
`contentMap`
`lang` attribute on the post content
Note on client compatibility:
`posting:default:language` preference, but uses its interface language as the default value when creating a new post