Instruction for ElasticSearch setup for Chinese support no longer current #1428

Open
mhkhung opened this issue Apr 6, 2024 · 1 comment

mhkhung commented Apr 6, 2024

Steps to reproduce the problem

Try to follow the setup here:
https://docs.joinmastodon.org/admin/elasticsearch/#search-optimization-for-other-languages

The current code does not match the diff.

Expected behaviour

Docs can be followed

Actual behaviour

Diff no longer valid

Detailed description

The diff in the docs no longer applies to the current code. It's unclear how to update it, or how to fix existing indexes.
Also, a code-level patch is very undesirable for administrators. Does the patch need to stay in place permanently, or only when the index is created? I do not want to maintain a fork of the code, given all the recent security issues. Can't this be handled in code or config instead?

Mastodon instance

No response

Mastodon version

main-latest

Technical details

If this is happening on your own Mastodon server, please fill out those:

  • Ruby version: (from ruby --version, eg. v3.1.2)
  • Node.js version: (from node --version, eg. v18.16.0)
mhkhung added the bug label Apr 6, 2024
renchap transferred this issue from mastodon/mastodon Apr 7, 2024

mogita commented Sep 22, 2024

For what it's worth, here's the patch I ended up with, based on v4.2.12. I'm no Elasticsearch expert; I just copied everything from the current docs, and it got my server (kind of) working.

BTW, I'm using it in my Mastodon devops setup; link for anyone who's interested.

diff --git a/app/chewy/accounts_index.rb b/app/chewy/accounts_index.rb
--- a/app/chewy/accounts_index.rb
+++ b/app/chewy/accounts_index.rb
@@ -23,7 +23,7 @@ class AccountsIndex < Chewy::Index
 
     analyzer: {
       natural: {
-        tokenizer: 'standard',
+        tokenizer: 'ik_max_word',
         filter: %w(
           lowercase
           asciifolding
@@ -36,7 +36,7 @@ class AccountsIndex < Chewy::Index
       },
 
       verbatim: {
-        tokenizer: 'standard',
+        tokenizer: 'ik_max_word',
         filter: %w(lowercase asciifolding cjk_width),
       },
 
diff --git a/app/chewy/statuses_index.rb b/app/chewy/statuses_index.rb
--- a/app/chewy/statuses_index.rb
+++ b/app/chewy/statuses_index.rb
@@ -21,14 +21,23 @@ class StatusesIndex < Chewy::Index
       },
     },
 
+    char_filter: {
+      tsconvert: {
+        type: 'stconvert',
+        keep_both: false,
+        delimiter: '#',
+        convert_type: 't2s',
+      },
+    },
+
     analyzer: {
       verbatim: {
-        tokenizer: 'uax_url_email',
+        tokenizer: 'ik_max_word',
         filter: %w(lowercase),
       },
 
       content: {
-        tokenizer: 'standard',
+        tokenizer: 'ik_max_word',
         filter: %w(
           lowercase
           asciifolding
@@ -38,6 +47,7 @@ class StatusesIndex < Chewy::Index
           english_stop
           english_stemmer
         ),
+        char_filter: %w(tsconvert),
       },
 
       hashtag: {
diff --git a/app/chewy/tags_index.rb b/app/chewy/tags_index.rb
--- a/app/chewy/tags_index.rb
+++ b/app/chewy/tags_index.rb
@@ -4,15 +4,25 @@ class TagsIndex < Chewy::Index
   include DatetimeClampingConcern
 
   settings index: index_preset(refresh_interval: '30s'), analysis: {
+    char_filter: {
+      tsconvert: {
+        type: 'stconvert',
+        keep_both: false,
+        delimiter: '#',
+        convert_type: 't2s',
+      },
+    },
+
     analyzer: {
       content: {
-        tokenizer: 'keyword',
+        tokenizer: 'ik_max_word',
         filter: %w(
           word_delimiter_graph
           lowercase
           asciifolding
           cjk_width
         ),
+        char_filter: %w(tsconvert),
       },
 
       edge_ngram: {
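
Before applying a patch like the one above, it may help to check that the cluster accepts the tokenizer and char filter it relies on. The sketch below is not part of the patch: it only builds a request body for Elasticsearch's `_analyze` endpoint using the same `ik_max_word` tokenizer and `stconvert` settings as the diff. The helper name `analyze_request_body` and the sample text are made up for illustration, and it assumes the IK and STConvert analysis plugins are installed on the cluster. The `english_stop` and `english_stemmer` filters are left out because they are defined inside the Mastodon index settings rather than built into Elasticsearch.

```ruby
require 'json'

# Hypothetical helper (not part of Mastodon): builds a body for the
# Elasticsearch _analyze API that mirrors the analyzer pieces in the patch.
# Assumes the plugins providing 'ik_max_word' and 'stconvert' are installed.
def analyze_request_body(text)
  {
    tokenizer: 'ik_max_word',
    char_filter: [
      # Same settings as the `tsconvert` char filter defined in the diff.
      { type: 'stconvert', keep_both: false, delimiter: '#', convert_type: 't2s' },
    ],
    # Built-in token filters only; index-defined filters are omitted.
    filter: %w(lowercase asciifolding cjk_width),
    text: text,
  }
end

# Print the JSON so it can be sent to the cluster's _analyze endpoint.
puts JSON.pretty_generate(analyze_request_body('繁體中文搜尋測試'))
```

If the cluster rejects the request, the plugins are probably missing or named differently, and the patched index definitions would most likely fail in the same way when the indexes are created.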
