Add a per-dataset character limit #18

Open

lsz05 opened this issue Apr 19, 2024 · 0 comments

Collaborator

lsz05 commented Apr 19, 2024

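OpenAIEmbedder imposes a limit on the number of input tokens.
Token truncation therefore has to be done before the request, and if the full text is passed to the encoder without first limiting its character count, processing slows down.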
#17 (comment)
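
A minimal sketch of what a per-dataset character limit could look like, applied as a cheap pre-filter before token truncation. The names `MAX_CHARS_BY_DATASET` and `truncate_text`, the dataset keys, and the limit values are all hypothetical and not taken from the project.

```python
from typing import Dict, Optional

# Hypothetical per-dataset limits in characters; the values are placeholders.
MAX_CHARS_BY_DATASET: Dict[str, int] = {
    "dataset_a": 8,
    "dataset_b": 4,
}


def truncate_text(text: str, dataset: str, default_limit: Optional[int] = None) -> str:
    """Cut `text` to the character limit configured for `dataset`.

    Falls back to `default_limit` for unknown datasets; if that is None,
    the text is returned unchanged.
    """
    limit = MAX_CHARS_BY_DATASET.get(dataset, default_limit)
    if limit is None:
        return text
    return text[:limit]


if __name__ == "__main__":
    print(truncate_text("abcdefghij", "dataset_a"))
    print(truncate_text("abcdefghij", "unknown"))
```

A character cut like this only bounds the work done by the tokenizer; it does not guarantee that the token limit is met, so token truncation would still be needed afterwards.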

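In addition, texts of length 0 (empty strings) should also be handled in advance. For the OpenAI text embedding API in particular, the handling in #15 is still not safe enough, so as a double check I plan to insert a non-space character.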
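
The double check described above could be sketched as follows. The helper name `ensure_non_empty` and the placeholder character `"-"` are assumptions chosen for illustration; the issue does not say which character will be used.

```python
# Assumption: any single non-whitespace character would work as a placeholder.
PLACEHOLDER = "-"


def ensure_non_empty(text: str, placeholder: str = PLACEHOLDER) -> str:
    """Return `text` unchanged unless it is empty or whitespace-only.

    `str.strip()` removes Unicode whitespace as well, so a text made up
    only of full-width spaces (U+3000) is also replaced.
    """
    return text if text.strip() else placeholder


if __name__ == "__main__":
    for sample in ["", "   ", "\u3000", "hello"]:
        print(repr(ensure_non_empty(sample)))
```

Treating whitespace-only input the same as an empty string is a design choice in this sketch, not something the issue specifies.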

lsz05 mentioned this issue

[Fix] Handling of empty strings in OpenAIEmbedder #19

Merged

1 task