[Store] Add RecursiveCharacterTextTransformer
#1258
base: main
Signed-off-by: asrar <[email protected]>
| * @param non-empty-list<string> $separators | ||
| * @return iterable<string> | ||
| */ | ||
| private static function splitText(string $text, array $separators, int $chunkSize): iterable |
LangChain seems to do this with regex; maybe that's faster and more flexible.
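For illustration, a regex-based split could look roughly like the sketch below. It is not taken from this PR or from LangChain; `splitWithRegex` is a hypothetical helper that only covers the splitting step (no chunk-size handling or recursion), and it assumes the separators are literal strings rather than patterns.

```php
<?php

/**
 * Hypothetical helper, not part of the PR: splits text on any of the given
 * literal separators using a single regular expression.
 *
 * @param non-empty-list<string> $separators
 *
 * @return list<string>
 */
function splitWithRegex(string $text, array $separators): array
{
    // Escape each separator so it is matched literally, then join them into
    // one alternation. Earlier separators win when several match at the same
    // position, so list longer ones (e.g. "\n\n") before shorter ones ("\n").
    $pattern = '/(?:'.implode('|', array_map(
        static fn (string $s): string => preg_quote($s, '/'),
        $separators,
    )).')/u';

    $parts = preg_split($pattern, $text, -1, \PREG_SPLIT_NO_EMPTY);

    // preg_split() returns false on failure (e.g. invalid UTF-8 with the /u flag);
    // fall back to returning the text unsplit.
    return false === $parts ? [$text] : $parts;
}

$chunks = splitWithRegex("First paragraph.\n\nSecond line\nThird", ["\n\n", "\n"]);
// $chunks is ['First paragraph.', 'Second line', 'Third']
```

Whether this is actually faster than the loop-based `splitText()` would need benchmarking. The main gain is flexibility, since all separators are handled in one pass.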
What about using UnicodeString?
RecursiveCharacterTextTransformer
| use Symfony\AI\Store\Document\TransformerInterface; | ||
| use Symfony\Component\Uid\Uuid; | ||
|
|
||
| class RecursiveCharacterTextTransformer implements TransformerInterface |
Please add an @author tag.
| public function transform( | ||
| iterable $documents, | ||
| array $options = [], | ||
| ): iterable { |
| - public function transform(
| -     iterable $documents,
| -     array $options = [],
| - ): iterable {
| + public function transform(iterable $documents, array $options = []): iterable
| + {
| { | ||
| #[Test] | ||
| #[DataProvider('provideDocumentsContents')] | ||
| public function it_works(array $inputDocumentsText, array $options, array $expectedSplittedTexts): void |
| - public function it_works(array $inputDocumentsText, array $options, array $expectedSplittedTexts): void
| + public function ...(array $inputDocumentsText, array $options, array $expectedSplittedTexts): void
Please use camelCase and a more descriptive method name.
Do you want to finish this PR?
I became a bit busy this month, so if anyone wants to pick this up, feel free to do so.
Not a good implementation, just exploring.