Commit

fix: correct WhitespaceSplit Pretokenizer handling of invisible space chars
CodeWithKyrian committed Sep 13, 2024
1 parent e8a8a9a commit 6ec3e3e
Showing 1 changed file with 3 additions and 1 deletion.
4 changes: 3 additions & 1 deletion src/PreTokenizers/WhitespaceSplit.php
@@ -14,6 +14,8 @@ public function __construct(protected array $config)
 
     public function preTokenizeText(string|array $text, array $options): array
     {
-        return explode(' ', $text);
+        preg_match_all('/\S+/', $text, $matches);
+
+        return $matches[0] ?? [];
     }
 }
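To illustrate what the change does, here is a minimal standalone sketch that reproduces the old `explode(' ', ...)` call and the new `preg_match_all('/\S+/', ...)` call. It wraps them as two free functions, `splitOld` and `splitNew`. These names are hypothetical and exist only for this illustration; they are not part of the library. Both functions are run on a string that contains a tab, a newline and a double space.

```php
<?php
// Illustration only: splitOld/splitNew are hypothetical helpers that mirror
// the removed and added lines of this commit; they are not library API.

function splitOld(string $text): array
{
    // Pre-fix behavior: split on the literal U+0020 space character only.
    // Tabs/newlines stay inside tokens and consecutive spaces yield "" entries.
    return explode(' ', $text);
}

function splitNew(string $text): array
{
    // Post-fix behavior: collect every maximal run of non-whitespace
    // characters, so any whitespace run acts as a single separator.
    preg_match_all('/\S+/', $text, $matches);

    // With the default PREG_PATTERN_ORDER, $matches[0] holds the full matches.
    return $matches[0] ?? [];
}

$input = "Hello\tworld\n  foo";

echo json_encode(splitOld($input)), PHP_EOL; // ["Hello\tworld\n","","foo"]
echo json_encode(splitNew($input)), PHP_EOL; // ["Hello","world","foo"]
```

The old version leaves the tab and newline embedded in the first token and emits an empty string for the double space. The regex version treats every whitespace run as one boundary and drops empty pieces. This sketch only exercises ASCII whitespace and makes no claim about non-ASCII space characters.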
