🎯 poc/string length #5001

Open
wants to merge 1 commit into
base: develop

Conversation

damianpumar
Contributor

Screenshot 2024-06-12 at 12 58 01

@damianpumar damianpumar changed the base branch from develop to feat/v2.0.0 June 12, 2024 10:58
@damianpumar damianpumar requested a review from jfcalvo June 12, 2024 10:59

The URL of the deployed environment for this PR is https://argilla-quickstart-pr-5001-ki24f765kq-no.a.run.app

@@ -0,0 +1,12 @@
def jslen(string):
    return int(len(string.encode(encoding='utf_16_le'))/2)
Member

My only concern is that I don't understand why we need to specify the encoding to be utf_16_le when we are using utf-8 for everything.

Member

Also, the problem to solve is keeping string lengths aligned between Python and JavaScript. Span boundaries are defined in Python, and will probably be processed in Python as well. We need to ensure that a given span refers to the same piece of text in both Python and JavaScript. Here is an article about span processing:

https://betterprogramming.pub/slicing-strings-containing-emoji-differences-between-python-and-javascript-4716c419718f
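
The alignment concern above can be illustrated with a minimal sketch. It assumes span boundaries are stored as Python code-point offsets and must be translated to the UTF-16 code-unit offsets that JavaScript string indexing uses. The helper names `utf16_len`, `py_to_js_offset`, and `js_to_py_offset` are hypothetical and are not part of this PR.

```python
def utf16_len(text: str) -> int:
    """Count UTF-16 code units, which is what JavaScript's String.length reports."""
    return len(text.encode("utf-16-le")) // 2


def py_to_js_offset(text: str, py_offset: int) -> int:
    """Convert a Python code-point offset into a UTF-16 code-unit offset."""
    return utf16_len(text[:py_offset])


def js_to_py_offset(text: str, js_offset: int) -> int:
    """Convert a UTF-16 code-unit offset into a Python code-point offset.

    An offset that falls inside a surrogate pair is rounded up to the
    next code point (a choice made for this sketch).
    """
    units = 0
    for index, char in enumerate(text):
        if units >= js_offset:
            return index
        # Characters outside the Basic Multilingual Plane take two code units.
        units += 2 if ord(char) > 0xFFFF else 1
    return len(text)


text = "I 😀 NLP"
start, end = 4, 7  # Python offsets of the span "NLP"
js_start = py_to_js_offset(text, start)
js_end = py_to_js_offset(text, end)
```

Here the Python span `(4, 7)` becomes `(5, 8)` on the JavaScript side, because the emoji occupies one code point but two UTF-16 code units. Text containing lone surrogates is not handled by this sketch.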

Contributor Author

Hi team, this feature does not work with UTF-8.
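
As a rough illustration of why the encoding matters here, the sketch below compares the counts different encodings produce for the same string. JavaScript's `String.length` reports UTF-16 code units, so only the `utf-16-le` count matches it. The function name `count_lengths` is hypothetical and used only for this example.

```python
def count_lengths(text: str) -> dict:
    """Return the length of `text` as measured in several ways."""
    return {
        # Python's len() counts Unicode code points.
        "code_points": len(text),
        # UTF-8 uses 1 to 4 bytes per code point.
        "utf8_bytes": len(text.encode("utf-8")),
        # UTF-16 uses 2 bytes per code unit; this matches JavaScript's length.
        "utf16_units": len(text.encode("utf-16-le")) // 2,
        # Plain "utf-16" prepends a byte order mark, adding one extra unit.
        "utf16_units_with_bom": len(text.encode("utf-16")) // 2,
    }


emoji_lengths = count_lengths("a😀")
accent_lengths = count_lengths("\u00e9")  # "é" as a single code point
```

For `"a😀"` the UTF-8 byte count is 5 and the code-point count is 2, while JavaScript reports a length of 3. Specifying `utf_16_le` rather than `utf_16` also avoids counting the byte order mark.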

Base automatically changed from feat/v2.0.0 to develop June 19, 2024 13:46
3 participants