Request
The current ingestion implementation replaces invalid UTF-8 code sequences with a placeholder UTF-8 sequence (this follows the behaviour of `get_string(true)` in simdjson). This lets us handle invalid UTF-8 automatically and ensures that archives always contain valid UTF-8 data.
However, some users may instead want ingestion to fail when it encounters invalid UTF-8, so that they can fix the data upstream. (Alternatively, ingestion could replace the invalid UTF-8 with a placeholder and succeed, but notify the user in some way.)
Possible implementation
Allow users to pass a flag to ingestion indicating that it should fail on invalid UTF-8
Fail ingestion and notify the user where the invalid UTF-8 was encountered
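A minimal C++ sketch of how such a flag might behave, assuming ingestion can run a validation pass over each string value before it is written to the archive. `InvalidUtf8Policy`, `Utf8Result`, `valid_sequence_length`, and `process_utf8` are hypothetical names, not existing CLP or simdjson APIs. Under `Replace` it approximates the current replace-with-placeholder behaviour, using U+FFFD as the placeholder (an assumption) and substituting it once per offending byte, which may not match simdjson's replacement granularity. Under `Fail` it stops at the first invalid byte and returns its offset so ingestion can report where the problem occurred. The replacement count could back the "replace but notify" option.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical ingestion flag; not an existing CLP option.
enum class InvalidUtf8Policy { Replace, Fail };

struct Utf8Result {
    std::string text;                         // Sanitized output; empty if Fail rejected the input.
    std::size_t num_replacements = 0;         // Invalid bytes replaced (Replace policy only).
    std::optional<std::size_t> error_offset;  // First invalid byte offset (Fail policy only).
};

// Returns the length of the well-formed UTF-8 sequence starting at s[i],
// or 0 if the bytes at s[i] do not start a well-formed sequence.
// Rejects overlong encodings, UTF-16 surrogates, and code points above U+10FFFF.
std::size_t valid_sequence_length(std::string_view s, std::size_t i) {
    assert(i < s.size());
    auto byte = [&](std::size_t k) { return static_cast<unsigned char>(s[k]); };
    unsigned char b0 = byte(i);
    if (b0 < 0x80) return 1;  // ASCII

    std::size_t len;
    unsigned char lo = 0x80, hi = 0xBF;  // Allowed range for the second byte.
    if (b0 >= 0xC2 && b0 <= 0xDF) { len = 2; }
    else if (b0 == 0xE0) { len = 3; lo = 0xA0; }  // Exclude overlong 3-byte forms.
    else if ((b0 >= 0xE1 && b0 <= 0xEC) || b0 == 0xEE || b0 == 0xEF) { len = 3; }
    else if (b0 == 0xED) { len = 3; hi = 0x9F; }  // Exclude surrogates U+D800..U+DFFF.
    else if (b0 == 0xF0) { len = 4; lo = 0x90; }  // Exclude overlong 4-byte forms.
    else if (b0 >= 0xF1 && b0 <= 0xF3) { len = 4; }
    else if (b0 == 0xF4) { len = 4; hi = 0x8F; }  // Exclude code points above U+10FFFF.
    else return 0;  // Stray continuation byte or invalid lead byte.

    if (i + len > s.size()) return 0;  // Truncated sequence.
    unsigned char b1 = byte(i + 1);
    if (b1 < lo || b1 > hi) return 0;
    for (std::size_t k = 2; k < len; ++k) {
        unsigned char b = byte(i + k);
        if (b < 0x80 || b > 0xBF) return 0;  // Must be a continuation byte.
    }
    return len;
}

// Validates `input` and applies `policy` to any invalid bytes found.
Utf8Result process_utf8(std::string_view input, InvalidUtf8Policy policy) {
    static constexpr std::string_view kReplacement = "\xEF\xBF\xBD";  // U+FFFD
    Utf8Result result;
    result.text.reserve(input.size());
    std::size_t i = 0;
    while (i < input.size()) {
        std::size_t len = valid_sequence_length(input, i);
        if (len == 0) {
            if (policy == InvalidUtf8Policy::Fail) {
                result.error_offset = i;  // Report where ingestion should stop.
                result.text.clear();
                return result;
            }
            result.text.append(kReplacement);
            ++result.num_replacements;
            i += 1;  // Skip one offending byte and resynchronize.
        } else {
            result.text.append(input.substr(i, len));
            i += len;
        }
    }
    return result;
}
```

A caller would turn a populated `error_offset` into an ingestion error that names the input file and byte position, and could log `num_replacements` when running with `Replace` so users learn that their data was modified.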