better sanitization of analysis results #1162
A general question: isn't this sanitization in the wrong place?
Should it be needed at all?
As far as I understand, plugin results can contain invalid data, so they are sanitized.
But shouldn't this sanitization be done by the plugin?
How does the sanitization make sure that the plugin results stay valid?
Regarding #1161, this fix would make the result invalid, wouldn't it?
# saved in the PostgreSQL database
json_string = json.dumps(string)
if JSON_UNICODE_REGEX.search(json_string):
    logging.warning(f'Sanitizing unicode characters in string {json_string[100:]}')
Why the [100:] here?
The string could be very long (some big analysis results are stored as strings), and I thought that not printing a few MB worth of string would generally be a good idea.
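For reference, the slice `[100:]` drops the first 100 characters and keeps everything after them, whereas `[:100]` keeps only the first 100. A minimal sketch of a truncating log preview follows; the helper name `preview` and the `limit` parameter are hypothetical and not part of this PR.

```python
def preview(text: str, limit: int = 100) -> str:
    """Return at most `limit` leading characters of `text`, marking truncation."""
    if len(text) <= limit:
        return text
    return text[:limit] + '...'


long_string = 'x' * 50 + 'y' * 200  # 250 characters in total

# [100:] keeps the tail (everything after the first 100 characters) ...
tail = long_string[100:]
# ... while [:100] keeps the head, which is what a short log preview needs.
head = long_string[:100]
```

With a helper like this, the warning would log only the beginning of a multi-megabyte string instead of nearly all of it.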
I generally agree, but it isn't that easy to sanitize the string: it is a regular string that contains unicode characters, so it is in fact valid JSON, and only the database can't handle it. I don't think this should be handled by each plugin individually either. We also have the potential problem of custom plugins, which we can't control. Maybe it could be handled in the analysis plugin base class, but I don't know if that is a better place than the module which explicitly handles data conversion for the database.
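To illustrate what central handling in the conversion layer could look like, here is a minimal sketch that walks a nested result and cleans every string in one place, so individual plugins need no changes. It assumes, purely for illustration, that the offending character is NUL (`\u0000`), which PostgreSQL cannot store in text or `jsonb` values; the function name `sanitize` and the sample data are hypothetical, and this is not the regex-based check from the diff.

```python
import json
from typing import Any

NUL = '\x00'  # assumption: the only character this sketch strips


def sanitize(value: Any) -> Any:
    """Recursively return a copy of `value` with NUL removed from all strings."""
    if isinstance(value, str):
        return value.replace(NUL, '')
    if isinstance(value, dict):
        return {
            (key.replace(NUL, '') if isinstance(key, str) else key): sanitize(item)
            for key, item in value.items()
        }
    if isinstance(value, (list, tuple)):
        return [sanitize(item) for item in value]
    return value  # numbers, booleans and None pass through unchanged


# Hypothetical plugin result containing NUL characters in a key and a value
result = {'plugin': {'out\x00put': ['a\x00b', 1, None]}}
clean = sanitize(result)
serialized = json.dumps(clean)
```

Note that removing characters changes the stored result, which is exactly the validity concern raised above; replacing them with a visible placeholder would be an alternative design.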
This seems to be an issue only if the database encoding is not set to "UTF8". My database has this encoding (you can test it with the command
This has not become an issue. Will close.
@dorpvom please see my reply in the original issue.
resolves #1161