LTD-5730-eu-xml-bad-chars #2287

depsiatwal · 2024-12-27T15:52:51Z

https://uktrade.atlassian.net/browse/LTD-5730

This attempts to prevent an issue that surfaces when the EU XML fails, causing a P1.

This is because the end user address often contains non-printable ASCII characters,
which have accidentally made their way in through copy and paste into the address field.

For now I have added the validation to the address and name fields, as these have been the most problematic.

depsiatwal · 2024-12-27T15:54:10Z

exporter/core/validators.py

+
+class SpecialCharacterStringValidator:
+    message = "There's a invalid special charactor in this field."
+    regex_string = r"^[\000-\031]"


Looking for comments on this, as it may not be the best regex: it doesn't cover everything in the table.
For some reason I couldn't get the more generic patterns to work as expected.
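One possible reason the pattern misses part of the table: in a Python regex, `\031` is an octal escape (decimal 25), so `[\000-\031]` only spans code points 0 to 25, and the leading `^` anchors the match to the start of the value, so a control character later in the string is not matched. A minimal sketch of an alternative, assuming the goal is to flag code points 0 to 31 other than tab, line feed and carriage return; `CONTROL_CHARS` and `contains_control_characters` are hypothetical names, not part of the PR:

```python
import re

# Hypothetical alternative pattern (not from the PR), written with hex escapes.
# Matches code points 0x00-0x1F except tab (0x09), line feed (0x0A)
# and carriage return (0x0D).
CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")

# The pattern as written in the diff, kept for comparison.
# r"\031" is an octal escape (decimal 25), so the range stops at 0x19.
ORIGINAL = re.compile(r"^[\000-\031]")


def contains_control_characters(value: str) -> bool:
    """Return True if a disallowed control character appears anywhere in value."""
    # No "^" anchor: search() scans the whole string.
    return CONTROL_CHARS.search(value) is not None


print(contains_control_characters("1 High St\x02"))  # True
print(ORIGINAL.search("1 High St\x02") is not None)  # False: "^" only checks the start
print(ORIGINAL.search("\x1f") is not None)           # False: 0x1F is outside 0x00-0x19
```

Hex escapes make the intended range explicit and sidestep the octal/decimal confusion.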

depsiatwal · 2024-12-30T18:48:21Z

Since these characters can't be seen by the end user,
rather than raising validation errors this has been changed to silently remove them.

markj0hnst0n · 2024-12-31T16:30:33Z

exporter/applications/forms/parties.py

+    def clean_address(self):
+        address = self.cleaned_data["address"]
+        return remove_non_printable_characters(address)
+
    def get_layout_fields(self):


Should we be adding these for exporter address and name inputs too? Does that have an impact on the XML data?

Those form components are different during registration.
It's possible, but I've never seen the issue, and the risk is much lower since only non-UK-based companies can enter a free-text address.

I think the risk here comes from the fact that they copy and paste party details from other forms during Licence submission.

I understand. You're probably doing the right thing in only fixing one thing at a time. It's reassuring that you've investigated, though.

markj0hnst0n · 2025-01-02T11:48:58Z

core/helpers.py

+
+
+def remove_non_printable_characters(str):
+    return "".join([c for c in str if ord(c) > 31 or ord(c) in [9, 10, 13]])


Cleaning the data is the best approach I can think of as the user probably wouldn't know how to react to the validation message.

If I understand this part correctly, though, it seems that the tab, newline and carriage return ASCII characters are being removed completely rather than being replaced with something Python can parse, e.g. spaces, \n or \r.

I know this is so the XML outputs correctly, but maybe our data should be an accurate representation of what the user inputs?

No, it's the other way round: anything 31 or below, with the exception of 9, 10 and 13, is removed as non-printable.
If you have a look at the tests, they confirm that these characters stay and everything else is removed.

Anything less than 31, and 9, 10 and 13, is being removed, but they aren't being replaced with anything. Your test does confirm this, but what I'm saying is that it might better represent user input if they were replaced with the equivalent Python-parseable version of what the ASCII character represents.

No, still not correct. Anything 31 or below is removed, because these are meaningless control characters (e.g. \x02).

But we want 9, 10 and 13 to stay, as these are tab, line feed and carriage return, so they are kept as is.
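To make the behaviour debated above concrete: the helper keeps tab (9), line feed (10) and carriage return (13) untouched and drops every other code point from 0 to 31. A minimal sketch reproducing the helper's logic on a made-up address (the sample value is illustrative, and the parameter is renamed from `str` to `value` so it no longer shadows the built-in):

```python
# Same filtering logic as remove_non_printable_characters in core/helpers.py.
KEEP = {9, 10, 13}  # tab, line feed, carriage return


def remove_non_printable_characters(value: str) -> str:
    """Drop code points 0-31 except tab, LF and CR; keep everything else."""
    return "".join(c for c in value if ord(c) > 31 or ord(c) in KEEP)


# Illustrative input: a pasted address with two stray control characters.
address = "Unit 4\t12 Example Rd\r\nBerlin\x02\x1b"
cleaned = remove_non_printable_characters(address)
print(repr(cleaned))  # 'Unit 4\t12 Example Rd\r\nBerlin'
```

This lines up with the XML 1.0 `Char` production, which allows `#x9`, `#xA` and `#xD` but no other code points below `#x20`. Because the check is `ord(c) > 31`, DEL (127) and the C1 controls (128 to 159) are kept; XML 1.0 permits those too.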

markj0hnst0n · 2025-01-02T16:02:33Z

unit_tests/exporter/applications/forms/test_parties.py

+    form = parties.EndUserAddressForm(request=request, data=data)
+
+    form.is_valid()
+


Is there an assert missing for the expected variable here?

No, this test doesn't check whether the form is valid, as everything is valid, but we need to call .is_valid() so that cleaned_data is populated and I can check the output.
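In the PR the test goes through `EndUserAddressForm` so that `cleaned_data` is populated. The same behaviour can also be pinned down directly on the helper without Django. A minimal sketch using plain `unittest`; the test class and method names are hypothetical and not from the PR, and the helper is copied in so the block is self-contained:

```python
import unittest


def remove_non_printable_characters(value: str) -> str:
    # Copy of the helper under test (parameter renamed from `str`).
    return "".join(c for c in value if ord(c) > 31 or ord(c) in (9, 10, 13))


class RemoveNonPrintableCharactersTests(unittest.TestCase):
    def test_strips_control_characters(self):
        self.assertEqual(remove_non_printable_characters("Main St\x02\x1f"), "Main St")

    def test_keeps_tab_newline_carriage_return(self):
        value = "Line 1\tA\r\nLine 2"
        self.assertEqual(remove_non_printable_characters(value), value)

    def test_every_low_code_point(self):
        # Each of 0-31 is dropped unless it is tab, LF or CR.
        for i in range(32):
            expected = chr(i) if i in (9, 10, 13) else ""
            with self.subTest(code_point=i):
                self.assertEqual(remove_non_printable_characters(chr(i)), expected)
```

Run it with `python -m unittest` (or pytest, which also collects `unittest.TestCase` classes).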

add validator

130a526

depsiatwal commented Dec 27, 2024


changed so it cleans data

84faff4

depsiatwal force-pushed the LTD-5730-eu-xml-bad-chars branch from 0a2ff7a to 84faff4 on December 31, 2024 09:42

markj0hnst0n reviewed Dec 31, 2024


markj0hnst0n reviewed Jan 2, 2025





