I recently received a spam message containing a non-breaking space (encoded as =C2=A0 in quoted-printable UTF-8, if that is relevant). When running pyzor predigest, the non-breaking space is kept in the predigest output. I have no idea whether spammers actually do this, but they could randomly replace spaces with non-breaking spaces before sending mail, generating a different fingerprint each time and evading detection.
I believe that simply changing
ws_ptrn=re.compile(r'\s')
to
ws_ptrn=re.compile(r'\s', flags=re.UNICODE)
would address this (and cover all the other Unicode space characters as well), but at the cost of breaking compatibility with signatures from older versions of pyzor.
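To illustrate the difference, here is a minimal sketch. It is not pyzor's actual digest code: the names `ascii_ws`, `unicode_ws`, and `sample` are made up for this example. It is written for Python 3, where `str` patterns are already Unicode-aware by default, so `re.ASCII` is used to emulate the behavior of the current pattern under Python 2.

```python
import re

# Emulates the current pattern under Python 2: \s matches ASCII whitespace only.
ascii_ws = re.compile(r'\s', flags=re.ASCII)
# Proposed pattern: \s also matches U+00A0 and the other Unicode spaces.
unicode_ws = re.compile(r'\s', flags=re.UNICODE)

# Hypothetical message text with one non-breaking space and one plain space.
sample = "buy\u00a0cheap pills"

ascii_stripped = ascii_ws.sub('', sample)      # non-breaking space survives
unicode_stripped = unicode_ws.sub('', sample)  # all whitespace removed

print(repr(ascii_stripped))
print(repr(unicode_stripped))
```

With the ASCII-only pattern, two messages that differ only in which spaces are non-breaking normalize to different strings, so they would presumably hash to different digests.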