Queued Mail Delivery Sets Wrong Charset #41

Open
treinhard opened this issue Aug 11, 2017 · 4 comments

@treinhard
Contributor

Sending an email via pyramid_mailer's send_to_queue() and then delivering it with the qp console script results in an incorrect charset of us-ascii in the Content-Type on Python 3.6.2.

The initial Message with unicode content is correctly encoded with iso-8859-1 and the charset is set on the message:

>>> from pyramid_mailer.message import Message
>>> msg = Message(subject='Test', sender='[email protected]', recipients=['[email protected]'], body='Test französisches Email')
>>> msg = msg.to_message()
>>> msg.get_charset()
'iso-8859-1'
>>> msg.as_string()
'Content-Type: text/plain; charset="iso-8859-1"\nMIME-Version: 1.0\nContent-Transfer-Encoding: quoted-printable\nFrom: [email protected]\nSubject: Test\nTo: [email protected]\nContent-Disposition: inline\n\nTest=20franz=F6sisches=20Email'

The content is written to a file and later (during delivery) parsed in the QueueProcessor:

>>> from email.parser import Parser
>>> from io import StringIO
>>> msg = Parser().parse(StringIO(msg.as_string()))
>>> print(msg.get_charset())
None
>>> msg.get_content_charset()
'iso-8859-1'
>>> msg.as_string()
'Content-Type: text/plain; charset="iso-8859-1"\nMIME-Version: 1.0\nContent-Transfer-Encoding: quoted-printable\nFrom: [email protected]\nSubject: Test\nTo: [email protected]\nContent-Disposition: inline\n\nTest=20franz=F6sisches=20Email'

The message looks fine, except that msg.get_charset() now returns None. SMTPMailer.send() runs this message through repoze.sendmail.encoding.cleanup_message(), which replaces the initial message charset of iso-8859-1 with us-ascii:

>>> from repoze.sendmail.encoding import cleanup_message
>>> msg = cleanup_message(msg)
>>> msg.get_charset()
'us-ascii'
>>> msg.get_content_charset()
'us-ascii'
>>> msg.as_string()
'MIME-Version: 1.0\nContent-Transfer-Encoding: quoted-printable\nFrom: [email protected]\nSubject: Test\nTo: [email protected]\nContent-Disposition: inline\nContent-Type: text/plain; charset="us-ascii"\n\nTest=20franz=F6sisches=20Email'

The message.get_charset() call at https://github.com/repoze/repoze.sendmail/blob/master/repoze/sendmail/encoding.py#L74 returns None (because the charset is None after parsing the message from the file). The fallback on the following lines then picks us-ascii, because the payload is already quoted-printable encoded and therefore contains only ASCII characters.

The result is a message with Content-Type: text/plain; charset="us-ascii" containing iso-8859-1 encoded content.
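The precondition for the bug can be reproduced with the standard library alone. The sketch below is an assumption-laden stand-in: it builds the message with MIMEText instead of pyramid_mailer and never calls repoze.sendmail, so it only shows that a serialize/parse round trip drops the value returned by get_charset() while get_content_charset() still reads it from the header.

```python
from email.mime.text import MIMEText
from email.parser import Parser
from io import StringIO

# Stand-in for the pyramid_mailer message: MIMEText calls set_charset() itself.
original = MIMEText('Test französisches Email', 'plain', 'iso-8859-1')

# Round trip through text, as the queue does when it writes and re-reads the file.
parsed = Parser().parse(StringIO(original.as_string()))

# Before the round trip the charset object is set on the message.
charset_before = str(original.get_charset())

# After parsing, only the Content-Type header still carries the charset.
charset_after = parsed.get_charset()
header_charset = parsed.get_content_charset()

# The undecoded payload is quoted-printable text, so it is pure ASCII.
payload_is_ascii = parsed.get_payload().isascii()

# Decoding with the header charset recovers the original text.
decoded = parsed.get_payload(decode=True).decode(header_charset)

print(charset_before, charset_after, header_charset, payload_is_ascii)
```

Because the stored payload is ASCII-only, any "try the simplest charset first" fallback will accept us-ascii for it, which matches the behaviour described above.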

@tseaver
Member

tseaver commented Aug 11, 2017

@treinhard Thanks for the careful analysis of the problem!

@unikmhz

unikmhz commented Sep 21, 2017

Experiencing the same issue.
A couple of differences in my case:

  • The original message is multipart/alternative, consisting of one plaintext part and one HTML part.
  • Both parts have charset="utf-8" in their Content-Type.
  • Sending is done via SMTPMailer + QueueProcessor directly, instead of using the qp utility.
  • The delivered mail had charset="us-ascii" in the Content-Type of both parts.
  • When not using queued delivery (via pyramid_mailer's .send() method), the headers remain unaltered.
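The multipart case described above appears to hit the same precondition. The following is a minimal standard-library sketch, assuming a multipart/alternative message built with MIMEMultipart and MIMEText rather than with pyramid_mailer; it does not run the queue or repoze.sendmail, it only inspects what each part reports after a parse.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from email.parser import Parser
from io import StringIO

# Hypothetical message with one plaintext and one HTML part, both utf-8.
outer = MIMEMultipart('alternative')
outer.attach(MIMEText('Grüße', 'plain', 'utf-8'))
outer.attach(MIMEText('<p>Grüße</p>', 'html', 'utf-8'))

# Serialize and parse again, as a queued delivery would.
parsed = Parser().parse(StringIO(outer.as_string()))

# Collect (content type, get_charset(), get_content_charset()) for each leaf part.
report = [
    (part.get_content_type(), part.get_charset(), part.get_content_charset())
    for part in parsed.walk()
    if not part.is_multipart()
]

for row in report:
    print(row)
```

Each leaf part loses its charset object but keeps charset="utf-8" in its header, so code that only consults get_charset() sees nothing to preserve for either part.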

lugensa pushed a commit to lugensa/repoze.sendmail that referenced this issue Apr 17, 2018
@WuShell

WuShell commented Feb 20, 2019

I've hit this same problem too.

Looking at the pending pull requests for the project, I found a couple that seem related to this issue:

#26 (this one especially seems good)

#25

But both seem to have been sitting there for a long, long time without a reply :-(

WuShell added a commit to WuShell/repoze.sendmail that referenced this issue Feb 20, 2019
Fixes the bug described here: repoze#41

Rationale:

get_content_charset() has been there in email.message.Message for a long
time now, returning the charset of the body of the Message (which is what we
are looking for there).

get_charset() has been deprecated for some time now, and it returns None by
default (even if a charset is set in the Content-Type header).

There are some other ways to retrieve the charset (get_params(),
get_param('charset'), message['Content-Type'].params) but get_content_charset()
is the only one that exists across all existing Python versions, and it is
there even in the new email.message.EmailMessage class.
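As an illustration of the rationale above, a charset lookup that prefers an explicitly set charset and otherwise falls back to the Content-Type header might look like the sketch below. The helper name effective_charset is hypothetical and this is not the code from the commit or from repoze.sendmail; it only uses get_charset() and get_content_charset() from the standard library.

```python
from email.message import Message
from email.parser import Parser
from io import StringIO
from typing import Optional


def effective_charset(message: Message) -> Optional[str]:
    """Hypothetical helper: return the charset name a message declares.

    Prefers the charset object set via set_charset(); if there is none
    (as after parsing), falls back to the Content-Type header.
    """
    charset = message.get_charset()
    if charset is not None:
        return str(charset)
    # Returns None when the header carries no charset parameter.
    return message.get_content_charset()


# A parsed message shaped like the queued file from the report (addresses omitted).
raw = (
    'Content-Type: text/plain; charset="iso-8859-1"\n'
    'MIME-Version: 1.0\n'
    'Content-Transfer-Encoding: quoted-printable\n'
    '\n'
    'Test=20franz=F6sisches=20Email'
)
parsed = Parser().parse(StringIO(raw))
declared = effective_charset(parsed)

# A message without any Content-Type header declares no charset at all.
bare = Parser().parse(StringIO('Subject: Test\n\nplain body'))
undeclared = effective_charset(bare)

print(declared, undeclared)
```

With a lookup along these lines, a guess based on the payload bytes would only be needed when the message declares no charset at all.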
@WuShell

WuShell commented Feb 21, 2019

I've made a PR that fixes this bug: #43. @treinhard, @unikmhz, I'd be glad if you could give it a try and confirm whether it works for you (thanks!)
