get_encoding_from_headers fails if charset name not specified #6646
Comments
Hello @batterseapower Best Regards
I wonder if [...]. I have checked RFC 2045, RFC 2616, RFC 7231, and RFC 9110, and they all define a parameter as essentially name=value.

Comparing what some other implementations do:

mimeparse

>>> from mimeparse import parse_mime_type
>>> parse_mime_type("application/json; charset")
('application', 'json', {})

stdlib (email.policy)

>>> def parse_content_type(content_type):
...     from email.policy import EmailPolicy
...     header = EmailPolicy.header_factory('content-type', content_type)
...     return (header.content_type, dict(header.params))
...
>>> parse_content_type("application/json; charset")
('application/json', {'charset': ''})

stdlib (email.message)

>>> from email.message import Message
>>>
>>> _CONTENT_TYPE = "content-type"
>>>
>>> def parse_content_type(content_type: str) -> tuple[str, dict[str, str]]:
...     email = Message()
...     email[_CONTENT_TYPE] = content_type
...     params = email.get_params()
...     # The first param is the mime-type; the later ones are attributes like "charset"
...     return params[0][0], dict(params[1:])
...
>>> parse_content_type("application/json; charset")
('application/json', {'charset': ''})

(Also, checking those implementations, you can see that they are more correct about quoted strings, i.e. matching quotes and unquoting, but requests' simpler version of just splitting on ";" and stripping any quote characters has been around for a long time and apparently has not caused problems, so...)
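To make the quoted-string remark concrete, here is a small sketch comparing the two approaches on a header whose quoted value contains a semicolon. `naive_params` is a hypothetical helper written only for illustration in the split-and-strip style described above; it is not copied from requests. `stdlib_params` reuses the `email.message.Message` approach from the comment.

```python
from email.message import Message


def naive_params(content_type):
    """Illustrative only: split on ';' and strip quote characters."""
    _, *raw_params = content_type.split(";")
    params = {}
    for raw in raw_params:
        # partition() returns an empty value when there is no "="
        name, _, value = raw.partition("=")
        params[name.strip("\"' ").lower()] = value.strip("\"' ")
    return params


def stdlib_params(content_type):
    """Parse with email.message.Message, which matches quotes and unquotes."""
    msg = Message()
    msg["content-type"] = content_type
    # get_params() returns the mime-type first, then the parameters
    return dict(msg.get_params()[1:])


header = 'text/plain; title="a;b"; charset=utf-8'
print(naive_params(header))   # {'title': 'a', 'b': '', 'charset': 'utf-8'}
print(stdlib_params(header))  # {'title': 'a;b', 'charset': 'utf-8'}
```

Both helpers agree on a bare `charset` (each yields an empty string), so the difference only shows up for quoted values, which are rare in practice for `charset`.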
requests.utils.get_encoding_from_headers assumes that the charset parameter always specifies a name. In very rare cases a server can send a malformed content-type header which does not specify a name. In these cases, requests should probably just treat it as if no charset had been specified.

Expected Result
requests.utils.get_encoding_from_headers({'content-type': 'text/html; charset'}) == 'ISO-8859-1'
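One way to get this expected result is to treat an empty charset value the same as a missing one. The following is a minimal sketch under stated assumptions, not requests' implementation: `encoding_from_headers_tolerant` is a hypothetical name, parsing is delegated to the stdlib `email.message.Message`, and the only fallback modelled is the ISO-8859-1 default for text types shown in the expected result.

```python
from email.message import Message


def encoding_from_headers_tolerant(headers):
    """Hypothetical helper: return a charset name, a default, or None."""
    content_type = headers.get("content-type")
    if not content_type:
        return None

    # Let the stdlib parse the header; a bare "charset" becomes ''.
    msg = Message()
    msg["content-type"] = content_type
    params = msg.get_params()
    mime_type = params[0][0].lower()
    charset = dict(params[1:]).get("charset")

    # Empty or missing charset: behave as if none had been specified.
    if charset:
        return charset

    # Assumed fallback, taken from the expected result in this issue.
    if mime_type.startswith("text/"):
        return "ISO-8859-1"
    return None


print(encoding_from_headers_tolerant({"content-type": "text/html; charset"}))
# ISO-8859-1
```

The sketch takes a plain dict with a lowercase key; real response headers are case-insensitive, so the lookup would need adjusting there.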
Actual Result
System Information