Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DEP 0015 Content Type Parsing #88

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

smithdc1
Copy link
Member

Hi @carltongibson 👋

I've made a start on writing up a DEP following the discussion on the forum. This is based upon previous discussions both on the mailing list, the forum and the linked PRs.

There's a couple of "TODO"s which need a bit more thought. But thought worth sharing where I have got to.

I hope that I have presented a fair and unbiased proposal of where we are and the concerns raised.

Copy link
Sponsor Member

@adamchainz adamchainz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Brilliant, thanks for getting this started as a DEP 🥳

draft/0015-content-types.rst Outdated Show resolved Hide resolved
- Introduce content type parsing and only add the new ``data`` attribute.

The new names for unchanged attributes is proposed as it's considered this a worthwhile improvement in its own right and introduces consistent naming across ``HttpRequest`` attributes. That is, without renaming the change only the new ``data`` attribute would be an outlier.

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can also add a note that django-upgrade could automate refactoring code to use the new lowercased attributes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initial work showing this is possible: smithdc1/django-upgrade@c043761

draft/0015-content-types.rst Outdated Show resolved Hide resolved

Parsed data from an ``HttpRequest`` is accessed via its ``POST`` attribute. It would be a breaking change if Django were to start parsing content types where currently a string is returned. To avoid introducing a breaking change it is proposed that a new ``data`` attribute is added for the new behaviour.

While introducing a new name for ``POST`` it is proposed that the names for the other attributes are modernized with an equivalent behaviour.
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I originally proposed request.form_data as the new lowercase name for request.POST back in Ticket 32259, I’d like that we still keep that. We can’t find-and-replace request.POST with request.data without adding new functionality unsafely.

Also, let’s list the renames right here in the abstract, so they’re easy for future readers to find:

Suggested change
While introducing a new name for ``POST`` it is proposed that the names for the other attributes are modernized with an equivalent behaviour.
While introducing a new name for ``POST`` it is proposed that the names for the other attributes are modernized with an equivalent behaviour:
* ``GET`` -> ``query_params``
* ``POST`` -> ``form_data``
* ``COOKIES`` -> ``cookies`
* ``META`` -> ``meta``
* ``FILES`` -> ``files``

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think form_data is confusing.

In the first place, GET forms are a thing. But once you're parsing other content types from the body, there's no form even in play at all.

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

post_form_data ? I would just like another lowercase name for the existing attribute so it’s not left uppercase-only, requiring users to adopt data.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, you mean as well as then adding the data attribute? I'd misunderstood.

OK, yes, something like that makes sense.

Let's discuss in Vigo, where we can likely bikeshed it to death over less than a single coffee ☕️

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having spoken with @adamchainz at DjangoCon, I agree with him. Adding form_data as an alias to POST, maintaining the existing behaviour is a good idea. 👍

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case could this DEP become just about renaming?

That would allow django/django#17546 to make progress without this?

What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think?

My honest take is that we need both. I don't think renaming things without adding anything is great, and just sticking the data object on without updating the request seems an error.

But, logically, yes, maybe. One strategy would be to plough on and see where we get.

@carltongibson
Copy link
Member

Thanks for picking this up @smithdc1! 🎁 Good timing. Let's chat in Vigo.

@collinanderson
Copy link

collinanderson commented Jul 22, 2024

Hi All,

Not to sure where exactly to leave/put this feedback but wanted to at least mention some concerns:

First of all I think keeping form_data as the exact same behavior as POST is a really good idea, along with not touching the almost 20-year old original names. I do think having two names for everything could be confusing, but the new names are probably more clear for beginners (who aren't PHP programmers like 20 years ago). However meta could probably use some improvement to be more clear: cgi_variables? cgi_environ? cgi_environment_meta_variables? (Edit: form_data also has the potential to be confusing because of <form method="GET">, but I don't have a good solution to that. Edit 2: <form method="PUT"> maybe isn't allowed? so maybe post_form_data would be more clear if only POST method is allowed?)

Regarding content-negotiation, I have some of concerns:

  • If Django does any json parsing (including a request.get_json() attr), are there going be denial-of-service security issues? Like does Django need to respect DATA_UPLOAD_MAX_NUMBER_FIELDS? Does Django need to worry about how deeply nested the json can possibly go? Does Django need to worry about cases where field/key names are repeated? (I guess that one was already mentioned on Jun 7th.)

  • json and form_data aren't really as interchangeable as much as urlencoded/multipart are. urlencoded and multipart both allow duplicate keys as a way of specifying a list (like multiple checkboxes or select multiple). It's all standardized as part of html, but there's no such thing as <form enctype="application/json">, so there's no way to auto-translate between the two. (Same with json and xml: they are not quite interchangeable.)

  • I'm not sure it's generally a good idea to let the client decide what content-type to use when sending data. I could see some occasional cases to allow content negotiation, but I think in general it might be more secure-by-default to lock a view down to only accepting one content type? Like if my code processing form data from an html form is expecting only string-to-string key value pairs, it should probably stick to the .form_data attribute. Using a .data attribute and opening up the possibility of the client sending in json to that attribute might become a surprise issue. It might not even be dictionary.

  • Similarly, people might set up Web-Application-Firewall rules to do some filtering on json that's getting passed to a view, and a client sending urlencoded or multipart might be able to bypass that filtering. (and vice-versa).

  • Also re denial-of-service, I also think Django shouldn't parse json just because it was sent, unless the view is actually asking for the json. Like, any current view that's currently accepting form data shouldn't start auto-parsing json like it does with GET/POST/FILES/COOKIES/headers, just because Django starts to allow json. Wait until the view actually calls request.json() before parsing it. The view might not actually want it and parsing json is more complicated and slower than any of the other parsing, except maybe multipart, but there's already a bunch of denial-of-service checks for that.

  • I kinda hinted at this before https://code.djangoproject.com/ticket/32646#comment:5, but it seems to me what people actually want is to have an easy way to parse json, but Django is trying to solve that by making a generic framework to handle "any" content type, which I personally think is actually going to lead to a lot of issues and confusion and wishy-washy views. If your view is expecting json then get it from request.json() (which should check content-type before parsing, like .form_data does). If you're expecting urlencoded/multipart, use request.form_data. If you're expecting something else, then parse it yourself. It makes the code more clear what's going on and therefore easier to read, and probably more secure.

  • In general, if Django does allow this sort of auto-content negotiation, I'd personally recommend not using it by default, unless you know you actually need that feature and know what you're doing. This seems kinda similar to the issue with request.REQUEST (union of GET and POST) that was removed because of security issues.

  • Explicit is better than implicit, Simple is better than complex, Readability counts, etc.

  • I don't think you need to add content-negotiation in order to justify renaming request vars, because I think the temptation will be for people to switch to using content-negotation as part of the rename because it's the shiny new exciting feature, and I don't think that's something that should be encouraged by default for the reasons above.

Anyway that's just a quick rant / two cents about content-negotiation, doesn't need to hold things up, but I thought I'd at least mention these issues for my own sake so they're not swirling around in my head forever.

Thanks,
Collin

@smithdc1
Copy link
Member Author

Thank you for your thoughts -- A couple of responses from me. I'm happy to add these to the DEP if that would be helpful.

If Django does any json parsing (including a request.get_json() attr), are there going be denial-of-service security issues?

I think ensuring compliance with the DATA_UPLOAD_MAX_MEMORY_SIZE setting would be the defense here? This would be consistent with how other requests are currently handled.

Also re denial-of-service, .... Wait until the view actually calls request.json() before parsing it.

I agree that the behavior for .data should be the same as .POST and so on -- i.e. only parse when required.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants