RequestUrl and ResponseUrl using yarl #45
Open: BurnzZ wants to merge 9 commits into url-page-inputs from url-page-inputs-yarl
Commits (all by BurnzZ):
- 71bb150 use yarl underneath ResponseURL and RequestURL
- b3c7a0a fix naming, annotations, and cleanup files
- 2ebf7a3 Update the __repr__ of _Url class
- b658ab5 update the internal yarl.URL reference to be private
- be37f39 expose 'encoded' parameter in _Url class
- 4e0e103 revert dunder private attribute in _Url
- 292a3b4 handle equality on the base url
- 912ba77 prevent str and _Url instance when having the same value
- d869024 fix type annotation on _Url init
The property methods below could instead be generated dynamically from yarl's properties. However, we would lose the benefit of defining docstrings for them.
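To make the trade-off concrete, here is a minimal stdlib-only sketch. It assumes urllib.parse.SplitResult as a stand-in for the wrapped yarl.URL so that it runs without yarl; DynamicUrl, ManualUrl and _forward are hypothetical names, not code from this PR. Properties generated in a loop end up without docstrings, while hand-written ones can document each attribute and expose only the attributes we choose.

```python
from urllib.parse import SplitResult, urlsplit

# Stand-in assumption: SplitResult plays the role of the wrapped yarl.URL.
_FORWARDED = ("scheme", "netloc", "path", "query", "fragment")


class DynamicUrl:
    """Properties are generated in a loop below."""

    def __init__(self, url: str) -> None:
        self._parts: SplitResult = urlsplit(url)


def _forward(name: str) -> property:
    # A lambda has no docstring, so the generated property's __doc__ is None.
    return property(lambda self: getattr(self._parts, name))


for _name in _FORWARDED:
    setattr(DynamicUrl, _name, _forward(_name))


class ManualUrl:
    """Each property is written by hand and carries its own docstring."""

    def __init__(self, url: str) -> None:
        self._parts: SplitResult = urlsplit(url)

    @property
    def scheme(self) -> str:
        """The URL scheme, e.g. 'https'."""
        return self._parts.scheme

    @property
    def path(self) -> str:
        """The path component, e.g. '/products/1'."""
        return self._parts.path
```

Note that ManualUrl deliberately has no query property: with manual definitions, nothing reaches the public API unless someone writes it.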
We would also lose API governance, so +1 to manual definition.
However, I wonder whether we should define them at all in the initial implementation. We want to make sure we get the API right encoding-wise, and if we expose part of the yarl interface as is, I imagine we would be introducing yarl's encoding issues into our implementation, with the added caveat of not supporting `encoded=True` in `__init__` to at least prevent yarl from messing things up.
Maybe the initial implementation should use a string internally instead, and we can switch to yarl later.
Good point about handling the encoding. What do you think about setting `encoded=True` by default to prevent yarl from messing things up due to incorrect encoding (see be37f39)? This would be equivalent to having a `str` internally, aside from the "smart" helper methods.
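A stdlib-only illustration of the general risk an "already encoded" switch guards against: a naive encoder re-escapes existing %XX sequences, so %20 becomes %2520. encode_path is a hypothetical helper built on urllib.parse.quote; it only mimics the idea behind yarl's `encoded` flag, and yarl's real quoting is more careful than this.

```python
from urllib.parse import quote


def encode_path(path: str, encoded: bool = False) -> str:
    """Hypothetical helper mimicking the idea behind an `encoded` flag.

    encoded=False: percent-encode the input, treating "%" as a literal.
    encoded=True: trust the caller and return the input unchanged.
    """
    if encoded:
        return path
    # quote() escapes "%" too, so existing escapes get encoded a second time.
    return quote(path, safe="/")
```

Passing the input through untouched, as in the last branch, is what makes such a class behave like a plain string.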
The thing is, yarl exposes `encoded` in `__init__` as a workaround for proper encoding handling, which they deliberately chose not to implement. But I believe what @kmike has in mind is for us to have a URL class that does handle encoding properly, in which case we should probably not expose `encoded` at all (maybe `encoding` instead, defaulting to `"utf8"`).
I would wait for feedback from @kmike before making more API decisions. I am personally not sure of the best approach here, or which parts of `w3lib.url` we want to apply and how.
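As a rough illustration of what an `encoding` argument (defaulting to "utf8") could control, here is a hypothetical build_query helper on top of urllib.parse.urlencode. Query strings are the natural example, since browsers following the WHATWG URL Standard percent-encode the query using the document's encoding while the path always uses UTF-8. The helper's name and signature are assumptions.

```python
from urllib.parse import urlencode


def build_query(params: dict, encoding: str = "utf8") -> str:
    """Hypothetical: build an ASCII-only query string using `encoding`."""
    # urlencode() encodes each str key and value with `encoding` before
    # percent-encoding the bytes; spaces become "+" (quote_plus default).
    return urlencode(params, encoding=encoding)
```

The same text yields different ASCII-only strings depending on the encoding, which is why the choice would need to be part of the API rather than hidden.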
I think the goal shouldn't be to implement a general-purpose URL class; the goal is to implement a URL class that is useful for web scraping.
If it's hard to use yarl's or another library's URL class directly, and we're defining the API anyway, we should probably think about it from the API point of view: what API do we want, and which features are commonly used in web scraping? After figuring out what we'd like the API to look like, we can decide how best to implement it: wrap yarl, wrap w3lib, or do something else.
Based on our previous discussions, I think a scraping-ready URL class should have:
- a `/` operation
In addition to this, there is a whole bunch of questions about encoding, normalization, converting URLs to ASCII-only encoded strings suitable for downloading, etc. The best API to handle all that might require some thought. I wonder if we can side-step it for now somehow.
At the same time, I'm not sure properties like `.scheme` are that essential. They are essential for a general-purpose URL class, but do people who write scraping code commonly parse URLs to get their scheme? We can certainly add such methods and properties, but that can come later. These methods are probably useful for authors of web scraping frameworks and HTTP clients, but less so for people who write web scraping code.
It's also about return types: some `yarl.URL` methods are going to return `yarl.URL` objects, while here it would make more sense to return `_Url` objects.