"Un-parametrize" webcrawler tests #2966
Merged
Changes from 1 commit. The pull request contains 5 commits:
- e820b88 "Un-parametrize" webcrawler tests (lunkwill42)
- 18db455 Add news fragment (lunkwill42)
- 10c4d28 Keep track of all unreachable/invalid pages (lunkwill42)
- 34d019d Fix small crawler bug (lunkwill42)
- 809a18a Make webcrawler assertions more legible (lunkwill42)

The diff below shows the changes from the first commit, e820b88:
```diff
@@ -3,15 +3,12 @@
 The crawler attempts to retrieve any NAV web UI page that can be reached with
 parameterless GET requests, while logged in as an administrator.
 
-We want one test for each such URL, but since generating more tests while
-running existing tests isn't easily supported under pytest (yield tests are
-becoming deprecated under pytest 4), the crawler is the de-facto reachability
-tester. A dummy test will be generated for each seen URL, and the dummy tests
-will assert that the response code of the URL was 200 OK.
-
-In addition, HTML validation tests (using libtidy) will be generated for all
-URLs that report a Content-Type of text/html.
-
+In some respects, it would be preferable to generate 1 named test for each
+reachable page, but the tests need to be generated during the test collection
+phase, which means that a full web server needs to be running before pytest
+runs - and it would also be preferable that the web server is started from a
+fixture. Instead, the webcrawler is itself a fixture that allows iteration over
+all reachable pages.
 """
 
 from collections import namedtuple
```
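The docstring's point about the collection phase can be made concrete: arguments to `pytest.mark.parametrize` are evaluated while pytest collects tests, before session fixtures have run, whereas a fixture body executes lazily at test time. A minimal sketch of the contrast (hypothetical names, not code from this PR):

```python
import pytest

# Parametrized style: the argument list is built at *collection* time, so a
# call like crawler.crawl() would have to complete before any fixture (for
# example, one that starts the web server) has had a chance to run:
#
#   @pytest.mark.parametrize("page", crawler.crawl())
#   def test_page(page): ...

# Fixture style: the expensive work is deferred until the first test that
# requests the fixture actually executes.
@pytest.fixture(scope="session")
def pages():
    return ["/index", "/about"]  # stand-in for a real crawler.crawl()


def test_pages_look_sane(pages):
    for page in pages:
        assert page.startswith("/")
```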
```diff
@@ -76,6 +73,14 @@
 
 Page = namedtuple('Page', 'url response content_type content')
 
+if not HOST_URL:
+    pytest.skip(
+        msg="Missing environment variable TARGETURL "
+        "(ADMINUSERNAME, ADMINPASSWORD) , skipping crawler "
+        "tests!",
+        allow_module_level=True,
+    )
+
 
 def normalize_path(url):
     url = urlsplit(url).path.rstrip('/')
```
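For context, `pytest.skip(..., allow_module_level=True)` aborts collection of the entire module, replacing the per-test `skipif` decorators that the later hunks remove. A reduced sketch of the pattern (the environment variable name matches the skip message above; the rest is illustrative):

```python
import os

import pytest

# Read the crawler target from the environment; if it is unset, skip every
# test in this module in one place instead of decorating each test function.
HOST_URL = os.environ.get('TARGETURL')

if not HOST_URL:
    pytest.skip("Missing environment variable TARGETURL", allow_module_level=True)
```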
```diff
@@ -216,32 +221,27 @@ def _quote_url(url):
 
 
 #
-# test functions
+# fixtures
 #
 
-# just one big, global crawler instance to ensure it's results are cached
-# throughout all the tests in a single session
-if HOST_URL:
+
+@pytest.fixture(scope="session")
+def webcrawler():
     crawler = WebCrawler(HOST_URL, USERNAME, PASSWORD)
-else:
-    crawler = Mock()
-    crawler.crawl.return_value = []
+    yield crawler
 
 
-def page_id(page):
-    """Extracts a URL as a test id from a page"""
-    return normalize_path(page.url)
+#
+# test functions
+#
 
 
-@pytest.mark.skipif(
-    not HOST_URL,
-    reason="Missing environment variable TARGETURL "
-    "(ADMINUSERNAME, ADMINPASSWORD) , skipping crawler "
-    "tests!",
-)
-@pytest.mark.parametrize("page", crawler.crawl(), ids=page_id)
-def test_link_should_be_reachable(page):
-    assert page.response == 200, _content_as_string(page.content)
+def test_all_links_should_be_reachable(webcrawler):
+    for page in webcrawler.crawl():
+        if page.response != 200:
+            # No need to fill up the test report files with contents of OK pages
+            print(_content_as_string(page.content))
+        assert page.response == 200, "{} is not reachable".format(page.url)
 
 
 def _content_as_string(content):
```
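The `scope="session"` argument is what preserves the old behavior of "one big, global crawler instance": pytest instantiates a session-scoped fixture once and hands the same object to every test that requests it, so the crawl results stay cached across both tests. A toy illustration (not from the PR):

```python
import pytest


@pytest.fixture(scope="session")
def expensive_resource():
    # Runs only once per test session, no matter how many tests use it.
    print("creating resource")
    return {"created": True}


def test_first(expensive_resource):
    assert expensive_resource["created"]


def test_second(expensive_resource):
    # Receives the exact same object as test_first; no second creation.
    assert expensive_resource["created"]
```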
```diff
@@ -251,23 +251,17 @@ def _content_as_string(content):
     return content.decode('utf-8')
 
 
-@pytest.mark.skipif(
-    not HOST_URL,
-    reason="Missing environment variable TARGETURL "
-    "(ADMINUSERNAME, ADMINPASSWORD) , skipping crawler "
-    "tests!",
-)
-@pytest.mark.parametrize("page", crawler.crawl_only_html(), ids=page_id)
-def test_page_should_be_valid_html(page):
-    if page.response != 200:
-        pytest.skip("not validating non-reachable page")
-    if not page.content:
-        pytest.skip("page has no content")
+def test_page_should_be_valid_html(webcrawler):
+    for page in webcrawler.crawl_only_html():
+        if page.response != 200 or not page.content:
+            continue
 
-    document, errors = tidy_document(page.content, TIDY_OPTIONS)
-    errors = filter_errors(errors)
+        document, errors = tidy_document(page.content, TIDY_OPTIONS)
+        errors = filter_errors(errors)
+        if errors:
+            print(errors)
 
-    assert not errors, "Found following validation errors:\n" + errors
+        assert not errors, "{} did not validate as HTML".format(page.url)
 
 
 def should_validate(page: Page):
```

Review comment (on test_page_should_be_valid_html): Same comment here as above (the same concern as raised on the reachability test, quoted below).

Reply: Fixed
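For reference, `tidy_document` here is pytidylib's wrapper around HTML Tidy: it returns the cleaned-up document plus a string of warnings and errors. The options dict below is an illustrative guess, not the module's actual `TIDY_OPTIONS`:

```python
from tidylib import tidy_document

# Hypothetical option set; the test module defines its own TIDY_OPTIONS.
TIDY_OPTIONS = {"show-warnings": 1}

document, errors = tidy_document("<html><body><p>hello</body></html>", TIDY_OPTIONS)
print(errors)  # a newline-separated string of any problems tidy found
```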
Review comment (on test_all_links_should_be_reachable): My only complaint here is that it stops as soon as the first page is not reachable. Before, you could see all the pages that aren't reachable, which might help when troubleshooting to figure out more easily what is broken.
Reply: Fixed
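A follow-up commit in this pull request ("Keep track of all unreachable/invalid pages", 10c4d28) addresses this. One way to get that behavior (a sketch, not necessarily the commit's exact code) is to accumulate failures and make a single assertion at the end:

```python
def test_all_links_should_be_reachable(webcrawler):
    # Collect every failing page instead of raising on the first one, so the
    # test report lists all unreachable URLs at once.
    unreachable = []
    for page in webcrawler.crawl():
        if page.response != 200:
            unreachable.append("{} (HTTP {})".format(page.url, page.response))
    assert not unreachable, "Unreachable pages:\n" + "\n".join(unreachable)
```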