-
-
Notifications
You must be signed in to change notification settings - Fork 111
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Py3 upgrade and Pacer Refactoring #171
Conversation
… tricks. added tox for testing. still an issue with the title case function due to how python handles unicode strings now.
…t tries to see if a string starts with unicode or not.
…ompatability wrapper function around calls to the requests response objects.
…he mock not closing a connection. removed my stupid broken non-fix for test_pacer.py
…kie jar instance. refactored out posts to PACER as it turns out you need some black magick voodoo to form the post body into something it will enjoy.
added mocks dependency for unit tests (to tox.ini and requirements-dev.txt start refactoring some of the Pacer stuff into a PacerSession class that extends requests.Session to handle PACER nuances tests passing locally with tox using free login.
…irements.txt file. still need to update README.rst about changes. refactored the BadLoginException into the juriscraper.pacer.http module as it fits better next to the place that raises it. added default timeout value of 300 to pacer sessions since it seemed commonly set elsewhere
…ll supporting the legacy test site that does not seem supported at the moment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK! This was a big PR, but it's essentially a go. I made a lot of comments for tweaks, but I don't think there's much that's substantive.
if isinstance(s, unicode): | ||
s = s.encode('ascii', 'ignore') | ||
#if isinstance(s, six.text_type): | ||
# s = s.encode('ascii', 'ignore') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why commented out?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Trying to remember...but I believe it was doing nothing but cause problems since Py3 works fine with unicode and Py2 does better. I couldn't figure out the point of forcing things to ASCII as it didn't seem to impact the tests. Either I'm missing something or it's legacy stuff.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. This code isn't used often, so we can leave this and uncomment it if needed.
juriscraper/lib/date_utils.py
Outdated
|
||
# Fix misspellings | ||
for i, j in MISSPELLINGS.iteritems(): | ||
for i, j in six.iteritems(MISSPELLINGS): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't we just do .items()
in Python 2.7 and in 3.x?
juriscraper/lib/string_utils.py
Outdated
@@ -31,6 +32,8 @@ | |||
ALL_CAPS = re.compile(r'^[A-Z\s%s%s%s]+$' % (PUNCT, WEIRD_CHARS, NUMS)) | |||
UC_INITIALS = re.compile(r"^(?:[A-Z]{1}\.{1}|[A-Z]{1}\.{1}[A-Z]{1})+,?$") | |||
MAC_MC = re.compile(r'^([Mm]a?c)(\w+.*)') | |||
|
|||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fine.
juriscraper/lib/string_utils.py
Outdated
return CAPFIRST.sub(lambda m: m.group(0).upper(), word) | ||
except UnicodeEncodeError: | ||
# starts with unicode | ||
pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a corner case within a corner case, but can't we uppercase unicode characters?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was all due to the test case of ['Reading between the lines of steve jobs’s ‘thoughts on music’'
expecting to be converted to u'Reading Between the Lines of Steve Jobs’s ‘thoughts on Music’']
.
That character (\u2018
) before thoughts
seems to break the regex patterns in Python 3. In Python 3.x it was giving this as output 'Reading Between the Lines of Steve Jobs’s ‘Thoughts on Music’'. Which failed the original test. Since I'm presuming the tests reflect the expected design I figured the workaround in Python3 was to simply test if the first character was unicode and if so skip the word.
I guess now that I think of it, something like canelo álvarez
would end up Canelo álvarez
with this logic. I'm not sure what the original expectation was beyond what I saw in the test cases. ¯_(ツ)_/¯
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This actually seems like the tests are wrong. We should be capitalizing thoughts
in the example above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The very next test case has the same situation:
Input: 'seriously, ‘repair permissions’ is voodoo'
Expected Output: u'Seriously, ‘repair Permissions’ is Voodoo'
Should I change these and fix the unicode stuff I did?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, please. This code is pretty old. Not sure why the test cases suck.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like it's been this way a loooong time maybe since first commit: https://github.com/freelawproject/juriscraper/blob/626723b4373d65463e03d233bc1756128d039e1b/tests/tests.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, seems about right.
juriscraper/lib/string_utils.py
Outdated
@@ -182,7 +205,7 @@ def fix_camel_case(s): | |||
s_out = s | |||
else: | |||
s_out = s[0] | |||
for i in xrange(1, len(s)): | |||
for i in six.moves.range(1, len(s)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we just make this range
instead of xrange
? Remove the need for six
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to iteritems
...yeah we shouldn't have an issue simplifying. I believe the catch is in Python 3.x range
acts like xrange
in that I think it's a lazy sequence while in Python 2.x range
returns a list immediately. The length of s
shouldn't be an issue here, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nah, should be fine.
tests/test_everything.py
Outdated
@@ -60,7 +61,7 @@ def test_fix_future_year_typo(self): | |||
'12/01/2806': '12/01/2806', # Should not change | |||
'12/01/2886': '12/01/2886', # Should not change | |||
} | |||
for before, after in expectations.iteritems(): | |||
for before, after in six.iteritems(expectations): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use .items()
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huh, wasn't aware of that. Seems to be functionally the same although it doesn't return an iterator. I think for this case since the data isn't large it's moot. I'll back out the calls to six.iteritems
.
tests/test_everything.py
Outdated
@@ -387,7 +385,7 @@ def test_make_short_name(self): | |||
def test_quarter(self): | |||
answers = {1: 1, 2: 1, 3: 1, 4: 2, 5: 2, 6: 2, 7: 3, 8: 3, 9: 3, 10: 4, | |||
11: 4, 12: 4} | |||
for month, q in answers.iteritems(): | |||
for month, q in six.iteritems(answers): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.items()
?
tests/test_everything.py
Outdated
@@ -400,7 +398,7 @@ def test_is_first_month_in_quarter(self): | |||
6: False, | |||
7: True, | |||
} | |||
for month, is_first in answers.iteritems(): | |||
for month, is_first in six.iteritems(answers): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same.
tests/test_everything.py
Outdated
@@ -907,7 +905,7 @@ def test_colo_coloctapp(self): | |||
} | |||
|
|||
scraper = colo.Site() | |||
for raw_string, data in tests.iteritems(): | |||
for raw_string, data in six.iteritems(tests): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same.
tests/test_pacer.py
Outdated
""" | ||
data = {'name': ('filename', 'junk')} | ||
|
||
self.session.post('http://free.law', files=data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https! :)
…ode raw string literals. minor tweaks per code review.
…ed login to test site based on "psc" court_id instead of username of tr1234
Currently the
test_extract_written_documents_report
test is skipped/disabled. Didn't delete it while we review if the logic is covered by the new tests.Did some small cleanup in support of Py3 using
six
. Wasn't major.Pacer refactoring now supports transition to newer versions of
requests
when needed and should be easier to follow. Added a bunch of tests around it as well.Changed the
setup.py
to have the test requirements stuff baked in formock
andvcrpy
, but since they aren't needed for execution they're not in therequirements.txt
file. If someone runs eithertox
orpython setup.py test
they get installed on the fly.