Py3 upgrade and Pacer Refactoring #171

voutilad · 2017-01-31T18:37:39Z

Currently the test_extract_written_documents_report test is skipped/disabled. I haven't deleted it yet, pending review of whether its logic is covered by the new tests.

Did some small cleanup in support of Py3 using six. Wasn't major.

Pacer refactoring now supports transition to newer versions of requests when needed and should be easier to follow. Added a bunch of tests around it as well.

Changed the setup.py to have the test requirements stuff baked in for mock and vcrpy, but since they aren't needed for execution they're not in the requirements.txt file. If someone runs either tox or python setup.py test they get installed on the fly.

added mocks dependency for unit tests (to tox.ini and requirements-dev.txt). start refactoring some of the Pacer stuff into a PacerSession class that extends requests.Session to handle PACER nuances. tests passing locally with tox using free login.

…irements.txt file. still need to update README.rst about changes. refactored the BadLoginException into the juriscraper.pacer.http module as it fits better next to the place that raises it. added default timeout value of 300 to pacer sessions since it seemed commonly set elsewhere

mlissner

OK! This was a big PR, but it's essentially a go. I made a lot of comments for tweaks, but I don't think there's much that's substantive.

mlissner · 2017-01-31T20:10:30Z

juriscraper/lib/date_utils.py

-    if isinstance(s, unicode):
-        s = s.encode('ascii', 'ignore')
+    #if isinstance(s, six.text_type):
+    #    s = s.encode('ascii', 'ignore')


Why commented out?

Trying to remember... but I believe it was doing nothing but causing problems, since Py3 works fine with unicode and Py2 does better. I couldn't figure out the point of forcing things to ASCII, as it didn't seem to impact the tests. Either I'm missing something or it's legacy stuff.

OK. This code isn't used often, so we can leave this and uncomment it if needed.
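For reference, a minimal sketch of what the commented-out branch would do on Python 3, assuming the surrounding code goes on to treat `s` as text. The helper name `force_ascii` and the sample date string are hypothetical, not taken from date_utils.py.

```python
def force_ascii(s):
    """Mimic the commented-out branch: encode text to ASCII, dropping the rest."""
    # six.text_type is simply `str` on Python 3.
    if isinstance(s, str):
        s = s.encode("ascii", "ignore")
    return s


result = force_ascii("1er février 2017")

# On Python 3 the value is now bytes, and the accented character is gone.
print(type(result).__name__, result)
```

On Python 3 the result is a bytes object, so any later step that mixes it with str values (for example `result.replace("x", "y")`) raises a TypeError. That may be one reason the branch only caused problems there.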

mlissner · 2017-01-31T20:10:59Z

juriscraper/lib/date_utils.py


    # Fix misspellings
-    for i, j in MISSPELLINGS.iteritems():
+    for i, j in six.iteritems(MISSPELLINGS):


Can't we just do .items() in Python 2.7 and in 3.x?

mlissner · 2017-01-31T20:13:04Z

juriscraper/lib/string_utils.py

@@ -31,6 +32,8 @@
 ALL_CAPS = re.compile(r'^[A-Z\s%s%s%s]+$' % (PUNCT, WEIRD_CHARS, NUMS))
 UC_INITIALS = re.compile(r"^(?:[A-Z]{1}\.{1}|[A-Z]{1}\.{1}[A-Z]{1})+,?$")
 MAC_MC = re.compile(r'^([Mm]a?c)(\w+.*)')
+
+


mlissner · 2017-01-31T20:14:46Z

juriscraper/lib/string_utils.py

+            return CAPFIRST.sub(lambda m: m.group(0).upper(), word)
+        except UnicodeEncodeError:
+            # starts with unicode
+            pass


This is a corner case within a corner case, but can't we uppercase unicode characters?

This was all due to the test case where 'Reading between the lines of steve jobs’s ‘thoughts on music’' is expected to be converted to u'Reading Between the Lines of Steve Jobs’s ‘thoughts on Music’'.

That character (\u2018) before thoughts seems to break the regex patterns in Python 3. In Python 3.x it was giving 'Reading Between the Lines of Steve Jobs’s ‘Thoughts on Music’' as output, which failed the original test. Since I'm presuming the tests reflect the expected design, I figured the workaround in Python 3 was to simply test if the first character was unicode and, if so, skip the word.

I guess now that I think of it, something like canelo álvarez would end up Canelo álvarez with this logic. I'm not sure what the original expectation was beyond what I saw in the test cases. ¯_(ツ)_/¯

This actually seems like the tests are wrong. We should be capitalizing thoughts in the example above.

The very next test case has the same situation:
Input: 'seriously, ‘repair permissions’ is voodoo'
Expected Output: u'Seriously, ‘repair Permissions’ is Voodoo'

Should I change these and fix the unicode stuff I did?

Yep, please. This code is pretty old. Not sure why the test cases suck.

Seems like it's been this way a loooong time maybe since first commit: https://github.com/freelawproject/juriscraper/blob/626723b4373d65463e03d233bc1756128d039e1b/tests/tests.py

Yep, seems about right.
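A rough sketch of the behavior agreed on in this thread, for Python 3 only, where str patterns are Unicode-aware by default. `CAPFIRST_SKETCH` and `cap_first` are hypothetical stand-ins; the real `CAPFIRST` pattern in string_utils.py is not shown in this diff and may differ.

```python
import re

# Hypothetical stand-in for CAPFIRST: skip any leading non-word characters
# (such as an opening curly quote), then match the first word character.
CAPFIRST_SKETCH = re.compile(r"^\W*\w")


def cap_first(word):
    """Uppercase the first word character, leaving leading punctuation alone."""
    return CAPFIRST_SKETCH.sub(lambda m: m.group(0).upper(), word)


for sample in ["\u2018thoughts", "\u2018repair", "\u00e1lvarez", "music\u2019"]:
    print(sample, "->", cap_first(sample))
```

With this approach there is no need to catch UnicodeEncodeError and skip the word: the curly quote passes through unchanged, and accented first letters are uppercased like any other.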

mlissner · 2017-01-31T20:15:21Z

juriscraper/lib/string_utils.py

@@ -182,7 +205,7 @@ def fix_camel_case(s):
        s_out = s
    else:
        s_out = s[0]
-        for i in xrange(1, len(s)):
+        for i in six.moves.range(1, len(s)):


Should we just make this range instead of xrange? Remove the need for six?

Similar to iteritems...yeah we shouldn't have an issue simplifying. I believe the catch is in Python 3.x range acts like xrange in that I think it's a lazy sequence while in Python 2.x range returns a list immediately. The length of s shouldn't be an issue here, right?

Nah, should be fine.
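To illustrate the point about `range`, here is a small sketch run on Python 3. The `char_pairs` helper is hypothetical and only mirrors the shape of the loop in `fix_camel_case`; it is not the function's actual body.

```python
def char_pairs(s):
    """Yield (previous, current) character pairs using the plain built-in range."""
    for i in range(1, len(s)):
        yield s[i - 1], s[i]


pairs = list(char_pairs("aBc"))
print(pairs)

# On Python 3, range is a lazy sequence rather than a list, so even a huge
# range costs almost nothing to create.
big = range(1, 10**9)
print(isinstance(big, list), len(big))
```

On Python 2.7 the same `range(1, len(s))` call builds a full list up front, which only matters for very long strings.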

mlissner · 2017-01-31T20:39:14Z

tests/test_everything.py

@@ -60,7 +61,7 @@ def test_fix_future_year_typo(self):
            '12/01/2806': '12/01/2806',                     # Should not change
            '12/01/2886': '12/01/2886',                     # Should not change
        }
-        for before, after in expectations.iteritems():
+        for before, after in six.iteritems(expectations):


Use .items() here?

Huh, wasn't aware of that. Seems to be functionally the same although it doesn't return an iterator. I think for this case since the data isn't large it's moot. I'll back out the calls to six.iteritems.
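A small sketch of the simplification, using a shortened version of the `answers` dict from test_quarter. In Python 2.7 `.items()` builds a list of pairs and in Python 3 it returns a view, but the loop is written the same way in both.

```python
# Shortened version of the month -> quarter mapping used in test_quarter.
answers = {1: 1, 4: 2, 7: 3, 10: 4}

seen = {}
for month, quarter in answers.items():  # no six.iteritems needed
    seen[month] = quarter

print(seen == answers)
```

For dicts this small, the extra list that Python 2.7 allocates is negligible.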

mlissner · 2017-01-31T20:40:57Z

tests/test_everything.py

@@ -387,7 +385,7 @@ def test_make_short_name(self):
    def test_quarter(self):
        answers = {1: 1, 2: 1, 3: 1, 4: 2, 5: 2, 6: 2, 7: 3, 8: 3, 9: 3, 10: 4,
                   11: 4, 12: 4}
-        for month, q in answers.iteritems():
+        for month, q in six.iteritems(answers):


.items()?

mlissner · 2017-01-31T20:41:04Z

tests/test_everything.py

@@ -400,7 +398,7 @@ def test_is_first_month_in_quarter(self):
            6: False,
            7: True,
        }
-        for month, is_first in answers.iteritems():
+        for month, is_first in six.iteritems(answers):


mlissner · 2017-01-31T20:41:19Z

tests/test_everything.py

@@ -907,7 +905,7 @@ def test_colo_coloctapp(self):
        }

        scraper = colo.Site()
-        for raw_string, data in tests.iteritems():
+        for raw_string, data in six.iteritems(tests):


mlissner · 2017-01-31T20:43:46Z

tests/test_pacer.py

+        """
+        data = {'name': ('filename', 'junk')}
+
+        self.session.post('http://free.law', files=data)


voutilad added 12 commits January 29, 2017 18:00

lots of changes to bring into line for Python 3.6 using six and other…

378c78e

… tricks. added tox for testing. still an issue with the title case function due to how python handles unicode strings now.

found a possible fix for the unicode issue in py3. bit of a hack...bu…

796a5f5

…t tries to see if a string starts with unicode or not.

added python 3.5 and 3.6 to travis file.

617d04f

turning off Debug in the title case test.

761faee

refixing the requirements to be exact versions for now. put a py2/3 c…

a531132

…ompatability wrapper function around calls to the requests response objects.

set requests to new version that works locally. fixed an issue with t…

e002d20

…he mock not closing a connection. removed my stupid broken non-fix for test_pacer.py

refactored cookie creation so be a bit more explicit in setting a coo…

53675ad

…kie jar instance. refactored out posts to PACER as it turns out you need some black magick voodoo to form the post body into something it will enjoy.

relaxing error condition for logins

40c77d1

attempt to refactor PACER login to use central auth service while sti…

4719f01

…ll supporting the legacy test site that does not seem supported at the moment.

slimming down the tests to focus on key functionality vs. breadth of …

679dd0d

…courts.

mlissner requested changes Jan 31, 2017

voutilad added 4 commits January 31, 2017 16:34

changes to README.rst, minor tweaks related to code review.

6d87ad5

segregated python2 and python3 specific regex due to issues with unic…

f36106d

…ode raw string literals. minor tweaks per code review.

Merge branch 'master' into py3

72a6e34

added new exception class to distinguish bad pacer credentials, chang…

450ee17

…ed login to test site based on "psc" court_id instead of username of tr1234

voutilad merged commit 33756fc into master Feb 2, 2017
