The algorithm in vl5x.py may have a bug; some cookie values come out wrong. #4

Open
easyfast opened this issue Dec 18, 2018 · 4 comments
Comments

@easyfast

It sometimes runs fine and sometimes errors out. The traceback is as follows:

```
2018-12-18 17:46:21 [scrapy.core.scraper] ERROR: Spider error processing <GET http://wenshu.court.gov.cn/list/list/?sorttype=1> (referer: None)
Traceback (most recent call last):
  File "c:\users\qiqing\appdata\local\programs\python\python37\lib\site-packages\scrapy\utils\defer.py", line 102, in iter_errback
    yield next(it)
  File "c:\users\qiqing\appdata\local\programs\python\python37\lib\site-packages\scrapy\spidermiddlewares\offsite.py", line 30, in process_spider_output
    for x in result:
  File "c:\users\qiqing\appdata\local\programs\python\python37\lib\site-packages\scrapy\spidermiddlewares\referer.py", line 339, in <genexpr>
    return (_set_referer(r) for r in result or ())
  File "c:\users\qiqing\appdata\local\programs\python\python37\lib\site-packages\scrapy\spidermiddlewares\urllength.py", line 37, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "c:\users\qiqing\appdata\local\programs\python\python37\lib\site-packages\scrapy\spidermiddlewares\depth.py", line 58, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "D:\Work\JingShOnline\Spider\wenshu\wenshu\spiders\wenshu.py", line 20, in parse
    vjkl5 = getvjkl5(cookie)
  File "D:\Work\JingShOnline\Spider\wenshu\wenshu\utils\vl5x.py", line 1806, in getvjkl5
    vjkl5 = arrFun[funIndex](cookie)
  File "D:\Work\JingShOnline\Spider\wenshu\wenshu\utils\vl5x.py", line 755, in makeKey_150
    return md5(makeKey_14(str1) + makeKey_19(str1))[1: 1 + 24]
  File "D:\Work\JingShOnline\Spider\wenshu\wenshu\utils\vl5x.py", line 175, in makeKey_14
    b = base64.b64encode(s[1:] + s[5:] + s[1:4])
  File "c:\users\qiqing\appdata\local\programs\python\python37\lib\base64.py", line 58, in b64encode
    encoded = binascii.b2a_base64(s, newline=False)
TypeError: a bytes-like object is required, not 'str'
```
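
The traceback points into `makeKey_14` in vl5x.py: in Python 3, `base64.b64encode` only accepts bytes-like input, but the function hands it a `str`. That would also explain why only some cookies fail, since `getvjkl5` appears to dispatch to a different `makeKey_*` function depending on the cookie value, and only some of them hit this code path. A minimal sketch of a possible fix, assuming the function just needs the base64 text back as a `str` (the rest of the original function body is not shown here):

```python
import base64

def makeKey_14(s):
    # s is the vjkl5 cookie string. Encode the concatenated slices to bytes
    # first, because Python 3's base64.b64encode rejects str input.
    b = base64.b64encode((s[1:] + s[5:] + s[1:4]).encode('utf-8'))
    return b.decode('utf-8')  # assumption: callers expect a str, as in Python 2
```

Any other `makeKey_*` helper that calls `b64encode` on a `str` would need the same treatment.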

@easyfast
Author

The test code is as follows:
```python
# -*- coding: utf-8 -*-
import scrapy
import datetime

from wenshu.utils.vl5x import getvjkl5


class wenshu(scrapy.Spider):
    name = "wenshu"
    start_urls = ['http://wenshu.court.gov.cn/list/list/?sorttype=1']

    # def start_requests(self):
    #     url = 'http://wenshu.court.gov.cn/list/list/?sorttype=1'
    #     yield scrapy.Request(url, callback=self.parse)

    def parse(self, response):
        # take the vjkl5 value out of the Set-Cookie header
        cookie = response.headers['Set-Cookie'].decode().split(';')[0][6:]
        print(cookie)
        # print(response.body)
        vjkl5 = getvjkl5(cookie)
        print(vjkl5)

        # case filter parameters
        data = {'Param': u'裁判日期:2017-12-06 TO 2017-12-06', 'Index': '1', 'Page': '20',
                'Order': u'法院层级', 'Direction': 'asc', 'vl5x': vjkl5}
        yield scrapy.FormRequest('http://wenshu.court.gov.cn/List/ListContent',
                                 headers={'Cookie': cookie}, callback=self.testlist,
                                 formdata=data,
                                 meta={'cookie': cookie, 'vjkl5': vjkl5})

    def testlist(self, response):
        # print(response.body)
        filename = 'mingyan.html'
        with open(filename, 'wb') as f:     # plain Python file handling
            f.write(response.body)          # response.body is the page just downloaded
        self.log('保存文件: %s' % filename)
```
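
To rule Scrapy out, it may help to call `getvjkl5` directly with cookie values captured from failing runs (the sample values below are made up):

```python
from wenshu.utils.vl5x import getvjkl5

# hypothetical vjkl5 cookie values logged by the spider above
samples = ['a828304b8e13e34d45c54fb54297e928', '4f0c9702d2a5e41e2f3b8a17c6d90b1e']
for s in samples:
    print(s, '->', getvjkl5(s))
```

If some values raise the same `TypeError` while others succeed, the bug is in vl5x.py rather than in the spider.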

@zc3945
Owner

zc3945 commented Dec 20, 2018


The code you posted works fine in my tests. If it only fails sometimes, try lowering the request rate.
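
If rate is the problem, Scrapy's built-in throttling settings are the usual way to slow down; for example, in the project's settings.py:

```python
# settings.py: slow the crawl down (these values are just a starting point)
DOWNLOAD_DELAY = 3                  # seconds to wait between requests
CONCURRENT_REQUESTS_PER_DOMAIN = 1  # one in-flight request per domain
AUTOTHROTTLE_ENABLED = True         # adapt the delay to observed latencies
```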

@ghost

ghost commented Dec 26, 2018

Is this still running correctly?

@chengguojun

```python
# compute the page count
page_count = int(result[0]['Count']) // 10 if int(result[0]['Count']) % 10 == 0 \
    else int(result[0]['Count']) // 10 + 1
if int(Index) < page_count:
    data = {'Param': Param, 'Index': str(Index + 1), 'Page': '10',
            'Order': u'法院层级', 'Direction': 'asc', 'vl5x': vjkl5,
            'number': number, 'guid': guid}
    yield scrapy.FormRequest('http://wenshu.court.gov.cn/List/ListContent',
                             headers=headers, callback=self.get_doc_list,
                             formdata=data,
                             meta={'cookie': cookie, 'vjkl5': vjkl5, 'Param': Param,
                                   'number': number, 'Index': str(Index + 1), 'guid': guid})
```
Hi, I tested your code. Parts of it are very well written, but the pagination is broken, so I suggest using the modified version above. Also, the recursive call already loops; stacking a for loop on top of it for pagination would hammer the site to death.
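
Incidentally, the page-count expression above is just a ceiling division, which can be written more directly (assuming 10 results per page):

```python
import math

count = int(result[0]['Count'])     # total result count from the list response
page_count = math.ceil(count / 10)  # number of 10-item pages
```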
