This repository was archived by the owner on Aug 25, 2022. It is now read-only.

Crawler failure #70

@xhongyi

Description

2018-07-04 04:16:17,288 net.py :72 ERROR Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/bcloud/net.py", line 70, in urlopen_simple
    return urllib.request.urlopen(req)
  File "/usr/lib/python3.5/urllib/request.py", line 163, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.5/urllib/request.py", line 472, in open
    response = meth(req, response)
  File "/usr/lib/python3.5/urllib/request.py", line 582, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python3.5/urllib/request.py", line 510, in error
    return self._call_chain(*args)
  File "/usr/lib/python3.5/urllib/request.py", line 444, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.5/urllib/request.py", line 590, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden

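My rough guess is that the request fails to pass itself off as the Baidu client?
I tried adding a Firefox User-Agent header to disguise it, but that didn't work either: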
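import urllib.request

def urlopen_simple(url):  # modified bcloud/net.py function (signature simplified)
    # Spoof a desktop Firefox User-Agent; the server still answered 403 Forbidden
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) Gecko/20100101 Firefox/23.0'}
    req = urllib.request.Request(url=url, headers=headers)
    return urllib.request.urlopen(req)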
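When changing the User-Agent alone doesn't clear a 403, it can help to look at what the server actually sent back instead of only the traceback. A minimal sketch using only the standard library: `fetch_with_diagnostics` is a hypothetical helper (not part of bcloud) that catches `urllib.error.HTTPError`, which doubles as a response object, and returns the status code, reason, and the start of the error body.

```python
import urllib.error
import urllib.request


def fetch_with_diagnostics(url, headers=None, timeout=10):
    """Hypothetical helper: open url and return (status, reason, first bytes of body).

    On an HTTP error (e.g. 403) the error is caught and the server's error
    response is returned instead of being raised.
    """
    req = urllib.request.Request(url, headers=headers or {})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status, resp.reason, resp.read(200)
    except urllib.error.HTTPError as e:
        # HTTPError wraps the server's response, so its body can be read
        # to see why the request was rejected.
        return e.code, e.reason, e.read(200)
```

Calling it with the same URL and headers that bcloud uses shows whether the 403 page says anything useful (for example, a login or verification requirement) before guessing at further header changes.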
