Commit 573bfe0

Merge pull request #40 from D4Vinci/dev
v0.2.95
2 parents 5251abb + 1dc79d4 commit 573bfe0

5 files changed, 17 additions and 13 deletions

README.md

Lines changed: 2 additions & 0 deletions
@@ -220,6 +220,8 @@ This class is built on top of [httpx](https://www.python-httpx.org/) with additi
 
 For all methods, you have `stealthy_headers` which makes `Fetcher` create and use real browser's headers then create a referer header as if this request came from Google's search of this URL's domain. It's enabled by default. You can also set the number of retries with the argument `retries` for all methods and this will make httpx retry requests if it failed for any reason. The default number of retries for all `Fetcher` methods is 3.
 
+> Hence: All headers generated by `stealthy_headers` argument can be overwritten by you through the `headers` argument
+
 You can route all traffic (HTTP and HTTPS) to a proxy for any of these methods in this format `http://username:password@localhost:8030`
 ```python
 >> page = Fetcher().get('https://httpbin.org/get', stealthy_headers=True, follow_redirects=True)

scrapling/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@
 from scrapling.parser import Adaptor, Adaptors
 
 __author__ = "Karim Shoair ([email protected])"
-__version__ = "0.2.94"
+__version__ = "0.2.95"
 __copyright__ = "Copyright (c) 2024 Karim Shoair"
 
 
scrapling/engines/static.py

Lines changed: 9 additions & 6 deletions
@@ -42,16 +42,19 @@ def _headers_job(self, headers: Optional[Dict]) -> Dict:
         :return: A dictionary of the new headers.
         """
         headers = headers or {}
-
-        # Validate headers
-        if not headers.get('user-agent') and not headers.get('User-Agent'):
-            headers['User-Agent'] = generate_headers(browser_mode=False).get('User-Agent')
-            log.debug(f"Can't find useragent in headers so '{headers['User-Agent']}' was used.")
+        headers_keys = set(map(str.lower, headers.keys()))
 
         if self.stealth:
             extra_headers = generate_headers(browser_mode=False)
+            # Don't overwrite user supplied headers
+            extra_headers = {key: value for key, value in extra_headers.items() if key.lower() not in headers_keys}
             headers.update(extra_headers)
-            headers.update({'referer': generate_convincing_referer(self.url)})
+            if 'referer' not in headers_keys:
+                headers.update({'referer': generate_convincing_referer(self.url)})
+
+        elif 'user-agent' not in headers_keys:
+            headers['User-Agent'] = generate_headers(browser_mode=False).get('User-Agent')
+            log.debug(f"Can't find useragent in headers so '{headers['User-Agent']}' was used.")
 
         return headers
 
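
The effect of this hunk can be sketched outside the library. The snippet below is a minimal, standalone approximation of the new merging rule, not Scrapling's actual code: `fake_generate_headers` and `fake_referer` are hypothetical stand-ins for `generate_headers` and `generate_convincing_referer`, and `merge_headers` is an illustrative name. Unlike the method in the diff, it copies the input dictionary instead of updating it in place.

```python
from typing import Dict, Optional


def fake_generate_headers() -> Dict[str, str]:
    """Hypothetical stand-in for Scrapling's generate_headers()."""
    return {
        "User-Agent": "ExampleBrowser/1.0",
        "Accept": "text/html",
        "Accept-Language": "en-US",
    }


def fake_referer(url: str) -> str:
    """Hypothetical stand-in for generate_convincing_referer()."""
    return "https://www.google.com/search?q=example"


def merge_headers(headers: Optional[Dict[str, str]], url: str, stealth: bool) -> Dict[str, str]:
    # Assumption: copy the caller's dict; the diff mutates it in place.
    merged = dict(headers or {})
    # Compare names case-insensitively, as the new code does.
    user_keys = {key.lower() for key in merged}

    if stealth:
        # Add generated headers only where the user supplied none.
        generated = fake_generate_headers()
        merged.update({k: v for k, v in generated.items() if k.lower() not in user_keys})
        if "referer" not in user_keys:
            merged["referer"] = fake_referer(url)
    elif "user-agent" not in user_keys:
        # Without stealth, only a missing User-Agent is filled in.
        merged["User-Agent"] = fake_generate_headers()["User-Agent"]

    return merged


custom = {"user-agent": "my-bot/2.0", "Referer": "https://example.com/"}
stealthy = merge_headers(custom, "https://httpbin.org/get", stealth=True)
plain = merge_headers(None, "https://httpbin.org/get", stealth=False)
```

Here `stealthy` keeps the caller's `user-agent` and `Referer` untouched and only gains the generated `Accept` and `Accept-Language` entries, while `plain` contains nothing but the fallback `User-Agent`. Under the removed code, the generated `User-Agent` and `referer` would have been added alongside or over the caller's values whenever stealth mode was on.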

setup.cfg

Lines changed: 2 additions & 2 deletions
@@ -1,8 +1,8 @@
 [metadata]
 name = scrapling
-version = 0.2.94
+version = 0.2.95
 author = Karim Shoair
 author_email = [email protected]
-description = Scrapling is an undetectable, powerful, flexible, adaptive, and high-performance web scraping library for Python.
+description = Scrapling is an undetectable, powerful, flexible, high-performance Python library that makes Web Scraping easy again!
 license = BSD
 home_page = https://github.com/D4Vinci/Scrapling

setup.py

Lines changed: 3 additions & 4 deletions
@@ -6,10 +6,9 @@
 
 setup(
     name="scrapling",
-    version="0.2.94",
-    description="""Scrapling is a powerful, flexible, and high-performance web scraping library for Python. It
-simplifies the process of extracting data from websites, even when they undergo structural changes, and offers
-impressive speed improvements over many popular scraping tools.""",
+    version="0.2.95",
+    description="""Scrapling is an undetectable, powerful, flexible, high-performance Python library that makes Web Scraping easy again! In an internet filled with complications,
+it simplifies web scraping, even when websites' design changes, while providing impressive speed that surpasses almost all alternatives.""",
     long_description=long_description,
     long_description_content_type="text/markdown",
     author="Karim Shoair",
