
Batch Request or Async Support #15

@fasihhussain00

Description


I've been using this library for over four months and have run into a performance issue caused by its synchronous network requests. I need to decode thousands of URLs, and because batch execution is not working, I have to decode them one at a time. Each call blocks while it waits on the network, so the CPU-intensive work in my application stalls behind these requests. To work around this, I used ThreadPoolExecutor to run the decoding calls in parallel. However, since the main overhead is network latency rather than processing, a more efficient approach would be to use aiohttp for asynchronous network calls. That would let many decoding operations run concurrently without blocking CPU-bound tasks, which should improve throughput significantly.
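A minimal sketch of the ThreadPoolExecutor workaround described above. `decode_one` is a hypothetical stand-in for the library's synchronous decode call (its real name and signature are not given here); it simulates network latency with `time.sleep` so the sketch runs without network access.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def decode_one(url: str) -> str:
    """Hypothetical stand-in for the library's synchronous decode call.

    The sleep simulates network latency; a real call would block on I/O here.
    """
    time.sleep(0.1)
    return f"decoded:{url}"


def decode_all(urls: list[str], max_workers: int = 16) -> list[str]:
    # Each worker thread spends most of its time blocked on (simulated) I/O,
    # so many requests overlap even though each individual call is synchronous.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, not completion order.
        return list(pool.map(decode_one, urls))


if __name__ == "__main__":
    urls = [f"https://example.com/{i}" for i in range(20)]
    start = time.perf_counter()
    results = decode_all(urls)
    print(len(results), f"{time.perf_counter() - start:.2f}s")
```

This works, but every in-flight request still occupies a whole OS thread, which is the overhead the aiohttp suggestion aims to avoid.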

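For example, a single aiohttp request through a proxy would look like this:

import aiohttp

# Hypothetical helper: POST through a proxy and return the response body.
async def post_text(session: aiohttp.ClientSession, url: str, headers: dict) -> str:
    proxy = "http://user:pass@localhost:80"  # placeholder proxy credentials
    async with session.post(url, headers=headers, proxy=proxy) as response:
        response.raise_for_status()
        return await response.text()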

Moreover, using aiohttp would remove the need to spawn multiple threads just to run the scraping jobs quickly: a single event loop can keep many requests in flight at once. That would reduce the application's complexity and improve its overall performance.
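A minimal sketch of the single-event-loop approach, applied to a batch of URLs with a cap on concurrent requests via `asyncio.Semaphore`. `fetch` and `fetch_all` are hypothetical names; `fetch` simulates latency with `asyncio.sleep` so the sketch runs without network access, and in a real client its body would be the `session.post(...)` request shown above.

```python
import asyncio


async def fetch(url: str) -> str:
    """Hypothetical stand-in for an aiohttp request.

    In a real client this body would be the `async with session.post(...)`
    block; asyncio.sleep simulates waiting on the network without blocking
    the event loop.
    """
    await asyncio.sleep(0.1)
    return f"decoded:{url}"


async def fetch_all(urls: list[str], limit: int = 10) -> list[str]:
    # The semaphore caps how many requests are in flight at once, so a batch
    # of thousands of URLs does not open thousands of connections together.
    sem = asyncio.Semaphore(limit)

    async def bounded(url: str) -> str:
        async with sem:
            return await fetch(url)

    # gather() runs every coroutine on one event loop in a single thread and
    # returns results in the same order as the input URLs.
    return await asyncio.gather(*(bounded(u) for u in urls))


if __name__ == "__main__":
    urls = [f"https://example.com/{i}" for i in range(50)]
    results = asyncio.run(fetch_all(urls, limit=10))
    print(len(results))  # 50
```

With a limit of 10 and roughly 0.1s per simulated request, 50 URLs finish in about five waiting rounds instead of fifty, all without creating extra threads.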

It would be great if the library switched to aiohttp for its network requests, or at least offered it as an option.
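One possible shape for such an option, sketched under assumptions: an async core that existing synchronous functions wrap with `asyncio.run`. All names here (`decode_async`, `decode_many_async`, `decode`, `decode_many`) are hypothetical and are not the library's current API; the `asyncio.sleep` call stands in for an aiohttp request.

```python
import asyncio


async def decode_async(url: str) -> str:
    """Hypothetical async core the library could expose.

    A real implementation would issue the request with aiohttp; the sleep
    stands in for that network wait.
    """
    await asyncio.sleep(0.05)
    return f"decoded:{url}"


async def decode_many_async(urls: list[str]) -> list[str]:
    # Batch entry point: all requests share one event loop.
    return await asyncio.gather(*(decode_async(u) for u in urls))


def decode(url: str) -> str:
    # Synchronous wrapper kept for existing callers. asyncio.run() starts a
    # fresh event loop, so it must not be called from inside a running loop.
    return asyncio.run(decode_async(url))


def decode_many(urls: list[str]) -> list[str]:
    # Synchronous batch wrapper; one event loop serves the whole batch.
    return asyncio.run(decode_many_async(urls))
```

For example, `decode("https://example.com/a")` returns `"decoded:https://example.com/a"`. Callers already running an event loop would await `decode_many_async` directly, while scripts could keep calling the synchronous functions.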

Looking forward to hearing from you guys.

Metadata


Assignees

No one assigned

    Labels

    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
