Python async library for web scraping

Installing

pip install aioscrapy

Usage

Plain text scraping

import asyncio
import json

from aioscrapy import Client, WebTextClient, SingleSessionPool, Dispatcher, SimpleWorker


class CustomClient(Client[str, dict]):
    """Fetches a URL as text and decodes the body as JSON."""

    def __init__(self, client: WebTextClient):
        self._client = client

    async def fetch(self, key: str) -> dict:
        # key is the URL handed out by the dispatcher
        data = await self._client.fetch(key)
        return json.loads(data)


async def main():
    pool = SingleSessionPool()
    dispatcher = Dispatcher(['https://httpbin.org/get'])  # URLs to scrape
    client = CustomClient(WebTextClient(pool))
    worker = SimpleWorker(dispatcher, client)

    result = await worker.run()
    return result


# asyncio.run() creates, runs and closes the event loop (Python 3.7+)
print(asyncio.run(main()))
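The example relies on composition: `CustomClient` wraps a `WebTextClient` and post-processes its output, and the worker collects one result per URL. The sketch below illustrates that pattern with the standard library only, so it runs without network access or aioscrapy installed. `StubTextClient`, `JsonClient` and `run_all` are hypothetical names; `run_all` is an assumption about how a worker could fan out requests with `asyncio.gather`, not aioscrapy's actual implementation.

```python
import asyncio
import json
from typing import Dict, List, Protocol


class TextFetcher(Protocol):
    # Hypothetical interface: anything with an async fetch(key) -> str
    async def fetch(self, key: str) -> str: ...


class StubTextClient:
    """Hypothetical stand-in for WebTextClient: returns canned text instead of doing HTTP."""

    def __init__(self, responses: Dict[str, str]):
        self._responses = responses

    async def fetch(self, key: str) -> str:
        await asyncio.sleep(0)  # yield to the event loop, as a real network call would
        return self._responses[key]


class JsonClient:
    """Wraps a text client and decodes each response as JSON, like CustomClient above."""

    def __init__(self, client: TextFetcher):
        self._client = client

    async def fetch(self, key: str) -> dict:
        data = await self._client.fetch(key)
        return json.loads(data)


async def run_all(keys: List[str], client: JsonClient) -> Dict[str, dict]:
    # Hypothetical worker: fetch every key concurrently and map key -> result
    results = await asyncio.gather(*(client.fetch(k) for k in keys))
    return dict(zip(keys, results))


if __name__ == "__main__":
    urls = ["https://example.test/a", "https://example.test/b"]
    stub = StubTextClient({urls[0]: '{"id": 1}', urls[1]: '{"id": 2}'})
    print(asyncio.run(run_all(urls, JsonClient(stub))))
```

Swapping the stub for a real text client leaves `JsonClient` unchanged, which is the point of keeping parsing in a separate wrapper.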

Byte content downloading

import asyncio

from aioscrapy import Client, WebByteClient, SingleSessionPool, Dispatcher, SimpleWorker


class CustomClient(Client[str, bytes]):
    """Fetches a URL and returns the raw response body."""

    def __init__(self, client: WebByteClient):
        self._client = client

    async def fetch(self, key: str) -> bytes:
        # No decoding: the bytes are passed through unchanged
        return await self._client.fetch(key)


async def main():
    pool = SingleSessionPool()
    dispatcher = Dispatcher(['https://httpbin.org/image'])  # URLs to download
    client = CustomClient(WebByteClient(pool))
    worker = SimpleWorker(dispatcher, client)

    result = await worker.run()
    return result


# The result maps each URL to its downloaded bytes
data: dict = asyncio.run(main())
for url, byte_content in data.items():
    print(f"{url}: {len(byte_content)} bytes")
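The loop above only reports sizes. To keep the downloaded content, one option is to write each payload to disk. This is a minimal sketch assuming `data` is the `dict` mapping URL to `bytes` shown above; `save_downloads` and its file-naming scheme are hypothetical and not part of aioscrapy.

```python
import hashlib
from pathlib import Path
from typing import Dict, List
from urllib.parse import urlparse


def save_downloads(data: Dict[str, bytes], directory: str) -> List[Path]:
    """Hypothetical helper: write each URL's bytes to its own file in `directory`."""
    out_dir = Path(directory)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for url, content in data.items():
        # Last path segment as a readable name ("index" for bare hosts), prefixed
        # with a short hash of the full URL so different URLs don't collide.
        name = Path(urlparse(url).path).name or "index"
        digest = hashlib.sha256(url.encode("utf-8")).hexdigest()[:8]
        path = out_dir / f"{digest}-{name}"
        path.write_bytes(content)
        written.append(path)
    return written
```

Usage: `save_downloads(data, "downloads")` after the loop writes `https://httpbin.org/image` to a file ending in `-image`.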

