Python asynchronous library for web scraping

james-dow/aioscrapy

This branch is 18 commits behind eugen1j/aioscrapy:master.

Latest commit: 0c7f5fc · Jun 24, 2019 (20 commits)


Python async library for web scraping

License: MIT

Installing

pip install aioscrapy

Usage

Plain text scraping

import asyncio
import json

from aioscrapy import Client, WebTextClient, SingleSessionPool, Dispatcher, SimpleWorker


class CustomClient(Client[str, dict]):
    def __init__(self, client: WebTextClient):
        self._client = client

    async def fetch(self, key: str) -> dict:
        data = await self._client.fetch(key)
        return json.loads(data)


async def main():
    pool = SingleSessionPool()
    dispatcher = Dispatcher(['https://httpbin.org/get'])
    client = CustomClient(WebTextClient(pool))
    worker = SimpleWorker(dispatcher, client)

    result = await worker.run()
    return result

# asyncio.run creates the event loop, runs main() to completion, and closes the loop.
print(asyncio.run(main()))
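
The `fetch` method above passes the response body straight to `json.loads`, which raises `json.JSONDecodeError` when a page does not return valid JSON. Below is a minimal sketch of a more tolerant decoding step, using only the standard library. The helper name `parse_json_or_default` is hypothetical and not part of aioscrapy.

```python
import json
from typing import Any, Optional


def parse_json_or_default(text: str, default: Optional[Any] = None) -> Any:
    """Decode a JSON document, returning `default` if the text is not valid JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Hypothetical policy: swallow the decode error and fall back to `default`.
        return default
```

Inside `CustomClient.fetch`, `return parse_json_or_default(data, default={})` would return an empty dict for a malformed body instead of raising. Whether that is preferable to surfacing the error depends on the scraper.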

Byte content downloading

import asyncio

from aioscrapy import Client, WebByteClient, SingleSessionPool, Dispatcher, SimpleWorker


class CustomClient(Client[str, bytes]):
    def __init__(self, client: WebByteClient):
        self._client = client

    async def fetch(self, key: str) -> bytes:
        data = await self._client.fetch(key)
        return data


async def main():
    pool = SingleSessionPool()
    dispatcher = Dispatcher(['https://httpbin.org/image'])
    client = CustomClient(WebByteClient(pool))
    worker = SimpleWorker(dispatcher, client)

    result = await worker.run()
    return result

# asyncio.run creates the event loop, runs main() to completion, and closes the loop.
data: dict = asyncio.run(main())
for url, byte_content in data.items():
    print(f"{url}: {len(byte_content)} bytes")
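
The loop above only reports the size of each download. Below is a sketch of writing the results to disk instead, assuming (as that loop implies) that the result is a dict mapping each URL to its bytes. The function `save_results` and its file-naming scheme are hypothetical and not part of aioscrapy.

```python
import hashlib
from pathlib import Path
from typing import Dict


def save_results(data: Dict[str, bytes], out_dir: str) -> Dict[str, Path]:
    """Write each URL's bytes to its own file and return a URL -> path mapping."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    saved: Dict[str, Path] = {}
    for url, content in data.items():
        # Hash the URL so the file name is filesystem-safe and stable across runs.
        name = hashlib.sha256(url.encode("utf-8")).hexdigest()[:16] + ".bin"
        path = target / name
        path.write_bytes(content)
        saved[url] = path
    return saved
```

With the example above, `save_results(data, "downloads")` would create one `.bin` file per URL in a `downloads` directory.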
