
jjonescz/awe


AI-based web extractor

This repository contains the source code of an AI-based structured web data extractor.

Directory structure

  • 📂 awe/: Python module (data manipulation and machine learning). See awe/README.md.
  • 📂 js/: Node.js app (visual attribute extractor and inference demo). See js/README.md.
  • 📂 docs/
    • 📂 dev/
      • 📄 env.md: development environment setup.
      • 📄 tips.md: development guidelines and bash snippets.
    • 📄 data.md: dataset preparation.
    • 📄 extractor.md: running the visual extractor.
    • 📄 train.md: training instructions.
    • 📄 release.md: release instructions.
    • 📂 demo/
      • 📄 run.md: developing and running the demo.
      • 📄 deploy.md: online demo deployment.

Quickstart

Running the pre-trained demo locally

docker pull janjones/awe-demo
docker run --rm -it -p 3000:3000 janjones/awe-demo

Open a web browser and navigate to http://localhost:3000/.

For more details, see docs/demo/run.md.
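
The container may need a moment before the demo answers on port 3000. As a minimal sketch, the helper below polls a URL with curl until it responds successfully. The function name wait_for_url and its retry parameters are assumptions made for illustration; they are not part of this repository.

```shell
# wait_for_url URL [ATTEMPTS] [DELAY_SECONDS]
# Hypothetical helper (not part of the awe repository).
# Returns 0 as soon as URL answers with a successful HTTP status,
# or 1 after ATTEMPTS failed tries.
wait_for_url() {
  _url="$1"
  _attempts="${2:-30}"
  _delay="${3:-1}"
  _i=1
  while [ "$_i" -le "$_attempts" ]; do
    # -f: treat HTTP errors as failures, -s: silent,
    # -o /dev/null: discard the response body.
    if curl -fs -o /dev/null --max-time 5 "$_url"; then
      return 0
    fi
    sleep "$_delay"
    _i=$((_i + 1))
  done
  return 1
}

# Usage, after starting the container in another terminal:
#   wait_for_url http://localhost:3000/ && echo "demo is up"
```

The sketch only checks that the port answers; it does not verify that the model has finished loading.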

Training on the SWDE dataset

docker pull janjones/awe-gradient
docker run --rm -it -v awe:/storage -p 3000:3000 janjones/awe-gradient bash

Then, run inside the Docker container:

git clone https://github.com/jjonescz/awe .
git clone https://github.com/jjonescz/swde-visual data/swde
python -m awe.training.params
python -m awe.training.train
# Model is trained, now you can run the demo.
cd js
pnpm install
DEBUG=1 pnpm run server

For more details, see

  1. docs/dev/env.md,
  2. docs/data.md,
  3. docs/train.md, and
  4. docs/demo/run.md.
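
The training commands above expect the SWDE dataset to be cloned into data/swde. A minimal pre-flight sketch, assuming that an existing, non-empty directory is a good enough sign that the clone succeeded, could look as follows. The function name dataset_ready is hypothetical and not part of this repository.

```shell
# dataset_ready [DIR]
# Hypothetical helper (not part of the awe repository).
# Returns 0 if DIR (default: data/swde) exists and contains at least
# one entry, and a non-zero status otherwise.
dataset_ready() {
  _dir="${1:-data/swde}"
  [ -d "$_dir" ] && [ -n "$(ls -A "$_dir" 2>/dev/null)" ]
}

# Usage inside the container, before training:
#   dataset_ready data/swde && python -m awe.training.train
```

This check does not validate the dataset contents; see docs/data.md for dataset preparation.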

Examples

Generated by the live demo.

E-shop 1

E-shop 2