
REST API built with TypeScript, Node/Express, and Cheerio. Lets users extract data from HTML files using CSS selectors provided as JSON.


Ramzi-Abidi/HTML-Parser


Web Scraper API with Node and Express

A REST API built with TypeScript, Node/Express, and Cheerio. It lets users extract data from an HTML file using CSS selectors supplied as JSON. The scraper reads a local HTML file, processes it, and returns the scraped data as a JSON response.

Getting Started

These instructions will help you set up and run the project locally for development and testing purposes.

Prerequisites

  • Node.js
  • npm

Installation

  1. Clone the repository:

    git clone https://github.com/Ramzi-Abidi/HTML-Parser.git
  2. Install dependencies:

    cd HTML-Parser
    npm install
  3. Build, run, and test:

    npm run build   # build
    npm run dev     # start
    npm test        # test

Endpoints

There is one endpoint.

  1. POST /extract: scrapes data from the index.html file.

It accepts a JSON body that maps each output key to a CSS selector:

 // Body
 {
     "title": "body h1:first-child",
     "prices": "tr > td:first-child"
 }

 // Response:
 {
     "title": "This is the title of the page",
     "prices": [
         "Item",
         "Awesome USB mouse",
         "Vintage PS/2 mouse"
     ]
 }
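
A value can also be an object. In the example below, "__root" selects the repeating elements (the table rows), and the remaining keys are selectors applied within each of them:

 // Body
 {
     "title": "h1:first-child",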
     "prices": {
         "__root": "html body table tr",
         "itemName": "td:nth-child(1)",
         "price": "td:nth-child(2)"
     }
 }

 // Response:
 {
     "title": "This is the title of the page",
     "prices": [
         {
             "itemName": "Item",
             "price": "Price"
         },
         {
             "itemName": "Awesome USB mouse",
             "price": "3.20"
         },
         {
             "itemName": "Vintage PS/2 mouse",
             "price": "6.60"
         }
     ]
 }
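
The two request shapes above (a plain selector string, or an object with a "__root" selector plus per-field selectors) can be described with a few TypeScript types. The following is a minimal sketch, not code from the repository: the names NestedSelector, SelectorSpec, ExtractBody, validateExtractBody, and isExtractBody are hypothetical, and the actual API may validate its input differently or not at all.

```typescript
// Hypothetical types for the /extract request body (illustrative names,
// not taken from the repository).
type NestedSelector = { __root: string } & Record<string, string>;
type SelectorSpec = string | NestedSelector;
type ExtractBody = Record<string, SelectorSpec>;

// Returns a list of human-readable problems. An empty list means the body
// matches one of the two shapes shown in the examples above.
function validateExtractBody(body: unknown): string[] {
  if (typeof body !== "object" || body === null || Array.isArray(body)) {
    return ["body must be a JSON object"];
  }
  const errors: string[] = [];
  for (const [key, spec] of Object.entries(body as Record<string, unknown>)) {
    // Shape 1: the value is a plain CSS selector string.
    if (typeof spec === "string") {
      if (spec.trim() === "") errors.push(`${key}: selector is empty`);
      continue;
    }
    if (typeof spec !== "object" || spec === null || Array.isArray(spec)) {
      errors.push(`${key}: expected a selector string or an object`);
      continue;
    }
    // Shape 2: an object with "__root" and one selector per field.
    const nested = spec as Record<string, unknown>;
    if (typeof nested.__root !== "string") {
      errors.push(`${key}: nested object needs a "__root" selector`);
    }
    for (const [field, selector] of Object.entries(nested)) {
      if (typeof selector !== "string") {
        errors.push(`${key}.${field}: selector must be a string`);
      }
    }
  }
  return errors;
}

// Type guard built on the validator above.
function isExtractBody(body: unknown): body is ExtractBody {
  return validateExtractBody(body).length === 0;
}
```

A check like this could run before the selectors are handed to Cheerio, so that a malformed body is rejected with a clear message instead of producing an empty result.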

Demo

2024-10-20.19-47-44.mp4

To let users upload an HTML file with the request, switch to the feature branch:

git checkout feat/upload-html-file
npm i
npm run dev

This branch adds handling for HTML file uploads and validates the file type.

Install the dependencies (npm i) after switching branches and before running the app.
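
As an illustration of what file type validation might look like, here is a minimal sketch. It is an assumption about the approach, not the branch's actual code: the helper name looksLikeHtmlUpload and the rule it applies (an .html or .htm extension together with a text/html MIME type) are hypothetical.

```typescript
// Hypothetical helper: decides whether an uploaded file looks like HTML,
// based on its filename and the MIME type reported by the client.
const HTML_EXTENSIONS = [".html", ".htm"];

function looksLikeHtmlUpload(filename: string, mimeType: string): boolean {
  const lowerName = filename.toLowerCase();
  const hasHtmlExtension = HTML_EXTENSIONS.some((ext) => lowerName.endsWith(ext));
  // Drop parameters such as "; charset=utf-8" before comparing.
  const baseType = (mimeType.split(";")[0] ?? "").trim().toLowerCase();
  return hasHtmlExtension && baseType === "text/html";
}
```

Both values come from the client and can be spoofed, so a check like this only filters out obvious mistakes; it does not prove that the content is HTML.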

Demo - HTML file upload support

2024-10-21.07-30-36.mp4
