A REST API built with TypeScript, Node.js/Express, and Cheerio. The API lets users extract data from an HTML file using CSS selectors provided as JSON. The scraper reads a local HTML file, processes it, and returns the scraped data as a JSON response.
These instructions will help you set up and run the project locally for development and testing purposes.
- Node.js
- npm
- Clone the repository:
  git clone https://github.com/Ramzi-Abidi/HTML-Parser.git
- Install dependencies:
  cd HTML-Parser
  npm install
- Build, start, and test:
  Build: npm run build
  Start: npm run dev
  Test: npm test
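The API exposes a single endpoint.
POST /extract
Scrapes data from the local index.html file.
Accepts a JSON body that maps each output key to a CSS selector. In the example below, a nested object with a __root selector produces an array with one entry per element matched by __root: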
// Body
{
  "title": "h1:first-child",
  "prices": {
    "__root": "html body table tr",
    "itemName": "td:nth-child(1)",
    "price": "td:nth-child(2)"
  }
}
// Response:
{
  "title": "This is the title of the page",
  "prices": [
    {
      "itemName": "Item",
      "price": "Price"
    },
    {
      "itemName": "Awesome USB mouse",
      "price": "3.20"
    },
    {
      "itemName": "Vintage PS/2 mouse",
      "price": "6.60"
    }
  ]
}
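For callers written in TypeScript, the request body above can be described with a small type and checked before it is sent. This is only an illustrative sketch: the names `ExtractSpec`, `NestedSpec`, `isExtractSpec`, and `buildExtractRequest` are hypothetical and are not part of this repository.

```typescript
// Hypothetical types describing the request body shown above.
// A string value is a CSS selector. A nested object carries a "__root"
// selector plus field selectors that are resolved for each matched root.
type NestedSpec = { __root: string } & Record<string, string>;
type ExtractSpec = Record<string, string | NestedSpec>;

// Runtime check that an unknown value has the shape of an ExtractSpec.
function isExtractSpec(value: unknown): value is ExtractSpec {
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    return false;
  }
  return Object.values(value).every((entry: unknown) => {
    if (typeof entry === "string") return true;
    if (typeof entry !== "object" || entry === null || Array.isArray(entry)) {
      return false;
    }
    const nested = entry as Record<string, unknown>;
    return (
      typeof nested["__root"] === "string" &&
      Object.values(nested).every((v) => typeof v === "string")
    );
  });
}

// Builds the options for a POST /extract call (method, headers, JSON body).
function buildExtractRequest(spec: ExtractSpec): {
  method: string;
  headers: Record<string, string>;
  body: string;
} {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(spec),
  };
}
```

The returned object can be passed as the second argument to `fetch`, with the first argument being the server address followed by `/extract`. The host and port depend on how the dev server is configured and are not stated here.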
To let users upload an HTML file with the request, switch to the feature branch:
git checkout feat/upload-html-file
npm i
npm run dev
This branch includes the necessary updates to handle HTML file uploads and validate the file type.
Make sure to install any required dependencies before running the app.
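The branch's validation code is not reproduced here. As a rough illustration of what checking an uploaded file's type can involve, the sketch below accepts a file only when both its name and its reported MIME type look like HTML. The helper name `isHtmlUpload` and the accepted values are assumptions, not the branch's actual implementation.

```typescript
// Hypothetical helper: not taken from the feat/upload-html-file branch.
const ALLOWED_EXTENSIONS = [".html", ".htm"];

function isHtmlUpload(filename: string, mimeType: string): boolean {
  const lowerName = filename.toLowerCase();
  const hasHtmlExtension = ALLOWED_EXTENSIONS.some((ext) =>
    lowerName.endsWith(ext)
  );
  // Drop parameters such as "; charset=utf-8" before comparing.
  const [rawType = ""] = mimeType.split(";");
  const baseType = rawType.trim().toLowerCase();
  return hasHtmlExtension && baseType === "text/html";
}
```

Both the file name and the MIME type are supplied by the client, so a check like this filters out mistakes rather than guaranteeing that the content is valid HTML.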