DevJobsScraper is a web scraping project designed to collect and process job listings from various online sources. The project aims to provide a comprehensive and up-to-date database of job openings, making it easier for job seekers to find relevant opportunities.
- Multi-source scraping: DevJobsScraper can extract job listings from multiple websites and online platforms.
- Data processing: The project includes data processing capabilities to clean, normalize, and store the scraped data (see the sketch after this list).
- Configurable: DevJobsScraper allows users to configure the scraping process, including specifying sources, filtering criteria, and output formats.
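As a rough illustration of the data-processing step, the sketch below shows how a raw scraped record could be cleaned into a consistent shape. The `JobListing` fields and the `normalize` helper are assumptions for illustration only and are not taken from the project's actual source.

```python
from dataclasses import dataclass

@dataclass
class JobListing:
    """Illustrative record shape; the project's real schema may differ."""
    title: str
    company: str
    location: str
    url: str

def normalize(raw: dict) -> JobListing:
    """Clean and normalize one raw scraped record (hypothetical example)."""
    return JobListing(
        title=raw.get("title", "").strip(),
        company=raw.get("company", "Unknown").strip(),
        location=raw.get("location", "Remote").strip(),
        url=raw.get("url", "").strip(),
    )

# Example usage with a made-up raw record:
print(normalize({"title": "  Backend Developer ", "company": "Acme", "url": "https://example.com/job/1"}))
```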
- Python 3.8+
- Node.js 14+
- npm or yarn
- Clone the repository:
git clone https://github.com/YounesBensafia/DevJobsScraper.git
- Navigate to the project directory:
cd DevJobsScraper
- Install Python dependencies:
pip install -r requirements.txt
- Install Node.js dependencies:
npm install or yarn install
- Configure the scraper by editing the config.py file (a hypothetical configuration is sketched below).
- Run the scraper using Python:
python main.py
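The actual option names in config.py are not documented here, so the following is only a hedged sketch of what a configuration might contain (sources, filters, and output settings); treat every key and URL as a placeholder.

```python
# config.py -- hypothetical example; the real option names in this project may differ.

# Sources to scrape (placeholder URLs)
SOURCES = [
    "https://example-job-board.com/search?q=developer",
]

# Filtering criteria applied to scraped listings
FILTERS = {
    "keywords": ["python", "backend"],
    "locations": ["Remote"],
}

# Output settings
OUTPUT_FORMAT = "json"          # or "csv"
OUTPUT_PATH = "data/jobs.json"  # where results are written
```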
The scraper will output the collected job listings in the specified format (e.g., JSON, CSV).
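For instance, assuming the scraper wrote JSON to data/jobs.json with a location field (both the path and the field name are assumptions), the output can be post-processed like this:

```python
import json

# Path and field names are illustrative assumptions, not guaranteed by the project.
with open("data/jobs.json", encoding="utf-8") as f:
    jobs = json.load(f)

# Example: keep only listings that mention remote work in their location.
remote_jobs = [job for job in jobs if "remote" in job.get("location", "").lower()]
print(f"{len(remote_jobs)} remote listings out of {len(jobs)} total")
```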
The project consists of the following directories and files:
- `src/`: Source code for the scraper and data processing components.
- `config/`: Configuration files for the scraper.
- `data/`: Sample data and output files.
- `requirements.txt`: Python dependencies.
- `package-lock.json`: Node.js dependencies.
- Python: Primary programming language for the scraper and data processing components.
- Node.js: Used for dependency management and potential future development.
- JavaScript: Used for client-side scripting (if applicable).
- HTML/CSS: Used for documentation and potential future development.
We welcome contributions to DevJobsScraper! If you'd like to contribute, please:
- Fork the repository.
- Create a new branch for your feature or bug fix.
- Submit a pull request with a clear description of your changes.