A robust and flexible web scraper for Forex Factory calendar events. This tool leverages Selenium and pandas to efficiently collect, update, and manage Forex Factory event data, supporting incremental scraping and optional detailed event information.
You can download the CSV data from Hugging Face.
- Incremental Scraping: Only fetch new or updated events based on existing CSV data.
- Detailed Event Information: Optionally scrape detailed specifications for each event.
- Flexible Date Range: Specify custom date ranges for scraping.
- Timezone Support: Configure the timezone according to your preference.
- Data Management with pandas: Efficiently handle data merging and updates using pandas.
- Error Handling: Robust handling of common web scraping issues like stale elements and timeouts.
- Command-Line Interface: Easy-to-use CLI with configurable parameters.
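The incremental behaviour listed above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the project's actual code: the column name DateTime, both helper names, and the rule for choosing the resume point are assumptions made for the example.

```python
import csv
import io
from datetime import datetime
from typing import Optional


def last_datetime_in_csv(csv_file, column: str = "DateTime") -> Optional[datetime]:
    """Return the latest ISO-formatted timestamp found in `column`, or None.

    `column` is a hypothetical name; the real CSV may label it differently.
    """
    latest = None
    for row in csv.DictReader(csv_file):
        value = (row.get(column) or "").strip()
        if not value:
            continue  # skip rows without a timestamp
        parsed = datetime.fromisoformat(value)
        if latest is None or parsed > latest:
            latest = parsed
    return latest


def resume_start(requested_start: datetime, last_seen: Optional[datetime]) -> datetime:
    """Skip the part of the requested range that the CSV already covers."""
    if last_seen is None:
        return requested_start
    return max(requested_start, last_seen)


# In-memory CSV standing in for an existing cache file (made-up rows).
sample = io.StringIO(
    "DateTime,Currency,Event\n"
    "2024-03-21T12:30:00,USD,Example event A\n"
    "2024-03-22T08:00:00,EUR,Example event B\n"
)
last_seen = last_datetime_in_csv(sample)
start = resume_start(datetime(2024, 3, 21), last_seen)
print(start.isoformat())
```

With a real cache file, the same helper could be called on an open file handle instead of the in-memory sample.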
- Python 3.7+: Ensure you have Python installed. You can download it from python.org.
- Clone the Repository
  git clone https://github.com/yourusername/forexfactory_scraper.git
  cd forexfactory_scraper
- Create a Virtual Environment (Optional but Recommended)
  python -m venv venv
  - Activate the Virtual Environment:
    - Windows:
      venv\Scripts\activate
    - macOS/Linux:
      source venv/bin/activate
- Install Dependencies
  Ensure you have pip updated:
  pip install --upgrade pip
  Install required packages:
  pip install -r requirements.txt
  Note: Make sure requirements.txt includes all necessary libraries such as selenium, pandas, undetected-chromedriver, and others.
- Download WebDriver
  The scraper uses undetected-chromedriver to handle dynamic content and bypass some scraping protections. No additional setup is required, as undetected-chromedriver manages the ChromeDriver version automatically.
The main script can be executed via the command line, allowing you to specify various parameters such as the date range, output CSV file, timezone, and whether to scrape detailed event information.
- --start: (Required) Start date for scraping in YYYY-MM-DD format.
- --end: (Required) End date for scraping in YYYY-MM-DD format.
- --csv: (Optional) Output CSV file path. Default is forex_factory_cache.csv.
- --tz: (Optional) Timezone for event dates. Default is Asia/Tehran.
- --details: (Optional) Flag to enable scraping of detailed event information. If omitted, only basic event data is scraped.
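A command-line interface with the parameters described above could be declared with argparse along these lines. This is only a sketch that mirrors the documented flags and defaults; the function names and the date validation are assumptions, and the project's real argument handling may differ.

```python
import argparse
from datetime import date, datetime


def parse_date(value: str) -> date:
    """Accept only YYYY-MM-DD, as the documented flags require."""
    try:
        return datetime.strptime(value, "%Y-%m-%d").date()
    except ValueError:
        raise argparse.ArgumentTypeError(f"expected YYYY-MM-DD, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Illustrative CLI sketch")
    parser.add_argument("--start", required=True, type=parse_date, help="Start date (YYYY-MM-DD)")
    parser.add_argument("--end", required=True, type=parse_date, help="End date (YYYY-MM-DD)")
    parser.add_argument("--csv", default="forex_factory_cache.csv", help="Output CSV file path")
    parser.add_argument("--tz", default="Asia/Tehran", help="Timezone for event dates")
    parser.add_argument("--details", action="store_true", help="Also scrape detailed event information")
    return parser


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(["--start", "2024-03-21", "--end", "2024-03-25", "--details"])
print(args.start, args.end, args.csv, args.tz, args.details)
```

Because --details uses store_true, omitting it leaves the value False, which matches the documented behaviour of scraping only basic event data by default.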
Navigate to the project root directory and execute the script using Python:
python -m src.forexfactory.main --start YYYY-MM-DD --end YYYY-MM-DD [--csv OUTPUT_CSV] [--tz TIMEZONE] [--details]
- Scrape Events from March 21, 2024, to March 25, 2024, Including Details
  python -m src.forexfactory.main --start 2024-03-21 --end 2024-03-25 --csv forex_factory_cache.csv --tz Asia/Tehran --details
- Scrape Events from January 1, 2024, to January 31, 2024, Without Details
  python -m src.forexfactory.main --start 2024-01-01 --end 2024-01-31 --csv january_events.csv --tz Asia/Tehran
- Scrape Events from February 15, 2024, to February 20, 2024, Saving to a Custom CSV File
  python -m src.forexfactory.main --start 2024-02-15 --end 2024-02-20 --csv feb_events.csv --tz Asia/Tehran
All dependencies are listed in requirements.txt. Key libraries include:
- selenium: For browser automation.
- pandas: For data manipulation and management.
- undetected-chromedriver: To bypass Selenium detection mechanisms.
- python-dateutil: For advanced date handling.
Install dependencies using:
pip install -r requirements.txt
python -m src.forexfactory.main --start 2024-03-21 --end 2024-03-25 --csv forex_factory_cache.csv --tz Asia/Tehran --details
This command scrapes Forex Factory events from March 21, 2024, to March 25, 2024, including detailed specifications for each event, and saves the data to forex_factory_cache.csv with Tehran timezone.
python -m src.forexfactory.main --start 2024-03-21 --end 2024-03-25 --csv forex_factory_cache.csv --tz Asia/Tehran
This command performs the same scraping without fetching detailed event specifications, resulting in a faster scraping process.
- StaleElementReferenceException Errors
  Cause: The web page's DOM has changed, making the reference to the web element invalid.
  Solution:
  - Increase the wait time using WebDriverWait.
  - Re-fetch the web element after certain actions.
  - Implement retry mechanisms.
- CAPTCHA or Cloudflare Challenges
  Cause: Forex Factory may employ CAPTCHA or Cloudflare protection to prevent automated scraping.
  Solution:
  - Use undetected-chromedriver to bypass some protections.
  - Implement delays between requests to mimic human behavior.
  - Use proxies if necessary.
  - Be mindful of the scraping rate to avoid IP bans.
- Incorrect Date Parsing
  Cause: Mismatch between the date format in the CSV and the expected format in the script.
  Solution:
  - Ensure that dates in the CSV are in ISO format (YYYY-MM-DDTHH:MM:SS).
  - Modify the get_last_datetime_from_csv function if your date format differs.
- Missing or Incorrect XPath Selectors
  Cause: Changes in the Forex Factory website structure leading to incorrect XPath selectors.
  Solution:
  - Verify the current structure of the Forex Factory website.
  - Update XPath selectors in the scraper accordingly.
- Browser Driver Issues
  Cause: Incompatible or outdated ChromeDriver versions.
  Solution:
  - Ensure that undetected-chromedriver is up to date.
  - Verify that Google Chrome is updated to the latest version.
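For the stale-element case in the list above, the "implement retry mechanisms" suggestion can be sketched as a small generic helper. This is an illustration under assumptions rather than the scraper's own code: the helper name retry, its parameters, and the FakeStaleError stand-in are hypothetical, and the demo avoids importing Selenium so that it runs anywhere.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry(
    action: Callable[[], T],
    exceptions: Tuple[Type[BaseException], ...],
    attempts: int = 3,
    delay: float = 0.5,
) -> T:
    """Call `action` until it succeeds or `attempts` tries have failed."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except exceptions:
            if attempt == attempts:
                raise  # out of tries: surface the last error
            time.sleep(delay)  # give the page time to settle before retrying
    raise AssertionError("unreachable")


class FakeStaleError(Exception):
    """Stand-in for Selenium's StaleElementReferenceException in this demo."""


calls = {"count": 0}


def read_cell() -> str:
    # Fails twice, then succeeds, imitating an element that is re-rendered.
    calls["count"] += 1
    if calls["count"] < 3:
        raise FakeStaleError("element went stale")
    return "ok"


result = retry(read_cell, (FakeStaleError,), attempts=3, delay=0.0)
print(result, calls["count"])
```

In a Selenium context, the exception tuple would contain StaleElementReferenceException from selenium.common.exceptions, and the action should locate the element again on every call instead of reusing a stored reference.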
Logs provide detailed information about the scraping process and can help identify issues.
- Info Logs: Provide general information about the scraping progress.
- Warning Logs: Indicate non-critical issues that do not stop the scraper.
- Error Logs: Highlight critical issues that may require attention.
Ensure that your terminal or log files capture these logs for effective debugging.
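One way to capture these log levels in both the terminal and a file is a setup like the following. It is a sketch only: the logger name, the message format, and the optional log file parameter are assumptions, and the project may configure logging differently.

```python
import logging
from typing import Optional


def configure_logging(log_file: Optional[str] = None, level: int = logging.INFO) -> logging.Logger:
    """Return a logger that writes to the console and, optionally, to a file."""
    logger = logging.getLogger("forexfactory_sketch")  # hypothetical logger name
    logger.setLevel(level)
    logger.propagate = False   # avoid duplicate lines via the root logger
    logger.handlers.clear()    # make repeated calls idempotent

    formatter = logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    handlers = [logging.StreamHandler()]
    if log_file is not None:
        handlers.append(logging.FileHandler(log_file, encoding="utf-8"))
    for handler in handlers:
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger


logger = configure_logging()
logger.info("Scraping started")               # general progress
logger.warning("Skipped one malformed row")   # non-critical issue
logger.error("Could not load calendar page")  # needs attention
```

Passing a path such as configure_logging("scraper.log") would add a file handler, so the same messages remain available after the terminal session ends.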
Contributions are welcome! If you encounter bugs or have suggestions for improvements, feel free to open an issue or submit a pull request.
- Fork the Repository
- Create a Feature Branch
  git checkout -b feature/YourFeatureName
- Commit Your Changes
  git commit -m "Add your message here"
- Push to the Branch
  git push origin feature/YourFeatureName
- Open a Pull Request
  Provide a clear description of your changes and the problem they solve.
This project is licensed under the MIT License.
Disclaimer: This scraper is intended for personal use and educational purposes only. Ensure compliance with Forex Factory's Terms of Service and avoid violating any usage policies. Use responsibly.