A collection of scripts to parse Indian Budget documents into clean, machine-readable formats.
Currently tested on Linux and macOS.
-
Clone the repository
git clone https://github.com/cbgaindia/parsers.git
-
Install package dependencies:
pip install -r requirements.txt
-
Install software dependencies:
- Tabula Java: https://github.com/tabulapdf/tabula-java
- ImageMagick: https://www.imagemagick.org/script/install-source.php
- Finding the appropriate parser: parsers are arranged by tier of government. To see how to use a script, run it with the help (-h) option.
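
As a rough illustration of what the help option provides, here is a minimal sketch of a script that builds its command-line interface with Python's standard argparse module. The script name, the arguments, and the use of argparse itself are assumptions made for this example and are not taken from the parsers in this repository.

```python
import argparse


def build_parser():
    """Build the command-line interface for a hypothetical parser script."""
    parser = argparse.ArgumentParser(
        prog="example_parser.py",  # hypothetical script name
        description="Convert a budget PDF into a CSV file.",
    )
    # Both positional arguments are illustrative, not the real interface.
    parser.add_argument("input_file", help="path to the budget PDF to parse")
    parser.add_argument("output_file", help="path of the CSV file to write")
    return parser
```

argparse adds the -h/--help option automatically, so calling build_parser().parse_args(["-h"]) prints the usage text and exits.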
To scrape budget data files from various sources, please refer to https://github.com/cbgaindia/scrapers
Please refer to http://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html for the documentation style guidelines.
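
For orientation, a Google-style docstring of the kind described in that guide might look like the sketch below. The function parse_amount is a hypothetical helper written for this example and is not part of the repository.

```python
def parse_amount(text):
    """Convert a budget figure written as text into a float.

    Hypothetical helper, shown only to illustrate the docstring layout.

    Args:
        text (str): Amount as printed in a budget document, e.g. "1,234.50".

    Returns:
        float: The numeric value with thousands separators removed.

    Raises:
        ValueError: If ``text`` does not contain a valid number.
    """
    # Drop thousands separators and surrounding whitespace before converting.
    return float(text.replace(",", "").strip())
```

The Args, Returns, and Raises sections are the parts that Napoleon recognises and renders when the documentation is built with Sphinx.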