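This tool is designed for students, researchers, data scientists, and anyone else who wants access to files from SICAR (Sistema Nacional de Cadastro Ambiental Rural), Brazil's national rural environmental registry system.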
- Get city codes by state code
- Download Shapefile or CSV
- Download a city by code
- Download lists of cities by code
- Download all cities in a state by state code
- Download the entire country
- Tesseract and PaddleOCR (optional) drivers to automatically solve captchas
Install SICAR with pip:
```bash
pip install git+https://github.com/urbanogilson/SICAR
```
Prerequisites:
- Google Tesseract OCR (see the Tesseract documentation for installing the engine on Linux, macOS, and Windows).
- Optional: PaddleOCR (see the PaddleOCR documentation for installing the engine on Linux, macOS, and Windows).
If you don't want to install dependencies on your computer or don't know how to install them, we strongly recommend Google Colab.
```python
from SICAR import Sicar
import pprint

# Create Sicar instance
car = Sicar(email="[email protected]")

# Get city codes in Roraima state
cities_codes = car.get_cities_codes(state='RR')
pprint.pprint(cities_codes)
# {'Alto Alegre': '1400050',
#  'Amajari': '1400027',
#  'Boa Vista': '1400100',
#  'Bonfim': '1400159',
#  'Cantá': '1400175',
#  'Caracaraí': '1400209',
#  'Caroebe': '1400233',
#  'Iracema': '1400282',
#  'Mucajaí': '1400308',
#  'Normandia': '1400407',
#  'Pacaraima': '1400456',
#  'Rorainópolis': '1400472',
#  'São João da Baliza': '1400506',
#  'São Luiz': '1400605',
#  'Uiramutã': '1400704'}
```
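The codes above appear to be 7-digit IBGE municipality codes, whose first two digits identify the state (14 for Roraima). Under that assumption, a quick check before downloading can catch a code typed for the wrong state. The helper name and the small prefix table below are illustrative only and are not part of SICAR.

```python
# Hypothetical helper, not part of the SICAR package.
# Assumes SICAR city codes are 7-digit IBGE municipality codes whose
# first two digits are the IBGE code of the state (UF).
IBGE_STATE_PREFIX = {
    "AM": "13",  # Amazonas
    "RR": "14",  # Roraima
    "MG": "31",  # Minas Gerais
    "SC": "42",  # Santa Catarina
}


def is_city_in_state(city_code: str, state: str) -> bool:
    """Return True if city_code looks like a 7-digit code from the given state."""
    prefix = IBGE_STATE_PREFIX.get(state.upper())
    if prefix is None:
        raise ValueError(f"no prefix known for state {state!r}")
    return (
        len(city_code) == 7
        and city_code.isdigit()
        and city_code.startswith(prefix)
    )


# Codes copied from the get_cities_codes(state='RR') output
roraima = {"Alto Alegre": "1400050", "Boa Vista": "1400100", "Uiramutã": "1400704"}
for name, code in roraima.items():
    print(name, is_city_in_state(code, "RR"))
```

Extend the prefix table with the other states you work with; unknown states raise an error rather than silently passing.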
```python
from SICAR import OutputFormat

# Download 'Alto Alegre' (code '1400050')
car.download_city_code('1400050', folder='Roraima')

# Download the same city in CSV format
car.download_city_code('1400050', output_format=OutputFormat.CSV, folder='Roraima')
```
```python
# Download specific cities
cities_codes = {
    'São Gabriel da Cachoeira': '1303809',
    'São Paulo de Olivença': '1303908',
}
car.download_cities(cities_codes=cities_codes, folder='cities')
```
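download_cities takes a dictionary mapping city names to codes, the same shape that get_cities_codes returns, so one way to download only some cities of a state is to filter that dictionary by name. The select_cities helper below is a hypothetical sketch, not part of SICAR; it raises an error for unknown names instead of silently skipping them.

```python
# Hypothetical helper, not part of SICAR: build the cities_codes argument
# for download_cities from the dict returned by get_cities_codes.
from typing import Dict, Iterable


def select_cities(all_codes: Dict[str, str], names: Iterable[str]) -> Dict[str, str]:
    """Return {name: code} for the requested names, failing on unknown names."""
    selected = {}
    missing = []
    for name in names:
        if name in all_codes:
            selected[name] = all_codes[name]
        else:
            missing.append(name)
    if missing:
        raise KeyError(f"unknown city names: {missing}")
    return selected


# Subset of the get_cities_codes(state='RR') output
rr_codes = {"Alto Alegre": "1400050", "Boa Vista": "1400100", "Bonfim": "1400159"}
print(select_cities(rr_codes, ["Boa Vista", "Bonfim"]))
# {'Boa Vista': '1400100', 'Bonfim': '1400159'}
```

The result can then be passed to download_cities through its cities_codes parameter.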
```python
# Download all cities in Roraima state
car.download_state(state='RR', folder='RR')
```
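To cover several states, one option is to call download_state once per state, each into its own folder. The sketch below relies only on the download_state(state=..., folder=...) call shown above; the function name, the per-state folder layout, and the choice to record failures and continue are illustrative assumptions, not SICAR behavior.

```python
# Hypothetical sketch, not part of SICAR: download several states in turn.
# `car` is any object with a download_state(state=..., folder=...) method,
# such as a Sicar instance.
import os
from typing import Dict, Iterable


def download_states(car, states: Iterable[str], base_folder: str = "states") -> Dict[str, Exception]:
    """Download each state into base_folder/<STATE>; return failures keyed by state."""
    failures: Dict[str, Exception] = {}
    for state in states:
        folder = os.path.join(base_folder, state)
        try:
            car.download_state(state=state, folder=folder)
        except Exception as error:  # keep going; report failures at the end
            failures[state] = error
    return failures
```

With a Sicar instance created as above, download_states(car, ['RR', 'AM']) would download both states and return an empty dictionary if nothing failed, so failed states can be retried later.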
Optical character recognition (OCR) drivers read the characters in a captcha so that downloads can run without manual input.
Two drivers are currently available to automate the download process.
Tesseract OCR (Default)
```python
from SICAR import Sicar
from SICAR.drivers import Tesseract

# Create Sicar instance using Tesseract OCR
car = Sicar(email="[email protected]", driver=Tesseract)

# Download a city
car.download_cities(cities_codes={'Belo Horizonte': '3106200'}, folder='SICAR/cities')
```
PaddleOCR
Install SICAR with pip, including the Paddle dependencies:
```bash
pip install 'SICAR[paddle] @ git+https://github.com/urbanogilson/SICAR'
```
```python
from SICAR import Sicar
from SICAR.drivers import Paddle

# Create Sicar instance using PaddleOCR
car = Sicar(email="[email protected]", driver=Paddle)

# Download a city
car.download_cities(cities_codes={'Balneário Camboriú': '4202008'}, folder='SICAR/cities')
```
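If the same script should run on machines with and without the Paddle extra, one option is to check whether PaddleOCR is importable and fall back to Tesseract (the default driver) otherwise. This sketch assumes the extra makes a module named paddleocr importable; the helper name is hypothetical, and it returns only the driver's name so it runs without SICAR installed.

```python
# Hypothetical helper, not part of SICAR. Assumes the paddle extra
# makes a module named "paddleocr" importable.
import importlib.util


def choose_driver_name(module_name: str = "paddleocr") -> str:
    """Return "Paddle" if module_name can be found, else "Tesseract"."""
    if importlib.util.find_spec(module_name) is not None:
        return "Paddle"
    return "Tesseract"


print(choose_driver_name())
```

The returned name can then decide which driver to import from SICAR.drivers, so Paddle is only imported on machines where it is installed.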
With Google Colab, you don't need to install any dependencies on your computer, and you can save files directly to your Google Drive.
Pull the urbanogilson/sicar image from Docker Hub:
```bash
docker pull urbanogilson/sicar:latest
```
Run the downloaded Docker image with an entry-point script from your machine (host):
```bash
docker run -i -v $(pwd):/sicar urbanogilson/sicar:latest -<./examples/docker.py
```
Note: Edit the entry-point file ./examples/docker.py, or create a new one, to download the data you need.
Or pass a script through STDIN:
```bash
docker run -i -v $(pwd):/sicar urbanogilson/sicar:latest -<<EOF
from SICAR import Sicar
from SICAR.drivers import Paddle
car = Sicar(email="[email protected]", driver=Paddle)
car.download_state(state='MG', folder='MG')
EOF
```
Note: Because the current directory ($(pwd)) is mounted into the container with -v, the container saves the downloaded data into the current folder.
Optionally, create a separate directory to store the downloaded data and point the volume parameter (-v) of the run command to it.
Roadmap:
- Download city by name
- Make Paddle driver optional
- Add support for downloading CSV files
A development environment with all the necessary packages is available through Visual Studio Code Dev Containers.
Contributions are always welcome!
If you have any feedback, please reach me at [email protected]