Data Loading Special Cases
The PTS file is updated weekly and uploaded to the DOF SFTP server. We only pull it monthly, when there is an update for PLUTO. The overall process involves:
- get data from sftp
- unzip
- data cleaning (remove extra columns, remove special characters)
- pass data into postgres (in the workflow files, they are temporary postgres containers created by github actions)
- export data to csv (pluto_pts.csv and geocode_input_pluto_pts.csv), upload to a temporary location in DigitalOcean
- load data to data library using action-library-archive
- geocode geocode_input_pluto_pts.csv to create pluto_input_geocodes.csv
- upload pluto_input_geocodes.csv to a temporary location in DigitalOcean, then load data to data library using action-library-archive
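The data cleaning step in the list above (remove extra columns, remove special characters) can be sketched roughly as follows. This is a minimal illustration, not the actual build code: the column names in `KEEP_COLUMNS` are hypothetical, and it assumes "special characters" means anything outside printable ASCII.

```python
import csv
import io
import re

# Hypothetical subset of columns to keep; the real selection is defined
# in the build scripts, not here.
KEEP_COLUMNS = ["boro", "block", "lot", "owner"]

# Assumption: a "special character" is anything outside printable ASCII.
_SPECIAL = re.compile(r"[^\x20-\x7E]")


def clean_rows(reader, keep=KEEP_COLUMNS):
    """Yield rows restricted to `keep`, with special characters removed."""
    for row in reader:
        yield {
            col: _SPECIAL.sub("", row.get(col) or "").strip()
            for col in keep
        }


def clean_csv(text, keep=KEEP_COLUMNS):
    """Clean CSV text and return the result as CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=keep, lineterminator="\n")
    writer.writeheader()
    writer.writerows(clean_rows(reader, keep))
    return out.getvalue()
```

Cleaning before the load into postgres keeps the temporary table narrow and avoids encoding surprises during the later csv export.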
This is the main file for PTS. Note that the original PTS file has 140 columns; for the purpose of building PLUTO, we only take a selected 40 fields. For the complete column definitions and the fields we take, please check out the pluto_build/_load_pts.sql script.
This is the main geocoding input for PLUTO. DOF address information is usually out of sync with DCP's, so we geocode the BBL from PTS instead of the addresses. The workflow is: pass the PTS BBL through the BL function -> get the address from the BL function, then pass it into 1A+1E/1B -> get lat/lon and other information. Please check out pluto_build/python/geocode.py for more information.
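The two-step chain described above (BBL to address, then address to coordinates) might look roughly like this. It is only a sketch of the control flow: `bl_lookup` and `address_lookup` are hypothetical callables standing in for the BL and 1A+1E/1B geocoder functions, and their signatures and return shapes are assumptions, not the interface used in geocode.py. The only fixed fact used here is that a BBL is 10 digits: 1 for borough, 5 for block, 4 for lot.

```python
def split_bbl(bbl):
    """Split a 10-digit BBL into (borough, block, lot) strings."""
    bbl = str(bbl).strip()
    if len(bbl) != 10 or not bbl.isdigit():
        raise ValueError(f"not a 10-digit BBL: {bbl!r}")
    return bbl[0], bbl[1:6], bbl[6:]


def geocode_bbl(bbl, bl_lookup, address_lookup):
    """Geocode one BBL by chaining two lookups.

    bl_lookup(boro, block, lot) -> address string or None   (hypothetical)
    address_lookup(address)     -> dict of fields or None   (hypothetical)
    """
    boro, block, lot = split_bbl(bbl)

    # Step 1: BBL -> address (stands in for the BL function).
    address = bl_lookup(boro, block, lot)
    if address is None:
        return {"bbl": str(bbl), "address": None}

    # Step 2: address -> lat/lon and other fields (stands in for 1A+1E/1B).
    extra = address_lookup(address) or {}
    return {"bbl": str(bbl), "address": address, **extra}
```

Passing the lookups in as arguments keeps the sketch independent of any particular geocoder client and makes the chain easy to exercise with stubs.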
overall workflow:
- get data from sftp
- unzip
- data cleaning (remove extra columns, remove special characters)
- pass data into postgres (in the workflow files, they are temporary postgres containers created by github actions)
- export data to csv (cama.csv), upload to a temporary location in DigitalOcean
- load data to data library using action-library-archive
We receive the CAMA file from DOF via SFTP.
overall workflow:
- pull bin info from opendata using the socrata API
- geocode BIN to BBL
- load data to data library using action-library-archive
To improve the number of buildings recorded on a lot, we geocode BIN information from building footprints and group by BBL to get a more up-to-date building count. See pluto_build/python/numbldgs.py.
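The group-by step can be illustrated with a small sketch. It assumes each geocoded footprint record is a dict with hypothetical `bin` and `bbl` keys, and that a building is counted once per distinct BIN; the actual logic in numbldgs.py may differ.

```python
from collections import defaultdict


def count_buildings(records):
    """Return {bbl: number of distinct BINs} for geocoded footprint records.

    Records missing a BIN or a BBL (for example, failed geocodes) are skipped.
    """
    bins_by_bbl = defaultdict(set)
    for rec in records:
        bbl = rec.get("bbl")
        bin_ = rec.get("bin")
        if bbl and bin_:
            bins_by_bbl[bbl].add(bin_)
    return {bbl: len(bins) for bbl, bins in bins_by_bbl.items()}
```

Collecting BINs into a set per BBL means a footprint that appears twice in the source data does not inflate the count.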