This repository has been archived by the owner on Jul 13, 2023. It is now read-only.

Data Loading Special Cases

SPTKL edited this page Feb 12, 2021 · 4 revisions

PTS

The PTS file is updated weekly and uploaded to the DOF SFTP server. We only pull it monthly, when there is an update for PLUTO. The overall process involves:

  • get data from sftp
  • unzip
  • data cleaning (remove extra columns, remove special characters)
  • pass data into Postgres (in the workflow files, these are temporary Postgres containers created by GitHub Actions)
  • export data to csv (pluto_pts.csv and geocode_input_pluto_pts.csv), upload to a temporary location in DigitalOcean
  • load data to data library using action-library-archive
  • geocode geocode_input_pluto_pts.csv to create pluto_input_geocodes.csv
  • upload pluto_input_geocodes.csv to a temporary location in DigitalOcean, then load data to data library using action-library-archive
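
The data cleaning step in the list above can be sketched roughly as follows. This is a minimal illustration, not the code used in the workflow files: the column names in KEEP_COLUMNS are hypothetical, and "special characters" is assumed here to mean anything outside printable ASCII.

```python
import csv
import io
import re

# Hypothetical subset of columns to keep. The real field list is defined in
# pluto_build/_load_pts.sql and is not reproduced here.
KEEP_COLUMNS = ["boro", "block", "lot", "owner"]

# Assumption: a "special character" is anything outside printable ASCII.
_SPECIAL = re.compile(r"[^\x20-\x7E]")


def clean_rows(raw_text, keep=KEEP_COLUMNS, delimiter=","):
    """Yield dicts holding only the `keep` columns, with special characters removed."""
    reader = csv.DictReader(io.StringIO(raw_text), delimiter=delimiter)
    for row in reader:
        # Missing columns become empty strings rather than raising.
        yield {col: _SPECIAL.sub("", row.get(col) or "").strip() for col in keep}


def to_csv(rows, keep=KEEP_COLUMNS):
    """Serialize cleaned rows to CSV text, ready to be copied into Postgres."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=keep, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

Dropping the unused columns before loading keeps the temporary Postgres table small, and stripping special characters up front avoids encoding errors during the load.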

pluto_pts

This is the main file for PTS. The original PTS file has 140 columns; for the purpose of building PLUTO, we take only a selected 40 fields. For the complete column definitions and the fields we take, please check out the pluto_build/_load_pts.sql script.

pluto_input_geocodes

This is the main geocoding input for PLUTO. DOF's address information is usually out of sync with DCP's, so we geocode the BBL from PTS instead of the address. The workflow is as follows: pass the PTS BBL through the BL function -> get the address from the BL function, then pass it into 1A+1E/1B -> get lat/lon and other information. Please check out pluto_build/python/geocode.py for more information.
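
The two-step lookup described above might look something like the sketch below. It does not reproduce geocode.py: bl_lookup and address_lookup are hypothetical callables standing in for the BL and 1A+1E/1B functions, and the dictionary keys they return are assumptions made for illustration.

```python
def geocode_bbl(bbl, bl_lookup, address_lookup):
    """Resolve a BBL to coordinates: BBL -> address -> lat/lon.

    bl_lookup and address_lookup are hypothetical stand-ins for the geocoder
    functions; their argument names and return keys are assumptions.
    """
    result = {"bbl": bbl, "latitude": None, "longitude": None, "error": None}
    try:
        # Step 1: look up the lot by BBL to get an address for it.
        lot = bl_lookup(bbl)
        # Step 2: geocode that address. The first digit of a BBL is the
        # borough code, so it can be reused for the address lookup.
        hit = address_lookup(
            house_number=lot["house_number"],
            street_name=lot["street_name"],
            borough_code=bbl[0],
        )
        result["latitude"] = hit["latitude"]
        result["longitude"] = hit["longitude"]
    except Exception as err:
        # Keep the row and record why it failed, so one bad BBL
        # does not stop the whole batch.
        result["error"] = str(err)
    return result
```

Passing the lookups in as arguments keeps the chaining logic separate from the geocoder itself, which makes it possible to exercise the flow with stub functions.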

CAMA

Overall workflow:

  • get data from sftp
  • unzip
  • data cleaning (remove extra columns, remove special characters)
  • pass data into Postgres (in the workflow files, these are temporary Postgres containers created by GitHub Actions)
  • export data to csv (cama.csv), upload to a temporary location in DigitalOcean
  • load data to data library using action-library-archive

pluto_input_cama_dof

We receive the CAMA file from DOF via SFTP.

Number of Buildings

Overall workflow:

  • pull BIN info from Open Data using the Socrata API
  • geocode BIN to BBL
  • load data to data library using action-library-archive

pluto_input_numbldgs

To improve the number-of-buildings count on a lot, we geocode BIN information from building footprints and group by BBL to get a more up-to-date count of buildings per lot. See pluto_build/python/numbldgs.py.
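
The group-by step can be illustrated with a small sketch. It assumes the BINs have already been geocoded into (BIN, BBL) pairs and that each distinct BIN counts as one building; whether numbldgs.py deduplicates in the same way is not stated here, so treat this as one plausible reading.

```python
from collections import defaultdict


def count_buildings(bin_bbl_pairs):
    """Return {bbl: number of distinct BINs} from an iterable of (bin, bbl) pairs."""
    bins_by_bbl = defaultdict(set)
    for bin_, bbl in bin_bbl_pairs:
        # Skip records where geocoding produced no BIN or no BBL.
        if bin_ and bbl:
            bins_by_bbl[bbl].add(bin_)
    # Assumption: one distinct BIN equals one building on the lot.
    return {bbl: len(bins) for bbl, bins in bins_by_bbl.items()}
```

Using a set per BBL means a BIN that appears more than once in the footprints data is only counted once for its lot.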