Author: Mauricio Giraldo Arteaga @mgiraldo / NYPL Labs @nypl_labs
An open-source map vectorizer. Provided as is by NYPL Labs. Project based on a workflow suggested by Michael Resig.
NOTE: A proper README and instructions will be created. For now this is a list of requirements to get the project running.
This project aims to automate a manual process: geographic polygon and attribute data extraction from maps including those from insurance atlases published in the 19th and early 20th centuries. Here is some background on why we're doing this and here is one of the maps we're extracting polygons from. This example map layer shows what these atlases look like once geo-rectified, i.e. geographically normalized.
The New York Public Library has hundreds of atlases with tens of thousands of these sheets and there is no way we can extract data manually in a reasonable amount of time. Just so you get an idea, it took NYPL staff coordinating a small army of volunteers three years to produce 170,000 polygons with attributes (from just four of hundreds of atlases at NYPL).
It will now take a day to produce a comparable number of polygons with some basic metadata.
The goal is to extract the following data (✔ = mostly solved so far, ✢ = in process):
- ✔ shape
- ✔ color
- ✢ dot presence
- ✢ dot count
- ✢ dot type (full vs outline)
- presence of skylights
- numbers if any (not optimistic about this one but maybe you know a way)
A few things need to be installed on your system for the project to work properly. So far it has been tested on Mac OS X Lion, so these instructions apply to that configuration only. I am sure you will be able to adapt them to your current configuration.
- Python with OpenCV
- ImageMagick
- R with the rgdal, alphahull, igraph and shapefiles libraries. Make sure R is in your PATH (so you can run it via command-line by typing R)
- GIMP
- GDAL Tools
- It is also a good idea to install QGIS to test your results
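Before running anything, it can help to check that the command-line requirements above are reachable. The following is only a rough sketch, not part of the project: the executable names (`convert` for ImageMagick, `gdalinfo` for GDAL Tools) are assumptions that may differ on your system, and GIMP is left out because its executable usually is not on the PATH.

```python
import importlib.util
import shutil

# Assumed executable names for each requirement; adjust for your system.
# GIMP is omitted on purpose: the script asks for its location separately.
REQUIRED_TOOLS = {
    "R": "R",
    "ImageMagick": "convert",
    "GDAL Tools": "gdalinfo",
}


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the labels of the tools whose executable is not on the PATH."""
    return [label for label, exe in tools.items() if which(exe) is None]


def has_opencv():
    """Return True if the OpenCV Python bindings (module cv2) can be found."""
    return importlib.util.find_spec("cv2") is not None


if __name__ == "__main__":
    for label in missing_tools():
        print("Not found on PATH:", label)
    if not has_opencv():
        print("Python cannot find the cv2 (OpenCV) module")
```

If nothing is printed, the PATH-based requirements were found; the R libraries and GIMP still have to be checked by hand.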
These step by step instructions should work as-is. If not, check all the above are working before submitting an issue.
- Add the gimp-scripts/ folder to the GIMP Script-Fu folders in Preferences > Scripts. Make sure to run GIMP at least once if you restart your machine (not sure why it behaves this way... I am trying to make the project non-GIMP-dependent so this won't become an issue)
- Add executable privileges to the main vectorize_map.py script like so: chmod +x vectorize_map.py. The other Python files are some test files we use and might be excluded in later commits but feel free to browse them.
- Take note of the path where the GIMP executable is installed (another reason why I want to remove GIMP from the process).
And finally:
- Run the script on the provided test GeoTIFF: ./vectorize_map.py test.tif
- Accept the GIMP folder location or input a different one and press ENTER.
This should take about 70 seconds to process. If it takes less there might be an error (or your machine rulez). Take a look at the console output to find the possible culprit.
If it works, you will see a test folder with a test-traced set of files (.shp, .dbf, .prj and .shx) and two log files.
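A quick way to confirm that the run produced the shapefile set is to look for the four files. This is a minimal sketch that assumes the files are named test-traced plus each extension inside the test folder, which is my reading of the description above and not something the script guarantees. The log file names are not stated, so they are not checked.

```python
from pathlib import Path

# The four shapefile components listed above.
SHAPEFILE_EXTENSIONS = (".shp", ".dbf", ".prj", ".shx")


def missing_outputs(folder="test", stem="test-traced"):
    """Return the expected shapefile paths that do not exist in folder."""
    folder = Path(folder)
    expected = [folder / (stem + ext) for ext in SHAPEFILE_EXTENSIONS]
    return [path for path in expected if not path.exists()]


if __name__ == "__main__":
    for path in missing_outputs():
        print("Missing:", path)
```

An empty result means all four files are present; it says nothing about whether the polygons themselves are any good, which is where QGIS comes in.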
- Michael Resig
- Chris Garrard for his sample code to assemble and disassemble shapefiles
- Barry Rowlingson for his tutorial on converting alpha shapes to polygons
- Added a config file (rename vectorize_config_default.txt to vectorize_config.txt).
- Added consolidator.py to assemble a set of shapefiles in a folder into a single file.
- Added very rough OpenCV circle and cross detection (not working very well but it is a starting point).
- Added GeoJSON output.
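For readers curious about what assembling a folder of shapefiles into one file can look like, here is a hedged sketch that only builds the ogr2ogr commands (from GDAL Tools) for the job. It is not how consolidator.py works; the function name and the merged.shp output name are made up for illustration, and it assumes all inputs share the same attributes and projection.

```python
from pathlib import Path


def build_merge_commands(folder, merged_name="merged.shp"):
    """Build ogr2ogr commands that merge every .shp in folder into one file.

    The first input creates the merged shapefile; each later input is
    appended to it. Nothing is executed here.
    """
    folder = Path(folder)
    merged = folder / merged_name
    inputs = sorted(p for p in folder.glob("*.shp") if p.name != merged_name)
    commands = []
    for index, shp in enumerate(inputs):
        if index == 0:
            # Create the merged file from the first shapefile.
            commands.append(
                ["ogr2ogr", "-f", "ESRI Shapefile", str(merged), str(shp)]
            )
        else:
            # Append the remaining shapefiles to the existing layer.
            commands.append(
                ["ogr2ogr", "-f", "ESRI Shapefile", "-update", "-append",
                 str(merged), str(shp), "-nln", merged.stem]
            )
    return commands


if __name__ == "__main__":
    for command in build_merge_commands("."):
        print(" ".join(command))
```

Each command list can be handed to subprocess.run once you are happy with what it prints.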