0xbs0d/map-vectorizer

 
 

Map polygon and feature extractor

An NYPL Labs project

Author: Mauricio Giraldo Arteaga @mgiraldo / NYPL Labs @nypl_labs

An open-source map vectorizer, provided as is by NYPL Labs. The project is based on a workflow suggested by Michael Resig.

NOTE: A proper README and instructions will be created. For now this is a list of requirements to get the project running.

Like OCR for maps

This project aims to automate a manual process: geographic polygon and attribute data extraction from maps including those from insurance atlases published in the 19th and early 20th centuries. Here is some background on why we're doing this and here is one of the maps we're extracting polygons from. This example map layer shows what these atlases look like once geo-rectified, i.e. geographically normalized.

The New York Public Library has hundreds of atlases with tens of thousands of these sheets, and there is no way we can extract the data manually in a reasonable amount of time. Just so you get an idea, it took NYPL staff coordinating a small army of volunteers three years to produce 170,000 polygons with attributes (from just four of the hundreds of atlases at NYPL).

It will now take a day to produce a comparable number of polygons with some basic metadata.

The goal is to extract the following data (✔ = mostly solved so far, ✢ = in process):

  • ✔ shape
  • ✔ color
  • ✢ dot presence
  • ✢ dot count
  • ✢ dot type (full vs outline)
  • presence of skylights
  • numbers if any (not optimistic about this one but maybe you know a way)
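As a way to picture the target data, here is a minimal sketch of a per-polygon record holding the attributes listed above. The class name `PolygonRecord`, its field names, and the choice of types are all assumptions made for illustration; the project's actual output format may look quite different.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class PolygonRecord:
    """Hypothetical container for the attributes listed above (not the project's format)."""

    # Polygon outline as a list of (x, y) vertices.
    shape: List[Tuple[float, float]]
    # Fill color, assumed here to be an RGB triple.
    color: Optional[Tuple[int, int, int]] = None
    # None means "not detected yet", which is different from a count of zero.
    dot_count: Optional[int] = None
    # Assumed labels: "full" or "outline".
    dot_type: Optional[str] = None
    has_skylights: Optional[bool] = None
    numbers: Optional[str] = None

    @property
    def has_dots(self) -> Optional[bool]:
        """Dot presence derived from the count, when a count is available."""
        return None if self.dot_count is None else self.dot_count > 0
```

Keeping undetected attributes as `None` makes it possible to tell "no dots found" apart from "dot detection has not run".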

Example input

Example input map

Example output

The resulting shapefile output superimposed

Extra feature detection

Extra feature detection for the polygon

Dependencies

A few things need to be installed on your system for the project to work properly. So far it has only been tested on Mac OS X Lion, so these instructions apply to that configuration only. I am sure you will be able to adapt them to your own configuration.

  • Python with OpenCV
  • ImageMagick
  • R with the rgdal, alphahull, igraph and shapefiles libraries. Make sure R is in your PATH (so you can run it via command-line by typing R)
  • GIMP
  • GDAL Tools
  • It is also a good idea to install QGIS to test your results
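Before the first run, it can help to confirm that the command-line dependencies are reachable. The sketch below is not part of the project; the executable names (`convert` for ImageMagick, `gimp`, `gdalinfo` for GDAL Tools) are assumptions and may differ on your system, particularly on Mac OS X, where GIMP may live inside an application bundle rather than on your PATH.

```python
import importlib.util
import shutil

# Hypothetical executable names; adjust them to match your installation.
ASSUMED_TOOLS = {
    "R": "R",
    "ImageMagick": "convert",
    "GIMP": "gimp",
    "GDAL Tools": "gdalinfo",
}


def missing_tools(tools=ASSUMED_TOOLS, which=shutil.which):
    """Return the labels of tools whose executable is not found on PATH."""
    return sorted(label for label, exe in tools.items() if which(exe) is None)


def has_opencv():
    """Report whether the OpenCV Python module (cv2) can be located."""
    return importlib.util.find_spec("cv2") is not None


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All checked tools were found on PATH.")
    print("OpenCV module found:", has_opencv())
```

This only checks that the programs can be located. It does not verify versions or that the R libraries are installed.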

First run

These step-by-step instructions should work as-is. If they do not, check that all of the dependencies above are working before submitting an issue.

  1. Add the gimp-scripts/ folder to the GIMP Script-Fu folders in Preferences > Scripts. Make sure to run GIMP at least once if you restart your machine (not sure why it behaves this way... I am trying to make the project non-GIMP-dependent so this won't become an issue)
  2. Add executable privileges to the main vectorize_map.py script like so: chmod +x vectorize_map.py. The other Python files are some test files we use and might be excluded in later commits but feel free to browse them.
  3. Take note of the path where the GIMP executable is installed (another reason why I want to remove GIMP from the process).

And finally:

  1. Run the script on the provided test GeoTIFF: ./vectorize_map.py test.tif
  2. Accept the GIMP folder location or input a different one and press ENTER.

This should take about 70 seconds to process. If it takes less there might be an error (or your machine rulez). Take a look at the console output to find the possible culprit.

If it works, you will see a test folder with a test-traced set of files (.shp, .dbf, .prj and .shx) and two log files.
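A quick way to confirm the run produced the shapefile components is to look for each expected extension. This is a minimal sketch, assuming the output folder is named `test` and the files share the base name `test-traced`, as the description above suggests; the log files are not checked because their names are not given here.

```python
from pathlib import Path

# Extensions the traced output is described as including.
EXPECTED_EXTENSIONS = (".shp", ".dbf", ".prj", ".shx")


def missing_outputs(folder, basename, extensions=EXPECTED_EXTENSIONS):
    """Return the expected output file names that do not exist in folder."""
    folder = Path(folder)
    return [
        basename + ext
        for ext in extensions
        if not (folder / (basename + ext)).is_file()
    ]


if __name__ == "__main__":
    # "test" and "test-traced" are assumed names; change them if yours differ.
    missing = missing_outputs("test", "test-traced")
    print("Missing:", ", ".join(missing) if missing else "nothing")
```

An empty result means all four files are present. It says nothing about whether the polygons themselves are correct, which is where QGIS comes in.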

Acknowledgements

Change log

  • Added a config file (rename vectorize_config_default.txt to vectorize_config.txt).
  • Added consolidator.py to assemble a set of shapefiles in a folder into a single file.
  • Added very rough OpenCV circle and cross detection (not working very well but it is a starting point).
  • Added GeoJSON output.
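The consolidation idea can also be applied to the GeoJSON output. The sketch below is not `consolidator.py` and does not touch shapefiles; it assumes each output file has a `.geojson` extension and contains a GeoJSON FeatureCollection, neither of which is confirmed here. The function names are hypothetical.

```python
import json
from pathlib import Path


def merge_feature_collections(collections):
    """Combine GeoJSON FeatureCollection dicts into one FeatureCollection."""
    features = []
    for collection in collections:
        if collection.get("type") != "FeatureCollection":
            raise ValueError("expected a GeoJSON FeatureCollection")
        features.extend(collection.get("features", []))
    return {"type": "FeatureCollection", "features": features}


def merge_geojson_folder(folder, output_path):
    """Merge every *.geojson file in folder; return the number of features written.

    Write output_path outside folder (or with another extension) so that a
    second run does not pick up the merged file as an input.
    """
    paths = sorted(Path(folder).glob("*.geojson"))
    collections = [json.loads(p.read_text(encoding="utf-8")) for p in paths]
    merged = merge_feature_collections(collections)
    Path(output_path).write_text(json.dumps(merged), encoding="utf-8")
    return len(merged["features"])
```

Features are concatenated in file-name order with their properties untouched, so any per-sheet identifiers would need to be added separately.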
