Documentation, how-tos, and example code for using Great Lakes to process U-M CoreLogic data.
This repository demonstrates a workflow for processing CoreLogic data on the Great Lakes (GL) cluster at the University of Michigan.
It is organized into several directories, each covering one step of the workflow:
- [intro-to-corelogic-data]: describes the CoreLogic data and how to get access to it at the University of Michigan
- [running-jupyter-spark-gl-ondemand]: describes how to start a Jupyter + Spark notebook in an Open OnDemand session on the Great Lakes cluster
- [processing-corelogic-using-pyspark]: demonstrates how the CoreLogic data can be processed (read, explored, filtered, and saved/written) using PySpark (see the sketch after this list)
- [github-and-greatlakes]: explains how to clone from and commit to a GitHub repository from a home directory on the Great Lakes cluster
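
As a preview of the PySpark step, here is a minimal sketch of the read/explore/filter/write pattern. The file path, delimiter, column name, and filter condition are illustrative assumptions, not the actual CoreLogic file layout or schema; see [processing-corelogic-using-pyspark] for the real examples.

```python
# Minimal PySpark sketch: read, explore, filter, and write a dataset.
# NOTE: the path, delimiter, column name, and filter below are
# illustrative assumptions, not the actual CoreLogic schema.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("corelogic-demo").getOrCreate()

# Read: load a delimited text file with a header row (hypothetical path)
df = spark.read.csv(
    "/path/to/corelogic/deed_sample.txt",
    sep="|",
    header=True,
    inferSchema=True,
)

# Explore: inspect the schema and a few rows
df.printSchema()
df.show(5)

# Filter: keep records for a single state (hypothetical column name)
mi_df = df.filter(F.col("SITUS_STATE") == "MI")

# Save/write: store the filtered subset as Parquet for faster reuse
mi_df.write.mode("overwrite").parquet("/path/to/output/deed_mi.parquet")
```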