This is one of the 100+ free recipes of the IPython Cookbook, Second Edition, by Cyrille Rossant, a guide to numerical computing and data science in the Jupyter Notebook. The ebook and printed book are available for purchase at Packt Publishing.
▶ Text on GitHub with a CC-BY-NC-ND license
▶ Code on GitHub with a MIT license
Chapter 14 : Graphs, Geometry, and Geographic Information Systems
In this recipe, we load and visualize a dataset containing many flight routes and airports around the world (obtained from the OpenFlights website at https://openflights.org/data.html).
To draw the graph on a map, you need cartopy, available at http://scitools.org.uk/cartopy/. You can install it with conda install -c conda-forge cartopy
.
- Let's import a few packages:
import math
import json
import numpy as np
import pandas as pd
import networkx as nx
import cartopy.crs as ccrs
import matplotlib.pyplot as plt
from IPython.display import Image
%matplotlib inline
- We load the first dataset containing many flight routes:
names = ('airline,airline_id,'
'source,source_id,'
'dest,dest_id,'
'codeshare,stops,equipment').split(',')
routes = pd.read_csv(
'https://github.com/ipython-books/'
'cookbook-2nd-data/blob/master/'
'routes.dat?raw=true',
names=names,
header=None)
routes
- We load the second dataset with details about the airports, and we only keep the airports from the United States:
names = ('id,name,city,country,iata,icao,lat,lon,'
'alt,timezone,dst,tz,type,source').split(',')
airports = pd.read_csv(
'https://github.com/ipython-books/'
'cookbook-2nd-data/blob/master/'
'airports.dat?raw=true',
header=None,
names=names,
index_col=4,
na_values='\\N')
airports_us = airports[airports['country'] ==
'United States']
airports_us
The DataFrame index is the IATA code, a 3-characters code identifying the airports.
- Let's keep all national US flight routes, that is, those for which the source and the destination airports belong to the list of US airports:
routes_us = routes[
routes['source'].isin(airports_us.index) &
routes['dest'].isin(airports_us.index)]
routes_us
- We construct the list of edges representing our graph, where nodes are airports, and two airports are connected if there exists a route between them (flight network):
edges = routes_us[['source', 'dest']].values
edges
array([['ADQ', 'KLN'],
['KLN', 'KYK'],
['BRL', 'ORD'],
...,
['SOW', 'PHX'],
['VIS', 'LAX'],
['WRL', 'CYS']], dtype=object)
- We create the networkX graph from the
edges
array:
g = nx.from_edgelist(edges)
- Let's take a look at the graph's statistics:
len(g.nodes()), len(g.edges())
(546, 2781)
There are 546 US airports and 2781 routes in the dataset.
- Let's plot the graph:
fig, ax = plt.subplots(1, 1, figsize=(6, 6))
nx.draw_networkx(g, ax=ax, node_size=5,
font_size=6, alpha=.5,
width=.5)
ax.set_axis_off()
- There are a few airports that are not connected to the rest of the airports. We keep the largest connected component of the graph as follows (the subgraphs returned by
connected_component_subgraphs()
are sorted by decreasing size):
sg = next(nx.connected_component_subgraphs(g))
- Now, we plot the largest connected component subgraph:
fig, ax = plt.subplots(1, 1, figsize=(6, 6))
nx.draw_networkx(sg, ax=ax, with_labels=False,
node_size=5, width=.5)
ax.set_axis_off()
The graph encodes only the topology (connections between the airports) and not the geometry (actual positions of the airports on a map). Airports at the center of the graph are the largest US airports.
- We're going to draw the graph on a map, using the geographical coordinates of the airports. First, we need to create a dictionary where the keys are the airports IATA codes, and the values are the coordinates:
pos = {airport: (v['lon'], v['lat'])
for airport, v in
airports_us.to_dict('index').items()}
- The node sizes will depend on the degree of the nodes, that is, the number of airports connected to every node:
deg = nx.degree(sg)
sizes = [5 * deg[iata] for iata in sg.nodes]
- We will also show the airport altitude as the node color:
altitude = airports_us['alt']
altitude = [altitude[iata] for iata in sg.nodes]
- We will display the labels of the largest airports only (at least 20 connections to other US airports):
labels = {iata: iata if deg[iata] >= 20 else ''
for iata in sg.nodes}
- Finally, we use cartopy to project the points on the map:
# Map projection
crs = ccrs.PlateCarree()
fig, ax = plt.subplots(
1, 1, figsize=(12, 8),
subplot_kw=dict(projection=crs))
ax.coastlines()
# Extent of continental US.
ax.set_extent([-128, -62, 20, 50])
nx.draw_networkx(sg, ax=ax,
font_size=16,
alpha=.5,
width=.075,
node_size=sizes,
labels=labels,
pos=pos,
node_color=altitude,
cmap=plt.cm.autumn)
- Manipulating and visualize graphs with NetworkX
- Manipulating geospatial data with Cartopy