Photo by Denise Jans on Unsplash

Geopandas: Accessible, Yet Powerful GIS With Python

You are here!

--

Recently, I have been doing a lot more work on data that have a spatial meaning. Lucky for me, I have always been interested in GIS and, in my professional life, I have used a lot of GIS packages such as ArcGIS, MapInfo, and qGIS. But this time, as the data needs more processing, I did not go back to my old friends and, instead, I focussed on using Python to handle this information.

Because Python is open-source, there are lots of packages available to work with spatial data. It has never been easier to draw a map in Python. Folium, Bokeh, Plotly, or Altair achieve such a task with relative ease and, more often than not, the result is quite convincing. But these libraries are visualisation libraries, not really designed to analyse the data in a geospatial way. If we have a look at the more “back-end” libraries, or at least not the visualisation ones, we see that there are even more libraries available for spatial data handling.

The tools are there in Python but, unlike with the GIS software I am used to, they are not conveniently bundled, easily accessible in an ordered menu.

Fortunately, I discovered GeoPandas. A game changer!

The Basics

>>> pip install geopandas
>>> pip install descartes

As the name suggests, GeoPandas is going to be very similar to Pandas. It actually is one of its dependencies. Another dependency to be aware of is Shapely. It is an incredibly useful package you will most certainly start using pretty soon after plotting maps with GeoPandas. Talking about drawing maps, Descartes is not a dependency, but as it is required to generate the plots, we want it installed.

>>> import geopandas
>>> import matplotlib.pyplot as plt

In a nutshell, GeoPandas uses Pandas to do the data heavy-lifting, Matplotlib for the plotting, and libraries such as Shapely or Fiona to process geographic data.

# GeoPandas comes with a few toy GeoDataFrames:
>>> geopandas.datasets.available
['naturalearth_lowres', 'naturalearth_cities', 'nybb']
>>> world = geopandas.read_file(geopandas.datasets.get_path('naturalearth_lowres'))>>> world.head()
png
>>> ax = world.plot(column='pop_est',
cmap='Spectral',
figsize=(15, 15),
edgecolor='white',
label='name',
legend=True,
legend_kwds={"label":"Population",
"orientation":"horizontal"
}
);

>>> ax.axis('off');
png

The world dataset provided has a tidy structure, you would call it a DataFrame if you are familiar with Pandas, or a table if you are familiar with Excel/SQL/etc. The geometry column does look a little bit different, though. But it makes perfect sense once you realise it contains the list of coordinates that define the shape associated with the country (shapely can create those “objects”, wink wink). The one observation per line rule applies.

If you are using jupyter notebook, you can quickly plot that shape by slicing the dataset to its geometry:

>>> world['geometry'][4]
Jupyter Notebook rendering of [‘geometry’].

To display a background map, you can use some matplotlib wizardry and plot the background map as the “ax” object. As you might have noticed already GeoPandas pipes its plotting attributes directly from matplotlib:

>>> ax = world.plot(facecolor='none', edgecolor='black', figsize=(15,15));>>> cities  = geopandas.read_file(geopandas.datasets.get_path('naturalearth_cities'))>>> cities.plot(ax=ax, color='red', marker='o');>>> ax.axis('off');
png
Photo by Milan De Clercq on Unsplash

Relationship Between Geospatial Data

Plotly, Bokeh, Altair, and other visualisation libraries are solid competitors up to this point, but where GeoPandas arguably stands out is with geospatial data handling.

Reusing our toy datasets, we can use the .squeeze() method to "squeeze" our GeoSeries geometry into a shape object:

# Points
>>> london = cities.loc[cities['name']=='London', 'geometry'].squeeze()
>>> paris = cities.loc[cities['name']=='Paris', 'geometry'].squeeze()

#Polygons
>>> france = world.loc[world['name']=='France', 'geometry'].squeeze()
>>> uk = world.loc[world['name']=='United Kingdom', 'geometry'].squeeze()

Having shape objects, we can now process them as geospatial data. It means we can assess their relationship, measure the distance between them, compute their area, etc.

>>> london.within(uk)
True
>>> uk.touches(france)
False
>>> uk.area
34.20295398919941
>>> london.distance(paris)
3.5968122687166146

To create a geo-referenced object/shape, you can use the Shapely library:

>>> from shapely.geometry import LineString
>>> london_paris = LineString([london, paris])
>>> geopandas.GeoSeries([london, paris, london_paris, london.buffer(5), uk, france]).plot(figsize=(15,15), cmap='Spectral');

# Note how the london.buffer(5) "layer" is below the uk layer.
>>> plt.xlim(-10,10);
>>> plt.ylim(40, 60);
# axis limits are necessary because France includes French Guiana in this dataset (unlike some dataset which limit France to Metropolitan France)
png
>>> paris.within(london.buffer(5))
True
>>> london_paris.intersects(uk)
True

You can create new shapes by using operators like .difference(), .union(), or even the .overlay() method:

>>> london.buffer(5).difference(uk)
Spatial differences between a 5 degree buffer around London and the UK mainland.
Photo by Annie Spratt on Unsplash

Coordinate Reference System

You might have noticed the ['geometry'] column gives the latitude/longitude (degrees), so GeoPandas returned values in degrees. This is due to GeoPandas returning the results in the units of the GeoDataFrame's Coordinate Reference System when calculating distances and areas.

I don’t know if you visualise what a degree² represents but if you are like me, you will be happy to know GeoPandas comes with built-in tools to address crs transformations.

The .crs property verifies the projection data:

>>> world.crs<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich

Whilst the .to_crs() method allows the reference system to be changed:

>>> world_utm = world.to_crs(epsg=23031)
>>> cities_utm = cities.to_crs(epsg=23031)
>>> world_utm.crs<Projected CRS: EPSG:23031>
Name: ED50 / UTM zone 31N
Axis Info [cartesian]:
- E[east]: Easting (metre)
- N[north]: Northing (metre)
Area of Use:
- name: Europe - 0°E to 6°E and ED50 by country
- bounds: (0.0, 38.56, 6.01, 82.41)
Coordinate Operation:
- name: UTM zone 31N
- method: Transverse Mercator
Datum: European Datum 1950
- Ellipsoid: International 1924
- Prime Meridian: Greenwich
# Points
>>> london_utm = cities_utm.loc[cities_utm['name']=='London', 'geometry'].squeeze()
>>> paris_utm = cities_utm.loc[cities_utm['name']=='Paris', 'geometry'].squeeze()
#Polygons
>>> france_utm = world_utm.loc[world_utm['name']=='France', 'geometry'].squeeze()
>>> uk_utm = world_utm.loc[world_utm['name']=='United Kingdom', 'geometry'].squeeze()
>>> print(f'London to Paris:{london_utm.distance(paris_utm) / 1_000 : .2f} kilometres')London to Paris: 341.10 kilometres
>>> print(f'UK area:{uk_utm.area / 1_000_000 : .2f} km²')UK area: 250757.98 km²
Photo by David Pennington on Unsplash

Spatial Join

Just like Pandas or SQL joins, GeoPandas can join GeoDataFrames. But unlike Pandas or SQL, it can be done based on their spatial relationships!

Note that you will need to install pygeos or rtree for these operations to work.

>>> joined = geopandas.sjoin(cities, world, op='within', how='left')
png

Because we did a left join, all the cities from the naturalearth_cities are included. However, as the continents have been defined for mainland bodies mainly, small islands like Mayotte, Comoros, The Marshall Island, etc. do not have a continent assigned.

>>> joined[joined['index_right'].isna()]
png
>>> joined[joined['index_right'].isna()].plot(ax=world.plot(column='continent', figsize=(15,15)), color='red');
>>> plt.xlim(20,180);
>>> plt.ylim(-20,20);
png

Last Words

GeoPandas is a powerful library which does a fantastic job with geospatial data. It is very accessible, especially if you are already familiar with Pandas and/or Matplotlib. I believe it is a library worth adding to anyone’s GIS toolkit.

--

--