Stop Thinking, Just Do!

Sungsoo Kim's Blog

22 Python libraries for Geospatial Data Analysis


2 September 2021


Article Source



How to harness the power of geospatial data

Spatial data, geospatial data, GIS data, and geodata are all names for numeric data that identifies the geographical location of a physical object, such as a building, a street, a town, a city, or a country, according to a geographic coordinate system.

From the spatial data, you can find out not only the location but also the length, size, area or shape of any object.
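
As a rough illustration of deriving length and area from coordinates alone, here is a minimal sketch in plain Python. It assumes the coordinates are already in a planar (projected) system measured in metres; the `footprint` rectangle and both function names are hypothetical, chosen only for this example.

```python
import math

def ring_length(coords):
    """Perimeter of a closed ring given as (x, y) vertices in planar units."""
    return sum(
        math.dist(coords[i], coords[(i + 1) % len(coords)])
        for i in range(len(coords))
    )

def ring_area(coords):
    """Unsigned area of a simple polygon, computed with the shoelace formula."""
    twice_area = 0.0
    for i in range(len(coords)):
        x1, y1 = coords[i]
        x2, y2 = coords[(i + 1) % len(coords)]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0

# Hypothetical building footprint: a 30 m x 20 m rectangle (assumed data).
footprint = [(0.0, 0.0), (30.0, 0.0), (30.0, 20.0), (0.0, 20.0)]
perimeter = ring_length(footprint)
area = ring_area(footprint)
print(perimeter, area)
```

For latitude/longitude coordinates these planar formulas do not apply directly; the data would first need to be projected.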

Examples of spatial data include an object's coordinates, such as latitude, longitude, and elevation.

Geographic Information Systems (GIS) or other specialized software applications can be used to access, visualize, manipulate and analyze geospatial data.

Some examples of geospatial data include:

1. Vectors and Attributes

Points, lines, polygons, and other descriptive information about a location.

Vector data is a representation of a spatial element through its x and y coordinates. The most basic form of vector data is a point. Two or more points form a line, and three or more lines form a polygon.

The simplest way to store vector data is to add one or more columns to a table that hold each record's geospatial coordinates. More formal encoding formats such as GeoJSON also come in handy.

GeoJSON, a format built on JSON, describes each feature with a geometry that can be a Point, LineString, Polygon, MultiPoint, MultiLineString, MultiPolygon, or GeometryCollection.
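
To make the GeoJSON structure concrete, the sketch below builds a single Point feature as a Python dictionary and round-trips it through the standard `json` module. The coordinates and the `name` property are illustrative values, not data from the article. Note that GeoJSON stores positions in longitude, latitude order.

```python
import json

# A GeoJSON Feature pairs a geometry with free-form properties.
feature = {
    "type": "Feature",
    "geometry": {
        "type": "Point",
        "coordinates": [126.978, 37.5665],  # [longitude, latitude], illustrative
    },
    "properties": {"name": "example location (hypothetical)"},
}

encoded = json.dumps(feature)   # serialize to a GeoJSON string
decoded = json.loads(encoded)   # parse it back into a dictionary
lon, lat = decoded["geometry"]["coordinates"]
print(decoded["geometry"]["type"], lon, lat)
```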

Many other formats exist for representing geospatial data; the Geospatial Data Abstraction Library (GDAL) documents the ones it can read and write.

2. Point Clouds

Collected by LiDAR systems, they can be used to create 3D models.


3. Raster and Satellite Imagery

Get a bird's-eye view of what the Earth looks like via high-resolution imagery.

Raster data is used when spatial information across an area is observed. It consists of a matrix of rows and columns with some information associated with each cell.

An example of raster data is a satellite image of a nation or a city represented by a matrix that contains the weather information in each of its cells.
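
The matrix-of-cells idea can be sketched without any geospatial library. The example below assumes a hypothetical 3 x 4 grid of temperatures, a top-left origin, and square cells; `origin_x`, `origin_y`, `cell_size`, and `cell_value` are names invented for this illustration.

```python
# A tiny 3 x 4 raster: each cell holds a hypothetical temperature in degrees C.
raster = [
    [21.0, 21.5, 22.0, 22.5],
    [20.0, 20.5, 21.0, 21.5],
    [19.0, 19.5, 20.0, 20.5],
]
origin_x, origin_y = 100.0, 50.0  # map coordinates of the top-left corner (assumed)
cell_size = 10.0                  # width and height of one cell in map units (assumed)

def cell_value(x, y):
    """Return the value of the cell that contains map coordinate (x, y)."""
    col = int((x - origin_x) // cell_size)
    row = int((origin_y - y) // cell_size)  # rows count downward from the top edge
    if not (0 <= row < len(raster) and 0 <= col < len(raster[0])):
        raise ValueError("coordinate falls outside the raster")
    return raster[row][col]

value = cell_value(125.0, 35.0)
print(value)
```

Real raster files carry the same information (origin, cell size, and cell values), usually together with a coordinate reference system.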

There are several ways that you can work with raster data in Python.

One recent, user-friendly package is xarray, which can read NetCDF files.

Terminology

  • shapefile: data file format used to represent items on a map
  • geometry: a vector (generally a column in a dataframe) used to represent points, polygons, and other geometric shapes or locations, usually represented as well-known text (WKT)
  • polygon: an area
  • point: a specific location
  • basemap: the background setting for a map, such as county borders in California
  • projection: since the Earth is a 3D spheroid, choose a method for how an area gets flattened into a 2D map, using some coordinate reference system (CRS)
  • colormap: choice of a color palette for rendering data, selected with the cmap parameter
  • overplotting: stacking several different plots on top of one another
  • choropleth: using different hues to color polygons, as a way to represent data levels
  • kernel density estimation (KDE): a data smoothing technique that creates contours of shading to represent data levels
  • cartogram: warping the relative area of polygons to represent data levels
  • quantiles: binning data values into a specified number of equal-sized groups
  • Voronoi diagram: dividing an area into polygons such that each polygon contains exactly one generating point and every point in a given polygon is closer to its generating point than to any other; also called a Dirichlet tessellation
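
The Voronoi definition in the list above implies a simple membership test: a location belongs to the cell of whichever generating point is nearest. The sketch below applies that test to three hypothetical generators; it does not construct the polygon boundaries themselves, and `voronoi_cell` is a name made up for this example.

```python
import math

# Hypothetical generating points in planar coordinates.
generators = {"A": (0.0, 0.0), "B": (10.0, 0.0), "C": (5.0, 8.0)}

def voronoi_cell(point):
    """Name of the generating point nearest to `point` (its Voronoi cell)."""
    return min(generators, key=lambda name: math.dist(point, generators[name]))

cell = voronoi_cell((2.0, 1.0))
print(cell)
```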

22 Python libraries for Geospatial Data Analysis

Here is the list of 22 Python libraries for geospatial data analysis:

1. Shapely

With Shapely, you can create geometry objects (e.g. Point, Polygon, MultiPolygon) and manipulate them, e.g. buffer them or calculate an area or an intersection.

2. Fiona

Fiona can read and write real-world data using multi-layered GIS formats and zipped virtual file systems, and it integrates readily with other Python GIS packages such as pyproj, Rtree, and Shapely.

3. GeoPandas

GeoPandas is like pandas meets GIS. It extends the datatypes used by pandas to allow spatial operations on geometric types. Geometric operations are performed by Shapely. It further depends on Fiona for file access and Matplotlib for plotting.

4. GeoPlot

A high-level geospatial plotting library. It’s an extension to cartopy and matplotlib which makes mapping easy: like seaborn for geospatial. This is especially helpful since it builds on top of several other popular geospatial libraries, to simplify the coding that’s typically required.

5. Arcpy

If you use Esri ArcGIS, then you’re probably familiar with the ArcPy library. ArcPy is meant for geoprocessing operations. It is not only for spatial analysis but also for data conversion, data management, and map production with Esri ArcGIS.

6. Scikit-Image

Library for image manipulation, e.g. histogram adjustments, filter, segmentation/edge detection operations, texture feature extraction etc.

7. SciKit-Learn

One of the most capable and, at the same time, easy-to-use Python machine learning libraries. It covers regression, classification, dimensionality reduction, and more.

8. Descartes

Enables plotting of Shapely geometries as Matplotlib paths/patches. Also a dependency for the geometry plotting functions of GeoPandas.

9. RasterStats

For zonal statistics. Extracts statistics from raster files or NumPy arrays based on geometries.
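
To show what "zonal statistics" means, here is a conceptual sketch in plain Python. It is not the RasterStats API: it assumes the zones have already been rasterized into a grid of labels aligned with the value grid, whereas RasterStats works from vector geometries. The grids and the `zonal_stats` helper are hypothetical.

```python
from statistics import mean

# Hypothetical raster values and an aligned grid of zone labels.
values = [
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
    [7.0, 8.0, 9.0],
]
zones = [
    ["a", "a", "b"],
    ["a", "b", "b"],
    ["c", "c", "b"],
]

def zonal_stats(values, zones):
    """Summarize the raster cells that fall inside each zone."""
    grouped = {}
    for value_row, zone_row in zip(values, zones):
        for value, zone in zip(value_row, zone_row):
            grouped.setdefault(zone, []).append(value)
    return {
        zone: {
            "count": len(cells),
            "min": min(cells),
            "max": max(cells),
            "mean": mean(cells),
        }
        for zone, cells in grouped.items()
    }

stats = zonal_stats(values, zones)
print(stats["b"])
```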

10. Rasterio

Rasterio is the go-to library for raster data handling. It lets you read/write raster files to/from NumPy arrays (the de facto standard for Python array operations), offers many convenient ways to manipulate these arrays (e.g. masking, vectorizing etc.) and can handle transformations of coordinate reference systems. Just like any other NumPy array, the data can also be easily plotted, e.g. using the Matplotlib library.

11. ipyleaflet

If you want to create interactive maps, ipyleaflet is a fusion of Jupyter notebook and Leaflet. You can control an assortment of customizations like loading basemaps, GeoJSON, and widgets. It also gives a wide range of map types to pick from, including choropleth, velocity data, and side-by-side views.

12. Folium

Just like ipyleaflet, Folium allows you to leverage Leaflet to build interactive web maps. You manipulate your data in Python and then visualize it with the leading open-source JavaScript mapping library.

13. Geemap

Geemap is intended more for science and data analysis using Google Earth Engine (GEE).

14. PySAL

The Python Spatial Analysis Library contains a multitude of functions for spatial analysis, statistical modeling and plotting. It is intended to support the development of high-level applications.

15. xarray

Great for handling extensive image time series stacks: imagine 5 vegetation indices x 24 dates x 256 pixels x 256 pixels. xarray lets you label the dimensions of a multidimensional NumPy array and combines this with many functions and the syntax of the pandas library (e.g. groupby, rolling window, plotting).

16. PyProj

The main purpose of the PyProj library is to work with spatial reference systems. It can project and transform coordinates between a range of geographic reference systems. PyProj can also perform geodetic calculations, such as distances, for any given datum.
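
For a feel of what a geodetic distance calculation involves, the sketch below uses the haversine formula on a sphere. This is a swapped-in approximation written in plain Python, not PyProj's API: PyProj computes distances on an ellipsoid for a chosen datum, so its results will differ slightly. The radius constant and the function name are assumptions made for this example.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; a spherical approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# A quarter of the equator: from (0, 0) to (0, 90 degrees east).
quarter_equator = haversine_km(0.0, 0.0, 0.0, 90.0)
print(round(quarter_equator, 1))
```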

17. GDAL/OGR

The GDAL/OGR library is used for translating between GIS formats and extensions. QGIS, ArcGIS, ERDAS, ENVI, and GRASS GIS and almost all GIS software use it for translation in some way. At this time, GDAL/OGR supports 97 vector and 162 raster drivers.

18. RSGISLib

The RSGISLib library is a set of remote sensing tools for raster processing and analysis. To name a few, it classifies, filters, and performs statistics on imagery. My personal favorite is the module for object-based segmentation and classification (GEOBIA).

19. ReportLab

ReportLab is one of the most satisfying libraries on this list. I say this because GIS often lacks sufficient reporting capabilities. Especially, if you want to create a report template, this is a fabulous option. I don’t know why the ReportLab library falls a bit off the radar because it shouldn’t.

20. Imageio

It is a Python library that provides an easy interface to read and write a wide range of image data, including animated images, volumetric data, and scientific formats.

21. MapClassify

It implements a family of classification schemes for choropleth maps. Its focus is on the determination of the number of classes, and the assignment of observations to those classes.
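
One of the simplest schemes of this kind is quantile classification. The sketch below shows the idea using only the standard library; it is not the MapClassify API, and the `values` list and class count `k` are hypothetical. It first finds the class breaks, then assigns each observation to a class index.

```python
from bisect import bisect_left
from statistics import quantiles

# Hypothetical observations, e.g. one value per polygon on a choropleth map.
values = [2, 4, 4, 7, 9, 12, 15, 21, 30, 48]
k = 4  # number of classes (assumed)

# k - 1 cut points that split the data into k roughly equal-sized groups.
breaks = quantiles(values, n=k, method="inclusive")

# Class index 0..k-1 for each observation; a value equal to a break
# falls into the lower class.
classes = [bisect_left(breaks, v) for v in values]
print(breaks, classes)
```

Each class index can then be mapped to one hue of a colormap.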

22. RTree

It is a ctypes Python wrapper of libspatialindex that provides a number of advanced spatial indexing features.

