PyGBS stands for Python for Geo-Bias Scores. It provides users with a collection of information-theoretic, model-agnostic metrics of geospatial bias (i.e., non-uniform model performance across different regions on Earth). It is complementary to existing model performance metrics such as accuracy, precision, recall and reciprocal rank.
This project aims to provide a plug-and-play toolbox that lets researchers painlessly benchmark the geo-bias of their models, encourages them to report geo-bias scores alongside other metrics, and allows geo-bias scores to be used as training objectives to help control the geo-bias of models. We hope this will boost the fairness and trustworthiness of spatial data analysis and GeoAI research. In fact, we find that while introducing fine-grained geospatial information into models greatly improves performance, it also significantly increases geo-bias. It is important to pay attention to both sides of the coin.
Different geo-bias scores focus on different spatial aspects of geospatial bias. In this initial version, we introduce the two most important categories of GBS: Spatial Self-Information (SSI) Scores and Spatial Relative Information (SRI) Scores. As the names indicate, SSI focuses on the bias originating from the spatial arrangement within each region of interest (i.e., self-information), while SRI focuses on the bias originating from the performance misalignment between different regions of interest (i.e., relative information). Each category of GBS consists of several GBS implementations, which specify exactly which spatial arrangement or performance misalignment is taken into consideration. Select the GBS implementation that best fits your research questions.
We will be actively supporting more GBS categories in the future.
In classic geostatistics, an unmarked measurement only cares about the spatial distribution of data (i.e., the data points are not "marked" with specific values), while a marked measurement additionally cares about the values of data. Following this naming tradition, we have two types of SSI Scores:
The Unmarked SSI Score accounts only for the spatial arrangement of the observed data.
The Marked SSI Score accounts for both the spatial arrangement of the observed data and how the high and low performance values are distributed across the data. Beware that the Marked SSI Score cannot be directly compared with the Unmarked SSI Score.
SRI measures the heterogeneity of model performance within a given region of interest (ROI). If a model is not geo-biased, its performance should be similar across the entire ROI and within any local partition of the ROI. Based on how we partition the ROI, we provide four types of SRI Scores, each corresponding to a certain type of spatial heterogeneity:
Partition the ROI into smaller grids.
Partition the ROI into concentric distance lags.
Partition the ROI into semivariogram-style data pair bins.
Partition the ROI into sectors.
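
To make the partition idea concrete, here is a minimal sketch of three of the partition schemes (grids, distance lags, sectors) and one possible way to compare per-partition performance with overall performance. It is not the PyGBS API or the exact SRI definition: the function names (`grid_labels`, `distance_lag_labels`, `sector_labels`, `relative_information`), the planar coordinates, and the use of a size-weighted Bernoulli KL divergence are all assumptions made for illustration. The semivariogram-style pair bins are omitted for brevity.

```python
import numpy as np

# Illustrative sketch only: these helpers are hypothetical and are NOT part of PyGBS.
# Coordinates are treated as planar (x, y); real geographic data would need a projection.

def grid_labels(xy, cell_size):
    """Assign each point to a square grid cell; returns one integer label per point."""
    cells = np.floor(xy / cell_size).astype(int)
    # Collapse (column, row) pairs into a single label per distinct cell.
    _, labels = np.unique(cells, axis=0, return_inverse=True)
    return labels.reshape(-1)

def distance_lag_labels(xy, center, lag_width):
    """Assign each point to a concentric ring (distance lag) around `center`."""
    dist = np.linalg.norm(xy - center, axis=1)
    return np.floor(dist / lag_width).astype(int)

def sector_labels(xy, center, n_sectors):
    """Assign each point to one of `n_sectors` equal angular sectors around `center`."""
    delta = xy - center
    angle = np.arctan2(delta[:, 1], delta[:, 0]) % (2 * np.pi)
    idx = (angle / (2 * np.pi) * n_sectors).astype(int)
    return np.minimum(idx, n_sectors - 1)  # guard against angle == 2*pi after rounding

def bernoulli_kl(p, q, eps=1e-9):
    """KL divergence between Bernoulli(p) and Bernoulli(q), clipped for stability."""
    p = np.clip(p, eps, 1 - eps)
    q = np.clip(q, eps, 1 - eps)
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

def relative_information(correct, labels):
    """Size-weighted mean KL divergence between per-partition and overall accuracy.

    An assumed stand-in for a relative-information score: it is zero when every
    partition has the same accuracy as the whole ROI and grows as they diverge.
    """
    correct = np.asarray(correct, dtype=float)
    overall = correct.mean()
    total = 0.0
    for lab in np.unique(labels):
        mask = labels == lab
        total += mask.mean() * bernoulli_kl(correct[mask].mean(), overall)
    return float(total)

# Usage: synthetic points whose correctness depends on the x coordinate.
rng = np.random.default_rng(0)
points = rng.uniform(0.0, 10.0, size=(500, 2))
is_correct = (points[:, 0] > 5.0).astype(float)
roi_center = np.array([5.0, 5.0])
grid_score = relative_information(is_correct, grid_labels(points, 2.5))
ring_score = relative_information(is_correct, distance_lag_labels(points, roi_center, 2.0))
sector_score = relative_information(is_correct, sector_labels(points, roi_center, 8))
```

Because each partition scheme groups the same observations differently, the same model can score differently under each one, which is why the choice of partition should follow the kind of spatial heterogeneity you want to detect.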