This repository contains the experimental code and the comprehensive survey paper associated with our research.
In this work, we introduce a novel framework for thoroughly assessing the fairness of visual models. Our framework extends existing evaluation methods along three key dimensions: Inter-Attribute, Inter-Class, and Intra-Class. These dimensions measure the fairness of attributes and classes both relative to one another and within themselves in a given application scenario. We apply this framework to evaluate and analyze the fairness of common computer vision models, providing guidance for subsequent fairness-optimization efforts.
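As a rough illustration of what such a multi-dimensional evaluation can look like, the following minimal Python sketch computes accuracy gaps between attribute groups, between classes, and between attribute groups within each class. It is a toy approximation of the general idea, not the paper's implementation; all function names, variable names, and toy data below are our own assumptions.

```python
# Hypothetical sketch of a multi-dimensional fairness audit; NOT the paper's
# implementation. It computes simple accuracy gaps (i) between sensitive-
# attribute groups, (ii) between classes, and (iii) between attribute groups
# within each class, loosely mirroring the Inter-Attribute / Inter-Class /
# Intra-Class dimensions described above.
import numpy as np

def accuracy_gap(y_true, y_pred, groups):
    """Largest pairwise accuracy difference across the values in `groups`."""
    accs = [
        np.mean(y_pred[groups == g] == y_true[groups == g])
        for g in np.unique(groups)
    ]
    return max(accs) - min(accs)

# Toy data: predictions for 8 samples, a binary sensitive attribute,
# and a 2-class label (all invented for illustration).
y_true = np.array([0, 1, 1, 0, 1, 0, 1, 0])
y_pred = np.array([0, 1, 0, 0, 1, 1, 1, 0])
attr   = np.array([0, 0, 0, 0, 1, 1, 1, 1])   # e.g., a demographic group

print("Inter-attribute accuracy gap:", accuracy_gap(y_true, y_pred, attr))
print("Inter-class accuracy gap:", accuracy_gap(y_true, y_pred, y_true))

# Intra-class: the attribute gap computed within each class separately.
for c in np.unique(y_true):
    m = y_true == c
    print(f"Intra-class (class {c}) gap:",
          accuracy_gap(y_true[m], y_pred[m], attr[m]))
```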
Based on our survey and evaluation framework, we have developed an online model evaluation platform. You can access it at: https://model-evaluation.vipazoo.com.
We have compiled the original papers for a number of evaluation metrics in the table below; see each cited paper for the precise definition of the corresponding metric. A short illustrative sketch of a few of these metrics follows the table.
Metric | Purpose & Paper |
---|---|
Demographic Parity | Ensures that the rate of positive predictions is equal across sensitive groups. Buolamwini and Gebru (2018) |
Conditional Statistical Parity | Ensures equal prediction rates across groups after conditioning on certain attributes. Ramaswamy et al. (2021) |
Disparate Impact | Measures the ratio of favorable-outcome rates between groups; a value near 1 indicates parity. Barocas and Hardt (2016) |
Predictive Parity | Ensures equal positive predictive value across groups. Chouldechova (2017) |
Predictive Equality | Ensures equal false positive rates across groups. Hardt et al. (2016) |
Equal Opportunity | Ensures equal true positive rates across groups. Hardt et al. (2016) |
Equalized Odds | Ensures equal true positive and false positive rates across groups. Hardt et al. (2016) |
Conditional Use Accuracy Equality | Ensures equal positive and negative predictive values across groups, i.e., equal accuracy conditioned on the prediction. Pleiss et al. (2017) |
Overall Accuracy Equality | Ensures overall accuracy is equal across groups. Dieterich et al. (2016) |
Test-fairness | Ensures that, for any given predicted score, the probability of a positive outcome is equal across groups. Friedler et al. (2019) |
Well-calibration | Ensures predictions are well-calibrated across groups. Kleinberg et al. (2017) |
Balance for Positive Class | Ensures the average predicted score among positive-class instances is equal across groups. Chouldechova (2017) |
Balance for Negative Class | Ensures the average predicted score among negative-class instances is equal across groups. Chouldechova (2017) |
Tanimoto Coefficient | Measures similarity between predicted and true labels. Van Deursen et al. (2015) |
Cosine Similarity | Measures cosine similarity between predicted and true labels. Zhang et al. (2017) |
Spearman Correlation | Measures the rank correlation between predicted and true labels. Spearman (1904) |
Neuron Distance | Measures the distance between neurons in neural networks. Li et al. (2015) |
Coverage Ratio | Measures the ratio of covered to total instances. Georgopoulos et al. (2021) |
BiasAmpMALS | Measures bias amplification following the "Men Also Like Shopping" (MALS) formulation. Zhang et al. (2021) |
BiasAmp | Measures bias amplification. Zhang et al. (2018) |
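To make the first few criteria concrete, here is a minimal, self-contained sketch of demographic parity, disparate impact, equal opportunity, and equalized odds for the binary case. The helper names, 0/1 encodings, and toy data are our assumptions for illustration; this is not the repository's API.

```python
# Minimal sketches of a few metrics from the table, assuming binary labels,
# binary predictions, and a binary sensitive attribute encoded as 0/1.
# Helper and function names are ours, for illustration only.
import numpy as np

def rate(mask, values):
    """Mean of `values` restricted to `mask` (nan on an empty mask)."""
    return values[mask].mean() if mask.any() else float("nan")

def demographic_parity_diff(y_pred, a):
    # |P(Yhat=1 | A=0) - P(Yhat=1 | A=1)|
    return abs(rate(a == 0, y_pred) - rate(a == 1, y_pred))

def disparate_impact(y_pred, a):
    # P(Yhat=1 | A=0) / P(Yhat=1 | A=1); values near 1 indicate parity.
    return rate(a == 0, y_pred) / rate(a == 1, y_pred)

def equal_opportunity_diff(y_true, y_pred, a):
    # |TPR_0 - TPR_1|: gap in true positive rates between groups.
    return abs(rate((a == 0) & (y_true == 1), y_pred)
               - rate((a == 1) & (y_true == 1), y_pred))

def equalized_odds_diff(y_true, y_pred, a):
    # Max of the TPR gap and the FPR gap across groups.
    fpr_gap = abs(rate((a == 0) & (y_true == 0), y_pred)
                  - rate((a == 1) & (y_true == 0), y_pred))
    return max(equal_opportunity_diff(y_true, y_pred, a), fpr_gap)

# Toy data, invented for illustration.
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y_pred = np.array([1, 0, 0, 1, 1, 1, 0, 0])
a      = np.array([0, 0, 0, 0, 1, 1, 1, 1])

print("Demographic parity diff:", demographic_parity_diff(y_pred, a))
print("Disparate impact:", disparate_impact(y_pred, a))
print("Equal opportunity diff:", equal_opportunity_diff(y_true, y_pred, a))
print("Equalized odds diff:", equalized_odds_diff(y_true, y_pred, a))
```

Note that gap-style metrics (differences) and ratio-style metrics (disparate impact) answer slightly different questions: a value of 0 indicates parity for the former, while a value of 1 does for the latter.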
We encourage contributions from the community. If you encounter any issues or have suggestions for improvements, please feel free to open an issue or submit a pull request.
This project is licensed under the MIT License - see the LICENSE file for details.