Dataset quality assessment is a crucial aspect of machine learning and artificial intelligence: the performance and accuracy of an algorithm depend directly on the quality and characteristics of the data it is trained on. Poor-quality datasets can produce biased or inaccurate models and, in turn, incorrect decisions. It is therefore important to measure dataset quality and identify potential issues before using a dataset to train machine learning models.
D-ACE is a framework for assessing the quality and characteristics of datasets, helping to identify issues that may degrade the performance of machine learning algorithms. It provides a comprehensive evaluation of a dataset, covering factors such as missing values, class imbalance, and data heterogeneity. By surfacing these issues early so they can be addressed, D-ACE helps improve the dependability of machine learning algorithms, leading to more accurate and reliable results.
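As a simple illustration of the kind of metrics such an assessment involves (the function names below are illustrative, not D-ACE's actual API), missing-value sparsity and class imbalance can be computed with NumPy:

```python
import numpy as np

def nan_sparsity(X):
    """Fraction of entries in X that are NaN (missing values)."""
    return float(np.isnan(X).mean())

def class_imbalance_ratio(y):
    """Ratio of the largest class count to the smallest.

    A value of 1.0 means perfectly balanced classes; larger values
    indicate increasing imbalance.
    """
    _, counts = np.unique(y, return_counts=True)
    return float(counts.max() / counts.min())

# Toy dataset: 3 instances, 2 features, 2 of 6 entries missing.
X = np.array([[1.0, np.nan],
              [2.0, 3.0],
              [np.nan, 4.0]])
y = np.array([0, 0, 1])

print(round(nan_sparsity(X), 3))      # -> 0.333
print(class_imbalance_ratio(y))       # -> 2.0
```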
Characteristics | Characteristics
---|---
Dimensionality (d) | NrOfInstances (N)
NrOfClasses (C) | ZeroSparsity (OS)
NaNSparsity (NS) | DataSparsity (DS)
DataSparsityRatio (DSR) | Correlation of Features with Class (CorrFC)
Correlation of Features without Class (CorrFNC) | Multivariate Normality (MVN)
Homogeneity of class covariance (HCCov) | Intrinsic Dimensionality-PCA (ID)
Intrinsic Dimensionality Ratio (IDR) | Feature Noise variance (FN1)
Feature Noise paper (FN2) |
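To illustrate two of the characteristics above, a common PCA-based estimate of intrinsic dimensionality (ID) counts the principal components needed to retain a chosen fraction of the total variance, and the intrinsic dimensionality ratio (IDR) relates that count to the ambient dimensionality d. This sketch assumes that interpretation; D-ACE's exact definitions may differ:

```python
import numpy as np

def intrinsic_dimensionality(X, var_threshold=0.95):
    """Number of principal components needed to retain `var_threshold`
    of the total variance (a common PCA-based ID estimate)."""
    Xc = X - X.mean(axis=0)
    # Singular values of the centred data give the component variances.
    s = np.linalg.svd(Xc, compute_uv=False)
    explained = np.cumsum(s**2) / np.sum(s**2)
    return int(np.searchsorted(explained, var_threshold) + 1)

rng = np.random.default_rng(0)
# 200 samples in 5-D ambient space, but only 2 latent directions
# carry real variance; the rest is small isotropic noise.
latent = rng.normal(size=(200, 2))
W = np.array([[1.0, 0.0, 0.5, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.5, 0.0]])
X = latent @ W + 0.01 * rng.normal(size=(200, 5))

id_ = intrinsic_dimensionality(X)
idr = id_ / X.shape[1]  # IDR: intrinsic dim relative to ambient dim d
print(id_, idr)  # -> 2 0.4
```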
To-do:
- add dataset separability evaluation metrics
- add geometric characteristics
- add a mislabeling ratio
- add algorithms such as Data Shapley: Ghorbani, A., & Zou, J. (2019, May). Data Shapley: Equitable valuation of data for machine learning. In International Conference on Machine Learning (pp. 2242-2251). PMLR.
- check dataset balance with respect to sensitive features for fairness evaluation
Dependable Intelligent Systems Lab., University of Hull
Fraunhofer Institute for Experimental Software Engineering
- Jerin Antony
- Akinwande Adegbola
- Zhibao Mian
- Septavera Sharvia
- Koorosh Aslansefat
- Mohammad Naveed Akram
- Iannis Sorokos
- Yiannis Papadopoulos
This framework is available under the MIT License.