This repository is part of The Synthetic Data Vault Project, a project from DataCebo.

Overview

The SDMetrics library evaluates synthetic data by comparing it to the real data that you're trying to mimic. It includes a variety of metrics to capture different aspects of the data, for example quality and privacy. It also includes reports that you can run to generate insights, visualize data and share with your team.

The SDMetrics library is model-agnostic, meaning you can use any synthetic data. The library does not need to know how you created the data.

Install

Install SDMetrics using pip or conda. We recommend using a virtual environment to avoid conflicts with other software on your device.

pip install sdmetrics

conda install -c conda-forge sdmetrics
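After installing, one way to confirm that the package is visible to your Python environment is to look up its installed version. The sketch below uses only the standard library. The helper name `installed_version` is hypothetical and not part of SDMetrics.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(distribution_name)
    except PackageNotFoundError:
        return None


# 'sdmetrics' is the distribution name used in the install commands above.
found = installed_version('sdmetrics')
if found is None:
    print('sdmetrics is not installed in this environment')
else:
    print(f'sdmetrics {found} is installed')
```

If the lookup reports that the package is missing, check that the virtual environment you installed into is the one currently active.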

For more information about using SDMetrics, visit the SDMetrics Documentation.

Usage

Get started with SDMetrics Reports using some demo data:

from sdmetrics import load_demo
from sdmetrics.reports.single_table import QualityReport

real_data, synthetic_data, metadata = load_demo(modality='single_table')

my_report = QualityReport()
my_report.generate(real_data, synthetic_data, metadata)

Creating report: 100%|██████████| 4/4 [00:00<00:00,  5.22it/s]

Overall Quality Score: 82.84%

Properties:
Column Shapes: 82.78%
Column Pair Trends: 82.9%
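The overall score printed above is consistent with a simple average of the two property scores. Here is a minimal sketch of that arithmetic, assuming equal weighting. The `property_scores` dictionary is illustrative and is filled in by hand with the values from the output above.

```python
# Property scores copied from the report output above, as fractions.
property_scores = {
    'Column Shapes': 0.8278,
    'Column Pair Trends': 0.829,
}

# Assumption: the overall score is the unweighted mean of the property scores.
overall_score = sum(property_scores.values()) / len(property_scores)

print(f'Overall Quality Score: {overall_score:.2%}')  # prints 82.84%
```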

Once you generate the report, you can drill down on the details and visualize the results.

my_report.get_visualization(property_name='Column Pair Trends')

Save the report and share it with your team.

my_report.save(filepath='demo_data_quality_report.pkl')

# load it at any point in the future
my_report = QualityReport.load(filepath='demo_data_quality_report.pkl')

Want more metrics? You can also manually apply any of the metrics in this library to your data.

# calculate whether the synthetic data respects the min/max bounds
# set by the real data
from sdmetrics.single_column import BoundaryAdherence

BoundaryAdherence.compute(
    real_data['start_date'],
    synthetic_data['start_date']
)

0.8503937007874016
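To build intuition for the score above, the following is a simplified sketch of the idea behind a boundary check: the fraction of synthetic values that fall within the minimum and maximum of the real values. It is not the library's implementation. The function name `boundary_adherence_sketch` and the toy columns are hypothetical, and the sketch assumes columns with no missing values.

```python
def boundary_adherence_sketch(real_values, synthetic_values):
    """Fraction of synthetic values inside the [min, max] range of the real values."""
    if not real_values or not synthetic_values:
        raise ValueError('both columns need at least one value')
    low, high = min(real_values), max(real_values)
    in_bounds = sum(1 for value in synthetic_values if low <= value <= high)
    return in_bounds / len(synthetic_values)


# Toy numeric columns (illustrative values, not the demo data).
real_column = [10, 12, 15, 20]
synthetic_column = [9, 11, 14, 20, 25]

score = boundary_adherence_sketch(real_column, synthetic_column)
print(score)  # 3 of the 5 synthetic values fall within [10, 20], so 0.6
```

A score of 1.0 would mean every synthetic value respects the real bounds, and lower scores mean more values fall outside them.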

# calculate whether the synthetic data is new or whether it's an exact copy of the real data
from sdmetrics.single_table import NewRowSynthesis

NewRowSynthesis.compute(
    real_data,
    synthetic_data,
    metadata
)

1.0
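A score of 1.0 here means no synthetic row was found to be a copy of a real row. The sketch below illustrates the idea with exact matching only: it counts the synthetic rows that do not appear among the real rows. It is a simplification, not the library's implementation, which may treat numerical values more leniently. The function name `new_row_synthesis_sketch` and the toy rows are hypothetical.

```python
def new_row_synthesis_sketch(real_rows, synthetic_rows):
    """Fraction of synthetic rows that have no exact match among the real rows."""
    if not synthetic_rows:
        raise ValueError('synthetic_rows must not be empty')
    real_set = {tuple(row) for row in real_rows}
    new_rows = sum(1 for row in synthetic_rows if tuple(row) not in real_set)
    return new_rows / len(synthetic_rows)


# Toy tables as lists of (name, age) rows (illustrative values, not the demo data).
real_rows = [('alice', 34), ('bob', 51), ('carol', 27)]
synthetic_rows = [('alice', 34), ('dave', 40), ('erin', 29), ('frank', 51)]

score = new_row_synthesis_sketch(real_rows, synthetic_rows)
print(score)  # only ('alice', 34) is copied from the real rows, so 0.75
```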

What's next?

To learn more about the reports and metrics, visit the SDMetrics Documentation.

The Synthetic Data Vault Project was first created at MIT's Data to AI Lab in 2016. After 4 years of research and traction with enterprise, we created DataCebo in 2020 with the goal of growing the project. Today, DataCebo is the proud developer of SDV, the largest ecosystem for synthetic data generation & evaluation. It is home to multiple libraries that support synthetic data, including:

🔄 Data discovery & transformation. Reverse the transforms to reproduce realistic data.
🧠 Multiple machine learning models -- ranging from Copulas to Deep Learning -- to create tabular, multi-table and time series data.
📊 Measuring quality and privacy of synthetic data, and comparing different synthetic data generation models.

Get started using the SDV package -- a fully integrated solution and your one-stop shop for synthetic data. Or, use the standalone libraries for specific needs.
