[FEATURE]: Add support for defining constraints in generator configuration #594

@scriminaci

Feature Summary

Enable users to specify and enforce constraints during synthetic data generation, ensuring that generated datasets respect relationships and rules across columns (e.g., column A must be greater than column B, or only pre-existing combinations of values are allowed).

Problem and Solution

Problem
Currently, the generator has no mechanism for enforcing user-defined constraints across columns. As a result, generated records can be statistically plausible at the distribution level yet semantically invalid for the intended use case.

For example, the generator currently cannot:

Ensure that “end_date” is always after “start_date”.

Guarantee that “age” is always greater than or equal to “years_employed”.

Restrict outputs so that no new, unrealistic combinations of categorical values are introduced (e.g., Country = USA with Currency = EUR).

Without constraint support, users must post-process the generated data themselves, and any invalid records that slip through can make the dataset unusable for downstream testing, simulation, or analytics.

Proposed Solution
Introduce a constraint definition interface for the generator that allows users to specify logical or relational rules between columns. Examples include:

Relational constraints: column_A > column_B > column_C for numerical and datetime columns

Equality constraints: column_X == column_Y for any column type

Set membership constraints: (column_A, column_B) in allowed_combinations for categorical columns
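To make the three constraint types above concrete, here is a minimal sketch of what a constraint definition interface could look like. None of these names exist in the generator today: `Relational`, `Equality`, `AllowedCombinations`, and `violations` are hypothetical and only illustrate the shape of the request. Rows are plain dicts to keep the example self-contained.

```python
import operator
from dataclasses import dataclass
from datetime import date
from typing import Any, FrozenSet, List, Mapping, Tuple

Row = Mapping[str, Any]

# Comparison operators supported by the hypothetical Relational constraint.
_OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}


@dataclass(frozen=True)
class Relational:
    """Chained ordering: columns=("a", "b", "c") with op=">" means a > b > c."""
    columns: Tuple[str, ...]
    op: str = ">"

    def is_valid(self, row: Row) -> bool:
        compare = _OPS[self.op]
        values = [row[c] for c in self.columns]
        # Check every adjacent pair in the chain.
        return all(compare(a, b) for a, b in zip(values, values[1:]))


@dataclass(frozen=True)
class Equality:
    """All listed columns must hold the same value."""
    columns: Tuple[str, ...]

    def is_valid(self, row: Row) -> bool:
        first = row[self.columns[0]]
        return all(row[c] == first for c in self.columns[1:])


@dataclass(frozen=True)
class AllowedCombinations:
    """The tuple of values across `columns` must be one of `allowed`."""
    columns: Tuple[str, ...]
    allowed: FrozenSet[Tuple[Any, ...]]

    def is_valid(self, row: Row) -> bool:
        return tuple(row[c] for c in self.columns) in self.allowed


def violations(row: Row, constraints: List[Any]) -> List[Any]:
    """Return the constraints that the given row fails."""
    return [c for c in constraints if not c.is_valid(row)]


# The examples from the Problem section, expressed as constraints.
constraints = [
    Relational(("end_date", "start_date"), ">"),
    Relational(("age", "years_employed"), ">="),
    AllowedCombinations(
        ("country", "currency"),
        frozenset({("USA", "USD"), ("Germany", "EUR")}),
    ),
]

good_row = {
    "start_date": date(2024, 1, 1), "end_date": date(2024, 3, 1),
    "age": 40, "years_employed": 15,
    "country": "USA", "currency": "USD",
}
bad_row = {**good_row, "currency": "EUR"}  # USA paired with EUR

print(violations(good_row, constraints))
print(violations(bad_row, constraints))
```

Declaring constraints as plain data objects like this would let the same definitions live in the generator configuration and be reused for both validating training data and checking samples.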

The generator should validate and enforce these constraints during training and sampling, ensuring that all generated records comply.
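One simple way to enforce constraints at sampling time is rejection sampling: draw a record, keep it only if every constraint holds, and repeat. The sketch below is an assumption about how enforcement might work, not a description of the generator's internals. `sample_with_constraints` and `draw_row` are hypothetical names, and `draw_row` is a random stand-in for the real model.

```python
import random
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
Constraint = Callable[[Row], bool]


def sample_with_constraints(
    draw_row: Callable[[], Row],
    constraints: List[Constraint],
    n_rows: int,
    max_attempts: int = 10_000,
) -> List[Row]:
    """Rejection sampling: keep drawing rows until n_rows satisfy all constraints."""
    accepted: List[Row] = []
    attempts = 0
    while len(accepted) < n_rows:
        if attempts >= max_attempts:
            # Fail loudly instead of looping forever on unsatisfiable constraints.
            raise RuntimeError(
                f"only {len(accepted)} of {n_rows} valid rows after {attempts} attempts"
            )
        attempts += 1
        row = draw_row()
        if all(check(row) for check in constraints):
            accepted.append(row)
    return accepted


# Stand-in for the trained generator: columns are drawn independently,
# so some rows will violate age >= years_employed.
rng = random.Random(0)


def draw_row() -> Row:
    return {"age": rng.randint(18, 70), "years_employed": rng.randint(0, 50)}


rows = sample_with_constraints(
    draw_row,
    constraints=[lambda r: r["age"] >= r["years_employed"]],
    n_rows=5,
)
for r in rows:
    print(r)
```

Rejection sampling is easy to reason about but can become slow when constraints are rarely satisfied, and it shifts the output distribution toward the valid region. Enforcing constraints during training as well, for example by validating or transforming the training data, would likely reduce the rejection rate.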

This feature would significantly enhance the reliability and usability of synthetic data, reducing the need for extensive post-processing and providing users with greater confidence in the generated datasets.

Potential Alternatives

No response

Additional Context

No response

Metadata

Labels

enhancement: New feature or request