[FEATURE] Data profiling in Databricks for data quality #52

@jonaslieben

Problem statement
As a user implementing data quality in Databricks, I want to go beyond DQ validations and also run data profiling, which Databricks supports natively. I additionally want to save the profiling results, so that we can use them for data observability and for detecting unusual patterns.

Describe the solution you'd like

  • Profiling functionality in Spark Expectations that can also save data profiles over time to a separate table
  • Ability to define expectations on the data profiling information (for user-friendliness)
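
To make the two requests above more concrete, here is a minimal sketch of the idea in plain Python. It is not an existing Spark Expectations API: every name in it (`profile_columns`, `metric_history`, `null_count_jumped`, the `orders` table, the timestamps and the sample rows) is hypothetical and only illustrative. The in-memory `history` list stands in for the separate profile table; on Databricks the metrics would presumably be computed with Spark aggregations and appended to a table on each run.

```python
from datetime import datetime, timezone


def profile_columns(table, rows, run_ts=None):
    """Return one profile record per (column, metric) for a list of dict rows."""
    run_ts = run_ts or datetime.now(timezone.utc).isoformat()
    columns = sorted({key for row in rows for key in row})
    records = []
    for col in columns:
        values = [row.get(col) for row in rows]
        non_null = [v for v in values if v is not None]
        metrics = {
            "row_count": len(values),
            "null_count": len(values) - len(non_null),
            "distinct_count": len(set(non_null)),
        }
        for metric, value in metrics.items():
            records.append({"table": table, "column": col, "metric": metric,
                            "value": value, "run_ts": run_ts})
    return records


def metric_history(history, table, column, metric):
    """Values of one metric across runs, oldest run first."""
    matching = [r for r in history
                if (r["table"], r["column"], r["metric"]) == (table, column, metric)]
    return [r["value"] for r in sorted(matching, key=lambda r: r["run_ts"])]


def null_count_jumped(history, table, column, max_increase):
    """Expectation on profile data: did nulls rise by more than max_increase
    between the two most recent runs?"""
    values = metric_history(history, table, column, "null_count")
    if len(values) < 2:
        return False  # not enough history to compare
    return values[-1] - values[-2] > max_increase


# Hypothetical usage: two profiling runs appended to the same history.
history = []  # stands in for a separate profile table
history += profile_columns(
    "orders",
    [{"id": 1, "amount": 10.0}, {"id": 2, "amount": 12.5}],
    run_ts="2024-01-01T00:00:00Z",
)
history += profile_columns(
    "orders",
    [{"id": 3, "amount": None}, {"id": 4, "amount": None}, {"id": 5, "amount": 7.0}],
    run_ts="2024-01-02T00:00:00Z",
)

if null_count_jumped(history, "orders", "amount", max_increase=1):
    print("orders.amount: null count increased more than expected")
```

Storing the profile in long format (one record per column and metric) is just one possible design choice; it keeps the history schema stable when columns are added to the profiled table, and it lets an expectation on profiling information be expressed as a simple comparison across runs.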

Describe alternatives you've considered

  • Some tools on the market, such as Informatica, offer this capability, but they are not directly available within the Databricks ecosystem.
  • The default Databricks profiling capability. It is useful for ad-hoc data profiling, but it cannot save the results and therefore cannot support data observability.

Additional context
I can give additional context if needed or if the requirement is not entirely clear.

Labels: enhancement (New feature or request)
