[FEATURE] Add support for transformations on nested fields. #146

Open
taquero-s opened this issue Dec 16, 2024 · 1 comment
Labels
enhancement New feature or request
Milestone

Comments

@taquero-s

Is your feature request related to a problem? Please describe.

Currently, most of the transformation catalog applies only to top-level columns of a Spark DataFrame. A base transformation class that can operate on nested fields would resolve this and greatly widen the scope of the library.

Describe the solution you'd like

Create a base transformation class that lets transformations target nested fields, addressed using only dot notation (for example `address.city`), so the rest of the library can be extended to work below the top level.
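To make the intended dot-notation semantics concrete, here is a minimal sketch that uses plain Python dicts as a stand-in for Spark rows. The helper name `transform_nested` and the sample record are hypothetical and are not part of the library; the sketch only illustrates how a dotted path could resolve to a single nested field while everything else stays untouched.

```python
from copy import deepcopy
from typing import Any, Callable


def transform_nested(record: dict, path: str, fn: Callable[[Any], Any]) -> dict:
    """Return a copy of `record` with `fn` applied to the field at dotted `path`."""
    result = deepcopy(record)  # leave the input record unmodified
    *parents, leaf = path.split(".")
    node = result
    for key in parents:
        node = node[key]  # raises KeyError if the path does not exist
    node[leaf] = fn(node[leaf])
    return result


# Hypothetical sample record, only for illustration
row = {"id": 1, "address": {"city": "utrecht", "zip": "3511"}}
updated = transform_nested(row, "address.city", str.upper)
```

A Spark-based implementation would need to do the equivalent on column expressions rather than on Python objects, but the addressing scheme would be the same.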

Describe alternatives you've considered

Raw Spark code implementations.

Additional context

@taquero-s taquero-s added the enhancement New feature or request label Dec 16, 2024
@dannymeijer dannymeijer added this to the 0.11 milestone Feb 27, 2025
@dannymeijer
Member

@taquero-s can you give some insight into how you see this looking from an API point of view? What I mean is: how would we call a Transformation class for a nested field?

Also, with nested data, the traditional route is to explode the data and then rebuild the nested structure (which is computationally expensive and inefficient, of course). Do you have some insight on how to do this more effectively?

Additionally, we need to agree on the scope. I like doing this at the base class level, so that it cascades to essentially every class that inherits from it. Personally, I think we should do this for ColumnsTransformation and ColumnsTransformationsWithTarget based classes. Tests would also need to be added, of course.

Development

No branches or pull requests

2 participants