Added normalisation and unit test cases #118

Open: wants to merge 2 commits into release/0.9

sritha272

Description

Implemented multiple data transformation functions (normalize, scale, clipping, exponential, standardize, z_score_normalize) to enhance the framework's data processing capabilities. These functions are equipped with comprehensive unit tests to ensure correctness and handle edge cases.
Breakdown of Each Implemented Function

  • Normalize
    Scales data to a specified range [min_value, max_value].
    Includes edge case handling for zero range and empty data.

  • Scale
    Scales data by a specified multiplier.
    Useful for linear scaling transformations.

  • Clipping
    Clips data to fall within a specified range [min_value, max_value].
    Prevents extreme outliers in datasets.

  • Exponential Transformation
    Applies an exponential transformation to data with a specified base.
    Handles exponential growth scenarios effectively.

  • Standardize
    Standardizes data to have a mean of 0 and a standard deviation of 1.
    Accepts optional custom mean and standard deviation parameters.

  • Z-Score Normalize
    Computes z-scores for data points for standardization.
    Handles mixed positive and negative datasets effectively.
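
The six transformations listed above can be sketched in plain Python as follows. This is only an illustration under stated assumptions, not the code in this PR: the function names, signatures, and defaults are hypothetical, the exponential transform is assumed to mean `base ** x`, and the edge-case behaviour (empty input returns an empty list, zero range maps to `min_value`, zero standard deviation maps to 0.0) is a guess at what "edge case handling" might look like.

```python
from statistics import fmean, pstdev


def normalize(data, min_value=0.0, max_value=1.0):
    """Linearly rescale data into [min_value, max_value]."""
    if not data:
        return []  # assumption: empty input yields empty output
    lo, hi = min(data), max(data)
    if hi == lo:
        # assumption: identical values (zero range) all map to min_value
        return [float(min_value)] * len(data)
    span = max_value - min_value
    return [min_value + (x - lo) / (hi - lo) * span for x in data]


def scale(data, multiplier):
    """Multiply every value by a constant factor."""
    return [x * multiplier for x in data]


def clip(data, min_value, max_value):
    """Limit every value to the closed range [min_value, max_value]."""
    return [max(min_value, min(x, max_value)) for x in data]


def exponential(data, base):
    """Raise base to the power of each value (assumed meaning of 'base')."""
    return [base ** x for x in data]


def standardize(data, mean=None, std=None):
    """Shift and scale data; uses the population mean/std unless overridden."""
    if not data:
        return []
    mean = fmean(data) if mean is None else mean
    std = pstdev(data) if std is None else std
    if std == 0:
        # assumption: constant data standardizes to all zeros
        return [0.0] * len(data)
    return [(x - mean) / std for x in data]


def z_score_normalize(data):
    """Z-scores are standardization with the data's own mean and std."""
    return standardize(data)
```

For example, under these assumptions `normalize([1, 2, 3])` returns `[0.0, 0.5, 1.0]` and `clip([-5, 0, 5], -1, 1)` returns `[-1, 0, 1]`.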

Related Issue

Motivation and Context

The newly added functions provide robust data normalization and transformation capabilities, which are critical for preparing data for machine learning, statistical analysis, and other computational tasks. These functions solve the problem of inconsistent data scaling and ensure uniform preprocessing pipelines.

How Has This Been Tested?

  • Unit Tests: Added unit tests for each function:
    Verified outputs for standard, edge, and invalid inputs.
    Tests include large numbers, small numbers, mixed data types, and edge cases like empty datasets or identical values.

  • Environment: Testing performed on:
    Python 3.12
    OS: Windows 11

  • Commands:
    Ran `python -m unittest discover tests` to confirm that all test cases pass.
    Validated compatibility with existing project components.
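
As a rough idea of what tests in the categories above (standard, edge, and invalid inputs) could look like, here is a small `unittest` sketch. The `normalize` function is a hypothetical stand-in defined inline so the example is self-contained; it is not the implementation from this PR, and the expected edge-case behaviour is an assumption.

```python
import unittest


def normalize(data, min_value=0.0, max_value=1.0):
    """Hypothetical stand-in: rescale data into [min_value, max_value]."""
    if not data:
        return []  # assumption: empty input yields empty output
    lo, hi = min(data), max(data)
    if hi == lo:
        # assumption: identical values all map to min_value
        return [float(min_value)] * len(data)
    span = max_value - min_value
    return [min_value + (x - lo) / (hi - lo) * span for x in data]


class NormalizeTests(unittest.TestCase):
    def test_standard_input(self):
        self.assertEqual(normalize([1, 2, 3]), [0.0, 0.5, 1.0])

    def test_empty_dataset(self):
        self.assertEqual(normalize([]), [])

    def test_identical_values(self):
        self.assertEqual(normalize([7, 7, 7]), [0.0, 0.0, 0.0])

    def test_large_numbers(self):
        result = normalize([1e12, 2e12, 3e12])
        for got, want in zip(result, [0.0, 0.5, 1.0]):
            self.assertAlmostEqual(got, want)

    def test_mixed_data_types_rejected(self):
        # Comparing int and str inside min() raises TypeError
        with self.assertRaises(TypeError):
            normalize([1, "two", 3])
```

Saved under a `tests` directory as a `test_*.py` file, a module like this would be picked up by the discovery command mentioned above.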

Screenshots (if appropriate):

[Two screenshots]

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

sritha272 requested a review from a team as a code owner November 24, 2024 11:37
mikita-sakalouski changed the base branch from main to release/0.9 November 24, 2024 11:38
mikita-sakalouski added the enhancement (New feature or request) label Nov 24, 2024
mikita-sakalouski added this to the 0.9.0 milestone Nov 24, 2024
@mikita-sakalouski
Contributor

Can you base this functionality on the Transformation step and also put it in the correct module?

Contributor

mikita-sakalouski left a comment

See comments

@dannymeijer
Member

Thank you for your contribution @sritha272, really appreciate it.

I do agree with Mikita about the placement of the modules that you chose for this one. Also, can you explain the intended use for this a bit more? Would this be for ML use cases, with the input being something like pandas perhaps? Or did you have something else in mind?

I propose that we have a small meetup to discuss, as I would love to add your contribution to our library.

Member

dannymeijer left a comment

See my earlier comment

dannymeijer removed this from the 0.9.0 milestone Nov 26, 2024
@dannymeijer
Member

Please also see: #129

Labels: blocked, enhancement (New feature or request)
3 participants