Mixer validator #215
… everything in place
…ee if both files and jq expressions are valid
…d checking their content
…ontain correct fields and same amount of lines
…ee if they work or fail
… to a temp folder
…red main, added logic to download sample files and work with them locally
You're going to need to add this file's dependencies to pyproject.toml in order for it to run in a clean environment.
…configs to test folder, adding a couple of helper functions
Addressed this one and added dependencies.
The warnings produce a lot of noise; it'd be good to accept a 'verbose' flag and only log the warnings if it's set to true.
scripts/validate_mixer.py (outdated)
    print(f"File path type: {type(file_path)}")
    return None


def evaluate_comparison(value, op, comparison_value):
What about `!=`? Or any of the other, weirder operators listed at https://www.w3schools.com/python/python_operators.asp ?
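One way to cover `!=` and the other comparison operators without a long `if`/`elif` chain is a lookup table built on the standard `operator` module. The sketch below is only an illustration: the signature `evaluate_comparison(value, op, comparison_value)` comes from the diff, but the body, the `_COMPARISONS` table, and the set of supported operator strings are assumptions, not the PR's actual code.

```python
import operator

# Hypothetical operator table; the set the PR actually supports is not shown.
_COMPARISONS = {
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


def evaluate_comparison(value, op, comparison_value):
    """Apply the comparison named by `op` to the two operands."""
    try:
        compare = _COMPARISONS[op]
    except KeyError:
        # Fail loudly on anything outside the table instead of returning None.
        raise ValueError(f"Unsupported operator: {op!r}") from None
    return compare(value, comparison_value)
```

With a table like this, supporting another operator is a one-line change, and an unknown operator surfaces as an explicit error rather than a silent pass.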
Added --verbose flag and hid most of the print statements in it.
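For reference, a `--verbose` flag of this kind can be wired up with `argparse` and `logging` roughly as follows. This is a minimal sketch, not the code in the PR: the `config` positional argument, the logger name, and the choice of `ERROR` as the quiet level are all assumptions.

```python
import argparse
import logging

# Hypothetical logger name; the script's real logging setup is not shown.
logger = logging.getLogger("validate_mixer")


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Validate a Mixer config")
    # The positional argument is an assumption made for this illustration.
    parser.add_argument("config", help="path to the Mixer config file")
    parser.add_argument(
        "--verbose",
        action="store_true",
        help="also log warnings and per-step details",
    )
    return parser.parse_args(argv)


def configure_logging(verbose):
    """Show warnings and debug output only when --verbose is set."""
    level = logging.DEBUG if verbose else logging.ERROR
    logging.basicConfig(format="%(levelname)s: %(message)s")
    logger.setLevel(level)
    return level
```

Calls such as `logger.warning(...)` then stay silent by default and appear only when the script is run with `--verbose`.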
You should delete the old validate_mixer.py script but otherwise looks good!
This validator ensures that the Mixer job configuration is correct, data is properly aligned, and filters are valid before starting the actual Mixer job.
Configuration Validation
S3 Path and Permission Validation
Stream Filter Validation
Document and Attribute Alignment
- Checks that document and attribute files have the same number of lines
- Verifies that both document and attribute files are valid JSONL
- Ensures required fields are present in both document and attribute files
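The three alignment checks above can be sketched as a single pass over a document file and its attribute file. This is an illustrative sketch under stated assumptions: `check_alignment` is a hypothetical helper, and the required field names (`id`, `text`, `attributes`) are placeholders, since the description does not list the fields the validator actually requires.

```python
import json


def check_alignment(
    doc_lines,
    attr_lines,
    doc_fields=("id", "text"),          # assumed required document fields
    attr_fields=("id", "attributes"),   # assumed required attribute fields
):
    """Return a list of error strings; an empty list means the pair is aligned."""
    errors = []

    # 1. Both files must have the same number of lines.
    if len(doc_lines) != len(attr_lines):
        errors.append(
            f"line count mismatch: {len(doc_lines)} documents "
            f"vs {len(attr_lines)} attributes"
        )

    for label, lines, required in (
        ("document", doc_lines, doc_fields),
        ("attribute", attr_lines, attr_fields),
    ):
        for number, line in enumerate(lines, start=1):
            # 2. Every line must be valid JSON (one object per line).
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                errors.append(f"{label} line {number}: invalid JSON ({exc.msg})")
                continue
            if not isinstance(record, dict):
                errors.append(f"{label} line {number}: not a JSON object")
                continue
            # 3. Required fields must be present.
            missing = [field for field in required if field not in record]
            if missing:
                errors.append(f"{label} line {number}: missing fields {missing}")

    return errors
```

Collecting every error instead of stopping at the first one lets a single run report all misaligned lines in a sampled file pair.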
Filter Execution Simulation
Attribute Name Validation
File Sampling and Analysis
Reporting and Logging
Error Handling and Cleanup