-
-
Notifications
You must be signed in to change notification settings - Fork 235
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. Weβll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add missing docstrings #87
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @glenn-jocher - I've reviewed your changes and they look great!
Here's what I looked at during the review
- π‘ General issues: 5 issues found
- π’ Security: all looks good
- π’ Testing: all looks good
- π’ Complexity: all looks good
Help me be more useful! Please click π or π on each comment to tell me if it was helpful.
@@ -31,7 +31,7 @@ def exif_size(img): | |||
|
|||
|
|||
def split_rows_simple(file="../data/sm4/out.txt"): # from utils import *; split_rows_simple() | |||
# splits one textfile into 3 smaller ones based upon train, test, val ratios |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (code_clarification): Consider specifying the default ratios in the docstring for clarity.
Including default ratios in the docstring can help users understand the function's behavior without needing to look at the code.
# splits one textfile into 3 smaller ones based upon train, test, val ratios | |
"""Splits a text file into train, test, and val files based on specified ratios. | |
Default ratios are train: 70%, test: 20%, val: 10%. | |
Expects a file path as input.""" |
@@ -46,6 +46,7 @@ | |||
|
|||
|
|||
def split_files(out_path, file_name, prefix_path=""): # split training data | |||
"""Splits file names into separate train, test, and val datasets and writes them to prefixed paths.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (code_clarification): Clarify in the docstring what 'file_name' should contain and its expected format.
The function's input parameters should be clearly defined to avoid confusion and potential misuse of the function.
"""Splits file names into separate train, test, and val datasets and writes them to prefixed paths.""" | |
def split_files(out_path: str, file_name: List[str], prefix_path: str = "") -> None: | |
"""Splits a list of file names into separate train, test, and val datasets. | |
Args: | |
out_path (str): The output directory where the datasets will be written. | |
file_name (List[str]): A list of file names to be split. | |
prefix_path (str): An optional prefix to be added to each output path. | |
""" |
@@ -58,6 +59,7 @@ | |||
|
|||
|
|||
def split_indices(x, train=0.9, test=0.1, validate=0.0, shuffle=True): # split training data | |||
"""Splits array indices for train, test, and validate datasets according to specified ratios.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue (edge_case_not_handled): Consider handling the case where the sum of train, test, and validate ratios does not equal 1.
Adding a check to ensure the sum of the ratios equals 1 can prevent logical errors in dataset splitting.
# write a txt file listing all imaged in folder | ||
"""Generates a txt file listing all images in a specified folder; usage: `image_folder2file('path/to/folder/')`.""" | ||
s = glob.glob(f"{folder}*.*") | ||
with open(f"{folder[:-1]}.txt", "w") as file: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick (typo): Correct the typo in the docstring from 'imaged' to 'images'.
general_json2yolo.py
Outdated
@@ -138,7 +138,7 @@ def convert_vott_json(name, files, img_path): | |||
|
|||
# Convert ath JSON file into YOLO-format labels -------------------------------- | |||
def convert_ath_json(json_dir): # dir contains json annotations and images | |||
# Create folders |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (code_clarification): Specify what 'ath' stands for in the docstring to improve clarity.
Clarifying acronyms and potentially unfamiliar terms in the documentation can make the codebase more accessible to new developers or external contributors.
# Create folders | |
"""Converts annotations from ATH (Assumed Term Here) JSON format to YOLO (You Only Look Once) format labels, resizes images, and organizes data for training in a machine learning model.""" |
general_json2yolo.py
Outdated
@@ -138,7 +138,7 @@ | |||
|
|||
# Convert ath JSON file into YOLO-format labels -------------------------------- | |||
def convert_ath_json(json_dir): # dir contains json annotations and images |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue (code-quality): We've found these issues:
- Don't assign to builtin variable
dir
(avoid-builtin-shadow
) - Low code quality found in convert_ath_json - 7% (
low-code-quality
)
Explanation
Python has a number ofbuiltin
variables: functions and constants thatform a part of the language, such as
list
, getattr
, and type
(See https://docs.python.org/3/library/functions.html).
It is valid, in the language, to re-bind such variables:
list = [1, 2, 3]
However, this is considered poor practice.
- It will confuse other developers.
- It will confuse syntax highlighters and linters.
- It means you can no longer use that builtin for its original purpose.
How can you solve this?
Rename the variable something more specific, such as integers
.
In a pinch, my_list
and similar names are colloquially-recognized
placeholders.
The quality score for this function is below the quality threshold of 25%.
This score is a combination of the method length, cognitive complexity and working memory.
How can you solve this?
It might be worth refactoring this function to make it shorter and more readable.
- Reduce the function length by extracting pieces of functionality out into
their own functions. This is the most important thing you can do - ideally a
function should be less than 10 lines. - Reduce nesting, perhaps by introducing guard clauses to return early.
- Ensure that variables are tightly scoped, so that code using related concepts
sits together within the function rather than being scattered.
π οΈ PR Summary
Made with β€οΈ by Ultralytics Actions
π Summary
Enhancements in JSON to YOLO conversion scripts for improved processing and documentation.
π Key Changes
defaultdict
import ingeneral_json2yolo.py
for cleaner code layout.general_json2yolo.py
andutils.py
to clarify their purpose and usage.convert_ath_json
function ingeneral_json2yolo.py
to include image resizing and data organizing for training, signaling a substantive enhancement in how JSON annotations are converted to YOLO format.utils.py
received updates, including new features for splitting datasets (split_rows_simple
,split_files
,split_indices
), generating image lists (image_folder2file
), augmenting datasets with COCO background (add_coco_background
), dataset simplification (create_single_class_dataset
), and folder structure optimization (flatten_recursive_folders
). These changes overall aim to streamline the dataset preparation process for YOLO model training.π― Purpose & Impact
These changes, while technical, contribute substantially towards making the JSON to YOLO conversion process more intuitive, documented, and efficient for users ranging from data scientists to AI hobbyists. π