Add missing docstrings #87

Merged
merged 2 commits into from
Apr 28, 2024

glenn-jocher
Member

@glenn-jocher glenn-jocher commented Apr 28, 2024

🛠️ PR Summary

Made with ❤️ by Ultralytics Actions

🌟 Summary

Enhancements in JSON to YOLO conversion scripts for improved processing and documentation.

📊 Key Changes

  • Code Organization: Shifted the defaultdict import in general_json2yolo.py for cleaner code layout.
  • Function Descriptions: Added descriptive docstrings to functions across general_json2yolo.py and utils.py to clarify their purpose and usage.
  • Functionality Improvements: Modified the convert_ath_json function in general_json2yolo.py to include image resizing and data organizing for training, signaling a substantive enhancement in how JSON annotations are converted to YOLO format.
  • Utility Updates: Several utility functions in utils.py received updates, including new features for splitting datasets (split_rows_simple, split_files, split_indices), generating image lists (image_folder2file), augmenting datasets with COCO background (add_coco_background), dataset simplification (create_single_class_dataset), and folder structure optimization (flatten_recursive_folders). These changes overall aim to streamline the dataset preparation process for YOLO model training.

🎯 Purpose & Impact

  • Ease of Use: The addition of docstrings and comments makes it easier for both new and existing users to understand the purpose and function of each utility, enhancing the usability of the conversion tools.
  • Training Efficiency: Improvements in dataset handling, such as the ability to easily augment datasets with COCO backgrounds and the capability to flatten nested folders, can significantly speed up the preparation process for model training.
  • Flexibility: The newly added functionalities and enhancements provide users with more flexibility in managing and preprocessing their datasets for YOLO, potentially leading to better training outcomes and more efficient workflows.

These changes, while technical, contribute substantially towards making the JSON to YOLO conversion process more intuitive, documented, and efficient for users ranging from data scientists to AI hobbyists. 🚀

@glenn-jocher glenn-jocher merged commit 5d6b35f into master Apr 28, 2024
@glenn-jocher glenn-jocher deleted the docstrings branch April 28, 2024 13:30
Contributor

@sourcery-ai sourcery-ai bot left a comment

Hey @glenn-jocher - I've reviewed your changes and they look great!

Here's what I looked at during the review
  • 🟡 General issues: 5 issues found
  • 🟢 Security: all looks good
  • 🟢 Testing: all looks good
  • 🟢 Complexity: all looks good

@@ -31,7 +31,7 @@ def exif_size(img):


def split_rows_simple(file="../data/sm4/out.txt"): # from utils import *; split_rows_simple()
    # splits one textfile into 3 smaller ones based upon train, test, val ratios

suggestion (code_clarification): Consider specifying the default ratios in the docstring for clarity.

Including default ratios in the docstring can help users understand the function's behavior without needing to look at the code.

Suggested change
-    # splits one textfile into 3 smaller ones based upon train, test, val ratios
+    """Splits a text file into train, test, and val files based on specified ratios.
+    Default ratios are train: 70%, test: 20%, val: 10%.
+    Expects a file path as input."""

@@ -46,6 +46,7 @@


def split_files(out_path, file_name, prefix_path=""): # split training data
    """Splits file names into separate train, test, and val datasets and writes them to prefixed paths."""

suggestion (code_clarification): Clarify in the docstring what 'file_name' should contain and its expected format.

The function's input parameters should be clearly defined to avoid confusion and potential misuse of the function.

Suggested change
-    """Splits file names into separate train, test, and val datasets and writes them to prefixed paths."""
+def split_files(out_path: str, file_name: List[str], prefix_path: str = "") -> None:
+    """Splits a list of file names into separate train, test, and val datasets.
+    Args:
+        out_path (str): The output directory where the datasets will be written.
+        file_name (List[str]): A list of file names to be split.
+        prefix_path (str): An optional prefix to be added to each output path.
+    """

@@ -58,6 +59,7 @@


def split_indices(x, train=0.9, test=0.1, validate=0.0, shuffle=True): # split training data
    """Splits array indices for train, test, and validate datasets according to specified ratios."""

issue (edge_case_not_handled): Consider handling the case where the sum of train, test, and validate ratios does not equal 1.

Adding a check to ensure the sum of the ratios equals 1 can prevent logical errors in dataset splitting.
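A minimal sketch of the check suggested above, under stated assumptions: the name split_indices_checked, the seed parameter, and the rounding of the slice boundaries are illustrative only, since the body of split_indices in utils.py is not shown in this review.

```python
import math
import random


def split_indices_checked(x, train=0.9, test=0.1, validate=0.0, shuffle=True, seed=None):
    """Hypothetical variant of split_indices that validates the ratios before splitting."""
    ratios = (train, test, validate)
    if any(r < 0 for r in ratios):
        raise ValueError(f"ratios must be non-negative, got {ratios}")
    # Compare with a tolerance, because sums such as 0.7 + 0.2 + 0.1 are not exact in floating point
    if not math.isclose(sum(ratios), 1.0, abs_tol=1e-9):
        raise ValueError(f"train + test + validate must equal 1, got {sum(ratios)}")

    n = len(x)
    indices = list(range(n))
    if shuffle:
        random.Random(seed).shuffle(indices)  # seeded generator keeps the split reproducible

    i = round(n * train)  # end of the train slice
    j = min(i + round(n * test), n)  # end of the test slice, clamped to the data length
    return indices[:i], indices[i:j], indices[j:]
```

Raising ValueError makes a bad call fail loudly; silently normalizing the ratios would be another reasonable design, at the cost of hiding caller mistakes.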

Comment on lines -98 to 100
-    # write a txt file listing all imaged in folder
+    """Generates a txt file listing all images in a specified folder; usage: `image_folder2file('path/to/folder/')`."""
     s = glob.glob(f"{folder}*.*")
     with open(f"{folder[:-1]}.txt", "w") as file:

nitpick (typo): Correct the typo in the docstring from 'imaged' to 'images'.
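As a side note on the snippet under review: the output name is built with folder[:-1], which only works when the folder argument ends with a path separator, as in the usage string. A hedged sketch using pathlib that accepts either form follows; the name image_folder_to_file and the one-path-per-line output are assumptions for illustration, not the code in utils.py.

```python
from pathlib import Path


def image_folder_to_file(folder):
    """Hypothetical variant: write '<folder>.txt' listing the files found in folder."""
    folder = Path(folder)  # Path drops a trailing separator, so 'a/b/' and 'a/b' behave alike
    out_file = folder.parent / f"{folder.name}.txt"
    # Mirror the '*.*' pattern from the reviewed snippet; sorting keeps the output stable
    files = sorted(p for p in folder.glob("*.*") if p.is_file())
    with open(out_file, "w") as f:
        for p in files:
            f.write(f"{p}\n")  # assumption: one path per line
    return out_file
```

Calling image_folder_to_file('path/to/folder/') and image_folder_to_file('path/to/folder') would both write path/to/folder.txt.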

@@ -138,7 +138,7 @@ def convert_vott_json(name, files, img_path):

# Convert ath JSON file into YOLO-format labels --------------------------------
def convert_ath_json(json_dir): # dir contains json annotations and images
    # Create folders

suggestion (code_clarification): Specify what 'ath' stands for in the docstring to improve clarity.

Clarifying acronyms and potentially unfamiliar terms in the documentation can make the codebase more accessible to new developers or external contributors.

Suggested change
-    # Create folders
+    """Converts annotations from ATH (Assumed Term Here) JSON format to YOLO (You Only Look Once) format labels, resizes images, and organizes data for training in a machine learning model."""

@@ -138,7 +138,7 @@

# Convert ath JSON file into YOLO-format labels --------------------------------
def convert_ath_json(json_dir): # dir contains json annotations and images

issue (code-quality): We've found these issues:


Explanation

Python has a number of builtin variables: functions and constants that form a part of the language, such as list, getattr, and type (see https://docs.python.org/3/library/functions.html). It is valid, in the language, to re-bind such variables:

    list = [1, 2, 3]

However, this is considered poor practice.

  • It will confuse other developers.
  • It will confuse syntax highlighters and linters.
  • It means you can no longer use that builtin for its original purpose.

How can you solve this?

Rename the variable something more specific, such as integers. In a pinch, my_list and similar names are colloquially-recognized placeholders.
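To make the third point concrete, here is a small sketch of how a shadowed builtin fails and how renaming fixes it. Both function names are hypothetical and are not taken from general_json2yolo.py.

```python
def to_sorted_list_shadowed(list):
    # The parameter named 'list' hides the builtin inside this function,
    # so the call below tries to call the argument itself and raises TypeError.
    return list(sorted(list))


def to_sorted_list(values):
    # With a specific parameter name, the builtin list() is available again.
    return list(sorted(values))
```

Passing a list to to_sorted_list_shadowed raises a TypeError because a list object is not callable, while to_sorted_list returns a new sorted list.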

The quality score for this function is below the quality threshold of 25%.
This score is a combination of the method length, cognitive complexity and working memory.

How can you solve this?

It might be worth refactoring this function to make it shorter and more readable.

  • Reduce the function length by extracting pieces of functionality out into
    their own functions. This is the most important thing you can do - ideally a
    function should be less than 10 lines.
  • Reduce nesting, perhaps by introducing guard clauses to return early.
  • Ensure that variables are tightly scoped, so that code using related concepts
    sits together within the function rather than being scattered.
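As a sketch of the refactoring advice above (a small extracted function with guard clauses that return early), consider a hypothetical helper, box_to_yolo_row, that is not part of general_json2yolo.py. It assumes boxes arrive as top-left pixel coordinates (x, y, w, h) and returns the normalized class, x_center, y_center, width, height row used by YOLO labels.

```python
def box_to_yolo_row(box, img_w, img_h, class_id=0):
    """Hypothetical helper: convert a pixel-space (x, y, w, h) box to a normalized YOLO row."""
    # Guard clauses: reject unusable input up front instead of nesting the main logic
    if img_w <= 0 or img_h <= 0:
        return None
    x, y, w, h = box
    if w <= 0 or h <= 0:
        return None

    # Main path: box center and size, normalized by the image dimensions
    x_center = (x + w / 2) / img_w
    y_center = (y + h / 2) / img_h
    return (class_id, x_center, y_center, w / img_w, h / img_h)
```

A long conversion function could call a helper like this once per annotation, which keeps the per-box arithmetic in one place and shortens the caller.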
