Add missing docstrings #87

glenn-jocher · 2024-04-28T13:29:43Z

🛠️ PR Summary

_{Made with ❤️ by Ultralytics Actions}

🌟 Summary

Enhancements in JSON to YOLO conversion scripts for improved processing and documentation.

📊 Key Changes

Code Organization: Shifted the defaultdict import in general_json2yolo.py for cleaner code layout.
Function Descriptions: Added descriptive docstrings to functions across general_json2yolo.py and utils.py to clarify their purpose and usage.
Functionality Improvements: Modified the convert_ath_json function in general_json2yolo.py to include image resizing and data organizing for training, signaling a substantive enhancement in how JSON annotations are converted to YOLO format.
Utility Updates: Several utility functions in utils.py received updates, including new features for splitting datasets (split_rows_simple, split_files, split_indices), generating image lists (image_folder2file), augmenting datasets with COCO background (add_coco_background), dataset simplification (create_single_class_dataset), and folder structure optimization (flatten_recursive_folders). These changes overall aim to streamline the dataset preparation process for YOLO model training.

🎯 Purpose & Impact

Ease of Use: The addition of docstrings and comments makes it easier for both new and existing users to understand the purpose and function of each utility, enhancing the usability of the conversion tools.
Training Efficiency: Improvements in dataset handling, such as the ability to easily augment datasets with COCO backgrounds and the capability to flatten nested folders, can significantly speed up the preparation process for model training.
Flexibility: The newly added functionalities and enhancements provide users with more flexibility in managing and preprocessing their datasets for YOLO, potentially leading to better training outcomes and more efficient workflows.

These changes, while technical, contribute substantially towards making the JSON to YOLO conversion process more intuitive, documented, and efficient for users ranging from data scientists to AI hobbyists. 🚀

sourcery-ai

Hey @glenn-jocher - I've reviewed your changes and they look great!

Here's what I looked at during the review

🟡 General issues: 5 issues found
🟢 Security: all looks good
🟢 Testing: all looks good
🟢 Complexity: all looks good

Sourcery is free for open source - if you like our reviews please consider sharing them ✨

_{Help me be more useful! Please click 👍 or 👎 on each comment to tell me if it was helpful.}

sourcery-ai · 2024-04-28T13:31:23Z

utils.py

@@ -31,7 +31,7 @@ def exif_size(img):


 def split_rows_simple(file="../data/sm4/out.txt"):  # from utils import *; split_rows_simple()
-    # splits one textfile into 3 smaller ones based upon train, test, val ratios


suggestion (code_clarification): Consider specifying the default ratios in the docstring for clarity.

Including default ratios in the docstring can help users understand the function's behavior without needing to look at the code.

Suggested change

# splits one textfile into 3 smaller ones based upon train, test, val ratios

"""Splits a text file into train, test, and val files based on specified ratios.

Default ratios are train: 70%, test: 20%, val: 10%.

Expects a file path as input."""

sourcery-ai · 2024-04-28T13:31:23Z

utils.py

@@ -46,6 +46,7 @@


 def split_files(out_path, file_name, prefix_path=""):  # split training data
+    """Splits file names into separate train, test, and val datasets and writes them to prefixed paths."""


suggestion (code_clarification): Clarify in the docstring what 'file_name' should contain and its expected format.

The function's input parameters should be clearly defined to avoid confusion and potential misuse of the function.

Suggested change

"""Splits file names into separate train, test, and val datasets and writes them to prefixed paths."""

def split_files(out_path: str, file_name: List[str], prefix_path: str = "") -> None:

"""Splits a list of file names into separate train, test, and val datasets.

Args:

out_path (str): The output directory where the datasets will be written.

file_name (List[str]): A list of file names to be split.

prefix_path (str): An optional prefix to be added to each output path.

"""

sourcery-ai · 2024-04-28T13:31:23Z

utils.py

@@ -58,6 +59,7 @@


 def split_indices(x, train=0.9, test=0.1, validate=0.0, shuffle=True):  # split training data
+    """Splits array indices for train, test, and validate datasets according to specified ratios."""


issue (edge_case_not_handled): Consider handling the case where the sum of train, test, and validate ratios does not equal 1.

Adding a check to ensure the sum of the ratios equals 1 can prevent logical errors in dataset splitting.

sourcery-ai · 2024-04-28T13:31:23Z

utils.py

-    # write a txt file listing all imaged in folder
+    """Generates a txt file listing all images in a specified folder; usage: `image_folder2file('path/to/folder/')`."""
    s = glob.glob(f"{folder}*.*")
    with open(f"{folder[:-1]}.txt", "w") as file:


nitpick (typo): Correct the typo in the docstring from 'imaged' to 'images'.

sourcery-ai · 2024-04-28T13:31:23Z

general_json2yolo.py

@@ -138,7 +138,7 @@ def convert_vott_json(name, files, img_path):

 # Convert ath JSON file into YOLO-format labels --------------------------------
 def convert_ath_json(json_dir):  # dir contains json annotations and images
-    # Create folders


suggestion (code_clarification): Specify what 'ath' stands for in the docstring to improve clarity.

Clarifying acronyms and potentially unfamiliar terms in the documentation can make the codebase more accessible to new developers or external contributors.

Suggested change

# Create folders

"""Converts annotations from ATH (Assumed Term Here) JSON format to YOLO (You Only Look Once) format labels, resizes images, and organizes data for training in a machine learning model."""

sourcery-ai · 2024-04-28T13:31:23Z

general_json2yolo.py

@@ -138,7 +138,7 @@

 # Convert ath JSON file into YOLO-format labels --------------------------------
 def convert_ath_json(json_dir):  # dir contains json annotations and images


issue (code-quality): We've found these issues:

Don't assign to builtin variable dir (avoid-builtin-shadow)

Low code quality found in convert_ath_json - 7% (low-code-quality)

Explanation
Python has a number of builtin variables: functions and constants that
form a part of the language, such as list, getattr, and type
(See https://docs.python.org/3/library/functions.html).
It is valid, in the language, to re-bind such variables:

list = [1, 2, 3]

However, this is considered poor practice.

It will confuse other developers.

It will confuse syntax highlighters and linters.

It means you can no longer use that builtin for its original purpose.

How can you solve this?

Rename the variable something more specific, such as integers.
In a pinch, my_list and similar names are colloquially-recognized
placeholders.
The quality score for this function is below the quality threshold of 25%.
This score is a combination of the method length, cognitive complexity and working memory.

How can you solve this?

It might be worth refactoring this function to make it shorter and more readable.

Reduce the function length by extracting pieces of functionality out into
their own functions. This is the most important thing you can do - ideally a
function should be less than 10 lines.

Reduce nesting, perhaps by introducing guard clauses to return early.

Ensure that variables are tightly scoped, so that code using related concepts
sits together within the function rather than being scattered.

glenn-jocher and others added 2 commits April 28, 2024 15:29

Add docstrings

e5bc50b

Auto-format by https://ultralytics.com/actions

91085d3

glenn-jocher merged commit 5d6b35f into master Apr 28, 2024

glenn-jocher deleted the docstrings branch April 28, 2024 13:30

sourcery-ai bot reviewed Apr 28, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add missing docstrings #87

Add missing docstrings #87

glenn-jocher commented Apr 28, 2024 •

edited by github-actions bot

Loading

sourcery-ai bot left a comment

sourcery-ai bot Apr 28, 2024

sourcery-ai bot Apr 28, 2024

sourcery-ai bot Apr 28, 2024

sourcery-ai bot Apr 28, 2024

sourcery-ai bot Apr 28, 2024

sourcery-ai bot Apr 28, 2024

		@@ -31,7 +31,7 @@ def exif_size(img):


		def split_rows_simple(file="../data/sm4/out.txt"): # from utils import *; split_rows_simple()
		# splits one textfile into 3 smaller ones based upon train, test, val ratios

-    # splits one textfile into 3 smaller ones based upon train, test, val ratios
+    """Splits a text file into train, test, and val files based on specified ratios.
+    Default ratios are train: 70%, test: 20%, val: 10%.
+    Expects a file path as input."""

		@@ -46,6 +46,7 @@


		def split_files(out_path, file_name, prefix_path=""): # split training data
		"""Splits file names into separate train, test, and val datasets and writes them to prefixed paths."""

-    """Splits file names into separate train, test, and val datasets and writes them to prefixed paths."""
+def split_files(out_path: str, file_name: List[str], prefix_path: str = "") -> None:
+    """Splits a list of file names into separate train, test, and val datasets.
+    Args:
+        out_path (str): The output directory where the datasets will be written.
+        file_name (List[str]): A list of file names to be split.
+        prefix_path (str): An optional prefix to be added to each output path.
+    """

		@@ -58,6 +59,7 @@


		def split_indices(x, train=0.9, test=0.1, validate=0.0, shuffle=True): # split training data
		"""Splits array indices for train, test, and validate datasets according to specified ratios."""

	# Create folders
	"""Converts annotations from ATH (Assumed Term Here) JSON format to YOLO (You Only Look Once) format labels, resizes images, and organizes data for training in a machine learning model."""

		@@ -138,7 +138,7 @@

		# Convert ath JSON file into YOLO-format labels --------------------------------
		def convert_ath_json(json_dir): # dir contains json annotations and images

Add missing docstrings #87

Add missing docstrings #87

Conversation

glenn-jocher commented Apr 28, 2024 • edited by github-actions bot Loading

🛠️ PR Summary

🌟 Summary

📊 Key Changes

🎯 Purpose & Impact

sourcery-ai bot left a comment

Choose a reason for hiding this comment

sourcery-ai bot Apr 28, 2024

Choose a reason for hiding this comment

sourcery-ai bot Apr 28, 2024

Choose a reason for hiding this comment

sourcery-ai bot Apr 28, 2024

Choose a reason for hiding this comment

sourcery-ai bot Apr 28, 2024

Choose a reason for hiding this comment

sourcery-ai bot Apr 28, 2024

Choose a reason for hiding this comment

sourcery-ai bot Apr 28, 2024

Choose a reason for hiding this comment

glenn-jocher commented Apr 28, 2024 •

edited by github-actions bot

Loading