Fix logging the number of generated samples for each leaf node
We were logging the length of our `generated_data` list instead of the
number of newly generated samples in each leaf node. This caused
confusion because the logged value was typically 1, 2, 3, etc. rather
than the 50, 500, 1000, etc. that users would expect.

Fixes #227

Signed-off-by: Ben Browning <[email protected]>
bbrowning committed Dec 10, 2024
1 parent dcbabc5 commit 20808b5
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions src/instructlab/sdg/generate_data.py
@@ -426,8 +426,8 @@ def generate_data(
 continue
 generated_data.append(new_generated_data)

-logger.info("Generated %d samples", len(generated_data))
-logger.debug("Generated data: %s", generated_data)
+logger.info("Generated %d samples", len(new_generated_data))
+logger.debug("Generated data: %s", new_generated_data)

 if is_knowledge:
 # generate mmlubench data for the current leaf node
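
The difference between the two counts can be reproduced in isolation. The following is a minimal sketch, not the project's code: `sample_counts` is a hypothetical helper, and plain Python lists stand in for whatever dataset objects `generate_data` actually accumulates. It assumes only what the diff shows, namely that each leaf node's `new_generated_data` is appended to `generated_data` before logging.

```python
def sample_counts(leaf_node_batches):
    """Return (old_count, new_count) per leaf node.

    Hypothetical helper for illustration only. old_count mirrors the value
    logged before the fix; new_count mirrors the value logged after it.
    """
    generated_data = []
    counts = []
    for new_generated_data in leaf_node_batches:
        generated_data.append(new_generated_data)
        # Before the fix: length of the outer list, i.e. the number of
        # leaf nodes processed so far.
        old_count = len(generated_data)
        # After the fix: number of samples generated for this leaf node.
        new_count = len(new_generated_data)
        counts.append((old_count, new_count))
    return counts


if __name__ == "__main__":
    # Three leaf nodes producing 50, 500 and 1000 samples (assumed sizes).
    batches = [[0] * 50, [0] * 500, [0] * 1000]
    for old_count, new_count in sample_counts(batches):
        print(f"before fix: {old_count} samples, after fix: {new_count} samples")
```

Because `generated_data` gains exactly one element per leaf node, its length counts leaf nodes processed so far and not samples, which is consistent with the 1, 2, 3 sequence described in the commit message.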