Member

@lingwei-gu lingwei-gu commented Sep 30, 2025

This PR enables reasoning output.

[Two screenshots attached, taken 2025-10-05 at 7:17:25 PM and 7:17:38 PM]

Member

@ronakice ronakice left a comment

Most of the code looks fine. I left some comments and will review again after you resolve them. Next time we should also try to break PRs up; 500-something lines is a bit harder to review (but okay this time :))

Comment on lines 166 to 167
# print(f"🔍 DEBUG LLM: API call completed successfully") # not
# removed because it's very helpful for debugging
Member

  1. I think we should remove these even if they are helpful for debugging. As a matter of coding practice, debug prints are for the developer's own use while debugging.
  2. We could have a --debug mode where we expose such things, but that is a separate PR. We also shouldn't call print inside functions or methods; logging is a better alternative.
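
A minimal sketch of the suggestion above, not code from this PR: debug output goes through `logging` and only appears when a `--debug` flag is passed. The logger name, the `call_llm` stand-in, and the flag handling are all hypothetical.

```python
import argparse
import logging

logger = logging.getLogger("llm_handler")  # hypothetical logger name


def call_llm(prompt: str) -> str:
    """Stand-in for the real API call; returns a canned completion."""
    completion = f"echo: {prompt}"
    # These records are emitted only when logging is configured at DEBUG.
    logger.debug("API call completed successfully")
    logger.debug("Full response: %s", completion)
    return completion


def configure_logging(argv=None) -> int:
    """Parse a hypothetical --debug flag and return the chosen level."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--debug", action="store_true")
    args = parser.parse_args(argv)
    level = logging.DEBUG if args.debug else logging.WARNING
    # force=True replaces any handlers configured earlier (Python 3.8+).
    logging.basicConfig(level=level, force=True)
    return level
```

With this shape, the debug lines stay in the code but are silent by default, which addresses both points in the comment.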

Member Author

I see, let me remove this then. We can add a debug mode in another PR if needed.

Comment on lines 170 to 171
# print(f"🔍 DEBUG LLM: Full response: {completion}") # not
# removed because it's very helpful for debugging
Member

remove

Member Author

removed

response = reasoning_content if reasoning_content else ""
reasoning_content = message["reasoning_content"]
else:
print(f"No reasoning found in response from {self.model}")
Member

Should this be printed? It feels spammy. If the model is a reasoning model and we hit this branch, we have reached an error state and should warn; otherwise this should not print anything.
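
One way to read this suggestion, as a sketch only: warn when reasoning was expected but missing, and stay silent otherwise. The `expects_reasoning` parameter and the function name are hypothetical; only the `reasoning_content` key comes from the snippet above.

```python
import logging
from typing import Optional

logger = logging.getLogger(__name__)


def extract_reasoning(
    message: dict, model: str, expects_reasoning: bool
) -> Optional[str]:
    """Return the reasoning text, warning only if its absence is unexpected."""
    reasoning = message.get("reasoning_content")
    if reasoning:
        return reasoning
    if expects_reasoning:
        # A reasoning model returned no reasoning: likely an error state.
        logger.warning("No reasoning found in response from %s", model)
    # Non-reasoning models land here silently.
    return None
```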

Member Author

This is also a debugging print. Removed.

Comment on lines 222 to 228
"qwen" in self.model.lower()
or "qwen2" in self.model.lower()
or "qwen3" in self.model.lower()
):
# Use cl100k_base for Qwen models as they typically use
# similar tokenization
encoding = tiktoken.get_encoding("cl100k_base")
Member

Let's not fold this into this PR. If we don't know the model, just don't include the encoding.
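
A rough sketch of "no encoding for unknown models", under assumptions: `get_encoding_or_none` and the injected `resolve` lookup are hypothetical names, and the toy table stands in for a real tokenizer registry. As far as I know, `tiktoken.encoding_for_model` raises `KeyError` for names it cannot map, so it could be passed as `resolve`.

```python
from typing import Callable, Optional


def get_encoding_or_none(
    model: str, resolve: Callable[[str], object]
) -> Optional[object]:
    """Return the tokenizer for `model`, or None when the model is unknown.

    `resolve` is a hypothetical injected lookup that raises KeyError for
    unrecognized model names.
    """
    try:
        return resolve(model)
    except KeyError:
        # Unknown model: skip the encoding instead of guessing one.
        return None


# Toy lookup table standing in for a real tokenizer registry.
KNOWN = {"gpt-4o": "o200k_base"}
```

Callers then have to handle `None` explicitly, for example by skipping token counting for that model.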

Member Author

Sounds good. Removed.

Comment on lines -96 to +109
if self.log_level >= 1:
self.logger.info(
f"Initialized Nuggetizer with models: {creator_model}, {scorer_model}, {assigner_model}"
)
if log_level >= 1:
self.logger.setLevel(logging.INFO)
if log_level >= 2:
self.logger.setLevel(logging.DEBUG)
Member

What is this code doing? We don't update log_level anywhere, do we?

Member Author

We can specify the log level in the input. The same debugging logs appear only if we set the log level to 2.
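
The mapping described here could be sketched roughly as follows. The thresholds 1 and 2 come from the snippet above; the helper name, the logger name, and the WARNING default for level 0 are assumptions.

```python
import logging


def resolve_log_level(log_level: int) -> int:
    """Map the integer verbosity passed by the caller to a logging level."""
    if log_level >= 2:
        return logging.DEBUG
    if log_level >= 1:
        return logging.INFO
    return logging.WARNING  # assumed default when log_level is 0


logger = logging.getLogger("nuggetizer_example")  # hypothetical name
logger.setLevel(resolve_log_level(2))
logger.debug("shown only when log_level is 2 and a handler is attached")
```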

@lingwei-gu
Member Author

Most of the code looks fine. I left some comments and will review again after you resolve them. Next time we should also try to break PRs up; 500-something lines is a bit harder to review (but okay this time :))

I see, thanks for pointing that out. I will break them into smaller PRs.

@lingwei-gu lingwei-gu requested a review from ronakice October 24, 2025 21:31
Member

@ronakice ronakice left a comment

LGTM! Thank you :)

@ronakice ronakice merged commit 7ca223b into castorini:main Oct 24, 2025
1 check passed