
Conversation

@BAIKEMARK
Owner

This pull request refactors the file hashing and database update logic to improve thread safety and maintainability. The main changes are the extraction of database update operations into a dedicated, thread-safe function, plus an adjusted hash timeout and thread pool configuration for better performance.

Thread safety and code organization:

  • Added a new function update_hash_in_db to handle writing hash results to the database in a thread-safe manner, isolating database operations from the hashing worker logic. (utils.py; see the sketch after this list)
  • Refactored the hashing workflow to call update_hash_in_db from within the thread pool, removing direct database access from worker threads and simplifying error handling. (utils.py)
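
For illustration, a minimal sketch of what such a thread-safe helper might look like, assuming an sqlite3 backend; the DB_PATH constant and the models table/column names are hypothetical (the actual schema is not shown in this diff), and the explicit False for invalid input follows the reviewer suggestion further down rather than the merged code:

import sqlite3

DB_PATH = "models.db"  # hypothetical; the real path/config lives in utils.py

def update_hash_in_db(file_info: dict) -> bool:
    """Write one hash result to the database.

    Acquires its own database connection in its own context, so calls
    made from worker threads never share connection state.
    """
    if not file_info or not file_info.get("hash"):
        return False
    try:
        # sqlite3 connections cannot be shared across threads by default;
        # a fresh connection per call keeps this function thread-safe.
        with sqlite3.connect(DB_PATH) as conn:
            conn.execute(
                "UPDATE models SET hash = ? WHERE path = ?",  # hypothetical schema
                (file_info["hash"], file_info["path"]),
            )
        return True
    except sqlite3.Error:
        return False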

Performance and reliability improvements:

  • Increased the single-file hash timeout from 60 to 90 seconds to reduce the likelihood of hash operations timing out on large files. (utils.py)
  • Adjusted the thread pool configuration to use all available CPUs instead of half, potentially improving hashing throughput. (utils.py; both changes are sketched after this list)
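
A minimal sketch of how these two settings might fit together; the hash_worker body, the SHA-256 choice, and the timeout constant's name are assumptions, while the 90-second value and the max_workers expression come from the diff:

import hashlib
import os
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

SINGLE_FILE_HASH_TIMEOUT = 90  # raised from 60 seconds; constant name is hypothetical

def hash_worker(path):
    # Stand-in for the hash_worker in utils.py; the algorithm is an assumption.
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            h.update(chunk)
    return {"path": path, "hash": h.hexdigest()}

files_to_hash = ["model_a.safetensors", "model_b.safetensors"]  # placeholder inputs

# Use every available CPU, falling back to 4 if the count is undetectable.
with ThreadPoolExecutor(max_workers=(os.cpu_count() or 4)) as executor:
    futures = {executor.submit(hash_worker, f): f for f in files_to_hash}
    for future, path in futures.items():
        try:
            res = future.result(timeout=SINGLE_FILE_HASH_TIMEOUT)
        except (FutureTimeout, OSError):
            continue  # timed out or unreadable; the real error handling differs
        print(res["path"], res["hash"][:12])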

Copilot AI review requested due to automatic review settings October 17, 2025 02:13

Copilot AI left a comment


Pull Request Overview

This pull request refactors the file hashing and database update logic to improve thread safety and maintainability by extracting database operations into a dedicated function and optimizing performance configurations.

  • Extracted database update operations into a thread-safe update_hash_in_db function
  • Increased single file hash timeout from 60 to 90 seconds
  • Changed thread pool to use all available CPUs instead of half


utils.py Outdated

- with ThreadPoolExecutor(max_workers=(os.cpu_count()/2 or 4)) as executor:
+ with ThreadPoolExecutor(max_workers=(os.cpu_count() or 4)) as executor:
      # Create all futures
      futures = {executor.submit(hash_worker, f): f for f in files_to_hash}

Copilot AI Oct 17, 2025


Using all available CPUs for hashing may starve other system processes. Consider using min(os.cpu_count() or 4, 8) to cap the maximum number of workers, especially since the previous implementation used half the CPUs for a reason.

Suggested change
- with ThreadPoolExecutor(max_workers=(os.cpu_count() or 4)) as executor:
+ with ThreadPoolExecutor(max_workers=min(os.cpu_count() or 4, 8)) as executor:
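
A note on the trade-off: bulk file hashing is as much disk-bound as CPU-bound, so beyond a modest worker count extra threads tend to add I/O contention rather than throughput; a cap such as 8 keeps most of the speedup while leaving headroom for the rest of the system.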

    It acquires a database connection in its own context.
    """
    if not file_info or not file_info.get("hash"):
        return

Copilot AI Oct 17, 2025


The function returns None on invalid input but True/False for success/failure cases. This inconsistent return type could confuse callers. Consider returning False for invalid input to maintain consistent boolean return values.

Suggested change
-         return
+         return False
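
For illustration, a hypothetical caller that tallies outcomes shows the hazard: a bare return yields None, which a plain truthiness check lumps in with False, but identity-based bookkeeping silently miscounts it:

results = [update_hash_in_db(r) for r in hash_results]  # hash_results is hypothetical
failed = sum(1 for r in results if r is False)  # misses the None returned for invalid input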

Comment on lines +725 to +726
res['model_type'] = model_type
if update_hash_in_db(res):

Copilot AI Oct 17, 2025


Mutating the res dictionary in-place could have side effects if the same dictionary is used elsewhere. Consider creating a copy or passing model_type as a separate parameter to update_hash_in_db.

Suggested change
- res['model_type'] = model_type
- if update_hash_in_db(res):
+ new_res = {**res, 'model_type': model_type}
+ if update_hash_in_db(new_res):
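
Worth noting: {**res, 'model_type': model_type} is a shallow copy. It allocates a new top-level dict, so the caller's res stays unmodified, while still sharing any nested values; that suffices here because only a top-level key is being added.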

@BAIKEMARK BAIKEMARK merged commit bb0de33 into main Oct 17, 2025
1 check passed