-
Notifications
You must be signed in to change notification settings - Fork 6
Dev #25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This pull request refactors the file hashing and database update logic to improve thread safety and maintainability by extracting database operations into a dedicated function and optimizing performance configurations.
- Extracted database update operations into a thread-safe
update_hash_in_dbfunction - Increased single file hash timeout from 60 to 90 seconds
- Changed thread pool to use all available CPUs instead of half
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
utils.py
Outdated
| with ThreadPoolExecutor(max_workers=(os.cpu_count()/2 or 4)) as executor: | ||
| # 创建所有future | ||
| futures = {executor.submit(hash_worker, f): f for f in files_to_hash} | ||
| with ThreadPoolExecutor(max_workers=(os.cpu_count() or 4)) as executor: |
Copilot
AI
Oct 17, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using all available CPUs for hashing may starve other system processes. Consider using min(os.cpu_count() or 4, 8) to cap the maximum number of workers, especially since the previous implementation used half the CPUs for a reason.
| with ThreadPoolExecutor(max_workers=(os.cpu_count() or 4)) as executor: | |
| with ThreadPoolExecutor(max_workers=min(os.cpu_count() or 4, 8)) as executor: |
| 它在自己的上下文中获取数据库连接。 | ||
| """ | ||
| if not file_info or not file_info.get("hash"): | ||
| return |
Copilot
AI
Oct 17, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function returns None on invalid input but True/False for success/failure cases. This inconsistent return type could confuse callers. Consider returning False for invalid input to maintain consistent boolean return values.
| return | |
| return False |
| res['model_type'] = model_type | ||
| if update_hash_in_db(res): |
Copilot
AI
Oct 17, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mutating the res dictionary in-place could have side effects if the same dictionary is used elsewhere. Consider creating a copy or passing model_type as a separate parameter to update_hash_in_db.
| res['model_type'] = model_type | |
| if update_hash_in_db(res): | |
| new_res = {**res, 'model_type': model_type} | |
| if update_hash_in_db(new_res): |
This pull request refactors the file hashing and database update logic to improve thread safety and maintainability. The main change is the extraction of database update operations into a dedicated, thread-safe function, and the adjustment of timeouts and thread pool configuration for better performance.
Thread safety and code organization:
update_hash_in_dbto handle writing hash results to the database in a thread-safe manner, isolating database operations from the hashing worker logic. (utils.py)update_hash_in_dbfrom within the thread pool, removing direct database access from worker threads and simplifying error handling. (utils.py)Performance and reliability improvements:
utils.py)utils.py)