Enhancements for Error Handling and Regex Operation Optimization in Distributed Tensor Loading #220

Open
Madhav-MKNC opened this issue Mar 19, 2024 · 0 comments · May be fixed by #221

Description

Two parts of the tensor loading process need improvement: error handling for the work submitted to ThreadPoolExecutor, and the efficiency of the regex operations in get_load_path_str.

Enhanced Error Handling in ThreadPoolExecutor:

When a future fails, the current code does not surface detailed error information, which makes debugging difficult.
Suggested enhancement: catch the exception raised by each future and log detailed failure information, including which tensor failed to load.
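
A minimal sketch of the suggested error handling, not the project's actual loading code. The names `load_all`, `load_fn`, and `paths` are hypothetical stand-ins for whatever the loader submits to the executor. Each future is mapped back to the path it was created for, so a failure can be logged with the tensor it belongs to.

```python
import logging
from concurrent.futures import ThreadPoolExecutor, as_completed

logger = logging.getLogger(__name__)


def load_all(paths, load_fn, max_workers=4):
    """Run load_fn(path) for every path in a thread pool.

    Hypothetical helper: returns (results, failures), where failures maps
    each path that could not be loaded to the exception it raised.
    """
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Remember which path each future belongs to.
        future_to_path = {executor.submit(load_fn, p): p for p in paths}
        for future in as_completed(future_to_path):
            path = future_to_path[future]
            try:
                # result() re-raises any exception raised inside load_fn.
                results[path] = future.result()
            except Exception as exc:
                # exc_info=exc attaches the worker's traceback to the log record.
                logger.error("Failed to load tensor %s: %r", path, exc, exc_info=exc)
                failures[path] = exc
    return results, failures
```

Whether to continue after a failure (as here) or to re-raise on the first error is a design choice; collecting failures reports every bad tensor in one run.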

Regex Operation Optimization:

get_load_path_str repeatedly applies regexes for renaming and exclusion, which is computationally expensive.
Proposed improvement: cache the results of these regex operations to avoid recomputation and improve performance.
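
One way the caching could look, sketched with `functools.lru_cache`. The function name `cached_load_path_str`, its signature, and the first-match-wins renaming behavior are assumptions for illustration and may not match the real get_load_path_str. The rules are passed as tuples because `lru_cache` requires hashable arguments.

```python
import re
from functools import lru_cache


@lru_cache(maxsize=None)
def cached_load_path_str(init_path_str, rename_rules=(), exclude_rules=()):
    """Hypothetical cached variant of a path-mapping helper.

    rename_rules: tuple of (pattern, replacement) pairs.
    exclude_rules: tuple of patterns; a match means "do not load" (None).
    """
    # Exclusion: any matching pattern drops the path.
    for pattern in exclude_rules:
        if re.search(pattern, init_path_str):
            return None
    # Renaming: apply the first rule that matches (assumed behavior).
    for pattern, replacement in rename_rules:
        if re.search(pattern, init_path_str):
            return re.sub(pattern, replacement, init_path_str)
    return init_path_str
```

Repeated calls with the same path and rules then return the stored result without touching the regex engine. Note that the `re` module already caches recently compiled patterns internally, so the gain here comes from skipping repeated matching, and it only helps if the same paths are actually looked up more than once.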

These enhancements are critical for maintaining robust and efficient tensor loading in distributed computing environments.
