[Needs thorough testing] async model file listing #4968
base: master
Conversation
"refactor right before committing" the whiskey said, "it'll be fine" it said. woopsie.
Can we do a separate function impl instead of modifying the existing `recursive_search` function? The signature should be `async recursive_search`, and the threading should be handled by the function caller.
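A sketch of the split being asked for here, under the assumption of a hypothetical `async_recursive_search` name and a simplified return shape (file list plus directory mtimes); this is not the PR's actual code:

```python
import asyncio
import os

async def async_recursive_search(directory: str) -> tuple[list[str], dict[str, float]]:
    """Hypothetical async variant of recursive_search: returns (files, dir mtimes)."""
    files: list[str] = []
    dirs: dict[str, float] = {}
    loop = asyncio.get_running_loop()
    # Offload the blocking os.walk to the default executor so the event loop
    # stays free; a real implementation would use aiofiles per directory.
    walked = await loop.run_in_executor(None, lambda: list(os.walk(directory)))
    for dirpath, _subdirs, filenames in walked:
        dirs[dirpath] = os.path.getmtime(dirpath)
        files.extend(os.path.relpath(os.path.join(dirpath, f), directory)
                     for f in filenames)
    return files, dirs

def recursive_search(directory: str):
    # Threading stays with the caller: here we simply drive the coroutine
    # synchronously, but a caller could instead schedule it on a worker thread
    # or an already-running event loop.
    return asyncio.run(async_recursive_search(directory))
```

The point of the split is that the async function stays policy-free: whoever calls it decides whether to block, spawn a thread, or gather it alongside other coroutines.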
folder_paths.py (Outdated)

```diff
@@ -263,19 +279,41 @@ def cached_filename_list_(folder_name: str) -> tuple[list[str], dict[str, float]
     if folder_name not in filename_list_cache:
         return None
     out = filename_list_cache[folder_name]
+    must_invalidate = [False]
```
Only the first element of `must_invalidate` is ever accessed. Is there any reason why it needs to be an array?
Python lets you read any variable from an enclosing scope freely, but assigning to one has a lot of edge cases and oddities (you'd need `nonlocal` or `global`), which hit you especially when you're doing threading stuff. With a one-element array, the closure only ever *reads* the list object itself; the value inside it lives on the heap, so the thread can mutate `must_invalidate[0]` in place and the change is visible outside. The array is essentially being used as the equivalent of a pointer/reference.
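A minimal, self-contained illustration of the pattern being described (a generic demo, not code from this PR): assigning `flag = True` inside `worker` would only rebind a local name, but writing through the list mutates shared heap state that the caller can see.

```python
import threading

def run_flag_demo() -> bool:
    flag = [False]  # one-element list used as a mutable cell / "pointer"

    def worker():
        # `flag = True` here would only rebind a name local to worker();
        # writing through the list element mutates shared state on the heap.
        flag[0] = True

    t = threading.Thread(target=worker)
    t.start()
    t.join()
    return flag[0]  # True: the caller observes the thread's write
```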
This seems really hacky. Please try using `threading.Event` to approach this problem.
```python
import asyncio
import aiofiles.os  # importing the submodule explicitly; `import aiofiles` alone doesn't expose it
import threading
from concurrent.futures import ThreadPoolExecutor

async def check_folder_mtime(folder: str, time_modified: float) -> bool:
    return await aiofiles.os.path.getmtime(folder) != time_modified

async def check_new_dirs(x: str, known_dirs: set) -> bool:
    return await aiofiles.os.path.isdir(x) and x not in known_dirs

async def check_invalidation(folder_names_and_paths, folder_name):
    folders = folder_names_and_paths[folder_name]
    out = folders[1]
    invalidation_event = threading.Event()

    async def process_checks():
        tasks = []
        for x, time_modified in out[1].items():
            tasks.append(check_folder_mtime(x, time_modified))
        for x in folders[0]:
            tasks.append(check_new_dirs(x, set(out[1])))
        results = await asyncio.gather(*tasks)
        if any(results):
            invalidation_event.set()

    with ThreadPoolExecutor() as executor:
        future = executor.submit(lambda: asyncio.run(process_checks()))
        future.result()  # wait for the async operations to complete

    return None if invalidation_event.is_set() else out  # None if invalidation is needed

# Usage
result = asyncio.run(check_invalidation(folder_names_and_paths, folder_name))
```
It replaces the `must_invalidate` list hack with a `threading.Event` object, which is designed for inter-thread communication.
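For comparison, a generic sketch (not from this PR) of what `threading.Event` buys over the list flag: `wait()` blocks until another thread calls `set()`, with an optional timeout, instead of busy-polling a list element.

```python
import threading
import time

def run_event_demo() -> bool:
    done = threading.Event()

    def worker():
        time.sleep(0.01)  # simulate a slow I/O check
        done.set()        # signal completion; thread-safe by design

    threading.Thread(target=worker).start()
    # wait() blocks until set() is called (or the timeout expires) --
    # something a plain list-as-flag cannot do without busy-polling.
    return done.wait(timeout=5.0)
```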
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fair enough, swapped to Event
The default is `min(32, cpu_count + 4)`, i.e. limited by core count, but we don't really care about core count here because this is I/O-bound work, so just use 32. Otherwise it can deadlock due to the very awkward hack of Python threading instead of genuine async handling.
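A sketch of that sizing choice (the names here are illustrative, not from the PR):

```python
import os
from concurrent.futures import ThreadPoolExecutor

# Since Python 3.8 the default pool size is min(32, os.cpu_count() + 4),
# a compromise tuned for mixed CPU/I/O workloads.
default_workers = min(32, (os.cpu_count() or 1) + 4)

# For pure I/O (network-drive stat calls), core count is irrelevant, so
# pin the pool at 32 to avoid starving the thread/async bridge.
io_executor = ThreadPoolExecutor(max_workers=32)
```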
Testing notes:
Improvement notes:
This is a messy proof of concept: using aiofiles, model listing and cache validation are handled with async parallelization, so that high-latency drives can list models quickly.
I tested by running ComfyUI in Tokyo accessing a model folder in Los Angeles via SMB.
Key timings:
object_info
All times above have a pretty wide margin of error (latency was not constant), but that variance was an order of magnitude smaller than most of those jumps.
This code undoubtedly has side effects (I haven't tested symlinks for example), not to mention requires a new dependency, and adds significant complexity. So the main question is, is the case of high-latency drive reads worth the trouble of merging and maintaining this messier model listing code?
I will be building and submitting a separate, more easily agreeable PR soon to buff up the caching code.