Hi,

As suggested in a reddit discussion, this issue documents my brief investigation and proof-of-concept port of cq_cache, a CadQuery plugin that caches the parts created by a Python function. To make this GH issue self-contained I'll include all the relevant code at the end of this message, and to save time I may copy/paste from the already mentioned Reddit thread.
The problem that made me think about caching is that adding bd_warehouse threads to a few holes on a part I was working on was taking more than 10 seconds to render, which quickly becomes annoying. I'm a build123d newbie, so it's entirely possible I was not using the threads in the most optimal manner. Anyway, I came across cq_cache and decided to see if I could modify it to work with build123d. It turns out to be really straightforward, even though I'm not familiar with build123d's internals. The result, a simple test, and timing results can be seen below. The implementation shown is just a quick-and-dirty mod of the original code, and there are likely lots of bugs, plus missing and redundant code.
Now, is something like this really needed? I'm not sure, TBH. Because build123d benefits from the full strength and rich ecosystem of Python, there are probably plenty of other ways to work around slow geometry construction if it becomes a problem. Some that I can think of at the moment are:
split the construction into different modules/functions, so that you can work on different facets of the design in isolation when possible
make the "slow" geometry conditional on level-of-detail flags. For example, if adding threads to holes is slow, you can have a boolean such as ADD_THREADS and if it's set to False only the holes are shown without the threads
in environments such as Jupyter notebooks, with partial execution/caching already built in, an additional caching layer would be redundant. I think in this case one can use Python's functools.cache, as a Discord user commented about a week ago, or just assign the result to a variable in a preceding cell and then shallow-copy it if there is a need for more than one part instance.
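To make the level-of-detail idea concrete, here is a minimal sketch (the part, the hole, and the flag name are all invented for illustration):

```python
from build123d import *

# Hypothetical level-of-detail flag: keep False while iterating on the
# design, set True only when the full geometry is actually needed
ADD_THREADS = False

with BuildPart() as block:
    Box(30, 30, 10)
    Hole(radius=3)  # the bare hole is cheap to construct

if ADD_THREADS:
    # the slow work (e.g. building a bd_warehouse thread and placing it
    # in the hole) runs only for "final" renders
    ...
```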
I'm a CAD newbie and don't have enough 'data' to tell if there are situations where the proposed caching would be a clear winner. Apparently such situations, even if they exist, are quite rare. The fact that cq_cache itself hasn't been updated in three or so years, even though it has plenty of scope for improvement, is telling.
My takeaway from this quick experiment is that such a persistent caching solution likely can't be entirely transparent to the user. That is, they can't just slap a '@build123d.cache' decorator before a part-making function and then forget all about it. The problem is that, without some care, the returned cache can become stale. In the case of build123d such a caching solution would likely be most useful when iterating on a (complex/big/slow) design, so I think it's crucial to have a robust way to detect the code changes that must trigger cache invalidation. I wasn't completely successful in finding a 100% foolproof method to detect changes in the cached function's code. A method based on inspect.getsource() looks promising, and it worked when calling the script from the command line, but for some reason failed when cq-editor is used.

Even if there is a good way to detect changes in the decorated function's body, I believe it will not be 100% dependable unless the function is "pure" and doesn't depend on other (non-cached) functions and global variables. For example, my build123d scripts usually start with a parameters/constants section, and the subsequent code refers to them to parameterize the construction process. If a function whose result is cached refers to global variables to make a part, and some of these parameters change, the simplistic approaches considered here won't invalidate the cache, because the code of the function itself doesn't change. To be robust, the function must only use parameters that it receives as arguments. I can think of ways to semi-automate passing global state as function arguments, but that wouldn't be much less intrusive than using level-of-detail flags, for example, with the added disadvantage of being opaque to the user.
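A contrived sketch of the staleness problem (all names are hypothetical): the body of make_plate_stale never changes, so any hash of its source or bytecode stays the same, even though editing the global changes the geometry it produces.

```python
from build123d import *

WIDTH = 20  # editing this value does NOT change the cached function's hash

def make_plate_stale():
    # depends on global state: a cache keyed on the function's code and
    # arguments would happily return the plate built with the old WIDTH
    return Box(WIDTH, 10, 2)

def make_plate_robust(width: float):
    # all inputs arrive as arguments, so they become part of the cache key
    # and changing them invalidates the cached result
    return Box(width, 10, 2)
```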
Sorry for the long write-up. As far as I understand, the purpose of this GH issue is to serve as a reminder, so I wanted to make it as self-contained as possible, even at the risk of stating the obvious.
bxd_cache.py
"""Port of cq_cache for build123d"""importosimportsysimporttempfileimportinspectimportbase64importhashlibfromitertoolsimportchainfromfunctoolsimportwrapsfromOCP.BRepToolsimportBRepToolsfromOCP.BRepimportBRep_BuilderfromOCP.TopoDSimportTopoDS_Shapeimportbuild123dasbxdTEMPDIR_PATH=tempfile.gettempdir()
CACHE_DIR_NAME="build123d_geom_cache"CACHE_DIR_PATH=os.path.join(TEMPDIR_PATH, CACHE_DIR_NAME)
BXD_TYPES= [
bxd.Part,
bxd.Shape,
bxd.Solid,
bxd.Shell,
bxd.Compound,
bxd.Face,
bxd.Wire,
bxd.Edge,
bxd.Vertex,
bxd.Plane,
TopoDS_Shape,
]
ifCACHE_DIR_NAMEnotinos.listdir(TEMPDIR_PATH):
os.mkdir(CACHE_DIR_PATH)
#$ Doesn't quite work for our purposes b/c co_consts changes if there are, for## example list comprehensions in the body of the cached function, even if the## resulting list stays the same## def hash_fn(fn):## """## Creates function signature, to be able to detect when its code is modified## """## fn_enc = str(fn.__code__.co_consts).encode('utf-8') + fn.__code__.co_code## return hashlib.md5(fn_enc).hexdigest()# inspect doesn't work in cq-editor#def hash_fn(fn):# print(f'hashing "{fn.__name__}"', file=sys.stderr)# fn_src = inspect.getsource(fn)# return hashlib.md5(fn_src.encode('utf-8')).hexdigest()# Implementation suggested by Bing Copilotdefhash_fn(func):
# Combine bytecode and other relevant attributes of the functionbytecode=func.__code__.co_codenames=func.__code__.co_namesvarnames=func.__code__.co_varnamesconstants=func.__code__.co_consts# Create a hash objecthasher=hashlib.md5()
hasher.update(bytecode)
hasher.update(''.join(names).encode())
hasher.update(''.join(varnames).encode())
# Hash constants separately to minimize noiseforconstinconstants:
ifisinstance(const, (int, float, str, bytes)):
hasher.update(repr(const).encode())
returnhasher.hexdigest()
defimportBrep(file_path):
""" Import a boundary representation model Returns a TopoDS_Shape object """builder=BRep_Builder()
shape=TopoDS_Shape()
return_code=BRepTools.Read_s(shape, file_path, builder)
ifreturn_codeisFalse:
raiseValueError("Import failed, check file name")
returnshapedefget_cache_dir_size(cache_dir_path):
""" Returns size of the specified directory in bytes """total_size=0fordirpath, dirnames, filenamesinos.walk(cache_dir_path):
forfinfilenames:
fp=os.path.join(dirpath, f)
total_size+=os.path.getsize(fp)
returntotal_sizedefdelete_oldest_file(cache_dir_path):
""" When the cache directory size exceed the limit, this function is called deleting the oldest file of the cache """cwd=os.getcwd()
os.chdir(cache_dir_path)
files=sorted(os.listdir(os.getcwd()), key=os.path.getmtime)
oldest=files[0]
os.remove(os.path.join(cache_dir_path, oldest))
os.chdir(cwd)
defbuild_file_name(fct, *args, **kwargs):
""" Returns a file name given the specified function and args. If the function and the args are the same this function returns the same filename """# hash all relevant variableshasher=hashlib.md5()
forvalin [fct.__name__, repr(args), repr(kwargs)]:
hasher.update(bytes(val, "utf-8"))
# encode the hash as a filesystem safe stringhexdigest=hasher.hexdigest()
filename=fct.__name__+'_'+hexdigest## filename = base64.urlsafe_b64encode(hasher.digest()).decode("utf-8")## # strip the padding## return filename.rstrip("=")returnfilenamedefclear_cache():
""" Removes all the files from the cache """cache_size=get_cache_dir_size(CACHE_DIR_PATH)
forcache_fileinos.listdir(CACHE_DIR_PATH):
os.remove(os.path.join(CACHE_DIR_PATH, cache_file))
print(f"cache cleared to free {round(cache_size*1e-6,3)}MB ", file=sys.stderr)
defusing_same_function(fct, file_name):
""" Checks if this exact function call has been cached. Take care of the eventuality where the user cache a function but modify the body of the function afterwards. It assure that if the function has been modify, the cache won't load a wrong cached file """ifnotos.path.exists(file_name):
returnFalsewithopen(file_name, "r") asf:
cached_function_hash=f.readlines()[0].strip()
caching_function_hash=hash_fn(fct)
ifcached_function_hash==caching_function_hash:
returnTrueelse:
returnFalsedefreturn_right_wrapper(source, target_file):
""" Cast the TopoDS_Shape object loaded by importBrep as the right type that the original function is returning """withopen(target_file, "r") astf:
stored=tf.readlines()[-1]
target=next(xforxinBXD_TYPESifx.__name__==stored)
## if target == cq.Workplane:## shape = cq.Shape(source)## shape = cq.Workplane(obj=shape)## else:shape=target(source)
returnshapedefcache(cache_size=500, debug=False):
""" cache_size : Maximum cache memory in MB This function save the model created by the cached function as a BREP file and loads it if the cached function is called several time with the same arguments. Note that it is primarly made for caching function with simple types as argument. Objects passed as an argument with a __repr__ function that returns the same value for different object will fail without raising an error. If the __repr__ function returns different values for equivalent objects (which is the default behaviour of user defined classes) then the caching will be ineffective. """def_cache(function):
@wraps(function)defwrapper(*args, **kwargs):
file_name=build_file_name(function, *args, **kwargs)
file_path=os.path.join(CACHE_DIR_PATH, file_name)
same_function=using_same_function(function, file_path)
ifdebugandnotsame_function:
print(f"code change detected for {function.__name__}()",
file=sys.stderr)
iffile_nameinos.listdir(CACHE_DIR_PATH) andsame_function:
shape=importBrep(
os.path.join(CACHE_DIR_PATH, file_name+".brep")
)
ifdebug:
print(f'loading "{file_name}" from cache', file=sys.stderr)
returnreturn_right_wrapper(shape, file_path)
else:
shape=function(*args, **kwargs)
ifhasattr(shape, 'part'):
raiseTypeError(
'Please return builder.part instead of builder')
shape_type=type(shape)
ifshape_typenotinBXD_TYPES:
raiseTypeError(f"bxd_cache cannot wrap {shape_type} objects")
ifdebug:
print(f'saving "{file_name}" to cache', file=sys.stderr)
shape.export_brep(
os.path.join(CACHE_DIR_PATH, file_name) +".brep"
)
withopen(os.path.join(CACHE_DIR_PATH, file_name), "w") asfun_file:
fun_file.write(hash_fn(function))
fun_file.write("\n")
fun_file.write(shape_type.__name__)
cache_dir_size=get_cache_dir_size(CACHE_DIR_PATH)
ifdebug:
print(f'current cache dir size: 'f'{cache_dir_size/1e6}MB (max {cache_size}MB)',
file=sys.stderr)
while (cache_dir_size*1e-6) >cache_size:
delete_oldest_file(CACHE_DIR_PATH)
cache_dir_size=get_cache_dir_size(CACHE_DIR_PATH)
returnshapereturnwrapperreturn_cache
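For reference, this is roughly how the decorator is meant to be used; the function below is a made-up stand-in for the thrbox_* functions that produced the timing output that follows:

```python
import build123d as bxd
from bxd_cache import cache

@cache(cache_size=500, debug=True)
def heavy_block(length: float, width: float) -> bxd.Part:
    with bxd.BuildPart() as builder:
        bxd.Box(length, width, 10)
    # return the Part, not the builder -- the wrapper rejects builders
    return builder.part

block = heavy_block(30, 20)  # slow on the first call, loaded from BREP afterwards
```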
Elapsed time for "naive construction": 11.193s
Elapsed time for "pre-threaded blocks construction": 7.053s
loading "thrbox_block_cached_c421a9a6422544ed1158fde7a89fa6c2" from cache
Elapsed time for "partially cached construction": 4.794s
loading "thrbox_fully_cached_5d81c7dac0e07be6c1508d579703ad8c" from cache
Elapsed time for "fully cached construction": 0.028s
Thank you for your detailed look at object caching. I've not looked into caching in detail, but the Discord user "barnaby" posted the following:
```python
#%% Imports
from build123d import *
import gridfinity_build123d as gfb
from ocp_vscode import *
import functools

@functools.cache
def memoized_base(units_x, units_y):
    # Arguments are used for dict keys and must be hashable!
    # gfb.Base is slow, taking about 6s to run for a 5x3 base!
    return gfb.Base(grid=((True,) * units_x,) * units_y,
                    features=(gfb.MagnetHole()),
                    align=(Align.MIN, Align.MIN, Align.MIN)
                    )

#%% Part
with BuildPart() as part:
    # Slow on the first execution, instant on subsequent executions
    # with the same arguments
    base = add(memoized_base(3, 4))
    # ...
```
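Related to the notebook workflow mentioned in the original post: when several instances of the memoized part are needed, a shallow copy avoids re-running the constructor entirely (a sketch reusing memoized_base from above; the offset is arbitrary):

```python
import copy

base_a = memoized_base(3, 4)  # built once, then served from the functools cache
# a shallow copy shares the underlying geometry, so it is essentially free
base_b = copy.copy(base_a).move(Location((100, 0, 0)))
```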
However it's done, there should be a section in the docs describing these types of caching solutions.
@vdp note that bd_warehouse threads are designed as assemblies (Compound) to keep them small and fast. In a bd_warehouse Screw only one of the central thread segments is created; the rest are shallow copies, which improves speed/size dramatically. When using threads it's recommended that users follow this approach as well and add threads to their parts by creating an assembly rather than by fusing the thread into the part, which is slow and often problematic at the CAD kernel level.
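A rough sketch of that assembly-style pattern, as I understand it (the IsoThread parameters and hole size are purely illustrative; check the bd_warehouse docs for the actual signatures):

```python
from build123d import *
from bd_warehouse.thread import IsoThread

# an internal thread, kept as a separate shape rather than fused in
thread = IsoThread(major_diameter=6, pitch=1, length=10, external=False)

with BuildPart() as body:
    Box(30, 30, 10)
    Hole(radius=3)  # the hole the thread will sit in

# group the part and the thread as an assembly (Compound) instead of fusing
assembly = Compound(children=[body.part, thread])
```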