
build123d caching POC #801

Open
vdp opened this issue Nov 25, 2024 · 1 comment

Labels
documentation Improvements or additions to documentation enhancement New feature or request

vdp commented Nov 25, 2024

Hi,

As suggested in a Reddit discussion, this issue documents my brief investigation and proof-of-concept port of cq_cache, a CadQuery plugin that caches the parts created by a Python function. To make this GH issue self-contained I'll include all the relevant code at the end of this message, and to save time I may copy/paste from the already mentioned Reddit thread.

The problem that made me think about caching is that adding bd_warehouse threads to a few holes on a part I was working on was taking more than 10 seconds to render, which quickly becomes annoying. I'm a build123d newbie, so it's entirely possible I wasn't using the threads in the most optimal manner. Anyway, I came across cq_cache and decided to see if I could modify it to work with build123d. It turns out to be really straightforward, even though I'm not familiar with build123d's internals. The result, a simple test, and timing results can be seen below. The implementation shown is just a quick-and-dirty mod of the original code, and there are likely plenty of bugs, as well as missing and redundant code.

Now, is something like this really needed? I'm honestly not sure. Because build123d benefits from the full strength and rich ecosystem of Python, there are probably plenty of other ways to work around slow geometry construction if it becomes a problem. Some that I can think of at the moment are:

  • split the construction into different modules/functions, so that you can work on different facets of the design in isolation when possible

  • make the "slow" geometry conditional on level-of-detail flags. For example, if adding threads to holes is slow, you can have a boolean such as ADD_THREADS; when it's set to False, only the holes are shown, without the threads

  • in environments such as Jupyter notebooks, with partial execution/caching already built in, an additional caching layer would be redundant. I think in this case one can use Python's functools.cache, as a Discord user commented about a week ago, or just assign the result to a variable in a preceding cell and then shallow-copy it if more than one part instance is needed.
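The functools.cache option in the last bullet can be sketched with plain Python (make_part and its dict return value are made-up stand-ins for a slow part-building function, not build123d API):

```python
import copy
import functools

call_count = 0  # tracks how many times the slow body actually runs


@functools.cache
def make_part(width: float, height: float) -> dict:
    """Stand-in for a slow part-building function; arguments must be hashable."""
    global call_count
    call_count += 1
    return {"width": width, "height": height}  # placeholder for a real Part

a = make_part(10, 20)  # first call: the body runs
b = make_part(10, 20)  # same args: the cached object is returned, body skipped
c = copy.copy(a)       # shallow-copy when a distinct instance is needed
```

Note that functools.cache returns the very same object for repeated calls, which is why a shallow copy is needed when the design uses more than one instance of the part.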

I'm a CAD newbie and don't have enough 'data' to tell whether there are situations where the proposed caching would be a clear winner. Apparently such situations, even if they exist, are quite rare. The fact that cq_cache itself hasn't been updated for three or so years, even though it has plenty of scope for improvement, is telling.

My takeaway from this quick experiment is that such a persistent caching solution likely can't be entirely transparent to the user. That is, they can't just slap a '@build123d.cache' decorator on a part-making function and then forget all about it; without some care, the returned cache can become stale. In build123d's case such a caching solution would likely be most useful when iterating on a (complex/big/slow) design, so I think it's crucial to have a robust way to detect the code changes that must trigger cache invalidation.

I wasn't completely successful in finding a 100% foolproof method to detect changes in the cached function's code. A method based on inspect.getsource() looks promising, and it worked when calling the script from the command line, but for some reason it failed when cq-editor is used.

Even with a good way to detect changes in the decorated function's body, I believe detection won't be 100% dependable unless the function is "pure" and doesn't depend on other (non-cached) functions or global variables. For example, my build123d scripts usually start with a parameters/constants section, and the subsequent code refers to them to parameterize the construction process. If a cached function refers to global variables and some of those parameters change, the simplistic approaches considered here won't invalidate the cache, because the code of the function itself doesn't change. To be robust, the function must only use parameters that it receives as arguments. I can think of ways to semi-automate the passing of global state as function arguments, but that wouldn't be much less intrusive than using level-of-detail flags, for example, with the added disadvantage of being opaque to the user.
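To illustrate the robust alternative, here is a minimal sketch of a key derived only from the function name and its arguments (cache_key, make_box, and WALL_THICKNESS are hypothetical names): a global parameter only affects the key when it is passed in explicitly.

```python
import hashlib

WALL_THICKNESS = 2.0  # module-level design parameter


def cache_key(fn_name: str, *args, **kwargs) -> str:
    """Build a cache key from the function name and its arguments only."""
    hasher = hashlib.md5()
    for val in (fn_name, repr(args), repr(sorted(kwargs.items()))):
        hasher.update(val.encode("utf-8"))
    return hasher.hexdigest()

# Fragile: the key ignores WALL_THICKNESS, so changing it won't invalidate the cache.
stale_key = cache_key("make_box")

# Robust: pass the parameter explicitly so it becomes part of the key.
key_a = cache_key("make_box", wall=WALL_THICKNESS)
key_b = cache_key("make_box", wall=3.0)  # different parameter -> different key
```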

Sorry for the long write-up. As far as I understand the purpose of this GH issue is to serve as a reminder, so I wanted to make it as self-contained as possible, even at the risk of stating the obvious.

bxd_cache.py

"""
Port of cq_cache for build123d
"""

import os
import sys
import tempfile
import inspect
import base64
import hashlib
from itertools import chain
from functools import wraps

from OCP.BRepTools import BRepTools
from OCP.BRep import BRep_Builder
from OCP.TopoDS import TopoDS_Shape

import build123d as bxd


TEMPDIR_PATH = tempfile.gettempdir()
CACHE_DIR_NAME = "build123d_geom_cache"
CACHE_DIR_PATH = os.path.join(TEMPDIR_PATH, CACHE_DIR_NAME)
BXD_TYPES = [
    bxd.Part,
    bxd.Shape,
    bxd.Solid,
    bxd.Shell,
    bxd.Compound,
    bxd.Face,
    bxd.Wire,
    bxd.Edge,
    bxd.Vertex,
    bxd.Plane,
    TopoDS_Shape,
]

os.makedirs(CACHE_DIR_PATH, exist_ok=True)

## Doesn't quite work for our purposes b/c co_consts changes if there are, for
## example, list comprehensions in the body of the cached function, even if the
## resulting list stays the same
## def hash_fn(fn):
##     """
##     Creates function signature, to be able to detect when its code is modified
##     """
##     fn_enc = str(fn.__code__.co_consts).encode('utf-8') + fn.__code__.co_code
##     return hashlib.md5(fn_enc).hexdigest()

# inspect doesn't work in cq-editor
#def hash_fn(fn):
#    print(f'hashing "{fn.__name__}"', file=sys.stderr)
#    fn_src = inspect.getsource(fn)
#    return hashlib.md5(fn_src.encode('utf-8')).hexdigest()


# Implementation suggested by Bing Copilot
def hash_fn(func):
    # Combine bytecode and other relevant attributes of the function
    bytecode = func.__code__.co_code
    names = func.__code__.co_names
    varnames = func.__code__.co_varnames
    constants = func.__code__.co_consts

    # Create a hash object
    hasher = hashlib.md5()
    hasher.update(bytecode)
    hasher.update(''.join(names).encode())
    hasher.update(''.join(varnames).encode())

    # Hash constants separately to minimize noise
    for const in constants:
        if isinstance(const, (int, float, str, bytes)):
            hasher.update(repr(const).encode())

    return hasher.hexdigest()



def importBrep(file_path):
    """
    Import a boundary representation model
    Returns a TopoDS_Shape object
    """
    builder = BRep_Builder()
    shape = TopoDS_Shape()
    return_code = BRepTools.Read_s(shape, file_path, builder)
    if return_code is False:
        raise ValueError("Import failed, check file name")
    return shape


def get_cache_dir_size(cache_dir_path):
    """
    Returns size of the specified directory in bytes
    """
    total_size = 0
    for dirpath, dirnames, filenames in os.walk(cache_dir_path):
        for f in filenames:
            fp = os.path.join(dirpath, f)
            total_size += os.path.getsize(fp)
    return total_size


def delete_oldest_file(cache_dir_path):
    """
    When the cache directory size exceeds the limit, this function is called
    to delete the oldest file in the cache
    """
    cwd = os.getcwd()
    os.chdir(cache_dir_path)
    files = sorted(os.listdir(os.getcwd()), key=os.path.getmtime)
    oldest = files[0]
    os.remove(os.path.join(cache_dir_path, oldest))
    os.chdir(cwd)


def build_file_name(fct, *args, **kwargs):
    """
    Returns a file name for the specified function and args.
    If the function and the args are the same, this function returns the same filename
    """

    # hash all relevant variables
    hasher = hashlib.md5()
    for val in [fct.__name__, repr(args), repr(kwargs)]:
        hasher.update(bytes(val, "utf-8"))
    # encode the hash as a filesystem safe string
    hexdigest = hasher.hexdigest()
    filename = fct.__name__ + '_' + hexdigest
    ## filename = base64.urlsafe_b64encode(hasher.digest()).decode("utf-8")
    ## # strip the padding
    ## return filename.rstrip("=")
    return filename


def clear_cache():
    """
    Removes all the files from the cache
    """
    cache_size = get_cache_dir_size(CACHE_DIR_PATH)
    for cache_file in os.listdir(CACHE_DIR_PATH):
        os.remove(os.path.join(CACHE_DIR_PATH, cache_file))
    print(f"cache cleared to free {round(cache_size*1e-6,3)}MB ", file=sys.stderr)


def using_same_function(fct, file_name):
    """
    Checks if this exact function call has been cached.
    Handles the case where the user caches a function but modifies
    its body afterwards.
    It ensures that if the function has been modified, the cache won't load a stale file
    """
    if not os.path.exists(file_name):
        return False

    with open(file_name, "r") as f:
        cached_function_hash = f.readlines()[0].strip()

    caching_function_hash = hash_fn(fct)
    if cached_function_hash == caching_function_hash:
        return True
    else:
        return False


def return_right_wrapper(source, target_file):
    """
    Cast the TopoDS_Shape object loaded by importBrep as the right type that the original function is returning
    """

    with open(target_file, "r") as tf:
        stored = tf.readlines()[-1]

    target = next(x for x in BXD_TYPES if x.__name__ == stored)

    ## if target == cq.Workplane:
    ##     shape = cq.Shape(source)
    ##     shape = cq.Workplane(obj=shape)
    ## else:
    shape = target(source)

    return shape


def cache(cache_size=500, debug=False):
    """
    cache_size : Maximum cache memory in MB
    This decorator saves the model created by the cached function as a BREP file and
    loads it if the cached function is called several times with the same arguments.
    Note that it is primarily made for caching functions with simple types as arguments.
    Objects passed as arguments whose __repr__ returns the same value for different
    objects will silently produce wrong cache hits. If __repr__ returns different
    values for equivalent objects (which is the default behaviour of user-defined
    classes), then the caching will be ineffective.
    """

    def _cache(function):
        @wraps(function)
        def wrapper(*args, **kwargs):
            file_name = build_file_name(function, *args, **kwargs)
            file_path = os.path.join(CACHE_DIR_PATH, file_name)

            same_function = using_same_function(function, file_path)
            if debug and not same_function and os.path.exists(file_path):
                print(f"code change detected for {function.__name__}()",
                      file=sys.stderr)
            if file_name in os.listdir(CACHE_DIR_PATH) and same_function:
                shape = importBrep(
                    os.path.join(CACHE_DIR_PATH, file_name + ".brep")
                )
                if debug:
                    print(f'loading "{file_name}" from cache', file=sys.stderr)
                return return_right_wrapper(shape, file_path)

            else:
                shape = function(*args, **kwargs)
                if hasattr(shape, 'part'):
                    raise TypeError(
                        'Please return builder.part instead of builder')
                shape_type = type(shape)
                if shape_type not in BXD_TYPES:
                    raise TypeError(f"bxd_cache cannot wrap {shape_type} objects")

                if debug:
                    print(f'saving "{file_name}" to cache', file=sys.stderr)

                shape.export_brep(
                    os.path.join(CACHE_DIR_PATH, file_name) + ".brep"
                )

                with open(os.path.join(CACHE_DIR_PATH, file_name), "w") as fun_file:
                    fun_file.write(hash_fn(function))
                    fun_file.write("\n")
                    fun_file.write(shape_type.__name__)

                cache_dir_size = get_cache_dir_size(CACHE_DIR_PATH)
                if debug:
                    print(f'current cache dir size: '
                          f'{cache_dir_size/1e6}MB (max {cache_size}MB)',
                          file=sys.stderr)
                while (cache_dir_size * 1e-6) > cache_size:
                    delete_oldest_file(CACHE_DIR_PATH)
                    cache_dir_size = get_cache_dir_size(CACHE_DIR_PATH)

                return shape

        return wrapper

    return _cache
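To demonstrate the invalidation limitation concretely, this self-contained sketch (hash_code mirrors the spirit of hash_fn above; SIZE and make_box are made-up names) shows that a bytecode-based hash does not change when a referenced global does:

```python
import hashlib

SIZE = 10  # pretend this is a global design parameter


def hash_code(func) -> str:
    """Hash a function's bytecode and referenced names, mirroring hash_fn above."""
    h = hashlib.md5()
    h.update(func.__code__.co_code)               # compiled bytecode
    h.update(" ".join(func.__code__.co_names).encode())  # global names it uses
    return h.hexdigest()


def make_box():
    return SIZE * 2  # depends on the global SIZE


before = hash_code(make_box)
SIZE = 20  # a design-parameter change...
after = hash_code(make_box)
# ...leaves the hash unchanged, because the bytecode only stores the *name*
# "SIZE", not its value; a naive cache would therefore return a stale part.
```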

test_threads.py

#!/usr/bin/env python3

"""
Demo for build123d caching
"""

import timeit
from contextlib import contextmanager
from copy import copy

from build123d import *
from bd_warehouse.thread import IsoThread

from bxd_cache import cache

@contextmanager
def timeit_context(id):
    start_time = timeit.default_timer()
    try:
        yield
    finally:
        end_time = timeit.default_timer()
        elapsed_time = end_time - start_time
        print(f'Elapsed time for "{id}": {elapsed_time:.3f}s')

iso_internal = IsoThread(
    major_diameter=6 * MM,
    pitch=1 * MM,
    length=5 * MM,
    external=False,
    end_finishes=("chamfer", "fade"),
    hand="right",
)

def thrbox_naive():
    with BuildPart() as threaded_box:
        with BuildSketch():
            Rectangle(26, 26)
            with Locations(RegularPolygon(8, 5).vertices()):
                Circle(iso_internal.major_diameter/2, mode=Mode.SUBTRACT)
        extrude(amount=iso_internal.length)

        btmface = faces().sort_by(Axis.Z)[0]
        with Locations([c.center(CenterOf.BOUNDING_BOX) for c in
                        btmface.edges().filter_by(GeomType.CIRCLE)]):
            add(iso_internal)

    return threaded_box.part


def thrbox_add_prethreaded():
    with BuildPart() as threaded_block:
        with BuildSketch():
            Rectangle(iso_internal.major_diameter+1,
                      iso_internal.major_diameter+1)
            Circle(iso_internal.major_diameter/2, mode=Mode.SUBTRACT)
        extrude(amount=iso_internal.length)
        add(iso_internal)

    with BuildPart() as threaded_box:
        with BuildSketch():
            Rectangle(26, 26)
            with Locations(RegularPolygon(8, 5).vertices()):
                Circle(iso_internal.major_diameter/2, mode=Mode.SUBTRACT)
        extrude(amount=iso_internal.length)

        btmface = faces().sort_by(Axis.Z)[0]
        with Locations([c.center(CenterOf.BOUNDING_BOX) for c in
                        btmface.edges().filter_by(GeomType.CIRCLE)]):
            add(threaded_block)
            #add(copy(threaded_block))  # slower

    return threaded_box.part

@cache(debug=True)
def thrbox_block_cached():
    with BuildPart() as threaded_block:
        with BuildSketch():
            Rectangle(iso_internal.major_diameter+1,
                      iso_internal.major_diameter+1)
            Circle(iso_internal.major_diameter/2, mode=Mode.SUBTRACT)
        extrude(amount=iso_internal.length)
        add(iso_internal)

    return threaded_block.part

def thrbox_partial_cache():
    threaded_block = thrbox_block_cached()

    with BuildPart() as threaded_box:
        with BuildSketch():
            Rectangle(26, 26)
            with Locations(RegularPolygon(8, 5).vertices()):
                Circle(iso_internal.major_diameter/2, mode=Mode.SUBTRACT)
        extrude(amount=iso_internal.length)

        btmface = faces().sort_by(Axis.Z)[0]
        with Locations([c.center(CenterOf.BOUNDING_BOX) for c in
                        btmface.edges().filter_by(GeomType.CIRCLE)]):
            add(threaded_block)
            #add(copy(threaded_block))

    return threaded_box.part


@cache(debug=True)
def thrbox_fully_cached():
    threaded_block = thrbox_block_cached()

    with BuildPart() as threaded_box:
        with BuildSketch():
            Rectangle(26, 26)
            with Locations(RegularPolygon(8, 5).vertices()):
                Circle(iso_internal.major_diameter/2, mode=Mode.SUBTRACT)
        extrude(amount=iso_internal.length)

        btmface = faces().sort_by(Axis.Z)[0]
        with Locations([c.center(CenterOf.BOUNDING_BOX) for c in
                        btmface.edges().filter_by(GeomType.CIRCLE)]):
            add(threaded_block)
            #add(copy(threaded_block)) # actually slower

    return threaded_box.part

with timeit_context("naive construction"):
    part = thrbox_naive()
show_object(part.locate(Location((-35, 0, 0))), options={'alpha': .5})

with timeit_context("pre-threaded blocks construction"):
    part = thrbox_add_prethreaded()
show_object(part, options={'alpha': .5})

with timeit_context("partially cached construction"):
    part = thrbox_partial_cache()
show_object(part.locate(Location((35, 0, 0))), options={'alpha': .5})

with timeit_context("fully cached construction"):
    part = thrbox_fully_cached()
show_object(part.locate(Location((0, 35, 0))), options={'alpha': .5})

timing results (old laptop)

Elapsed time for "naive construction": 11.193s
Elapsed time for "pre-threaded blocks construction": 7.053s
loading "thrbox_block_cached_c421a9a6422544ed1158fde7a89fa6c2" from cache
Elapsed time for "partially cached construction": 4.794s
loading "thrbox_fully_cached_5d81c7dac0e07be6c1508d579703ad8c" from cache
Elapsed time for "fully cached construction": 0.028s
@gumyr gumyr added the enhancement New feature or request label Nov 25, 2024
@gumyr gumyr added this to the Not Gating Release 1.0.0 milestone Nov 25, 2024
@gumyr gumyr added the documentation Improvements or additions to documentation label Nov 25, 2024

gumyr commented Nov 25, 2024

Thank you for your detailed look at object caching. I've not looked into caching in detail but the Discord user "barnaby" posted the following:

#%% Imports
from build123d import *
import gridfinity_build123d as gfb
from ocp_vscode import *
import functools

@functools.cache
def memoized_base(units_x, units_y):  # Arguments are used for dict keys and must be hashable!
  # gfb.Base is slow, taking about 6s to run for a 5x3 base!
  return gfb.Base(grid=((True,) * units_x,) * units_y,
    features=(gfb.MagnetHole()),
    align=(Align.MIN, Align.MIN, Align.MIN)
  )

#%% Part
with BuildPart() as part:
  # Slow on the first execution, instant on subsequent executions with the same arguments
  base = add(memoized_base(3, 4))
  # ...

However it's done, there should be a section in the docs describing these types of caching solutions.

@vdp note that bd_warehouse threads are designed as assemblies (Compound) to keep them small and fast. In a bd_warehouse Screw only one of the central thread segments is created; the rest are shallow copies, which improves speed/size dramatically. When using threads it's recommended that users follow the same approach and add threads to their parts by creating an assembly, rather than by fusing the thread into the part, which is slow and often problematic at the CAD kernel level.
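The assembly pattern described above can be sketched generically (ThreadSegment is a made-up stand-in, not the bd_warehouse class): build the expensive geometry once, then shallow-copy it for the remaining instances instead of rebuilding or fusing it.

```python
import copy


class ThreadSegment:
    """Stand-in for an expensive-to-build thread segment (not bd_warehouse API)."""

    def __init__(self, pitch: float):
        self.pitch = pitch  # the expensive geometry would be built here

# Build the costly geometry once...
master = ThreadSegment(pitch=1.0)
# ...then shallow-copy it for the remaining segments: each copy is a new,
# lightweight object whose attributes reference the already-built data.
segments = [master] + [copy.copy(master) for _ in range(4)]
```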
