- [12/17/2024]: Release the `<instruction, bpy script>` dataset BlendNet and the benchmark CADBench.
- [12/17/2024]: Release the model weights of BlenderLLM.
- [12/16/2024]: Release the tech report.
Welcome to the repository of BlenderLLM. BlenderLLM is a large language model specifically designed to generate CAD scripts based on user instructions. These scripts are then executed in Blender to render 3D models.
Here is a list of what has been released:
- BlendNet: A high-quality dataset containing 12k `<instruction, CAD script>` pairs.
- BlenderLLM: A large language model fine-tuned on BlendNet based on Qwen2.5-Coder-7B-Instruct, designed to output CAD scripts.
- CADBench: A comprehensive benchmark for evaluating this task.
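For intuition, an `<instruction, CAD script>` pair could look like the following sketch (the field names and the script are illustrative assumptions, not the actual BlendNet schema):

```python
# Illustrative sketch of a BlendNet-style <instruction, CAD script> pair.
# Field names and contents are assumptions, not the actual dataset schema.
sample = {
    "instruction": "Create a cube with side length 2 centered at the origin.",
    "cad_script": (
        "import bpy\n"
        "# Add a 2x2x2 cube at the world origin\n"
        "bpy.ops.mesh.primitive_cube_add(size=2, location=(0, 0, 0))\n"
    ),
}

print(sample["instruction"])
```

Scripts like the one in `cad_script` are executed inside Blender's bundled Python (`bpy`), not in a regular interpreter.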
- To address the challenges posed by the complexity of input forms in CAD applications. The high threshold for use limits accessibility, and we believe that user-friendly interfaces and simplified input methods are essential to encourage wider adoption of CAD-oriented LLMs.
- To provide high-quality, domain-specific datasets for training CAD-oriented LLMs. Building datasets that capture the intricate nuances of CAD design is critical, yet challenging. Efforts to create and share such datasets will significantly enhance the ability of LLMs to understand and perform CAD tasks effectively.
- To ensure accessibility, local deployment, and privacy preservation through open-source CAD-oriented LLMs. By developing and releasing open-source models, we aim to democratize access to advanced tools, empower localized and secure deployments, and support diverse user needs in the CAD domain.
- To emphasize the importance of a comprehensive evaluation framework for CAD-oriented LLMs. Establishing rigorous evaluation methodologies is vital to assess and improve model performance, ensuring robust, reliable, and practical solutions for CAD applications.
The dataset contains 12k `<instruction, CAD script>` pairs.
To ensure diversity, we categorized objects into 16 types, classified instructions into 8 tones, and varied the lengths of the instructions. The figure below illustrates the diversity distribution.
The figure below illustrates the complexity of tasks in the dataset, measuring task difficulty with three metrics: Unit Number, Parameter Density, and Entropy, which reflect geometric complexity, parameter intricacy, and spatial diversity, respectively.
Click here to view the samples and download BlendNet.
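The exact metric definitions appear in the tech report; as a rough, illustrative sketch of an entropy-style measure of spatial diversity, one could compute the Shannon entropy of the spatial bins a model's parts occupy (the binning scheme here is an assumption, not the paper's definition):

```python
import math
from collections import Counter

def shannon_entropy(labels):
    """Shannon entropy (in bits) of the empirical distribution of `labels`."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Toy example: spatial regions occupied by the parts of a model.
bins = ["left", "left", "center", "right"]
print(shannon_entropy(bins))  # 1.5 bits
```

Higher entropy means the parts are spread more evenly across regions, i.e. greater spatial diversity.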
| Model | Backbone | Link |
|---|---|---|
| BlenderLLM | Qwen2.5-Coder-7B-Instruct | Model Weights |
First, install all required packages:
pip install -r requirements.txt
Make sure you have installed Blender and set its executable path. You can test if Blender is installed by running the following command:
blender --version
If Blender is not installed, download it from the official Blender website and ensure its executable is accessible via system paths.
🔔 Please make sure you have downloaded our model weights.
If you only want to chat with BlenderLLM, please run:
python chat.py \
--model_name "$MODEL_NAME" \
--prompt "$PROMPT"
If you want to chat with BlenderLLM and execute scripts to render images, please run:
python modeling.py \
--model_name "$MODEL_NAME" \
--prompt "$PROMPT" \
--obj_name "$OBJ_NAME" \
--output_folder "$OUTPUT_FOLDER" \
--blender_executable "$BLENDER_EXECUTABLE" \
--brightness "$BRIGHTNESS"
- `--blender_executable`: Ensure you provide the correct path to the Blender executable, for example:
  - On Windows: `C:\Program Files\Blender Foundation\Blender\blender.exe`
  - On macOS/Linux: `/usr/bin/blender` or `/usr/local/bin/blender`
- Blender Dependency: Blender is required for executing scripts and rendering images. If you skip this, you will only be able to use the chat feature.
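As a concrete starting point, the arguments above might be filled in like this (all values are hypothetical placeholders; adjust them to your setup):

```shell
# Hypothetical placeholder values -- adjust to your own setup
MODEL_NAME="./BlenderLLM"              # local directory with the downloaded model weights
PROMPT="Create a simple table with four legs."
OBJ_NAME="table"
OUTPUT_FOLDER="./output"
BLENDER_EXECUTABLE="/usr/bin/blender"  # see the platform-specific paths above
BRIGHTNESS="1.0"                       # assumed value; check modeling.py for the accepted range
```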
We developed a comprehensive benchmark, CADBench, to evaluate the ability of LLMs to generate CAD scripts. It contains 500 synthetic samples (CADBench-Sim) and 200 samples collected from online forums (CADBench-Wild).
Each sample is assessed using specific multi-dimensional criteria. The figure below illustrates the dimensions of the criteria for each sample and the average number of criteria per dimension.
Click here to view the samples and download CADBench.
We utilized GPT-4o to evaluate LLMs on CADBench, and the evaluation results are shown in the table below.
| Models | CADBench-Sim | | | | | CADBench-Wild | | | | |
|---|---|---|---|---|---|---|---|---|---|---|
| BlenderLLM | 0.846 | 0.760 | 0.638 | 0.748 ± 0.085 | 3.4% | 0.739 | 0.675 | 0.578 | 0.664 ± 0.066 | 3.5% |
| o1-Preview | 0.729 | 0.707 | 0.624 | 0.687 ± 0.045 | 15.6% | 0.595 | 0.612 | 0.542 | 0.583 ± 0.030 | 17.5% |
| GPT-4-Turbo | 0.658 | 0.621 | 0.488 | 0.589 ± 0.073 | 18.2% | 0.526 | 0.541 | 0.478 | 0.515 ± 0.027 | 24.5% |
| Claude-3.5-Sonnet | 0.687 | 0.608 | 0.482 | 0.593 ± 0.084 | 15.6% | 0.529 | 0.508 | 0.430 | 0.489 ± 0.043 | 26.5% |
| GPT-4o | 0.623 | 0.593 | 0.479 | 0.565 ± 0.062 | 21.4% | 0.460 | 0.466 | 0.408 | 0.444 ± 0.026 | 28.5% |
| BlenderGPT | 0.574 | 0.540 | 0.444 | 0.519 ± 0.055 | 25.2% | 0.402 | 0.425 | 0.368 | 0.398 ± 0.023 | 35.0% |
| Gemini-1.5-Pro | 0.535 | 0.483 | 0.387 | 0.468 ± 0.061 | 30.2% | 0.375 | 0.404 | 0.361 | 0.380 ± 0.018 | 38.0% |
| DeepSeek-V2.5 | 0.569 | 0.497 | 0.372 | 0.479 ± 0.081 | 25.2% | 0.422 | 0.394 | 0.345 | 0.387 ± 0.032 | 34.0% |
| Qwen2.5-Coder-7B-Instruct | 0.457 | 0.352 | 0.251 | 0.353 ± 0.084 | 31.4% | 0.354 | 0.327 | 0.250 | 0.310 ± 0.044 | 37.0% |
| Qwen2.5 | 0.367 | 0.274 | 0.193 | 0.278 ± 0.071 | 44.8% | 0.220 | 0.219 | 0.170 | 0.203 ± 0.023 | 58.5% |
| LLaMA-3.1-8B-Instruct | 0.125 | 0.087 | 0.071 | 0.094 ± 0.023 | 76.0% | 0.130 | 0.127 | 0.105 | 0.120 ± 0.011 | 65.5% |
| Mistral-7B-Instruct-V0.3 | 0.015 | 0.018 | 0.015 | 0.016 ± 0.001 | 96.8% | 0.023 | 0.031 | 0.030 | 0.028 ± 0.004 | 93.0% |
| CodeLLaMA-7B-Instruct | 0.005 | 0.004 | 0 | 0.003 ± 0.002 | 98.8% | 0.009 | 0.019 | 0.015 | 0.014 ± 0.004 | 96.5% |
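The `±` columns appear to be the mean and standard deviation of the three preceding scores in each row; under that reading (an inference, not the official evaluation code), the aggregation can be sketched as:

```python
import statistics

def aggregate(scores):
    """Mean and population standard deviation of per-setting scores."""
    return statistics.mean(scores), statistics.pstdev(scores)

# BlenderLLM's three CADBench-Sim scores from the table above:
mean, std = aggregate([0.846, 0.760, 0.638])
print(f"{mean:.3f} ± {std:.3f}")  # 0.748 ± 0.085, matching the table row
```

Population standard deviation (`pstdev`) reproduces the table's 0.085 for this row, whereas the sample standard deviation would give 0.105; that is why this sketch assumes `pstdev`.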
BlenderLLM aims to improve the efficiency and accessibility of CAD modeling tasks but has the following limitations:
- Focus on Basic Modeling: It primarily handles basic CAD tasks and does not support advanced design aspects such as material properties, surface treatments, or internal structural details.
- Limited Input Scope: The model generates CAD scripts from text instructions but does not support direct CAD model creation or multimodal inputs, such as integrating text with images.
- Lack of Multi-turn Dialogue: It cannot handle iterative, multi-turn interactions, limiting its usefulness for collaborative and interactive design refinements.
Our work is inspired by the following projects, including but not limited to:
- Qwen2.5-Coder-7B-Instruct: https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct
- Qwen2-VL-7B-Instruct: https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct
- Self-instruct: https://github.com/yizhongw/self-instruct
Without them, this repository would not exist.
@misc{du2024blenderllmtraininglargelanguage,
title={BlenderLLM: Training Large Language Models for Computer-Aided Design with Self-improvement},
author={Yuhao Du and Shunian Chen and Wenbo Zan and Peizhao Li and Mingxuan Wang and Dingjie Song and Bo Li and Yan Hu and Benyou Wang},
year={2024},
eprint={2412.14203},
archivePrefix={arXiv},
primaryClass={cs.HC},
url={https://arxiv.org/abs/2412.14203},
}
We are from the School of Data Science (SDS), the Chinese University of Hong Kong, Shenzhen (CUHKSZ).