feat(android): add Vulkan GPU acceleration support (experimental) #348
MdTabish24 wants to merge 6 commits into RunanywhereAI:main
Conversation
Adds Vulkan GPU acceleration to the RunAnywhere Android SDK. The build is verified with 1435 compiled shaders and 141 Vulkan symbols. Runtime testing on 2 devices shows crashes due to mobile GPU driver bugs.

Build Status: VERIFIED (63 MB library)
Runtime Status: FAILS (70-80% of devices crash)
Recommendation: CPU-only build for production

- Compiled 1435 Vulkan shaders from llama.cpp b7935
- Added verification scripts (no device needed)
- Tested on Mali-G76 and Adreno 619 (both crash)
- Documented known issues and recommendations

This is an experimental feature, kept for future reference for when mobile Vulkan drivers improve.

Additional changes for Vulkan GPU acceleration:
- Updated llamacpp_backend.cpp/h with GPU layer offloading
- Added JNI CMakeLists.txt for device_jni.cpp
- Updated ModelManagement for GPU options
- Updated CppBridgeLLM for GPU parameters

These files complete the Vulkan integration implementation.

Added important sections:
- Exception handling error fix (GGML_VULKAN=OFF or -fexceptions)
- .so files explanation (why NOT to push build artifacts)
- Complete files checklist (what was pushed vs excluded)
- Build artifacts vs source code distinction

This helps contributors understand what to push and what to exclude.

Additional Vulkan-related changes:
- rac_llm_llamacpp.h: GPU layers configuration in config struct
- rac_backend_llamacpp_jni.cpp: Disable Vulkan in JNI_OnLoad to prevent crash
- DeviceInfoModels.kt: GPU type enumeration for device info
- CppBridgeDevice.kt: GPU family detection in device bridge

These files complete the GPU/Vulkan infrastructure for safe fallback.

Critical files for production safety:
- rac_llm_llamacpp.cpp: Exception handling for Vulkan crashes with automatic CPU fallback
- Capability.kt: GPU capability enumeration for device detection

These files ensure the app doesn't crash if Vulkan fails: it automatically falls back to CPU-only mode with proper error logging.
📝 Walkthrough

This PR introduces Vulkan GPU acceleration support for Android in the RunAnywhere project, adding comprehensive documentation, build/test verification scripts, C++ Vulkan device detection infrastructure, GPU state management in the llama.cpp backend, and multiplatform Kotlin APIs to expose GPU capabilities to applications.
Sequence Diagrams

```mermaid
sequenceDiagram
    participant App as Android App
    participant JNI as JNI Bridge
    participant Detect as VulkanDetector
    participant Fork as Child Process
    participant VK as Vulkan Driver
    App->>JNI: detectVulkanGPU()
    JNI->>Detect: detect()
    Detect->>Detect: test_vulkan_in_child_process()
    Detect->>Fork: fork()
    Fork->>VK: vkCreateInstance()
    alt Driver Crashes
        Fork-->>Detect: exit(1)
        Detect->>Detect: Reset to unavailable
    else Success
        Fork->>VK: vkEnumeratePhysicalDevices()
        Fork->>VK: vkCreateDevice()
        Fork-->>Detect: exit(0)
        Detect->>VK: vkCreateInstance() [safe]
        Detect->>VK: Query device properties
        Detect->>Detect: Populate VulkanDeviceInfo
    end
    Detect->>JNI: Return VulkanDeviceInfo
    JNI->>App: Return GPUInfo object
```
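The fork-isolation idea in this flow can be sketched in plain POSIX terms, independent of Vulkan. This is a minimal sketch, not the PR's actual code; `risky_driver_call` is a hypothetical stand-in for the crash-prone driver entry point:

```cpp
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical stand-in for a driver call that may crash the process.
static bool risky_driver_call() { return true; }

// Run the probe in a child process so that a driver crash (e.g. SIGSEGV
// inside vkCreateInstance) kills only the child, never the app.
static bool probe_in_child() {
    pid_t pid = fork();
    if (pid == -1) return false;          // fork failed: assume unavailable
    if (pid == 0) {
        // Child: exercise the driver; any crash here is contained.
        _exit(risky_driver_call() ? 0 : 1);
    }
    int status = 0;
    if (waitpid(pid, &status, 0) != pid) return false;
    // Only a clean exit(0) counts as success; a crash (signal) is failure.
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

The parent treats anything other than a normal `exit(0)` as "Vulkan unavailable", which is exactly the `alt Driver Crashes` branch of the diagram.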
```mermaid
sequenceDiagram
    participant Caller as Caller
    participant Backend as LlamaCppBackend
    participant GGML as ggml_backend
    participant GPU as GPU/Vulkan
    Caller->>Backend: create(model_path, use_gpu)
    Backend->>GGML: llama_backend_init()
    alt GPU Available & use_gpu=true
        GGML->>GPU: Initialize Vulkan
        alt Vulkan Succeeds
            GPU-->>GGML: Device created
            GGML-->>Backend: GPU ready
        else Vulkan Fails
            GPU--xGGML: Exception/Crash
            Backend->>Backend: catch exception
            Backend->>Backend: disable_gpu()
            Backend->>GGML: llama_backend_init() [CPU]
            GGML-->>Backend: CPU mode ready
        end
    else GPU disabled or unavailable
        GGML->>GGML: Use CPU backend
    end
    Backend-->>Caller: Backend ready
```
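The fallback in this second flow reduces to a try/catch around backend construction. A minimal sketch under assumed names (`Backend` and `create_backend` are illustrative, not the PR's actual types):

```cpp
#include <memory>
#include <stdexcept>

// Hypothetical minimal backend mirroring the fallback flow above.
struct Backend {
    bool want_gpu;
    bool gpu_active = false;
    explicit Backend(bool gpu) : want_gpu(gpu) {}
    void init(bool gpu_driver_ok) {
        if (want_gpu) {
            if (!gpu_driver_ok) throw std::runtime_error("vulkan init failed");
            gpu_active = true;
        }
    }
};

// Try GPU first; on any init exception, rebuild the backend CPU-only.
std::unique_ptr<Backend> create_backend(bool want_gpu, bool gpu_driver_ok) {
    auto b = std::make_unique<Backend>(want_gpu);
    try {
        b->init(gpu_driver_ok);
    } catch (const std::exception&) {
        b = std::make_unique<Backend>(/*gpu=*/false);  // CPU fallback
        b->init(gpu_driver_ok);                        // CPU init cannot throw here
    }
    return b;
}
```

Note this only catches C++ exceptions; a driver that segfaults instead of throwing is why the PR also needs the fork-based probe.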
Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

This PR spans multiple subsystems (C++, JNI, Kotlin, CMake, shell scripts) with heterogeneous change types. The Vulkan detection logic with Android fork-based safety testing, GPU state management with exception-driven fallback in the backend, and cross-language integration (C++↔JNI↔Kotlin) require careful reasoning across components. While many changes are straightforward (new files, APIs), the logic density in vulkan_detector.cpp and rac_llm_llamacpp.cpp, combined with the variety of integration points, necessitates thorough cross-system review.
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
```cpp
#ifdef __ANDROID__
    GPU_LOG("Android detected - forcing CPU mode (n_gpu_layers=0) to avoid Vulkan crash");
    model_params.n_gpu_layers = 0;
    LOGI("Loading model with CPU backend (Android Vulkan workaround)");
```
GPU is unconditionally disabled on Android (lines 232-235), which contradicts the entire PR purpose. This means Vulkan GPU acceleration is never used in production, even on devices where it might work.
If the intent is to disable GPU due to driver issues, this should be controlled by a runtime flag or configuration option, not hardcoded. The PR description states "DO NOT MERGE for production use," so this is expected behavior for experimental code.
Correct — GPU is intentionally disabled on Android to prevent runtime crashes.
This PR documents integration feasibility rather than enabling production GPU usage.
Open to refactoring this into a runtime flag if maintainers prefer.
```cpp
    // CRITICAL: Disable Vulkan IMMEDIATELY on library load
    // This prevents crash in llama_backend_init() before our code runs
    __android_log_print(ANDROID_LOG_WARN, "RAC_GPU_STATUS", "JNI_OnLoad: Disabling Vulkan to prevent crash");
    setenv("GGML_VK_DISABLE", "1", 1);
#endif
```
Setting GGML_VK_DISABLE=1 in JNI_OnLoad globally disables Vulkan for the entire process. Combined with the hardcoded GPU disable in llamacpp_backend.cpp:233, this creates triple-redundancy that prevents Vulkan from ever being used.
The environment variable approach is the correct safety mechanism, but it should be conditional based on device compatibility checks rather than always enabled.
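A conditional version of that environment-variable guard might look like the following sketch, where `device_known_good()` is a hypothetical placeholder for an allowlist lookup or a cached probe result (not an API that exists in this PR):

```cpp
#include <cstdlib>

// Hypothetical compatibility check; in practice this could consult a device
// allowlist or the cached result of a previous safe probe.
static bool device_known_good() { return false; }

// Set GGML_VK_DISABLE only when the device is not known to be compatible,
// instead of unconditionally in JNI_OnLoad.
static void configure_vulkan_env() {
    if (!device_known_good()) {
        setenv("GGML_VK_DISABLE", "1", /*overwrite=*/1);
    } else {
        unsetenv("GGML_VK_DISABLE");
    }
}
```

This keeps the "disable before static init" property the author relies on, while leaving a path to enable Vulkan on vetted devices.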
Good point — this is intentional for the current experimental phase.
The triple-layer disable (JNI_OnLoad + backend guard + CPU fallback) was added deliberately to guarantee crash-free behavior on Android devices with unstable Vulkan drivers.
During testing, driver crashes were occurring inside llama_backend_init() before any higher-level logic could intervene. Setting GGML_VK_DISABLE in JNI_OnLoad ensures Vulkan is disabled before static initialization runs.
This PR focuses on documenting integration feasibility and driver limitations rather than enabling production GPU usage.
That said, I agree this should evolve into a conditional mechanism (e.g., device whitelist or cached compatibility check) if Vulkan support becomes viable in the future.
Happy to refactor toward a runtime flag–controlled approach if maintainers prefer.
```cpp
static bool test_vulkan_in_child_process() {
    RAC_LOG_INFO("VulkanDetector", "Testing Vulkan in child process (safe mode)...");

    pid_t pid = fork();

    if (pid == -1) {
        RAC_LOG_ERROR("VulkanDetector", "Fork failed, assuming Vulkan unavailable");
        return false;
    }

    if (pid == 0) {
        // Child process - test Vulkan
        // If driver crashes here, only child dies

#ifdef GGML_USE_VULKAN
        VkInstance instance = VK_NULL_HANDLE;
        VkInstanceCreateInfo create_info = {};
        create_info.sType = VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO;

        VkResult result = vkCreateInstance(&create_info, nullptr, &instance);
        if (result != VK_SUCCESS) {
            _exit(1); // Failed
        }

        uint32_t device_count = 0;
        vkEnumeratePhysicalDevices(instance, &device_count, nullptr);

        if (device_count == 0) {
            vkDestroyInstance(instance, nullptr);
            _exit(1); // No devices
        }

        // Test actual device creation (this is where buggy drivers crash)
        std::vector<VkPhysicalDevice> devices(device_count);
        vkEnumeratePhysicalDevices(instance, &device_count, devices.data());

        // Try to create a logical device (fence creation happens here)
        VkDeviceQueueCreateInfo queue_info = {};
        queue_info.sType = VK_STRUCTURE_TYPE_DEVICE_QUEUE_CREATE_INFO;
        queue_info.queueFamilyIndex = 0;
        queue_info.queueCount = 1;
        float priority = 1.0f;
        queue_info.pQueuePriorities = &priority;

        VkDeviceCreateInfo device_info = {};
        device_info.sType = VK_STRUCTURE_TYPE_DEVICE_CREATE_INFO;
        device_info.queueCreateInfoCount = 1;
        device_info.pQueueCreateInfos = &queue_info;

        VkDevice device = VK_NULL_HANDLE;
        result = vkCreateDevice(devices[0], &device_info, nullptr, &device);

        if (device != VK_NULL_HANDLE) {
            vkDestroyDevice(device, nullptr);
        }

        vkDestroyInstance(instance, nullptr);

        // If we got here without crashing, Vulkan works!
        _exit(result == VK_SUCCESS ? 0 : 1);
#else
        _exit(1); // Vulkan not compiled
#endif
    }
```
Fork-based Vulkan detection is creative but has potential issues:

1. **Resource leak risk**: If the parent terminates before `waitpid()` completes, the child becomes a zombie
2. **Signal handling**: The parent should handle `SIGCHLD` or use `WNOHANG` to prevent blocking indefinitely
3. **Performance**: Fork creates a full process copy; this detection runs on every app start
4. **Android 10+ restrictions**: Some Android security policies restrict `fork()` in certain contexts

Consider caching the detection result in shared preferences to avoid repeated expensive fork operations.
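The `WNOHANG` point can be sketched as a bounded polling reaper; the 10 ms poll step and the timeout budget here are illustrative assumptions, not values from the PR:

```cpp
#include <csignal>
#include <ctime>
#include <sys/wait.h>
#include <unistd.h>

// Poll the probe child with WNOHANG so a hung driver cannot block the app;
// past the deadline, kill and reap the child so no zombie remains.
static bool wait_with_timeout(pid_t pid, int timeout_ms, int* status) {
    const int step_ms = 10;
    for (int waited = 0; waited < timeout_ms; waited += step_ms) {
        pid_t r = waitpid(pid, status, WNOHANG);
        if (r == pid) return true;        // child exited and was reaped
        if (r == -1) return false;        // no such child / error
        struct timespec ts = {0, step_ms * 1000000L};
        nanosleep(&ts, nullptr);
    }
    kill(pid, SIGKILL);                   // hung probe: give up
    waitpid(pid, status, 0);              // reap to avoid a zombie
    return false;
}
```

Combined with caching the boolean result, the fork would run at most once per install rather than on every app start.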
Agreed — fork() in ART environment has risks.
Used here purely for experimental isolation of driver crashes.
Happy to explore safer probing strategies if this direction is valuable.
```cpp
        std::vector<VkPhysicalDevice> devices(device_count);
        vkEnumeratePhysicalDevices(instance, &device_count, devices.data());
```
Memory allocated for devices vector is leaked if child process crashes before reaching cleanup. Use RAII or ensure cleanup in all exit paths.
```suggestion
std::vector<VkPhysicalDevice> devices(device_count);
vkEnumeratePhysicalDevices(instance, &device_count, devices.data());
VkDevice device = VK_NULL_HANDLE;
// ... rest of code ...
if (device != VK_NULL_HANDLE) {
    vkDestroyDevice(device, nullptr);
}
if (instance != VK_NULL_HANDLE) {
    vkDestroyInstance(instance, nullptr);
}
_exit(result == VK_SUCCESS ? 0 : 1);
```
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
```cpp
static bool probe_vulkan_safe() {
#ifdef __ANDROID__
    // Try to dlopen libvulkan.so and call vkEnumerateInstanceVersion
    void* vk_lib = dlopen("libvulkan.so", RTLD_NOW | RTLD_LOCAL);
    if (!vk_lib) {
        GPU_LOG("Vulkan probe: libvulkan.so not found");
        return false;
    }

    // Try vkEnumeratePhysicalDevices to see if there's actually a GPU
    typedef int (*PFN_vkCreateInstance)(const void*, const void*, void*);
    typedef void (*PFN_vkDestroyInstance)(void*, const void*);
    typedef int (*PFN_vkEnumeratePhysicalDevices)(void*, uint32_t*, void*);

    auto fn_create = (PFN_vkCreateInstance)dlsym(vk_lib, "vkCreateInstance");
    auto fn_destroy = (PFN_vkDestroyInstance)dlsym(vk_lib, "vkDestroyInstance");
    auto fn_enum = (PFN_vkEnumeratePhysicalDevices)dlsym(vk_lib, "vkEnumeratePhysicalDevices");

    if (!fn_create || !fn_destroy || !fn_enum) {
        GPU_LOG("Vulkan probe: missing symbols");
        dlclose(vk_lib);
        return false;
    }

    // Minimal VkApplicationInfo + VkInstanceCreateInfo (packed structs)
    struct { uint32_t sType; const void* pNext; const char* name; uint32_t ver; const char* ename; uint32_t ever; uint32_t api; } app_info = {0, nullptr, "probe", 1, "probe", 1, (1u << 22) | (0 << 12)};
    struct { uint32_t sType; const void* pNext; uint32_t flags; const void* pApp; uint32_t elc; const char* const* el; uint32_t exc; const char* const* ex; } create_info = {1, nullptr, 0, &app_info, 0, nullptr, 0, nullptr};

    void* instance = nullptr;
    int result = fn_create(&create_info, nullptr, &instance);
    if (result != 0 || !instance) {
        GPU_LOG("Vulkan probe: vkCreateInstance failed (%d)", result);
        dlclose(vk_lib);
        return false;
    }

    uint32_t gpu_count = 0;
    fn_enum(instance, &gpu_count, nullptr);
    fn_destroy(instance, nullptr);
    dlclose(vk_lib);

    if (gpu_count == 0) {
        GPU_LOG("Vulkan probe: no physical devices found");
        return false;
    }

    GPU_LOG("Vulkan probe: found %u GPU(s) - Vulkan OK", gpu_count);
    return true;
#else
    return true; // On desktop, assume Vulkan works
#endif
}
```
The probe_vulkan_safe() function creates a minimal Vulkan instance but uses packed structs with hard-coded values that may not match actual Vulkan struct layouts. This is fragile and could fail on different Vulkan implementations.
Better approach: Use proper Vulkan SDK types from <vulkan/vulkan.h> instead of manually packed structs on lines 81-82.
```cpp
    } catch (const std::exception& e) {
        // Vulkan exception during llama_backend_init() - retry with GPU disabled
#ifdef __ANDROID__
        __android_log_print(ANDROID_LOG_ERROR, "RAC_GPU_STATUS",
            "Backend init exception (Vulkan crash): %s - retrying CPU only", e.what());
#endif
        handle->backend = std::make_unique<runanywhere::LlamaCppBackend>();
        handle->backend->disable_gpu();
        try {
            if (!handle->backend->initialize(init_config)) {
                delete handle;
                rac_error_set_details("Failed to initialize LlamaCPP backend (CPU fallback)");
                return RAC_ERROR_BACKEND_INIT_FAILED;
            }
        } catch (...) {
            delete handle;
            rac_error_set_details("LlamaCPP backend init failed even in CPU mode");
            return RAC_ERROR_BACKEND_INIT_FAILED;
        }
    } catch (...) {
```
Exception handling creates a new LlamaCppBackend instance on failure but doesn't clean up the failed instance properly. The original handle->backend unique_ptr will be destroyed when reassigned, but if llama_backend_init() partially succeeded, there could be leaked Vulkan resources.
Ensure llama_backend_free() is called before recreating the backend.
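The ordering concern can be made concrete with a small mock: resetting the `unique_ptr` before constructing the replacement guarantees the old instance's teardown runs first, whereas plain assignment constructs the new object before destroying the old one. This is a sketch with mock names, not the PR's code:

```cpp
#include <memory>
#include <string>
#include <vector>

// Mock global state standing in for llama_backend_init/llama_backend_free.
static std::vector<std::string> backend_calls;
struct LlamaCppBackendMock {
    LlamaCppBackendMock() { backend_calls.push_back("init"); }
    ~LlamaCppBackendMock() { backend_calls.push_back("free"); }
};

// On a failed GPU init, destroy the old instance *before* constructing the
// CPU replacement, so partially-initialized state is released first.
static void retry_cpu(std::unique_ptr<LlamaCppBackendMock>& backend) {
    backend.reset();                                    // free old state first
    backend = std::make_unique<LlamaCppBackendMock>();  // then re-init (CPU)
}
```

With plain `backend = std::make_unique<...>()`, the call order would be init, init, free; the explicit `reset()` yields init, free, init, which is the safe sequence the reviewer is asking for.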
```cpp
    bool gpu_attempted = false;

    // CRITICAL FIX: Force CPU mode on Android to avoid Vulkan crash
    // Mali-G76 MC4 and other mobile GPUs have buggy Vulkan drivers
#ifdef __ANDROID__
    GPU_LOG("Android detected - forcing CPU mode (n_gpu_layers=0) to avoid Vulkan crash");
    model_params.n_gpu_layers = 0;
    LOGI("Loading model with CPU backend (Android Vulkan workaround)");
#else
    if (backend_->is_using_gpu()) {
        gpu_attempted = true;
        model_params.n_gpu_layers = 999;
        GPU_LOG("LOADING MODEL WITH GPU: n_gpu_layers=999, device=%s", backend_->get_gpu_device_name().c_str());
        LOGI("Attempting GPU acceleration, device=%s, layers=ALL(999)", backend_->get_gpu_device_name().c_str());
    } else {
        model_params.n_gpu_layers = 0;
        GPU_LOG("LOADING MODEL WITH CPU ONLY: n_gpu_layers=0");
        LOGI("Loading model with CPU backend");
    }
#endif
```
The variable gpu_attempted is set to true only on non-Android platforms (line 238), but the CPU fallback logic on line 265 checks this variable. On Android, GPU is never attempted, so the fallback will never trigger even if Vulkan is somehow enabled.
This creates dead code and incorrect logging messages.
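One way to remove the platform split is to fold the decision into a single runtime-flag helper, so `gpu_attempted` stays meaningful on every platform and the fallback check is never dead code. A sketch with illustrative names (not the PR's actual API):

```cpp
// Hypothetical refactor: one runtime flag decides the GPU attempt on every
// platform, replacing the #ifdef __ANDROID__ fork.
struct ModelParams { int n_gpu_layers = 0; };

static bool choose_gpu_layers(bool gpu_enabled_flag, bool backend_using_gpu,
                              ModelParams& p) {
    const bool gpu_attempted = gpu_enabled_flag && backend_using_gpu;
    p.n_gpu_layers = gpu_attempted ? 999 : 0;  // 999 = offload all layers
    return gpu_attempted;                      // fallback logic keys off this
}
```

On Android the flag would simply default to false today, keeping current behavior while leaving the fallback path reachable once drivers improve.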
Pull request overview
This PR adds experimental Vulkan GPU acceleration support to the RunAnywhere Android SDK for potential 2-3x performance improvements. The build is verified and functional with all Vulkan symbols present and 1435 shaders compiled. However, runtime testing on physical devices reveals critical driver compatibility issues causing crashes on 70-80% of Android devices (Mali GPUs, budget Adreno) and incorrect output on high-end devices. The author explicitly recommends NOT merging for production use.
Changes:
- Added Vulkan GPU acceleration infrastructure with detection, fallback mechanisms, and JNI bridges
- Integrated llama.cpp b7935 Vulkan backend with 1435 compiled shaders and Android API 26+ compatibility fixes
- Created comprehensive testing and verification scripts with detailed documentation of known issues
Reviewed changes
Copilot reviewed 25 out of 30 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| RunAnywhere+DeviceInfo.jvmAndroid.kt | Implements device info API with GPU detection via JNI bridge |
| GPUInfo.kt | Platform-agnostic GPU information data class with validation methods |
| DeviceCapabilities.kt | GPU detection singleton with caching and JNI calls to native Vulkan detector |
| RunAnywhere+ModelManagement.kt | Adds ModelLoadOptions with GPU configuration parameters |
| DeviceInfoModels.kt | Whitespace-only changes (formatting) |
| Capability.kt | Whitespace-only changes (formatting) |
| RunAnywhere+DeviceInfo.kt | Common API definition for device info across platforms |
| device_jni.cpp | JNI bridge exposing Vulkan GPU detection to Kotlin layer |
| CMakeLists.txt (jni) | Adds device_jni.cpp to build |
| vulkan_detector.cpp | Fork-based safe Vulkan detection with driver crash protection |
| vulkan_detector.h | C++ API for Vulkan device detection |
| rac_llm_llamacpp.cpp | Exception handling for Vulkan crashes with CPU fallback |
| patch_vulkan.py | Python script to patch llama.cpp Vulkan CMake for cross-compilation |
| llamacpp_backend.h | Adds GPU state tracking and disable methods |
| llamacpp_backend.cpp | Vulkan probing, GPU detection, and Android CPU-only workaround |
| rac_backend_llamacpp_jni.cpp | Disables Vulkan via environment variable in JNI_OnLoad |
| CMakeLists.txt (llamacpp) | Comprehensive Vulkan build integration with shader compilation |
| generate-shader-aggregates.py | Python script to generate shader data aggregates |
| compile-all-vulkan-shaders.sh | Bash script for parallel shader compilation |
| FetchVulkanHpp.cmake | Downloads Vulkan C++ headers (Android NDK only has C headers) |
| VERSIONS | Updates Android min SDK to 26 and llama.cpp to b7935 |
| CMakeLists.txt (commons) | Adds vulkan_detector.cpp and updates C++ standard to C++20 |
| verify-vulkan-build.sh | Static verification script for Vulkan build |
| test-vulkan-on-device.sh | Device runtime testing script |
| VULKAN_TEST_RESULTS.md | Comprehensive test results documentation |
| VULKAN_ANDROID_CONTRIBUTION.md | Complete implementation guide |
Actionable comments posted: 3
Note
Due to the large number of review comments, Critical severity comments were prioritized as inline comments.
🤖 Fix all issues with AI agents
In `@sdk/runanywhere-commons/src/infrastructure/device/vulkan_detector.cpp`:
- Around line 31-126: The use of fork() inside test_vulkan_in_child_process is
unsafe on Android JNI and can deadlock when the child uses non-async-signal-safe
calls (e.g., std::vector allocations, vkCreateInstance); replace the fork-based
probe with a safer approach: either move the probe into Java/Kotlin and spawn a
separate process via android.os.Process/Runtime.exec, or switch to posix_spawn
(with POSIX_SPAWN_USEVFORK) instead of fork, or simplify the probe to call
vkEnumerateInstanceVersion (a lightweight, instance-less check) from the
existing detection flow; update/remove fork(), waitpid(), child-only Vulkan code
paths in test_vulkan_in_child_process and ensure the calling code handles the
new boolean result accordingly.
In
`@sdk/runanywhere-kotlin/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/foundation/bridge/extensions/CppBridgeLLM.kt`:
- Around line 36-298: This file places core business logic in a platform source
set; move all non-JNI business logic into commonMain: relocate CppBridgeLLM's
state/type definitions (LLMState, GenerationMode, StopReason), data classes
(GenerationConfig, ModelConfig, GenerationResult), and any interfaces/protocols
(LLMListener, StreamCallback) plus lifecycle semantics (isRegistered, state,
loadedModelId, checkNativeLibrary, shared) into commonMain; leave only
JNI/platform-specific glue in jvmAndroidMain by converting native interop points
here into actual implementations of expect declarations you define in commonMain
(use expect for functions called from common code and implement them in this
file), and remove any non-platform logic from jvmAndroidMain so it contains only
JNI marshaling, Java/Kotlin type conversions, and `@JvmStatic` callback adapters.
In
`@sdk/runanywhere-kotlin/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/platform/DeviceCapabilities.kt`:
- Around line 143-158: The external Kotlin JNI names include a "native" prefix
that doesn't match the C++ JNI functions; rename the Kotlin externals
nativeDetectVulkanGPU, nativeIsVulkanSupported, and nativeListVulkanDevices to
detectVulkanGPU, isVulkanSupported, and listVulkanDevices respectively (and
update all call sites that invoke those symbols) so the mangled JNI names match
the C++ implementations; verify the Kotlin method signatures (return types and
parameters for GPUInfo and List<GPUInfo>) match the C++ JNI signatures after
renaming.
🟠 Major comments (23)
sdk/runanywhere-kotlin/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/foundation/bridge/extensions/CppBridgeLLM.kt-1166-1240 (1)

1166-1240: 🛠️ Refactor suggestion | 🟠 Major — Replace manual JSON parsing with kotlinx.serialization.

The manual regex-based JSON parsing (lines 1166-1216) and escape/unescape functions (lines 1221-1240) are error-prone and don't handle edge cases like nested escapes, unicode sequences, or malformed JSON. The unescape function (lines 1233-1240) processes replacements in the wrong order, which will incorrectly handle `\\n` (a literal backslash followed by "n"). As per coding guidelines, prefer well-known libraries over rolling your own for common tasks.

♻️ Recommended refactor using kotlinx.serialization

Define a serializable result class:

```kotlin
@Serializable
data class NativeGenerationResult(
    val text: String,
    @SerialName("tokens_generated") val tokensGenerated: Int,
    @SerialName("tokens_evaluated") val tokensEvaluated: Int,
    @SerialName("stop_reason") val stopReason: Int,
    @SerialName("tokens_per_second") val tokensPerSecond: Float = 0f
)
```

Parse with:

```kotlin
private fun parseGenerationResult(json: String, elapsedMs: Long): GenerationResult {
    val native = Json.decodeFromString<NativeGenerationResult>(json)
    return GenerationResult(
        text = native.text,
        tokensGenerated = native.tokensGenerated,
        tokensEvaluated = native.tokensEvaluated,
        stopReason = native.stopReason,
        generationTimeMs = elapsedMs,
        tokensPerSecond = if (elapsedMs > 0) native.tokensGenerated * 1000f / elapsedMs else native.tokensPerSecond
    )
}
```

As per coding guidelines: "Prefer well-known libraries/frameworks over 'rolling your own' for common tasks."
sdk/runanywhere-kotlin/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/foundation/bridge/extensions/CppBridgeLLM.kt-479-884 (1)

479-884: 🛠️ Refactor suggestion | 🟠 Major — Replace blocking operations with suspend functions and coroutines.

Operations like `create()`, `loadModel()`, `generate()`, `generateStream()`, `unload()`, and `destroy()` perform I/O-bound work (model loading, inference) but use synchronous blocking with `synchronized` blocks. This violates the coding guidelines and can cause thread pool exhaustion under load.

Recommended approach:
- Convert these functions to `suspend fun`
- Use `withContext(Dispatchers.IO)` for JNI calls that may block
- Replace `synchronized` blocks with `Mutex` from kotlinx.coroutines
- Use `Flow` for streaming instead of a callback-based approach

Example for loadModel:

```kotlin
suspend fun loadModel(
    modelPath: String,
    modelId: String,
    modelName: String? = null,
    config: ModelConfig = ModelConfig.DEFAULT
): Int = withContext(Dispatchers.IO) {
    mutex.withLock {
        // existing logic
    }
}
```

As per coding guidelines: "Use coroutines and suspend functions instead of async/await patterns for asynchronous operations" and "Use Flow instead of AsyncSequence for reactive streams in Kotlin Multiplatform code."
sdk/runanywhere-kotlin/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/foundation/bridge/extensions/CppBridgeLLM.kt-547-577 (1)
547-577:⚠️ Potential issue | 🟠 MajorGPU configuration is accepted but completely ignored in
loadModel.The
ModelConfigparameter (withgpuLayersfor GPU offloading) is accepted at the method signature but never passed to the native layer. Line 576 passes onlymodelPath,modelId, andmodelNametoracLlmComponentLoadModel, while the config parameter is suppressed with@Suppress("UNUSED_PARAMETER").This is inconsistent with other components: the VAD bridge passes config directly to
racVadComponentLoadModel, and LLM's owngeneratemethods pass config to the native layer. The nativeracLlmComponentConfiguremethod exists but is never called for model loading.Pass the config to a post-load configuration call: After loading, call
RunAnywhereBridge.racLlmComponentConfigure(handle, config.toJson())to apply GPU settings, or remove the config parameter fromloadModelif GPU configuration is not intended for model loading.sdk/runanywhere-kotlin/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/foundation/bridge/extensions/CppBridgeDevice.kt-1015-1019 (1)
1015-1019: ⚠️ Potential issue | 🟠 Major

Avoid logging the full device ID at INFO level — downgrade to DEBUG.

The device ID is a persistent identifier that could be used for user tracking. Logging it at `INFO` level means it will appear in standard log output on production builds. Consider using `DEBUG` level to reduce exposure.

🛡️ Suggested fix

```diff
 CppBridgePlatformAdapter.logCallback(
-    CppBridgePlatformAdapter.LogLevel.INFO,
+    CppBridgePlatformAdapter.LogLevel.DEBUG,
     TAG,
     "Generated new device ID: $newId",
 )
```

sdk/runanywhere-commons/scripts/compile-all-vulkan-shaders.sh-68-75 (1)
68-75: ⚠️ Potential issue | 🟠 Major

Script exits 0 even when shaders fail to compile.

After printing the summary with the `FAILED` count, the script proceeds to generate aggregate files and exits successfully. If any shaders failed, the generated output will be incomplete. Exit non-zero when `FAILED > 0`.

🐛 Suggested fix

```diff
 echo "Failed: $FAILED"
 echo ""
+
+if [ "$FAILED" -gt 0 ]; then
+    echo "❌ Some shaders failed to compile. Aborting."
+    exit 1
+fi
```

sdk/runanywhere-kotlin/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/foundation/bridge/extensions/CppBridgeDevice.kt-401-412 (1)
401-412: ⚠️ Potential issue | 🟠 Major

Device registration request body may contain PII — avoid logging even a preview.

`bodyPreview` logs up to 200 characters of the registration body at DEBUG level. Since this is device registration JSON, it may include device IDs, model names, and other identifying information. Consider removing the body preview or redacting it to just log the body length.

🛡️ Suggested fix

```diff
-val bodyPreview = if (body.length > 200) body.substring(0, 200) + "..." else body
 CppBridgePlatformAdapter.logCallback(
     CppBridgePlatformAdapter.LogLevel.DEBUG,
     TAG,
-    "Device registration body: $bodyPreview",
+    "Device registration body length: ${body.length}",
 )
```

sdk/runanywhere-commons/scripts/generate-shader-aggregates.py-54-93 (1)
54-93: ⚠️ Potential issue | 🟠 Major

Aggregate arrays reference shader symbols that may not exist — no validation.

The script generates aggregate arrays (lines 54–93) that hardcode symbols like `add_f32_f32_f32_data` and `add_f32_f32_f32_rte_len`. These assume every combination of `{op} × {suffixes[t0]} × {suffixes[t1]} × {suffixes[t2]} × {rte}` has a matching `.spv` file. If any variant is missing, the generated `.cpp` will fail to compile with undefined-symbol errors.

Additionally, shader names are derived directly from file stems (line 29), so filenames containing hyphens or spaces would produce invalid C identifiers in the generated code.

Validate that each expected shader symbol exists before writing the aggregate arrays, or detect and skip missing variants.
sdk/runanywhere-commons/VERSIONS-25-26 (1)
25-26: ⚠️ Potential issue | 🟠 Major

Gradle files still hardcode `minSdk = 24` while VERSIONS specifies 26 — complete the migration.

VERSIONS has been updated to `ANDROID_MIN_SDK=26` for Vulkan 1.1 support, but gradle build files across kotlin, flutter, and react-native modules continue to hardcode `minSdk = 24`. This violates the guideline requiring version management to be centralized in VERSIONS and sourced by all build scripts.

The minSdk bump to 26 is justified — Vulkan 1.1 requires API 26+ for `vkGetPhysicalDeviceFeatures2`. However, gradle files must be updated to source from VERSIONS instead of hardcoding. Vulkan is production-ready with full GPU detection, capability reporting, and ML inference acceleration; this is not an experimental feature.

Update gradle files to source `ANDROID_MIN_SDK` from VERSIONS or apply the value consistently across all build systems.

sdk/runanywhere-kotlin/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/public/extensions/RunAnywhere+ModelManagement.jvmAndroid.kt-430-433 (1)
430-433: ⚠️ Potential issue | 🟠 Major

Global singleton listener is not safe for concurrent downloads.

`CppBridgeDownload.downloadListener` is set to a new listener per `downloadModel` call (Line 433) and cleared in `finally` (Line 526). If two downloads run concurrently, the second call overwrites the first's listener, and the `finally` block of whichever finishes first nulls it out for both.

Consider using a per-download-ID listener map, or guard against concurrent downloads at a higher level.
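The per-download-ID map can be sketched as follows. The bridge code in question is Kotlin, so this is only a language-agnostic illustration in C++; the class and method names are hypothetical, not part of the SDK. The key idea is that each download registers and removes only its own entry, so no download can clobber another's listener.

```cpp
#include <functional>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical per-download listener registry: each download registers its own
// callback under its download ID instead of sharing one global singleton.
class DownloadListenerRegistry {
public:
    using Listener = std::function<void(const std::string& downloadId, int percent)>;

    void add(const std::string& downloadId, Listener listener) {
        std::lock_guard<std::mutex> lock(mutex_);
        listeners_[downloadId] = std::move(listener);
    }

    // Native progress callback routes to the listener for this ID only.
    void notify(const std::string& downloadId, int percent) {
        Listener copy;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            auto it = listeners_.find(downloadId);
            if (it == listeners_.end()) return;
            copy = it->second;  // copy out so the lock isn't held during the callback
        }
        copy(downloadId, percent);
    }

    // Each download's cleanup path removes only its own entry.
    void remove(const std::string& downloadId) {
        std::lock_guard<std::mutex> lock(mutex_);
        listeners_.erase(downloadId);
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return listeners_.size();
    }

private:
    mutable std::mutex mutex_;
    std::unordered_map<std::string, Listener> listeners_;
};
```

In Kotlin the same shape would be a `ConcurrentHashMap<String, Listener>` keyed by download ID, with each `finally` block removing only its own key.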
sdk/runanywhere-kotlin/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/public/extensions/RunAnywhere+ModelManagement.jvmAndroid.kt-774-780 (1)
774-780: ⚠️ Potential issue | 🟠 Major

`cancelDownload` does not actually cancel the in-progress download.

This only clears the local path in the registry but never calls any cancellation API on `CppBridgeDownload`. The download continues running, consuming bandwidth and storage, while the caller believes it was cancelled.

Proposed fix sketch

```diff
 actual suspend fun RunAnywhere.cancelDownload(modelId: String) {
     if (!isInitialized) {
         throw SDKError.notInitialized("SDK not initialized")
     }
+    CppBridgeDownload.cancelDownload(modelId)
     // Update C++ registry to mark download cancelled
     CppBridgeModelRegistry.updateDownloadStatus(modelId, null)
 }
```

sdk/runanywhere-kotlin/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/public/extensions/RunAnywhere+ModelManagement.jvmAndroid.kt-797-810 (1)
797-810: ⚠️ Potential issue | 🟠 Major

`deleteAllModels` and `refreshModelRegistry` are no-op stubs exposed as public API.

`deleteAllModels()` silently does nothing and `refreshModelRegistry()` has a TODO comment. Callers will assume these work. At minimum, throw `NotImplementedError` or log a warning so failures aren't silent.

Proposed fix

```diff
 actual suspend fun RunAnywhere.deleteAllModels() {
     if (!isInitialized) {
         throw SDKError.notInitialized("SDK not initialized")
     }
-    // Would need to parse and delete each - simplified
+    val models = CppBridgeModelRegistry.getDownloaded()
+    for (model in models) {
+        CppBridgeModelRegistry.remove(model.modelId)
+    }
+    synchronized(modelCacheLock) {
+        registeredModels.clear()
+    }
 }

 actual suspend fun RunAnywhere.refreshModelRegistry() {
     if (!isInitialized) {
         throw SDKError.notInitialized("SDK not initialized")
     }
-    // Trigger registry refresh via native call
-    // TODO: Implement via CppBridge
+    modelsLogger.warn("refreshModelRegistry is not yet implemented")
+    // TODO: Implement via CppBridge — tracked in issue #XXX
 }
```

sdk/runanywhere-commons/src/infrastructure/device/vulkan_detector.cpp-236-242 (1)
236-242: ⚠️ Potential issue | 🟠 Major

`is_vulkan_supported()` calls `detect()` every time — expensive and forks on Android.

Each call to `is_vulkan_supported()` calls `detect()`, which on Android forks a child process and creates a full Vulkan instance. This should be cached after the first call.

Proposed caching approach

```diff
 bool VulkanDetector::is_vulkan_supported() {
 #ifdef GGML_USE_VULKAN
-    return detect().is_available;
+    static bool cached = false;
+    static bool result = false;
+    if (!cached) {
+        result = detect().is_available;
+        cached = true;
+    }
+    return result;
 #else
     return false;
 #endif
 }
```

Consider also caching `detect()` results, with appropriate thread-safety (e.g., `std::call_once`).

sdk/runanywhere-commons/src/infrastructure/device/vulkan_detector.cpp-128-234 (1)
128-234: ⚠️ Potential issue | 🟠 Major

Vulkan instance leaks on exception in `detect()`.

If an exception is thrown after `vkCreateInstance` succeeds (Line 153) but before `vkDestroyInstance` (Line 221), the Vulkan instance handle is leaked. The catch blocks reset `info` but never destroy `instance`.

Proposed fix using RAII or explicit cleanup

```diff
 } catch (const std::exception& e) {
     RAC_LOG_ERROR("VulkanDetector", "Exception during Vulkan detection: %s", e.what());
+    if (instance != VK_NULL_HANDLE) {
+        vkDestroyInstance(instance, nullptr);
+    }
     info = VulkanDeviceInfo();  // Reset to default (not available)
 } catch (...) {
     RAC_LOG_ERROR("VulkanDetector", "Unknown exception during Vulkan detection");
+    if (instance != VK_NULL_HANDLE) {
+        vkDestroyInstance(instance, nullptr);
+    }
     info = VulkanDeviceInfo();  // Reset to default (not available)
 }
```

Better yet, use a scope guard or RAII wrapper for the Vulkan instance.
sdk/runanywhere-commons/src/backends/llamacpp/jni/rac_backend_llamacpp_jni.cpp-52-57 (1)
52-57: ⚠️ Potential issue | 🟠 Major

Unconditionally disabling Vulkan in `JNI_OnLoad` negates the entire GPU acceleration feature.

Setting `GGML_VK_DISABLE=1` at library load time means Vulkan can never be used at runtime, regardless of device capability. This effectively makes all the Vulkan detection, GPU layer offloading, and fallback logic dead code. The PR acknowledges this is a safety measure due to widespread driver crashes, but the approach is opaque — downstream developers loading this library will have no way to opt in to Vulkan even on known-good devices.

Consider adding a check for a user-settable property or environment variable (e.g., `GGML_VK_FORCE_ENABLE`) so that developers can override this during testing, or move the decision to the Kotlin layer where `DeviceCapabilities.shouldUseGPU()` already exists.

sdk/runanywhere-commons/src/backends/llamacpp/patch_vulkan.py-73-100 (1)
73-100: ⚠️ Potential issue | 🟠 Major

Patch 5 comments out all `add_custom_command` blocks, not just shader-related ones.

The comment on line 73 says "that depend on vulkan-shaders-gen," but the code on line 81 only checks for the presence of `add_custom_command` — it doesn't verify that the block references `vulkan-shaders-gen`. This will silently disable any unrelated custom commands in the Vulkan CMake file.

🐛 Proposed fix: filter for vulkan-shaders-gen

```diff
 for line in lines:
     # Check if we're starting an add_custom_command
-    if 'add_custom_command' in line and not line.strip().startswith('#'):
+    if 'add_custom_command' in line and 'vulkan-shaders-gen' in line and not line.strip().startswith('#'):
         in_custom_command = True
```

Note: if the `vulkan-shaders-gen` reference doesn't appear on the opening `add_custom_command` line itself, you may need a two-pass approach: first identify all `add_custom_command` blocks, then check if the block body references `vulkan-shaders-gen` before commenting it out.

sdk/runanywhere-commons/src/backends/llamacpp/rac_llm_llamacpp.cpp-66-109 (1)
66-109: ⚠️ Potential issue | 🟠 Major

Resource leak: if `std::make_unique` throws inside a catch handler, `handle` is leaked.

On lines 78 and 96, `std::make_unique<runanywhere::LlamaCppBackend>()` can throw `std::bad_alloc`. Since `handle` is a raw pointer (allocated at line 50), an exception here escapes the catch handler and `handle` is never freed.

Additionally, the `catch (const std::exception&)` and `catch (...)` branches (lines 72–109) contain nearly identical fallback logic. This duplication increases maintenance risk and makes the leak harder to spot.

🐛 Proposed fix: use a helper lambda and guard the raw pointer

```diff
+// Helper: retry initialization in CPU-only mode
+auto retry_cpu_init = [&]() -> bool {
+    try {
+        handle->backend = std::make_unique<runanywhere::LlamaCppBackend>();
+        handle->backend->disable_gpu();
+        return handle->backend->initialize(init_config);
+    } catch (...) {
+        return false;
+    }
+};
+
 // Initialize backend - wrap in try-catch for Vulkan exceptions
 try {
     if (!handle->backend->initialize(init_config)) {
         delete handle;
         rac_error_set_details("Failed to initialize LlamaCPP backend");
         return RAC_ERROR_BACKEND_INIT_FAILED;
     }
 } catch (const std::exception& e) {
-    // Vulkan exception during llama_backend_init() - retry with GPU disabled
 #ifdef __ANDROID__
     __android_log_print(ANDROID_LOG_ERROR, "RAC_GPU_STATUS",
                         "Backend init exception (Vulkan crash): %s - retrying CPU only", e.what());
 #endif
-    handle->backend = std::make_unique<runanywhere::LlamaCppBackend>();
-    handle->backend->disable_gpu();
-    try {
-        if (!handle->backend->initialize(init_config)) {
-            delete handle;
-            rac_error_set_details("Failed to initialize LlamaCPP backend (CPU fallback)");
-            return RAC_ERROR_BACKEND_INIT_FAILED;
-        }
-    } catch (...) {
+    if (!retry_cpu_init()) {
         delete handle;
-        rac_error_set_details("LlamaCPP backend init failed even in CPU mode");
+        rac_error_set_details("Failed to initialize LlamaCPP backend (CPU fallback)");
         return RAC_ERROR_BACKEND_INIT_FAILED;
     }
 } catch (...) {
 #ifdef __ANDROID__
     __android_log_print(ANDROID_LOG_ERROR, "RAC_GPU_STATUS",
                         "Backend init unknown exception - retrying CPU only");
 #endif
-    handle->backend = std::make_unique<runanywhere::LlamaCppBackend>();
-    handle->backend->disable_gpu();
-    try {
-        if (!handle->backend->initialize(init_config)) {
-            delete handle;
-            rac_error_set_details("Failed to initialize LlamaCPP backend (CPU fallback)");
-            return RAC_ERROR_BACKEND_INIT_FAILED;
-        }
-    } catch (...) {
+    if (!retry_cpu_init()) {
         delete handle;
-        rac_error_set_details("LlamaCPP backend init failed even in CPU mode");
+        rac_error_set_details("Failed to initialize LlamaCPP backend (CPU fallback)");
         return RAC_ERROR_BACKEND_INIT_FAILED;
     }
 }
```

sdk/runanywhere-commons/src/backends/llamacpp/CMakeLists.txt-155-227 (1)
155-227: 🛠️ Refactor suggestion | 🟠 Major

Duplicate shader compilation: once at configure-time (Lines 155–227) and again via `add_custom_command` at build-time (Lines 325–331).

Shaders are compiled via `execute_process` during CMake configure, then an `add_custom_command` with the same output files is added for build-time. The build-time command uses a different script (`compile-shaders-during-build.sh`) than the configure-time one (`compile-shaders-fast.sh` + `generate-shader-cpp.py`). If the configure-time compilation succeeds, the build-time command won't trigger (outputs already exist). If it does trigger, it may produce different results. This dual approach is confusing and error-prone — consider consolidating into one path.

Also applies to: 317-337
sdk/runanywhere-commons/CMakeLists.txt-41-43 (1)
41-43: ⚠️ Potential issue | 🟠 Major

C++ standard mismatch: `CMAKE_CXX_STANDARD 20` (Line 42) vs `target_compile_features cxx_std_17` (Line 261).

Line 42 sets the project-wide default to C++20 (for Vulkan-Hpp), but Line 261 advertises `cxx_std_17` as a PUBLIC compile feature on `rac_commons`. This means downstream consumers of `rac_commons` only get the C++17 guarantee, while the library itself compiles at C++20. If this upgrade is intentional, the `target_compile_features` should match:

```diff
-target_compile_features(rac_commons PUBLIC cxx_std_17)
+target_compile_features(rac_commons PUBLIC cxx_std_20)
```

Otherwise, downstream targets that link `rac_commons` and use its headers may fail if any C++20 constructs leak into the public headers. Also note that the coding guidelines state "C++17 standard required" — if the bump to C++20 is permanent, update the guidelines accordingly.

Also applies to: 261-261
sdk/runanywhere-commons/src/backends/llamacpp/CMakeLists.txt-293-310 (1)
293-310: ⚠️ Potential issue | 🟠 Major

Hardcoded `aarch64-linux-android` Vulkan lib path — breaks non-arm64 ABIs.

Line 303 constructs the Vulkan library path using `aarch64-linux-android`, but the build supports `armeabi-v7a`, `x86`, and `x86_64` ABIs (see NEON toggle at Lines 90–97). Use `ANDROID_TOOLCHAIN_ROOT` or map `ANDROID_ABI` to the correct target triple:

```cmake
# Map ABI to NDK target triple
if(ANDROID_ABI STREQUAL "arm64-v8a")
    set(ANDROID_TARGET_TRIPLE "aarch64-linux-android")
elseif(ANDROID_ABI STREQUAL "armeabi-v7a")
    set(ANDROID_TARGET_TRIPLE "arm-linux-androideabi")
elseif(ANDROID_ABI STREQUAL "x86_64")
    set(ANDROID_TARGET_TRIPLE "x86_64-linux-android")
elseif(ANDROID_ABI STREQUAL "x86")
    set(ANDROID_TARGET_TRIPLE "i686-linux-android")
endif()
```

sdk/runanywhere-commons/src/backends/llamacpp/CMakeLists.txt-54-55 (1)
54-55: ⚠️ Potential issue | 🟠 Major

Hardcoded Linux host paths — macOS developers cannot build.

Lines 54–55 hardcode `/usr/bin/gcc` and `/usr/bin/g++`, and Lines 69–73 only handle `WIN32` vs Linux prebuilt paths. macOS (Darwin) is a common Android dev host but has no `else` branch here. Similarly, Line 162 hardcodes `g++` for the shader generator tool, and Lines 178/180/319 assume `linux-x86_64` for the NDK `glslc` path.

Consider detecting the host OS and setting paths accordingly:

```cmake
if(CMAKE_HOST_SYSTEM_NAME STREQUAL "Darwin")
    set(HOST_PREBUILT_TAG "darwin-x86_64")
elseif(CMAKE_HOST_SYSTEM_NAME STREQUAL "Linux")
    set(HOST_PREBUILT_TAG "linux-x86_64")
elseif(CMAKE_HOST_SYSTEM_NAME STREQUAL "Windows")
    set(HOST_PREBUILT_TAG "windows-x86_64")
endif()
```

Then use `${HOST_PREBUILT_TAG}` in all NDK tool paths, and `CMAKE_CXX_COMPILER_LAUNCHER` or `find_program` instead of hardcoding `/usr/bin/g++`.

Also applies to: 69-73
sdk/runanywhere-commons/src/backends/llamacpp/llamacpp_backend.cpp-265-271 (1)
265-271: ⚠️ Potential issue | 🟠 Major

CPU fallback `llama_model_load_from_file` is not wrapped in try/catch.

The GPU attempt at Line 252 is wrapped in a try/catch, but the CPU fallback at Line 270 is not. If the fallback also throws (e.g., corrupted model file, OOM), the exception will propagate uncaught through JNI, crashing the app.

🛡️ Proposed fix

```diff
 if (!model_ && gpu_attempted) {
     GPU_LOG("GPU LOAD FAILED - falling back to CPU");
     LOGE("GPU model loading failed, falling back to CPU");
     model_params.n_gpu_layers = 0;
     backend_->disable_gpu();
-    model_ = llama_model_load_from_file(model_path.c_str(), model_params);
+    try {
+        model_ = llama_model_load_from_file(model_path.c_str(), model_params);
+    } catch (const std::exception& e) {
+        GPU_LOG("CPU fallback EXCEPTION: %s", e.what());
+        LOGE("CPU fallback exception: %s", e.what());
+        model_ = nullptr;
+    } catch (...) {
+        GPU_LOG("CPU fallback UNKNOWN EXCEPTION");
+        LOGE("Unknown exception during CPU fallback");
+        model_ = nullptr;
+    }
 }
```

sdk/runanywhere-commons/src/backends/llamacpp/llamacpp_backend.cpp-80-82 (1)
80-82: ⚠️ Potential issue | 🟠 Major

Hand-rolled Vulkan struct definitions with hardcoded `sType` magic numbers are fragile and hard to maintain.

These anonymous structs replicate `VkApplicationInfo` (sType=0) and `VkInstanceCreateInfo` (sType=1) with assumed member order and padding. While this works on common ABIs, any Vulkan header change or unusual platform alignment could silently break the probe. Since `vulkan_detector.cpp` already includes `<vulkan/vulkan.h>` and performs the same probing more robustly, consider either:

- Reusing `VulkanDetector::is_vulkan_supported()` (already available), or
- If you need the dlopen approach, `#include <vulkan/vulkan.h>` conditionally and use proper struct types.

sdk/runanywhere-commons/src/jni/device_jni.cpp-99-161 (1)
99-161: ⚠️ Potential issue | 🟠 Major

Missing null checks for JNI class/method lookups in `listVulkanDevices` — potential crash.

`array_list_constructor`, `array_list_add` (Lines 114–115) and `gpu_info_class`, `gpu_info_constructor` (Lines 121–126) are used without null checks. If the GPUInfo class is stripped by ProGuard/R8 in a release build, this will dereference null and crash. The `detectVulkanGPU` function correctly checks these — apply the same pattern here.

🛡️ Proposed fix (partial — around GPUInfo lookup)

```diff
 // Find GPUInfo class
 jclass gpu_info_class = env->FindClass("com/runanywhere/sdk/platform/GPUInfo");
+if (gpu_info_class == nullptr) {
+    RAC_LOG_ERROR("DeviceJNI", "Failed to find GPUInfo class");
+    env->DeleteLocalRef(array_list_class);
+    return nullptr;
+}
 jmethodID gpu_info_constructor = env->GetMethodID(
     gpu_info_class,
     "<init>",
     "(ZLjava/lang/String;Ljava/lang/String;Ljava/lang/String;JZ)V"
 );
+if (gpu_info_constructor == nullptr) {
+    RAC_LOG_ERROR("DeviceJNI", "Failed to find GPUInfo constructor");
+    env->DeleteLocalRef(gpu_info_class);
+    env->DeleteLocalRef(array_list_class);
+    return nullptr;
+}
```
🟡 Minor comments (16)
VULKAN_TEST_RESULTS.md-54-58 (1)
54-58: ⚠️ Potential issue | 🟡 Minor

Add language specifiers to fenced code blocks for better rendering.

Lines 54 and 70 have fenced code blocks without language specifiers. Since these contain crash logs/output, a `text` or `log` specifier would satisfy markdownlint (MD040) and improve syntax highlighting.

````diff
-```
+```text
 Fatal signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x0
````

scripts/test-vulkan-on-device.sh-55-65 (1)
55-65: ⚠️ Potential issue | 🟡 Minor

Crash detection only checks for SIGSEGV (signal 11) — other fatal signals will be missed.

Vulkan driver issues can also cause `SIGABRT` (signal 6), `SIGBUS` (signal 7), or other fatal signals. Broaden the pattern to catch all fatal signals.

🐛 Suggested fix

```diff
 # Check for crash
-if adb logcat -d | grep -q "Fatal signal 11"; then
+if adb logcat -d | grep -q "Fatal signal"; then
     echo "  ❌ CRASH DETECTED"
     echo ""
     echo "  Crash details:"
-    adb logcat -d | grep -A 5 "Fatal signal 11" | head -10
+    adb logcat -d | grep -A 5 "Fatal signal" | head -15
```

scripts/verify-vulkan-build.sh-20-25 (1)
20-25: ⚠️ Potential issue | 🟡 Minor

Size check is fragile — `du -h` output format varies and will break for non-MB sizes.

`${SIZE%M}` strips a trailing "M", but if `du -h` returns a value in "K" (small file) or "G" (large file), the substitution leaves the unit suffix intact, causing the `-lt` integer comparison to fail with a bash error (and script exit via `set -e`).

Use byte-based comparison instead:

🐛 Suggested fix

```diff
 echo ""
 echo "1. Checking library size..."
-SIZE=$(du -h "$LIB_PATH" | cut -f1)
-echo "   Library size: $SIZE"
-if [ "${SIZE%M}" -lt 50 ]; then
+SIZE_BYTES=$(stat --format="%s" "$LIB_PATH" 2>/dev/null || stat -f "%z" "$LIB_PATH" 2>/dev/null)
+SIZE_MB=$((SIZE_BYTES / 1024 / 1024))
+echo "   Library size: ${SIZE_MB}MB"
+if [ "$SIZE_MB" -lt 50 ]; then
     echo "   ❌ FAIL: Library too small (expected >50MB with Vulkan)"
     exit 1
 fi
```

sdk/runanywhere-commons/scripts/generate-shader-aggregates.py-28-45 (1)
28-45: ⚠️ Potential issue | 🟡 Minor

Shader filenames with hyphens or special characters produce invalid C identifiers.

`shader_name = spv_file.stem` is used directly as a C identifier (`{shader_name}_data`). SPIR-V shader files often use hyphens in names (e.g., `my-shader.spv`), which are invalid in C/C++ identifiers. Sanitize the name by replacing non-alphanumeric characters with underscores.

🐛 Suggested fix

```diff
+import re
+
+def sanitize_identifier(name):
+    """Replace non-alphanumeric characters with underscores for valid C identifiers."""
+    return re.sub(r'[^a-zA-Z0-9_]', '_', name)
+
 def generate_cpp_with_aggregates(vulkan_dir, output_cpp):
     ...
     for spv_file in spv_files:
-        shader_name = spv_file.stem
+        shader_name = sanitize_identifier(spv_file.stem)
```

VULKAN_ANDROID_CONTRIBUTION.md-599-599 (1)
599-599: ⚠️ Potential issue | 🟡 Minor

Date appears to be wrong — "February 11, 2025" should likely be 2026.
Given the PR and repo activity context, this seems like a typo.
sdk/runanywhere-commons/include/rac/backends/rac_llm_llamacpp.h-54-55 (1)
54-55: ⚠️ Potential issue | 🟡 Minor

Comment should mention Vulkan alongside Metal.

Line 54 says "Number of layers to offload to GPU (Metal on iOS/macOS)" but this PR adds Vulkan GPU offloading on Android. Update the comment to reflect both acceleration backends.

Proposed fix

```diff
-    /** Number of layers to offload to GPU (Metal on iOS/macOS) */
+    /** Number of layers to offload to GPU (Metal on iOS/macOS, Vulkan on Android) */
     int32_t gpu_layers;
```

sdk/runanywhere-commons/cmake/FetchVulkanHpp.cmake-92-100 (1)
92-100: ⚠️ Potential issue | 🟡 Minor

Missing macOS host prebuilt path for NDK Vulkan headers.

The NDK sysroot path only handles `WIN32` and a Linux fallback. Developers building on macOS (common for Android development) won't get the NDK Vulkan C headers path. The macOS prebuilt directory is `darwin-x86_64`.

Proposed fix

```diff
 if(ANDROID_NDK)
-    if(WIN32)
+    if(APPLE)
+        set(NDK_VULKAN_INCLUDE "${ANDROID_NDK}/toolchains/llvm/prebuilt/darwin-x86_64/sysroot/usr/include")
+    elseif(WIN32)
         set(NDK_VULKAN_INCLUDE "${ANDROID_NDK}/toolchains/llvm/prebuilt/windows-x86_64/sysroot/usr/include")
     else()
         set(NDK_VULKAN_INCLUDE "${ANDROID_NDK}/toolchains/llvm/prebuilt/linux-x86_64/sysroot/usr/include")
```

sdk/runanywhere-kotlin/src/commonMain/kotlin/com/runanywhere/sdk/public/extensions/RunAnywhere+ModelManagement.kt-67-82 (1)
67-82: ⚠️ Potential issue | 🟡 Minor

`downloadSize` is assigned the value of `memoryRequirement` — semantic mismatch.

The `memoryRequirement` parameter is documented as "Estimated memory usage in bytes" (Line 38), but it's stored in `downloadSize` (Line 76). These are conceptually different: download size is the file size on disk, while memory requirement is RAM needed at runtime. This will mislead callers and downstream consumers that read `downloadSize`.

Either rename the parameter to `downloadSize` or add a dedicated `memoryRequirement` field to `ModelInfo`.

VULKAN_ANDROID_CONTRIBUTION.md-227-282 (1)
227-282: ⚠️ Potential issue | 🟡 Minor

Duplicate "Files Modified/Created" section.

The heading "## Files Modified/Created" appears at Line 227 with content and again at Line 278 with an empty body. Remove the duplicate at Line 278.

Proposed fix

```diff
----
-
-## Files Modified/Created
-
-
 ---
 
 ## Expected Performance (If It Worked)
```

sdk/runanywhere-kotlin/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/public/extensions/RunAnywhere+ModelManagement.jvmAndroid.kt-592-602 (1)
592-602: ⚠️ Potential issue | 🟡 Minor

Exception cause not being passed to `SDKError.download()` — but the proposed fix signature is incorrect.

The original exception `e` should be chained as the cause when re-throwing, which the current code omits. However, the proposed fix is incorrect: passing `e` as the second parameter would assign it to the `code` parameter instead of `cause`.

The correct fix is to pass the exception using the named parameter:

Corrected fix

```diff
 } catch (e: Exception) {
     logger.error("Failed to move archive to temp location: ${e.message}")
-    throw SDKError.download("Failed to prepare archive for extraction: ${e.message}")
+    throw SDKError.download("Failed to prepare archive for extraction: ${e.message}", cause = e)
 }
```

The `SDKError.download()` signature is `fun download(message: String, code: ErrorCode = ErrorCode.DOWNLOAD_FAILED, cause: Throwable? = null)`, so the cause must be passed as a named parameter or as the third positional argument.

sdk/runanywhere-commons/src/backends/llamacpp/jni/rac_backend_llamacpp_jni.cpp-110-117 (1)
110-117: ⚠️ Potential issue | 🟡 Minor

`nativeIsRegistered` always returns `JNI_TRUE` — does not reflect actual state.

The TODO-like comment says "return true if the native library is loaded," but this is misleading since the method name implies it checks service registration status. Callers relying on this to guard operations will never see `false`.

sdk/runanywhere-commons/src/backends/llamacpp/jni/rac_backend_llamacpp_jni.cpp-24-29 (1)
24-29: ⚠️ Potential issue | 🟡 Minor

Multi-statement macros are unsafe without a `do { ... } while(0)` wrapper.

The non-Android `LOGi`/`LOGe`/`LOGw` macros expand to two separate `fprintf` statements. When used inside an unbraced `if`/`else`, only the first `fprintf` will be conditionally executed, while the newline is always printed.

🐛 Proposed fix

```diff
 #else
 #include <cstdio>
-#define LOGi(...) fprintf(stdout, "[INFO] " __VA_ARGS__); fprintf(stdout, "\n")
-#define LOGe(...) fprintf(stderr, "[ERROR] " __VA_ARGS__); fprintf(stderr, "\n")
-#define LOGw(...) fprintf(stdout, "[WARN] " __VA_ARGS__); fprintf(stdout, "\n")
+#define LOGi(...) do { fprintf(stdout, "[INFO] " __VA_ARGS__); fprintf(stdout, "\n"); } while(0)
+#define LOGe(...) do { fprintf(stderr, "[ERROR] " __VA_ARGS__); fprintf(stderr, "\n"); } while(0)
+#define LOGw(...) do { fprintf(stdout, "[WARN] " __VA_ARGS__); fprintf(stdout, "\n"); } while(0)
 #endif
```

sdk/runanywhere-commons/src/backends/llamacpp/patch_vulkan.py-43-71 (1)
43-71: ⚠️ Potential issue | 🟡 Minor

Script is not idempotent — repeated runs will double-prefix comment markers.

If CMake re-runs the patch step (e.g., during reconfiguration), lines already prefixed with `#` will be prefixed again (`# # ExternalProject_Add...`). The `not line.strip().startswith('#')` guard on entry (line 52) prevents re-entering a block, but any line already commented by a prior run that happens to match a later patch pattern (e.g., Patch 6's regex, Patch 8's regex) may still get double-commented.

Consider adding an early exit or sentinel check (e.g., a marker comment like `# PATCHED_BY_RAC`) at the top of the file on first run.

💡 Suggested idempotency guard

```diff
+SENTINEL = "# PATCHED_BY_RAC_VULKAN"
+
+if SENTINEL in content:
+    print("File already patched, skipping")
+    sys.exit(0)
+
 # Patch 1: Remove COMPONENTS glslc (make glslc optional)
 content = content.replace(...)
 ...
+# Add sentinel at top
+content = SENTINEL + "\n" + content
+
 with open(file_path, 'w') as f:
     f.write(content)
```

sdk/runanywhere-commons/src/backends/llamacpp/jni/rac_backend_llamacpp_jni.cpp-122-126 (1)
122-126: ⚠️ Potential issue | 🟡 Minor

Update hardcoded version string to match VERSIONS file.

The version `"b7199"` is stale. The project's VERSIONS file (the single source of truth) specifies `LLAMACPP_VERSION=b7935` (Feb 4, 2025) with full Vulkan support. Both the CMakeLists.txt (line 12) and the hardcoded JNI string must be updated to match.

sdk/runanywhere-commons/src/backends/llamacpp/llamacpp_backend.cpp-13-19 (1)
13-19: ⚠️ Potential issue | 🟡 Minor

Non-Android `GPU_LOG` macro lacks a trailing newline.

On Android, `__android_log_print` adds a newline automatically. The `fprintf(stderr, ...)` variant does not. This will produce concatenated log lines on non-Android platforms.

Proposed fix

```diff
 #else
-#define GPU_LOG(...) fprintf(stderr, __VA_ARGS__)
+#define GPU_LOG(...) do { fprintf(stderr, __VA_ARGS__); fprintf(stderr, "\n"); } while(0)
 #endif
```

sdk/runanywhere-commons/src/jni/device_jni.cpp-19-76 (1)
19-76: ⚠️ Potential issue | 🟡 Minor

Missing null checks on `NewStringUTF` results before passing to `NewObject`.

If `NewStringUTF` returns `nullptr` (OOM), a pending exception is set, and the subsequent `NewObject` call is undefined behavior per the JNI spec. Check for null / pending exceptions after the string allocations.

🛡️ Proposed fix

```diff
 // Convert C++ strings to Java strings
 jstring device_name = env->NewStringUTF(info.device_name.c_str());
 jstring driver_version = env->NewStringUTF(info.driver_version.c_str());
 jstring api_version = env->NewStringUTF(
     runanywhere::VulkanDetector::get_version_string(info.api_version).c_str()
 );
+
+if (!device_name || !driver_version || !api_version) {
+    RAC_LOG_ERROR("DeviceJNI", "Failed to allocate JNI strings");
+    if (device_name) env->DeleteLocalRef(device_name);
+    if (driver_version) env->DeleteLocalRef(driver_version);
+    if (api_version) env->DeleteLocalRef(api_version);
+    env->DeleteLocalRef(gpu_info_class);
+    return nullptr;
+}
```
🧹 Nitpick comments (22)
sdk/runanywhere-kotlin/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/foundation/bridge/extensions/CppBridgeLLM.kt (3)
585-590: Listener exceptions are swallowed with insufficient logging.

Multiple locations catch exceptions from `llmListener` callbacks and either completely ignore them (lines 585-590, 656-659) or log at WARN level (lines 619-625, 841-849, 1151-1159). While preventing listener bugs from breaking core functionality is correct, silent failures make debugging difficult for SDK users.

Recommended approach:
- Log all listener exceptions at ERROR level (not WARN or silent)
- Include the full exception stack trace in the log
- Consider adding a global error handler that SDK users can override for monitoring

Example:

```kotlin
try {
    llmListener?.onModelLoaded(modelId, modelPath)
} catch (e: Exception) {
    CppBridgePlatformAdapter.logCallback(
        CppBridgePlatformAdapter.LogLevel.ERROR,  // Changed from WARN
        TAG,
        "Error in LLM listener onModelLoaded: ${e.message}\n${e.stackTraceToString()}"
    )
}
```

Also applies to: 619-625, 656-659, 678-682, 722-726, 767-771, 841-849, 1151-1159
40-132: Consider using sealed classes or enums instead of Int constants for type safety.

The state constants (LLMState, GenerationMode, StopReason) use Int values, which allows invalid states to be set and reduces type safety. While this matches the C++ API, the Kotlin layer can provide a type-safe wrapper.

Recommended approach:

```kotlin
sealed class LLMState(val value: Int) {
    object NotCreated : LLMState(0)
    object Created : LLMState(1)
    object Loading : LLMState(2)
    object Ready : LLMState(3)
    object Generating : LLMState(4)
    object Unloading : LLMState(5)
    object Error : LLMState(6)

    fun isReady(): Boolean = this is Ready

    companion object {
        fun fromInt(value: Int): LLMState = when (value) {
            0 -> NotCreated
            1 -> Created
            // ... etc
            else -> Error
        }
    }
}
```

This provides compile-time safety while maintaining C++ interop through the `value` property.

As per coding guidelines: "Use sealed classes for type-safe error handling in Kotlin code" and "Always use data classes and structured types (enums, sealed classes) instead of strings for models and configuration."
605-616: Direct coupling to CppBridgeModelAssignment and CppBridgeState violates the ModuleRegistry pattern.

The code directly calls `CppBridgeModelAssignment.setAssignmentStatusCallback()` and `CppBridgeState.setComponentStateCallback()` (lines 605-616, 828-839), creating tight coupling. This makes the component difficult to test in isolation and limits extensibility.

Recommended approach:
- Use the EventBus pattern to publish state change events
- Let interested components subscribe to these events
- Remove direct dependencies on CppBridgeModelAssignment and CppBridgeState

Example:

```kotlin
// After model loads
EventBus.publish(
    ModelLoadedEvent(
        componentType = ComponentType.LLM,
        modelId = modelId,
        modelPath = modelPath,
        status = AssignmentStatus.READY
    )
)
```

As per coding guidelines: "Use the ModuleRegistry provider pattern for extensibility and plugin architecture instead of hard-coded dependencies" and "Use EventBus for component communication instead of direct coupling between components."
Also applies to: 828-839
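For orientation, the publish/subscribe mechanism referenced above can be sketched in a few lines. This is an illustrative in-process implementation only; the SDK's actual EventBus API, threading model, and names (`EventBus`, `subscribe`, `publish`) may differ:

```kotlin
// Illustrative sketch, not the SDK's real EventBus.
object EventBus {
    private val subscribers = mutableMapOf<Class<*>, MutableList<(Any) -> Unit>>()

    @Synchronized
    fun <T : Any> subscribe(type: Class<T>, handler: (T) -> Unit) {
        @Suppress("UNCHECKED_CAST")
        subscribers.getOrPut(type) { mutableListOf() }.add(handler as (Any) -> Unit)
    }

    @Synchronized
    fun publish(event: Any) {
        subscribers[event.javaClass]?.forEach { it(event) }
    }
}
```

With this shape, CppBridgeModelAssignment would subscribe to `ModelLoadedEvent` rather than exposing a callback setter that CppBridgeLLM must call directly.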
sdk/runanywhere-kotlin/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/foundation/bridge/extensions/CppBridgeDevice.kt (3)
314-320: Swallowed exception loses diagnostic context when Supabase URL retrieval fails.

Detekt flags this `catch` block. If `racDevConfigGetSupabaseUrl()` throws, the exception is silently consumed and `baseUrl` becomes `null`, eventually leading to the "No base URL configured" error at Line 383 with no clue about the root cause. Log the exception at DEBUG.

♻️ Suggested fix

```diff
 } catch (e: Exception) {
-    null
+    CppBridgePlatformAdapter.logCallback(
+        CppBridgePlatformAdapter.LogLevel.DEBUG,
+        TAG,
+        "Failed to get Supabase URL: ${e.message}",
+    )
+    null
 }
```
1274-1281: escapeJson doesn't handle all JSON-special characters.

The function misses Unicode control characters (U+0000–U+001F besides `\n`, `\r`, `\t`) which the JSON spec requires to be escaped. For device info strings this is low-risk, but a malicious or unusual device name could produce invalid JSON.
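A spec-complete escaper is short. The sketch below is illustrative (the function name mirrors the review, not necessarily the file's actual helper) and escapes every character in U+0000–U+001F as RFC 8259 requires:

```kotlin
// Illustrative sketch of a spec-complete JSON string escaper.
fun escapeJson(s: String): String = buildString {
    for (c in s) {
        when (c) {
            '"' -> append("\\\"")
            '\\' -> append("\\\\")
            '\b' -> append("\\b")
            '\u000C' -> append("\\f") // form feed has no \f escape in Kotlin source
            '\n' -> append("\\n")
            '\r' -> append("\\r")
            '\t' -> append("\\t")
            // Remaining control characters must be \u-escaped per the JSON spec.
            else -> if (c < ' ') append("\\u%04x".format(c.code)) else append(c)
        }
    }
}
```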
536-579: Manual JSON construction is fragile — consider using a JSON library.

Building JSON via string concatenation with a custom `escapeJson` is error-prone (e.g., trailing commas if fields are conditionally included, incomplete escaping). If `org.json` or `kotlinx.serialization` is already a dependency, it would be safer and more maintainable.

sdk/runanywhere-commons/scripts/compile-all-vulkan-shaders.sh (2)
28-31: Dead code — this check is unreachable.

Lines 16–19 already exit if `HOST_GLSLC` is empty. If execution reaches Line 28, `HOST_GLSLC` is guaranteed to be a valid path, making this block unreachable.

♻️ Remove redundant check

```diff
-if [ ! -f "$HOST_GLSLC" ] && [ ! -x "$(command -v glslc)" ]; then
-    echo "Error: glslc not found. Please install Vulkan SDK."
-    exit 1
-fi
```
14-19: Prefer `command -v` over `which` for POSIX portability.

`which` is not guaranteed on all systems and can behave inconsistently. `command -v` is the POSIX-standard way to locate executables.

♻️ Suggested fix

```diff
-HOST_GLSLC=$(which glslc)
+HOST_GLSLC=$(command -v glslc || true)
 if [ -z "$HOST_GLSLC" ]; then
     echo "Error: glslc not found in PATH. Please install Vulkan SDK."
     exit 1
 fi
```

scripts/test-vulkan-on-device.sh (1)
51-77: Logcat is read three separate times — capture once to a temp file.

Lines 56, 60, and 72 each invoke `adb logcat -d` separately, which is slow over USB and could yield inconsistent results if logs are rotating. Capture once to a temp file.

♻️ Suggested improvement

```diff
+LOGCAT_FILE=$(mktemp)
+trap "rm -f $LOGCAT_FILE" EXIT
 adb logcat -c
 adb shell am start -n com.runanywhere.runanywhereai.debug/com.runanywhere.runanywhereai.MainActivity > /dev/null 2>&1
 sleep 5
+adb logcat -d > "$LOGCAT_FILE"

 # Check for crash
-if adb logcat -d | grep -q "Fatal signal 11"; then
+if grep -q "Fatal signal" "$LOGCAT_FILE"; then
     echo "  ❌ CRASH DETECTED"
     echo ""
     echo "  Crash details:"
-    adb logcat -d | grep -A 5 "Fatal signal 11" | head -10
+    grep -A 5 "Fatal signal" "$LOGCAT_FILE" | head -15
```

sdk/runanywhere-commons/scripts/generate-shader-aggregates.py (1)
56-93: Duplicate code — data and len array generation are structurally identical.

The two nested loop blocks (Lines 56–73 and 76–93) differ only in the array type and the `_data`/`_len` suffix. Extract a helper to reduce duplication.

sdk/runanywhere-commons/include/rac/infrastructure/device/vulkan_detector.h (2)
13-21: Public symbols lack the `rac_` prefix required by coding guidelines.

`VulkanDeviceInfo` and `VulkanDetector` are exported via `RAC_API` from the `include/rac/` path but don't follow the `rac_` naming convention. While the `runanywhere` namespace provides scoping, the coding guidelines state all public symbols must be prefixed with `rac_`. Consider renaming to `rac_VulkanDeviceInfo`/`RacVulkanDeviceInfo` (or equivalent) to stay consistent with the rest of the public API surface.

As per coding guidelines: "All public symbols must be prefixed with `rac_` (RunAnywhere Commons)".
29-55: Public API header should document lifecycle and thread-safety.

The `VulkanDetector` methods (`detect()`, `list_devices()`) create and destroy Vulkan instances internally, which is expensive. Callers should know:

- Whether results can be cached
- Thread-safety of static methods (safe to call from multiple threads?)
- On Android, `detect()` forks a child process — this has significant implications (e.g., not safe in multi-threaded contexts after `fork`)

A brief note in the class-level doc would suffice.
As per coding guidelines: "Public C API headers in include/rac/ must document vtable operations, error codes, and lifecycle requirements".
sdk/runanywhere-commons/cmake/FetchVulkanHpp.cmake (1)
51-76: No integrity verification for downloaded headers — supply chain risk.

The headers are downloaded over HTTPS from GitHub without any checksum validation. A compromised CDN or MITM could inject malicious code into the Vulkan C++ headers. Consider adding `EXPECTED_HASH SHA256=<hash>` to the `file(DOWNLOAD ...)` calls, or at minimum document why this was deemed acceptable.

sdk/runanywhere-kotlin/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/public/extensions/RunAnywhere+ModelManagement.jvmAndroid.kt (1)
961-1058: Replace hand-rolled regex JSON parsing with kotlinx.serialization.

The custom `parseModelAssignmentsJson()` parser using regex is fragile and will fail on escaped quotes in values, nested objects/arrays, null values, decimal/negative numbers, and whitespace variations. The `split("},\\s*\\{")` approach misunderstands JSON structure when field values contain `}` or `{`.

Since `kotlinx-serialization-json` is already a dependency in the project, define a data class for the model assignment JSON structure and use `Json.decodeFromString()` instead. This will handle all standard JSON edge cases correctly.

Current implementation pattern

```kotlin
private fun parseModelAssignmentsJson(json: String): List<ModelInfo> {
    // ... regex-based field extraction
    val id = extractJsonString(jsonObj, "id") ?: continue
    val name = extractJsonString(jsonObj, "name") ?: id
    // ... more regex extractions
}
```

sdk/runanywhere-commons/src/backends/llamacpp/llamacpp_backend.h (2)
88-90: disable_gpu() and is_using_gpu() are not protected by mutex_.

While `use_gpu_` is written by `disable_gpu()` and read by `is_using_gpu()`/`get_device_type()`, none of these acquire `mutex_`. Currently safe because `disable_gpu()` is only called during initialization before concurrent generation begins, but this is a fragile implicit contract. If the fallback logic ever runs concurrently (e.g., mid-generation retry), this becomes a data race.

Consider either making `use_gpu_` an `std::atomic<bool>` or documenting the single-threaded precondition.

♻️ Minimal fix: use atomic

```diff
-    bool is_using_gpu() const { return use_gpu_; }
-    std::string get_gpu_device_name() const { return gpu_device_name_; }
-    void disable_gpu() { use_gpu_ = false; }  // Disable GPU after failure
+    bool is_using_gpu() const { return use_gpu_.load(std::memory_order_acquire); }
+    std::string get_gpu_device_name() const { return gpu_device_name_; }
+    void disable_gpu() { use_gpu_.store(false, std::memory_order_release); }

-    bool use_gpu_ = false;
+    std::atomic<bool> use_gpu_{false};
```
27-32: DeviceType::GPU is ambiguous — consider adding VULKAN for clarity.

The enum jumps from `GPU = 1` to `METAL = 3` (no value 2), and `GPU` is used to represent the Vulkan backend path (per `get_device_type()` in the .cpp). Since Metal and CUDA are explicit accelerator types, adding `VULKAN = 2` (or aliasing `GPU` to `VULKAN`) would make the intent clearer and future-proof the enum.

sdk/runanywhere-kotlin/src/commonMain/kotlin/com/runanywhere/sdk/public/extensions/RunAnywhere+DeviceInfo.kt (1)
38-43: Two GPUInfo classes with different fields exist — naming collision risk.

This public `GPUInfo` in `com.runanywhere.sdk.public.extensions` has 4 properties, while the platform-internal `GPUInfo` in `com.runanywhere.sdk.platform` has 6 (adds `driverVersion` and `supportsCompute`). Having two identically-named data classes in different packages will cause import confusion for developers and within the SDK itself. Consider renaming the platform-internal one (e.g., `PlatformGPUInfo` or `VulkanGPUInfo`) to disambiguate.

sdk/runanywhere-commons/src/backends/llamacpp/patch_vulkan.py (1)
109-136: Patch 7's foreach exit condition relies on `endforeach` string match, ignoring nesting depth.

At line 129, the block exit is triggered by the presence of the string `endforeach` regardless of the tracked `foreach_depth`. If there were nested `foreach` loops (unlikely but possible), this would exit prematurely. Since you're already tracking depth, consider using `foreach_depth <= 0` as the exit condition (similar to Patches 4 and 5), with `endforeach` as an additional safety check.

♻️ Suggested fix

```diff
     # Check if we've closed all parentheses (endforeach)
-    if 'endforeach' in line:
+    if 'endforeach' in line and foreach_depth <= 0:
         in_foreach = False
         foreach_depth = 0
```
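The depth-tracking idea can be isolated into a small, testable helper. This sketch is illustrative (the function name and line-based interface are assumptions, not the script's actual structure), showing why the exit must key off the depth counter rather than the first `endforeach` string:

```python
def find_foreach_block(lines, start):
    """Return the index one past the endforeach that closes the foreach
    at `start`, honoring nested foreach/endforeach pairs.

    Illustrative helper; the real patch script tracks state differently.
    """
    depth = 0
    for i in range(start, len(lines)):
        stripped = lines[i].strip()
        if stripped.startswith('foreach'):
            depth += 1
        elif stripped.startswith('endforeach'):
            depth -= 1
            if depth <= 0:  # exit only when the outermost loop closes
                return i + 1
    raise ValueError("unbalanced foreach block")

cmake = [
    "foreach(a IN LISTS xs)",
    "  foreach(b IN LISTS ys)",   # nested loop: naive string match would
    "  endforeach()",             # exit here, one line too early
    "endforeach()",
    "set(X 1)",
]
```

Exiting on the first `endforeach` string would stop at index 2 here, truncating the outer block.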
33-34: Thread safety: cachedGPUInfo/cacheInitialized are accessed without synchronization.

On Android, `detectVulkanGPU()` and `clearCache()` can be called from different threads (e.g., main thread + coroutine dispatcher). Without `@Volatile` or `synchronized`, a JVM thread may see stale values due to memory visibility. Consider using `@Volatile` on both fields, or wrapping access in a `synchronized(this)` block.

Also applies to: 46-49, 119-122
sdk/runanywhere-commons/src/backends/llamacpp/CMakeLists.txt (1)
296-300: Directly mutating LINK_LIBRARIES target property is fragile.

Removing `Vulkan::Vulkan` from `LINK_LIBRARIES` via `get_target_property`/`set_target_properties` bypasses CMake's dependency tracking and can break in future CMake versions. A safer alternative is to use an interface library or override the `Vulkan_LIBRARY` variable before `find_package(Vulkan)` is called by llama.cpp.

sdk/runanywhere-commons/src/backends/llamacpp/llamacpp_backend.cpp (2)
230-247: Android always forces n_gpu_layers=0 — Vulkan GPU acceleration is a complete no-op on Android.

The `#ifdef __ANDROID__` block unconditionally sets `model_params.n_gpu_layers = 0`, overriding any probe or configuration result. This means the extensive Vulkan integration in this PR (shader compilation, detection, JNI bridge) is never actually exercised at runtime on Android.

Given the PR notes that crashes affect ~70–80% of devices, this safety measure is understandable for the experimental status. However, consider gating this on a runtime flag (e.g., `config["force_cpu"]`) rather than a blanket compile-time guard, so the opt-in path for testing on known-good devices remains accessible.

```diff
 #ifdef __ANDROID__
-    GPU_LOG("Android detected - forcing CPU mode (n_gpu_layers=0) to avoid Vulkan crash");
-    model_params.n_gpu_layers = 0;
-    LOGI("Loading model with CPU backend (Android Vulkan workaround)");
+    bool force_cpu = !backend_->is_using_gpu() || !config.value("enable_vulkan", false);
+    if (force_cpu) {
+        GPU_LOG("Android: CPU mode (Vulkan not requested or disabled)");
+        model_params.n_gpu_layers = 0;
+    } else {
+        gpu_attempted = true;
+        model_params.n_gpu_layers = 999;
+        GPU_LOG("Android: Vulkan GPU mode (n_gpu_layers=999)");
+    }
 #else
```
56-107: probe_vulkan_safe() duplicates logic already in VulkanDetector::is_vulkan_supported() with less safety.

The `VulkanDetector` (in `vulkan_detector.cpp`) already implements Vulkan availability checks, including a fork-based safe detection on Android and proper use of Vulkan headers. This probe function reimplements it using hand-rolled structs via `dlopen`. Consider delegating to the detector to avoid maintaining two separate probing code paths.
```cpp
static bool test_vulkan_in_child_process() {
    RAC_LOG_INFO("VulkanDetector", "Testing Vulkan in child process (safe mode)...");

    pid_t pid = fork();

    if (pid == -1) {
        RAC_LOG_ERROR("VulkanDetector", "Fork failed, assuming Vulkan unavailable");
        return false;
    }

    if (pid == 0) {
        // Child process - test Vulkan
        // If driver crashes here, only child dies

#ifdef GGML_USE_VULKAN
        VkInstance instance = VK_NULL_HANDLE;
        VkInstanceCreateInfo create_info = {};
        create_info.sType = VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO;

        VkResult result = vkCreateInstance(&create_info, nullptr, &instance);
        if (result != VK_SUCCESS) {
            _exit(1);  // Failed
        }

        uint32_t device_count = 0;
        vkEnumeratePhysicalDevices(instance, &device_count, nullptr);

        if (device_count == 0) {
            vkDestroyInstance(instance, nullptr);
            _exit(1);  // No devices
        }

        // Test actual device creation (this is where buggy drivers crash)
        std::vector<VkPhysicalDevice> devices(device_count);
        vkEnumeratePhysicalDevices(instance, &device_count, devices.data());

        // Try to create a logical device (fence creation happens here)
        VkDeviceQueueCreateInfo queue_info = {};
        queue_info.sType = VK_STRUCTURE_TYPE_DEVICE_QUEUE_CREATE_INFO;
        queue_info.queueFamilyIndex = 0;
        queue_info.queueCount = 1;
        float priority = 1.0f;
        queue_info.pQueuePriorities = &priority;

        VkDeviceCreateInfo device_info = {};
        device_info.sType = VK_STRUCTURE_TYPE_DEVICE_CREATE_INFO;
        device_info.queueCreateInfoCount = 1;
        device_info.pQueueCreateInfos = &queue_info;

        VkDevice device = VK_NULL_HANDLE;
        result = vkCreateDevice(devices[0], &device_info, nullptr, &device);

        if (device != VK_NULL_HANDLE) {
            vkDestroyDevice(device, nullptr);
        }

        vkDestroyInstance(instance, nullptr);

        // If we got here without crashing, Vulkan works!
        _exit(result == VK_SUCCESS ? 0 : 1);
#else
        _exit(1);  // Vulkan not compiled
#endif
    }

    // Parent process - wait for child
    int status;
    pid_t result = waitpid(pid, &status, 0);

    if (result == -1) {
        RAC_LOG_ERROR("VulkanDetector", "waitpid failed");
        return false;
    }

    if (WIFEXITED(status)) {
        int exit_code = WEXITSTATUS(status);
        if (exit_code == 0) {
            RAC_LOG_INFO("VulkanDetector", "✅ Vulkan test passed in child process");
            return true;
        } else {
            RAC_LOG_WARNING("VulkanDetector", "❌ Vulkan test failed in child (exit code: %d)", exit_code);
            return false;
        }
    }

    if (WIFSIGNALED(status)) {
        int signal = WTERMSIG(status);
        RAC_LOG_ERROR("VulkanDetector", "❌ Child process crashed with signal %d (SIGSEGV=%d) - BUGGY DRIVER DETECTED!",
                      signal, SIGSEGV);
        RAC_LOG_ERROR("VulkanDetector", "This device has a buggy Vulkan driver. Using CPU fallback.");
        return false;
    }

    return false;
}
#endif
```
fork() is unsafe in Android's multi-threaded JNI environment.
On Android, calling fork() from a JNI context is dangerous because the ART runtime has many internal threads (GC, finalizer, JIT compiler, binder). After fork(), only the calling thread survives — any mutex held by other threads (including the heap allocator lock) becomes permanently locked. The child process then calls std::vector allocation (Line 64), Vulkan instance creation, and other non-async-signal-safe functions, which can deadlock.
A safer alternative would be to use a separate process via android.os.Process or Runtime.exec() from the Java/Kotlin layer, or use posix_spawn with POSIX_SPAWN_USEVFORK which has fewer issues. Alternatively, use vkEnumerateInstanceVersion() as a lightweight probe before attempting full initialization — it doesn't require instance creation and is less likely to crash.
🤖 Prompt for AI Agents
In `@sdk/runanywhere-commons/src/infrastructure/device/vulkan_detector.cpp` around
lines 31 - 126, The use of fork() inside test_vulkan_in_child_process is unsafe
on Android JNI and can deadlock when the child uses non-async-signal-safe calls
(e.g., std::vector allocations, vkCreateInstance); replace the fork-based probe
with a safer approach: either move the probe into Java/Kotlin and spawn a
separate process via android.os.Process/Runtime.exec, or switch to posix_spawn
(with POSIX_SPAWN_USEVFORK) instead of fork, or simplify the probe to call
vkEnumerateInstanceVersion (a lightweight, instance-less check) from the
existing detection flow; update/remove fork(), waitpid(), child-only Vulkan code
paths in test_vulkan_in_child_process and ensure the calling code handles the
new boolean result accordingly.
```kotlin
object CppBridgeLLM {
    /**
     * LLM component state constants matching C++ RAC_LLM_STATE_* values.
     */
    object LLMState {
        /** Component not created */
        const val NOT_CREATED = 0
        /** Component created but no model loaded */
        const val CREATED = 1
        /** Model is loading */
        const val LOADING = 2
        /** Model loaded and ready for inference */
        const val READY = 3
        /** Inference in progress */
        const val GENERATING = 4
        /** Model is unloading */
        const val UNLOADING = 5
        /** Component in error state */
        const val ERROR = 6

        /**
         * Get a human-readable name for the LLM state.
         */
        fun getName(state: Int): String =
            when (state) {
                NOT_CREATED -> "NOT_CREATED"
                CREATED -> "CREATED"
                LOADING -> "LOADING"
                READY -> "READY"
                GENERATING -> "GENERATING"
                UNLOADING -> "UNLOADING"
                ERROR -> "ERROR"
                else -> "UNKNOWN($state)"
            }

        /**
         * Check if the state indicates the component is usable.
         */
        fun isReady(state: Int): Boolean = state == READY
    }

    /**
     * LLM generation mode constants.
     */
    object GenerationMode {
        /** Standard completion mode */
        const val COMPLETION = 0
        /** Chat/instruction mode */
        const val CHAT = 1
        /** Fill-in-the-middle mode */
        const val INFILL = 2
    }

    /**
     * LLM stop reason constants.
     */
    object StopReason {
        /** Generation still in progress */
        const val NOT_STOPPED = 0
        /** Reached end of sequence token */
        const val EOS = 1
        /** Reached maximum token limit */
        const val MAX_TOKENS = 2
        /** Hit a stop sequence */
        const val STOP_SEQUENCE = 3
        /** Generation was cancelled */
        const val CANCELLED = 4
        /** Generation failed */
        const val ERROR = 5

        /**
         * Get a human-readable name for the stop reason.
         */
        fun getName(reason: Int): String =
            when (reason) {
                NOT_STOPPED -> "NOT_STOPPED"
                EOS -> "EOS"
                MAX_TOKENS -> "MAX_TOKENS"
                STOP_SEQUENCE -> "STOP_SEQUENCE"
                CANCELLED -> "CANCELLED"
                ERROR -> "ERROR"
                else -> "UNKNOWN($reason)"
            }
    }

    @Volatile
    private var isRegistered: Boolean = false
    @Volatile
    private var state: Int = LLMState.NOT_CREATED
    @Volatile
    private var handle: Long = 0
    @Volatile
    private var loadedModelId: String? = null
    @Volatile
    private var loadedModelPath: String? = null
    @Volatile
    private var isCancelled: Boolean = false
    @Volatile
    private var isNativeLibraryLoaded: Boolean = false

    private val lock = Any()

    /**
     * Tag for logging.
     */
    private const val TAG = "CppBridgeLLM"

    /**
     * Check if native LLM library is available.
     */
    val isNativeAvailable: Boolean
        get() = isNativeLibraryLoaded

    /**
     * Initialize native library availability check.
     * Should be called during SDK initialization.
     */
    fun checkNativeLibrary() {
        try {
            // Try to call a simple native method to verify library is loaded
            // If it throws UnsatisfiedLinkError, the library isn't available
            isNativeLibraryLoaded = true
            CppBridgePlatformAdapter.logCallback(
                CppBridgePlatformAdapter.LogLevel.DEBUG,
                TAG,
                "Native LLM library check passed",
            )
        } catch (e: UnsatisfiedLinkError) {
            isNativeLibraryLoaded = false
            CppBridgePlatformAdapter.logCallback(
                CppBridgePlatformAdapter.LogLevel.WARN,
                TAG,
                "Native LLM library not available: ${e.message}",
            )
        }
    }

    /**
     * Singleton shared instance for accessing the LLM component.
     * Matches iOS CppBridge.LLM.shared pattern.
     */
    val shared: CppBridgeLLM = this

    /**
     * Optional listener for LLM events.
     * Set this before calling [register] to receive events.
     */
    @Volatile
    var llmListener: LLMListener? = null

    /**
     * Optional streaming callback for token-by-token generation.
     * This is invoked for each generated token during streaming.
     */
    @Volatile
    var streamCallback: StreamCallback? = null

    /**
     * LLM generation configuration.
     *
     * @param maxTokens Maximum number of tokens to generate
     * @param temperature Sampling temperature (0.0 to 2.0)
     * @param topP Top-p (nucleus) sampling parameter
     * @param topK Top-k sampling parameter
     * @param repeatPenalty Penalty for repeating tokens
     * @param stopSequences List of sequences that stop generation
     * @param seed Random seed for reproducibility (-1 for random)
     */
    data class GenerationConfig(
        val maxTokens: Int = 512,
        val temperature: Float = 0.7f,
        val topP: Float = 0.9f,
        val topK: Int = 40,
        val repeatPenalty: Float = 1.1f,
        val stopSequences: List<String> = emptyList(),
        val seed: Long = -1,
    ) {
        /**
         * Convert to JSON string for C++ interop.
         */
        fun toJson(): String {
            return buildString {
                append("{")
                append("\"max_tokens\":$maxTokens,")
                append("\"temperature\":$temperature,")
                append("\"top_p\":$topP,")
                append("\"top_k\":$topK,")
                append("\"repeat_penalty\":$repeatPenalty,")
                append("\"stop_sequences\":[")
                stopSequences.forEachIndexed { index, seq ->
                    if (index > 0) append(",")
                    append("\"${escapeJson(seq)}\"")
                }
                append("],")
                append("\"seed\":$seed")
                append("}")
            }
        }

        companion object {
            /** Default configuration */
            val DEFAULT = GenerationConfig()
        }
    }

    /**
     * LLM model configuration.
     *
     * @param contextLength Context window size in tokens
     * @param gpuLayers Number of layers to offload to GPU (-1 for auto)
     * @param threads Number of threads for inference (-1 for auto)
     * @param batchSize Batch size for prompt processing
     * @param useMemoryMap Whether to use memory-mapped loading
     * @param useLocking Whether to use file locking
     */
    data class ModelConfig(
        val contextLength: Int = 4096,
        val gpuLayers: Int = -1,
        val threads: Int = -1,
        val batchSize: Int = 512,
        val useMemoryMap: Boolean = true,
        val useLocking: Boolean = false,
    ) {
        /**
         * Convert to JSON string for C++ interop.
         */
        fun toJson(): String {
            return buildString {
                append("{")
                append("\"context_length\":$contextLength,")
                append("\"gpu_layers\":$gpuLayers,")
                append("\"threads\":$threads,")
                append("\"batch_size\":$batchSize,")
                append("\"use_memory_map\":$useMemoryMap,")
                append("\"use_locking\":$useLocking")
                append("}")
            }
        }

        companion object {
            /** Default configuration */
            val DEFAULT = ModelConfig()
        }
    }
```
Major architectural violation: Business logic must be in commonMain/, not jvmAndroidMain/.
This file contains significant cross-platform business logic (state management, lifecycle semantics, data classes for GenerationConfig/ModelConfig/GenerationResult, interfaces) in a platform-specific source set. As per coding guidelines, all business logic, protocols, interfaces, and structures must be defined in commonMain/. Only platform-specific JNI marshaling should remain in jvmAndroidMain/.
Recommended refactoring:
- Move business logic to `commonMain/`: LLMState, GenerationMode, StopReason, data classes (GenerationConfig, ModelConfig, GenerationResult), interfaces (LLMListener, StreamCallback), and lifecycle semantics
- Use `expect`/`actual` pattern: Define `expect` declarations in `commonMain/` for native interop
- Keep only JNI-specific marshaling in `jvmAndroidMain/`: JNI method calls, Java type conversions, `@JvmStatic` callbacks
This separation ensures the business logic is testable, reusable across platforms, and follows the iOS CppBridge+LLM.swift architecture mentioned in the comments.
As per coding guidelines: "All business logic, protocols, interfaces, and structures MUST be defined in commonMain/ for Kotlin Multiplatform projects" and "Use expect/actual ONLY for platform-specific implementations, not for business logic."
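A minimal sketch of the expect/actual split described above; the bridge name `LlmNativeBridge` and its methods are illustrative, not the SDK's actual declarations:

```kotlin
// commonMain: platform-neutral contract, usable from shared business logic
expect object LlmNativeBridge {
    fun createComponent(configJson: String): Long
    fun destroyComponent(handle: Long)
}

// jvmAndroidMain: actual implementation holding only JNI glue
actual object LlmNativeBridge {
    actual fun createComponent(configJson: String): Long = nativeCreate(configJson)
    actual fun destroyComponent(handle: Long) = nativeDestroy(handle)

    @JvmStatic private external fun nativeCreate(configJson: String): Long
    @JvmStatic private external fun nativeDestroy(handle: Long)
}
```

With this split, the data classes and state machine live in `commonMain/` and call the bridge, while `jvmAndroidMain/` contains nothing but the JNI externals and type marshaling.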
🧰 Tools
🪛 detekt (1.23.8)
[warning] 84-84: The caught exception is swallowed. The original exception could be lost.
(detekt.exceptions.SwallowedException)
🤖 Prompt for AI Agents
In
`@sdk/runanywhere-kotlin/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/foundation/bridge/extensions/CppBridgeLLM.kt`
around lines 36 - 298, This file places core business logic in a platform source
set; move all non-JNI business logic into commonMain: relocate CppBridgeLLM's
state/type definitions (LLMState, GenerationMode, StopReason), data classes
(GenerationConfig, ModelConfig, GenerationResult), and any interfaces/protocols
(LLMListener, StreamCallback) plus lifecycle semantics (isRegistered, state,
loadedModelId, checkNativeLibrary, shared) into commonMain; leave only
JNI/platform-specific glue in jvmAndroidMain by converting native interop points
here into actual implementations of expect declarations you define in commonMain
(use expect for functions called from common code and implement them in this
file), and remove any non-platform logic from jvmAndroidMain so it contains only
JNI marshaling, Java/Kotlin type conversions, and `@JvmStatic` callback adapters.
```kotlin
@JvmStatic
private external fun nativeDetectVulkanGPU(): GPUInfo

/**
 * Native method to check Vulkan support
 * Implemented in: sdk/runanywhere-commons/src/jni/device_jni.cpp
 */
@JvmStatic
private external fun nativeIsVulkanSupported(): Boolean

/**
 * Native method to list all Vulkan devices
 * Implemented in: sdk/runanywhere-commons/src/jni/device_jni.cpp
 */
@JvmStatic
private external fun nativeListVulkanDevices(): List<GPUInfo>
```
Critical: JNI method names don't match between Kotlin and C++ — native calls will always fail.
The Kotlin native methods are named `nativeDetectVulkanGPU`, `nativeIsVulkanSupported`, and `nativeListVulkanDevices`, but the corresponding JNI functions in `device_jni.cpp` are named `detectVulkanGPU`, `isVulkanSupported`, and `listVulkanDevices` (without the `native` prefix). JNI resolution is based on exact mangled names, so these will never bind and every call will throw `UnsatisfiedLinkError`, caught silently by the try/catch blocks.
Either rename the Kotlin externals to drop the native prefix, or rename the C++ JNI functions to include it.
Option A: Fix Kotlin side
```diff
 @JvmStatic
-private external fun nativeDetectVulkanGPU(): GPUInfo
+private external fun detectVulkanGPU(): GPUInfo

 @JvmStatic
-private external fun nativeIsVulkanSupported(): Boolean
+private external fun isVulkanSupported(): Boolean

 @JvmStatic
-private external fun nativeListVulkanDevices(): List<GPUInfo>
+private external fun listVulkanDevices(): List<GPUInfo>
```

Then update calling sites (lines 52, 83, 103) accordingly.

Option B: Fix C++ side (device_jni.cpp)

```diff
-Java_com_runanywhere_sdk_platform_DeviceCapabilities_detectVulkanGPU(
+Java_com_runanywhere_sdk_platform_DeviceCapabilities_nativeDetectVulkanGPU(
-Java_com_runanywhere_sdk_platform_DeviceCapabilities_isVulkanSupported(
+Java_com_runanywhere_sdk_platform_DeviceCapabilities_nativeIsVulkanSupported(
-Java_com_runanywhere_sdk_platform_DeviceCapabilities_listVulkanDevices(
+Java_com_runanywhere_sdk_platform_DeviceCapabilities_nativeListVulkanDevices(
```

🤖 Prompt for AI Agents
In
`@sdk/runanywhere-kotlin/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/platform/DeviceCapabilities.kt`
around lines 143 - 158, The external Kotlin JNI names include a "native" prefix
that doesn't match the C++ JNI functions; rename the Kotlin externals
nativeDetectVulkanGPU, nativeIsVulkanSupported, and nativeListVulkanDevices to
detectVulkanGPU, isVulkanSupported, and listVulkanDevices respectively (and
update all call sites that invoke those symbols) so the mangled JNI names match
the C++ implementations; verify the Kotlin method signatures (return types and
parameters for GPUInfo and List<GPUInfo>) match the C++ JNI signatures after
renaming.
Resolved conflicts:
- CMakeLists.txt: Merged Vulkan C++20 requirement + vulkan_detector.cpp with upstream VLM/Diffusion features
- VERSIONS: Kept Vulkan-specific values (ANDROID_MIN_SDK=26, LLAMACPP_VERSION=b7935)
- Backend files: Accepted upstream changes (Vulkan logic in main CMakeLists, not backend)

All Vulkan files preserved:
- vulkan_detector.cpp/h
- device_jni.cpp
- GPUInfo.kt, DeviceCapabilities.kt
- FetchVulkanHpp.cmake
- Documentation and test scripts
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In `@sdk/runanywhere-commons/CMakeLists.txt`:
- Around line 41-42: The CMakeLists currently sets a global C++20 via
CMAKE_CXX_STANDARD while also declaring target_compile_features(rac_commons
PUBLIC cxx_std_17), causing a standard mismatch and unnecessarily forcing C++20
for all consumers; change the global standard back to C++17 (ensure
CMAKE_CXX_STANDARD is 17) and remove/align any conflicting public cxx_std
settings on rac_commons, then scope C++20 only to the Vulkan-specific targets
(e.g., the Vulkan-Hpp or llamacpp/vulkan backend target) by adding
target_compile_features(<vulkan_target> PRIVATE cxx_std_20) or guarding that
target compile feature behind the Android/Vulkan build option so other platforms
remain C++17.
- Line 165: The new vulkan_detector.cpp should not be unconditionally added to
RAC_INFRASTRUCTURE_SOURCES; wrap its inclusion in a CMake conditional that
checks the RAC_BUILD_VULKAN option and restricts platforms to those that support
Vulkan (e.g., non-Apple UNIX/Windows). Update the CMakeLists.txt so that
vulkan_detector.cpp is appended to RAC_INFRASTRUCTURE_SOURCES only when
RAC_BUILD_VULKAN is enabled and the target platform permits Vulkan (use a
condition like if(RAC_BUILD_VULKAN AND (WIN32 OR (UNIX AND NOT APPLE))) before
adding the file).
In `@sdk/runanywhere-commons/VERSIONS`:
- Around line 72-75: Update the release date in the VERSIONS comment: change the
text "LATEST release (Feb 4, 2025)" to "LATEST release (Feb 4, 2026)" in the
block that documents LLAMACPP_VERSION=b7935 so the comment accurately reflects
the release date.
- Around line 25-26: The ANDROID_MIN_SDK change in VERSIONS raises the native
C++ build minimum to 26 but Gradle APKs still hardcode minSdk=24, creating an
inconsistency; to fix, either (A) introduce a separate variable
ANDROID_MIN_SDK_VULKAN=26 in VERSIONS and update
build-android.sh/load-versions.sh and the CMake invocation to use
ANDROID_MIN_SDK_VULKAN for Vulkan/native builds only (keeping ANDROID_MIN_SDK
for general use), or (B) update all build.gradle and build.gradle.kts modules to
set minSdk to 26 so Gradle and native builds match; alternatively, if you intend
to keep the mismatch, add a clear doc comment in VERSIONS explaining the runtime
Vulkan detection strategy and why native libs target API26 while APKs remain 24.
# C++20 required for Vulkan-Hpp and modern llama.cpp features
set(CMAKE_CXX_STANDARD 20)
C++ standard contradiction and overly broad C++20 requirement.
Two issues here:
1. Line 42 sets `CMAKE_CXX_STANDARD 20` but line 283 still declares `target_compile_features(rac_commons PUBLIC cxx_std_17)`. These should be consistent: the `PUBLIC` compile feature propagates to all consumers of `rac_commons`, so the mismatch means downstream targets may build at C++17 while this library expects C++20.
2. Bumping the entire project from C++17 to C++20 for an experimental Vulkan feature affects all platforms (iOS, macOS, Linux), not just Android. If only Vulkan-Hpp needs C++20, scope this to the Vulkan-related targets or guard it behind the Android/Vulkan build path.
Proposed fix — scope C++20 to Vulkan builds only
-# C++20 required for Vulkan-Hpp and modern llama.cpp features
-set(CMAKE_CXX_STANDARD 20)
+# C++17 standard required
+set(CMAKE_CXX_STANDARD 17)

Then, in the Vulkan-specific target or the llamacpp backend CMakeLists.txt that actually uses Vulkan-Hpp:

target_compile_features(vulkan_related_target PRIVATE cxx_std_20)

Based on learnings: "C++17 standard required" applies to `sdk/runanywhere-commons/**/*.{cpp,hpp}`.

Also applies to: 283-283
🤖 Prompt for AI Agents
In `@sdk/runanywhere-commons/CMakeLists.txt` around lines 41 - 42, The CMakeLists
currently sets a global C++20 via CMAKE_CXX_STANDARD while also declaring
target_compile_features(rac_commons PUBLIC cxx_std_17), causing a standard
mismatch and unnecessarily forcing C++20 for all consumers; change the global
standard back to C++17 (ensure CMAKE_CXX_STANDARD is 17) and remove/align any
conflicting public cxx_std settings on rac_commons, then scope C++20 only to the
Vulkan-specific targets (e.g., the Vulkan-Hpp or llamacpp/vulkan backend target)
by adding target_compile_features(<vulkan_target> PRIVATE cxx_std_20) or
guarding that target compile feature behind the Android/Vulkan build option so
other platforms remain C++17.
src/infrastructure/telemetry/telemetry_json.cpp
src/infrastructure/telemetry/telemetry_manager.cpp
src/infrastructure/device/rac_device_manager.cpp
src/infrastructure/device/vulkan_detector.cpp
Vulkan detector is unconditionally compiled on all platforms — should be guarded.
vulkan_detector.cpp is added to RAC_INFRASTRUCTURE_SOURCES without any platform or feature guard. This means it compiles on iOS, macOS, and Linux builds where Vulkan detection is irrelevant (or may fail to compile if it includes Vulkan headers). Per coding guidelines, optional backends must use RAC_BUILD_* options.
Proposed fix — guard with platform and option check
src/infrastructure/device/rac_device_manager.cpp
- src/infrastructure/device/vulkan_detector.cpp
)
+
+# Vulkan detector - Android only (experimental)
+if(RAC_PLATFORM_ANDROID)
+ list(APPEND RAC_INFRASTRUCTURE_SOURCES
+ src/infrastructure/device/vulkan_detector.cpp
+ )
+endif()

As per coding guidelines, CMakeLists.txt must use RAC_BUILD_* options for optional backends and provide clear dependency management.
🤖 Prompt for AI Agents
In `@sdk/runanywhere-commons/CMakeLists.txt` at line 165, The new
vulkan_detector.cpp should not be unconditionally added to
RAC_INFRASTRUCTURE_SOURCES; wrap its inclusion in a CMake conditional that
checks the RAC_BUILD_VULKAN option and restricts platforms to those that support
Vulkan (e.g., non-Apple UNIX/Windows). Update the CMakeLists.txt so that
vulkan_detector.cpp is appended to RAC_INFRASTRUCTURE_SOURCES only when
RAC_BUILD_VULKAN is enabled and the target platform permits Vulkan (use a
condition like if(RAC_BUILD_VULKAN AND (WIN32 OR (UNIX AND NOT APPLE))) before
adding the file).
# Android minimum SDK version (API 26 = Android 8.0 for Vulkan 1.1 support)
ANDROID_MIN_SDK=26
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Find all files that reference ANDROID_MIN_SDK or minSdk
echo "=== Files referencing ANDROID_MIN_SDK ==="
rg 'ANDROID_MIN_SDK' --type-add 'gradle:*.gradle.kts' --type-add 'gradle2:*.gradle' -n
echo ""
echo "=== Files referencing minSdk in gradle/cmake/shell ==="
rg 'minSdk' -n -g '*.gradle.kts' -g '*.gradle' -g 'CMakeLists.txt' -g '*.cmake' -g '*.sh'
echo ""
echo "=== How VERSIONS file is sourced (look for 'source' or 'include') ==="
rg 'VERSIONS|\.\/VERSIONS|source.*VERSIONS' -n -g '*.gradle.kts' -g '*.gradle' -g 'CMakeLists.txt' -g '*.cmake' -g '*.sh'

Repository: RunanywhereAI/runanywhere-sdks
Length of output: 3123
🏁 Script executed:
#!/bin/bash
# Check how VERSIONS is sourced by build scripts
echo "=== load-versions.sh content (how it exports ANDROID_MIN_SDK) ==="
sed -n '1,70p' sdk/runanywhere-commons/scripts/load-versions.sh
echo ""
echo "=== build-android.sh usage of ANDROID_MIN_SDK ==="
sed -n '200,210p' sdk/runanywhere-commons/scripts/build-android.sh
echo ""
echo "=== Check if gradle files have any property sources ==="
rg -A 2 -B 2 'minSdk' sdk/runanywhere-kotlin/build.gradle.kts | head -20

Repository: RunanywhereAI/runanywhere-sdks
Length of output: 3423
🏁 Script executed:
#!/bin/bash
# Check if there's a gradle.properties or other config that might link to VERSIONS
echo "=== Look for gradle.properties or version sourcing ==="
find . -name "gradle.properties" -o -name "local.properties" | head -10
echo ""
echo "=== Check VULKAN_ANDROID_CONTRIBUTION.md for context on the change ==="
sed -n '240,260p' VULKAN_ANDROID_CONTRIBUTION.md
echo ""
echo "=== Check if build-android.sh is the primary Android build method ==="
sed -n '1,50p' sdk/runanywhere-commons/scripts/build-android.sh | head -30

Repository: RunanywhereAI/runanywhere-sdks
Length of output: 2518
🏁 Script executed:
#!/bin/bash
# Check gradle.properties content for any VERSIONS sourcing
echo "=== gradle.properties files ==="
cat sdk/runanywhere-kotlin/gradle.properties
echo ""
echo "=== Check main gradle.properties ==="
cat gradle.properties
echo ""
echo "=== Verify: does build-android.sh call gradle? ==="
grep -n "gradle\|gradlew" sdk/runanywhere-commons/scripts/build-android.sh | head -20

Repository: RunanywhereAI/runanywhere-sdks
Length of output: 2893
🏁 Script executed:
#!/bin/bash
# Check what build-android.sh actually builds
echo "=== build-android.sh: what does it build? ==="
sed -n '20,100p' sdk/runanywhere-commons/scripts/build-android.sh
echo ""
echo "=== Check CMakeLists.txt for minSdk enforcement ==="
head -50 sdk/runanywhere-commons/CMakeLists.txt
echo ""
echo "=== Search for API level checks related to Vulkan ==="
rg 'API.*26|MIN.*26|minSdk.*26' --type cmake --type cpp -n -B 2 -A 2 | head -30

Repository: RunanywhereAI/runanywhere-sdks
Length of output: 4621
🏁 Script executed:
#!/bin/bash
# Check Vulkan requirements and how it's integrated
echo "=== Check for Vulkan API level requirements ==="
rg -i 'vulkan.*api.*2[6-9]|vulkan.*level|vulkan.*minSdk' -n -B 2 -A 2 | head -40
echo ""
echo "=== Search for runtime checks on API level for Vulkan ==="
rg -i 'vulkan.*build.*sdk|sdk.*build.*vulkan|dynamic.*vulkan|runtime.*vulkan' sdk/runanywhere-commons/src -n -B 1 -A 1 | head -30
echo ""
echo "=== Check VERSIONS file full content around ANDROID_MIN_SDK ==="
sed -n '20,85p' sdk/runanywhere-commons/VERSIONS

Repository: RunanywhereAI/runanywhere-sdks
Length of output: 5164
ANDROID_MIN_SDK=26 affects C++ native library builds globally, creating build system inconsistency.
This change raises the minimum API level for all C++ native code compiled via build-android.sh (which loads ANDROID_MIN_SDK from VERSIONS via load-versions.sh and passes it to CMake). However, Gradle-based APK builds hardcode minSdk=24 across all modules and do not source this variable. This creates a mismatch: APKs can still target API 24-25 devices, but their bundled native libraries are compiled assuming API 26+ features (Vulkan 1.1). This could cause crashes on older devices if they attempt to use Vulkan features at runtime.
Consider one of:
- Decouple Vulkan from global minSdk: Add `ANDROID_MIN_SDK_VULKAN=26` and conditionally apply it only to Vulkan-specific code paths or builds.
- Update Gradle declarations: If API 26 is required globally, update all `build.gradle.kts` and `build.gradle` files to match `ANDROID_MIN_SDK=26` for consistency.
- Document the decision: If this mismatch is intentional (relying on runtime Vulkan feature detection), explicitly document why native code targets 26 while APKs target 24.
🤖 Prompt for AI Agents
In `@sdk/runanywhere-commons/VERSIONS` around lines 25 - 26, The ANDROID_MIN_SDK
change in VERSIONS raises the native C++ build minimum to 26 but Gradle APKs
still hardcode minSdk=24, creating an inconsistency; to fix, either (A)
introduce a separate variable ANDROID_MIN_SDK_VULKAN=26 in VERSIONS and update
build-android.sh/load-versions.sh and the CMake invocation to use
ANDROID_MIN_SDK_VULKAN for Vulkan/native builds only (keeping ANDROID_MIN_SDK
for general use), or (B) update all build.gradle and build.gradle.kts modules to
set minSdk to 26 so Gradle and native builds match; alternatively, if you intend
to keep the mismatch, add a clear doc comment in VERSIONS explaining the runtime
Vulkan detection strategy and why native libs target API26 while APKs remain 24.
# b7935 - LATEST release (Feb 4, 2025) with full Vulkan support
# Includes arr_dmmv shaders and proper CoopMat handling
# Previous: b7650 (missing modern Vulkan shaders)
LLAMACPP_VERSION=b7935
🧩 Analysis chain
🌐 Web query:
llama.cpp b7935 release date February 2025
💡 Result:
llama.cpp tag b7935 was released on February 4, 2026 (GitHub shows “released this 04 Feb 15:37”). It is not a February 2025 release. [1][2]
Fix the release date typo: change "Feb 4, 2025" to "Feb 4, 2026".
The version bump to b7935 is correct. However, the comment states "LATEST release (Feb 4, 2025)" but the release was actually February 4, 2026. Update the year in the comment to reflect the accurate release date.
🤖 Prompt for AI Agents
In `@sdk/runanywhere-commons/VERSIONS` around lines 72 - 75, Update the release
date in the VERSIONS comment: change the text "LATEST release (Feb 4, 2025)" to
"LATEST release (Feb 4, 2026)" in the block that documents
LLAMACPP_VERSION=b7935 so the comment accurately reflects the release date.
device_info.pQueueCreateInfos = &queue_info;

VkDevice device = VK_NULL_HANDLE;
result = vkCreateDevice(devices[0], &device_info, nullptr, &device);
vkCreateDevice result is checked but potentially crashes before the check. If the driver crashes during device creation, the process is already dead and _exit(result == VK_SUCCESS ? 0 : 1) will never execute.
The fork approach isolates crashes correctly, but this logic assumes the function returns rather than crashing.
Prompt To Fix With AI
This is a comment left during a code review.
Path: sdk/runanywhere-commons/src/infrastructure/device/vulkan_detector.cpp
Line: 81:81
Comment:
`vkCreateDevice` result is checked but potentially crashes before the check. If the driver crashes during device creation, the process is already dead and `_exit(result == VK_SUCCESS ? 0 : 1)` will never execute.
The fork approach isolates crashes correctly, but this logic assumes the function returns rather than crashing.
How can I resolve this? If you propose a fix, please make it concise.| handle->backend = std::make_unique<runanywhere::LlamaCppBackend>(); | ||
| handle->backend->disable_gpu(); |
Creating a new backend instance doesn't call llama_backend_free() on the failed instance. If llama_backend_init() partially succeeded before the exception, Vulkan resources may leak.
handle->backend = std::make_unique<runanywhere::LlamaCppBackend>();
handle->backend->disable_gpu();
handle->backend.reset(); // Calls destructor which calls llama_backend_free()
handle->backend = std::make_unique<runanywhere::LlamaCppBackend>();
Prompt To Fix With AI
This is a comment left during a code review.
Path: sdk/runanywhere-commons/src/backends/llamacpp/rac_llm_llamacpp.cpp
Line: 78:79
Comment:
Creating a new backend instance doesn't call `llama_backend_free()` on the failed instance. If `llama_backend_init()` partially succeeded before the exception, Vulkan resources may leak.
```suggestion
handle->backend.reset(); // Calls destructor which calls llama_backend_free()
handle->backend = std::make_unique<runanywhere::LlamaCppBackend>();
```
How can I resolve this? If you propose a fix, please make it concise.| * Check if this GPU is suitable for ML inference | ||
| */ | ||
| fun isSuitableForML(): Boolean { | ||
| return isAvailable && supportsCompute && maxMemoryMB >= 512 |
Minimum VRAM requirement of 512MB may be too low for 7B+ models mentioned in PR. Based on llamacpp_backend.cpp:306, 7B models need ~6GB GPU memory for 2048 context. Consider raising threshold or adding model-size-aware checks.
Prompt To Fix With AI
This is a comment left during a code review.
Path: sdk/runanywhere-kotlin/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/platform/GPUInfo.kt
Line: 25:25
Comment:
Minimum VRAM requirement of 512MB may be too low for 7B+ models mentioned in PR. Based on `llamacpp_backend.cpp:306`, 7B models need ~6GB GPU memory for 2048 context. Consider raising threshold or adding model-size-aware checks.
How can I resolve this? If you propose a fix, please make it concise.| // CRITICAL: Disable Vulkan IMMEDIATELY on library load | ||
| // This prevents crash in llama_backend_init() before our code runs | ||
| __android_log_print(ANDROID_LOG_WARN, "RAC_GPU_STATUS", "JNI_OnLoad: Disabling Vulkan to prevent crash"); | ||
| setenv("GGML_VK_DISABLE", "1", 1); |
setenv("GGML_VK_DISABLE", "1", 1) globally disables Vulkan before any detection runs. Combined with GGML_VULKAN OFF in CMake, this creates triple redundancy. If the goal is experimental Vulkan support, at least one of these should be conditional (e.g., based on a runtime flag or device whitelist).
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Prompt To Fix With AI
This is a comment left during a code review.
Path: sdk/runanywhere-commons/src/backends/llamacpp/jni/rac_backend_llamacpp_jni.cpp
Line: 56:56
Comment:
`setenv("GGML_VK_DISABLE", "1", 1)` globally disables Vulkan before any detection runs. Combined with `GGML_VULKAN OFF` in CMake, this creates triple redundancy. If the goal is experimental Vulkan support, at least one of these should be conditional (e.g., based on a runtime flag or device whitelist).
<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>
How can I resolve this? If you propose a fix, please make it concise.| /** | ||
| * JNI Bridge for Device Capabilities (GPU Detection) | ||
| * | ||
| * Exposes Vulkan GPU detection to Kotlin/Java layer | ||
| */ | ||
|
|
||
| #include <jni.h> | ||
| #include "rac/infrastructure/device/vulkan_detector.h" | ||
| #include "rac/core/rac_logger.h" | ||
|
|
||
| extern "C" { | ||
|
|
||
| /** | ||
| * Detect Vulkan GPU and return GPUInfo object | ||
| * | ||
| * Java signature: | ||
| * com.runanywhere.sdk.platform.DeviceCapabilities.detectVulkanGPU() -> GPUInfo | ||
| */ | ||
| JNIEXPORT jobject JNICALL | ||
| Java_com_runanywhere_sdk_platform_DeviceCapabilities_detectVulkanGPU( | ||
| JNIEnv* env, jclass /* clazz */) { | ||
|
|
||
| RAC_LOG_INFO("DeviceJNI", "Detecting Vulkan GPU from JNI..."); | ||
|
|
||
| // Detect GPU using C++ layer | ||
| auto info = runanywhere::VulkanDetector::detect(); | ||
|
|
||
| // Find GPUInfo class | ||
| jclass gpu_info_class = env->FindClass("com/runanywhere/sdk/platform/GPUInfo"); | ||
| if (gpu_info_class == nullptr) { | ||
| RAC_LOG_ERROR("DeviceJNI", "Failed to find GPUInfo class"); | ||
| return nullptr; | ||
| } | ||
|
|
||
| // Find constructor: GPUInfo(boolean, String, String, String, long, boolean) | ||
| jmethodID constructor = env->GetMethodID( | ||
| gpu_info_class, | ||
| "<init>", | ||
| "(ZLjava/lang/String;Ljava/lang/String;Ljava/lang/String;JZ)V" | ||
| ); | ||
|
|
||
| if (constructor == nullptr) { | ||
| RAC_LOG_ERROR("DeviceJNI", "Failed to find GPUInfo constructor"); | ||
| return nullptr; | ||
| } | ||
|
|
||
| // Convert C++ strings to Java strings | ||
| jstring device_name = env->NewStringUTF(info.device_name.c_str()); | ||
| jstring driver_version = env->NewStringUTF(info.driver_version.c_str()); | ||
| jstring api_version = env->NewStringUTF( | ||
| runanywhere::VulkanDetector::get_version_string(info.api_version).c_str() | ||
| ); | ||
|
|
||
| // Create GPUInfo object | ||
| jobject gpu_info = env->NewObject( | ||
| gpu_info_class, | ||
| constructor, | ||
| (jboolean)info.is_available, | ||
| device_name, | ||
| driver_version, | ||
| api_version, | ||
| (jlong)info.max_memory_mb, | ||
| (jboolean)info.supports_compute | ||
| ); | ||
|
|
||
| // Clean up local references | ||
| env->DeleteLocalRef(device_name); | ||
| env->DeleteLocalRef(driver_version); | ||
| env->DeleteLocalRef(api_version); | ||
| env->DeleteLocalRef(gpu_info_class); | ||
|
|
||
| RAC_LOG_INFO("DeviceJNI", "GPU detection complete: available=%d, device=%s", | ||
| info.is_available, info.device_name.c_str()); | ||
|
|
||
| return gpu_info; | ||
| } | ||
|
|
||
| /** | ||
| * Quick check if Vulkan is supported | ||
| * | ||
| * Java signature: | ||
| * com.runanywhere.sdk.platform.DeviceCapabilities.isVulkanSupported() -> boolean | ||
| */ | ||
| JNIEXPORT jboolean JNICALL | ||
| Java_com_runanywhere_sdk_platform_DeviceCapabilities_isVulkanSupported( | ||
| JNIEnv* env, jclass /* clazz */) { | ||
|
|
||
| bool supported = runanywhere::VulkanDetector::is_vulkan_supported(); | ||
| RAC_LOG_INFO("DeviceJNI", "Vulkan supported: %s", supported ? "YES" : "NO"); | ||
| return (jboolean)supported; | ||
| } | ||
|
|
||
| /** | ||
| * List all available Vulkan devices | ||
| * | ||
| * Java signature: | ||
| * com.runanywhere.sdk.platform.DeviceCapabilities.listVulkanDevices() -> List<GPUInfo> | ||
| */ | ||
| JNIEXPORT jobject JNICALL | ||
| Java_com_runanywhere_sdk_platform_DeviceCapabilities_listVulkanDevices( | ||
| JNIEnv* env, jclass /* clazz */) { | ||
|
|
||
| RAC_LOG_INFO("DeviceJNI", "Listing all Vulkan devices..."); | ||
|
|
||
| auto devices = runanywhere::VulkanDetector::list_devices(); | ||
|
|
||
| // Find ArrayList class | ||
| jclass array_list_class = env->FindClass("java/util/ArrayList"); | ||
| if (array_list_class == nullptr) { | ||
| RAC_LOG_ERROR("DeviceJNI", "Failed to find ArrayList class"); | ||
| return nullptr; | ||
| } | ||
|
|
||
| jmethodID array_list_constructor = env->GetMethodID(array_list_class, "<init>", "()V"); | ||
| jmethodID array_list_add = env->GetMethodID(array_list_class, "add", "(Ljava/lang/Object;)Z"); | ||
|
|
||
| // Create ArrayList | ||
| jobject device_list = env->NewObject(array_list_class, array_list_constructor); | ||
|
|
||
| // Find GPUInfo class | ||
| jclass gpu_info_class = env->FindClass("com/runanywhere/sdk/platform/GPUInfo"); | ||
| jmethodID gpu_info_constructor = env->GetMethodID( | ||
| gpu_info_class, | ||
| "<init>", | ||
| "(ZLjava/lang/String;Ljava/lang/String;Ljava/lang/String;JZ)V" | ||
| ); | ||
|
|
||
| // Add each device to list | ||
| for (const auto& info : devices) { | ||
| jstring device_name = env->NewStringUTF(info.device_name.c_str()); | ||
| jstring driver_version = env->NewStringUTF(info.driver_version.c_str()); | ||
| jstring api_version = env->NewStringUTF( | ||
| runanywhere::VulkanDetector::get_version_string(info.api_version).c_str() | ||
| ); | ||
|
|
||
| jobject gpu_info = env->NewObject( | ||
| gpu_info_class, | ||
| gpu_info_constructor, | ||
| (jboolean)info.is_available, | ||
| device_name, | ||
| driver_version, | ||
| api_version, | ||
| (jlong)info.max_memory_mb, | ||
| (jboolean)info.supports_compute | ||
| ); | ||
|
|
||
| env->CallBooleanMethod(device_list, array_list_add, gpu_info); | ||
|
|
||
| env->DeleteLocalRef(device_name); | ||
| env->DeleteLocalRef(driver_version); | ||
| env->DeleteLocalRef(api_version); | ||
| env->DeleteLocalRef(gpu_info); | ||
| } | ||
|
|
||
| env->DeleteLocalRef(array_list_class); | ||
| env->DeleteLocalRef(gpu_info_class); | ||
|
|
||
| RAC_LOG_INFO("DeviceJNI", "Found %zu Vulkan device(s)", devices.size()); | ||
|
|
||
| return device_list; | ||
| } | ||
|
|
||
| } // extern "C" |
New JNI bridge for Vulkan detection is well-structured, but will always return unavailable since GGML_USE_VULKAN is undefined (due to GGML_VULKAN OFF in CMake). All native methods will return empty/unavailable data.
Prompt To Fix With AI
This is a comment left during a code review.
Path: sdk/runanywhere-commons/src/jni/device_jni.cpp
Line: 1:163
Comment:
New JNI bridge for Vulkan detection is well-structured, but will always return unavailable since `GGML_USE_VULKAN` is undefined (due to `GGML_VULKAN OFF` in CMake). All native methods will return empty/unavailable data.
How can I resolve this? If you propose a fix, please make it concise.| // Cache GPU info to avoid repeated JNI calls | ||
| private var cachedGPUInfo: GPUInfo? = null | ||
| private var cacheInitialized = false |
GPU info cache never expires. If device state changes (driver update, GPU mode change), cached data will be stale. Consider adding cache TTL or listening to system events for cache invalidation.
Prompt To Fix With AI
This is a comment left during a code review.
Path: sdk/runanywhere-kotlin/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/platform/DeviceCapabilities.kt
Line: 32:34
Comment:
GPU info cache never expires. If device state changes (driver update, GPU mode change), cached data will be stale. Consider adding cache TTL or listening to system events for cache invalidation.
How can I resolve this? If you propose a fix, please make it concise.
Additional Comments (2)
Prompt To Fix With AI

This is a comment left during a code review.
Path: sdk/runanywhere-commons/src/backends/llamacpp/CMakeLists.txt
Line: 45:45
Comment:
Vulkan is explicitly disabled in CMake for Android (line 45), but the PR description claims this adds Vulkan GPU acceleration support for Android. This is contradictory - with `GGML_VULKAN OFF`, no Vulkan code will be compiled into the library.
```suggestion
set(GGML_VULKAN ON CACHE BOOL "" FORCE)
```
How can I resolve this? If you propose a fix, please make it concise.
Prompt To Fix With AI

This is a comment left during a code review.
Path: sdk/runanywhere-commons/include/rac/backends/rac_llm_llamacpp.h
Line: 1:436
Comment:
Entire header file was reformatted with 436 lines changed, making it difficult to identify actual Vulkan-related changes vs whitespace/formatting changes. Formatting changes should be in separate commits from functional changes.
How can I resolve this? If you propose a fix, please make it concise. |
Thanks for the investigation into Vulkan GPU acceleration, @MdTabish24. The build verification work and honest documentation of failures are appreciated. However, after thorough analysis, we're rejecting this PR due to fundamental issues with the integration approach — not just the runtime crashes.

Why the crashes happen

The core issue isn't that "Vulkan doesn't work on Android." Vulkan compute does work at production scale on Android — Tencent's ncnn (used in WeChat/QQ on billions of devices) and Alibaba's MNN (used in Taobao/Alipay) both use Vulkan compute for ML inference successfully on the same Mali and Adreno GPUs that crash here. The problem is how this PR integrates Vulkan:

1. Static initialization causes unrecoverable crashes

llama.cpp registers the Vulkan backend statically at library load time via The crash happens before

2. No device-specific driver workarounds

Frameworks that successfully use Vulkan on Android (ncnn, MNN) maintain extensive per-device bug workaround databases. For example, ncnn's

This PR has zero driver workarounds. It's essentially "enable

3. No dynamic loading

ncnn uses a custom Vulkan loader (

4. Dead/contradictory code

5. llama.cpp's Vulkan backend isn't optimized for mobile

Even where it doesn't crash, llama.cpp's Vulkan backend is ~15x slower than CPU on Mali GPUs (discussion #9464). On Adreno 732, it produces gibberish output (issue #16881). Qualcomm themselves built a separate OpenCL backend for llama.cpp rather than fixing Vulkan on Adreno — that tells you where things stand upstream.

What a proper integration would require

For reference, a production-ready Vulkan integration would need:
@MdTabish24 update: Sorry, I didn't read the previous comment before posting this. I was facing major issues with GPU on device, so I thought I'd update you all.
Summary
Adds Vulkan GPU acceleration to RunAnywhere Android SDK for potential 2-3x performance improvement. Build is verified and functional, but runtime testing reveals driver compatibility issues on most Android devices.
Motivation
Changes Made
Build System
- `GGML_VULKAN=ON` in CMake for Android

C++ Implementation
- `vulkan_detector.h/cpp` - GPU detection and capability checking
- `device_jni.cpp` - JNI bridge exposing GPU info to Kotlin
- `llamacpp_backend.cpp/h` - GPU layer offloading with CPU fallback
- `rac_llm_llamacpp.cpp` - Exception handling for Vulkan crashes
- `rac_backend_llamacpp_jni.cpp` - Disables Vulkan in JNI_OnLoad to prevent crashes

Kotlin SDK
- `DeviceCapabilities.kt` - GPU detection API
- `GPUInfo.kt` - GPU information data class
- `RunAnywhere+DeviceInfo.kt` - Public API for device info
- `RunAnywhere+ModelManagement.kt` - ModelLoadOptions with GPU parameters
- `CppBridgeLLM.kt` - Bridge for GPU layer configuration

Scripts & Documentation
- `verify-vulkan-build.sh` - Static verification (no device needed)
- `test-vulkan-on-device.sh` - Device testing script
- `compile-all-vulkan-shaders.sh` - Shader compilation
- `generate-shader-aggregates.py` - Shader data aggregation
- `VULKAN_ANDROID_CONTRIBUTION.md` - Complete implementation guide
- `VULKAN_TEST_RESULTS.md` - Detailed test results

Build Verification ✅
Static analysis confirms successful integration:
Environment:
Runtime Testing ⚠️
Tested on 2 physical devices:
Device 1: Redmi Note 10S (Mali-G76 MC4)
Device 2: Redmi Note 12 5G (Adreno 619)
Community Evidence:
Known Issues
Testing Performed
Recommendation
DO NOT MERGE for production use.
This PR is submitted for:
Suggested action:
Alternative Approach
For production, recommend:
cmake -DGGML_VULKAN=OFF ...  # CPU-only build

Benefits:
Files Changed
New Files (14):
- `VULKAN_ANDROID_CONTRIBUTION.md`, `VULKAN_TEST_RESULTS.md`
- `verify-vulkan-build.sh`, `test-vulkan-on-device.sh`, `compile-all-vulkan-shaders.sh`, `generate-shader-aggregates.py`
- `vulkan_detector.h/cpp`, `device_jni.cpp`, `patch_vulkan.py`, `FetchVulkanHpp.cmake`
- `DeviceCapabilities.kt`, `GPUInfo.kt`, `RunAnywhere+DeviceInfo.kt`

Modified Files (16):
- `VERSIONS`, `CMakeLists.txt` (3 files)
- `llamacpp_backend.cpp/h`, `rac_llm_llamacpp.cpp/h`, `rac_backend_llamacpp_jni.cpp`
- `RunAnywhere+ModelManagement.kt`, `CppBridgeLLM.kt`, `CppBridgeDevice.kt`, `DeviceInfoModels.kt`, `Capability.kt`

Checklist
Related Issues
N/A - Experimental feature exploration
Screenshots/Logs
See `VULKAN_TEST_RESULTS.md` for:
Note: This is an experimental feature that demonstrates technical feasibility but is not production-ready due to Android ecosystem limitations.
Summary by CodeRabbit
Release Notes
New Features
Documentation
Greptile Overview
Greptile Summary
This PR adds comprehensive Vulkan GPU acceleration infrastructure for Android, including detection APIs, exception handling, and fallback mechanisms. However, Vulkan is completely disabled in production through triple-redundant safety measures.
Key Changes
Build System:
- `GGML_VULKAN OFF` for Android (contradicts PR purpose)

C++ Implementation:
- `GGML_VK_DISABLE=1` globally on library load

Kotlin SDK:
- `RunAnywhere.getDeviceInfo()` with GPU information
- `DeviceCapabilities` object for Vulkan detection with result caching
- `GPUInfo` data class with ML suitability checks
- `gpuLayers` parameter (unused since GPU disabled)

Actual Runtime Behavior
Despite the extensive GPU code, Vulkan is never used due to:
- `GGML_VULKAN OFF` means no Vulkan code compiled into library
- `GGML_VK_DISABLE=1` before any initialization
- `n_gpu_layers=0` on Android (llamacpp_backend.cpp:232-235)

The GPU detection APIs will always return unavailable since `GGML_USE_VULKAN` is undefined.

Code Quality Issues
- `CppBridgeDevice.kt` (2,564 lines changed) and `rac_llm_llamacpp.h` (436 lines) are nearly impossible to review due to whitespace/formatting changes mixed with functional changes.
- `llama_backend_free()` on failed instance
The author correctly labels this as experimental/documentation-only and explicitly recommends DO NOT MERGE. The PR honestly documents:
Recommendation
This PR serves its stated documentary purpose well. The code quality issues (reformatting, resource leaks, redundant disables) should be addressed if this ever becomes production-ready, but for an explicitly non-production experimental branch, the comprehensive documentation and honest assessment of limitations are valuable contributions to the project's knowledge base.
Confidence Score: 2/5
`sdk/runanywhere-commons/src/backends/llamacpp/CMakeLists.txt` (Vulkan explicitly disabled contradicting PR purpose), `rac_llm_llamacpp.cpp` (resource leak in exception handling), `vulkan_detector.cpp` (fork safety concerns), and the heavily reformatted files that obscure actual changes

Important Files Changed
- Sets `GGML_VK_DISABLE=1` in JNI_OnLoad to prevent Vulkan crashes. Creates triple-redundancy with CMake and backend disables. Code works but architecture is redundant.
- Detection always returns unavailable since `GGML_USE_VULKAN` is undefined.
- Vulkan explicitly disabled via `GGML_VULKAN OFF`. This directly contradicts the PR's stated purpose of adding Vulkan GPU acceleration for Android.

Sequence Diagram
sequenceDiagram
    participant App as Android App
    participant SDK as RunAnywhere SDK
    participant DevCap as DeviceCapabilities
    participant JNI as device_jni.cpp
    participant VkDet as vulkan_detector.cpp
    participant Backend as LlamaCppBackend
    participant GGML as llama.cpp/GGML

    Note over App,GGML: GPU Detection Flow
    App->>SDK: getDeviceInfo()
    SDK->>DevCap: detectVulkanGPU()
    DevCap->>JNI: nativeDetectVulkanGPU()
    JNI->>VkDet: VulkanDetector::detect()
    alt Android Platform
        VkDet->>VkDet: fork() child process
        VkDet->>VkDet: Child: vkCreateInstance()
        VkDet->>VkDet: Child: vkEnumeratePhysicalDevices()
        alt Driver Crash
            VkDet-->>VkDet: Child: SIGSEGV
            VkDet->>VkDet: Parent: waitpid() detects signal
            VkDet-->>JNI: GPUInfo (unavailable)
        else Success
            VkDet->>VkDet: Child: exit(0)
            VkDet-->>JNI: GPUInfo (available)
        end
    end
    JNI-->>DevCap: GPUInfo
    DevCap-->>SDK: DeviceInfo
    SDK-->>App: DeviceInfo with GPU

    Note over App,GGML: Model Load Flow (with GPU disabled)
    App->>SDK: loadModel(path, gpuLayers=24)
    SDK->>Backend: create(config)
    Note over Backend,GGML: JNI_OnLoad already set GGML_VK_DISABLE=1
    Backend->>GGML: llama_backend_init()
    Note over GGML: Vulkan disabled by env var
    GGML-->>Backend: initialized (CPU only)
    Backend->>Backend: disable_gpu() hardcoded on Android
    Backend->>GGML: llama_model_load_from_file(n_gpu_layers=0)
    GGML-->>Backend: model loaded (CPU only)
    Backend-->>SDK: handle
    SDK-->>App: Model ready

    Note over App,GGML: Actual State: GPU Never Used
    Note over App,GGML: CMake: GGML_VULKAN=OFF
    Note over App,GGML: JNI_OnLoad: GGML_VK_DISABLE=1
    Note over App,GGML: Backend: n_gpu_layers=0 (Android)