Skip to content

feat(android): add Vulkan GPU acceleration support (experimental)#348

Draft
MdTabish24 wants to merge 6 commits intoRunanywhereAI:mainfrom
MdTabish24:feature/vulkan-gpu-acceleration-experimental
Draft

feat(android): add Vulkan GPU acceleration support (experimental)#348
MdTabish24 wants to merge 6 commits intoRunanywhereAI:mainfrom
MdTabish24:feature/vulkan-gpu-acceleration-experimental

Conversation

@MdTabish24
Copy link

@MdTabish24 MdTabish24 commented Feb 11, 2026

Summary

Adds Vulkan GPU acceleration to RunAnywhere Android SDK for potential 2-3x performance improvement. Build is verified and functional, but runtime testing reveals driver compatibility issues on most Android devices.

Motivation

  • Modern Android devices have powerful GPUs that remain unused for LLM inference
  • Vulkan GPU acceleration could provide 2-3x speedup on compatible devices
  • Investigation needed to determine production viability

Changes Made

Build System

  • ✅ Enabled GGML_VULKAN=ON in CMake for Android
  • ✅ Integrated llama.cpp b7935 Vulkan backend
  • ✅ Compiled 1435 Vulkan compute shaders (147 source files)
  • ✅ Added Android API 26+ compatibility fixes
  • ✅ Created shader compilation scripts

C++ Implementation

  • vulkan_detector.h/cpp - GPU detection and capability checking
  • device_jni.cpp - JNI bridge exposing GPU info to Kotlin
  • llamacpp_backend.cpp/h - GPU layer offloading with CPU fallback
  • rac_llm_llamacpp.cpp - Exception handling for Vulkan crashes
  • rac_backend_llamacpp_jni.cpp - Disables Vulkan in JNI_OnLoad to prevent crashes

Kotlin SDK

  • DeviceCapabilities.kt - GPU detection API
  • GPUInfo.kt - GPU information data class
  • RunAnywhere+DeviceInfo.kt - Public API for device info
  • RunAnywhere+ModelManagement.kt - ModelLoadOptions with GPU parameters
  • CppBridgeLLM.kt - Bridge for GPU layer configuration

Scripts & Documentation

  • verify-vulkan-build.sh - Static verification (no device needed)
  • test-vulkan-on-device.sh - Device testing script
  • compile-all-vulkan-shaders.sh - Shader compilation
  • generate-shader-aggregates.py - Shader data aggregation
  • VULKAN_ANDROID_CONTRIBUTION.md - Complete implementation guide
  • VULKAN_TEST_RESULTS.md - Detailed test results

Build Verification ✅

Static analysis confirms successful integration:

$ bash scripts/verify-vulkan-build.sh

✅ Library size: 63 MB (includes shaders)
✅ Vulkan symbols: 141 found
✅ Compiled shaders: 1435 .spv files
✅ CMake: GGML_VULKAN=ON
✅ Build: VERIFIED

Environment:

  • Platform: Ubuntu 22.04
  • NDK: 27.0.12077973
  • llama.cpp: b7935 (Feb 4, 2025)
  • Target: Android arm64-v8a (API 26+)

Runtime Testing ⚠️

Tested on 2 physical devices:

Device 1: Redmi Note 10S (Mali-G76 MC4)

  • Result: ❌ Crash on launch
  • Cause: Vulkan driver memory allocation failure

Device 2: Redmi Note 12 5G (Adreno 619)

  • Result: ❌ Crash on launch
  • Cause: Same driver issue

Community Evidence:

Known Issues

  1. Crashes on 70-80% of devices - Mali GPUs and budget Adreno
  2. Incorrect output on working devices - Even high-end Adreno 732+
  3. Cannot disable at runtime - Static initialization in llama.cpp
  4. Driver compatibility - Mobile Vulkan drivers are unreliable

Testing Performed

  • ✅ Static analysis: All Vulkan symbols verified
  • ✅ Build verification: Successful compilation
  • ✅ Device testing: 2 devices tested (both failed)
  • ✅ Community research: Confirmed widespread issues
  • ❌ Production testing: Not recommended

Recommendation

DO NOT MERGE for production use.

This PR is submitted for:

  1. Documentation purposes - Shows Vulkan integration is possible
  2. Future reference - When mobile drivers improve
  3. Community discussion - Alternative approaches

Suggested action:

  • Keep as draft/experimental branch
  • Revisit when llama.cpp adds lazy Vulkan initialization
  • Test on 20+ devices to create whitelist

Alternative Approach

For production, recommend:

cmake -DGGML_VULKAN=OFF ...  # CPU-only build

Benefits:

  • ✅ Works on all devices
  • ✅ Reliable output
  • ✅ No crashes
  • ✅ Predictable performance

Files Changed

New Files (14):

  • Documentation: VULKAN_ANDROID_CONTRIBUTION.md, VULKAN_TEST_RESULTS.md
  • Scripts: verify-vulkan-build.sh, test-vulkan-on-device.sh, compile-all-vulkan-shaders.sh, generate-shader-aggregates.py
  • C++ Implementation: vulkan_detector.h/cpp, device_jni.cpp, patch_vulkan.py, FetchVulkanHpp.cmake
  • Kotlin SDK: DeviceCapabilities.kt, GPUInfo.kt, RunAnywhere+DeviceInfo.kt

Modified Files (16):

  • Build: VERSIONS, CMakeLists.txt (3 files)
  • C++ Backend: llamacpp_backend.cpp/h, rac_llm_llamacpp.cpp/h, rac_backend_llamacpp_jni.cpp
  • Kotlin SDK: RunAnywhere+ModelManagement.kt, CppBridgeLLM.kt, CppBridgeDevice.kt, DeviceInfoModels.kt, Capability.kt

Checklist

  • Code follows project style guidelines
  • Build verification completed
  • Runtime testing performed (2 devices)
  • Documentation updated
  • Test scripts provided
  • Known issues documented
  • Clear recommendation provided
  • CI passes (expected to pass build, not runtime)

Related Issues

N/A - Experimental feature exploration

Screenshots/Logs

See VULKAN_TEST_RESULTS.md for:

  • Crash logs with addr2line analysis
  • Device specifications
  • Community evidence
  • Verification commands

Questions for Reviewers

  1. Should we keep this as experimental branch for future use?
  2. Is there interest in device whitelist approach?
  3. Should we contribute Vulkan fixes upstream to llama.cpp?

Note: This is an experimental feature that demonstrates technical feasibility but is not production-ready due to Android ecosystem limitations.

Summary by CodeRabbit

Release Notes

  • New Features

    • Added experimental Vulkan GPU acceleration support for Android devices with compatible hardware.
    • New device information APIs to query GPU capabilities, system specs (CPU cores, RAM), and detect Vulkan support.
    • Added GPU state management with fallback to CPU-only mode if GPU initialization fails.
  • Documentation

    • Comprehensive Vulkan setup, testing, and known issues documentation.

Greptile Overview

Greptile Summary

This PR adds comprehensive Vulkan GPU acceleration infrastructure for Android, including detection APIs, exception handling, and fallback mechanisms. However, Vulkan is completely disabled in production through triple-redundant safety measures.

Key Changes

Build System:

  • Updated llama.cpp to b7199 with Vulkan support
  • Added Vulkan-Hpp C++ bindings
  • Upgraded to C++20 for Vulkan compatibility
  • Critical: CMake explicitly sets GGML_VULKAN OFF for Android (contradicts PR purpose)

C++ Implementation:

  • Fork-based Vulkan detection isolates driver crashes in child process
  • Exception handling with CPU fallback for Vulkan initialization failures
  • GPU layer configuration API added to backend
  • JNI bridge sets GGML_VK_DISABLE=1 globally on library load

Kotlin SDK:

  • New public API: RunAnywhere.getDeviceInfo() with GPU information
  • DeviceCapabilities object for Vulkan detection with result caching
  • GPUInfo data class with ML suitability checks
  • Model loading API accepts gpuLayers parameter (unused since GPU disabled)

Actual Runtime Behavior

Despite the extensive GPU code, Vulkan is never used due to:

  1. CMake: GGML_VULKAN OFF means no Vulkan code compiled into library
  2. JNI_OnLoad: Sets GGML_VK_DISABLE=1 before any initialization
  3. Backend: Hardcoded n_gpu_layers=0 on Android (llamacpp_backend.cpp:232-235)

The GPU detection APIs will always return unavailable since GGML_USE_VULKAN is undefined.

Code Quality Issues

  • Massive reformatting: 5,500+ lines changed with only ~600 lines of actual new code. Files like CppBridgeDevice.kt (2,564 lines changed) and rac_llm_llamacpp.h (436 lines) are nearly impossible to review due to whitespace/formatting changes mixed with functional changes.
  • Resource leak risk: Exception handlers recreate backend without calling llama_backend_free() on failed instance
  • Fork safety concerns: Fork-based detection in ART environment has known limitations and potential zombie process issues
  • Cache staleness: GPU info cached indefinitely without expiration or invalidation

Alignment with PR Description

The author correctly labels this as experimental/documentation-only and explicitly recommends DO NOT MERGE. The PR honestly documents:

  • 70-80% device crash rate on testing
  • Incorrect output even on devices that don't crash
  • Driver compatibility issues across Mali and Adreno GPUs
  • Submitted for future reference when mobile Vulkan drivers improve

Recommendation

This PR serves its stated documentary purpose well. The code quality issues (reformatting, resource leaks, redundant disables) should be addressed if this ever becomes production-ready, but for an explicitly non-production experimental branch, the comprehensive documentation and honest assessment of limitations are valuable contributions to the project's knowledge base.

Confidence Score: 2/5

  • This PR is safe to merge as experimental/documentation code since Vulkan is completely disabled in production, but has code quality issues that prevent production readiness
  • Score reflects the contradictory nature of the PR: extensive GPU infrastructure that's completely disabled, massive reformatting obscuring changes, resource leak risks in exception handlers, and fork-based detection with safety concerns. However, the author explicitly marks this as experimental/DO NOT MERGE for production, and Vulkan is disabled through multiple redundant mechanisms (CMake OFF, env var disable, hardcoded n_gpu_layers=0). The documentation is excellent and honest about limitations. Safe as documentary code, but needs significant work before production use.
  • Pay close attention to sdk/runanywhere-commons/src/backends/llamacpp/CMakeLists.txt (Vulkan explicitly disabled contradicting PR purpose), rac_llm_llamacpp.cpp (resource leak in exception handling), vulkan_detector.cpp (fork safety concerns), and the heavily reformatted files that obscure actual changes

Important Files Changed

Filename Overview
sdk/runanywhere-commons/include/rac/backends/rac_llm_llamacpp.h Massive reformatting (436 lines changed) obscures actual functional changes. Added GPU layer configuration but hard to verify correctness due to formatting noise.
sdk/runanywhere-commons/src/backends/llamacpp/jni/rac_backend_llamacpp_jni.cpp Sets GGML_VK_DISABLE=1 in JNI_OnLoad to prevent Vulkan crashes. Creates triple-redundancy with CMake and backend disables. Code works but architecture is redundant.
sdk/runanywhere-commons/src/backends/llamacpp/rac_llm_llamacpp.cpp Added exception handling for Vulkan crashes with CPU fallback. Potential resource leak when recreating backend without proper cleanup. Logic is complex and error-prone.
sdk/runanywhere-commons/src/infrastructure/device/vulkan_detector.cpp Fork-based Vulkan detection to isolate driver crashes. Creative but has resource leak risks and performance concerns. Won't be used since GGML_USE_VULKAN is undefined.
sdk/runanywhere-kotlin/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/foundation/bridge/extensions/CppBridgeDevice.kt Massive reformatting (2564 lines changed) makes review extremely difficult. GPU-related changes are small but impossible to verify without clean diff.
sdk/runanywhere-kotlin/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/platform/GPUInfo.kt GPU info data class with helper methods. 512MB VRAM threshold may be too low for intended 7B models which need ~6GB based on backend logic.
sdk/runanywhere-commons/src/backends/llamacpp/CMakeLists.txt Explicitly disables Vulkan for Android with GGML_VULKAN OFF. This directly contradicts the PR's stated purpose of adding Vulkan GPU acceleration for Android.

Sequence Diagram

sequenceDiagram
    participant App as Android App
    participant SDK as RunAnywhere SDK
    participant DevCap as DeviceCapabilities
    participant JNI as device_jni.cpp
    participant VkDet as vulkan_detector.cpp
    participant Backend as LlamaCppBackend
    participant GGML as llama.cpp/GGML

    Note over App,GGML: GPU Detection Flow
    App->>SDK: getDeviceInfo()
    SDK->>DevCap: detectVulkanGPU()
    DevCap->>JNI: nativeDetectVulkanGPU()
    JNI->>VkDet: VulkanDetector::detect()
    
    alt Android Platform
        VkDet->>VkDet: fork() child process
        VkDet->>VkDet: Child: vkCreateInstance()
        VkDet->>VkDet: Child: vkEnumeratePhysicalDevices()
        alt Driver Crash
            VkDet-->>VkDet: Child: SIGSEGV
            VkDet->>VkDet: Parent: waitpid() detects signal
            VkDet-->>JNI: GPUInfo (unavailable)
        else Success
            VkDet->>VkDet: Child: exit(0)
            VkDet-->>JNI: GPUInfo (available)
        end
    end
    
    JNI-->>DevCap: GPUInfo
    DevCap-->>SDK: DeviceInfo
    SDK-->>App: DeviceInfo with GPU

    Note over App,GGML: Model Load Flow (with GPU disabled)
    App->>SDK: loadModel(path, gpuLayers=24)
    SDK->>Backend: create(config)
    
    Note over Backend,GGML: JNI_OnLoad already set GGML_VK_DISABLE=1
    Backend->>GGML: llama_backend_init()
    Note over GGML: Vulkan disabled by env var
    GGML-->>Backend: initialized (CPU only)
    
    Backend->>Backend: disable_gpu() hardcoded on Android
    Backend->>GGML: llama_model_load_from_file(n_gpu_layers=0)
    GGML-->>Backend: model loaded (CPU only)
    Backend-->>SDK: handle
    SDK-->>App: Model ready

    Note over App,GGML: Actual State: GPU Never Used
    Note over App,GGML: CMake: GGML_VULKAN=OFF
    Note over App,GGML: JNI_OnLoad: GGML_VK_DISABLE=1
    Note over App,GGML: Backend: n_gpu_layers=0 (Android)
Loading

Adds Vulkan GPU acceleration to RunAnywhere Android SDK. Build is verified
with 1435 compiled shaders and 141 Vulkan symbols. Runtime testing on 2
devices shows crashes due to mobile GPU driver bugs.

Build Status: VERIFIED (63MB library)
Runtime Status: FAILS (70-80% devices crash)
Recommendation: CPU-only build for production

- Compiled 1435 Vulkan shaders from llama.cpp b7935
- Added verification scripts (no device needed)
- Tested on Mali-G76 and Adreno 619 (both crash)
- Documented known issues and recommendations

This is an experimental feature for future reference when mobile Vulkan
drivers improve.
Additional changes for Vulkan GPU acceleration:
- Updated llamacpp_backend.cpp/h with GPU layer offloading
- Added JNI CMakeLists.txt for device_jni.cpp
- Updated ModelManagement for GPU options
- Updated CppBridgeLLM for GPU parameters

These files complete the Vulkan integration implementation.
Added important sections:
- Exception handling error fix (GGML_VULKAN=OFF or -fexceptions)
- .so files explanation (why NOT to push build artifacts)
- Complete files checklist (what was pushed vs excluded)
- Build artifacts vs source code distinction

This helps contributors understand what to push and what to exclude.
Additional Vulkan-related changes:
- rac_llm_llamacpp.h: GPU layers configuration in config struct
- rac_backend_llamacpp_jni.cpp: Disable Vulkan in JNI_OnLoad to prevent crash
- DeviceInfoModels.kt: GPU type enumeration for device info
- CppBridgeDevice.kt: GPU family detection in device bridge

These files complete the GPU/Vulkan infrastructure for safe fallback.
Critical files for production safety:
- rac_llm_llamacpp.cpp: Exception handling for Vulkan crashes with automatic CPU fallback
- Capability.kt: GPU capability enumeration for device detection

These files ensure the app doesn't crash if Vulkan fails - it automatically
falls back to CPU-only mode with proper error logging.
Copilot AI review requested due to automatic review settings February 11, 2026 18:47
@ellipsis-dev
Copy link

ellipsis-dev bot commented Feb 11, 2026

⚠️ This PR is too big for Ellipsis, but support for larger PRs is coming soon. If you want us to prioritize this feature, let us know at help@ellipsis.dev


Generated with ❤️ by ellipsis.dev

@coderabbitai
Copy link

coderabbitai bot commented Feb 11, 2026

📝 Walkthrough

Walkthrough

This PR introduces Vulkan GPU acceleration support for Android in the RunAnywhere project, adding comprehensive documentation, build/test verification scripts, C++ Vulkan device detection infrastructure, GPU state management in the llama.cpp backend, and multiplatform Kotlin APIs to expose GPU capabilities to applications.

Changes

Cohort / File(s) Summary
Documentation & Verification
VULKAN_ANDROID_CONTRIBUTION.md, VULKAN_TEST_RESULTS.md
Comprehensive guides detailing Vulkan build process, device testing results, known driver issues, and recommendations for Android integration with strong cautions against production use.
Build Verification Scripts
scripts/verify-vulkan-build.sh, scripts/test-vulkan-on-device.sh
Shell scripts for validating Vulkan builds (library size, symbol counts, shader compilation) and automated device testing with crash detection and GPU log collection.
Shader Compilation Tooling
sdk/runanywhere-commons/scripts/compile-all-vulkan-shaders.sh, sdk/runanywhere-commons/scripts/generate-shader-aggregates.py
Automation for compiling Vulkan shaders and generating C++ aggregate data structures mapping shader binaries to arrays.
CMake & Build Configuration
sdk/runanywhere-commons/CMakeLists.txt, sdk/runanywhere-commons/VERSIONS, sdk/runanywhere-commons/cmake/FetchVulkanHpp.cmake
Updated C++ standard to C++20, bumped llama.cpp version (b7650→b7935), raised minimum Android API to 26, and added Vulkan-Hpp header fetching with NDK integration.
Vulkan Detection Infrastructure
sdk/runanywhere-commons/include/rac/infrastructure/device/vulkan_detector.h, sdk/runanywhere-commons/src/infrastructure/device/vulkan_detector.cpp
New public API exposing VulkanDeviceInfo struct and VulkanDetector class with device detection, enumeration, and version formatting; includes Android fork-based safety mechanism.
LlamaCpp Backend GPU Support
sdk/runanywhere-commons/src/backends/llamacpp/llamacpp_backend.h, sdk/runanywhere-commons/src/backends/llamacpp/rac_llm_llamacpp.cpp, sdk/runanywhere-commons/src/backends/llamacpp/patch_vulkan.py
GPU state tracking (is_using_gpu(), disable_gpu()), comprehensive exception handling with CPU fallback, and CMake patches for Vulkan integration in llama.cpp.
JNI Device Bridge
sdk/runanywhere-commons/src/jni/device_jni.cpp, sdk/runanywhere-commons/src/jni/CMakeLists.txt, sdk/runanywhere-commons/src/backends/llamacpp/jni/rac_backend_llamacpp_jni.cpp
JNI bindings to expose detectVulkanGPU(), isVulkanSupported(), and listVulkanDevices(); Android JNI_OnLoad hook to disable Vulkan by default.
Kotlin Device Capabilities
sdk/runanywhere-kotlin/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/platform/DeviceCapabilities.kt, sdk/runanywhere-kotlin/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/platform/GPUInfo.kt
New singleton DeviceCapabilities for GPU detection/caching and GPUInfo data class with suitability checks and version parsing.
Kotlin Public APIs
sdk/runanywhere-kotlin/src/commonMain/kotlin/com/runanywhere/sdk/public/extensions/RunAnywhere+DeviceInfo.kt, sdk/runanywhere-kotlin/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/public/extensions/RunAnywhere+DeviceInfo.jvmAndroid.kt
Multiplatform device info APIs (getDeviceInfo(), hasGPUAcceleration()) with CPU core/memory/GPU details; Android implementation delegates to DeviceCapabilities.
Kotlin Model Management & Refactoring
sdk/runanywhere-kotlin/src/commonMain/kotlin/com/runanywhere/sdk/public/extensions/RunAnywhere+ModelManagement.kt, sdk/runanywhere-kotlin/src/commonMain/kotlin/com/runanywhere/sdk/data/models/DeviceInfoModels.kt, sdk/runanywhere-kotlin/src/commonMain/kotlin/com/runanywhere/sdk/native/bridge/Capability.kt, sdk/runanywhere-kotlin/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/foundation/bridge/extensions/CppBridgeDevice.kt
New registerModel() API with ModelLoadOptions for GPU configuration; formatting/style updates to Kotlin files without logic changes.

Sequence Diagrams

sequenceDiagram
    participant App as Android App
    participant JNI as JNI Bridge
    participant Detect as VulkanDetector
    participant Fork as Child Process
    participant VK as Vulkan Driver
    
    App->>JNI: detectVulkanGPU()
    JNI->>Detect: detect()
    Detect->>Detect: test_vulkan_in_child_process()
    Detect->>Fork: fork()
    Fork->>VK: vkCreateInstance()
    alt Driver Crashes
        Fork-->>Detect: exit(1)
        Detect->>Detect: Reset to unavailable
    else Success
        Fork->>VK: vkEnumeratePhysicalDevices()
        Fork->>VK: vkCreateDevice()
        Fork-->>Detect: exit(0)
        Detect->>VK: vkCreateInstance() [safe]
        Detect->>VK: Query device properties
        Detect->>Detect: Populate VulkanDeviceInfo
    end
    Detect->>JNI: Return VulkanDeviceInfo
    JNI->>App: Return GPUInfo object
Loading
sequenceDiagram
    participant Caller as Caller
    participant Backend as LlamaCppBackend
    participant GGML as ggml_backend
    participant GPU as GPU/Vulkan
    
    Caller->>Backend: create(model_path, use_gpu)
    Backend->>GGML: llama_backend_init()
    alt GPU Available & use_gpu=true
        GGML->>GPU: Initialize Vulkan
        alt Vulkan Succeeds
            GPU-->>GGML: Device created
            GGML-->>Backend: GPU ready
        else Vulkan Fails
            GPU--XGMML: Exception/Crash
            Backend->>Backend: catch exception
            Backend->>Backend: disable_gpu()
            Backend->>GGML: llama_backend_init() [CPU]
            GGML-->>Backend: CPU mode ready
        end
    else GPU disabled or unavailable
        GGML->>GGML: Use CPU backend
    end
    Backend-->>Caller: Backend ready
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

This PR spans multiple subsystems (C++, JNI, Kotlin, CMake, shell scripts) with heterogeneous change types. The Vulkan detection logic with Android fork-based safety testing, GPU state management with exception-driven fallback in the backend, and cross-language integration (C++↔JNI↔Kotlin) require careful reasoning across components. While many changes are straightforward (new files, APIs), the logic density in vulkan_detector.cpp and rac_llm_llamacpp.cpp, combined with the variety of integration points, necessitates thorough cross-system review.

Possibly related PRs

Suggested labels

kotlin-sdk

Poem

🐰 A Vulkan dream on Android's stage,
GPU shaders dance across the page,
Yet caution rings—fork-safety's call,
CPU fallback catches all!
Device detection, swift and bright,
(But warnings whisper: use with might.) 🔋

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 66.54% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title 'feat(android): add Vulkan GPU acceleration support (experimental)' is clear, specific, and directly describes the main change: adding experimental Vulkan GPU acceleration to the Android SDK.
Description check ✅ Passed Pull request description is comprehensive and well-structured. It includes clear motivation, detailed enumeration of changes across build system, C++ implementation, Kotlin SDK, and scripts, build verification results, runtime testing outcomes, known issues, and explicit recommendation against production use.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

30 files reviewed, 7 comments

Edit Code Review Agent Settings | Greptile

Comment on lines 232 to 235
#ifdef __ANDROID__
GPU_LOG("Android detected - forcing CPU mode (n_gpu_layers=0) to avoid Vulkan crash");
model_params.n_gpu_layers = 0;
LOGI("Loading model with CPU backend (Android Vulkan workaround)");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GPU is unconditionally disabled on Android (lines 232-235), which contradicts the entire PR purpose. This means Vulkan GPU acceleration is never used in production, even on devices where it might work.

If the intent is to disable GPU due to driver issues, this should be controlled by a runtime flag or configuration option, not hardcoded. The PR description states "DO NOT MERGE for production use," so this is expected behavior for experimental code.

Prompt To Fix With AI
This is a comment left during a code review.
Path: sdk/runanywhere-commons/src/backends/llamacpp/llamacpp_backend.cpp
Line: 232:235

Comment:
GPU is unconditionally disabled on Android (lines 232-235), which contradicts the entire PR purpose. This means Vulkan GPU acceleration is never used in production, even on devices where it might work.

If the intent is to disable GPU due to driver issues, this should be controlled by a runtime flag or configuration option, not hardcoded. The PR description states "DO NOT MERGE for production use," so this is expected behavior for experimental code.

How can I resolve this? If you propose a fix, please make it concise.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct — GPU is intentionally disabled on Android to prevent runtime crashes.
This PR documents integration feasibility rather than enabling production GPU usage.
Open to refactoring this into a runtime flag if maintainers prefer.

Comment on lines +53 to +57
// CRITICAL: Disable Vulkan IMMEDIATELY on library load
// This prevents crash in llama_backend_init() before our code runs
__android_log_print(ANDROID_LOG_WARN, "RAC_GPU_STATUS", "JNI_OnLoad: Disabling Vulkan to prevent crash");
setenv("GGML_VK_DISABLE", "1", 1);
#endif
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Setting GGML_VK_DISABLE=1 in JNI_OnLoad globally disables Vulkan for the entire process. Combined with the hardcoded GPU disable in llamacpp_backend.cpp:233, this creates triple-redundancy that prevents Vulkan from ever being used.

The environment variable approach is the correct safety mechanism, but it should be conditional based on device compatibility checks rather than always enabled.

Prompt To Fix With AI
This is a comment left during a code review.
Path: sdk/runanywhere-commons/src/backends/llamacpp/jni/rac_backend_llamacpp_jni.cpp
Line: 53:57

Comment:
Setting `GGML_VK_DISABLE=1` in `JNI_OnLoad` globally disables Vulkan for the entire process. Combined with the hardcoded GPU disable in `llamacpp_backend.cpp:233`, this creates triple-redundancy that prevents Vulkan from ever being used.

The environment variable approach is the correct safety mechanism, but it should be conditional based on device compatibility checks rather than always enabled.

How can I resolve this? If you propose a fix, please make it concise.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point — this is intentional for the current experimental phase.

The triple-layer disable (JNI_OnLoad + backend guard + CPU fallback) was added deliberately to guarantee crash-free behavior on Android devices with unstable Vulkan drivers.

During testing, driver crashes were occurring inside llama_backend_init() before any higher-level logic could intervene. Setting GGML_VK_DISABLE in JNI_OnLoad ensures Vulkan is disabled before static initialization runs.

This PR focuses on documenting integration feasibility and driver limitations rather than enabling production GPU usage.

That said, I agree this should evolve into a conditional mechanism (e.g., device whitelist or cached compatibility check) if Vulkan support becomes viable in the future.

Happy to refactor toward a runtime flag–controlled approach if maintainers prefer.

Comment on lines +31 to +94
static bool test_vulkan_in_child_process() {
RAC_LOG_INFO("VulkanDetector", "Testing Vulkan in child process (safe mode)...");

pid_t pid = fork();

if (pid == -1) {
RAC_LOG_ERROR("VulkanDetector", "Fork failed, assuming Vulkan unavailable");
return false;
}

if (pid == 0) {
// Child process - test Vulkan
// If driver crashes here, only child dies

#ifdef GGML_USE_VULKAN
VkInstance instance = VK_NULL_HANDLE;
VkInstanceCreateInfo create_info = {};
create_info.sType = VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO;

VkResult result = vkCreateInstance(&create_info, nullptr, &instance);
if (result != VK_SUCCESS) {
_exit(1); // Failed
}

uint32_t device_count = 0;
vkEnumeratePhysicalDevices(instance, &device_count, nullptr);

if (device_count == 0) {
vkDestroyInstance(instance, nullptr);
_exit(1); // No devices
}

// Test actual device creation (this is where buggy drivers crash)
std::vector<VkPhysicalDevice> devices(device_count);
vkEnumeratePhysicalDevices(instance, &device_count, devices.data());

// Try to create a logical device (fence creation happens here)
VkDeviceQueueCreateInfo queue_info = {};
queue_info.sType = VK_STRUCTURE_TYPE_DEVICE_QUEUE_CREATE_INFO;
queue_info.queueFamilyIndex = 0;
queue_info.queueCount = 1;
float priority = 1.0f;
queue_info.pQueuePriorities = &priority;

VkDeviceCreateInfo device_info = {};
device_info.sType = VK_STRUCTURE_TYPE_DEVICE_CREATE_INFO;
device_info.queueCreateInfoCount = 1;
device_info.pQueueCreateInfos = &queue_info;

VkDevice device = VK_NULL_HANDLE;
result = vkCreateDevice(devices[0], &device_info, nullptr, &device);

if (device != VK_NULL_HANDLE) {
vkDestroyDevice(device, nullptr);
}

vkDestroyInstance(instance, nullptr);

// If we got here without crashing, Vulkan works!
_exit(result == VK_SUCCESS ? 0 : 1);
#else
_exit(1); // Vulkan not compiled
#endif
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fork-based Vulkan detection is creative but has potential issues:

  1. Resource leak risk: If parent terminates before waitpid() completes, child becomes zombie
  2. Signal handling: Parent should handle SIGCHLD or use WNOHANG to prevent blocking indefinitely
  3. Performance: Fork creates full process copy; this detection runs on every app start
  4. Android 10+ restrictions: Some Android security policies restrict fork() in certain contexts

Consider caching the detection result in shared preferences to avoid repeated expensive fork operations.

Prompt To Fix With AI
This is a comment left during a code review.
Path: sdk/runanywhere-commons/src/infrastructure/device/vulkan_detector.cpp
Line: 31:94

Comment:
Fork-based Vulkan detection is creative but has potential issues:

1. **Resource leak risk**: If parent terminates before `waitpid()` completes, child becomes zombie
2. **Signal handling**: Parent should handle `SIGCHLD` or use `WNOHANG` to prevent blocking indefinitely
3. **Performance**: Fork creates full process copy; this detection runs on every app start
4. **Android 10+ restrictions**: Some Android security policies restrict `fork()` in certain contexts

Consider caching the detection result in shared preferences to avoid repeated expensive fork operations.

How can I resolve this? If you propose a fix, please make it concise.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed — fork() in ART environment has risks.
Used here purely for experimental isolation of driver crashes.
Happy to explore safer probing strategies if this direction is valuable.

Comment on lines +64 to +65
std::vector<VkPhysicalDevice> devices(device_count);
vkEnumeratePhysicalDevices(instance, &device_count, devices.data());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Memory allocated for devices vector is leaked if child process crashes before reaching cleanup. Use RAII or ensure cleanup in all exit paths.

Suggested change
std::vector<VkPhysicalDevice> devices(device_count);
vkEnumeratePhysicalDevices(instance, &device_count, devices.data());
std::vector<VkPhysicalDevice> devices(device_count);
vkEnumeratePhysicalDevices(instance, &device_count, devices.data());
VkDevice device = VK_NULL_HANDLE;
// ... rest of code ...
if (device != VK_NULL_HANDLE) {
vkDestroyDevice(device, nullptr);
}
if (instance != VK_NULL_HANDLE) {
vkDestroyInstance(instance, nullptr);
}
_exit(result == VK_SUCCESS ? 0 : 1);

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Prompt To Fix With AI
This is a comment left during a code review.
Path: sdk/runanywhere-commons/src/infrastructure/device/vulkan_detector.cpp
Line: 64:65

Comment:
Memory allocated for `devices` vector is leaked if child process crashes before reaching cleanup. Use RAII or ensure cleanup in all exit paths.

```suggestion
        std::vector<VkPhysicalDevice> devices(device_count);
        vkEnumeratePhysicalDevices(instance, &device_count, devices.data());
        
        VkDevice device = VK_NULL_HANDLE;
        // ... rest of code ...
        
        if (device != VK_NULL_HANDLE) {
            vkDestroyDevice(device, nullptr);
        }
        if (instance != VK_NULL_HANDLE) {
            vkDestroyInstance(instance, nullptr);
        }
        _exit(result == VK_SUCCESS ? 0 : 1);
```

<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines 56 to 107
static bool probe_vulkan_safe() {
#ifdef __ANDROID__
// Try to dlopen libvulkan.so and call vkEnumerateInstanceVersion
void* vk_lib = dlopen("libvulkan.so", RTLD_NOW | RTLD_LOCAL);
if (!vk_lib) {
GPU_LOG("Vulkan probe: libvulkan.so not found");
return false;
}
if (msg.empty())
return;

if (level == GGML_LOG_LEVEL_ERROR) {
RAC_LOG_ERROR("LLM.LlamaCpp.GGML", "%s", msg.c_str());
} else if (level == GGML_LOG_LEVEL_WARN) {
RAC_LOG_WARNING("LLM.LlamaCpp.GGML", "%s", msg.c_str());
} else if (level == GGML_LOG_LEVEL_INFO) {
RAC_LOG_DEBUG("LLM.LlamaCpp.GGML", "%s", msg.c_str());
// Try vkEnumeratePhysicalDevices to see if there's actually a GPU
typedef int (*PFN_vkCreateInstance)(const void*, const void*, void*);
typedef void (*PFN_vkDestroyInstance)(void*, const void*);
typedef int (*PFN_vkEnumeratePhysicalDevices)(void*, uint32_t*, void*);

auto fn_create = (PFN_vkCreateInstance)dlsym(vk_lib, "vkCreateInstance");
auto fn_destroy = (PFN_vkDestroyInstance)dlsym(vk_lib, "vkDestroyInstance");
auto fn_enum = (PFN_vkEnumeratePhysicalDevices)dlsym(vk_lib, "vkEnumeratePhysicalDevices");

if (!fn_create || !fn_destroy || !fn_enum) {
GPU_LOG("Vulkan probe: missing symbols");
dlclose(vk_lib);
return false;
}
}

// =============================================================================
// LLAMACPP BACKEND IMPLEMENTATION
// =============================================================================
// Minimal VkApplicationInfo + VkInstanceCreateInfo (packed structs)
struct { uint32_t sType; const void* pNext; const char* name; uint32_t ver; const char* ename; uint32_t ever; uint32_t api; } app_info = {0, nullptr, "probe", 1, "probe", 1, (1u << 22) | (0 << 12)};
struct { uint32_t sType; const void* pNext; uint32_t flags; const void* pApp; uint32_t elc; const char* const* el; uint32_t exc; const char* const* ex; } create_info = {1, nullptr, 0, &app_info, 0, nullptr, 0, nullptr};

LlamaCppBackend::LlamaCppBackend() {
LOGI("LlamaCppBackend created");
}
void* instance = nullptr;
int result = fn_create(&create_info, nullptr, &instance);
if (result != 0 || !instance) {
GPU_LOG("Vulkan probe: vkCreateInstance failed (%d)", result);
dlclose(vk_lib);
return false;
}

uint32_t gpu_count = 0;
fn_enum(instance, &gpu_count, nullptr);
fn_destroy(instance, nullptr);
dlclose(vk_lib);

LlamaCppBackend::~LlamaCppBackend() {
cleanup();
LOGI("LlamaCppBackend destroyed");
if (gpu_count == 0) {
GPU_LOG("Vulkan probe: no physical devices found");
return false;
}

GPU_LOG("Vulkan probe: found %u GPU(s) - Vulkan OK", gpu_count);
return true;
#else
return true; // On desktop, assume Vulkan works
#endif
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The probe_vulkan_safe() function creates a minimal Vulkan instance but uses packed structs with hard-coded values that may not match actual Vulkan struct layouts. This is fragile and could fail on different Vulkan implementations.

Better approach: Use proper Vulkan SDK types from <vulkan/vulkan.h> instead of manually packed structs on lines 81-82.

Prompt To Fix With AI
This is a comment left during a code review.
Path: sdk/runanywhere-commons/src/backends/llamacpp/llamacpp_backend.cpp
Line: 56:107

Comment:
The `probe_vulkan_safe()` function creates a minimal Vulkan instance but uses packed structs with hard-coded values that may not match actual Vulkan struct layouts. This is fragile and could fail on different Vulkan implementations.

Better approach: Use proper Vulkan SDK types from `<vulkan/vulkan.h>` instead of manually packed structs on lines 81-82.

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +72 to +91
} catch (const std::exception& e) {
// Vulkan exception during llama_backend_init() - retry with GPU disabled
#ifdef __ANDROID__
__android_log_print(ANDROID_LOG_ERROR, "RAC_GPU_STATUS",
"Backend init exception (Vulkan crash): %s - retrying CPU only", e.what());
#endif
handle->backend = std::make_unique<runanywhere::LlamaCppBackend>();
handle->backend->disable_gpu();
try {
if (!handle->backend->initialize(init_config)) {
delete handle;
rac_error_set_details("Failed to initialize LlamaCPP backend (CPU fallback)");
return RAC_ERROR_BACKEND_INIT_FAILED;
}
} catch (...) {
delete handle;
rac_error_set_details("LlamaCPP backend init failed even in CPU mode");
return RAC_ERROR_BACKEND_INIT_FAILED;
}
} catch (...) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exception handling creates a new LlamaCppBackend instance on failure but doesn't clean up the failed instance properly. The original handle->backend unique_ptr will be destroyed when reassigned, but if llama_backend_init() partially succeeded, there could be leaked Vulkan resources.

Ensure llama_backend_free() is called before recreating the backend.

Prompt To Fix With AI
This is a comment left during a code review.
Path: sdk/runanywhere-commons/src/backends/llamacpp/rac_llm_llamacpp.cpp
Line: 72:91

Comment:
Exception handling creates a new `LlamaCppBackend` instance on failure but doesn't clean up the failed instance properly. The original `handle->backend` unique_ptr will be destroyed when reassigned, but if `llama_backend_init()` partially succeeded, there could be leaked Vulkan resources.

Ensure `llama_backend_free()` is called before recreating the backend.

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines 228 to 247
bool gpu_attempted = false;

// CRITICAL FIX: Force CPU mode on Android to avoid Vulkan crash
// Mali-G76 MC4 and other mobile GPUs have buggy Vulkan drivers
#ifdef __ANDROID__
GPU_LOG("Android detected - forcing CPU mode (n_gpu_layers=0) to avoid Vulkan crash");
model_params.n_gpu_layers = 0;
LOGI("Loading model with CPU backend (Android Vulkan workaround)");
#else
if (backend_->is_using_gpu()) {
gpu_attempted = true;
model_params.n_gpu_layers = 999;
GPU_LOG("LOADING MODEL WITH GPU: n_gpu_layers=999, device=%s", backend_->get_gpu_device_name().c_str());
LOGI("Attempting GPU acceleration, device=%s, layers=ALL(999)", backend_->get_gpu_device_name().c_str());
} else {
model_params.n_gpu_layers = 0;
GPU_LOG("LOADING MODEL WITH CPU ONLY: n_gpu_layers=0");
LOGI("Loading model with CPU backend");
}
#endif
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The variable gpu_attempted is set to true only on non-Android platforms (line 238), but the CPU fallback logic on line 265 checks this variable. On Android, GPU is never attempted, so the fallback will never trigger even if Vulkan is somehow enabled.

This creates dead code and incorrect logging messages.

Prompt To Fix With AI
This is a comment left during a code review.
Path: sdk/runanywhere-commons/src/backends/llamacpp/llamacpp_backend.cpp
Line: 228:247

Comment:
The variable `gpu_attempted` is set to `true` only on non-Android platforms (line 238), but the CPU fallback logic on line 265 checks this variable. On Android, GPU is never attempted, so the fallback will never trigger even if Vulkan is somehow enabled.

This creates dead code and incorrect logging messages.

How can I resolve this? If you propose a fix, please make it concise.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds experimental Vulkan GPU acceleration support to the RunAnywhere Android SDK for potential 2-3x performance improvements. The build is verified and functional with all Vulkan symbols present and 1435 shaders compiled. However, runtime testing on physical devices reveals critical driver compatibility issues causing crashes on 70-80% of Android devices (Mali GPUs, budget Adreno) and incorrect output on high-end devices. The author explicitly recommends NOT merging for production use.

Changes:

  • Added Vulkan GPU acceleration infrastructure with detection, fallback mechanisms, and JNI bridges
  • Integrated llama.cpp b7935 Vulkan backend with 1435 compiled shaders and Android API 26+ compatibility fixes
  • Created comprehensive testing and verification scripts with detailed documentation of known issues

Reviewed changes

Copilot reviewed 25 out of 30 changed files in this pull request and generated no comments.

Show a summary per file
File Description
RunAnywhere+DeviceInfo.jvmAndroid.kt Implements device info API with GPU detection via JNI bridge
GPUInfo.kt Platform-agnostic GPU information data class with validation methods
DeviceCapabilities.kt GPU detection singleton with caching and JNI calls to native Vulkan detector
RunAnywhere+ModelManagement.kt Adds ModelLoadOptions with GPU configuration parameters
DeviceInfoModels.kt Whitespace-only changes (formatting)
Capability.kt Whitespace-only changes (formatting)
RunAnywhere+DeviceInfo.kt Common API definition for device info across platforms
device_jni.cpp JNI bridge exposing Vulkan GPU detection to Kotlin layer
CMakeLists.txt (jni) Adds device_jni.cpp to build
vulkan_detector.cpp Fork-based safe Vulkan detection with driver crash protection
vulkan_detector.h C++ API for Vulkan device detection
rac_llm_llamacpp.cpp Exception handling for Vulkan crashes with CPU fallback
patch_vulkan.py Python script to patch llama.cpp Vulkan CMake for cross-compilation
llamacpp_backend.h Adds GPU state tracking and disable methods
llamacpp_backend.cpp Vulkan probing, GPU detection, and Android CPU-only workaround
rac_backend_llamacpp_jni.cpp Disables Vulkan via environment variable in JNI_OnLoad
CMakeLists.txt (llamacpp) Comprehensive Vulkan build integration with shader compilation
generate-shader-aggregates.py Python script to generate shader data aggregates
compile-all-vulkan-shaders.sh Bash script for parallel shader compilation
FetchVulkanHpp.cmake Downloads Vulkan C++ headers (Android NDK only has C headers)
VERSIONS Updates Android min SDK to 26 and llama.cpp to b7935
CMakeLists.txt (commons) Adds vulkan_detector.cpp and updates C++ standard to C++20
verify-vulkan-build.sh Static verification script for Vulkan build
test-vulkan-on-device.sh Device runtime testing script
VULKAN_TEST_RESULTS.md Comprehensive test results documentation
VULKAN_ANDROID_CONTRIBUTION.md Complete implementation guide

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Note

Due to the large number of review comments, Critical severity comments were prioritized as inline comments.

🤖 Fix all issues with AI agents
In `@sdk/runanywhere-commons/src/infrastructure/device/vulkan_detector.cpp`:
- Around line 31-126: The use of fork() inside test_vulkan_in_child_process is
unsafe on Android JNI and can deadlock when the child uses non-async-signal-safe
calls (e.g., std::vector allocations, vkCreateInstance); replace the fork-based
probe with a safer approach: either move the probe into Java/Kotlin and spawn a
separate process via android.os.Process/Runtime.exec, or switch to posix_spawn
(with POSIX_SPAWN_USEVFORK) instead of fork, or simplify the probe to call
vkEnumerateInstanceVersion (a lightweight, instance-less check) from the
existing detection flow; update/remove fork(), waitpid(), child-only Vulkan code
paths in test_vulkan_in_child_process and ensure the calling code handles the
new boolean result accordingly.

In
`@sdk/runanywhere-kotlin/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/foundation/bridge/extensions/CppBridgeLLM.kt`:
- Around line 36-298: This file places core business logic in a platform source
set; move all non-JNI business logic into commonMain: relocate CppBridgeLLM's
state/type definitions (LLMState, GenerationMode, StopReason), data classes
(GenerationConfig, ModelConfig, GenerationResult), and any interfaces/protocols
(LLMListener, StreamCallback) plus lifecycle semantics (isRegistered, state,
loadedModelId, checkNativeLibrary, shared) into commonMain; leave only
JNI/platform-specific glue in jvmAndroidMain by converting native interop points
here into actual implementations of expect declarations you define in commonMain
(use expect for functions called from common code and implement them in this
file), and remove any non-platform logic from jvmAndroidMain so it contains only
JNI marshaling, Java/Kotlin type conversions, and `@JvmStatic` callback adapters.

In
`@sdk/runanywhere-kotlin/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/platform/DeviceCapabilities.kt`:
- Around line 143-158: The external Kotlin JNI names include a "native" prefix
that doesn't match the C++ JNI functions; rename the Kotlin externals
nativeDetectVulkanGPU, nativeIsVulkanSupported, and nativeListVulkanDevices to
detectVulkanGPU, isVulkanSupported, and listVulkanDevices respectively (and
update all call sites that invoke those symbols) so the mangled JNI names match
the C++ implementations; verify the Kotlin method signatures (return types and
parameters for GPUInfo and List<GPUInfo>) match the C++ JNI signatures after
renaming.
🟠 Major comments (23)
sdk/runanywhere-kotlin/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/foundation/bridge/extensions/CppBridgeLLM.kt-1166-1240 (1)

1166-1240: 🛠️ Refactor suggestion | 🟠 Major

Replace manual JSON parsing with kotlinx.serialization.

The manual regex-based JSON parsing (lines 1166-1216) and escape/unescape functions (lines 1221-1240) are error-prone and don't handle edge cases like nested escapes, unicode sequences, or malformed JSON. The unescape function (lines 1233-1240) processes replacements in the wrong order, which will incorrectly handle \\n (literal backslash-n).

As per coding guidelines, prefer well-known libraries over rolling your own for common tasks.

♻️ Recommended refactor using kotlinx.serialization

Define a serializable result class:

`@Serializable`
data class NativeGenerationResult(
    val text: String,
    `@SerialName`("tokens_generated") val tokensGenerated: Int,
    `@SerialName`("tokens_evaluated") val tokensEvaluated: Int,
    `@SerialName`("stop_reason") val stopReason: Int,
    `@SerialName`("tokens_per_second") val tokensPerSecond: Float = 0f
)

Parse with:

private fun parseGenerationResult(json: String, elapsedMs: Long): GenerationResult {
    val native = Json.decodeFromString<NativeGenerationResult>(json)
    return GenerationResult(
        text = native.text,
        tokensGenerated = native.tokensGenerated,
        tokensEvaluated = native.tokensEvaluated,
        stopReason = native.stopReason,
        generationTimeMs = elapsedMs,
        tokensPerSecond = if (elapsedMs > 0) native.tokensGenerated * 1000f / elapsedMs else native.tokensPerSecond
    )
}

As per coding guidelines: "Prefer well-known libraries/frameworks over 'rolling your own' for common tasks."

sdk/runanywhere-kotlin/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/foundation/bridge/extensions/CppBridgeLLM.kt-479-884 (1)

479-884: 🛠️ Refactor suggestion | 🟠 Major

Replace blocking operations with suspend functions and coroutines.

Operations like create(), loadModel(), generate(), generateStream(), unload(), and destroy() perform I/O-bound work (model loading, inference) but use synchronous blocking with synchronized blocks. This violates the coding guidelines and can cause thread pool exhaustion under load.

Recommended approach:

  • Convert these functions to suspend fun
  • Use withContext(Dispatchers.IO) for JNI calls that may block
  • Replace synchronized blocks with Mutex from kotlinx.coroutines
  • Use Flow for streaming instead of callback-based approach

Example for loadModel:

suspend fun loadModel(
    modelPath: String, 
    modelId: String, 
    modelName: String? = null, 
    config: ModelConfig = ModelConfig.DEFAULT
): Int = withContext(Dispatchers.IO) {
    mutex.withLock {
        // existing logic
    }
}

As per coding guidelines: "Use coroutines and suspend functions instead of async/await patterns for asynchronous operations" and "Use Flow instead of AsyncSequence for reactive streams in Kotlin Multiplatform code."

sdk/runanywhere-kotlin/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/foundation/bridge/extensions/CppBridgeLLM.kt-547-577 (1)

547-577: ⚠️ Potential issue | 🟠 Major

GPU configuration is accepted but completely ignored in loadModel.

The ModelConfig parameter (with gpuLayers for GPU offloading) is accepted at the method signature but never passed to the native layer. Line 576 passes only modelPath, modelId, and modelName to racLlmComponentLoadModel, while the config parameter is suppressed with @Suppress("UNUSED_PARAMETER").

This is inconsistent with other components: the VAD bridge passes config directly to racVadComponentLoadModel, and LLM's own generate methods pass config to the native layer. The native racLlmComponentConfigure method exists but is never called for model loading.

Pass the config to a post-load configuration call: After loading, call RunAnywhereBridge.racLlmComponentConfigure(handle, config.toJson()) to apply GPU settings, or remove the config parameter from loadModel if GPU configuration is not intended for model loading.

sdk/runanywhere-kotlin/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/foundation/bridge/extensions/CppBridgeDevice.kt-1015-1019 (1)

1015-1019: ⚠️ Potential issue | 🟠 Major

Avoid logging the full device ID at INFO level — downgrade to DEBUG.

The device ID is a persistent identifier that could be used for user tracking. Logging it at INFO level means it will appear in standard log output on production builds. Consider using DEBUG level to reduce exposure.

🛡️ Suggested fix
         CppBridgePlatformAdapter.logCallback(
-            CppBridgePlatformAdapter.LogLevel.INFO,
+            CppBridgePlatformAdapter.LogLevel.DEBUG,
             TAG,
             "Generated new device ID: $newId",
         )
sdk/runanywhere-commons/scripts/compile-all-vulkan-shaders.sh-68-75 (1)

68-75: ⚠️ Potential issue | 🟠 Major

Script exits 0 even when shaders fail to compile.

After printing the summary with FAILED count, the script proceeds to generate aggregate files and exits successfully. If any shaders failed, the generated output will be incomplete. Exit non-zero when FAILED > 0.

🐛 Suggested fix
 echo "Failed: $FAILED"
 echo ""
+
+if [ "$FAILED" -gt 0 ]; then
+    echo "❌ Some shaders failed to compile. Aborting."
+    exit 1
+fi
sdk/runanywhere-kotlin/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/foundation/bridge/extensions/CppBridgeDevice.kt-401-412 (1)

401-412: ⚠️ Potential issue | 🟠 Major

Device registration request body may contain PII — avoid logging even a preview.

bodyPreview logs up to 200 characters of the registration body at DEBUG level. Since this is device registration JSON, it may include device IDs, model names, and other identifying information. Consider removing the body preview or redacting it to just log the body length.

🛡️ Suggested fix
-                        val bodyPreview = if (body.length > 200) body.substring(0, 200) + "..." else body
                         CppBridgePlatformAdapter.logCallback(
                             CppBridgePlatformAdapter.LogLevel.DEBUG,
                             TAG,
-                            "Device registration body: $bodyPreview",
+                            "Device registration body length: ${body.length}",
                         )
sdk/runanywhere-commons/scripts/generate-shader-aggregates.py-54-93 (1)

54-93: ⚠️ Potential issue | 🟠 Major

Aggregate arrays reference shader symbols that may not exist — no validation.

The script generates aggregate arrays (lines 54–93) that hardcoded symbols like add_f32_f32_f32_data and add_f32_f32_f32_rte_len. These assume every combination of {op} × {suffixes[t0]} × {suffixes[t1]} × {suffixes[t2]} × {rte} has a matching .spv file. If any variant is missing, the generated .cpp will fail to compile with undefined symbol errors.

Additionally, shader names are derived directly from file stems (line 29), so filenames containing hyphens or spaces would produce invalid C identifiers in the generated code.

Validate that each expected shader symbol exists before writing the aggregate arrays, or detect and skip missing variants.

sdk/runanywhere-commons/VERSIONS-25-26 (1)

25-26: ⚠️ Potential issue | 🟠 Major

Gradle files still hardcode minSdk = 24 while VERSIONS specifies 26 — complete the migration.

VERSIONS has been updated to ANDROID_MIN_SDK=26 for Vulkan 1.1 support, but gradle build files across kotlin, flutter, and react-native modules continue to hardcode minSdk = 24. This violates the guideline requiring version management to be centralized in VERSIONS and sourced by all build scripts.

The minSdk bump to 26 is justified — Vulkan 1.1 requires API 26+ for vkGetPhysicalDeviceFeatures2. However, gradle files must be updated to source from VERSIONS instead of hardcoding. Vulkan is production-ready with full GPU detection, capability reporting, and ML inference acceleration; this is not an experimental feature.

Update gradle files to source ANDROID_MIN_SDK from VERSIONS or apply the value consistently across all build systems.

sdk/runanywhere-kotlin/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/public/extensions/RunAnywhere+ModelManagement.jvmAndroid.kt-430-433 (1)

430-433: ⚠️ Potential issue | 🟠 Major

Global singleton listener is not safe for concurrent downloads.

CppBridgeDownload.downloadListener is set to a new listener per downloadModel call (Line 433) and cleared in finally (Line 526). If two downloads run concurrently, the second call overwrites the first's listener, and the finally block of whichever finishes first nulls it out for both.

Consider using a per-download-ID listener map, or guard against concurrent downloads at a higher level.

sdk/runanywhere-kotlin/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/public/extensions/RunAnywhere+ModelManagement.jvmAndroid.kt-774-780 (1)

774-780: ⚠️ Potential issue | 🟠 Major

cancelDownload does not actually cancel the in-progress download.

This only clears the local path in the registry but never calls any cancellation API on CppBridgeDownload. The download continues running, consuming bandwidth and storage, while the caller believes it was cancelled.

Proposed fix sketch
 actual suspend fun RunAnywhere.cancelDownload(modelId: String) {
     if (!isInitialized) {
         throw SDKError.notInitialized("SDK not initialized")
     }
+    CppBridgeDownload.cancelDownload(modelId)
     // Update C++ registry to mark download cancelled
     CppBridgeModelRegistry.updateDownloadStatus(modelId, null)
 }
sdk/runanywhere-kotlin/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/public/extensions/RunAnywhere+ModelManagement.jvmAndroid.kt-797-810 (1)

797-810: ⚠️ Potential issue | 🟠 Major

deleteAllModels and refreshModelRegistry are no-op stubs exposed as public API.

deleteAllModels() silently does nothing and refreshModelRegistry() has a TODO comment. Callers will assume these work. At minimum, throw NotImplementedError or log a warning so failures aren't silent.

Proposed fix
 actual suspend fun RunAnywhere.deleteAllModels() {
     if (!isInitialized) {
         throw SDKError.notInitialized("SDK not initialized")
     }
-    // Would need to parse and delete each - simplified
+    val models = CppBridgeModelRegistry.getDownloaded()
+    for (model in models) {
+        CppBridgeModelRegistry.remove(model.modelId)
+    }
+    synchronized(modelCacheLock) {
+        registeredModels.clear()
+    }
 }
 
 actual suspend fun RunAnywhere.refreshModelRegistry() {
     if (!isInitialized) {
         throw SDKError.notInitialized("SDK not initialized")
     }
-    // Trigger registry refresh via native call
-    // TODO: Implement via CppBridge
+    modelsLogger.warn("refreshModelRegistry is not yet implemented")
+    // TODO: Implement via CppBridge — tracked in issue `#XXX`
 }
sdk/runanywhere-commons/src/infrastructure/device/vulkan_detector.cpp-236-242 (1)

236-242: ⚠️ Potential issue | 🟠 Major

is_vulkan_supported() calls detect() every time — expensive and forks on Android.

Each call to is_vulkan_supported() calls detect(), which on Android forks a child process and creates a full Vulkan instance. This should be cached after the first call.

Proposed caching approach
 bool VulkanDetector::is_vulkan_supported() {
 `#ifdef` GGML_USE_VULKAN
-    return detect().is_available;
+    static bool cached = false;
+    static bool result = false;
+    if (!cached) {
+        result = detect().is_available;
+        cached = true;
+    }
+    return result;
 `#else`
     return false;
 `#endif`
 }

Consider also caching detect() results, with appropriate thread-safety (e.g., std::call_once).

sdk/runanywhere-commons/src/infrastructure/device/vulkan_detector.cpp-128-234 (1)

128-234: ⚠️ Potential issue | 🟠 Major

Vulkan instance leaks on exception in detect().

If an exception is thrown after vkCreateInstance succeeds (Line 153) but before vkDestroyInstance (Line 221), the Vulkan instance handle is leaked. The catch blocks reset info but never destroy instance.

Proposed fix using RAII or explicit cleanup
     } catch (const std::exception& e) {
         RAC_LOG_ERROR("VulkanDetector", "Exception during Vulkan detection: %s", e.what());
+        if (instance != VK_NULL_HANDLE) {
+            vkDestroyInstance(instance, nullptr);
+        }
         info = VulkanDeviceInfo();  // Reset to default (not available)
     } catch (...) {
         RAC_LOG_ERROR("VulkanDetector", "Unknown exception during Vulkan detection");
+        if (instance != VK_NULL_HANDLE) {
+            vkDestroyInstance(instance, nullptr);
+        }
         info = VulkanDeviceInfo();  // Reset to default (not available)
     }

Better yet, use a scope guard or RAII wrapper for the Vulkan instance.

sdk/runanywhere-commons/src/backends/llamacpp/jni/rac_backend_llamacpp_jni.cpp-52-57 (1)

52-57: ⚠️ Potential issue | 🟠 Major

Unconditionally disabling Vulkan in JNI_OnLoad negates the entire GPU acceleration feature.

Setting GGML_VK_DISABLE=1 at library load time means Vulkan can never be used at runtime, regardless of device capability. This effectively makes all the Vulkan detection, GPU layer offloading, and fallback logic dead code. The PR acknowledges this is a safety measure due to widespread driver crashes, but the approach is opaque — downstream developers loading this library will have no way to opt in to Vulkan even on known-good devices.

Consider adding a check for a user-settable property or environment variable (e.g., GGML_VK_FORCE_ENABLE) so that developers can override this during testing, or move the decision to the Kotlin layer where DeviceCapabilities.shouldUseGPU() already exists.

sdk/runanywhere-commons/src/backends/llamacpp/patch_vulkan.py-73-100 (1)

73-100: ⚠️ Potential issue | 🟠 Major

Patch 5 comments out all add_custom_command blocks, not just shader-related ones.

The comment on line 73 says "that depend on vulkan-shaders-gen," but the code on line 81 only checks for the presence of add_custom_command — it doesn't verify that the block references vulkan-shaders-gen. This will silently disable any unrelated custom commands in the Vulkan CMake file.

🐛 Proposed fix: filter for vulkan-shaders-gen
 for line in lines:
     # Check if we're starting an add_custom_command
-    if 'add_custom_command' in line and not line.strip().startswith('#'):
+    if 'add_custom_command' in line and 'vulkan-shaders-gen' in line and not line.strip().startswith('#'):
         in_custom_command = True

Note: If the vulkan-shaders-gen reference doesn't appear on the opening add_custom_command line itself, you may need a two-pass approach: first identify all add_custom_command blocks, then check if the block body references vulkan-shaders-gen before commenting it out.

sdk/runanywhere-commons/src/backends/llamacpp/rac_llm_llamacpp.cpp-66-109 (1)

66-109: ⚠️ Potential issue | 🟠 Major

Resource leak: if std::make_unique throws inside a catch handler, handle is leaked.

On lines 78 and 96, std::make_unique<runanywhere::LlamaCppBackend>() can throw std::bad_alloc. Since handle is a raw pointer (allocated at line 50), an exception here escapes the catch handler and handle is never freed.

Additionally, the catch(const std::exception&) and catch(...) branches (lines 72–109) contain nearly identical fallback logic. This duplication increases maintenance risk and makes the leak harder to spot.

🐛 Proposed fix: use a helper lambda and guard the raw pointer
+    // Helper: retry initialization in CPU-only mode
+    auto retry_cpu_init = [&]() -> bool {
+        try {
+            handle->backend = std::make_unique<runanywhere::LlamaCppBackend>();
+            handle->backend->disable_gpu();
+            return handle->backend->initialize(init_config);
+        } catch (...) {
+            return false;
+        }
+    };
+
     // Initialize backend - wrap in try-catch for Vulkan exceptions
     try {
         if (!handle->backend->initialize(init_config)) {
             delete handle;
             rac_error_set_details("Failed to initialize LlamaCPP backend");
             return RAC_ERROR_BACKEND_INIT_FAILED;
         }
     } catch (const std::exception& e) {
-        // Vulkan exception during llama_backend_init() - retry with GPU disabled
 `#ifdef` __ANDROID__
         __android_log_print(ANDROID_LOG_ERROR, "RAC_GPU_STATUS",
             "Backend init exception (Vulkan crash): %s - retrying CPU only", e.what());
 `#endif`
-        handle->backend = std::make_unique<runanywhere::LlamaCppBackend>();
-        handle->backend->disable_gpu();
-        try {
-            if (!handle->backend->initialize(init_config)) {
-                delete handle;
-                rac_error_set_details("Failed to initialize LlamaCPP backend (CPU fallback)");
-                return RAC_ERROR_BACKEND_INIT_FAILED;
-            }
-        } catch (...) {
+        if (!retry_cpu_init()) {
             delete handle;
-            rac_error_set_details("LlamaCPP backend init failed even in CPU mode");
+            rac_error_set_details("Failed to initialize LlamaCPP backend (CPU fallback)");
             return RAC_ERROR_BACKEND_INIT_FAILED;
         }
     } catch (...) {
 `#ifdef` __ANDROID__
         __android_log_print(ANDROID_LOG_ERROR, "RAC_GPU_STATUS",
             "Backend init unknown exception - retrying CPU only");
 `#endif`
-        handle->backend = std::make_unique<runanywhere::LlamaCppBackend>();
-        handle->backend->disable_gpu();
-        try {
-            if (!handle->backend->initialize(init_config)) {
-                delete handle;
-                rac_error_set_details("Failed to initialize LlamaCPP backend (CPU fallback)");
-                return RAC_ERROR_BACKEND_INIT_FAILED;
-            }
-        } catch (...) {
+        if (!retry_cpu_init()) {
             delete handle;
-            rac_error_set_details("LlamaCPP backend init failed even in CPU mode");
+            rac_error_set_details("Failed to initialize LlamaCPP backend (CPU fallback)");
             return RAC_ERROR_BACKEND_INIT_FAILED;
         }
     }
sdk/runanywhere-commons/src/backends/llamacpp/CMakeLists.txt-155-227 (1)

155-227: 🛠️ Refactor suggestion | 🟠 Major

Duplicate shader compilation: once at configure-time (Lines 155–227) and again via add_custom_command at build-time (Lines 325–331).

Shaders are compiled via execute_process during CMake configure, then an add_custom_command with the same output files is added for build-time. The build-time command uses a different script (compile-shaders-during-build.sh) than the configure-time one (compile-shaders-fast.sh + generate-shader-cpp.py). If the configure-time compilation succeeds, the build-time command won't trigger (outputs already exist). If it does trigger, it may produce different results. This dual approach is confusing and error-prone — consider consolidating into one path.

Also applies to: 317-337

sdk/runanywhere-commons/CMakeLists.txt-41-43 (1)

41-43: ⚠️ Potential issue | 🟠 Major

C++ standard mismatch: CMAKE_CXX_STANDARD 20 (Line 42) vs target_compile_features cxx_std_17 (Line 261).

Line 42 sets the project-wide default to C++20 (for Vulkan-Hpp), but Line 261 advertises cxx_std_17 as a PUBLIC compile feature on rac_commons. This means downstream consumers of rac_commons only get the C++17 guarantee, while the library itself compiles at C++20. If this upgrade is intentional, the target_compile_features should match:

-target_compile_features(rac_commons PUBLIC cxx_std_17)
+target_compile_features(rac_commons PUBLIC cxx_std_20)

Otherwise, downstream targets that link rac_commons and use its headers may fail if any C++20 constructs leak into the public headers. Also note that the coding guidelines state "C++17 standard required" — if the bump to C++20 is permanent, update the guidelines accordingly.

Also applies to: 261-261

sdk/runanywhere-commons/src/backends/llamacpp/CMakeLists.txt-293-310 (1)

293-310: ⚠️ Potential issue | 🟠 Major

Hardcoded aarch64-linux-android Vulkan lib path — breaks non-arm64 ABIs.

Line 303 constructs the Vulkan library path using aarch64-linux-android, but the build supports armeabi-v7a, x86, and x86_64 ABIs (see NEON toggle at Lines 90–97). Use ANDROID_TOOLCHAIN_ROOT or map ANDROID_ABI to the correct target triple:

# Map ABI to NDK target triple
if(ANDROID_ABI STREQUAL "arm64-v8a")
    set(ANDROID_TARGET_TRIPLE "aarch64-linux-android")
elseif(ANDROID_ABI STREQUAL "armeabi-v7a")
    set(ANDROID_TARGET_TRIPLE "arm-linux-androideabi")
elseif(ANDROID_ABI STREQUAL "x86_64")
    set(ANDROID_TARGET_TRIPLE "x86_64-linux-android")
elseif(ANDROID_ABI STREQUAL "x86")
    set(ANDROID_TARGET_TRIPLE "i686-linux-android")
endif()
sdk/runanywhere-commons/src/backends/llamacpp/CMakeLists.txt-54-55 (1)

54-55: ⚠️ Potential issue | 🟠 Major

Hardcoded Linux host paths — macOS developers cannot build.

Lines 54–55 hardcode /usr/bin/gcc and /usr/bin/g++, and Lines 69–73 only handle WIN32 vs Linux prebuilt paths. macOS (Darwin) is a common Android dev host but has no else branch here. Similarly, Line 162 hardcodes g++ for the shader generator tool, and Lines 178/180/319 assume linux-x86_64 for the NDK glslc path.

Consider detecting the host OS and setting paths accordingly:

if(CMAKE_HOST_SYSTEM_NAME STREQUAL "Darwin")
    set(HOST_PREBUILT_TAG "darwin-x86_64")
elseif(CMAKE_HOST_SYSTEM_NAME STREQUAL "Linux")
    set(HOST_PREBUILT_TAG "linux-x86_64")
elseif(CMAKE_HOST_SYSTEM_NAME STREQUAL "Windows")
    set(HOST_PREBUILT_TAG "windows-x86_64")
endif()

Then use ${HOST_PREBUILT_TAG} in all NDK tool paths, and CMAKE_CXX_COMPILER_LAUNCHER or find_program instead of hardcoding /usr/bin/g++.

Also applies to: 69-73

sdk/runanywhere-commons/src/backends/llamacpp/llamacpp_backend.cpp-265-271 (1)

265-271: ⚠️ Potential issue | 🟠 Major

CPU fallback llama_model_load_from_file is not wrapped in try/catch.

The GPU attempt at Line 252 is wrapped in a try/catch, but the CPU fallback at Line 270 is not. If the fallback also throws (e.g., corrupted model file, OOM), the exception will propagate uncaught through JNI, crashing the app.

🛡️ Proposed fix
     if (!model_ && gpu_attempted) {
         GPU_LOG("GPU LOAD FAILED - falling back to CPU");
         LOGE("GPU model loading failed, falling back to CPU");
         model_params.n_gpu_layers = 0;
         backend_->disable_gpu();
-        model_ = llama_model_load_from_file(model_path.c_str(), model_params);
+        try {
+            model_ = llama_model_load_from_file(model_path.c_str(), model_params);
+        } catch (const std::exception& e) {
+            GPU_LOG("CPU fallback EXCEPTION: %s", e.what());
+            LOGE("CPU fallback exception: %s", e.what());
+            model_ = nullptr;
+        } catch (...) {
+            GPU_LOG("CPU fallback UNKNOWN EXCEPTION");
+            LOGE("Unknown exception during CPU fallback");
+            model_ = nullptr;
+        }
     }
sdk/runanywhere-commons/src/backends/llamacpp/llamacpp_backend.cpp-80-82 (1)

80-82: ⚠️ Potential issue | 🟠 Major

Hand-rolled Vulkan struct definitions with hardcoded sType magic numbers are fragile and hard to maintain.

These anonymous structs replicate VkApplicationInfo (sType=0) and VkInstanceCreateInfo (sType=1) with assumed member order and padding. While this works on common ABIs, any Vulkan header change or unusual platform alignment could silently break the probe. Since vulkan_detector.cpp already includes <vulkan/vulkan.h> and performs the same probing more robustly, consider either:

  1. Reusing VulkanDetector::is_vulkan_supported() (already available), or
  2. If you need the dlopen approach, #include <vulkan/vulkan.h> conditionally and use proper struct types.
sdk/runanywhere-commons/src/jni/device_jni.cpp-99-161 (1)

99-161: ⚠️ Potential issue | 🟠 Major

Missing null checks for JNI class/method lookups in listVulkanDevices — potential crash.

array_list_constructor, array_list_add (Line 114–115) and gpu_info_class, gpu_info_constructor (Line 121–126) are used without null checks. If the GPUInfo class is stripped by ProGuard/R8 in a release build, this will dereference null and crash. The detectVulkanGPU function correctly checks these — apply the same pattern here.

🛡️ Proposed fix (partial — around GPUInfo lookup)
     // Find GPUInfo class
     jclass gpu_info_class = env->FindClass("com/runanywhere/sdk/platform/GPUInfo");
+    if (gpu_info_class == nullptr) {
+        RAC_LOG_ERROR("DeviceJNI", "Failed to find GPUInfo class");
+        env->DeleteLocalRef(array_list_class);
+        return nullptr;
+    }
     jmethodID gpu_info_constructor = env->GetMethodID(
         gpu_info_class, 
         "<init>", 
         "(ZLjava/lang/String;Ljava/lang/String;Ljava/lang/String;JZ)V"
     );
+    if (gpu_info_constructor == nullptr) {
+        RAC_LOG_ERROR("DeviceJNI", "Failed to find GPUInfo constructor");
+        env->DeleteLocalRef(gpu_info_class);
+        env->DeleteLocalRef(array_list_class);
+        return nullptr;
+    }
🟡 Minor comments (16)
VULKAN_TEST_RESULTS.md-54-58 (1)

54-58: ⚠️ Potential issue | 🟡 Minor

Add language specifiers to fenced code blocks for better rendering.

Lines 54 and 70 have fenced code blocks without language specifiers. Since these contain crash logs/output, a text or log specifier would satisfy markdownlint (MD040) and improve syntax highlighting.

-```
+```text
 Fatal signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x0
scripts/test-vulkan-on-device.sh-55-65 (1)

55-65: ⚠️ Potential issue | 🟡 Minor

Crash detection only checks for SIGSEGV (signal 11) — other fatal signals will be missed.

Vulkan driver issues can also cause SIGABRT (signal 6), SIGBUS (signal 7), or other fatal signals. Broaden the pattern to catch all fatal signals.

🐛 Suggested fix
 # Check for crash
-if adb logcat -d | grep -q "Fatal signal 11"; then
+if adb logcat -d | grep -q "Fatal signal"; then
     echo "   ❌ CRASH DETECTED"
     echo ""
     echo "   Crash details:"
-    adb logcat -d | grep -A 5 "Fatal signal 11" | head -10
+    adb logcat -d | grep -A 5 "Fatal signal" | head -15
scripts/verify-vulkan-build.sh-20-25 (1)

20-25: ⚠️ Potential issue | 🟡 Minor

Size check is fragile — du -h output format varies and will break for non-MB sizes.

${SIZE%M} strips a trailing "M", but if du -h returns a value in "K" (small file) or "G" (large file), the substitution leaves the unit suffix intact, causing the -lt integer comparison to fail with a bash error (and script exit via set -e).

Use byte-based comparison instead:

🐛 Suggested fix
 echo ""
 echo "1. Checking library size..."
-SIZE=$(du -h "$LIB_PATH" | cut -f1)
-echo "   Library size: $SIZE"
-if [ "${SIZE%M}" -lt 50 ]; then
+SIZE_BYTES=$(stat --format="%s" "$LIB_PATH" 2>/dev/null || stat -f "%z" "$LIB_PATH" 2>/dev/null)
+SIZE_MB=$((SIZE_BYTES / 1024 / 1024))
+echo "   Library size: ${SIZE_MB}MB"
+if [ "$SIZE_MB" -lt 50 ]; then
     echo "   ❌ FAIL: Library too small (expected >50MB with Vulkan)"
     exit 1
 fi
sdk/runanywhere-commons/scripts/generate-shader-aggregates.py-28-45 (1)

28-45: ⚠️ Potential issue | 🟡 Minor

Shader filenames with hyphens or special characters produce invalid C identifiers.

shader_name = spv_file.stem is used directly as a C identifier ({shader_name}_data). SPIR-V shader files often use hyphens in names (e.g., my-shader.spv), which are invalid in C/C++ identifiers. Sanitize the name by replacing non-alphanumeric characters with underscores.

🐛 Suggested fix
+import re
+
+def sanitize_identifier(name):
+    """Replace non-alphanumeric characters with underscores for valid C identifiers."""
+    return re.sub(r'[^a-zA-Z0-9_]', '_', name)
+
 def generate_cpp_with_aggregates(vulkan_dir, output_cpp):
     ...
         for spv_file in spv_files:
-            shader_name = spv_file.stem
+            shader_name = sanitize_identifier(spv_file.stem)
VULKAN_ANDROID_CONTRIBUTION.md-599-599 (1)

599-599: ⚠️ Potential issue | 🟡 Minor

Date appears to be wrong — "February 11, 2025" should likely be 2026.

Given the PR and repo activity context, this seems like a typo.

sdk/runanywhere-commons/include/rac/backends/rac_llm_llamacpp.h-54-55 (1)

54-55: ⚠️ Potential issue | 🟡 Minor

Comment should mention Vulkan alongside Metal.

Line 54 says "Number of layers to offload to GPU (Metal on iOS/macOS)" but this PR adds Vulkan GPU offloading on Android. Update the comment to reflect both acceleration backends.

Proposed fix
-    /** Number of layers to offload to GPU (Metal on iOS/macOS) */
+    /** Number of layers to offload to GPU (Metal on iOS/macOS, Vulkan on Android) */
     int32_t gpu_layers;
sdk/runanywhere-commons/cmake/FetchVulkanHpp.cmake-92-100 (1)

92-100: ⚠️ Potential issue | 🟡 Minor

Missing macOS host prebuilt path for NDK Vulkan headers.

The NDK sysroot path only handles WIN32 and a Linux fallback. Developers building on macOS (common for Android development) won't get the NDK Vulkan C headers path. The macOS prebuilt directory is darwin-x86_64.

Proposed fix
     if(ANDROID_NDK)
-        if(WIN32)
+        if(APPLE)
+            set(NDK_VULKAN_INCLUDE "${ANDROID_NDK}/toolchains/llvm/prebuilt/darwin-x86_64/sysroot/usr/include")
+        elseif(WIN32)
             set(NDK_VULKAN_INCLUDE "${ANDROID_NDK}/toolchains/llvm/prebuilt/windows-x86_64/sysroot/usr/include")
         else()
             set(NDK_VULKAN_INCLUDE "${ANDROID_NDK}/toolchains/llvm/prebuilt/linux-x86_64/sysroot/usr/include")
sdk/runanywhere-kotlin/src/commonMain/kotlin/com/runanywhere/sdk/public/extensions/RunAnywhere+ModelManagement.kt-67-82 (1)

67-82: ⚠️ Potential issue | 🟡 Minor

downloadSize is assigned the value of memoryRequirement — semantic mismatch.

The memoryRequirement parameter is documented as "Estimated memory usage in bytes" (Line 38), but it's stored in downloadSize (Line 76). These are conceptually different: download size is the file size on disk, while memory requirement is RAM needed at runtime. This will mislead callers and downstream consumers that read downloadSize.

Either rename the parameter to downloadSize or add a dedicated memoryRequirement field to ModelInfo.

VULKAN_ANDROID_CONTRIBUTION.md-227-282 (1)

227-282: ⚠️ Potential issue | 🟡 Minor

Duplicate "Files Modified/Created" section.

The heading "## Files Modified/Created" appears at Line 227 with content and again at Line 278 with an empty body. Remove the duplicate at Line 278.

Proposed fix
----
-
-## Files Modified/Created
-
-
-
 ---
 
 ## Expected Performance (If It Worked)
sdk/runanywhere-kotlin/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/public/extensions/RunAnywhere+ModelManagement.jvmAndroid.kt-592-602 (1)

592-602: ⚠️ Potential issue | 🟡 Minor

Exception cause not being passed to SDKError.download() - but proposed fix signature is incorrect.

The original exception e should be chained as the cause when re-throwing, which the current code omits. However, the proposed fix is incorrect: passing e as the second parameter would assign it to the code parameter instead of cause.

The correct fix is to pass the exception using the named parameter:

Corrected fix
         } catch (e: Exception) {
             logger.error("Failed to move archive to temp location: ${e.message}")
-            throw SDKError.download("Failed to prepare archive for extraction: ${e.message}")
+            throw SDKError.download("Failed to prepare archive for extraction: ${e.message}", cause = e)
         }

The SDKError.download() signature is: fun download(message: String, code: ErrorCode = ErrorCode.DOWNLOAD_FAILED, cause: Throwable? = null), so the cause must be passed as a named parameter or as the third positional argument.

sdk/runanywhere-commons/src/backends/llamacpp/jni/rac_backend_llamacpp_jni.cpp-110-117 (1)

110-117: ⚠️ Potential issue | 🟡 Minor

nativeIsRegistered always returns JNI_TRUE — does not reflect actual state.

The TODO-like comment says "return true if the native library is loaded," but this is misleading since the method name implies it checks service registration status. Callers relying on this to guard operations will never see false.

sdk/runanywhere-commons/src/backends/llamacpp/jni/rac_backend_llamacpp_jni.cpp-24-29 (1)

24-29: ⚠️ Potential issue | 🟡 Minor

Multi-statement macros are unsafe without do { ... } while(0) wrapper.

The non-Android LOGi/LOGe/LOGw macros expand to two separate fprintf statements. When used inside an unbraced if/else, only the first fprintf will be conditionally executed, while the newline is always printed.

🐛 Proposed fix
 `#else`
 `#include` <cstdio>
-#define LOGi(...) fprintf(stdout, "[INFO] " __VA_ARGS__); fprintf(stdout, "\n")
-#define LOGe(...) fprintf(stderr, "[ERROR] " __VA_ARGS__); fprintf(stderr, "\n")
-#define LOGw(...) fprintf(stdout, "[WARN] " __VA_ARGS__); fprintf(stdout, "\n")
+#define LOGi(...) do { fprintf(stdout, "[INFO] " __VA_ARGS__); fprintf(stdout, "\n"); } while(0)
+#define LOGe(...) do { fprintf(stderr, "[ERROR] " __VA_ARGS__); fprintf(stderr, "\n"); } while(0)
+#define LOGw(...) do { fprintf(stdout, "[WARN] " __VA_ARGS__); fprintf(stdout, "\n"); } while(0)
 `#endif`
sdk/runanywhere-commons/src/backends/llamacpp/patch_vulkan.py-43-71 (1)

43-71: ⚠️ Potential issue | 🟡 Minor

Script is not idempotent — repeated runs will double-prefix comment markers.

If CMake re-runs the patch step (e.g., during reconfiguration), lines already prefixed with # will be prefixed again (# # ExternalProject_Add...). The not line.strip().startswith('#') guard on entry (line 52) prevents re-entering a block, but any line already commented by a prior run that happens to match a later patch pattern (e.g., Patch 6's regex, Patch 8's regex) may still get double-commented.

Consider adding an early exit or sentinel check (e.g., a marker comment like # PATCHED_BY_RAC) at the top of the file on first run.

💡 Suggested idempotency guard
+SENTINEL = "# PATCHED_BY_RAC_VULKAN"
+
+if SENTINEL in content:
+    print("File already patched, skipping")
+    sys.exit(0)
+
 # Patch 1: Remove COMPONENTS glslc (make glslc optional)
 content = content.replace(...)
 ...
 
+# Add sentinel at top
+content = SENTINEL + "\n" + content
+
 with open(file_path, 'w') as f:
     f.write(content)
sdk/runanywhere-commons/src/backends/llamacpp/jni/rac_backend_llamacpp_jni.cpp-122-126 (1)

122-126: ⚠️ Potential issue | 🟡 Minor

Update hardcoded version string to match VERSIONS file.

The version "b7199" is stale. The project's VERSIONS file (the single source of truth) specifies LLAMACPP_VERSION=b7935 (Feb 4, 2025) with full Vulkan support. Both the CMakeLists.txt (line 12) and the hardcoded JNI string must be updated to match.

sdk/runanywhere-commons/src/backends/llamacpp/llamacpp_backend.cpp-13-19 (1)

13-19: ⚠️ Potential issue | 🟡 Minor

Non-Android GPU_LOG macro lacks a trailing newline.

On Android, __android_log_print adds a newline automatically. The fprintf(stderr, ...) variant does not. This will produce concatenated log lines on non-Android platforms.

Proposed fix
 `#else`
-#define GPU_LOG(...) fprintf(stderr, __VA_ARGS__)
+#define GPU_LOG(...) do { fprintf(stderr, __VA_ARGS__); fprintf(stderr, "\n"); } while(0)
 `#endif`
sdk/runanywhere-commons/src/jni/device_jni.cpp-19-76 (1)

19-76: ⚠️ Potential issue | 🟡 Minor

Missing null checks on NewStringUTF results before passing to NewObject.

If NewStringUTF returns nullptr (OOM), a pending exception is set, and the subsequent NewObject call is undefined behavior per JNI spec. Check for null / pending exceptions after the string allocations.

🛡️ Proposed fix
     // Convert C++ strings to Java strings
     jstring device_name = env->NewStringUTF(info.device_name.c_str());
     jstring driver_version = env->NewStringUTF(info.driver_version.c_str());
     jstring api_version = env->NewStringUTF(
         runanywhere::VulkanDetector::get_version_string(info.api_version).c_str()
     );
+    
+    if (!device_name || !driver_version || !api_version) {
+        RAC_LOG_ERROR("DeviceJNI", "Failed to allocate JNI strings");
+        if (device_name) env->DeleteLocalRef(device_name);
+        if (driver_version) env->DeleteLocalRef(driver_version);
+        if (api_version) env->DeleteLocalRef(api_version);
+        env->DeleteLocalRef(gpu_info_class);
+        return nullptr;
+    }
🧹 Nitpick comments (22)
sdk/runanywhere-kotlin/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/foundation/bridge/extensions/CppBridgeLLM.kt (3)

585-590: Listener exceptions are swallowed with insufficient logging.

Multiple locations catch exceptions from llmListener callbacks and either completely ignore them (lines 585-590, 656-659) or log at WARN level (lines 619-625, 841-849, 1151-1159). While preventing listener bugs from breaking core functionality is correct, silent failures make debugging difficult for SDK users.

Recommended approach:

  • Log all listener exceptions at ERROR level (not WARN or silent)
  • Include the full exception stack trace in the log
  • Consider adding a global error handler that SDK users can override for monitoring

Example:

try {
    llmListener?.onModelLoaded(modelId, modelPath)
} catch (e: Exception) {
    CppBridgePlatformAdapter.logCallback(
        CppBridgePlatformAdapter.LogLevel.ERROR,  // Changed from WARN
        TAG,
        "Error in LLM listener onModelLoaded: ${e.message}\n${e.stackTraceToString()}"
    )
}

Also applies to: 619-625, 656-659, 678-682, 722-726, 767-771, 841-849, 1151-1159


40-132: Consider using sealed classes or enums instead of Int constants for type safety.

The state constants (LLMState, GenerationMode, StopReason) use Int values, which allows invalid states to be set and reduces type safety. While this matches the C++ API, the Kotlin layer can provide a type-safe wrapper.

Recommended approach:

sealed class LLMState(val value: Int) {
    object NotCreated : LLMState(0)
    object Created : LLMState(1)
    object Loading : LLMState(2)
    object Ready : LLMState(3)
    object Generating : LLMState(4)
    object Unloading : LLMState(5)
    object Error : LLMState(6)
    
    fun isReady(): Boolean = this is Ready
    
    companion object {
        fun fromInt(value: Int): LLMState = when(value) {
            0 -> NotCreated
            1 -> Created
            // ... etc
            else -> Error
        }
    }
}

This provides compile-time safety while maintaining C++ interop through the value property.

As per coding guidelines: "Use sealed classes for type-safe error handling in Kotlin code" and "Always use data classes and structured types (enums, sealed classes) instead of strings for models and configuration."


605-616: Direct coupling to CppBridgeModelAssignment and CppBridgeState violates ModuleRegistry pattern.

The code directly calls CppBridgeModelAssignment.setAssignmentStatusCallback() and CppBridgeState.setComponentStateCallback() (lines 605-616, 828-839), creating tight coupling. This makes the component difficult to test in isolation and limits extensibility.

Recommended approach:

  • Use EventBus pattern to publish state change events
  • Let interested components subscribe to these events
  • Remove direct dependencies on CppBridgeModelAssignment and CppBridgeState

Example:

// After model loads
EventBus.publish(
    ModelLoadedEvent(
        componentType = ComponentType.LLM,
        modelId = modelId,
        modelPath = modelPath,
        status = AssignmentStatus.READY
    )
)

As per coding guidelines: "Use the ModuleRegistry provider pattern for extensibility and plugin architecture instead of hard-coded dependencies" and "Use EventBus for component communication instead of direct coupling between components."

Also applies to: 828-839

sdk/runanywhere-kotlin/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/foundation/bridge/extensions/CppBridgeDevice.kt (3)

314-320: Swallowed exception loses diagnostic context when Supabase URL retrieval fails.

Detekt flags this catch block. If racDevConfigGetSupabaseUrl() throws, the exception is silently consumed and baseUrl becomes null, eventually leading to the "No base URL configured" error at Line 383 with no clue about the root cause. Log the exception at DEBUG.

♻️ Suggested fix
                             } catch (e: Exception) {
-                                null
+                                CppBridgePlatformAdapter.logCallback(
+                                    CppBridgePlatformAdapter.LogLevel.DEBUG,
+                                    TAG,
+                                    "Failed to get Supabase URL: ${e.message}",
+                                )
+                                null
                             }

1274-1281: escapeJson doesn't handle all JSON-special characters.

The function misses Unicode control characters (U+0000–U+001F besides \n, \r, \t) which the JSON spec requires to be escaped. For device info strings this is low-risk, but a malicious or unusual device name could produce invalid JSON.


536-579: Manual JSON construction is fragile — consider using a JSON library.

Building JSON via string concatenation with a custom escapeJson is error-prone (e.g., trailing commas if fields are conditionally included, incomplete escaping). If org.json or kotlinx.serialization is already a dependency, it would be safer and more maintainable.

sdk/runanywhere-commons/scripts/compile-all-vulkan-shaders.sh (2)

28-31: Dead code — this check is unreachable.

Lines 16–19 already exit if HOST_GLSLC is empty. If execution reaches Line 28, HOST_GLSLC is guaranteed to be a valid path, making this block unreachable.

♻️ Remove redundant check
-if [ ! -f "$HOST_GLSLC" ] && [ ! -x "$(command -v glslc)" ]; then
-    echo "Error: glslc not found. Please install Vulkan SDK."
-    exit 1
-fi

14-19: Prefer command -v over which for POSIX portability.

which is not guaranteed on all systems and can behave inconsistently. command -v is the POSIX-standard way to locate executables.

♻️ Suggested fix
-HOST_GLSLC=$(which glslc)
+HOST_GLSLC=$(command -v glslc || true)
 if [ -z "$HOST_GLSLC" ]; then
     echo "Error: glslc not found in PATH. Please install Vulkan SDK."
     exit 1
 fi
scripts/test-vulkan-on-device.sh (1)

51-77: Logcat is read three separate times — capture once to a temp file.

Lines 56, 60, and 72 each invoke adb logcat -d separately, which is slow over USB and could yield inconsistent results if logs are rotating. Capture once to a temp file.

♻️ Suggested improvement
+LOGCAT_FILE=$(mktemp)
+trap "rm -f $LOGCAT_FILE" EXIT
 adb logcat -c
 adb shell am start -n com.runanywhere.runanywhereai.debug/com.runanywhere.runanywhereai.MainActivity > /dev/null 2>&1
 sleep 5
+adb logcat -d > "$LOGCAT_FILE"

 # Check for crash
-if adb logcat -d | grep -q "Fatal signal 11"; then
+if grep -q "Fatal signal" "$LOGCAT_FILE"; then
     echo "   ❌ CRASH DETECTED"
     echo ""
     echo "   Crash details:"
-    adb logcat -d | grep -A 5 "Fatal signal 11" | head -10
+    grep -A 5 "Fatal signal" "$LOGCAT_FILE" | head -15
sdk/runanywhere-commons/scripts/generate-shader-aggregates.py (1)

56-93: Duplicate code — data and len array generation are structurally identical.

The two nested loop blocks (Lines 56–73 and 76–93) differ only in the array type and the _data/_len suffix. Extract a helper to reduce duplication.

sdk/runanywhere-commons/include/rac/infrastructure/device/vulkan_detector.h (2)

13-21: Public symbols lack the rac_ prefix required by coding guidelines.

VulkanDeviceInfo and VulkanDetector are exported via RAC_API from the include/rac/ path but don't follow the rac_ naming convention. While the runanywhere namespace provides scoping, the coding guidelines state all public symbols must be prefixed with rac_. Consider renaming to rac_VulkanDeviceInfo / RacVulkanDeviceInfo (or equivalent) to stay consistent with the rest of the public API surface.

As per coding guidelines: "All public symbols must be prefixed with rac_ (RunAnywhere Commons)".


29-55: Public API header should document lifecycle and thread-safety.

The VulkanDetector methods (detect(), list_devices()) create and destroy Vulkan instances internally, which is expensive. Callers should know:

  • Whether results can be cached
  • Thread-safety of static methods (safe to call from multiple threads?)
  • On Android, detect() forks a child process — this has significant implications (e.g., not safe in multi-threaded contexts after fork)

A brief note in the class-level doc would suffice.

As per coding guidelines: "Public C API headers in include/rac/ must document vtable operations, error codes, and lifecycle requirements".

sdk/runanywhere-commons/cmake/FetchVulkanHpp.cmake (1)

51-76: No integrity verification for downloaded headers — supply chain risk.

The headers are downloaded over HTTPS from GitHub without any checksum validation. A compromised CDN or MITM could inject malicious code into the Vulkan C++ headers. Consider adding EXPECTED_HASH SHA256=<hash> to the file(DOWNLOAD ...) calls, or at minimum document why this was deemed acceptable.

sdk/runanywhere-kotlin/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/public/extensions/RunAnywhere+ModelManagement.jvmAndroid.kt (1)

961-1058: Replace hand-rolled regex JSON parsing with kotlinx.serialization.

The custom parseModelAssignmentsJson() parser using regex is fragile and will fail on escaped quotes in values, nested objects/arrays, null values, decimal/negative numbers, and whitespace variations. The split("},\\s*\\{") approach misunderstands JSON structure when field values contain } or {.

Since kotlinx-serialization-json is already a dependency in the project, define a data class for the model assignment JSON structure and use Json.decodeFromString() instead. This will handle all standard JSON edge cases correctly.

Current implementation pattern
private fun parseModelAssignmentsJson(json: String): List<ModelInfo> {
    // ... regex-based field extraction
    val id = extractJsonString(jsonObj, "id") ?: continue
    val name = extractJsonString(jsonObj, "name") ?: id
    // ... more regex extractions
}
sdk/runanywhere-commons/src/backends/llamacpp/llamacpp_backend.h (2)

88-90: disable_gpu() and is_using_gpu() are not protected by mutex_.

While use_gpu_ is written by disable_gpu() and read by is_using_gpu() / get_device_type(), none of these acquire mutex_. Currently safe because disable_gpu() is only called during initialization before concurrent generation begins, but this is a fragile implicit contract. If the fallback logic ever runs concurrently (e.g., mid-generation retry), this becomes a data race.

Consider either making use_gpu_ an std::atomic<bool> or documenting the single-threaded precondition.

♻️ Minimal fix: use atomic
-    bool is_using_gpu() const { return use_gpu_; }
-    std::string get_gpu_device_name() const { return gpu_device_name_; }
-    void disable_gpu() { use_gpu_ = false; }  // Disable GPU after failure
+    bool is_using_gpu() const { return use_gpu_.load(std::memory_order_acquire); }
+    std::string get_gpu_device_name() const { return gpu_device_name_; }
+    void disable_gpu() { use_gpu_.store(false, std::memory_order_release); }
-    bool use_gpu_ = false;
+    std::atomic<bool> use_gpu_{false};

27-32: DeviceType::GPU is ambiguous — consider adding VULKAN for clarity.

The enum jumps from GPU=1 to METAL=3 (no value 2), and GPU is used to represent the Vulkan backend path (per get_device_type() in the .cpp). Since Metal and CUDA are explicit accelerator types, adding VULKAN=2 (or aliasing GPU to VULKAN) would make the intent clearer and future-proof the enum.

sdk/runanywhere-kotlin/src/commonMain/kotlin/com/runanywhere/sdk/public/extensions/RunAnywhere+DeviceInfo.kt (1)

38-43: Two GPUInfo classes with different fields exist — naming collision risk.

This public GPUInfo in com.runanywhere.sdk.public.extensions has 4 properties, while the platform-internal GPUInfo in com.runanywhere.sdk.platform has 6 (adds driverVersion and supportsCompute). Having two identically-named data classes in different packages will cause import confusion for developers and within the SDK itself. Consider renaming the platform-internal one (e.g., PlatformGPUInfo or VulkanGPUInfo) to disambiguate.

sdk/runanywhere-commons/src/backends/llamacpp/patch_vulkan.py (1)

109-136: Patch 7's foreach exit condition relies on endforeach string match, ignoring nesting depth.

At line 129, the block exit is triggered by the presence of the string endforeach regardless of the tracked foreach_depth. If there were nested foreach loops (unlikely but possible), this would exit prematurely. Since you're already tracking depth, consider using foreach_depth <= 0 as the exit condition (similar to Patches 4 and 5), with endforeach as an additional safety check.

♻️ Suggested fix
         # Check if we've closed all parentheses (endforeach)
-        if 'endforeach' in line:
+        if 'endforeach' in line or foreach_depth <= 0:
             in_foreach = False
             foreach_depth = 0
sdk/runanywhere-kotlin/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/platform/DeviceCapabilities.kt (1)

33-34: Thread safety: cachedGPUInfo / cacheInitialized are accessed without synchronization.

On Android, detectVulkanGPU() and clearCache() can be called from different threads (e.g., main thread + coroutine dispatcher). Without @Volatile or synchronized, a JVM thread may see stale values due to memory visibility. Consider using @Volatile on both fields, or wrapping access in a synchronized(this) block.

Also applies to: 46-49, 119-122

sdk/runanywhere-commons/src/backends/llamacpp/CMakeLists.txt (1)

296-300: Directly mutating LINK_LIBRARIES target property is fragile.

Removing Vulkan::Vulkan from LINK_LIBRARIES via get_target_property/set_target_properties bypasses CMake's dependency tracking and can break in future CMake versions. A safer alternative is to use an interface library or override the Vulkan_LIBRARY variable before find_package(Vulkan) is called by llama.cpp.

sdk/runanywhere-commons/src/backends/llamacpp/llamacpp_backend.cpp (2)

230-247: Android always forces n_gpu_layers=0 — Vulkan GPU acceleration is a complete no-op on Android.

The #ifdef __ANDROID__ block unconditionally sets model_params.n_gpu_layers = 0, overriding any probe or configuration result. This means the extensive Vulkan integration in this PR (shader compilation, detection, JNI bridge) is never actually exercised at runtime on Android.

Given the PR notes that crashes affect ~70–80% of devices, this safety measure is understandable for the experimental status. However, consider gating this on a runtime flag (e.g., config["force_cpu"]) rather than a blanket compile-time guard, so the opt-in path for testing on known-good devices remains accessible.

 `#ifdef` __ANDROID__
-    GPU_LOG("Android detected - forcing CPU mode (n_gpu_layers=0) to avoid Vulkan crash");
-    model_params.n_gpu_layers = 0;
-    LOGI("Loading model with CPU backend (Android Vulkan workaround)");
+    bool force_cpu = !backend_->is_using_gpu() || !config.value("enable_vulkan", false);
+    if (force_cpu) {
+        GPU_LOG("Android: CPU mode (Vulkan not requested or disabled)");
+        model_params.n_gpu_layers = 0;
+    } else {
+        gpu_attempted = true;
+        model_params.n_gpu_layers = 999;
+        GPU_LOG("Android: Vulkan GPU mode (n_gpu_layers=999)");
+    }
 `#else`

56-107: probe_vulkan_safe() duplicates logic already in VulkanDetector::is_vulkan_supported() with less safety.

The VulkanDetector (in vulkan_detector.cpp) already implements Vulkan availability checks, including a fork-based safe detection on Android and proper use of Vulkan headers. This probe function reimplements it using hand-rolled structs via dlopen. Consider delegating to the detector to avoid maintaining two separate probing code paths.

Comment on lines +31 to +126
static bool test_vulkan_in_child_process() {
RAC_LOG_INFO("VulkanDetector", "Testing Vulkan in child process (safe mode)...");

pid_t pid = fork();

if (pid == -1) {
RAC_LOG_ERROR("VulkanDetector", "Fork failed, assuming Vulkan unavailable");
return false;
}

if (pid == 0) {
// Child process - test Vulkan
// If driver crashes here, only child dies

#ifdef GGML_USE_VULKAN
VkInstance instance = VK_NULL_HANDLE;
VkInstanceCreateInfo create_info = {};
create_info.sType = VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO;

VkResult result = vkCreateInstance(&create_info, nullptr, &instance);
if (result != VK_SUCCESS) {
_exit(1); // Failed
}

uint32_t device_count = 0;
vkEnumeratePhysicalDevices(instance, &device_count, nullptr);

if (device_count == 0) {
vkDestroyInstance(instance, nullptr);
_exit(1); // No devices
}

// Test actual device creation (this is where buggy drivers crash)
std::vector<VkPhysicalDevice> devices(device_count);
vkEnumeratePhysicalDevices(instance, &device_count, devices.data());

// Try to create a logical device (fence creation happens here)
VkDeviceQueueCreateInfo queue_info = {};
queue_info.sType = VK_STRUCTURE_TYPE_DEVICE_QUEUE_CREATE_INFO;
queue_info.queueFamilyIndex = 0;
queue_info.queueCount = 1;
float priority = 1.0f;
queue_info.pQueuePriorities = &priority;

VkDeviceCreateInfo device_info = {};
device_info.sType = VK_STRUCTURE_TYPE_DEVICE_CREATE_INFO;
device_info.queueCreateInfoCount = 1;
device_info.pQueueCreateInfos = &queue_info;

VkDevice device = VK_NULL_HANDLE;
result = vkCreateDevice(devices[0], &device_info, nullptr, &device);

if (device != VK_NULL_HANDLE) {
vkDestroyDevice(device, nullptr);
}

vkDestroyInstance(instance, nullptr);

// If we got here without crashing, Vulkan works!
_exit(result == VK_SUCCESS ? 0 : 1);
#else
_exit(1); // Vulkan not compiled
#endif
}

// Parent process - wait for child
int status;
pid_t result = waitpid(pid, &status, 0);

if (result == -1) {
RAC_LOG_ERROR("VulkanDetector", "waitpid failed");
return false;
}

if (WIFEXITED(status)) {
int exit_code = WEXITSTATUS(status);
if (exit_code == 0) {
RAC_LOG_INFO("VulkanDetector", "✅ Vulkan test passed in child process");
return true;
} else {
RAC_LOG_WARNING("VulkanDetector", "❌ Vulkan test failed in child (exit code: %d)", exit_code);
return false;
}
}

if (WIFSIGNALED(status)) {
int signal = WTERMSIG(status);
RAC_LOG_ERROR("VulkanDetector", "❌ Child process crashed with signal %d (SIGSEGV=%d) - BUGGY DRIVER DETECTED!",
signal, SIGSEGV);
RAC_LOG_ERROR("VulkanDetector", "This device has a buggy Vulkan driver. Using CPU fallback.");
return false;
}

return false;
}
#endif
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

fork() is unsafe in Android's multi-threaded JNI environment.

On Android, calling fork() from a JNI context is dangerous because the ART runtime has many internal threads (GC, finalizer, JIT compiler, binder). After fork(), only the calling thread survives — any mutex held by other threads (including the heap allocator lock) becomes permanently locked. The child process then calls std::vector allocation (Line 64), Vulkan instance creation, and other non-async-signal-safe functions, which can deadlock.

A safer alternative would be to use a separate process via android.os.Process or Runtime.exec() from the Java/Kotlin layer, or use posix_spawn with POSIX_SPAWN_USEVFORK which has fewer issues. Alternatively, use vkEnumerateInstanceVersion() as a lightweight probe before attempting full initialization — it doesn't require instance creation and is less likely to crash.

🤖 Prompt for AI Agents
In `@sdk/runanywhere-commons/src/infrastructure/device/vulkan_detector.cpp` around
lines 31 - 126, The use of fork() inside test_vulkan_in_child_process is unsafe
on Android JNI and can deadlock when the child uses non-async-signal-safe calls
(e.g., std::vector allocations, vkCreateInstance); replace the fork-based probe
with a safer approach: either move the probe into Java/Kotlin and spawn a
separate process via android.os.Process/Runtime.exec, or switch to posix_spawn
(with POSIX_SPAWN_USEVFORK) instead of fork, or simplify the probe to call
vkEnumerateInstanceVersion (a lightweight, instance-less check) from the
existing detection flow; update/remove fork(), waitpid(), child-only Vulkan code
paths in test_vulkan_in_child_process and ensure the calling code handles the
new boolean result accordingly.

Comment on lines 36 to 298
object CppBridgeLLM {
/**
* LLM component state constants matching C++ RAC_LLM_STATE_* values.
*/
object LLMState {
/** Component not created */
const val NOT_CREATED = 0
/** Component created but no model loaded */
const val CREATED = 1
/** Model is loading */
const val LOADING = 2
/** Model loaded and ready for inference */
const val READY = 3
/** Inference in progress */
const val GENERATING = 4
/** Model is unloading */
const val UNLOADING = 5
/** Component in error state */
const val ERROR = 6
/**
* Get a human-readable name for the LLM state.
*/
fun getName(state: Int): String =
when (state) {
NOT_CREATED -> "NOT_CREATED"
CREATED -> "CREATED"
LOADING -> "LOADING"
READY -> "READY"
GENERATING -> "GENERATING"
UNLOADING -> "UNLOADING"
ERROR -> "ERROR"
else -> "UNKNOWN($state)"
}
/**
* Check if the state indicates the component is usable.
*/
fun isReady(state: Int): Boolean = state == READY
}
/**
* LLM generation mode constants.
*/
object GenerationMode {
/** Standard completion mode */
const val COMPLETION = 0
/** Chat/instruction mode */
const val CHAT = 1
/** Fill-in-the-middle mode */
const val INFILL = 2
}
/**
* LLM stop reason constants.
*/
object StopReason {
/** Generation still in progress */
const val NOT_STOPPED = 0
/** Reached end of sequence token */
const val EOS = 1
/** Reached maximum token limit */
const val MAX_TOKENS = 2
/** Hit a stop sequence */
const val STOP_SEQUENCE = 3
/** Generation was cancelled */
const val CANCELLED = 4
/** Generation failed */
const val ERROR = 5
/**
* Get a human-readable name for the stop reason.
*/
fun getName(reason: Int): String =
when (reason) {
NOT_STOPPED -> "NOT_STOPPED"
EOS -> "EOS"
MAX_TOKENS -> "MAX_TOKENS"
STOP_SEQUENCE -> "STOP_SEQUENCE"
CANCELLED -> "CANCELLED"
ERROR -> "ERROR"
else -> "UNKNOWN($reason)"
}
}
@Volatile
private var isRegistered: Boolean = false
@Volatile
private var state: Int = LLMState.NOT_CREATED
@Volatile
private var handle: Long = 0
@Volatile
private var loadedModelId: String? = null
@Volatile
private var loadedModelPath: String? = null
@Volatile
private var isCancelled: Boolean = false
@Volatile
private var isNativeLibraryLoaded: Boolean = false
private val lock = Any()
/**
* Tag for logging.
*/
private const val TAG = "CppBridgeLLM"
/**
* Check if native LLM library is available.
*/
val isNativeAvailable: Boolean
get() = isNativeLibraryLoaded
/**
* Initialize native library availability check.
* Should be called during SDK initialization.
*/
fun checkNativeLibrary() {
try {
// Try to call a simple native method to verify library is loaded
// If it throws UnsatisfiedLinkError, the library isn't available
isNativeLibraryLoaded = true
CppBridgePlatformAdapter.logCallback(
CppBridgePlatformAdapter.LogLevel.DEBUG,
TAG,
"Native LLM library check passed",
)
} catch (e: UnsatisfiedLinkError) {
isNativeLibraryLoaded = false
CppBridgePlatformAdapter.logCallback(
CppBridgePlatformAdapter.LogLevel.WARN,
TAG,
"Native LLM library not available: ${e.message}",
)
}
}
/**
* Singleton shared instance for accessing the LLM component.
* Matches iOS CppBridge.LLM.shared pattern.
*/
val shared: CppBridgeLLM = this
/**
* Optional listener for LLM events.
* Set this before calling [register] to receive events.
*/
@Volatile
var llmListener: LLMListener? = null
/**
* Optional streaming callback for token-by-token generation.
* This is invoked for each generated token during streaming.
*/
@Volatile
var streamCallback: StreamCallback? = null
/**
* LLM generation configuration.
*
* @param maxTokens Maximum number of tokens to generate
* @param temperature Sampling temperature (0.0 to 2.0)
* @param topP Top-p (nucleus) sampling parameter
* @param topK Top-k sampling parameter
* @param repeatPenalty Penalty for repeating tokens
* @param stopSequences List of sequences that stop generation
* @param seed Random seed for reproducibility (-1 for random)
*/
data class GenerationConfig(
val maxTokens: Int = 512,
val temperature: Float = 0.7f,
val topP: Float = 0.9f,
val topK: Int = 40,
val repeatPenalty: Float = 1.1f,
val stopSequences: List<String> = emptyList(),
val seed: Long = -1,
) {
/**
* Convert to JSON string for C++ interop.
*/
fun toJson(): String {
return buildString {
append("{")
append("\"max_tokens\":$maxTokens,")
append("\"temperature\":$temperature,")
append("\"top_p\":$topP,")
append("\"top_k\":$topK,")
append("\"repeat_penalty\":$repeatPenalty,")
append("\"stop_sequences\":[")
stopSequences.forEachIndexed { index, seq ->
if (index > 0) append(",")
append("\"${escapeJson(seq)}\"")
}
append("],")
append("\"seed\":$seed")
append("}")
}
}
companion object {
/** Default configuration */
val DEFAULT = GenerationConfig()
}
}
/**
* LLM model configuration.
*
* @param contextLength Context window size in tokens
* @param gpuLayers Number of layers to offload to GPU (-1 for auto)
* @param threads Number of threads for inference (-1 for auto)
* @param batchSize Batch size for prompt processing
* @param useMemoryMap Whether to use memory-mapped loading
* @param useLocking Whether to use file locking
*/
data class ModelConfig(
val contextLength: Int = 4096,
val gpuLayers: Int = -1,
val threads: Int = -1,
val batchSize: Int = 512,
val useMemoryMap: Boolean = true,
val useLocking: Boolean = false,
) {
/**
* Convert to JSON string for C++ interop.
*/
fun toJson(): String {
return buildString {
append("{")
append("\"context_length\":$contextLength,")
append("\"gpu_layers\":$gpuLayers,")
append("\"threads\":$threads,")
append("\"batch_size\":$batchSize,")
append("\"use_memory_map\":$useMemoryMap,")
append("\"use_locking\":$useLocking")
append("}")
}
}
companion object {
/** Default configuration */
val DEFAULT = ModelConfig()
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Major architectural violation: Business logic must be in commonMain/, not jvmAndroidMain/.

This file contains significant cross-platform business logic (state management, lifecycle semantics, data classes for GenerationConfig/ModelConfig/GenerationResult, interfaces) in a platform-specific source set. As per coding guidelines, all business logic, protocols, interfaces, and structures must be defined in commonMain/. Only platform-specific JNI marshaling should remain in jvmAndroidMain/.

Recommended refactoring:

  1. Move business logic to commonMain/: LLMState, GenerationMode, StopReason, data classes (GenerationConfig, ModelConfig, GenerationResult), interfaces (LLMListener, StreamCallback), and lifecycle semantics
  2. Use expect/actual pattern: Define expect declarations in commonMain/ for native interop
  3. Keep only JNI-specific marshaling in jvmAndroidMain/: JNI method calls, Java type conversions, @JvmStatic callbacks

This separation ensures the business logic is testable, reusable across platforms, and follows the iOS CppBridge+LLM.swift architecture mentioned in the comments.

As per coding guidelines: "All business logic, protocols, interfaces, and structures MUST be defined in commonMain/ for Kotlin Multiplatform projects" and "Use expect/actual ONLY for platform-specific implementations, not for business logic."

🧰 Tools
🪛 detekt (1.23.8)

[warning] 84-84: The caught exception is swallowed. The original exception could be lost.

(detekt.exceptions.SwallowedException)

🤖 Prompt for AI Agents
In
`@sdk/runanywhere-kotlin/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/foundation/bridge/extensions/CppBridgeLLM.kt`
around lines 36 - 298, This file places core business logic in a platform source
set; move all non-JNI business logic into commonMain: relocate CppBridgeLLM's
state/type definitions (LLMState, GenerationMode, StopReason), data classes
(GenerationConfig, ModelConfig, GenerationResult), and any interfaces/protocols
(LLMListener, StreamCallback) plus lifecycle semantics (isRegistered, state,
loadedModelId, checkNativeLibrary, shared) into commonMain; leave only
JNI/platform-specific glue in jvmAndroidMain by converting native interop points
here into actual implementations of expect declarations you define in commonMain
(use expect for functions called from common code and implement them in this
file), and remove any non-platform logic from jvmAndroidMain so it contains only
JNI marshaling, Java/Kotlin type conversions, and `@JvmStatic` callback adapters.

Comment on lines +143 to +158
@JvmStatic
private external fun nativeDetectVulkanGPU(): GPUInfo

/**
* Native method to check Vulkan support
* Implemented in: sdk/runanywhere-commons/src/jni/device_jni.cpp
*/
@JvmStatic
private external fun nativeIsVulkanSupported(): Boolean

/**
* Native method to list all Vulkan devices
* Implemented in: sdk/runanywhere-commons/src/jni/device_jni.cpp
*/
@JvmStatic
private external fun nativeListVulkanDevices(): List<GPUInfo>
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Critical: JNI method names don't match between Kotlin and C++ — native calls will always fail.

The Kotlin native methods are named nativeDetectVulkanGPU, nativeIsVulkanSupported, and nativeListVulkanDevices, but the corresponding JNI functions in device_jni.cpp are named detectVulkanGPU, isVulkanSupported, and listVulkanDevices (without the native prefix). JNI resolution is based on exact mangled names, so these will never bind and every call will throw UnsatisfiedLinkError, caught silently by the try/catch blocks.

Either rename the Kotlin externals to drop the native prefix, or rename the C++ JNI functions to include it.

Option A: Fix Kotlin side
     `@JvmStatic`
-    private external fun nativeDetectVulkanGPU(): GPUInfo
+    private external fun detectVulkanGPU(): GPUInfo
     
     `@JvmStatic`
-    private external fun nativeIsVulkanSupported(): Boolean
+    private external fun isVulkanSupported(): Boolean
     
     `@JvmStatic`
-    private external fun nativeListVulkanDevices(): List<GPUInfo>
+    private external fun listVulkanDevices(): List<GPUInfo>

Then update calling sites (lines 52, 83, 103) accordingly.

Option B: Fix C++ side (device_jni.cpp)
-Java_com_runanywhere_sdk_platform_DeviceCapabilities_detectVulkanGPU(
+Java_com_runanywhere_sdk_platform_DeviceCapabilities_nativeDetectVulkanGPU(

-Java_com_runanywhere_sdk_platform_DeviceCapabilities_isVulkanSupported(
+Java_com_runanywhere_sdk_platform_DeviceCapabilities_nativeIsVulkanSupported(

-Java_com_runanywhere_sdk_platform_DeviceCapabilities_listVulkanDevices(
+Java_com_runanywhere_sdk_platform_DeviceCapabilities_nativeListVulkanDevices(
🤖 Prompt for AI Agents
In
`@sdk/runanywhere-kotlin/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/platform/DeviceCapabilities.kt`
around lines 143 - 158, The external Kotlin JNI names include a "native" prefix
that doesn't match the C++ JNI functions; rename the Kotlin externals
nativeDetectVulkanGPU, nativeIsVulkanSupported, and nativeListVulkanDevices to
detectVulkanGPU, isVulkanSupported, and listVulkanDevices respectively (and
update all call sites that invoke those symbols) so the mangled JNI names match
the C++ implementations; verify the Kotlin method signatures (return types and
parameters for GPUInfo and List<GPUInfo>) match the C++ JNI signatures after
renaming.

Resolved conflicts:
- CMakeLists.txt: Merged Vulkan C++20 requirement + vulkan_detector.cpp with upstream VLM/Diffusion features
- VERSIONS: Kept Vulkan-specific values (ANDROID_MIN_SDK=26, LLAMACPP_VERSION=b7935)
- Backend files: Accepted upstream changes (Vulkan logic in main CMakeLists, not backend)

All Vulkan files preserved:
- vulkan_detector.cpp/h
- device_jni.cpp
- GPUInfo.kt, DeviceCapabilities.kt
- FetchVulkanHpp.cmake
- Documentation and test scripts
@MdTabish24 MdTabish24 marked this pull request as draft February 11, 2026 19:12
@MdTabish24 MdTabish24 marked this pull request as ready for review February 11, 2026 19:36
@MdTabish24 MdTabish24 marked this pull request as draft February 11, 2026 19:36
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Fix all issues with AI agents
In `@sdk/runanywhere-commons/CMakeLists.txt`:
- Around line 41-42: The CMakeLists currently sets a global C++20 via
CMAKE_CXX_STANDARD while also declaring target_compile_features(rac_commons
PUBLIC cxx_std_17), causing a standard mismatch and unnecessarily forcing C++20
for all consumers; change the global standard back to C++17 (ensure
CMAKE_CXX_STANDARD is 17) and remove/align any conflicting public cxx_std
settings on rac_commons, then scope C++20 only to the Vulkan-specific targets
(e.g., the Vulkan-Hpp or llamacpp/vulkan backend target) by adding
target_compile_features(<vulkan_target> PRIVATE cxx_std_20) or guarding that
target compile feature behind the Android/Vulkan build option so other platforms
remain C++17.
- Line 165: The new vulkan_detector.cpp should not be unconditionally added to
RAC_INFRASTRUCTURE_SOURCES; wrap its inclusion in a CMake conditional that
checks the RAC_BUILD_VULKAN option and restricts platforms to those that support
Vulkan (e.g., non-Apple UNIX/Windows). Update the CMakeLists.txt so that
vulkan_detector.cpp is appended to RAC_INFRASTRUCTURE_SOURCES only when
RAC_BUILD_VULKAN is enabled and the target platform permits Vulkan (use a
condition like if(RAC_BUILD_VULKAN AND (WIN32 OR (UNIX AND NOT APPLE))) before
adding the file).

In `@sdk/runanywhere-commons/VERSIONS`:
- Around line 72-75: Update the release date in the VERSIONS comment: change the
text "LATEST release (Feb 4, 2025)" to "LATEST release (Feb 4, 2026)" in the
block that documents LLAMACPP_VERSION=b7935 so the comment accurately reflects
the release date.
- Around line 25-26: The ANDROID_MIN_SDK change in VERSIONS raises the native
C++ build minimum to 26 but Gradle APKs still hardcode minSdk=24, creating an
inconsistency; to fix, either (A) introduce a separate variable
ANDROID_MIN_SDK_VULKAN=26 in VERSIONS and update
build-android.sh/load-versions.sh and the CMake invocation to use
ANDROID_MIN_SDK_VULKAN for Vulkan/native builds only (keeping ANDROID_MIN_SDK
for general use), or (B) update all build.gradle and build.gradle.kts modules to
set minSdk to 26 so Gradle and native builds match; alternatively, if you intend
to keep the mismatch, add a clear doc comment in VERSIONS explaining the runtime
Vulkan detection strategy and why native libs target API26 while APKs remain 24.

Comment on lines +41 to +42
# C++20 required for Vulkan-Hpp and modern llama.cpp features
set(CMAKE_CXX_STANDARD 20)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

C++ standard contradiction and overly broad C++20 requirement.

Two issues here:

  1. Line 42 sets CMAKE_CXX_STANDARD 20 but line 283 still declares target_compile_features(rac_commons PUBLIC cxx_std_17). These should be consistent — the PUBLIC compile feature propagates to all consumers of rac_commons, so the mismatch means downstream targets may build at C++17 while this library expects C++20.

  2. Bumping the entire project from C++17 to C++20 for an experimental Vulkan feature affects all platforms (iOS, macOS, Linux), not just Android. If only Vulkan-Hpp needs C++20, scope this to the Vulkan-related targets or guard it behind the Android/Vulkan build path.

Proposed fix — scope C++20 to Vulkan builds only
-# C++20 required for Vulkan-Hpp and modern llama.cpp features
-set(CMAKE_CXX_STANDARD 20)
+# C++17 standard required
+set(CMAKE_CXX_STANDARD 17)

Then, in the Vulkan-specific target or the llamacpp backend CMakeLists.txt that actually uses Vulkan-Hpp:

target_compile_features(vulkan_related_target PRIVATE cxx_std_20)

Based on learnings: "C++17 standard required" applies to sdk/runanywhere-commons/**/*.{cpp,hpp}.

Also applies to: 283-283

🤖 Prompt for AI Agents
In `@sdk/runanywhere-commons/CMakeLists.txt` around lines 41 - 42, The CMakeLists
currently sets a global C++20 via CMAKE_CXX_STANDARD while also declaring
target_compile_features(rac_commons PUBLIC cxx_std_17), causing a standard
mismatch and unnecessarily forcing C++20 for all consumers; change the global
standard back to C++17 (ensure CMAKE_CXX_STANDARD is 17) and remove/align any
conflicting public cxx_std settings on rac_commons, then scope C++20 only to the
Vulkan-specific targets (e.g., the Vulkan-Hpp or llamacpp/vulkan backend target)
by adding target_compile_features(<vulkan_target> PRIVATE cxx_std_20) or
guarding that target compile feature behind the Android/Vulkan build option so
other platforms remain C++17.

src/infrastructure/telemetry/telemetry_json.cpp
src/infrastructure/telemetry/telemetry_manager.cpp
src/infrastructure/device/rac_device_manager.cpp
src/infrastructure/device/vulkan_detector.cpp
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Vulkan detector is unconditionally compiled on all platforms — should be guarded.

vulkan_detector.cpp is added to RAC_INFRASTRUCTURE_SOURCES without any platform or feature guard. This means it compiles on iOS, macOS, and Linux builds where Vulkan detection is irrelevant (or may fail to compile if it includes Vulkan headers). Per coding guidelines, optional backends must use RAC_BUILD_* options.

Proposed fix — guard with platform and option check
     src/infrastructure/device/rac_device_manager.cpp
-    src/infrastructure/device/vulkan_detector.cpp
 )
+
+# Vulkan detector - Android only (experimental)
+if(RAC_PLATFORM_ANDROID)
+    list(APPEND RAC_INFRASTRUCTURE_SOURCES
+        src/infrastructure/device/vulkan_detector.cpp
+    )
+endif()

As per coding guidelines, CMakeLists.txt must use RAC_BUILD_* options for optional backends and provide clear dependency management.

🤖 Prompt for AI Agents
In `@sdk/runanywhere-commons/CMakeLists.txt` at line 165, The new
vulkan_detector.cpp should not be unconditionally added to
RAC_INFRASTRUCTURE_SOURCES; wrap its inclusion in a CMake conditional that
checks the RAC_BUILD_VULKAN option and restricts platforms to those that support
Vulkan (e.g., non-Apple UNIX/Windows). Update the CMakeLists.txt so that
vulkan_detector.cpp is appended to RAC_INFRASTRUCTURE_SOURCES only when
RAC_BUILD_VULKAN is enabled and the target platform permits Vulkan (use a
condition like if(RAC_BUILD_VULKAN AND (WIN32 OR (UNIX AND NOT APPLE))) before
adding the file).

Comment on lines +25 to +26
# Android minimum SDK version (API 26 = Android 8.0 for Vulkan 1.1 support)
ANDROID_MIN_SDK=26
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find all files that reference ANDROID_MIN_SDK or minSdk
echo "=== Files referencing ANDROID_MIN_SDK ==="
rg 'ANDROID_MIN_SDK' --type-add 'gradle:*.gradle.kts' --type-add 'gradle2:*.gradle' -n

echo ""
echo "=== Files referencing minSdk in gradle/cmake/shell ==="
rg 'minSdk' -n -g '*.gradle.kts' -g '*.gradle' -g 'CMakeLists.txt' -g '*.cmake' -g '*.sh'

echo ""
echo "=== How VERSIONS file is sourced (look for 'source' or 'include') ==="
rg 'VERSIONS|\.\/VERSIONS|source.*VERSIONS' -n -g '*.gradle.kts' -g '*.gradle' -g 'CMakeLists.txt' -g '*.cmake' -g '*.sh'

Repository: RunanywhereAI/runanywhere-sdks

Length of output: 3123


🏁 Script executed:

#!/bin/bash
# Check how VERSIONS is sourced by build scripts
echo "=== load-versions.sh content (how it exports ANDROID_MIN_SDK) ==="
sed -n '1,70p' sdk/runanywhere-commons/scripts/load-versions.sh

echo ""
echo "=== build-android.sh usage of ANDROID_MIN_SDK ==="
sed -n '200,210p' sdk/runanywhere-commons/scripts/build-android.sh

echo ""
echo "=== Check if gradle files have any property sources ==="
rg -A 2 -B 2 'minSdk' sdk/runanywhere-kotlin/build.gradle.kts | head -20

Repository: RunanywhereAI/runanywhere-sdks

Length of output: 3423


🏁 Script executed:

#!/bin/bash
# Check if there's a gradle.properties or other config that might link to VERSIONS
echo "=== Look for gradle.properties or version sourcing ==="
find . -name "gradle.properties" -o -name "local.properties" | head -10

echo ""
echo "=== Check VULKAN_ANDROID_CONTRIBUTION.md for context on the change ==="
sed -n '240,260p' VULKAN_ANDROID_CONTRIBUTION.md

echo ""
echo "=== Check if build-android.sh is the primary Android build method ==="
sed -n '1,50p' sdk/runanywhere-commons/scripts/build-android.sh | head -30

Repository: RunanywhereAI/runanywhere-sdks

Length of output: 2518


🏁 Script executed:

#!/bin/bash
# Check gradle.properties content for any VERSIONS sourcing
echo "=== gradle.properties files ==="
cat sdk/runanywhere-kotlin/gradle.properties

echo ""
echo "=== Check main gradle.properties ==="
cat gradle.properties

echo ""
echo "=== Verify: does build-android.sh call gradle? ==="
grep -n "gradle\|gradlew" sdk/runanywhere-commons/scripts/build-android.sh | head -20

Repository: RunanywhereAI/runanywhere-sdks

Length of output: 2893


🏁 Script executed:

#!/bin/bash
# Check what build-android.sh actually builds
echo "=== build-android.sh: what does it build? ==="
sed -n '20,100p' sdk/runanywhere-commons/scripts/build-android.sh

echo ""
echo "=== Check CMakeLists.txt for minSdk enforcement ==="
head -50 sdk/runanywhere-commons/CMakeLists.txt

echo ""
echo "=== Search for API level checks related to Vulkan ==="
rg 'API.*26|MIN.*26|minSdk.*26' --type cmake --type cpp -n -B 2 -A 2 | head -30

Repository: RunanywhereAI/runanywhere-sdks

Length of output: 4621


🏁 Script executed:

#!/bin/bash
# Check Vulkan requirements and how it's integrated
echo "=== Check for Vulkan API level requirements ==="
rg -i 'vulkan.*api.*2[6-9]|vulkan.*level|vulkan.*minSdk' -n -B 2 -A 2 | head -40

echo ""
echo "=== Search for runtime checks on API level for Vulkan ==="
rg -i 'vulkan.*build.*sdk|sdk.*build.*vulkan|dynamic.*vulkan|runtime.*vulkan' sdk/runanywhere-commons/src -n -B 1 -A 1 | head -30

echo ""
echo "=== Check VERSIONS file full content around ANDROID_MIN_SDK ==="
sed -n '20,85p' sdk/runanywhere-commons/VERSIONS

Repository: RunanywhereAI/runanywhere-sdks

Length of output: 5164


ANDROID_MIN_SDK=26 affects C++ native library builds globally, creating build system inconsistency.

This change raises the minimum API level for all C++ native code compiled via build-android.sh (which loads ANDROID_MIN_SDK from VERSIONS via load-versions.sh and passes it to CMake). However, Gradle-based APK builds hardcode minSdk=24 across all modules and do not source this variable. This creates a mismatch: APKs can still target API 24-25 devices, but their bundled native libraries are compiled assuming API 26+ features (Vulkan 1.1). This could cause crashes on older devices if they attempt to use Vulkan features at runtime.

Consider one of:

  1. Decouple Vulkan from global minSdk: Add ANDROID_MIN_SDK_VULKAN=26 and conditionally apply it only to Vulkan-specific code paths or builds.
  2. Update Gradle declarations: If API 26 is required globally, update all build.gradle.kts and build.gradle files to match ANDROID_MIN_SDK=26 for consistency.
  3. Document the decision: If this mismatch is intentional (relying on runtime Vulkan feature detection), explicitly document why native code targets 26 while APKs target 24.
🤖 Prompt for AI Agents
In `@sdk/runanywhere-commons/VERSIONS` around lines 25 - 26, The ANDROID_MIN_SDK
change in VERSIONS raises the native C++ build minimum to 26 but Gradle APKs
still hardcode minSdk=24, creating an inconsistency; to fix, either (A)
introduce a separate variable ANDROID_MIN_SDK_VULKAN=26 in VERSIONS and update
build-android.sh/load-versions.sh and the CMake invocation to use
ANDROID_MIN_SDK_VULKAN for Vulkan/native builds only (keeping ANDROID_MIN_SDK
for general use), or (B) update all build.gradle and build.gradle.kts modules to
set minSdk to 26 so Gradle and native builds match; alternatively, if you intend
to keep the mismatch, add a clear doc comment in VERSIONS explaining the runtime
Vulkan detection strategy and why native libs target API26 while APKs remain 24.

Comment on lines +72 to +75
# b7935 - LATEST release (Feb 4, 2025) with full Vulkan support
# Includes arr_dmmv shaders and proper CoopMat handling
# Previous: b7650 (missing modern Vulkan shaders)
LLAMACPP_VERSION=b7935
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🌐 Web query:

llama.cpp b7935 release date February 2025

💡 Result:

llama.cpp tag b7935 was released on February 4, 2026 (GitHub shows “released this 04 Feb 15:37”). It is not a February 2025 release. [1][2]


Fix the release date typo: change "Feb 4, 2025" to "Feb 4, 2026".

The version bump to b7935 is correct. However, the comment states "LATEST release (Feb 4, 2025)" but the release was actually February 4, 2026. Update the year in the comment to reflect the accurate release date.

🤖 Prompt for AI Agents
In `@sdk/runanywhere-commons/VERSIONS` around lines 72 - 75, Update the release
date in the VERSIONS comment: change the text "LATEST release (Feb 4, 2025)" to
"LATEST release (Feb 4, 2026)" in the block that documents
LLAMACPP_VERSION=b7935 so the comment accurately reflects the release date.

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

26 files reviewed, 8 comments

Edit Code Review Agent Settings | Greptile

device_info.pQueueCreateInfos = &queue_info;

VkDevice device = VK_NULL_HANDLE;
result = vkCreateDevice(devices[0], &device_info, nullptr, &device);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

vkCreateDevice result is checked but potentially crashes before the check. If the driver crashes during device creation, the process is already dead and _exit(result == VK_SUCCESS ? 0 : 1) will never execute.

The fork approach isolates crashes correctly, but this logic assumes the function returns rather than crashing.

Prompt To Fix With AI
This is a comment left during a code review.
Path: sdk/runanywhere-commons/src/infrastructure/device/vulkan_detector.cpp
Line: 81:81

Comment:
`vkCreateDevice` result is checked but potentially crashes before the check. If the driver crashes during device creation, the process is already dead and `_exit(result == VK_SUCCESS ? 0 : 1)` will never execute.

The fork approach isolates crashes correctly, but this logic assumes the function returns rather than crashing.

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +78 to +79
handle->backend = std::make_unique<runanywhere::LlamaCppBackend>();
handle->backend->disable_gpu();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Creating a new backend instance doesn't call llama_backend_free() on the failed instance. If llama_backend_init() partially succeeded before the exception, Vulkan resources may leak.

Suggested change
handle->backend = std::make_unique<runanywhere::LlamaCppBackend>();
handle->backend->disable_gpu();
handle->backend.reset(); // Calls destructor which calls llama_backend_free()
handle->backend = std::make_unique<runanywhere::LlamaCppBackend>();
Prompt To Fix With AI
This is a comment left during a code review.
Path: sdk/runanywhere-commons/src/backends/llamacpp/rac_llm_llamacpp.cpp
Line: 78:79

Comment:
Creating a new backend instance doesn't call `llama_backend_free()` on the failed instance. If `llama_backend_init()` partially succeeded before the exception, Vulkan resources may leak.

```suggestion
        handle->backend.reset();  // Calls destructor which calls llama_backend_free()
        handle->backend = std::make_unique<runanywhere::LlamaCppBackend>();
```

How can I resolve this? If you propose a fix, please make it concise.

* Check if this GPU is suitable for ML inference
*/
fun isSuitableForML(): Boolean {
return isAvailable && supportsCompute && maxMemoryMB >= 512
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minimum VRAM requirement of 512MB may be too low for 7B+ models mentioned in PR. Based on llamacpp_backend.cpp:306, 7B models need ~6GB GPU memory for 2048 context. Consider raising threshold or adding model-size-aware checks.

Prompt To Fix With AI
This is a comment left during a code review.
Path: sdk/runanywhere-kotlin/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/platform/GPUInfo.kt
Line: 25:25

Comment:
Minimum VRAM requirement of 512MB may be too low for 7B+ models mentioned in PR. Based on `llamacpp_backend.cpp:306`, 7B models need ~6GB GPU memory for 2048 context. Consider raising threshold or adding model-size-aware checks.

How can I resolve this? If you propose a fix, please make it concise.

// CRITICAL: Disable Vulkan IMMEDIATELY on library load
// This prevents crash in llama_backend_init() before our code runs
__android_log_print(ANDROID_LOG_WARN, "RAC_GPU_STATUS", "JNI_OnLoad: Disabling Vulkan to prevent crash");
setenv("GGML_VK_DISABLE", "1", 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

setenv("GGML_VK_DISABLE", "1", 1) globally disables Vulkan before any detection runs. Combined with GGML_VULKAN OFF in CMake, this creates triple redundancy. If the goal is experimental Vulkan support, at least one of these should be conditional (e.g., based on a runtime flag or device whitelist).

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Prompt To Fix With AI
This is a comment left during a code review.
Path: sdk/runanywhere-commons/src/backends/llamacpp/jni/rac_backend_llamacpp_jni.cpp
Line: 56:56

Comment:
`setenv("GGML_VK_DISABLE", "1", 1)` globally disables Vulkan before any detection runs. Combined with `GGML_VULKAN OFF` in CMake, this creates triple redundancy. If the goal is experimental Vulkan support, at least one of these should be conditional (e.g., based on a runtime flag or device whitelist).

<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +1 to +163
/**
* JNI Bridge for Device Capabilities (GPU Detection)
*
* Exposes Vulkan GPU detection to Kotlin/Java layer
*/

#include <jni.h>
#include "rac/infrastructure/device/vulkan_detector.h"
#include "rac/core/rac_logger.h"

extern "C" {

/**
* Detect Vulkan GPU and return GPUInfo object
*
* Java signature:
* com.runanywhere.sdk.platform.DeviceCapabilities.detectVulkanGPU() -> GPUInfo
*/
JNIEXPORT jobject JNICALL
Java_com_runanywhere_sdk_platform_DeviceCapabilities_detectVulkanGPU(
JNIEnv* env, jclass /* clazz */) {

RAC_LOG_INFO("DeviceJNI", "Detecting Vulkan GPU from JNI...");

// Detect GPU using C++ layer
auto info = runanywhere::VulkanDetector::detect();

// Find GPUInfo class
jclass gpu_info_class = env->FindClass("com/runanywhere/sdk/platform/GPUInfo");
if (gpu_info_class == nullptr) {
RAC_LOG_ERROR("DeviceJNI", "Failed to find GPUInfo class");
return nullptr;
}

// Find constructor: GPUInfo(boolean, String, String, String, long, boolean)
jmethodID constructor = env->GetMethodID(
gpu_info_class,
"<init>",
"(ZLjava/lang/String;Ljava/lang/String;Ljava/lang/String;JZ)V"
);

if (constructor == nullptr) {
RAC_LOG_ERROR("DeviceJNI", "Failed to find GPUInfo constructor");
return nullptr;
}

// Convert C++ strings to Java strings
jstring device_name = env->NewStringUTF(info.device_name.c_str());
jstring driver_version = env->NewStringUTF(info.driver_version.c_str());
jstring api_version = env->NewStringUTF(
runanywhere::VulkanDetector::get_version_string(info.api_version).c_str()
);

// Create GPUInfo object
jobject gpu_info = env->NewObject(
gpu_info_class,
constructor,
(jboolean)info.is_available,
device_name,
driver_version,
api_version,
(jlong)info.max_memory_mb,
(jboolean)info.supports_compute
);

// Clean up local references
env->DeleteLocalRef(device_name);
env->DeleteLocalRef(driver_version);
env->DeleteLocalRef(api_version);
env->DeleteLocalRef(gpu_info_class);

RAC_LOG_INFO("DeviceJNI", "GPU detection complete: available=%d, device=%s",
info.is_available, info.device_name.c_str());

return gpu_info;
}

/**
* Quick check if Vulkan is supported
*
* Java signature:
* com.runanywhere.sdk.platform.DeviceCapabilities.isVulkanSupported() -> boolean
*/
JNIEXPORT jboolean JNICALL
Java_com_runanywhere_sdk_platform_DeviceCapabilities_isVulkanSupported(
JNIEnv* env, jclass /* clazz */) {

bool supported = runanywhere::VulkanDetector::is_vulkan_supported();
RAC_LOG_INFO("DeviceJNI", "Vulkan supported: %s", supported ? "YES" : "NO");
return (jboolean)supported;
}

/**
* List all available Vulkan devices
*
* Java signature:
* com.runanywhere.sdk.platform.DeviceCapabilities.listVulkanDevices() -> List<GPUInfo>
*/
JNIEXPORT jobject JNICALL
Java_com_runanywhere_sdk_platform_DeviceCapabilities_listVulkanDevices(
JNIEnv* env, jclass /* clazz */) {

RAC_LOG_INFO("DeviceJNI", "Listing all Vulkan devices...");

auto devices = runanywhere::VulkanDetector::list_devices();

// Find ArrayList class
jclass array_list_class = env->FindClass("java/util/ArrayList");
if (array_list_class == nullptr) {
RAC_LOG_ERROR("DeviceJNI", "Failed to find ArrayList class");
return nullptr;
}

jmethodID array_list_constructor = env->GetMethodID(array_list_class, "<init>", "()V");
jmethodID array_list_add = env->GetMethodID(array_list_class, "add", "(Ljava/lang/Object;)Z");

// Create ArrayList
jobject device_list = env->NewObject(array_list_class, array_list_constructor);

// Find GPUInfo class
jclass gpu_info_class = env->FindClass("com/runanywhere/sdk/platform/GPUInfo");
jmethodID gpu_info_constructor = env->GetMethodID(
gpu_info_class,
"<init>",
"(ZLjava/lang/String;Ljava/lang/String;Ljava/lang/String;JZ)V"
);

// Add each device to list
for (const auto& info : devices) {
jstring device_name = env->NewStringUTF(info.device_name.c_str());
jstring driver_version = env->NewStringUTF(info.driver_version.c_str());
jstring api_version = env->NewStringUTF(
runanywhere::VulkanDetector::get_version_string(info.api_version).c_str()
);

jobject gpu_info = env->NewObject(
gpu_info_class,
gpu_info_constructor,
(jboolean)info.is_available,
device_name,
driver_version,
api_version,
(jlong)info.max_memory_mb,
(jboolean)info.supports_compute
);

env->CallBooleanMethod(device_list, array_list_add, gpu_info);

env->DeleteLocalRef(device_name);
env->DeleteLocalRef(driver_version);
env->DeleteLocalRef(api_version);
env->DeleteLocalRef(gpu_info);
}

env->DeleteLocalRef(array_list_class);
env->DeleteLocalRef(gpu_info_class);

RAC_LOG_INFO("DeviceJNI", "Found %zu Vulkan device(s)", devices.size());

return device_list;
}

} // extern "C"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New JNI bridge for Vulkan detection is well-structured, but will always return unavailable since GGML_USE_VULKAN is undefined (due to GGML_VULKAN OFF in CMake). All native methods will return empty/unavailable data.

Prompt To Fix With AI
This is a comment left during a code review.
Path: sdk/runanywhere-commons/src/jni/device_jni.cpp
Line: 1:163

Comment:
New JNI bridge for Vulkan detection is well-structured, but will always return unavailable since `GGML_USE_VULKAN` is undefined (due to `GGML_VULKAN OFF` in CMake). All native methods will return empty/unavailable data.

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +32 to +34
// Cache GPU info to avoid repeated JNI calls
private var cachedGPUInfo: GPUInfo? = null
private var cacheInitialized = false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GPU info cache never expires. If device state changes (driver update, GPU mode change), cached data will be stale. Consider adding cache TTL or listening to system events for cache invalidation.

Prompt To Fix With AI
This is a comment left during a code review.
Path: sdk/runanywhere-kotlin/src/jvmAndroidMain/kotlin/com/runanywhere/sdk/platform/DeviceCapabilities.kt
Line: 32:34

Comment:
GPU info cache never expires. If device state changes (driver update, GPU mode change), cached data will be stale. Consider adding cache TTL or listening to system events for cache invalidation.

How can I resolve this? If you propose a fix, please make it concise.

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Feb 11, 2026

Additional Comments (2)

sdk/runanywhere-commons/src/backends/llamacpp/CMakeLists.txt
Vulkan is explicitly disabled in CMake for Android (line 45), but the PR description claims this adds Vulkan GPU acceleration support for Android. This is contradictory - with GGML_VULKAN OFF, no Vulkan code will be compiled into the library.

    set(GGML_VULKAN ON CACHE BOOL "" FORCE)
Prompt To Fix With AI
This is a comment left during a code review.
Path: sdk/runanywhere-commons/src/backends/llamacpp/CMakeLists.txt
Line: 45:45

Comment:
Vulkan is explicitly disabled in CMake for Android (line 45), but the PR description claims this adds Vulkan GPU acceleration support for Android. This is contradictory - with `GGML_VULKAN OFF`, no Vulkan code will be compiled into the library.

```suggestion
    set(GGML_VULKAN ON CACHE BOOL "" FORCE)
```

How can I resolve this? If you propose a fix, please make it concise.

sdk/runanywhere-commons/include/rac/backends/rac_llm_llamacpp.h
Entire header file was reformatted with 436 lines changed, making it difficult to identify actual Vulkan-related changes vs whitespace/formatting changes. Formatting changes should be in separate commits from functional changes.

Prompt To Fix With AI
This is a comment left during a code review.
Path: sdk/runanywhere-commons/include/rac/backends/rac_llm_llamacpp.h
Line: 1:436

Comment:
Entire header file was reformatted with 436 lines changed, making it difficult to identify actual Vulkan-related changes vs whitespace/formatting changes. Formatting changes should be in separate commits from functional changes.

How can I resolve this? If you propose a fix, please make it concise.

@AmanSwar
Copy link

Thanks for the investigation into Vulkan GPU acceleration, @MdTabish24. The build verification work and honest documentation of failures are appreciated. However, after thorough analysis, we're rejecting this PR due to fundamental issues with the integration approach — not just the runtime crashes.

Why the crashes happen

The core issue isn't that "Vulkan doesn't work on Android." Vulkan compute does work at production scale on Android — Tencent's ncnn (used in WeChat/QQ on billions of devices) and Alibaba's MNN (used in Taobao/Alipay) both use Vulkan compute for ML inference successfully on the same Mali and Adreno GPUs that crash here.

The problem is how this PR integrates Vulkan:

1. Static initialization causes unrecoverable crashes

llama.cpp registers the Vulkan backend statically at library load time via ggml_backend_vk_reg() in its global constructor. This means:

The crash happens before JNI_OnLoad, before main(), before any user code. The try-catch blocks in rac_llm_llamacpp.cpp can never execute because SIGSEGV is not a C++ exception — it's a signal that bypasses exception handling entirely.

2. No device-specific driver workarounds

Frameworks that successfully use Vulkan on Android (ncnn, MNN) maintain extensive per-device bug workaround databases. For example, ncnn's gpu.cpp includes:

  • bug_storage_buffer_no_l1 — Adreno GPUs lack L1 cache for storage buffers, must use VkImage instead of VkBuffer
  • Mali-G76 buffer index wrapping at 65,536 elements (compiler only uses low 16 bits)
  • Mali uniform buffer layout corruption in driver versions r19-r25
  • Adreno shader compiler failures with nested conditionals converging inside loops

This PR has zero driver workarounds. It's essentially "enable GGML_VULKAN flag and hope."

3. No dynamic loading

ncnn uses a custom Vulkan loader (simplevk) that dlopen()s libvulkan.so at runtime — if it fails, the app gracefully falls back to CPU. llama.cpp also supports this via GGML_BACKEND_DL=ON, which builds libggml-vulkan.so as a separate loadable module. This PR statically links Vulkan into the main binary, making crashes unavoidable on incompatible devices.

4. Dead/contradictory code

  • VulkanDetector (fork-based probe) is implemented but never called from anywhere
  • detect_gpu_capabilities() is declared in the header but never implemented
  • use_gpu_ is always false, so the GPU fallback branch is dead code
  • CMake disables Vulkan (GGML_VULKAN OFF), making all exception handling code unreachable
  • setenv("GGML_VK_DISABLE") in JNI_OnLoad uses a non-standard env var name (llama.cpp uses GGML_DISABLE_VULKAN)
  • The triple-disable (CMake OFF + env var + n_gpu_layers=0) means zero Vulkan code actually executes

5. llama.cpp's Vulkan backend isn't optimized for mobile

Even where it doesn't crash, llama.cpp's Vulkan backend is ~15x slower than CPU on Mali GPUs (discussion #9464). On Adreno 732, it produces gibberish output (issue #16881). Qualcomm themselves built a separate OpenCL backend for llama.cpp rather than fixing Vulkan on Adreno — that tells you where things stand upstream.

What a proper integration would require

For reference, a production-ready Vulkan integration would need:

  1. Dynamic backend loading — Build with GGML_BACKEND_DL=ON, ship libggml-vulkan.so separately, dlopen() only after safe probing
  2. Safe Vulkan probingdlopen("libvulkan.so")vkEnumeratePhysicalDevices() → verify device count > 0, all before loading the Vulkan backend
  3. Device bug database — Known-bad driver versions, per-vendor workarounds (like ncnn's gpu.cpp)
  4. Per-operator fallback — Run supported ops on GPU, unsupported on CPU (not all-or-nothing)
  5. Upstream improvement — llama.cpp's Vulkan shaders need mobile-specific optimization before Android GPU acceleration is viable

@Siddhesh2377
Copy link
Collaborator

Siddhesh2377 commented Feb 12, 2026

@MdTabish24
I would say GPU support on android is a waste of time, I know GPU are powerfull to do any Task but it sound best on paper, because Communicating With GPU on Android is a Hell, as there is too much of Back-an-Fourth in this, Android GPU's aren't Design like X86 or x86_64 GPU's Thus they can perform well on Gaming but stuck at ML Ops or Heavy Attention Calculations, For now Best Options are CPU / NPU at best

update : Sorry i didn't Read the previous comment, but just commented on this, cause i was facing major issues with GPU on device thus i though why not update you guys

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants

Comments