
VLM in Flutter SDK #367

Open
VyasGuru wants to merge 5 commits into RunanywhereAI:main from VyasGuru:VLM-sdk

Conversation

@VyasGuru (Contributor) commented Feb 16, 2026

The Flutter SDK and the example app have been updated to support VLM. The React Native SDK has also been updated.

Summary by CodeRabbit

Release Notes

  • New Features

    • Added Vision Language Model (VLM) support for image understanding and description.
    • Introduced Vision Hub with Vision Chat feature for real-time image analysis.
    • Integrated camera access for live vision processing.
    • Added photo library support to analyze images from your device.
  • Dependencies

    • Added camera, image picker, and image processing libraries.

Greptile Summary

Added comprehensive Vision Language Model (VLM) support to both Flutter and React Native SDKs with example implementations.

Key Changes:

  • Flutter SDK: New DartBridgeVLM with FFI C++ interop, isolate-based streaming to prevent ANR, supports three image formats (file path, RGB pixels, base64)
  • React Native SDK: VLMBridge C++ implementation with proper lifecycle management, backend registration pattern, and streaming callbacks
  • Example Apps: Camera-based VLM interfaces with three modes (single capture, gallery pick, auto-streaming at 2.5s intervals)
  • Type Safety: Sealed classes in Dart and discriminated unions in TypeScript for robust image format handling (see the sketch after this list)
  • Permissions: Added camera and photo library permissions for iOS/Android with user-friendly dialogs
  • File Discovery: Recursive GGUF and mmproj file detection for VLM models
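
The image-format handling called out above can be sketched in a few lines of TypeScript. This is illustrative only — the union described in VLMTypes.ts uses its own field names, and the function below is a hypothetical consumer:

```typescript
// Hypothetical sketch of a discriminated union over the three image sources
// (file path, RGB pixels, base64); real field names in VLMTypes.ts may differ.
type VLMImage =
  | { format: 'filePath'; path: string }
  | { format: 'rgbPixels'; pixels: Uint8Array; width: number; height: number }
  | { format: 'base64'; data: string };

// Exhaustive switch: the compiler flags any variant left unhandled.
function describeImageSource(image: VLMImage): string {
  switch (image.format) {
    case 'filePath':
      return `file at ${image.path}`;
    case 'rgbPixels':
      return `${image.width}x${image.height} RGB buffer (${image.pixels.length} bytes)`;
    case 'base64':
      return `base64 payload (${image.data.length} chars)`;
  }
}
```

The Dart side gets the same exhaustiveness guarantee from a sealed class hierarchy.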

Architecture:

  • Matches iOS implementation patterns (CppBridge+VLM.swift)
  • Backend-agnostic design using rac_vlm_component_* C APIs
  • Proper resource cleanup and cancellation support
  • Telemetry integration for model load tracking

Confidence Score: 4/5

  • This PR is safe to merge with minor observations noted
  • The implementation is comprehensive and follows established patterns from iOS SDK. Code quality is high with proper error handling, resource management, and type safety. The issues previously flagged (pixel copying efficiency, unawaited disposal, recursive depth) have been acknowledged. No critical bugs found in the review.
  • No files require special attention - previous threading concerns were already documented

Important Files Changed

Filename Overview
sdk/runanywhere-flutter/packages/runanywhere/lib/native/dart_bridge_vlm.dart Added VLM bridge implementation for Flutter with FFI C++ interop, isolate-based streaming to prevent ANR, and comprehensive lifecycle management
sdk/runanywhere-flutter/packages/runanywhere/lib/public/runanywhere.dart Added VLM model loading/unloading methods, image processing (streaming and non-streaming), with proper error handling and telemetry integration
sdk/runanywhere-react-native/packages/llamacpp/cpp/bridges/VLMBridge.cpp C++ VLM bridge implementation with proper resource management, model lifecycle, and streaming support
sdk/runanywhere-react-native/packages/llamacpp/src/RunAnywhere+VLM.ts React Native VLM extension with backend registration, model loading, and streaming image processing aligned with iOS implementation
examples/flutter/RunAnywhereAI/lib/features/vision/vlm_view_model.dart Flutter VLM view model with camera management, auto-streaming mode, and proper resource disposal patterns
examples/react-native/RunAnywhereAI/src/hooks/useVLMCamera.ts React hook for VLM camera with three modes (single capture, gallery, auto-streaming), proper lifecycle management and memoized service instance
sdk/runanywhere-react-native/packages/core/src/services/FileSystem.ts Added VLM-specific file discovery methods for GGUF and mmproj files with recursive directory search

Sequence Diagram

sequenceDiagram
    participant App as Example App
    participant SDK as RunAnywhere SDK
    participant Bridge as VLM Bridge (Dart/TS)
    participant CPP as C++ VLM Component
    participant Model as VLM Model (llama.cpp)

    Note over App,Model: VLM Model Loading Flow
    App->>SDK: loadVLMModel(modelId)
    SDK->>SDK: Find model in registry
    SDK->>SDK: Resolve GGUF file path
    SDK->>SDK: Find mmproj file
    SDK->>Bridge: loadModel(modelPath, mmprojPath)
    Bridge->>CPP: rac_vlm_component_create()
    CPP-->>Bridge: handle
    Bridge->>CPP: rac_vlm_component_load_model()
    CPP->>Model: Load model & mmproj
    Model-->>CPP: Model ready
    CPP-->>Bridge: RAC_SUCCESS
    Bridge-->>SDK: Success
    SDK-->>App: Model loaded

    Note over App,Model: Image Processing (Streaming)
    App->>SDK: processImageStream(image, prompt)
    SDK->>Bridge: processImageStream()
    Bridge->>Bridge: Spawn background isolate
    Bridge->>CPP: rac_vlm_component_process_stream()
    CPP->>Model: Encode image
    CPP->>Model: Generate tokens
    loop Token Generation
        Model-->>CPP: Token
        CPP-->>Bridge: onToken callback
        Bridge-->>SDK: Stream token
        SDK-->>App: Display token
    end
    Model-->>CPP: Complete
    CPP-->>Bridge: onComplete callback
    Bridge-->>SDK: Stream complete
    SDK-->>App: Final metrics

    Note over App,Model: Camera Auto-Streaming Mode
    App->>App: Start 2.5s timer
    loop Every 2.5 seconds
        App->>App: Capture camera frame
        App->>SDK: processImageStream(frame)
        SDK->>Bridge: Process with streaming
        Bridge->>CPP: Generate description
        CPP-->>App: Stream tokens to UI
    end
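A consumer-side sketch of the streaming phase above, in TypeScript. The `stream` parameter is a hypothetical stand-in for the SDK's streaming entry point (the React Native extension exposes an AsyncGenerator per the walkthrough below); real names and signatures live in RunAnywhere+VLM.ts:

```typescript
// Accumulates streamed tokens into a full description, mirroring the token
// loop in the diagram. `stream` is an assumed async-generator-style API.
async function describeFrame(
  stream: (imagePath: string, prompt: string) => AsyncGenerator<string>,
  framePath: string,
): Promise<string> {
  let description = '';
  for await (const token of stream(framePath, 'Describe this image.')) {
    description += token; // a UI layer would render the partial text here
  }
  return description; // complete text once the model signals completion
}
```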

Last reviewed commit: 9ed97e3

Context used:

  • Context from dashboard - CLAUDE.md (source)

Have made changes to the Flutter SDK to support VLM, following the Swift SDK. Once the example app also has the corresponding changes, I'll be able to fully test this. I have test-built it.
coderabbitai bot commented Feb 16, 2026

📝 Walkthrough

This PR adds comprehensive Vision Language Model (VLM) support to the RunAnywhere SDK and example applications across Flutter and React Native platforms, including model registration, camera integration, native bridges, type definitions, and image processing APIs with both streaming and non-streaming modes.

Changes

Cohort / File(s) Summary
Flutter iOS Platform Configuration
examples/flutter/RunAnywhereAI/ios/Podfile, examples/flutter/RunAnywhereAI/ios/Runner.xcodeproj/project.pbxproj, examples/flutter/RunAnywhereAI/ios/Runner/Info.plist
Added PERMISSION_CAMERA to preprocessor definitions, ENABLE_RESOURCE_ACCESS_CAMERA build flags, and new camera/speech/microphone usage description keys and support flags in Info.plist.
Flutter Vision UI Components
examples/flutter/RunAnywhereAI/lib/features/vision/vision_hub_view.dart, examples/flutter/RunAnywhereAI/lib/features/vision/vlm_camera_view.dart, examples/flutter/RunAnywhereAI/lib/features/vision/vlm_view_model.dart
New Vision Hub screen with Vision Chat and Image Generation options; full VLM camera interface with camera preview, live streaming, description panel, and model selection; comprehensive ViewModel managing camera lifecycle, model state, and image processing (single capture, gallery, auto-streaming).
Flutter App Integration
examples/flutter/RunAnywhereAI/lib/app/content_view.dart, examples/flutter/RunAnywhereAI/lib/app/runanywhere_ai_app.dart
Added Vision tab at index 1 in content view and VisionHubView import; registered SmolVLM 500M Instruct model in two locations with llama.cpp framework and multimodal modality.
Flutter Services & Permissions
examples/flutter/RunAnywhereAI/lib/core/services/permission_service.dart, examples/flutter/RunAnywhereAI/pubspec.yaml
Added requestCameraPermission and isCameraPermissionGranted methods; added three new dependencies (camera, image_picker, image).
Flutter Model Management
examples/flutter/RunAnywhereAI/lib/features/models/model_types.dart, examples/flutter/RunAnywhereAI/lib/features/models/model_status_components.dart, examples/flutter/RunAnywhereAI/lib/features/models/model_selection_sheet.dart
Added ModelSelectionContext.vlm enum value with title/category mappings; extended ModelRequiredOverlay to support VLM context with icon and description; deferred Navigator.pop calls to post-frame callbacks to prevent re-entrant navigation issues.
Flutter SDK Core Types
sdk/runanywhere-flutter/packages/runanywhere/lib/core/types/sdk_component.dart, sdk/runanywhere-flutter/packages/runanywhere/lib/foundation/error_types/sdk_error.dart
Added SDKComponent.vlm enum value; introduced VLM error types (vlmNotInitialized, vlmModelLoadFailed, vlmProcessingFailed, vlmInvalidImage, vlmCancelled) with error mapping and recovery suggestions.
Flutter SDK Native Bridge Infrastructure
sdk/runanywhere-flutter/packages/runanywhere/lib/native/dart_bridge.dart, sdk/runanywhere-flutter/packages/runanywhere/lib/native/dart_bridge_vlm.dart, sdk/runanywhere-flutter/packages/runanywhere/lib/native/ffi_types.dart
Added DartBridgeVLM singleton with loadModel, processImage (sync), processImageStream (async) methods using isolate-based processing; extended FFI types with VLM image format enums, option/result structs, and callback signatures for streaming.
Flutter SDK Public API & Types
sdk/runanywhere-flutter/packages/runanywhere/lib/public/runanywhere.dart, sdk/runanywhere-flutter/packages/runanywhere/lib/public/types/vlm_types.dart, sdk/runanywhere-flutter/packages/runanywhere/lib/public/types/types.dart
Added public VLM methods (describeImage, askAboutImage, processImage, processImageStream) with model lifecycle (loadVLMModel, unloadVLMModel, cancelVLMGeneration); defined VLMImage sealed class with file path/RGB pixels/base64 variants; VLMResult and VLMStreamingResult types with metrics and streaming support.
Flutter LlamaCPP Backend
sdk/runanywhere-flutter/packages/runanywhere/lib/public/extensions/runanywhere_frameworks.dart, sdk/runanywhere-flutter/packages/runanywhere_llamacpp/lib/llamacpp.dart, sdk/runanywhere-flutter/packages/runanywhere_llamacpp/lib/native/llamacpp_bindings.dart
Updated framework selection logic with explicit breaks; added SDKComponent.vlm to LlamaCPP capabilities; implemented _registerVlm method with guarded registration and corresponding unregisterVlm cleanup; added registerVlm/unregisterVlm FFI bindings.
React Native App Integration
examples/react-native/RunAnywhereAI/App.tsx
Registered SmolVLM 500M Instruct model via LlamaCPP.addVLMModel with model ID, URL, and memory requirement.
React Native Vision UI
examples/react-native/RunAnywhereAI/src/screens/VLMScreen.tsx, examples/react-native/RunAnywhereAI/src/screens/VisionHubScreen.tsx
Replaced placeholder VLMScreen with full implementation: live camera preview, description streaming panel, control bar (Photos/Main/Live/Model buttons), processing overlay, model-required overlay, model selection flow; added Image Generation disabled row to VisionHubScreen.
React Native Configuration
examples/react-native/RunAnywhereAI/android/app/src/main/AndroidManifest.xml, examples/react-native/RunAnywhereAI/ios/Podfile, examples/react-native/RunAnywhereAI/ios/RunAnywhereAI/Info.plist
Added android.permission.CAMERA; added Camera and PhotoLibrary permissions to iOS Podfile; added NSCameraUsageDescription and NSPhotoLibraryUsageDescription to Info.plist.
React Native Services
examples/react-native/RunAnywhereAI/src/services/VLMService.ts, sdk/runanywhere-react-native/packages/core/src/services/FileSystem.ts
Created VLMService with loadModel, isModelLoaded, describeImage, cancel, release methods; extended FileSystem with archive type inference, archive extraction handling, findGGUFInDirectory, and findMmprojForModel utilities.
React Native Hooks
examples/react-native/RunAnywhereAI/src/hooks/useVLMCamera.ts
New hook encapsulating VLM operations: camera permission requests, model loading, capture-and-describe, gallery selection, auto-streaming with interval-based frame capture, and proper cleanup on unmount.
React Native Model Management
examples/react-native/RunAnywhereAI/src/components/model/ModelSelectionSheet.tsx
Added VLM context to ModelSelectionContext enum with category filtering (Vision + Multimodal) and fallback handling for vision-only models when multimodal unavailable.
React Native Type Definitions
sdk/runanywhere-react-native/packages/core/src/types/VLMTypes.ts, sdk/runanywhere-react-native/packages/core/src/types/index.ts
Defined VLMImageFormat enum, VLMImage discriminated union (file path/RGB pixels/base64), VLMGenerationOptions, VLMResult, VLMStreamingResult, and VLMErrorCode types; re-exported from central types index.
React Native SDK Core API
sdk/runanywhere-react-native/packages/core/src/Public/Extensions/RunAnywhere+Models.ts, sdk/runanywhere-react-native/packages/core/src/Public/Extensions/index.ts, sdk/runanywhere-react-native/packages/core/src/index.ts
Added getMmprojPath utility; threaded framework parameter through FileSystem.downloadModel; exported extended model management functions from SDK entry point.
React Native LlamaCPP Module
sdk/runanywhere-react-native/packages/llamacpp/src/LlamaCPP.ts, sdk/runanywhere-react-native/packages/llamacpp/src/LlamaCppProvider.ts, sdk/runanywhere-react-native/packages/llamacpp/src/index.ts
Added LlamaCPPVLMModelOptions interface and addVLMModel method; updated capabilities to include vlm; implemented registerVLM flow separate from LLM backend; exported VLM functions and types from module entry point.
React Native VLM SDK Extension
sdk/runanywhere-react-native/packages/llamacpp/src/RunAnywhere+VLM.ts
Comprehensive VLM module: backend registration, model lifecycle (load/unload/check), image description (describeImage, askAboutImage), core processing (processImage with JSON options parsing), streaming support (AsyncGenerator with real-time tokens and metrics), cancellation, and helpers for image format conversion and base64 encoding.
React Native C++ Native Module
sdk/runanywhere-react-native/packages/llamacpp/src/native/NativeRunAnywhereLlama.ts, sdk/runanywhere-react-native/packages/llamacpp/src/specs/RunAnywhereLlama.nitro.ts
Updated module availability check to use singleton getter; added VLM methods to native interface specification: registerVLMBackend, loadVLMModel, isVLMModelLoaded, unloadVLMModel, processVLMImage, processVLMImageStream (with callback), cancelVLMGeneration.
React Native C++ Bridge Classes
sdk/runanywhere-react-native/packages/llamacpp/cpp/HybridRunAnywhereLlama.hpp, sdk/runanywhere-react-native/packages/llamacpp/cpp/HybridRunAnywhereLlama.cpp, sdk/runanywhere-react-native/packages/llamacpp/cpp/bridges/VLMBridge.hpp, sdk/runanywhere-react-native/packages/llamacpp/cpp/bridges/VLMBridge.cpp
Added VLM lifecycle, model loading, image processing (non-streaming and streaming with callbacks) methods to hybrid module; created VLMBridge singleton managing native component lifecycle, model state, and both synchronous/streaming image description with structured result output and error handling.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested labels

flutter-sdk, flutter-sample, react-native-sdk, react-native-sample, vision-language-model, camera-integration

🐰 A rabbit hops through cameras and frames,
Glimpsing visions with flowing streams,
Models dance on bridges bright,
From iOS code to Android light!
Language and sight now intertwine,
As visions of VLM align!

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 45.45% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check ❓ Inconclusive The title 'VLM in Flutter SDK' is partially related to the changeset but is overly narrow and incomplete, as the PR also significantly updates the React Native SDK, example apps, and shared components. Consider a more comprehensive title like 'Add VLM support to Flutter and React Native SDKs with example implementations' to accurately reflect the scope of changes across multiple platforms and SDKs.
✅ Passed checks (1 passed)
Check name Status Explanation
Description check ✅ Passed The PR description is comprehensive and well-structured, covering Flutter SDK changes, React Native SDK changes, example app updates, type safety, permissions, and file discovery. However, the required template sections (Type of Change, Testing, Labels, Checklist) are missing.


VyasGuru and others added 4 commits February 17, 2026 19:32
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
VyasGuru marked this pull request as ready for review February 17, 2026 23:36
greptile-apps bot (Contributor) left a comment

51 files reviewed, 5 comments

Comment on lines +526 to +537
    } else if (imageFormat == RacVlmImageFormat.rgbPixels &&
        pixelData != null) {
      // Allocate native memory for pixel data
      pixelDataPtr = calloc<Uint8>(pixelData.length);
      for (int i = 0; i < pixelData.length; i++) {
        pixelDataPtr![i] = pixelData[i];
      }
      imagePtr.ref.filePath = nullptr;
      imagePtr.ref.pixelData = pixelDataPtr!;
      imagePtr.ref.base64Data = nullptr;
      imagePtr.ref.dataSize = pixelData.length;
    } else if (imageFormat == RacVlmImageFormat.base64 && base64Data != null) {

pixel data is copied in a loop which is inefficient for large images

Prompt To Fix With AI
This is a comment left during a code review.
Path: sdk/runanywhere-flutter/packages/runanywhere/lib/native/dart_bridge_vlm.dart
Line: 526:537

Comment:
pixel data is copied in a loop which is inefficient for large images

```suggestion
      // Allocate native memory for pixel data
      pixelDataPtr = calloc<Uint8>(pixelData.length);
      final uint8List = pixelDataPtr!.asTypedList(pixelData.length);
      uint8List.setAll(0, pixelData);
```

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +687 to +691
      // Allocate native memory for pixel data
      pixelDataPtr = calloc<Uint8>(params.pixelData!.length);
      for (int i = 0; i < params.pixelData!.length; i++) {
        pixelDataPtr![i] = params.pixelData![i];
      }

same inefficient pixel data copying pattern

Prompt To Fix With AI
This is a comment left during a code review.
Path: sdk/runanywhere-flutter/packages/runanywhere/lib/native/dart_bridge_vlm.dart
Line: 687:691

Comment:
same inefficient pixel data copying pattern

```suggestion
      // Allocate native memory for pixel data
      pixelDataPtr = calloc<Uint8>(params.pixelData!.length);
      final uint8List = pixelDataPtr!.asTypedList(params.pixelData!.length);
      uint8List.setAll(0, params.pixelData!);
```

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +88 to +89
unawaited(_cameraController?.dispose());
_cameraController = null;

unawaited used on dispose calls but camera disposal errors should be handled

Prompt To Fix With AI
This is a comment left during a code review.
Path: examples/flutter/RunAnywhereAI/lib/features/vision/vlm_view_model.dart
Line: 88:89

Comment:
`unawaited` used on dispose calls but camera disposal errors should be handled

```suggestion
    _cameraController?.dispose().catchError((e) {
      debugPrint('Camera disposal error: $e');
    });
```

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +188 to +210
  const performAutoStreamCapture = useCallback(async () => {
    if (!cameraRef.current) return;

    try {
      // FIX: Removed 'qualityPrioritization'
      const photo = await cameraRef.current.takePhoto({
        enableShutterSound: false
      });

      let accumulatedText = '';
      await vlmService.describeImage(
        photo.path,
        AUTO_STREAM_PROMPT,
        AUTO_STREAM_MAX_TOKENS,
        (token) => {
          accumulatedText += token;
          setCurrentDescription(accumulatedText);
        }
      );
    } catch (err) {
      console.warn('[useVLMCamera] Auto-stream skipped frame:', err);
    }
  }, [cameraRef, vlmService]);

consider adding a guard to prevent queuing captures if one is already in progress

Prompt To Fix With AI
This is a comment left during a code review.
Path: examples/react-native/RunAnywhereAI/src/hooks/useVLMCamera.ts
Line: 188:210

Comment:
consider adding a guard to prevent queuing captures if one is already in progress

```suggestion
  const performAutoStreamCapture = useCallback(async () => {
    if (!cameraRef.current || isProcessing) return;
    
    setIsProcessing(true);
    try {
      const photo = await cameraRef.current.takePhoto({
        enableShutterSound: false
      });

      let accumulatedText = '';
      await vlmService.describeImage(
        photo.path,
        AUTO_STREAM_PROMPT,
        AUTO_STREAM_MAX_TOKENS,
        (token) => {
          accumulatedText += token;
          setCurrentDescription(accumulatedText);
        }
      );
    } catch (err) {
      console.warn('[useVLMCamera] Auto-stream skipped frame:', err);
    } finally {
      setIsProcessing(false);
    }
  }, [cameraRef, vlmService, isProcessing]);
```

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +630 to +663
   */
  async findGGUFInDirectory(directory: string): Promise<string> {
    if (!RNFS) {
      throw new Error('react-native-fs not available');
    }

    try {
      const contents = await RNFS.readDir(directory);

      // Look for .gguf files (not mmproj)
      for (const item of contents) {
        if (item.isFile() && item.name.endsWith('.gguf') && !item.name.includes('mmproj')) {
          logger.info(`Found main GGUF model: ${item.name}`);
          return item.path;
        }
      }

      // If not found, check nested directories
      for (const item of contents) {
        if (item.isDirectory()) {
          try {
            return await this.findGGUFInDirectory(item.path);
          } catch {
            // Continue searching other directories
          }
        }
      }

      throw new Error(`No GGUF model file found in ${directory}`);
    } catch (error) {
      logger.error(`Error finding GGUF file: ${error}`);
      throw error;
    }
  },

consider adding a depth limit or breadth-first search to prevent excessive recursion for deeply nested directories

Prompt To Fix With AI
This is a comment left during a code review.
Path: sdk/runanywhere-react-native/packages/core/src/services/FileSystem.ts
Line: 630:663

Comment:
consider adding a depth limit or breadth-first search to prevent excessive recursion for deeply nested directories

How can I resolve this? If you propose a fix, please make it concise.
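
One possible shape of the fix, as a standalone TypeScript sketch: the directory-entry interface is reduced to the fields the code above actually uses, and `readDir` is passed in rather than calling RNFS directly, so this is illustrative rather than a drop-in patch:

```typescript
interface DirEntry {
  name: string;
  path: string;
  isFile(): boolean;
  isDirectory(): boolean;
}

// Depth-limited variant of the recursive search: stop descending once the
// depth budget is exhausted instead of walking arbitrarily deep trees.
async function findGGUFInDirectory(
  directory: string,
  readDir: (dir: string) => Promise<DirEntry[]>,
  maxDepth = 3,
): Promise<string> {
  if (maxDepth < 0) {
    throw new Error(`No GGUF model file found within depth limit in ${directory}`);
  }
  const contents = await readDir(directory);

  // Prefer a main .gguf file in the current directory (skip mmproj files).
  for (const item of contents) {
    if (item.isFile() && item.name.endsWith('.gguf') && !item.name.includes('mmproj')) {
      return item.path;
    }
  }

  // Otherwise recurse into subdirectories with a reduced depth budget.
  for (const item of contents) {
    if (item.isDirectory()) {
      try {
        return await findGGUFInDirectory(item.path, readDir, maxDepth - 1);
      } catch {
        // Keep searching sibling directories.
      }
    }
  }
  throw new Error(`No GGUF model file found in ${directory}`);
}
```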

@shubhammalhotra28 (Contributor)

@coderabbitai please do a detailed review for this PR. Thanks!

@shubhammalhotra28 (Contributor)

@greptile please do a detailed review for this PR. Thanks!

coderabbitai bot commented Feb 17, 2026

@shubhammalhotra28: I'll conduct a detailed review of this PR which adds Vision Language Model (VLM) support to both Flutter and React Native SDKs.

Based on the PR objectives, I'll review:

Core SDK Changes:

  • Flutter VLM implementation (DartBridgeVLM, isolate-based FFI, public APIs)
  • React Native VLM implementation (TypeScript extensions, C++ bridges, native interop)
  • Type definitions and error handling for both platforms
  • FileSystem service enhancements for VLM model handling

Example Applications:

  • Flutter example app (VLM ViewModel, camera view, UI components)
  • React Native example app (VLM hooks, screens, camera integration)
  • Camera permissions and lifecycle management

Architecture & Integration:

  • Shared C++ core integration (rac_vlm_component_* APIs)
  • Streaming token generation and callbacks
  • Resource cleanup and memory management
  • Error handling and recovery mechanisms

I'll analyze the code quality, architectural patterns, potential issues, and provide actionable feedback.

✅ Actions performed

Full review triggered.

coderabbitai bot left a comment

Actionable comments posted: 20

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
sdk/runanywhere-flutter/packages/runanywhere_llamacpp/lib/llamacpp.dart (1)

274-280: ⚠️ Potential issue | 🟠 Major

dispose() does not reset _isVlmRegistered, causing stale state on re-registration.

dispose() clears _bindings, _registeredModels, and _isRegistered, but leaves _isVlmRegistered = true. If register() is called again after dispose(), _registerVlm() (Line 160) will see _isVlmRegistered == true and skip VLM registration entirely — even though the native bindings were destroyed.

🐛 Proposed fix
   static void dispose() {
     _bindings = null;
     _registeredModels.clear();
     _isRegistered = false;
+    _isVlmRegistered = false;
     _logger.info('LlamaCpp disposed');
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sdk/runanywhere-flutter/packages/runanywhere_llamacpp/lib/llamacpp.dart`
around lines 274 - 280, dispose() currently clears _bindings, _registeredModels,
and _isRegistered but fails to reset _isVlmRegistered, which causes
_registerVlm() to skip re-registering VLMs after dispose; update dispose() to
also set _isVlmRegistered = false so that subsequent calls to register() and
_registerVlm() will perform VLM registration against the new/native bindings,
ensuring the internal state (_bindings, _registeredModels, _isRegistered,
_isVlmRegistered) is fully reset.
🧹 Nitpick comments (22)
sdk/runanywhere-flutter/packages/runanywhere/lib/foundation/error_types/sdk_error.dart (2)

322-323: vlmCancelled recovery suggestion reads as a description, not actionable guidance.

Every other recoverySuggestion tells the user what to do. 'The VLM generation was cancelled by the user.' simply restates the error. Consider something like 'No action needed. Re-initiate VLM processing to try again.' to stay consistent with the pattern.

✏️ Suggested wording
       case SDKErrorType.vlmCancelled:
-        return 'The VLM generation was cancelled by the user.';
+        return 'No action needed. Re-initiate VLM processing to try again.';
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@sdk/runanywhere-flutter/packages/runanywhere/lib/foundation/error_types/sdk_error.dart`
around lines 322 - 323, The recoverySuggestion for SDKErrorType.vlmCancelled
currently restates the error; update the recovery guidance in the
recoverySuggestion mapping (where SDKErrorType.vlmCancelled is handled) to
provide an actionable suggestion such as "No action needed. Re-initiate VLM
processing to try again." so it matches the pattern of other cases and gives
clear next steps to the user.

653-693: VLM factory constructors omit underlyingError, unlike similar existing factories.

vlmModelLoadFailed and vlmProcessingFailed don't accept an optional underlyingError parameter, while analogous factories like modelLoadFailed (line 373), requestFailed (line 479), and databaseInitializationFailed (line 588) do. Preserving the exception chain aids debugging.

♻️ Proposed fix to add underlyingError support
   /// VLM model load failed error
-  static SDKError vlmModelLoadFailed(String message) {
+  static SDKError vlmModelLoadFailed(String message, {Object? underlyingError}) {
     return SDKError(
       'VLM model load failed: $message',
       SDKErrorType.vlmModelLoadFailed,
+      underlyingError: underlyingError,
     );
   }

   /// VLM processing failed error
-  static SDKError vlmProcessingFailed(String message) {
+  static SDKError vlmProcessingFailed(String message, {Object? underlyingError}) {
     return SDKError(
       'VLM processing failed: $message',
       SDKErrorType.vlmProcessingFailed,
+      underlyingError: underlyingError,
     );
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@sdk/runanywhere-flutter/packages/runanywhere/lib/foundation/error_types/sdk_error.dart`
around lines 653 - 693, Add an optional dynamic underlyingError parameter to the
VLM factory functions vlmModelLoadFailed and vlmProcessingFailed and pass it
into the SDKError constructor so the exception chain is preserved (mirror the
signatures/pattern used by modelLoadFailed, requestFailed, and
databaseInitializationFailed); update vlmModelLoadFailed(String message) to
vlmModelLoadFailed(String message, [dynamic? underlyingError]) and
vlmProcessingFailed(String message, [dynamic? underlyingError]) and include
underlyingError in the returned SDKError(...) call.
examples/flutter/RunAnywhereAI/pubspec.yaml (1)

46-47: Using pure-Dart image for per-frame BGRA→RGB conversion will be a performance bottleneck in streaming VLM pipelines.

The Dart Image Library provides the ability to load, save, and manipulate images in a variety of image file formats. The library can be used with both dart:io and dart:html, for command-line, Flutter, and web applications. However, it performs all processing in pure Dart on the CPU with no native acceleration. For a VLM streaming use case (continuously converting BGRA camera frames), running this on the main Dart thread will cause dropped frames and UI jank.

Mitigations to consider:

  • Push the conversion into the native FFI/C++ layer already used elsewhere in the VLM pipeline, so the pixel format conversion happens before the buffer reaches Dart.
  • If it must stay in Dart, dispatch via compute() or a dedicated Isolate to avoid blocking the UI thread.
  • Consider whether the camera streaming API can be configured to produce a format the model already accepts natively, avoiding the conversion entirely.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/flutter/RunAnywhereAI/pubspec.yaml` around lines 46 - 47, The
pubspec currently depends on the pure-Dart "image" package for per-frame
BGRA→RGB conversion (the "image" dependency), which will cause UI jank in
streaming VLM pipelines; instead, move the conversion out of the main Dart
thread by implementing the pixel-format conversion in the native FFI/C++ layer
already used elsewhere in the pipeline so frames arrive to Dart in the
model-native format, or if native changes are not possible, dispatch the
per-frame conversion off the UI isolate via compute()/a dedicated Isolate when
handling camera frames (where the conversion currently occurs), and/or configure
the camera streaming API to emit a supported pixel format to eliminate the
conversion step.
examples/flutter/RunAnywhereAI/lib/app/runanywhere_ai_app.dart (1)

231-231: Log message says "VLM models registered" (plural) but only one model is registered.

If this is the only VLM entry planned for this release, align the message with the code:

📝 Suggested correction
-    debugPrint('✅ VLM models registered');
+    debugPrint('✅ VLM model registered (SmolVLM 500M)');
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/flutter/RunAnywhereAI/lib/app/runanywhere_ai_app.dart` at line 231,
The log string used in debugPrint ('✅ VLM models registered') is plural but only
one VLM model is registered; update the message to singular (e.g., '✅ VLM model
registered') or, if the count is dynamic, interpolate the actual number when
calling debugPrint. Locate the debugPrint call in runanywhere_ai_app.dart (the
'✅ VLM models registered' string) and change it to the appropriate singular text
or to use the model count variable in the message.
examples/flutter/RunAnywhereAI/lib/core/services/permission_service.dart (1)

176-233: Consider extracting a shared permission-request helper to reduce duplication.

requestCameraPermission, requestMicrophonePermission, and requestSpeechRecognitionPermission are structurally identical — they differ only in the Permission type and user-facing strings. A private generic method would eliminate ~50 lines of repeated logic.

♻️ Sketch of a shared helper
+ Future<bool> _requestPermission(
+   BuildContext context, {
+   required Permission permission,
+   required String title,
+   required String message,
+   required String deniedSnackbarMessage,
+ }) async {
+   final status = await permission.status;
+   if (status.isGranted) return true;
+
+   if (status.isPermanentlyDenied) {
+     if (!context.mounted) return false;
+     final shouldOpenSettings = await _showSettingsDialog(context, title: title, message: message);
+     if (shouldOpenSettings) await openAppSettings();
+     return false;
+   }
+
+   final result = await permission.request();
+   if (result.isGranted) return true;
+   if (!context.mounted) return false;
+
+   if (result.isPermanentlyDenied) {
+     final shouldOpenSettings = await _showSettingsDialog(context, title: title, message: message);
+     if (shouldOpenSettings) await openAppSettings();
+   } else if (result.isDenied) {
+     _showDeniedSnackbar(context, deniedSnackbarMessage);
+   }
+   return false;
+ }

Then each public method becomes a one-liner delegating to _requestPermission.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/flutter/RunAnywhereAI/lib/core/services/permission_service.dart`
around lines 176 - 233, Extract the duplicated logic in
requestCameraPermission/requestMicrophonePermission/requestSpeechRecognitionPermission
into a single private helper (e.g. _requestPermission) that accepts the
Permission object and the title/message/snackbar text, then replace each public
method with a call that passes
Permission.camera/Permission.microphone/Permission.speech and the appropriate
strings; ensure _requestPermission handles status.isGranted,
status.isPermanentlyDenied (showing _showSettingsDialog + openAppSettings),
requests permission, handles
result.isGranted/result.isPermanentlyDenied/result.isDenied, checks
context.mounted before UI actions, and returns the same boolean semantics as the
original methods.
examples/react-native/RunAnywhereAI/src/screens/VLMScreen.tsx (1)

93-99: handleMainAction also depends on [vlm] — recreated every render.

Same issue as above — vlm object identity changes on every render. Since this is passed to a TouchableOpacity.onPress, the constant recreation is wasteful but not infinite-looping. Consistent fix would be to depend on the specific vlm properties used.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/react-native/RunAnywhereAI/src/screens/VLMScreen.tsx` around lines
93 - 99, The handleMainAction callback is being recreated every render because
it depends on the whole vlm object; change its dependency to the specific
properties/methods it uses instead of vlm. Update the useCallback for
handleMainAction to depend on vlm.isAutoStreaming, vlm.toggleAutoStreaming and
vlm.captureAndDescribe (or destructure those three from vlm above and depend on
the destructured values) so the function is only recreated when the actual
inputs change.
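
A minimal sketch of the narrowed dependencies described above. The property names come from the comment; the wrapping hook and the VLMControls type are assumptions, not the app's actual code:

```typescript
import { useCallback } from 'react';

// Assumed shape of the slice of `vlm` that handleMainAction actually uses.
type VLMControls = {
  isAutoStreaming: boolean;
  toggleAutoStreaming: () => void;
  captureAndDescribe: () => Promise<void>;
};

function useMainAction(vlm: VLMControls) {
  // Destructure once so the callback depends on the specific values used,
  // not on the `vlm` object whose identity changes every render.
  const { isAutoStreaming, toggleAutoStreaming, captureAndDescribe } = vlm;
  return useCallback(() => {
    if (isAutoStreaming) {
      toggleAutoStreaming();
    } else {
      void captureAndDescribe();
    }
  }, [isAutoStreaming, toggleAutoStreaming, captureAndDescribe]);
}
```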
examples/flutter/RunAnywhereAI/lib/features/vision/vision_hub_view.dart (1)

70-96: Prefer design system constants over raw Colors.indigo and inline TextStyle.

The Vision Chat tile above correctly uses AppColors.primaryPurple, but this "coming soon" tile reaches outside the design system. Using Colors.indigo directly also makes future theming harder.

Additionally, enabled: false (Line 71) is redundant when no onTap is provided — ListTile.onTap defaults to null, which already prevents interaction.

♻️ Proposed fix
-          ListTile(
-            enabled: false,
+          ListTile(
             leading: Container(
               width: 44,
               height: 44,
               decoration: BoxDecoration(
-                color: Colors.indigo.withValues(alpha: 0.5),
+                color: AppColors.primaryPurple.withValues(alpha: 0.4),
                 borderRadius: BorderRadius.circular(10),
               ),
               child: Icon(
                 Icons.auto_awesome,
-                color: Colors.white.withValues(alpha: 0.5),
+                color: Colors.white.withValues(alpha: 0.4),
               ),
             ),
-            title: Text(
-              'Image Generation',
-              style: TextStyle(
-                color: AppColors.textSecondary(context).withValues(alpha: 0.5),
-              ),
+            title: Text(
+              'Image Generation',
+              style: AppTypography.body(context).copyWith(
+                color: AppColors.textSecondary(context).withValues(alpha: 0.5),
+              ),
             ),
-            subtitle: Text(
-              'Create images with Stable Diffusion',
-              style: TextStyle(
-                color: AppColors.textSecondary(context).withValues(alpha: 0.5),
-              ),
+            subtitle: Text(
+              'Create images with Stable Diffusion',
+              style: AppTypography.caption(context).copyWith(
+                color: AppColors.textSecondary(context).withValues(alpha: 0.5),
+              ),
             ),
           ),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/flutter/RunAnywhereAI/lib/features/vision/vision_hub_view.dart`
around lines 70 - 96, Replace the hard-coded Colors.indigo and inline TextStyle
usages in the ListTile with design-system tokens and remove the redundant
enabled: false; specifically, in the Vision "Image Generation" ListTile replace
Container.decoration.color: Colors.indigo.withValues(alpha: 0.5) with a design
token (e.g., AppColors.primaryPurple.withOpacity(0.5) or the appropriate
AppColors.* value), replace Icon and Text TextStyle color calls that use
.withValues(alpha: 0.5) with the design system color helpers (e.g.,
AppColors.textSecondary(context).withOpacity(0.5) or an existing AppTextStyles
variant) instead of inline TextStyle, and delete the enabled: false property
since ListTile.onTap is null by default; update references in this widget
(ListTile, the leading Container, Icon, title Text, subtitle Text) accordingly.
examples/flutter/RunAnywhereAI/lib/features/vision/vlm_camera_view.dart (1)

316-328: Unnecessary unawaited wrapper around snackbar.

ScaffoldMessenger.of(context).showSnackBar(...) is synchronous (returns ScaffoldFeatureController, not a Future). The .closed.then((_) => null) chain creates a Future that is then unawaited — but there's no reason to reference .closed at all. Simplify to a direct showSnackBar call.

🔧 Simplified snackbar
       onPressed: () {
         unawaited(Clipboard.setData(ClipboardData(text: _viewModel.currentDescription)));
-        unawaited(
-          ScaffoldMessenger.of(context).showSnackBar(
-            const SnackBar(
-              content: Text('Description copied to clipboard'),
-              duration: Duration(seconds: 2),
-            ),
-          ).closed.then((_) => null),
+        ScaffoldMessenger.of(context).showSnackBar(
+          const SnackBar(
+            content: Text('Description copied to clipboard'),
+            duration: Duration(seconds: 2),
+          ),
         );
       },
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/flutter/RunAnywhereAI/lib/features/vision/vlm_camera_view.dart`
around lines 316 - 328, The onPressed handler wraps
ScaffoldMessenger.of(context).showSnackBar(...).closed in an unnecessary
unawaited Future chain; remove the .closed.then(...) and the unawaited wrapper
for the snackbar call and just call
ScaffoldMessenger.of(context).showSnackBar(...) directly while keeping the
Clipboard.setData call (Clipboard.setData(ClipboardData(text:
_viewModel.currentDescription))) as-is; update the onPressed closure containing
Clipboard.setData and ScaffoldMessenger.of(context).showSnackBar to remove
references to .closed and unawaited around the snackbar.
sdk/runanywhere-flutter/packages/runanywhere/lib/public/extensions/runanywhere_frameworks.dart (1)

37-97: Inconsistent indentation within getFrameworks method body.

Lines 39–82 (the switch block) are indented at 2 spaces from the method boundary, while lines 86–96 (the model-filtering loop) use the expected 4-space indent. This makes the method body look like two different scopes. Likely a merge/rebase artifact.

🔧 Re-indent the switch block to match the rest of the method
   static Future<List<InferenceFramework>> getFrameworks(
-    SDKComponent capability) async {
-  final frameworks = <InferenceFramework>{};
-
-  // Map capability to model categories
-  final Set<ModelCategory> relevantCategories;
-
-  switch (capability) {
-    case SDKComponent.llm:
-      relevantCategories = {
-        ModelCategory.language,
-        ModelCategory.multimodal
-      };
-      break;
-
-    case SDKComponent.stt:
-      relevantCategories = {ModelCategory.speechRecognition};
-      break;
-
-    case SDKComponent.tts:
-      relevantCategories = {ModelCategory.speechSynthesis};
-      break;
-
-    case SDKComponent.vad:
-      relevantCategories = {ModelCategory.audio};
-      break;
-
-    case SDKComponent.voice:
-      relevantCategories = {
-        ModelCategory.language,
-        ModelCategory.speechRecognition,
-        ModelCategory.speechSynthesis
-      };
-      break;
-
-    case SDKComponent.embedding:
-      relevantCategories = {
-        ModelCategory.language,
-        ModelCategory.multimodal
-      };
-      break;
-
-    case SDKComponent.vlm:
-      relevantCategories = {ModelCategory.multimodal};
-      break;
-  }
-
-    
+      SDKComponent capability) async {
+    final frameworks = <InferenceFramework>{};
+
+    // Map capability to model categories
+    final Set<ModelCategory> relevantCategories;
+
+    switch (capability) {
+      case SDKComponent.llm:
+        relevantCategories = {
+          ModelCategory.language,
+          ModelCategory.multimodal,
+        };
+        break;
+
+      case SDKComponent.stt:
+        relevantCategories = {ModelCategory.speechRecognition};
+        break;
+
+      case SDKComponent.tts:
+        relevantCategories = {ModelCategory.speechSynthesis};
+        break;
+
+      case SDKComponent.vad:
+        relevantCategories = {ModelCategory.audio};
+        break;
+
+      case SDKComponent.voice:
+        relevantCategories = {
+          ModelCategory.language,
+          ModelCategory.speechRecognition,
+          ModelCategory.speechSynthesis,
+        };
+        break;
+
+      case SDKComponent.embedding:
+        relevantCategories = {
+          ModelCategory.language,
+          ModelCategory.multimodal,
+        };
+        break;
+
+      case SDKComponent.vlm:
+        relevantCategories = {ModelCategory.multimodal};
+        break;
+    }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@sdk/runanywhere-flutter/packages/runanywhere/lib/public/extensions/runanywhere_frameworks.dart`
around lines 37 - 97, The switch block in getFrameworks is mis-indented (2
spaces) causing inconsistent scope appearance; re-indent the entire switch and
its case blocks (the assignment to relevantCategories for each SDKComponent
case) to match the method's 4-space indentation, ensuring the declaration of
relevantCategories and subsequent logic (the for loop over
RunAnywhere.availableModels, use of frameworks, and building result) are
consistently indented and aligned with the rest of the method body.
sdk/runanywhere-react-native/packages/llamacpp/src/LlamaCPP.ts (1)

230-297: addVLMModel largely duplicates addModel — consider extracting shared logic.

The file-existence check, ModelInfo construction, and ModelRegistry.registerModel call are repeated almost verbatim between addModel (Lines 143–206) and addVLMModel (Lines 230–297). The only differences are the hardcoded category, supportsThinking, and mmprojUrl in metadata.

Extracting a private helper (e.g., _registerModelInfo(options, overrides)) would reduce the duplication while keeping both public methods thin and readable. Not blocking, but worth addressing before the two paths inevitably drift out of sync.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sdk/runanywhere-react-native/packages/llamacpp/src/LlamaCPP.ts` around lines
230 - 297, addVLMModel duplicates addModel: extract the shared file-existence,
ModelInfo construction and ModelRegistry.registerModel logic into a private
helper (e.g., _registerModelInfo or createModelInfoAndRegister) that takes the
incoming options and an overrides object for category, supportsThinking, and
metadata.mmprojUrl; have addModel and addVLMModel call this helper, passing
LlamaCPPVLMModelOptions/ModelOptions respectively, reuse FileSystem checks
(modelExists/getModelPath), populate common fields (id, name, format, localPath,
isDownloaded, createdAt/updatedAt, etc.) inside the helper, and call
ModelRegistry.registerModel(modelInfo) there to keep both public methods thin
and prevent divergence.
sdk/runanywhere-react-native/packages/llamacpp/cpp/HybridRunAnywhereLlama.cpp (2)

100-120: base64Decode rebuilds its lookup table on every call.

The 256-element lookup table T is reconstructed each invocation. For hot paths processing streaming frames, consider making it static const or moving it to namespace scope.

Move lookup table to static scope
 std::vector<uint8_t> base64Decode(const std::string& encoded) {
   static const std::string base64_chars =
       "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

+  static const auto T = []() {
+    std::vector<int> table(256, -1);
+    for (int i = 0; i < 64; i++) table[base64_chars[i]] = i;
+    return table;
+  }();
+
   std::vector<uint8_t> decoded;
-  std::vector<int> T(256, -1);
-  for (int i = 0; i < 64; i++) T[base64_chars[i]] = i;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@sdk/runanywhere-react-native/packages/llamacpp/cpp/HybridRunAnywhereLlama.cpp`
around lines 100 - 120, The base64Decode function rebuilds the 256-entry lookup
table T on every call; move the table out of the function so it is constructed
once (e.g., declare a static const std::array<int,256> or a namespace-scope
const vector initialized from base64_chars) and have base64Decode reference that
table instead of creating T locally; keep base64_chars accessible (static or
namespace-scope) and preserve the existing mapping logic used by base64Decode.

540-556: Stream callbacks capture references to stack-local variables — safe only because processStream blocks.

streamCallbacks.onToken and streamCallbacks.onError capture fullResponse and streamError by reference. This is safe only if VLMBridge::shared().processStream() is synchronous and blocks until streaming completes (matching the LLM generateStream pattern on Lines 324-346). If the streaming call is ever made asynchronous, these references become dangling.

Worth a brief comment to document this invariant.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@sdk/runanywhere-react-native/packages/llamacpp/cpp/HybridRunAnywhereLlama.cpp`
around lines 540 - 556, The stream callbacks (VLMStreamCallbacks streamCallbacks
with onToken and onError) currently capture fullResponse and streamError by
reference which is only safe because VLMBridge::shared().processStream() is
synchronous; add a concise comment above the streamCallbacks definition stating
that invariant (i.e., these references must not outlive the call because
processStream blocks, matching the generateStream pattern) and either change the
captures to by-value or use an owning/shared string (e.g.,
std::shared_ptr<std::string>) if you plan to make processing asynchronous in the
future; ensure the comment references onToken, onError, fullResponse,
streamError and VLMBridge::shared().processStream() so future maintainers notice
the requirement.
sdk/runanywhere-flutter/packages/runanywhere/lib/native/dart_bridge_vlm.dart (2)

672-708: Duplicated image struct setup between _processInIsolate and _vlmStreamingIsolateEntry.

Lines 672-708 are nearly identical to Lines 514-548. This duplication increases maintenance burden — any format change needs updating in both places.

Consider extracting a shared helper function for image struct initialization.

Proposed extraction
/// Shared helper to populate image struct and allocate native memory.
/// Returns allocated pointers that must be freed by the caller.
({Pointer<Utf8>? filePathPtr, Pointer<Uint8>? pixelDataPtr, Pointer<Utf8>? base64DataPtr})
_setupImageStruct(
  Pointer<RacVlmImageStruct> imagePtr,
  int imageFormat,
  String? filePath,
  Uint8List? pixelData,
  int width,
  int height,
  String? base64Data,
) {
  imagePtr.ref.format = imageFormat;
  imagePtr.ref.width = width;
  imagePtr.ref.height = height;
  
  Pointer<Utf8>? filePathPtr;
  Pointer<Uint8>? pixelDataPtr;
  Pointer<Utf8>? base64DataPtr;

  if (imageFormat == RacVlmImageFormat.filePath && filePath != null) {
    filePathPtr = filePath.toNativeUtf8();
    imagePtr.ref.filePath = filePathPtr;
    imagePtr.ref.pixelData = nullptr;
    imagePtr.ref.base64Data = nullptr;
    imagePtr.ref.dataSize = 0;
  } else if (imageFormat == RacVlmImageFormat.rgbPixels && pixelData != null) {
    pixelDataPtr = calloc<Uint8>(pixelData.length);
    pixelDataPtr.asTypedList(pixelData.length).setAll(0, pixelData);
    imagePtr.ref.filePath = nullptr;
    imagePtr.ref.pixelData = pixelDataPtr;
    imagePtr.ref.base64Data = nullptr;
    imagePtr.ref.dataSize = pixelData.length;
  } else if (imageFormat == RacVlmImageFormat.base64 && base64Data != null) {
    base64DataPtr = base64Data.toNativeUtf8();
    imagePtr.ref.filePath = nullptr;
    imagePtr.ref.pixelData = nullptr;
    imagePtr.ref.base64Data = base64DataPtr;
    imagePtr.ref.dataSize = base64Data.length;
  }
  
  return (filePathPtr: filePathPtr, pixelDataPtr: pixelDataPtr, base64DataPtr: base64DataPtr);
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sdk/runanywhere-flutter/packages/runanywhere/lib/native/dart_bridge_vlm.dart`
around lines 672 - 708, There is duplicated image-struct initialization in
_processInIsolate and _vlmStreamingIsolateEntry; extract a shared helper (e.g.,
_setupImageStruct) that accepts Pointer<RacVlmImageStruct>, imageFormat,
filePath, pixelData, width, height, base64Data and returns the allocated
pointers (filePathPtr, pixelDataPtr, base64DataPtr) so both callers call the
helper to populate imagePtr.ref and receive pointers to free later; update both
_processInIsolate and _vlmStreamingIsolateEntry to use this helper and ensure
callers still free any returned native allocations.

526-532: Byte-by-byte pixel data copy could use bulk copy.

The loop at Lines 530-532 copies pixel data one byte at a time. For large images (e.g., 1920×1080 RGB = ~6MB), this is slow. Dart FFI provides asTypedList for efficient bulk operations.

Proposed fix using bulk copy
       // Allocate native memory for pixel data
       pixelDataPtr = calloc<Uint8>(pixelData.length);
-      for (int i = 0; i < pixelData.length; i++) {
-        pixelDataPtr![i] = pixelData[i];
-      }
+      pixelDataPtr!.asTypedList(pixelData.length).setAll(0, pixelData);

Same applies to the duplicated code in _vlmStreamingIsolateEntry (Lines 688-691).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sdk/runanywhere-flutter/packages/runanywhere/lib/native/dart_bridge_vlm.dart`
around lines 526 - 532, The byte-by-byte loop copying pixelData into the
allocated native buffer is slow; replace the per-byte loop in the branch
handling RacVlmImageFormat.rgbPixels (where pixelDataPtr is allocated with
calloc<Uint8>(pixelData.length)) with a bulk copy using
pixelDataPtr!.asTypedList(pixelData.length).setAll(0, pixelData) or equivalent
to copy the Uint8List in one operation; make the identical change in the
duplicated copy inside _vlmStreamingIsolateEntry so both locations use
asTypedList bulk copying instead of manual for-loops.
examples/flutter/RunAnywhereAI/lib/features/vision/vlm_view_model.dart (3)

143-184: Same temp file cleanup concern in single-capture mode.

takePicture() on Line 155 also creates a temp file. For consistency, consider cleaning up after processing.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/flutter/RunAnywhereAI/lib/features/vision/vlm_view_model.dart`
around lines 143 - 184, The describeCurrentFrame method currently calls
_cameraController!.takePicture() which leaves a temp file (xFile); after
processing the file (creating sdk.VLMImage.filePath and streaming results)
ensure the temp file is deleted to avoid leaks by adding an await xFile.delete()
(or File(xFile.path).delete()) in the finally block of describeCurrentFrame so
the file is removed whether processing succeeds or fails, taking care not to
delete it before the SDK has read it.

87-92: disposeCamera nulls the controller while async dispose is still in-flight.

_cameraController?.dispose() is async, but _cameraController is set to null synchronously on the next line. If any other code path checks _cameraController or _isCameraInitialized during the async disposal, it may see inconsistent state (e.g., controller is null but the underlying camera resource is still being released). For an example app this is low-risk, but ideally disposeCamera should be async and await the dispose.

Proposed fix
-  void disposeCamera() {
-    unawaited(_cameraController?.dispose());
+  Future<void> disposeCamera() async {
+    await _cameraController?.dispose();
     _cameraController = null;
     _isCameraInitialized = false;
     notifyListeners();
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/flutter/RunAnywhereAI/lib/features/vision/vlm_view_model.dart`
around lines 87 - 92, disposeCamera currently calls _cameraController?.dispose()
without awaiting and then immediately nulls _cameraController and clears
_isCameraInitialized, which can leave observers seeing inconsistent state;
change disposeCamera to be async, await _cameraController?.dispose() (remove
unawaited), then set _cameraController = null and _isCameraInitialized = false,
and finally call notifyListeners(); update any callers to await disposeCamera()
where needed.

266-309: Auto-streaming takePicture() accumulates temp files without cleanup.

Each 2.5s cycle calls takePicture() which writes to the device's temp directory. Over extended use of live mode, this accumulates files. Consider deleting the XFile after processing completes.

Proposed cleanup
       // Capture image from camera
       final xFile = await _cameraController!.takePicture();
+      final capturedPath = xFile.path;

       // Create VLMImage from file path
       final image = sdk.VLMImage.filePath(xFile.path);
       // ... process ...

+      // Clean up temp file
+      try {
+        await File(capturedPath).delete();
+      } catch (_) {}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/flutter/RunAnywhereAI/lib/features/vision/vlm_view_model.dart`
around lines 266 - 309, In _describeCurrentFrameForAutoStream, the XFile
returned by _cameraController!.takePicture() is not deleted causing temp-file
accumulation; after processing (in the finally block) delete the captured file
(use xFile.delete() or File(xFile.path).delete() awaited) and swallow/log any
deletion errors without setting _error so auto-stream behavior remains
non-critical; ensure you only attempt deletion if xFile is non-null and do the
delete inside the existing finally block so cleanup always runs even on
exceptions.
sdk/runanywhere-flutter/packages/runanywhere/lib/public/runanywhere.dart (2)

1559-1615: Consider using switch expression for exhaustive pattern matching on sealed class.

The if/else chain handles all VLMImageFormat subtypes, but the final else at Line 1603 is technically unreachable since VLMImageFormat is sealed with only three variants. A switch expression would make this exhaustive at compile time.

Proposed refactor
-    if (format is VLMImageFormatFilePath) {
-      bridgeResult = await DartBridge.vlm.processImage(
-        imageFormat: RacVlmImageFormat.filePath,
-        filePath: format.path,
-        ...
-      );
-    } else if (format is VLMImageFormatRgbPixels) {
-      ...
-    } else if (format is VLMImageFormatBase64) {
-      ...
-    } else {
-      throw SDKError.vlmInvalidImage('Unsupported image format');
-    }
+    bridgeResult = switch (format) {
+      VLMImageFormatFilePath(:final path) => await DartBridge.vlm.processImage(
+        imageFormat: RacVlmImageFormat.filePath,
+        filePath: path,
+        prompt: prompt,
+        maxTokens: maxTokens,
+        temperature: temperature,
+        topP: topP,
+        useGpu: useGpu,
+      ),
+      VLMImageFormatRgbPixels(:final data, :final width, :final height) => await DartBridge.vlm.processImage(
+        imageFormat: RacVlmImageFormat.rgbPixels,
+        pixelData: data,
+        width: width,
+        height: height,
+        prompt: prompt,
+        maxTokens: maxTokens,
+        temperature: temperature,
+        topP: topP,
+        useGpu: useGpu,
+      ),
+      VLMImageFormatBase64(:final encoded) => await DartBridge.vlm.processImage(
+        imageFormat: RacVlmImageFormat.base64,
+        base64Data: encoded,
+        prompt: prompt,
+        maxTokens: maxTokens,
+        temperature: temperature,
+        topP: topP,
+        useGpu: useGpu,
+      ),
+    };
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sdk/runanywhere-flutter/packages/runanywhere/lib/public/runanywhere.dart`
around lines 1559 - 1615, The if/else chain in _processImageViaBridge that
checks image.format (VLMImageFormatFilePath, VLMImageFormatRgbPixels,
VLMImageFormatBase64) should be replaced with a switch expression on
image.format to get compiler-enforced exhaustiveness; inside each case call the
corresponding DartBridge.vlm.processImage with the correct parameters and assign
to bridgeResult, and remove the final unreachable else/SDKError.vlmInvalidImage
branch so the switch guarantees all variants are handled at compile time.

1277-1403: Broadcast StreamController can drop early tokens before the caller subscribes.

StreamController<String>.broadcast() (Line 1293) does not buffer events. The metricsFuture at Line 1339 subscribes via .toList() immediately, so it captures all tokens. However, the caller receives the VLMStreamingResult asynchronously — by the time they attach a listener to result.stream, some tokens may have already been emitted and lost from their perspective.

This matches the existing LLM generateStream pattern (Line 1867), so it's a pre-existing design choice. In practice the async gap is tiny and unlikely to cause visible token loss, but worth noting.

If this becomes an issue, switching to a single-subscription controller with a replay mechanism or ensuring the bridge stream doesn't start emitting until the caller subscribes would fix it.
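A minimal sketch of the onListen-deferred variant; deferredTokenStream and startTokens are illustrative names, with the real bridge call substituted for the token source:

import 'dart:async';

/// Wraps a token source so it only starts once the caller listens,
/// ensuring no token is emitted before the subscription exists (sketch only).
Stream<String> deferredTokenStream(Stream<String> Function() startTokens) {
  StreamSubscription<String>? sub;
  late final StreamController<String> controller;
  controller = StreamController<String>(
    onListen: () {
      // Start the underlying source only now that a listener is attached.
      sub = startTokens().listen(
        controller.add,
        onError: controller.addError,
        onDone: controller.close,
      );
    },
    onCancel: () async {
      await sub?.cancel();
    },
  );
  return controller.stream;
}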

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sdk/runanywhere-flutter/packages/runanywhere/lib/public/runanywhere.dart`
around lines 1277 - 1403, processImageStream currently starts
_processImageStreamViaBridge and emits tokens before callers may subscribe to
controller.stream, risking dropped tokens; change the logic to defer
creating/subscribing to tokenStream until the returned stream is listened to by
moving the call to _processImageStreamViaBridge and the subscription into
controller.onListen (use controller.onCancel to cancel subscription and close
controller), keep collecting allTokens and telemetry as before, and ensure the
cancel closure still calls DartBridge.vlm.cancel() and unsubscribes the
subscription created in onListen.
sdk/runanywhere-react-native/packages/llamacpp/src/RunAnywhere+VLM.ts (2)

324-330: promptTokens is a rough heuristic, not actual token count.

Math.ceil(prompt.length / 4) is an approximation that could be quite inaccurate (especially for non-English text or with image tokens). This metric gets surfaced as VLMResult.promptTokens, which consumers may treat as authoritative.

Consider either documenting this as an estimate or setting it to 0 with a comment (as the Dart streaming path does on Line 1372 of runanywhere.dart).
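A minimal sketch of the conservative option; buildVLMResult and its parameters are illustrative, with the field names following VLMResult as described above:

// Sketch only: build the result without the length/4 heuristic.
function buildVLMResult(text: string, completionTokens: number, totalTimeMs: number) {
  return {
    text,
    // Accurate prompt token counts are not available here; report 0 rather than an estimate.
    promptTokens: 0,
    completionTokens,
    totalTimeMs,
    tokensPerSecond: totalTimeMs > 0 ? completionTokens / (totalTimeMs / 1000) : 0,
  };
}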

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sdk/runanywhere-react-native/packages/llamacpp/src/RunAnywhere`+VLM.ts around
lines 324 - 330, The VLMResult currently exposes promptTokens using a rough
heuristic (Math.ceil(prompt.length / 4)); update the construction of finalResult
in RunAnywhere+VLM.ts so promptTokens is not presented as an authoritative token
count—either set promptTokens to 0 (matching the Dart streaming path behavior)
and add a clear inline comment that accurate token counts are not available
here, or keep the field but rename/annotate it as an estimated value; modify the
finalResult assignment (symbol: VLMResult / variable: finalResult) accordingly
so consumers aren't misled by the heuristic.

448-454: uint8ArrayToBase64 may be slow for large pixel buffers.

Building a string via String.fromCharCode in a byte-by-byte loop creates O(n²) intermediate strings in some JS engines. For VLM images that can be megabytes of RGB data, this could cause noticeable latency or GC pressure.

Consider a chunked approach:

Proposed chunked implementation
 function uint8ArrayToBase64(bytes: Uint8Array): string {
-  // Use btoa with binary string conversion
-  let binaryString = '';
-  for (let i = 0; i < bytes.length; i++) {
-    binaryString += String.fromCharCode(bytes[i]);
-  }
-  return btoa(binaryString);
+  // Process in chunks to avoid stack overflow and reduce string concatenation overhead
+  const chunkSize = 8192;
+  const chunks: string[] = [];
+  for (let i = 0; i < bytes.length; i += chunkSize) {
+    const slice = bytes.subarray(i, Math.min(i + chunkSize, bytes.length));
+    chunks.push(String.fromCharCode(...slice));
+  }
+  return btoa(chunks.join(''));
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sdk/runanywhere-react-native/packages/llamacpp/src/RunAnywhere`+VLM.ts around
lines 448 - 454, The current uint8ArrayToBase64 builds a single string by
appending one char per byte which is O(n^2) and slow for large buffers; update
uint8ArrayToBase64 to convert the Uint8Array to a binary string in fixed-size
chunks (e.g., chunkSize ~ 0x8000/32768 bytes), for each chunk call
String.fromCharCode on the chunk (via Function.apply or equivalent safe
chunk-to-string conversion), push each chunk-string into an array, join the
array once to produce the full binary string, then call btoa on that joined
string; ensure you reference and update the function uint8ArrayToBase64 so large
VLM image buffers are processed without creating quadratic intermediate strings.
sdk/runanywhere-react-native/packages/llamacpp/cpp/bridges/VLMBridge.hpp (1)

49-57: VLMImageInput::pixel_data is a non-owning raw pointer — document lifetime requirements.

pixel_data is const uint8_t* while file_path and base64_data are std::string (owning). Callers must ensure the underlying buffer outlives the VLMImageInput. This already caused a bug in HybridRunAnywhereLlama.cpp (the static buffer issue). Consider adding a brief comment or storing std::vector<uint8_t> for owned pixel data.

Option: use owned storage
 struct VLMImageInput {
     rac_vlm_image_format_t format;
     std::string file_path;
-    const uint8_t* pixel_data = nullptr;
+    std::vector<uint8_t> pixel_data;  // Owned pixel data (RGB)
     std::string base64_data;
     uint32_t width = 0;
     uint32_t height = 0;
-    size_t data_size = 0;
 };

This eliminates the lifetime management burden on callers. The C API call sites would use .pixel_data.data() and .pixel_data.size().

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sdk/runanywhere-react-native/packages/llamacpp/cpp/bridges/VLMBridge.hpp`
around lines 49 - 57, The VLMImageInput::pixel_data field is a non-owning const
uint8_t* but there is no documentation of its lifetime requirements which led to
a static buffer bug; update VLMBridge.hpp by either (A) adding a clear comment
on VLMImageInput::pixel_data stating callers must ensure the pointed-to buffer
outlives the VLMImageInput instance (mention HybridRunAnywhereLlama.cpp as a
cautionary example), or (B) change pixel storage to an owning
std::vector<uint8_t> (e.g., add std::vector<uint8_t> pixel_data_owned and use
pixel_data_owned.data()/size() for C API calls) and update usages that construct
VLMImageInput (including code in HybridRunAnywhereLlama.cpp) to populate the
vector so callers no longer manage lifetimes.

CURRENT_PROJECT_VERSION = "$(FLUTTER_BUILD_NUMBER)";
DEVELOPMENT_TEAM = L86FH3K93L;
ENABLE_BITCODE = NO;
ENABLE_RESOURCE_ACCESS_CAMERA = YES;

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

ENABLE_RESOURCE_ACCESS_CAMERA iOS Xcode build setting

💡 Result:

ENABLE_RESOURCE_ACCESS_CAMERA is not an iOS permission setting. It’s an Xcode build setting used for macOS (and Mac Catalyst) sandboxed targets to control whether Xcode adds the camera entitlement (com.apple.security.device.camera) to the target’s entitlements when App Sandbox is enabled. (developer.apple.com)

For a normal iOS app, camera access is granted via:

  • an Info.plist purpose string: NSCameraUsageDescription, and
  • requesting authorization at runtime (via AVFoundation APIs).
    If the purpose string is missing, the app can terminate when it tries to access the camera. (developer.apple.com)

Where to change it (macOS / Mac Catalyst):

  • Target → Signing & Capabilities → App Sandbox → Hardware → Camera (checkbox). (developer.apple.com)

Remove ENABLE_RESOURCE_ACCESS_CAMERA from iOS build configurations.

ENABLE_RESOURCE_ACCESS_CAMERA is a macOS/Mac Catalyst-only build setting used to control the App Sandbox camera entitlement. It has no effect on iOS apps and should not be present in the Runner's iOS build configurations (Profile, Debug, Release at lines 475, 660, 685). iOS camera access is controlled exclusively via NSCameraUsageDescription in Info.plist and runtime permission requests—there is no build-time equivalent for iOS. Remove this setting from all iOS configurations to keep the build settings clean and prevent confusion.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/flutter/RunAnywhereAI/ios/Runner.xcodeproj/project.pbxproj` at line
475, Remove the macOS-only build setting ENABLE_RESOURCE_ACCESS_CAMERA from the
Runner iOS build configurations (Profile, Debug, Release) in the Xcode project
so it is not present in iOS targets; locate occurrences of
ENABLE_RESOURCE_ACCESS_CAMERA in the project.pbxproj and delete those entries
for the iOS configurations, and ensure camera access is instead
documented/handled via NSCameraUsageDescription in Info.plist and runtime
permission requests.

Comment on lines +218 to +229
    // SmolVLM 500M - Ultra-lightweight VLM for mobile (~500MB total)
    RunAnywhere.registerModel(
      id: 'smolvlm-500m-instruct-q8_0',
      name: 'SmolVLM 500M Instruct',
      url: Uri.parse(
          'https://github.com/RunanywhereAI/sherpa-onnx/releases/download/runanywhere-vlm-models-v1/smolvlm-500m-instruct-q8_0.tar.gz'),
      framework: InferenceFramework.llamaCpp,
      modality: ModelCategory.multimodal,
      artifactType: ModelArtifactType.tarGzArchive(
        structure: ArchiveStructure.directoryBased,
      ),
      memoryRequirement: 600000000,

⚠️ Potential issue | 🟡 Minor

Inline comment says ~500MB total but memoryRequirement is 600 MB — correct the comment.

The SmolVLM-500M-Instruct Q8_0 GGUF weighs ~437 MB and the mmproj file is ~109 MB, for a combined artifact size of ~546 MB. The memoryRequirement of 600,000,000 (600 MB) is a reasonable runtime figure, but the adjacent inline comment ~500MB total is understated and inconsistent with it.

📝 Suggested comment correction
-    // SmolVLM 500M - Ultra-lightweight VLM for mobile (~500MB total)
+    // SmolVLM 500M - Ultra-lightweight VLM for mobile (~600MB total: ~437MB model + ~109MB mmproj + runtime overhead)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/flutter/RunAnywhereAI/lib/app/runanywhere_ai_app.dart` around lines
218 - 229, Update the inline comment above the RunAnywhere.registerModel call
for id 'smolvlm-500m-instruct-q8_0' to reflect the actual combined artifact size
(~546 MB) instead of "~500MB total"; mention the breakdown (≈437 MB GGUF + ≈109
MB mmproj) and keep the memoryRequirement: 600000000 as the runtime figure. This
change should be made adjacent to the RunAnywhere.registerModel block so the
comment and the memoryRequirement value are consistent.

Comment on lines +42 to +47
  # Camera access for VLM
  camera: ^0.11.0
  # Image picker for gallery photos
  image_picker: ^1.0.0
  # Image manipulation (BGRA to RGB pixel conversion)
  image: ^4.0.0

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

latest stable version camera package pub.dev 2026

💡 Result:

The latest stable version of the Flutter camera package on pub.dev (as of February 17, 2026) is camera 0.11.3+1. [1]


🌐 Web query:

latest stable version image_picker package pub.dev 2026

💡 Result:

The latest stable version of Flutter’s image_picker package on pub.dev (as of February 17, 2026) is 1.2.1. [1][2]

Use:

dependencies:
  image_picker: ^1.2.1

Sources: [1] [2]


🌐 Web query:

latest stable version image package pub.dev 2026

💡 Result:

The latest stable version of the Dart image package on pub.dev (as of Feb 17, 2026) is 4.7.2 (published ~58 days ago). [1]

pubspec.yaml:

dependencies:
  image: ^4.7.2

🌐 Web query:

Flutter camera NSCameraUsageDescription Info.plist iOS requirements

💡 Result:

On iOS, if your Flutter app (or any bundled dependency) accesses the camera / microphone / photo library, you must add the corresponding privacy usage description keys to ios/Runner/Info.plist (with non-empty strings). If you don’t, the app can exit/crash at runtime when the API is accessed. [1]

Minimum required for Flutter camera plugin

Add (at least) these keys (the plugin requires iOS 10+): [2]

<key>NSCameraUsageDescription</key>
<string>We use the camera to take photos/videos.</string>

<key>NSMicrophoneUsageDescription</key>
<string>We use the microphone when recording video with audio.</string>

If you only take photos (no audio/video)

If you disable audio in CameraController(enableAudio: false), you can avoid needing the microphone permission string; otherwise iOS will require it when audio is enabled/used. [2][3]

If you save to (or read from) the user’s Photos library

Apple’s guidance: [4][1]

  • Only adding/saving images: use NSPhotoLibraryAddUsageDescription
  • Reading (or read/write) the library: use NSPhotoLibraryUsageDescription

Example:

<key>NSPhotoLibraryAddUsageDescription</key>
<string>We save your captured photos to your library.</string>

Sources: Apple Info.plist privacy keys reference [1], Apple Photos privacy guidance [4], Flutter camera plugin iOS notes [2], Flutter/iOS camera audio toggle note [3].


🌐 Web query:

Flutter image_picker NSPhotoLibraryUsageDescription iOS requirements

💡 Result:

For Flutter’s image_picker on iOS, you must add privacy “purpose strings” to ios/Runner/Info.plist (inside the <dict>). The plugin’s iOS setup explicitly requires:

  • NSPhotoLibraryUsageDescription (Photo Library) — include it even if you pass requestFullMetadata: false; App Store processing can still reject builds that reference photo APIs without this key. [1][2]
  • NSCameraUsageDescription (Camera) — required if you use ImageSource.camera. [1]
  • NSMicrophoneUsageDescription (Microphone) — required only if you record video with audio. (Commonly added alongside camera.) [3]

Example:

<key>NSPhotoLibraryUsageDescription</key>
<string>We need access to your photo library to let you choose a photo.</string>
<key>NSCameraUsageDescription</key>
<string>We need camera access to let you take a photo.</string>
<key>NSMicrophoneUsageDescription</key>
<string>We need microphone access to record video audio.</string>

Notes:

  • On iOS 14+, image_picker uses PHPicker for library picking. [1]
  • There’s a known iOS 14+ Simulator limitation where picking HEIC images may not work; test on a real device or use non‑HEIC images. [1]

Sources: [1] image_picker iOS setup on pub.dev, [2] Flutter issue about App Store requiring NSPhotoLibraryUsageDescription, [3] Stack Overflow guidance on adding iOS privacy keys.


Update all three package versions to current stable releases.

The declared lower bounds are significantly outdated:

  • camera: update from ^0.11.0 to ^0.11.3
  • image_picker: update from ^1.0.0 to ^1.2.1
  • image: update from ^4.0.0 to ^4.7.2

Caret constraints only set a lower bound, so an outdated floor still lets the resolver fall back to an older release (for example via a stale lockfile or a conflicting constraint); raising these lower bounds guarantees the app picks up recent security patches and bug fixes.

Regarding iOS integration: the required Info.plist entries are accurate but incomplete. For completeness:

  • camera plugin requires NSCameraUsageDescription and NSMicrophoneUsageDescription (the latter can be omitted if enableAudio: false is used)
  • image_picker requires NSPhotoLibraryUsageDescription (even with requestFullMetadata: false) and NSCameraUsageDescription if the app uses ImageSource.camera; NSMicrophoneUsageDescription is needed only if recording video with audio

Verify that all required keys are present in ios/Runner/Info.plist with appropriate non-empty strings, as missing keys will cause runtime crashes when the respective APIs are accessed.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/flutter/RunAnywhereAI/pubspec.yaml` around lines 42 - 47, Update the
pinned Flutter package versions in pubspec.yaml by bumping camera from ^0.11.0
to ^0.11.3, image_picker from ^1.0.0 to ^1.2.1, and image from ^4.0.0 to ^4.7.2;
then run flutter pub get to lock the new versions. Also verify iOS Info.plist
contains the required non-empty usage keys for these plugins—add
NSCameraUsageDescription and NSMicrophoneUsageDescription (omit microphone only
if you set enableAudio: false), add NSPhotoLibraryUsageDescription for
image_picker and NSCameraUsageDescription if using ImageSource.camera—so the app
won’t crash when accessing camera, microphone, or photos.


<uses-permission android:name="android.permission.INTERNET" />
<uses-permission android:name="android.permission.RECORD_AUDIO" />
<uses-permission android:name="android.permission.CAMERA" />

⚠️ Potential issue | 🟠 Major

Android photo library permissions are missing.

The iOS side adds NSPhotoLibraryUsageDescription and includes PhotoLibrary in setup_permissions, indicating photo library access is intentional. Android requires explicit manifest declarations for media read access (READ_MEDIA_IMAGES/READ_MEDIA_VIDEO on API 33+, or READ_EXTERNAL_STORAGE on older versions). Without these, any photo library permission request will silently return denied on Android.

🐛 Proposed fix
     <uses-permission android:name="android.permission.INTERNET" />
     <uses-permission android:name="android.permission.RECORD_AUDIO" />
     <uses-permission android:name="android.permission.CAMERA" />
+    <!-- For photo library access on Android 13+ -->
+    <uses-permission android:name="android.permission.READ_MEDIA_IMAGES" />
+    <uses-permission android:name="android.permission.READ_MEDIA_VIDEO" />
+    <!-- For photo library access on Android < 13 -->
+    <uses-permission android:name="android.permission.READ_EXTERNAL_STORAGE"
+        android:maxSdkVersion="32" />
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/react-native/RunAnywhereAI/android/app/src/main/AndroidManifest.xml`
at line 5, The AndroidManifest currently only requests CAMERA but lacks
photo-library permissions, causing Android photo access requests to be denied;
update the manifest by adding the appropriate read permissions: include
READ_EXTERNAL_STORAGE for older Android versions and READ_MEDIA_IMAGES (and
READ_MEDIA_VIDEO if video access is needed) for API 33+, alongside the existing
CAMERA <uses-permission> entries so runtime requests succeed (add both
permission names as <uses-permission> entries so the platform will use the
correct one per OS version).

Comment on lines +36 to +41
<key>NSCameraUsageDescription</key>
<string>Vision AI needs camera access to describe what you see</string>
<key>NSMicrophoneUsageDescription</key>
<string>RunAnywhere needs access to your microphone for speech-to-text transcription.</string>
<key>NSPhotoLibraryUsageDescription</key>
<string>Vision AI needs photo library access to describe images</string>

⚠️ Potential issue | 🟡 Minor

Inconsistent app name in permission descriptions.

The new descriptions reference "Vision AI" while the existing NSMicrophoneUsageDescription references "RunAnywhere". Users will see different app names across permission prompts.

✏️ Proposed fix
-	<string>Vision AI needs camera access to describe what you see</string>
+	<string>RunAnywhere needs camera access to describe what you see</string>
 	...
-	<string>Vision AI needs photo library access to describe images</string>
+	<string>RunAnywhere needs photo library access to describe images</string>
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/react-native/RunAnywhereAI/ios/RunAnywhereAI/Info.plist` around
lines 36 - 41, Permission strings use inconsistent app names ("Vision AI" and
"RunAnywhere"); make them consistent by using the actual app display name
"RunAnywhereAI" in all plist descriptions. Update the values for
NSCameraUsageDescription, NSMicrophoneUsageDescription, and
NSPhotoLibraryUsageDescription so each string references "RunAnywhereAI" (e.g.,
"RunAnywhereAI needs camera access..." and "RunAnywhereAI needs access to your
microphone...") to ensure uniform permission prompts.

Comment on lines +96 to +101
void VLMBridge::cancel() {
  cancellationRequested_ = true;
  if (handle_) {
    rac_vlm_component_cancel(handle_);
  }
}

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find the header file for VLMBridge
find . -name "VLMBridge.h" -type f

Repository: RunanywhereAI/runanywhere-sdks

Length of output: 56


🏁 Script executed:

#!/bin/bash
# Search for cancellationRequested_ to see its declaration and all usages
rg -n 'cancellationRequested_' --type=cpp -B2 -A2

Repository: RunanywhereAI/runanywhere-sdks

Length of output: 6007


🏁 Script executed:

#!/bin/bash
# Read the VLMBridge.cpp file to see the cancel() method and tokenCallback around lines 96-101 and 207-213
wc -l sdk/runanywhere-react-native/packages/llamacpp/cpp/bridges/VLMBridge.cpp

Repository: RunanywhereAI/runanywhere-sdks

Length of output: 150


🏁 Script executed:

#!/bin/bash
# Read the tokenCallback implementation around lines 207-230
sed -n '207,230p' sdk/runanywhere-react-native/packages/llamacpp/cpp/bridges/VLMBridge.cpp

Repository: RunanywhereAI/runanywhere-sdks

Length of output: 1010


🏁 Script executed:

#!/bin/bash
# Also check if StreamContext is defined and how cancellationRequested_ is accessed
rg -n 'StreamContext' --type=cpp -B2 -A5

Repository: RunanywhereAI/runanywhere-sdks

Length of output: 17584


Use std::atomic<bool> for thread-safe cancellation flag.

cancel() sets cancellationRequested_ = true (likely from the UI/main thread), while the tokenCallback lambda reads it from the streaming/inference thread (line 211). A plain bool provides no cross-thread visibility guarantees — the streaming thread may never observe the cancellation. Update the declaration in VLMBridge.hpp to std::atomic<bool> cancellationRequested_ = false;.

Also applies to: LLMBridge.hpp, LLMBridge.cpp (lines 93-96, 166-169).
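A minimal sketch of the header-side change; the class body is abbreviated and the accessor name is illustrative:

#include <atomic>

class VLMBridge {
 public:
  void cancel() {
    // Atomic store is visible to the streaming thread without extra locking.
    // (The real cancel() also forwards to rac_vlm_component_cancel.)
    cancellationRequested_ = true;
  }

  bool isCancellationRequested() const { return cancellationRequested_.load(); }

 private:
  std::atomic<bool> cancellationRequested_{false};
};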

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sdk/runanywhere-react-native/packages/llamacpp/cpp/bridges/VLMBridge.cpp`
around lines 96 - 101, The cancellation flag is not thread-safe: update the
member declaration cancellationRequested_ in VLMBridge.hpp to use
std::atomic<bool> (initialize to false) and adjust any includes (add <atomic>),
then keep VLMBridge::cancel() as-is but rely on the atomic write so the
tokenCallback lambda on the streaming/inference thread will observe the change;
apply the same change to LLMBridge.hpp (and ensure LLMBridge.cpp logic that
reads/writes cancellationRequested_ at lines referenced uses the atomic) so both
bridges use std::atomic<bool> for cross-thread visibility.

Comment on lines +131 to +150
  } else if (imageFormat == 1) {
    // RGB pixels format (base64-encoded from JS)
    input.format = RAC_VLM_IMAGE_FORMAT_RGB_PIXELS;
    // Decode base64 to raw bytes
    static std::vector<uint8_t> pixelBuffer;
    pixelBuffer = base64Decode(imageData);
    input.pixel_data = pixelBuffer.data();
    input.width = static_cast<uint32_t>(imageWidth);
    input.height = static_cast<uint32_t>(imageHeight);
    input.data_size = pixelBuffer.size();
  } else if (imageFormat == 2) {
    // Base64 format
    input.format = RAC_VLM_IMAGE_FORMAT_BASE64;
    input.base64_data = imageData;
  } else {
    throw std::runtime_error("Invalid image format: " + std::to_string(imageFormat));
  }

  return input;
}

⚠️ Potential issue | 🔴 Critical

Static pixelBuffer in buildVLMImageInput is a data race and memory hazard.

pixelBuffer is declared static, meaning all concurrent invocations share the same buffer. If two processVLMImage or processVLMImageStream calls execute concurrently (they are not guarded by modelMutex_), the second call will overwrite the buffer while the first is still using input.pixel_data. This causes data corruption.

Additionally, the VLMImageInput stores a raw pointer (input.pixel_data = pixelBuffer.data()) into this static vector. If the vector reallocates on a subsequent call, the previously returned pointer becomes dangling.

Proposed fix: return the buffer alongside the input, or store it in VLMImageInput
-// Build VLMImageInput from JS bridge parameters
-VLMImageInput buildVLMImageInput(int imageFormat, const std::string& imageData,
-                                 int imageWidth, int imageHeight) {
-  VLMImageInput input;
+// Build VLMImageInput from JS bridge parameters.
+// For RGB pixel format, decoded data is stored in `pixelStorage` to keep it alive.
+struct VLMImageWithStorage {
+  VLMImageInput input;
+  std::vector<uint8_t> pixelStorage;
+};
+
+VLMImageWithStorage buildVLMImageInput(int imageFormat, const std::string& imageData,
+                                       int imageWidth, int imageHeight) {
+  VLMImageWithStorage result;
+  auto& input = result.input;

   if (imageFormat == 0) {
     // File path format
     input.format = RAC_VLM_IMAGE_FORMAT_FILE_PATH;
     input.file_path = imageData;
   } else if (imageFormat == 1) {
     // RGB pixels format (base64-encoded from JS)
     input.format = RAC_VLM_IMAGE_FORMAT_RGB_PIXELS;
-    // Decode base64 to raw bytes
-    static std::vector<uint8_t> pixelBuffer;
-    pixelBuffer = base64Decode(imageData);
-    input.pixel_data = pixelBuffer.data();
+    result.pixelStorage = base64Decode(imageData);
+    input.pixel_data = result.pixelStorage.data();
     input.width = static_cast<uint32_t>(imageWidth);
     input.height = static_cast<uint32_t>(imageHeight);
-    input.data_size = pixelBuffer.size();
+    input.data_size = result.pixelStorage.size();
   } else if (imageFormat == 2) {
     // ...
   }
-  return input;
+  return result;
 }

Then update callers to use auto imageWithStorage = buildVLMImageInput(...) and pass imageWithStorage.input to the bridge.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@sdk/runanywhere-react-native/packages/llamacpp/cpp/HybridRunAnywhereLlama.cpp`
around lines 131 - 150, The buildVLMImageInput function currently uses a static
std::vector<uint8_t> pixelBuffer and assigns input.pixel_data to its data(),
causing data races and dangling pointers when concurrent calls (e.g.,
processVLMImage or processVLMImageStream) occur; fix by making the image bytes
owned with the returned object instead of static storage: change
buildVLMImageInput to either (A) embed a std::vector<uint8_t> member inside
VLMImageInput (e.g., VLMImageInput::pixel_buffer) and set input.pixel_data =
pixel_buffer.data() so the input owns the storage, or (B) change its return to a
small struct like ImageWithStorage { VLMImageInput input; std::vector<uint8_t>
storage; } and set input.pixel_data = storage.data(); then update callers
(processVLMImage/processVLMImageStream and any bridge invocations) to hold the
returned object for the lifetime of the call so no dangling pointer or shared
static buffer is used.

Comment on lines +59 to +69
export {
  registerVLMBackend,
  loadVLMModel,
  isVLMModelLoaded,
  unloadVLMModel,
  describeImage,
  askAboutImage,
  processImage,
  processImageStream,
  cancelVLMGeneration,
} from './RunAnywhere+VLM';

⚠️ Potential issue | 🟠 Major

VLM types are not re-exported, leaving consumers unable to type inputs/outputs.

The exported functions (describeImage, askAboutImage, processImage, processImageStream) accept VLMImage and return VLMResult / VLMStreamingResult. Without exporting these types, TypeScript consumers must resort to any, Parameters<typeof describeImage>[0], or imports from internal module paths.

🐛 Proposed fix — add VLM type re-exports
 export {
   registerVLMBackend,
   loadVLMModel,
   isVLMModelLoaded,
   unloadVLMModel,
   describeImage,
   askAboutImage,
   processImage,
   processImageStream,
   cancelVLMGeneration,
 } from './RunAnywhere+VLM';
+export type {
+  VLMImage,
+  VLMResult,
+  VLMStreamingResult,
+  VLMErrorCode,
+} from './RunAnywhere+VLM';
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sdk/runanywhere-react-native/packages/llamacpp/src/index.ts` around lines 59
- 69, The package re-exports VLM functions but not their types, forcing
consumers to use any or internal imports; update the export block in index.ts to
also re-export the VLM types (e.g., VLMImage, VLMResult, VLMStreamingResult and
any other related types/interfaces) from './RunAnywhere+VLM' so consumers can
import the function types directly (match the existing exported function names:
describeImage, askAboutImage, processImage, processImageStream, etc.). Ensure
the type names exactly match those declared in RunAnywhere+VLM so TypeScript
consumers can type inputs/outputs without reaching into internal paths.

Comment on lines +203 to +214
  } catch {
    if (resultJson.includes('error')) {
      throw new Error(resultJson);
    }
    return {
      text: resultJson,
      promptTokens: 0,
      completionTokens: 0,
      totalTimeMs: 0,
      tokensPerSecond: 0,
    };
  }

⚠️ Potential issue | 🟡 Minor

Fragile error detection in JSON parse fallback.

When JSON.parse fails, the code checks resultJson.includes('error') to decide whether to throw. This is too broad — any successful result text containing the word "error" (e.g., "Describe any error messages visible") would incorrectly throw an exception.

Proposed fix: check for a JSON error pattern instead
   } catch {
-    if (resultJson.includes('error')) {
+    // If the raw string looks like a JSON error object, throw it
+    if (resultJson.startsWith('{"error"')) {
       throw new Error(resultJson);
     }

Comment on lines +187 to +210
/**
* Process an image with VLM (non-streaming)
* @param imageFormat 0=filePath, 1=rgbPixels, 2=base64
* @param imageData The image data (file path string, or base64 string)
* @param imageWidth Width in pixels (for rgbPixels format, 0 otherwise)
* @param imageHeight Height in pixels (for rgbPixels format, 0 otherwise)
* @param prompt Text prompt
* @param optionsJson Optional JSON with max_tokens, temperature, top_p
* @returns JSON string with result: text, promptTokens, completionTokens, totalTimeMs, tokensPerSecond
*/
processVLMImage(imageFormat: number, imageData: string, imageWidth: number, imageHeight: number, prompt: string, optionsJson?: string): Promise<string>;

/**
* Process an image with streaming
* @param imageFormat Image format enum value
* @param imageData Image data string
* @param imageWidth Width (for rgbPixels)
* @param imageHeight Height (for rgbPixels)
* @param prompt Text prompt
* @param optionsJson Generation options JSON
* @param callback Token callback (token, isComplete)
* @returns Full generated text
*/
processVLMImageStream(imageFormat: number, imageData: string, imageWidth: number, imageHeight: number, prompt: string, optionsJson: string, callback: (token: string, isComplete: boolean) => void): Promise<string>;

⚠️ Potential issue | 🟡 Minor

imageData: string is lossy for rgbPixels format (format=1).

For file paths (format=0) and base64 (format=2), string is natural. For raw RGB pixel data (format=1), binary data passed as a JS string will go through UTF-16 encoding and may corrupt non-text bytes. If rgbPixels is an expected input path from JS (not just C++-internal), consider using an ArrayBuffer-backed type or document that format=1 is only supported from native callers.

If rgbPixels is unused from the JS side, a doc comment clarifying this would prevent misuse.
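A sketch of the widened signature, assuming the bridge can marshal ArrayBuffer/Uint8Array for this spec; the type alias and interface name are illustrative, not the actual Nitro spec:

// Sketch only: widen the pixel-data type and document the per-format contract.
type VLMImageData = string | ArrayBuffer | Uint8Array;

interface VLMImageCalls {
  /**
   * imageFormat 0 = file path (string), 2 = base64 (string);
   * imageFormat 1 (raw RGB pixels) must be an ArrayBuffer/Uint8Array,
   * since binary bytes passed through a JS string can be corrupted.
   */
  processVLMImage(
    imageFormat: number,
    imageData: VLMImageData,
    imageWidth: number,
    imageHeight: number,
    prompt: string,
    optionsJson?: string
  ): Promise<string>;
}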

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@sdk/runanywhere-react-native/packages/llamacpp/src/specs/RunAnywhereLlama.nitro.ts`
around lines 187 - 210, The imageData parameter is unsafe as plain string for
rgbPixels (format=1); update both processVLMImage and processVLMImageStream
signatures to accept binary (e.g., imageData: string | ArrayBuffer | Uint8Array)
and update their JSDoc to state that when imageFormat==1 callers must pass an
ArrayBuffer/Uint8Array (and native bridging code must handle that type), or
alternatively explicitly document that format=1 is native-only; adjust any
downstream marshalling code to correctly handle ArrayBuffer/Uint8Array for rgb
pixel data.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 11

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
sdk/runanywhere-flutter/packages/runanywhere_llamacpp/lib/llamacpp.dart (1)

275-280: ⚠️ Potential issue | 🟠 Major

dispose() does not reset _isVlmRegistered, breaking re-registration after dispose.

After calling dispose() followed by register(), the _registerVlm() method will short-circuit at the _isVlmRegistered guard (line 160) and never actually re-register the VLM backend.

Proposed fix
 static void dispose() {
   _bindings = null;
   _registeredModels.clear();
   _isRegistered = false;
+  _isVlmRegistered = false;
   _logger.info('LlamaCpp disposed');
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sdk/runanywhere-flutter/packages/runanywhere_llamacpp/lib/llamacpp.dart`
around lines 275 - 280, The dispose() method currently clears _bindings,
_registeredModels and _isRegistered but fails to reset the _isVlmRegistered
flag, causing _registerVlm() to short-circuit on subsequent register() calls;
update dispose() to set _isVlmRegistered = false (in addition to clearing
_bindings, _registeredModels and _isRegistered) so that _registerVlm() can run
again after dispose() and register() are called.
🧹 Nitpick comments (19)
sdk/runanywhere-react-native/packages/llamacpp/src/native/NativeRunAnywhereLlama.ts (2)

19-21: requireNativeLlamaModule should be marked @deprecated.

The PR explicitly documents that calling requireNativeLlamaModule() directly creates throwaway HybridObject instances whose C++ destructors can tear down shared bridge state. However, the function remains a public export with no deprecation signal. Future callers have no indication they should prefer getNativeLlamaModule() instead.

♻️ Proposed fix
 /**
  * Get the native RunAnywhereLlama Hybrid Object
+ * `@deprecated` Use {`@link` getNativeLlamaModule} instead. Calling this directly
+ * creates a throwaway HybridObject whose C++ destructor can tear down shared
+ * bridge state. Prefer the singleton getter.
  */
 export function requireNativeLlamaModule(): NativeRunAnywhereLlamaModule {
   return NitroModules.createHybridObject<RunAnywhereLlama>('RunAnywhereLlama');
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@sdk/runanywhere-react-native/packages/llamacpp/src/native/NativeRunAnywhereLlama.ts`
around lines 19 - 21, Mark the exported function requireNativeLlamaModule as
deprecated by adding a deprecation JSDoc comment (e.g. /** `@deprecated` Use
getNativeLlamaModule() instead. */) immediately above its declaration and update
the message to recommend getNativeLlamaModule; ensure the text mentions the
reason briefly (creating throwaway HybridObject instances) so callers see the
risk, and optionally add a TypeScript `@deprecated` tag so IDEs surface the
warning when requireNativeLlamaModule is used.

28-35: isNativeLlamaModuleAvailable silently initializes the singleton as a side effect.

Delegating to getNativeLlamaModule() means a successful availability check eagerly sets _nativeModule. A predicate named isAvailable is expected to be a read-only observation; callers who use it as a guard (e.g., "check before use") will unknowingly warm up the singleton. At minimum, document this in the JSDoc. Alternatively, fast-path on an already-initialized singleton to keep the observable behaviour identical but surface intent clearly:

♻️ Proposed fix
 /**
  * Check if the native Llama module is available.
  * Uses the singleton getter to avoid creating throwaway HybridObject instances
  * whose C++ destructors could tear down shared bridge state.
+ * Note: a successful check initializes the singleton as a side effect.
  */
 export function isNativeLlamaModuleAvailable(): boolean {
+  // Fast-path: already initialized
+  if (_nativeModule !== undefined) return true;
   try {
     getNativeLlamaModule();
     return true;
   } catch {
     return false;
   }
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@sdk/runanywhere-react-native/packages/llamacpp/src/native/NativeRunAnywhereLlama.ts`
around lines 28 - 35, isNativeLlamaModuleAvailable currently calls
getNativeLlamaModule which eagerly initializes and assigns the singleton
_nativeModule as a side effect; change it to first fast-path-check the existing
_nativeModule and return true if present, and if not present, probe for the
native module without mutating _nativeModule (e.g., add a non-mutating probe
helper or call the underlying native detection logic rather than
getNativeLlamaModule) so the availability predicate does not warm the singleton;
update isNativeLlamaModuleAvailable to reference _nativeModule and use the new
probe (or low-level NativeModules check) instead of directly invoking
getNativeLlamaModule.
examples/flutter/RunAnywhereAI/ios/Runner/Info.plist (1)

29-30: LGTM — NSCameraUsageDescription correctly added.

The usage description is present and App Store review will accept it. Minor note: "vision language model features" is a bit developer-facing for an end-user permission dialog; consider something like "RunAnywhere AI needs camera access to analyze images." for better UX in a production app (fine as-is for an example).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/flutter/RunAnywhereAI/ios/Runner/Info.plist` around lines 29 - 30,
Update the user-facing camera permission text in Info.plist by replacing the
NSCameraUsageDescription string value so it reads a clearer end-user message
(e.g., "RunAnywhere AI needs camera access to analyze images."); locate the
NSCameraUsageDescription entry in the iOS Runner Info.plist and change the
existing developer-facing phrase ("vision language model features to analyze
images") to the suggested simpler phrasing to improve UX for App Store
permission dialogs.
sdk/runanywhere-flutter/packages/runanywhere/lib/foundation/error_types/sdk_error.dart (1)

664-669: vlmModelLoadFailed silently drops the underlying error.

Unlike modelLoadFailed(String modelId, Object? error) which stores underlyingError on the SDKError, the VLM variant accepts only a pre-formatted String message, discarding the root-cause exception. This loses debugging context for load failures.

♻️ Proposed fix
-  static SDKError vlmModelLoadFailed(String message) {
+  static SDKError vlmModelLoadFailed(String message, [Object? error]) {
     return SDKError(
       'VLM model load failed: $message',
       SDKErrorType.vlmModelLoadFailed,
+      underlyingError: error,
     );
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@sdk/runanywhere-flutter/packages/runanywhere/lib/foundation/error_types/sdk_error.dart`
around lines 664 - 669, The vlmModelLoadFailed factory currently takes only a
String and discards the root-cause; change vlmModelLoadFailed to accept an
additional Object? underlyingError (or similar) and forward it into the returned
SDKError so the underlyingError field is populated (mirror how
modelLoadFailed(String modelId, Object? error) does); update the method
signature and the SDKError constructor call to pass the underlyingError and keep
the message and SDKErrorType.vlmModelLoadFailed unchanged.
sdk/runanywhere-flutter/packages/runanywhere/lib/public/extensions/runanywhere_frameworks.dart (1)

38-84: Inconsistent indentation inside getFrameworks — will fail dart format.

Lines 39–82 (new code) use 2-space method-body indentation while lines 86–96 (existing code) and every other method in the file use 4-space. Line 84 also has trailing whitespace. Running dart format on this file would normalise everything to 4-space, but as-is the mixed indentation makes the method visually discontinuous.

♻️ Proposed fix (align to 4-space)
-  static Future<List<InferenceFramework>> getFrameworks(
-    SDKComponent capability) async {
-  final frameworks = <InferenceFramework>{};
-
-  // Map capability to model categories
-  final Set<ModelCategory> relevantCategories;
-
-  switch (capability) {
-    case SDKComponent.llm:
-      relevantCategories = {
-        ModelCategory.language,
-        ModelCategory.multimodal
-      };
-      break;
-    // ... remaining cases ...
-  }
-
-    
+  static Future<List<InferenceFramework>> getFrameworks(
+      SDKComponent capability) async {
+    final frameworks = <InferenceFramework>{};
+
+    // Map capability to model categories
+    final Set<ModelCategory> relevantCategories;
+
+    switch (capability) {
+      case SDKComponent.llm:
+        relevantCategories = {
+          ModelCategory.language,
+          ModelCategory.multimodal,
+        };
+        break;
+      // ... remaining cases at 6-space body ...
+    }
+
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@sdk/runanywhere-flutter/packages/runanywhere/lib/public/extensions/runanywhere_frameworks.dart`
around lines 38 - 84, The getFrameworks method has mixed 2-space indentation and
a trailing whitespace which breaks dart format; re-indent the entire method body
to the project's 4-space style (including all switch cases and the final closing
brace) and remove the trailing whitespace (around the end of the switch/blank
line) so the method aligns with other functions using SDKComponent and
ModelCategory and passes dart format.
sdk/runanywhere-flutter/packages/runanywhere_llamacpp/lib/native/llamacpp_bindings.dart (1)

138-139: No isVlmAvailable getter to match the existing isAvailable pattern.

isAvailable lets callers guard on LLM availability; there's no equivalent for VLM. Without it, callers must call registerVlm(), inspect the return code, and infer support — asymmetric with the existing API surface.

♻️ Proposed addition
   /// Check if bindings are available.
   bool get isAvailable => _register != null;
+
+  /// Check if VLM bindings are available.
+  bool get isVlmAvailable => _registerVlm != null;

Also applies to: 166-179

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@sdk/runanywhere-flutter/packages/runanywhere_llamacpp/lib/native/llamacpp_bindings.dart`
around lines 138 - 139, Add a symmetric VLM availability getter to match
isAvailable: implement "bool get isVlmAvailable => _registerVlm != null;"
(mirror the existing isAvailable which checks _register) and expose it alongside
registerVlm so callers can guard on VLM support; update the doc comment above
the new getter to match the style of the existing /// Check if bindings are
available. and place it near the registerVlm/_registerVlm definitions to keep
the API consistent.
examples/flutter/RunAnywhereAI/lib/app/content_view.dart (1)

26-34: 7 bottom tabs may crowd the navigation bar on smaller devices.

This is a demo app so it's acceptable, but be aware that Material Design recommends 3–5 bottom destinations. On narrow screens, the labels may truncate. Consider grouping related features (e.g., Transcribe/Speak/Voice under a single "Audio" hub) if this becomes an issue.

Also applies to: 52-62

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/flutter/RunAnywhereAI/lib/app/content_view.dart` around lines 26 -
34, The bottom navigation currently defines seven pages in the _pages list
(ChatInterfaceView, VisionHubView, SpeechToTextView, TextToSpeechView,
VoiceAssistantView, ToolsView, CombinedSettingsView) which may crowd small
screens; update the UI to condense audio-related pages into a single hub (e.g.,
create an AudioHubView that groups SpeechToTextView, TextToSpeechView, and
VoiceAssistantView) or implement responsive logic that swaps multiple tabs for a
single "Audio" tab on narrow screens, then replace the three individual entries
in _pages with the new AudioHubView (or a conditional selection) so the bottom
navigation shows 3–5 destinations on small devices.
sdk/runanywhere-react-native/packages/llamacpp/src/LlamaCppProvider.ts (1)

166-169: Silent error swallowing in autoRegister.

The catch(() => {}) swallows all errors silently. This is pre-existing code but worth noting — if registration fails during auto-register, there's no diagnostic trail. The register() method itself logs errors, so this is mitigated, but a brief catch log would improve debuggability.
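A minimal sketch of the logged catch; the free-standing function and console.warn are stand-ins for the provider's actual registration call and logger:

// Sketch only: surface auto-registration failures instead of discarding them.
export function autoRegisterWithLogging(register: () => Promise<void>): void {
  register().catch((error: unknown) => {
    console.warn('[LlamaCppProvider] auto-registration failed:', error);
  });
}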

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sdk/runanywhere-react-native/packages/llamacpp/src/LlamaCppProvider.ts`
around lines 166 - 169, autoRegister currently swallows all errors via catch(()
=> {}) which hides failures; update the catch on LlamaCppProvider.register()
inside autoRegister to log the caught error (e.g., console.warn or a module
logger) with a short message referencing auto-registration and include the error
object so failures are visible; keep the existing register() call and its
internal logging but ensure autoRegister's catch does not silently discard the
error.
examples/flutter/RunAnywhereAI/lib/core/services/permission_service.dart (1)

6-10: Class docstring omits camera permissions.

📝 Suggested update
 /// PermissionService - Centralized permission handling for the app
 ///
-/// Handles microphone and speech recognition permissions with proper
-/// user guidance for denied/permanently denied states.
+/// Handles microphone, speech recognition, and camera permissions with
+/// proper user guidance for denied/permanently denied states.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/flutter/RunAnywhereAI/lib/core/services/permission_service.dart`
around lines 6 - 10, Update the PermissionService class docstring to include
camera permissions alongside microphone and speech recognition; locate the class
declaration PermissionService and expand its top comment to mention that this
service handles microphone, speech recognition, and camera permissions and
provides guidance for denied/permanently denied states so the documentation
accurately reflects the implemented responsibilities.
examples/flutter/RunAnywhereAI/lib/features/vision/vision_hub_view.dart (1)

41-96: Hardcoded magic numbers and raw TextStyle break design-system consistency.

Two nits in the list tiles:

  1. width: 44, height: 44, and BorderRadius.circular(10) repeat across both tiles — these should come from AppSpacing constants to stay consistent with the rest of the app.
  2. The disabled "Image Generation" text (lines 84–93) constructs TextStyle(...) directly. The rest of the file uses AppTypography.caption(context), AppTypography.subheadline(context), etc.
♻️ Suggested approach (Vision Chat tile as example)
-              width: 44,
-              height: 44,
-              decoration: BoxDecoration(
-                color: AppColors.primaryPurple,
-                borderRadius: BorderRadius.circular(10),
-              ),
+              width: AppSpacing.iconLarge,   // or whichever constant maps to 44
+              height: AppSpacing.iconLarge,
+              decoration: BoxDecoration(
+                color: AppColors.primaryPurple,
+                borderRadius: BorderRadius.circular(AppSpacing.cornerRadiusMedium),
+              ),
-            title: Text(
-              'Image Generation',
-              style: TextStyle(
-                color: AppColors.textSecondary(context).withValues(alpha: 0.5),
-              ),
-            ),
-            subtitle: Text(
-              'Create images with Stable Diffusion',
-              style: TextStyle(
-                color: AppColors.textSecondary(context).withValues(alpha: 0.5),
-              ),
-            ),
+            title: Text(
+              'Image Generation',
+              style: AppTypography.subheadline(context).copyWith(
+                color: AppColors.textSecondary(context).withValues(alpha: 0.5),
+              ),
+            ),
+            subtitle: Text(
+              'Create images with Stable Diffusion',
+              style: AppTypography.caption(context).copyWith(
+                color: AppColors.textSecondary(context).withValues(alpha: 0.5),
+              ),
+            ),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/flutter/RunAnywhereAI/lib/features/vision/vision_hub_view.dart`
around lines 41 - 96, Replace hardcoded sizes and raw TextStyle usage in the
vision hub tiles: swap the repeated width: 44, height: 44, and
BorderRadius.circular(10) with the app spacing constants (use AppSpacing.* for
size and radius) for the leading Container in both ListTile instances, and
replace the direct TextStyle(...) usages in the disabled "Image Generation" tile
with the appropriate AppTypography helpers (e.g.,
AppTypography.subheadline(context) and AppTypography.caption(context))
.copyWith(color: AppColors.textSecondary(context).withOpacity(0.5)) to preserve
the design system while keeping the faded color via opacity; keep color
adjustments using AppColors and withOpacity/withAlpha rather than .withValues.
examples/flutter/RunAnywhereAI/lib/features/vision/vlm_camera_view.dart (1)

472-488: Shadowed context parameter in builder may cause confusion.

The builder: (context) parameter shadows the widget's BuildContext. You correctly use this.context to refer to the State's context, but this is easy to misread. Renaming the builder parameter (e.g., sheetContext) would improve clarity.

Optional rename
-    await showModalBottomSheet<void>(
-      context: context,
-      isScrollControlled: true,
-      backgroundColor: Colors.transparent,
-      builder: (context) => ModelSelectionSheet(
+    await showModalBottomSheet<void>(
+      context: context,
+      isScrollControlled: true,
+      backgroundColor: Colors.transparent,
+      builder: (_) => ModelSelectionSheet(
         context: ModelSelectionContext.vlm,
         onModelSelected: (model) async {
-          await _viewModel.onModelSelected(model.id, model.name, this.context);
+          await _viewModel.onModelSelected(model.id, model.name, context);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/flutter/RunAnywhereAI/lib/features/vision/vlm_camera_view.dart`
around lines 472 - 488, The builder callback in _onModelButtonTap uses a
parameter named context which shadows the State's BuildContext (this.context);
rename the builder parameter to something like sheetContext in the
ModelSelectionSheet builder to eliminate shadowing and update any references
inside the closure to use sheetContext when referring to the
bottom-sheet-specific context while keeping this.context for the State (e.g., in
the call to _viewModel.onModelSelected).
examples/react-native/RunAnywhereAI/src/screens/VLMScreen.tsx (1)

102-105: handleDismissError is dead code — never used in JSX.

This callback is defined but not referenced anywhere in the component's render output. Remove it to reduce noise.

Remove dead code
-  // Dismiss error
-  const handleDismissError = useCallback(() => {
-    // Reset error in next render to prevent flicker
-    // Since hook doesn't expose setError, we'll just let user retry
-  }, []);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/react-native/RunAnywhereAI/src/screens/VLMScreen.tsx` around lines
102 - 105, The function handleDismissError defined with useCallback is dead code
and should be removed; delete the entire handleDismissError callback block
(including its comment) from VLMScreen.tsx and, after removal, remove the
now-unused useCallback import if it is no longer referenced elsewhere in the
file. This cleans up unused symbols (handleDismissError and possibly
useCallback) and reduces noise in the component.
sdk/runanywhere-react-native/packages/llamacpp/src/LlamaCPP.ts (1)

230-297: addVLMModel largely duplicates addModel — consider extracting shared logic.

The two methods share ~80% of their code (ID generation, format inference, disk existence check, ModelInfo construction, registry call). Only the category, supportsThinking, and metadata differ.

A shared private helper (e.g., _buildAndRegisterModel) accepting the differing fields would reduce duplication and ensure future fixes (e.g., to disk-check logic) apply to both paths.

As per coding guidelines, "focus on SIMPLICITY and SOLID principles" — the DRY violation here is straightforward to resolve and would prevent the two methods from diverging over time.
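
♻️ Possible shape for the shared helper (a sketch only; the helper name, option fields, and category values below are assumptions based on this comment, not the actual LlamaCPP implementation)

// Sketch: all shared steps (ID generation, format inference, disk check,
// ModelInfo construction, registry call) live in one private helper, while
// addModel/addVLMModel pass only the fields that differ.
interface SharedModelOptions {
  id?: string;
  name: string;
  url: string;
  memoryRequirement?: number;
}

interface ModelSpecifics {
  category: 'llm' | 'vlm';            // placeholder values
  supportsThinking: boolean;          // placeholder; use whatever each path sets today
  metadata?: Record<string, unknown>; // e.g. { mmprojUrl } for VLM models
}

class LlamaCPPSketch {
  private async _buildAndRegisterModel(
    shared: SharedModelOptions,
    specifics: ModelSpecifics,
  ): Promise<void> {
    // ID generation, format inference, disk existence check, ModelInfo
    // construction, and the ModelRegistry.registerModel call would all
    // happen here, exactly once for both public methods.
  }

  async addModel(options: SharedModelOptions): Promise<void> {
    return this._buildAndRegisterModel(options, {
      category: 'llm',
      supportsThinking: true,
    });
  }

  async addVLMModel(options: SharedModelOptions & { mmprojUrl?: string }): Promise<void> {
    return this._buildAndRegisterModel(options, {
      category: 'vlm',
      supportsThinking: false,
      metadata: { mmprojUrl: options.mmprojUrl },
    });
  }
}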

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sdk/runanywhere-react-native/packages/llamacpp/src/LlamaCPP.ts` around lines
230 - 297, The addVLMModel implementation duplicates most logic from addModel;
extract the shared behavior into a private helper (suggested name:
_buildAndRegisterModel) that handles ID generation (this._generateModelId),
format inference (url -> ModelFormat), disk checks (FileSystem.modelExists,
FileSystem.getModelPath), ModelInfo construction (common fields like id, name,
format, downloadURL, localPath, createdAt/updatedAt, isDownloaded, isAvailable),
and ModelRegistry.registerModel, then have addVLMModel and addModel call that
helper while passing the differing bits (category, supportsThinking,
metadata.mmprojUrl or other per-model fields) so the duplicated checks and
registry call live in one place.
sdk/runanywhere-react-native/packages/core/src/services/FileSystem.ts (1)

627-663: findGGUFInDirectory looks correct; minor note on error-path logging.

The recursive search with mmproj exclusion is sound. One small observation: the throw at line 658 ("No GGUF model file found") is immediately caught by the outer catch at line 659, which logs it as "Error finding GGUF file" before rethrowing. This makes the "not found" case look like an unexpected error in logs. Consider throwing only from outside the try-catch, or distinguishing between "not found" and actual I/O errors.

Optional: cleaner error flow
   async findGGUFInDirectory(directory: string): Promise<string> {
     if (!RNFS) {
       throw new Error('react-native-fs not available');
     }

-    try {
-      const contents = await RNFS.readDir(directory);
-
-      // Look for .gguf files (not mmproj)
-      for (const item of contents) {
-        if (item.isFile() && item.name.endsWith('.gguf') && !item.name.includes('mmproj')) {
-          logger.info(`Found main GGUF model: ${item.name}`);
-          return item.path;
-        }
+    const contents = await RNFS.readDir(directory);
+
+    // Look for .gguf files (not mmproj)
+    for (const item of contents) {
+      if (item.isFile() && item.name.endsWith('.gguf') && !item.name.includes('mmproj')) {
+        logger.info(`Found main GGUF model: ${item.name}`);
+        return item.path;
       }
+    }
 
-      // If not found, check nested directories
-      for (const item of contents) {
-        if (item.isDirectory()) {
-          try {
-            return await this.findGGUFInDirectory(item.path);
-          } catch {
-            // Continue searching other directories
-          }
+    // If not found, check nested directories
+    for (const item of contents) {
+      if (item.isDirectory()) {
+        try {
+          return await this.findGGUFInDirectory(item.path);
+        } catch {
+          // Continue searching other directories
         }
       }
-
-      throw new Error(`No GGUF model file found in ${directory}`);
-    } catch (error) {
-      logger.error(`Error finding GGUF file: ${error}`);
-      throw error;
     }
+
+    throw new Error(`No GGUF model file found in ${directory}`);
   },
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sdk/runanywhere-react-native/packages/core/src/services/FileSystem.ts` around
lines 627 - 663, The current findGGUFInDirectory method logs every thrown Error
as an I/O error because the local "No GGUF model file found" is thrown inside
the try and immediately caught; modify the flow so only real I/O/runtime errors
are logged: move the final "throw new Error(`No GGUF model file found in
${directory}`)" out of the try block (or throw a distinct NotFoundError and only
logger.error inside catch for non-NotFoundError cases), keep recursion via
this.findGGUFInDirectory, and continue to rethrow actual errors so RNFS failures
are logged while "not found" remains a normal control signal.
sdk/runanywhere-react-native/packages/llamacpp/src/RunAnywhere+VLM.ts (2)

71-88: loadVLMModel doesn't catch errors — inconsistent with registerVLMBackend.

registerVLMBackend wraps the native call in try/catch and returns false on failure. loadVLMModel lets the native exception propagate directly to the caller. This is a deliberate design choice (callers handle the error), but it's inconsistent and could surprise SDK consumers who expect a uniform boolean return pattern.

Not a bug, but worth documenting or harmonizing.
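
🛠️ Sketch of a harmonized boolean-returning version (the parameter names and logger below are assumptions; the real signature lives in RunAnywhere+VLM.ts)

// Mirrors registerVLMBackend's pattern: log and return false instead of letting
// the native exception propagate. Stand-in declarations keep the sketch self-contained.
declare function getNativeLlamaModule(): {
  loadVLMModel(modelPath: string, mmprojPath: string): Promise<boolean>;
};
declare const vlmLog: { error(msg: string): void };

async function loadVLMModel(modelPath: string, mmprojPath: string): Promise<boolean> {
  try {
    const native = getNativeLlamaModule();
    return await native.loadVLMModel(modelPath, mmprojPath);
  } catch (error) {
    vlmLog.error(
      `VLM model load failed: ${error instanceof Error ? error.message : String(error)}`,
    );
    return false;
  }
}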

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sdk/runanywhere-react-native/packages/llamacpp/src/RunAnywhere+VLM.ts` around
lines 71 - 88, The loadVLMModel function currently lets native.loadVLMModel
throw, causing inconsistent behavior with registerVLMBackend; update
loadVLMModel to mirror registerVLMBackend by wrapping the native call in a
try/catch, logging the error via logger.warning or logger.error (include the
error message) and returning false on failure while still returning the native
result on success; locate symbols loadVLMModel, getNativeLlamaModule,
isNativeLlamaModuleAvailable, and native.loadVLMModel to implement this change.

448-454: uint8ArrayToBase64 may be very slow for large RGB images.

Building a string one String.fromCharCode at a time is O(n²) in many JS engines due to string immutability. For a 1920×1080 RGB image (~6 MB), this could cause noticeable jank. A chunked approach is significantly faster:

Proposed fix
 function uint8ArrayToBase64(bytes: Uint8Array): string {
-  // Use btoa with binary string conversion
-  let binaryString = '';
-  for (let i = 0; i < bytes.length; i++) {
-    binaryString += String.fromCharCode(bytes[i]);
-  }
-  return btoa(binaryString);
+  const CHUNK_SIZE = 0x8000; // 32KB chunks
+  const chunks: string[] = [];
+  for (let i = 0; i < bytes.length; i += CHUNK_SIZE) {
+    const slice = bytes.subarray(i, Math.min(i + CHUNK_SIZE, bytes.length));
+    chunks.push(String.fromCharCode(...slice));
+  }
+  return btoa(chunks.join(''));
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sdk/runanywhere-react-native/packages/llamacpp/src/RunAnywhere+VLM.ts` around
lines 448 - 454, The uint8ArrayToBase64 function builds a huge string
incrementally causing O(n²) behavior for large buffers; change it to a chunked
conversion: iterate over the Uint8Array in chunks (e.g., 32k or 0x8000 bytes),
for each chunk convert to a string with String.fromCharCode.apply or a fast
typed-array-to-string method and push to an array, then join the array and call
btoa once on the full string (or call btoa per chunk and concatenate) to avoid
repeated string reallocation; update the uint8ArrayToBase64 function to use this
chunked approach to improve performance for large RGB images.
sdk/runanywhere-react-native/packages/llamacpp/cpp/HybridRunAnywhereLlama.cpp (2)

100-120: Base64 lookup table is rebuilt on every call.

The 256-element lookup table T is recomputed each invocation. For images this could be called with large payloads. Consider making it static const (thread-safe in C++11+) for a small but free performance win.

Proposed fix
 std::vector<uint8_t> base64Decode(const std::string& encoded) {
   static const std::string base64_chars =
       "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

-  std::vector<uint8_t> decoded;
-  std::vector<int> T(256, -1);
-  for (int i = 0; i < 64; i++) T[base64_chars[i]] = i;
+  static const auto T = []() {
+    std::vector<int> table(256, -1);
+    for (int i = 0; i < 64; i++) table[base64_chars[i]] = i;
+    return table;
+  }();
+
+  std::vector<uint8_t> decoded;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@sdk/runanywhere-react-native/packages/llamacpp/cpp/HybridRunAnywhereLlama.cpp`
around lines 100 - 120, The base64Decode function rebuilds the 256-entry lookup
vector T on every call; change the lookup to a single, shared, pre-initialized
static (e.g., static const std::array<int,256> or static std::vector<int>) so it
is initialized once and reused across calls, and also make base64_chars static
const; update references in base64Decode to use the static table (keep function
signature and decoding logic unchanged) to avoid per-call allocation and improve
performance and thread-safety.

448-459: unloadVLMModel silently swallows errors, unlike loadVLMModel and unloadModel.

loadVLMModel (line 437) re-throws after setLastError, and the LLM unloadModel (line 252) returns the result code. But unloadVLMModel catches exceptions and returns false, meaning callers on the JS side never see the error reason — only a false boolean. Consider being consistent: either re-throw (like loadVLMModel) or at minimum log the exception message so it's available via getLastError().

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@sdk/runanywhere-react-native/packages/llamacpp/cpp/HybridRunAnywhereLlama.cpp`
around lines 448 - 459, unloadVLMModel currently swallows exceptions and returns
false, hiding the error from JS; make its behavior consistent with loadVLMModel
and unloadModel by capturing the exception, calling setLastError(e.what()), and
then re-throwing the exception (or alternatively return a distinct error
code/log the message) so callers can observe the failure; modify
HybridRunAnywhereLlama::unloadVLMModel around the VLMBridge::shared().unload()
call to mirror loadVLMModel's error handling pattern (invoke setLastError and
re-throw) instead of silently returning false.
examples/flutter/RunAnywhereAI/lib/features/vision/vlm_view_model.dart (1)

143-184: Three near-identical "capture → stream → append tokens" flows.

describeCurrentFrame, describePickedImage, and _describeCurrentFrameForAutoStream share the same pattern: build VLMImage.filePath, call processImageStream, iterate tokens, append to buffer, and notify. This could be extracted to a shared helper accepting imagePath, prompt, maxTokens, and a flag for error handling behavior. Fine as-is for an example app, but worth noting if this grows.

Also applies to: 190-224, 266-309

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/flutter/RunAnywhereAI/lib/features/vision/vlm_view_model.dart`
around lines 143 - 184, The three near-identical flows in describeCurrentFrame,
describePickedImage, and _describeCurrentFrameForAutoStream should be extracted
into a single helper (e.g., _processImageStreamAndAppend) that accepts
imagePath, prompt, maxTokens, and an error-handling/notification flag; inside
the helper create the VLMImage with VLMImage.filePath(imagePath), call
sdk.RunAnywhere.processImageStream(...), iterate result.stream to append tokens
into a StringBuffer and update the view-model state via notifyListeners (or
delegate notification behavior based on the flag), and then replace the bodies
of describeCurrentFrame, describePickedImage, and
_describeCurrentFrameForAutoStream to call this helper with the appropriate
arguments.

Comment on lines +321 to +326
@override
void dispose() {
  _autoStreamTimer?.cancel();
  unawaited(_cameraController?.dispose());
  super.dispose();
}

⚠️ Potential issue | 🟠 Major

dispose doesn't cancel in-flight VLM generation — risks notifyListeners() on a disposed ChangeNotifier.

If the ViewModel is disposed while describeCurrentFrame, describePickedImage, or _describeCurrentFrameForAutoStream is mid-stream, the await for loop will continue receiving tokens and call notifyListeners() after super.dispose(), throwing a FlutterError. The timer is cancelled but not the generation itself.

Proposed fix
+ bool _disposed = false;
+
  @override
  void dispose() {
+   _disposed = true;
    _autoStreamTimer?.cancel();
+   // Cancel any in-flight VLM generation
+   sdk.RunAnywhere.cancelVLMGeneration();
    unawaited(_cameraController?.dispose());
    super.dispose();
  }

Then guard notification calls:

void _safeNotify() {
  if (!_disposed) notifyListeners();
}

Replace all notifyListeners() calls with _safeNotify().

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/flutter/RunAnywhereAI/lib/features/vision/vlm_view_model.dart`
around lines 321 - 326, The dispose method currently only cancels the timer and
camera controller but doesn’t stop in-flight VLM generation, so streams in
describeCurrentFrame, describePickedImage, and
_describeCurrentFrameForAutoStream can continue and call notifyListeners() after
disposal; add a private boolean _disposed = false and set it true in dispose,
store any active StreamSubscription(s) returned by your `await for` loops (from
describeCurrentFrame/describePickedImage/_describeCurrentFrameForAutoStream) and
cancel them in dispose, and replace all direct notifyListeners() calls with a
guarded helper (e.g., _safeNotify that calls notifyListeners() only if
!_disposed) to prevent notifications after disposal while ensuring subscriptions
are cancelled.

Comment on lines +182 to +187
    // SmolVLM 500M - Ultra-lightweight VLM for mobile (~500MB total)
    await LlamaCPP.addVLMModel({
      id: 'smolvlm-500m-instruct-q8_0',
      name: 'SmolVLM 500M Instruct',
      url: 'https://github.com/RunanywhereAI/sherpa-onnx/releases/download/runanywhere-vlm-models-v1/smolvlm-500m-instruct-q8_0.tar.gz',
      memoryRequirement: 600_000_000,

⚠️ Potential issue | 🟡 Minor

Comment says ~500MB total but memoryRequirement is 600 MB.

The inline comment understates the memory figure. memoryRequirement: 600_000_000 is the runtime value; the comment should reflect it.

📝 Suggested correction
-    // SmolVLM 500M - Ultra-lightweight VLM for mobile (~500MB total)
+    // SmolVLM 500M - Ultra-lightweight VLM for mobile (~600MB runtime: model + mmproj + overhead)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
    // SmolVLM 500M - Ultra-lightweight VLM for mobile (~500MB total)
    await LlamaCPP.addVLMModel({
      id: 'smolvlm-500m-instruct-q8_0',
      name: 'SmolVLM 500M Instruct',
      url: 'https://github.com/RunanywhereAI/sherpa-onnx/releases/download/runanywhere-vlm-models-v1/smolvlm-500m-instruct-q8_0.tar.gz',
      memoryRequirement: 600_000_000,
    // SmolVLM 500M - Ultra-lightweight VLM for mobile (~600MB runtime: model + mmproj + overhead)
    await LlamaCPP.addVLMModel({
      id: 'smolvlm-500m-instruct-q8_0',
      name: 'SmolVLM 500M Instruct',
      url: 'https://github.com/RunanywhereAI/sherpa-onnx/releases/download/runanywhere-vlm-models-v1/smolvlm-500m-instruct-q8_0.tar.gz',
      memoryRequirement: 600_000_000,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/react-native/RunAnywhereAI/App.tsx` around lines 182 - 187, The
inline comment for the SmolVLM model is inconsistent with the runtime memory
setting; update the comment near the LlamaCPP.addVLMModel call (the model with
id 'smolvlm-500m-instruct-q8_0' and property memoryRequirement) to reflect the
actual memoryRequirement value (600_000_000 / ~600MB) so the comment accurately
matches memoryRequirement: 600_000_000.

Comment on lines +93 to +99
const handleMainAction = useCallback(() => {
  if (vlm.isAutoStreaming) {
    vlm.toggleAutoStreaming();
  } else {
    vlm.captureAndDescribe();
  }
}, [vlm]);

⚠️ Potential issue | 🟡 Minor

handleMainAction also depends on unstable [vlm] — memoization is ineffective.

Same root cause as the effects and handleModelSelected. The callback is recreated every render. Either extract stable refs from the hook or use the vlm object directly without useCallback.
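
🛠️ Sketch: depend on the specific fields instead of the whole hook object (whether these fields are themselves stable depends on how useVLMCamera memoizes them, so this is illustrative only)

import { useCallback } from 'react';

// Destructure the pieces the handler actually reads so the dependency array
// only changes when those pieces change.
function useMainAction(vlm: {
  isAutoStreaming: boolean;
  toggleAutoStreaming: () => void;
  captureAndDescribe: () => void;
}) {
  const { isAutoStreaming, toggleAutoStreaming, captureAndDescribe } = vlm;

  return useCallback(() => {
    if (isAutoStreaming) {
      toggleAutoStreaming();
    } else {
      captureAndDescribe();
    }
  }, [isAutoStreaming, toggleAutoStreaming, captureAndDescribe]);
}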

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/react-native/RunAnywhereAI/src/screens/VLMScreen.tsx` around lines
93 - 99, The handleMainAction callback is being memoized with an unstable
dependency (vlm) so it’s recreated every render; fix by either removing
useCallback and call vlm.toggleAutoStreaming or vlm.captureAndDescribe directly
in the handler, or extract stable method refs (e.g., const toggleRef =
useRef(vlm.toggleAutoStreaming); const captureRef =
useRef(vlm.captureAndDescribe)) or stable callbacks that reference those methods
and then use those refs in handleMainAction so its dependency array can be
stable; update the handleMainAction implementation (and its dependency array)
accordingly to reference either stable refs or no longer memoize.

Comment on lines +562 to +595
final lib = PlatformLoader.loadCommons();
final processFn = lib.lookupFunction<
    Int32 Function(RacHandle, Pointer<RacVlmImageStruct>, Pointer<Utf8>,
        Pointer<RacVlmOptionsStruct>, Pointer<RacVlmResultStruct>),
    int Function(
        RacHandle,
        Pointer<RacVlmImageStruct>,
        Pointer<Utf8>,
        Pointer<RacVlmOptionsStruct>,
        Pointer<RacVlmResultStruct>)>('rac_vlm_component_process');

final status = processFn(handle, imagePtr, promptPtr, optionsPtr, resultPtr);

if (status != RAC_SUCCESS) {
  return VlmBridgeResult(
    text: '',
    error: 'VLM processing failed: ${RacResultCode.getMessage(status)}',
  );
}

final result = resultPtr.ref;
final text = result.text != nullptr ? result.text.toDartString() : '';

return VlmBridgeResult(
  text: text,
  promptTokens: result.promptTokens,
  imageTokens: result.imageTokens,
  completionTokens: result.completionTokens,
  totalTokens: result.totalTokens,
  timeToFirstTokenMs: result.timeToFirstTokenMs,
  imageEncodeTimeMs: result.imageEncodeTimeMs,
  totalTimeMs: result.totalTimeMs,
  tokensPerSecond: result.tokensPerSecond,
);

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
echo "=== Searching for rac_vlm_result_free in commons headers ==="
rg -n 'rac_vlm_result_free' sdk/runanywhere-commons/include

echo ""
echo "=== Searching for rac_vlm_result_free usage in Flutter native code ==="
rg -n 'VlmResultFree|rac_vlm_result_free' sdk/runanywhere-flutter/packages/runanywhere/lib/native

Repository: RunanywhereAI/runanywhere-sdks

Length of output: 924


🏁 Script executed:

#!/bin/bash
echo "=== Searching for VLM-related code to understand structure ==="
rg -n 'RacVlmResultStruct' sdk/runanywhere-commons/include -A 3

echo ""
echo "=== Checking for memory management patterns in Flutter code ==="
rg -n 'resultPtr|RacVlmResultStruct' sdk/runanywhere-flutter/packages/runanywhere/lib/native/dart_bridge_vlm.dart | head -30

Repository: RunanywhereAI/runanywhere-sdks

Length of output: 854


🏁 Script executed:

#!/bin/bash
echo "=== Checking the entire _processInIsolate function for context ==="
rg -n '_processInIsolate' sdk/runanywhere-flutter/packages/runanywhere/lib/native/dart_bridge_vlm.dart -A 50 | head -100

Repository: RunanywhereAI/runanywhere-sdks

Length of output: 3300


🏁 Script executed:

#!/bin/bash
echo "=== Getting the complete _processInIsolate function ==="
sed -n '490,650p' sdk/runanywhere-flutter/packages/runanywhere/lib/native/dart_bridge_vlm.dart

Repository: RunanywhereAI/runanywhere-sdks

Length of output: 5295


🏁 Script executed:

#!/bin/bash
echo "=== Checking if rac_vlm_result_free appears anywhere in entire codebase ==="
rg -n 'rac_vlm_result_free' sdk/ --type dart --type cpp --type h

echo ""
echo "=== Checking C++ bridge for how result is freed ==="
rg -n 'rac_vlm_result_free|VlmResult.*free' sdk/runanywhere-commons -A 2 | head -50

Repository: RunanywhereAI/runanywhere-sdks

Length of output: 4004


🏁 Script executed:

#!/bin/bash
echo "=== Implementation of rac_vlm_result_free ==="
sed -n '190,200p' sdk/runanywhere-commons/src/features/vlm/rac_vlm_service.cpp

Repository: RunanywhereAI/runanywhere-sdks

Length of output: 315


🏁 Script executed:

#!/bin/bash
echo "=== Checking React Native usage for comparison ==="
sed -n '150,165p' sdk/runanywhere-react-native/packages/llamacpp/cpp/bridges/VLMBridge.cpp

Repository: RunanywhereAI/runanywhere-sdks

Length of output: 678


Call rac_vlm_result_free() to avoid memory leaks.

The _processInIsolate function extracts the result text but never calls rac_vlm_result_free(), leaking the C++-allocated result.text on every call. The C++ implementation (React Native) correctly calls this function after processing. Currently, only calloc.free(resultPtr) is called, which frees the struct container but not the owned memory inside it.

🐛 Suggested fix
     final result = resultPtr.ref;
     final text = result.text != nullptr ? result.text.toDartString() : '';
+
+    final freeResultFn = lib.lookupFunction<
+        Void Function(Pointer<RacVlmResultStruct>),
+        void Function(Pointer<RacVlmResultStruct>)>('rac_vlm_result_free');
+    freeResultFn(resultPtr);
 
     return VlmBridgeResult(
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sdk/runanywhere-flutter/packages/runanywhere/lib/native/dart_bridge_vlm.dart`
around lines 562 - 595, In _processInIsolate, after calling
rac_vlm_component_process and reading resultPtr.ref (including converting
result.text to Dart via toDartString()), call the C cleanup function
rac_vlm_result_free(resultPtr) to free any C-allocated memory owned by the
struct (including result.text) before calling calloc.free(resultPtr) or
returning; do not manually free result.text if rac_vlm_result_free handles it to
avoid double-free.

Comment on lines +654 to +793
/// SendPort for the current streaming operation in the background isolate
SendPort? _vlmIsolateSendPort;

/// Entry point for the VLM streaming isolate
@pragma('vm:entry-point')
void _vlmStreamingIsolateEntry(_VlmStreamingIsolateParams params) {
  // Store the SendPort for callbacks to use
  _vlmIsolateSendPort = params.sendPort;

  final handle = Pointer<Void>.fromAddress(params.handleAddress);
  final promptPtr = params.prompt.toNativeUtf8();
  final imagePtr = calloc<RacVlmImageStruct>();
  final optionsPtr = calloc<RacVlmOptionsStruct>();

  Pointer<Utf8>? filePathPtr;
  Pointer<Uint8>? pixelDataPtr;
  Pointer<Utf8>? base64DataPtr;

  try {
    // Set up image struct based on format
    imagePtr.ref.format = params.imageFormat;
    imagePtr.ref.width = params.width;
    imagePtr.ref.height = params.height;

    if (params.imageFormat == RacVlmImageFormat.filePath &&
        params.filePath != null) {
      filePathPtr = params.filePath!.toNativeUtf8();
      imagePtr.ref.filePath = filePathPtr!;
      imagePtr.ref.pixelData = nullptr;
      imagePtr.ref.base64Data = nullptr;
      imagePtr.ref.dataSize = 0;
    } else if (params.imageFormat == RacVlmImageFormat.rgbPixels &&
        params.pixelData != null) {
      // Allocate native memory for pixel data
      pixelDataPtr = calloc<Uint8>(params.pixelData!.length);
      for (int i = 0; i < params.pixelData!.length; i++) {
        pixelDataPtr![i] = params.pixelData![i];
      }
      imagePtr.ref.filePath = nullptr;
      imagePtr.ref.pixelData = pixelDataPtr!;
      imagePtr.ref.base64Data = nullptr;
      imagePtr.ref.dataSize = params.pixelData!.length;
    } else if (params.imageFormat == RacVlmImageFormat.base64 &&
        params.base64Data != null) {
      base64DataPtr = params.base64Data!.toNativeUtf8();
      imagePtr.ref.filePath = nullptr;
      imagePtr.ref.pixelData = nullptr;
      imagePtr.ref.base64Data = base64DataPtr!;
      imagePtr.ref.dataSize = params.base64Data!.length;
    } else {
      params.sendPort.send(
        _VlmStreamingMessage(error: 'Invalid image format or missing data'),
      );
      return;
    }

    // Set options
    optionsPtr.ref.maxTokens = params.maxTokens;
    optionsPtr.ref.temperature = params.temperature;
    optionsPtr.ref.topP = params.topP;
    optionsPtr.ref.stopSequences = nullptr;
    optionsPtr.ref.numStopSequences = 0;
    optionsPtr.ref.streamingEnabled = RAC_TRUE;
    optionsPtr.ref.systemPrompt = nullptr;
    optionsPtr.ref.maxImageSize = 0;
    optionsPtr.ref.nThreads = 0;
    optionsPtr.ref.useGpu = params.useGpu ? RAC_TRUE : RAC_FALSE;

    final lib = PlatformLoader.loadCommons();

    // Get callback function pointers
    final tokenCallbackPtr =
        Pointer.fromFunction<Int32 Function(Pointer<Utf8>, Pointer<Void>)>(
            _vlmIsolateTokenCallback, 1);
    final completeCallbackPtr = Pointer.fromFunction<
        Void Function(Pointer<RacVlmResultStruct>,
            Pointer<Void>)>(_vlmIsolateCompleteCallback);
    final errorCallbackPtr = Pointer.fromFunction<
        Void Function(Int32, Pointer<Utf8>,
            Pointer<Void>)>(_vlmIsolateErrorCallback);

    final processStreamFn = lib.lookupFunction<
        Int32 Function(
          RacHandle,
          Pointer<RacVlmImageStruct>,
          Pointer<Utf8>,
          Pointer<RacVlmOptionsStruct>,
          Pointer<NativeFunction<Int32 Function(Pointer<Utf8>, Pointer<Void>)>>,
          Pointer<
              NativeFunction<
                  Void Function(Pointer<RacVlmResultStruct>, Pointer<Void>)>>,
          Pointer<
              NativeFunction<
                  Void Function(Int32, Pointer<Utf8>, Pointer<Void>)>>,
          Pointer<Void>,
        ),
        int Function(
          RacHandle,
          Pointer<RacVlmImageStruct>,
          Pointer<Utf8>,
          Pointer<RacVlmOptionsStruct>,
          Pointer<NativeFunction<Int32 Function(Pointer<Utf8>, Pointer<Void>)>>,
          Pointer<
              NativeFunction<
                  Void Function(Pointer<RacVlmResultStruct>, Pointer<Void>)>>,
          Pointer<
              NativeFunction<
                  Void Function(Int32, Pointer<Utf8>, Pointer<Void>)>>,
          Pointer<Void>,
        )>('rac_vlm_component_process_stream');

    // This FFI call blocks until processing is complete
    final status = processStreamFn(
      handle,
      imagePtr,
      promptPtr,
      optionsPtr,
      tokenCallbackPtr,
      completeCallbackPtr,
      errorCallbackPtr,
      nullptr,
    );

    if (status != RAC_SUCCESS) {
      params.sendPort.send(_VlmStreamingMessage(
        error:
            'Failed to start streaming: ${RacResultCode.getMessage(status)}',
      ));
    }
  } catch (e) {
    params.sendPort.send(_VlmStreamingMessage(error: 'Streaming exception: $e'));
  } finally {
    calloc.free(promptPtr);
    calloc.free(imagePtr);
    calloc.free(optionsPtr);
    if (filePathPtr != null) calloc.free(filePathPtr);
    if (pixelDataPtr != null) calloc.free(pixelDataPtr);
    if (base64DataPtr != null) calloc.free(base64DataPtr);
    _vlmIsolateSendPort = null;
  }

⚠️ Potential issue | 🟠 Major

Prevent concurrent streaming calls from overwriting the global SendPort.
_vlmIsolateSendPort is global; a second stream can overwrite it and misroute tokens. Consider guarding against concurrent streams (or documenting single-stream usage).

✅ Simple single-stream guard
 class DartBridgeVLM {
+  bool _streamingActive = false;
...
   Stream<String> processImageStream({
     required String prompt,
     required int imageFormat,
     String? filePath,
     Uint8List? pixelData,
     int width = 0,
     int height = 0,
     String? base64Data,
     int maxTokens = 2048,
     double temperature = 0.7,
     double topP = 0.9,
     bool useGpu = true,
   }) {
+    if (_streamingActive) {
+      throw StateError('VLM streaming already in progress');
+    }
+    _streamingActive = true;
     final handle = getHandle();
...
   Future<void> _startBackgroundStreaming(
...
     receivePort.listen((message) {
       if (controller.isClosed) return;
 
       if (message is String) {
         controller.add(message);
       } else if (message is _VlmStreamingMessage) {
         if (message.isComplete) {
           controller.close();
           receivePort.close();
+          _streamingActive = false;
         } else if (message.error != null) {
           controller.addError(StateError(message.error!));
           controller.close();
           receivePort.close();
+          _streamingActive = false;
         }
       }
     });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sdk/runanywhere-flutter/packages/runanywhere/lib/native/dart_bridge_vlm.dart`
around lines 654 - 793, The global _vlmIsolateSendPort is being overwritten by
concurrent isolates which can misroute tokens; in _vlmStreamingIsolateEntry
either (A) avoid using the global: remove the assignment to _vlmIsolateSendPort
and use a local final sendPort = params.sendPort everywhere in this function
(and in callbacks capture that sendPort) or (B) add a guard at the top of
_vlmStreamingIsolateEntry that checks if _vlmIsolateSendPort != null and, if so,
calls params.sendPort.send(...) with an error message and returns; ensure the
finally block no longer clears a sendPort you didn't set (or clears only when
you set it). Use the symbols _vlmStreamingIsolateEntry, _vlmIsolateSendPort,
_VlmStreamingIsolateParams and the callback functions referenced (e.g.,
_vlmIsolateTokenCallback/_vlmIsolateCompleteCallback/_vlmIsolateErrorCallback)
to locate and implement the change.


rac_handle_t handle_ = nullptr;
std::string loadedModelId_;
bool cancellationRequested_ = false;

⚠️ Potential issue | 🟠 Major

cancellationRequested_ must be std::atomic<bool> — plain bool is a data race.

cancel() is designed to be called from a different thread than processStream(). Writing a plain bool from one thread while another thread reads it is undefined behaviour under C++11's memory model regardless of the platform's word size. Use std::atomic<bool> with the appropriate memory order.

🔒 Proposed fix
+#include <atomic>
 
 private:
     VLMBridge();
     ~VLMBridge();
 
     VLMBridge(const VLMBridge&) = delete;
     VLMBridge& operator=(const VLMBridge&) = delete;
 
     rac_handle_t handle_ = nullptr;
     std::string loadedModelId_;
-    bool cancellationRequested_ = false;
+    std::atomic<bool> cancellationRequested_{false};
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sdk/runanywhere-react-native/packages/llamacpp/cpp/bridges/VLMBridge.hpp` at
line 119, Replace the plain bool cancellationRequested_ with std::atomic<bool>
and include <atomic>; update its declaration in VLMBridge.hpp (symbol:
cancellationRequested_) to initialize to false as an atomic, and ensure cancel()
stores true (use store(true, std::memory_order_relaxed) or stronger if you
prefer) while processStream() reads it via load(...) (or uses the atomic in
conditions) to avoid data races—look for all accesses to cancellationRequested_
and switch them to atomic store/load semantics.

Comment on lines +495 to +501
return buildJsonObject({
  {"text", jsonString(result.text)},
  {"promptTokens", std::to_string(result.promptTokens)},
  {"completionTokens", std::to_string(result.completionTokens)},
  {"totalTimeMs", std::to_string(result.totalTimeMs)},
  {"tokensPerSecond", std::to_string(result.tokensPerSecond)}
});

⚠️ Potential issue | 🔴 Critical

JSON key mismatch between C++ and TypeScript — all VLM metrics will be zero.

The C++ side emits camelCase keys (promptTokens, completionTokens, totalTimeMs, tokensPerSecond), but the TypeScript consumer in RunAnywhere+VLM.ts (lines 198–201) reads snake_case keys (result.prompt_tokens, result.completion_tokens, result.total_time_ms, result.tokens_per_second). Since the keys never match, all parsed metrics will fall through to the ?? 0 defaults and every VLMResult will report zeros for tokens and timing.

Either align C++ keys to snake_case or update the TS parsing. I'd recommend fixing C++ to match the existing TS convention:

Proposed fix (align C++ keys to snake_case)
       return buildJsonObject({
         {"text", jsonString(result.text)},
-        {"promptTokens", std::to_string(result.promptTokens)},
-        {"completionTokens", std::to_string(result.completionTokens)},
-        {"totalTimeMs", std::to_string(result.totalTimeMs)},
-        {"tokensPerSecond", std::to_string(result.tokensPerSecond)}
+        {"prompt_tokens", std::to_string(result.promptTokens)},
+        {"completion_tokens", std::to_string(result.completionTokens)},
+        {"total_time_ms", std::to_string(result.totalTimeMs)},
+        {"tokens_per_second", std::to_string(result.tokensPerSecond)}
       });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@sdk/runanywhere-react-native/packages/llamacpp/cpp/HybridRunAnywhereLlama.cpp`
around lines 495 - 501, The JSON emitted by buildJsonObject in
HybridRunAnywhereLlama.cpp uses camelCase keys (e.g., "promptTokens",
"completionTokens", "totalTimeMs", "tokensPerSecond") but the TypeScript
consumer RunAnywhere+VLM.ts expects snake_case (result.prompt_tokens,
result.completion_tokens, result.total_time_ms, result.tokens_per_second);
update the JSON keys in the buildJsonObject return (the block that constructs
{"text", jsonString(result.text)} ...) to use snake_case names that match the TS
parser (prompt_tokens, completion_tokens, total_time_ms, tokens_per_second) so
the metrics are parsed correctly.

Comment on lines +121 to +132
const native = requireNativeLlamaModule();

// Unregister VLM first (matches iOS: unregister VLM before LLM)
if (this.isVLMRegistered) {
  try {
    await native.unloadVLMModel();
    this.isVLMRegistered = false;
    vlmLog.info('LlamaCPP VLM backend unregistered');
  } catch (error) {
    vlmLog.error(`LlamaCPP VLM unregistration failed: ${error instanceof Error ? error.message : String(error)}`);
  }
}

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# Search for unregisterVLMBackend in the codebase
rg -n "unregisterVLMBackend" --type=ts

Repository: RunanywhereAI/runanywhere-sdks

Length of output: 56


🏁 Script executed:

# Also search for the native module interface/spec
rg -n "unloadVLMModel|registerVLMBackend" --type=ts -B2 -A2

Repository: RunanywhereAI/runanywhere-sdks

Length of output: 6245


🏁 Script executed:

# Find the native module type definitions
fd -e "\.ts" -e "\.tsx" | xargs rg "requireNativeLlamaModule|NativeLlamaModule" | head -20

Repository: RunanywhereAI/runanywhere-sdks

Length of output: 2315


🏁 Script executed:

# Check the full native module spec to understand the VLM API design
cat -n sdk/runanywhere-react-native/packages/llamacpp/src/specs/RunAnywhereLlama.nitro.ts | sed -n '160,190p'

Repository: RunanywhereAI/runanywhere-sdks

Length of output: 1500


🏁 Script executed:

# Also check if there are any comments or docs explaining the VLM registration/unregistration strategy
rg -n "registerVLMBackend|unloadVLMModel" -B5 -A5 sdk/runanywhere-react-native/packages/llamacpp/src/specs/RunAnywhereLlama.nitro.ts

Repository: RunanywhereAI/runanywhere-sdks

Length of output: 881


🏁 Script executed:

# Check how iOS implements VLM cleanup for comparison
fd -e "swift" | xargs rg -l "VLM|registerVLM" 2>/dev/null | head -5

Repository: RunanywhereAI/runanywhere-sdks

Length of output: 527


🏁 Script executed:

# Check the iOS Swift implementation of VLM cleanup
cat -n sdk/runanywhere-swift/Sources/RunAnywhere/Public/Extensions/VLM/RunAnywhere+VisionLanguage.swift | head -150

Repository: RunanywhereAI/runanywhere-sdks

Length of output: 6880


🏁 Script executed:

# Search for unload/unregister patterns in iOS VLM implementation
rg -n "unload|unregister" sdk/runanywhere-swift/Sources/RunAnywhere/Public/Extensions/VLM/ -B3 -A3

Repository: RunanywhereAI/runanywhere-sdks

Length of output: 1144


🏁 Script executed:

# Check the iOS CppBridge VLM implementation
fd -e "swift" -path "*/CppBridge*" | xargs rg -l "VLM" 2>/dev/null

Repository: RunanywhereAI/runanywhere-sdks

Length of output: 9163


🏁 Script executed:

# Search for VLM registration/unregistration in iOS CppBridge
rg -n "registerVLMBackend|unload|unregister" sdk/runanywhere-swift/ -B2 -A5 | grep -A10 -B10 "VLM"

Repository: RunanywhereAI/runanywhere-sdks

Length of output: 12293


🏁 Script executed:

# Check how the LlamaCppProvider.unregister() compares to iOS implementation
rg -n "unregister\|deinit" sdk/runanywhere-swift/Sources/RunAnywhere/ --type=swift | head -30

Repository: RunanywhereAI/runanywhere-sdks

Length of output: 56


Add unregisterVLMBackend() to native module and call it during cleanup.

The React Native implementation unloads only the VLM model but doesn't unregister the VLM backend—unlike iOS, which calls rac_backend_llamacpp_vlm_unregister() during unregister(). The C++ API for rac_backend_llamacpp_vlm_unregister() exists but is not exposed in the React Native native module interface. This leaves the VLM backend registered in C++ after the JS-side flag is cleared, creating state inconsistency with iOS.

Add unregisterVLMBackend() to RunAnywhereLlama.nitro.ts and call it in LlamaCppProvider.unregister() instead of (or in addition to) unloadVLMModel() to match iOS behavior.
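
🛠️ Rough sketch of the suggested wiring (hypothetical: unregisterVLMBackend() does not exist in the native spec yet, so both the native method and this call site are assumptions)

// Assumes a new unregisterVLMBackend() has been added to the RunAnywhereLlama
// native spec and maps to rac_backend_llamacpp_vlm_unregister() on the C++ side.
declare function requireNativeLlamaModule(): {
  unloadVLMModel(): Promise<boolean>;
  unregisterVLMBackend(): Promise<boolean>;
};
declare const vlmLog: { info(msg: string): void; error(msg: string): void };

async function unregisterVLM(state: { isVLMRegistered: boolean }): Promise<void> {
  if (!state.isVLMRegistered) return;
  const native = requireNativeLlamaModule();
  try {
    await native.unloadVLMModel();       // release the loaded model first
    await native.unregisterVLMBackend(); // then drop the backend registration, matching iOS
    state.isVLMRegistered = false;
    vlmLog.info('LlamaCPP VLM backend unregistered');
  } catch (error) {
    vlmLog.error(
      `LlamaCPP VLM unregistration failed: ${error instanceof Error ? error.message : String(error)}`,
    );
  }
}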

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sdk/runanywhere-react-native/packages/llamacpp/src/LlamaCppProvider.ts`
around lines 121 - 132, The native VLM backend isn't being unregistered from the
C++ layer—add and wire a new method named unregisterVLMBackend on the native
module (in RunAnywhereLlama.nitro.ts) and call it from
LlamaCppProvider.unregister() instead of only calling unloadVLMModel(); locate
the native accessor via requireNativeLlamaModule(), add an exported
unregisterVLMBackend() that invokes the underlying
rac_backend_llamacpp_vlm_unregister() binding, then update
LlamaCppProvider.unregister() to await native.unregisterVLMBackend() (with the
same try/catch and logging using vlmLog) and clear this.isVLMRegistered
accordingly.

Comment on lines +194 to +214
try {
  const result = JSON.parse(resultJson);
  return {
    text: result.text ?? '',
    promptTokens: result.prompt_tokens ?? 0,
    completionTokens: result.completion_tokens ?? 0,
    totalTimeMs: result.total_time_ms ?? 0,
    tokensPerSecond: result.tokens_per_second ?? 0,
  };
} catch {
  if (resultJson.includes('error')) {
    throw new Error(resultJson);
  }
  return {
    text: resultJson,
    promptTokens: 0,
    completionTokens: 0,
    totalTimeMs: 0,
    tokensPerSecond: 0,
  };
}

⚠️ Potential issue | 🔴 Critical

JSON key mismatch: parsed keys don't match C++ output.

Lines 198–201 read result.prompt_tokens, result.completion_tokens, etc. (snake_case), but the C++ bridge emits promptTokens, completionTokens, etc. (camelCase). All metrics will silently fall to the ?? 0 defaults. See the corresponding comment on HybridRunAnywhereLlama.cpp lines 495–501 for the proposed fix.

If the C++ keys are changed to snake_case, this code is correct as-is. Otherwise, update the keys here:

Alternative: fix on the TS side
     return {
       text: result.text ?? '',
-      promptTokens: result.prompt_tokens ?? 0,
-      completionTokens: result.completion_tokens ?? 0,
-      totalTimeMs: result.total_time_ms ?? 0,
-      tokensPerSecond: result.tokens_per_second ?? 0,
+      promptTokens: result.promptTokens ?? 0,
+      completionTokens: result.completionTokens ?? 0,
+      totalTimeMs: result.totalTimeMs ?? 0,
+      tokensPerSecond: result.tokensPerSecond ?? 0,
     };
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
try {
  const result = JSON.parse(resultJson);
  return {
    text: result.text ?? '',
    promptTokens: result.prompt_tokens ?? 0,
    completionTokens: result.completion_tokens ?? 0,
    totalTimeMs: result.total_time_ms ?? 0,
    tokensPerSecond: result.tokens_per_second ?? 0,
  };
} catch {
  if (resultJson.includes('error')) {
    throw new Error(resultJson);
  }
  return {
    text: resultJson,
    promptTokens: 0,
    completionTokens: 0,
    totalTimeMs: 0,
    tokensPerSecond: 0,
  };
}
try {
  const result = JSON.parse(resultJson);
  return {
    text: result.text ?? '',
    promptTokens: result.promptTokens ?? 0,
    completionTokens: result.completionTokens ?? 0,
    totalTimeMs: result.totalTimeMs ?? 0,
    tokensPerSecond: result.tokensPerSecond ?? 0,
  };
} catch {
  if (resultJson.includes('error')) {
    throw new Error(resultJson);
  }
  return {
    text: resultJson,
    promptTokens: 0,
    completionTokens: 0,
    totalTimeMs: 0,
    tokensPerSecond: 0,
  };
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sdk/runanywhere-react-native/packages/llamacpp/src/RunAnywhere+VLM.ts` around
lines 194 - 214, The JSON parsing block in RunAnywhere+VLM.ts is reading
snake_case keys (result.prompt_tokens, result.completion_tokens,
result.total_time_ms, result.tokens_per_second) but the C++ bridge emits
camelCase (promptTokens, completionTokens, totalTimeMs, tokensPerSecond); update
the return mapping in the try branch to read the camelCase properties (e.g.,
result.promptTokens, result.completionTokens, result.totalTimeMs,
result.tokensPerSecond) and optionally keep a fallback to the snake_case names
(e.g., result.promptTokens ?? result.prompt_tokens ?? 0) so parsing in the
function that uses resultJson handles both key styles.

Comment on lines +324 to +330
const finalResult: VLMResult = {
  text: fullText,
  promptTokens: Math.ceil(prompt.length / 4),
  completionTokens: tokenCount,
  totalTimeMs: latencyMs,
  tokensPerSecond,
};

⚠️ Potential issue | 🟡 Minor

promptTokens is fabricated in the streaming path.

Math.ceil(prompt.length / 4) is a rough character-based guess, not an actual token count. The non-streaming processVLMImage path returns real metrics from the C++ backend, so consumers comparing the two will see inconsistent values. Consider either:

  • Returning 0 with a clear doc note that streaming doesn't report prompt tokens (a sketch of this option follows the list), or
  • Propagating the actual count from the onComplete callback's rac_vlm_result_t* (which presumably contains real metrics).
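
🛠️ Sketch of the first option (field names mirror the VLMResult shape quoted above; the helper and its parameters are illustrative, not the actual streaming code)

// Builds the streaming result without fabricating a prompt-token count.
interface VLMResult {
  text: string;
  promptTokens: number;
  completionTokens: number;
  totalTimeMs: number;
  tokensPerSecond: number;
}

function buildStreamingResult(
  fullText: string,
  tokenCount: number,
  latencyMs: number,
): VLMResult {
  const tokensPerSecond = latencyMs > 0 ? (tokenCount / latencyMs) * 1000 : 0;
  return {
    text: fullText,
    // Streaming does not report prompt tokens; document the 0 instead of estimating.
    promptTokens: 0,
    completionTokens: tokenCount,
    totalTimeMs: latencyMs,
    tokensPerSecond,
  };
}
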
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sdk/runanywhere-react-native/packages/llamacpp/src/RunAnywhere+VLM.ts` around
lines 324 - 330, The current streaming path fabricates promptTokens using
Math.ceil(prompt.length / 4) in the VLMResult, causing inconsistent metrics vs.
processVLMImage's real backend values; fix by removing the heuristic and either
set promptTokens to 0 with a clear comment that streaming doesn't report prompt
token count, or (preferable) read the actual token counts from the native
callback (use the onComplete handler's rac_vlm_result_t* result structure) and
populate VLMResult.promptTokens from that real metric; update the code that
constructs finalResult (VLMResult) in the streaming flow to pull promptTokens
from the onComplete rac_vlm_result_t* instead of using Math.ceil.

@Siddhesh2377
Collaborator

Hey @VyasGuru
Are you fixing the bugs in this soon?
Should this PR be converted to a draft?

@VyasGuru
Contributor Author

> Hey @VyasGuru
> Are you fixing the bugs in this soon?
> Should this PR be converted to a draft?

I've talked this over with @shubhammalhotra28 on Slack, and this version works. The bot report does have a few things worth checking, but for now I'm keeping this as is because it is working.

@Siddhesh2377
Collaborator

> Hey @VyasGuru
> Are you fixing the bugs in this soon?
> Should this PR be converted to a draft?
>
> I've talked this over with @shubhammalhotra28 on Slack, and this version works. The bot report does have a few things worth checking, but for now I'm keeping this as is because it is working.

Noted ✔️
