Releases · google-ai-edge/mediapipe

18 Dec 22:17

dbcp1

v0.10.20

0f80899

MediaPipe v0.10.20 Latest

Build changes

Add comments to explain how to configure OpenCV in the opencv_macos.BUILD file.
Add libc++_shared.so to MediaPipe Android examples.
Add linkstatic to OpenCV prebuilts

Framework and core calculator improvements

Fix ParseFromString() compilation issue in OSS
Fix all dead links
Add troubleshooting tip for unsupported XNNPACK flags during build
Add UniqueId::Dup.
Updating the XNNPACK latest commit hash
Format Workspace file
Add EglSync wrapper.
Update Bazel version to 6.5.0
Update sync_wait to support UniqueFd.
Fix GlContext includes
Add EglSyncPoint/CreateEglSyncPoint.
Bump MediaPipe version to 0.10.19.
More perfetto tracking for EglSync.
Patch for supporting WebGPU .deviceInfo during API migration.
Log the Tensor multi-write error message only once.
Enable GpuBufferStorageAhwb ASYNC usage for use case: AhwbView write -> GlTextureView read
Add IsSignaled function (the previous SyncWait for checking status triggers unnecessary StrFormat)
Add type information to error message when accessing an empty packet.
Add SharedFD type
Update SyncWait/IsSignaled to work with SharedFd.
Enable SharedFd usage in EglSync
Add VLOG overrides. MediaPipe uses VLOG heavily, but enabling it when running an Android app is not straightforward. VLOG overrides make it relatively quick to enable VLOGs for various modules within MediaPipe.
Updating Troubleshooting with VLOG info.
Slice only the tokens which are needed for the next stage of the LLM pipeline.
Adds DebugInputStreamHandler.
Delete YUVImage copy and move operations
Adds GetGraphRuntimeInfo method, which generates runtime debugging information about the state of InputStreams.
Add a sample script to run LLM inference on Android via the MediaPipe LLM inference engine.
Update bot_config.yml
Add option to set max sequence size in PackMediaSequenceCalculator instead of having it hard coded.
Update run_llm_inference.sh with recommended models.
Allow reading the input frame rate from the header in the input side stream and limiting the frame rate.
Add stream operator<< for TypeId
Extract native-to-UTF8 path string conversion; add FormatLastError()
Update comment in yuv_image.h
Introduce shadow_copy parameter to PathToResourceAsFile
Fix header includes after refactoring
Migrate away from status builders
Avoids the creation of two "default" GpuExecutor instances
Adds and integrates GraphRuntimeInfoLogger into CalculatorGraph.
nit: don't overwrite InitializeDefaultExecutor argument "use_application_thread"
Add Name() access to the source names in api2.
Add memory mapping and locking to file helpers
Fix Windows build
Fix Windows build, part 2
Support memory mapping in resources.
Bump MediaPipe version to 0.10.20.
Enable log message output for messages larger than 4096 bytes.
Add vision modality to the C API
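
Several entries above add memory mapping to the file helpers and resources. As a rough illustration of the general technique only (not MediaPipe's C++ implementation), the following Python sketch maps a file read-only and reads a byte range without loading the whole file. The helper name read_slice_mmap and the file contents are hypothetical, and memory locking is not covered.

```python
import mmap
import os
import tempfile


def read_slice_mmap(path: str, offset: int, length: int) -> bytes:
    """Return `length` bytes starting at `offset` using a read-only memory map."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; pages are loaded lazily by the OS.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            return mapped[offset:offset + length]


# Hypothetical file standing in for a large model or resource file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"header" + b"\x00" * 1024 + b"weights")
    example_path = tmp.name

try:
    print(read_slice_mmap(example_path, 0, 6))
finally:
    os.remove(example_path)
```

Only the pages that are actually touched need to be read from disk, which is the usual motivation for mapping large files instead of copying them into memory.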

MediaPipe Tasks update

This section should highlight the changes that are done specifically for any platform and don't propagate to
other platforms.

Android

Adds the canonical toBuilder method to the LlmInferenceOptions object.
Add Vision Modality to the MediaPipe LLM JNI Layer
Add vision modality to the Java LLM API
Remove unused Proto dependency

iOS

Fixed empty pose world landmarks in iOS holistic landmarker

Javascript

Improve logging to allow users to understand 1) which InferenceCalculator backend is used (without extra VLOG flags) and 2) when a model is loaded (including its size).
nits: Remove linter warnings, fix unused includes.

Python

Update the expected accuracy for text embedder test.
Remove the check for start and stop tokens in the LLM bundler.

Model Maker changes

Move tensorflow lite python calls to ai-edge-litert.

MediaPipe Dependencies

Update WASM files

07 Nov 23:34

dbcp1

v0.10.18

76e52c7

MediaPipe v0.10.18

Build changes

Follow the open-sourcing of WebGPU by open-sourcing one of its dependencies, third_party/emscripten
Add pillow, pyyaml, and requests to model_maker BUILD

Framework and core calculator improvements

Loading resources through calculator and subgraph contexts and configuring through kResourcesService.
Use std::make_unique
Moves OnDiskCacheHelper class into a separate file / compilation target
Pools: report buffer specs on failure, fix status propagation, fix includes
Open-Source MediaPipe's WebGPU helpers.
BatchMatMul uses transpose parameter.
Introduce Resource to represent a generic resource (file content, embedded/in-memory resource) for reading.
Bump up the version number to 0.10.16
Migrate from AdapterProperties to AdapterInfo
Migrate from Resource::ReadContents to Resources::Get (using ForEachLine where required)
Update Resources docs to mention ForEachLine (so devs don't fall back to ReadContents in such a case)
Adjust WebGPU device registration
Fix includes/copies/checks for BuildLabelMapFromFiles
Migrate to BuildLabelMapFromFiles.
Update Python version requirements in setup.py
Introduce Resources with mapping, so graphs can use placeholders instead of actual resource paths.
Remove Resources::ReadContents & add Resource::TryReleaseAsString.
Fix ports for multi side outputs.
Update solution android apps with explicit exported attribute.
Ensure kResourcesService is set before CalculatorGraph is initialized (otherwise subgraphs/nodes may get the wrong default resources).
Switch inference tests to ResourceProviderCalculator & update builder to refer MODEL_RESOURCE.
Migrate modules to use ResourceProviderCalculator.
Support single tensor input in TensorsToImageCalculator
Migrate TfLiteModelLoader to use MP Resources.
Remove deprecated TfLiteModelLoader::LoadFromPath.
Fix for isIOS() platform util on worker and non-worker contexts
Support single tensor input in TensorsToSegmentationCalculator
Makes CalculatorContext::GetGraphServiceManager() private
BatchMatMul can handle cases where ndims != 4 and quantization
RmsNorm has an optional scale parameter.
Allowed variable audio packet size by setting num_samples to null.
Fix technically correct but confusing example in top level comments.
Removing ReturnType helper, since it's part of the standard now.
Update XNNPack to 9/24
Enable LoRA conversion support for Gemma2-2B
Improve warning when InferenceCalculator backends are not linked
Bump MediaPipe version to 0.10.17.
Update OpenCV to a version that compiles with C++ 17
Force xnnpack when CPU inference is enforced
Install PyBind before TensorFlow to get the MediaPipe version
Change MP version to 0.10.18
Add validation to LLM bundler
Add alternative takePicture method to support custom thread executor
Add CopySign op
Add const Spec() method to OutputStreamManager
Add support for converting SRGBA ImageFrame to YUVImage
Add model configuration parameters for Gemma2-2B
Add menu for the default demo app and option to Close processor/graph and Exit gracefully
Add ngrammer, per layer embeddings and Relu1p5 fields to llm_params and update from Proto
Add a special InMemory Resources (current use case is in tests, but may be needed for some simple things as well)
Add ResourceProviderCalculator (replacement for LocalFileContentsCalculator)
Add Resource support into TfliteModelCalculator
Add a flag to set the default number of XNNPACK threads

MediaPipe Tasks update

This section should highlight the changes that are done specifically for any platform and don't propagate to
other platforms.

Android

Initialize new members in LlmModelSettings
Create an implicit session for all requests to generateResponse()
Change session management so that all JNI calls come from the same thread.
Add Session API support to LLM Java API

iOS

Updated name of iOS audio classifier delegate
Fixed incorrect stream mode in iOS audio classifier options
Added method to ios audio task runner
Updated iOS audio classifier BUILD file
Fixed buffer length calculation in iOS MPPAudioData
Updated iOS audio data tests to fix issue in buffer length calculation
Revert "Added method for getting interleaved float32 pcm buffer from audio file"
Updated comments in iOS LlmInference
Dropped Refactored suffix for modified files in iOS genai
Updated documentation of LlmTaskRunner
Removed allocation of LlmInference Options
Updated the response generation queue to be serial in iOS LlmInference
Updated documentation of iOS LlmInference, documentation of LlmInference+Session
Fixed marking of response generation completed control flow in LlmInference+Session.
LlmInference.Options: remove unnecessary numOfSupportedLoraRanks parameter.
Add activation data type to LlmInference.Options.
Added more methods to iOS AVAudioPCMBuffer+TestUtils
Added few basic iOS audio classifier tests
Added options tests to iOS audio classifier
Added utils for AVAudioFile
Added test for score threshold to MPPAudioClassifierTests
Added constants in MPPAudioClassifierTests
Added close method to iOS audio classifier
Added iOS MPPAudioData test utils
Added stream mode tests for iOS audio classifier
Added iOS audio classifier to cocoapods build
Added audio record creation tests to MPPAudioClassifierTests
Added close method to MPPAudioEmbedder
Added iOS audio embedder tests
Added more utility methods to MPPAudioEmbedderTests
Added streams mode tests for iOS audio embedder
Added iOS audio embedder to cocoapods build
Added comments to MPPAudioClassifierTests
Added iOS audio embedder header and implementation
Added iOS audio classifier implementation file
Added method for getting interleaved float32 pcm buffer from audio file
Added refactored iOS LlmTaskRunner
Added iOS LlmSessionRunner
Added more errors to GenAiInferenceError
Added refactored LlmInference
Added iOS session runner to build files
Added extra safeguards for response context in LlmSessionRunner
Added LlmInference+Session.swift and documentation regarding session and inference life times to iOS LLM Inference
Fixed issue with iOS audio embedder result parsing
Fixed iOS audio embedder options processing
Fixed index error in AVAudioFile+TestUtils
Fixed audio classifier result processing in stream mode
Fixed error handling in MPPAudioData
Fixed microphone recording issues in iOS MPPAudioRecord
Fixed documentation of iOS Audio Record
Fixed iOS audio record and audio data tests by avoiding audio engine running state checks
Fixed iOS audio embedder result helpers
Fixed bug due to simultaneous response generation calls across sessions
Updated method signatures in iOS audio classifier tests
Fixed flow limiting in iOS audio classifier
Removed duplicate test from MPPAudioClassifierTests
Updated comments in AVAudioFile+TestUtils
Changed the name of iOS audio classifier async test helper
Update comment for LlmInference.Session.clone() method.
Marked inits unavailable in MPPFloatBuffer
Updated documentation of iOS audio record
Adds LlmInference.Metrics for providing some key performance metrics (initialization time, response generation time) of the LLM inference.
Removed unwanted imports from iOS audio data tests
Cleaned ios audio test utils BUILD file
Remove the activation data type from the Swift API. We don't expect users to set it directly.
Use seconds instead of milliseconds for latency metrics.

Javascript

Add comments to generateResponses method.
Migrate to ForEachLine to have a single source of truth for getting file contents lines.
Workaround for multi-output web LLM issue where last response can get corrupted when numResponses is odd.
Quick fix for wrong number of multi-outputs sometimes when streaming

Python

Add a flag in the converter config for generating fake weights. When it is set to true, all weights will be filled with zeros.
Update text embedder test to match the output after XNNPack upgrade.
Update remaining data in text embedder test to match the output after XNNPack upgrade.
Update the expected value of the text embedder test.
Add python pip deps to WORKSPACE
Fix pip_deps targets.

Model Maker changes

Undo dynamic sequence length for export_model api because it doesn't work with MediaPipe.
Replace mock with unittest.mock in model_maker tests.
Move tensorflow lite python calls to ai-edge-litert.

MediaPipe Dependencies

Update WASM files

30 Aug 19:09

ayushgdev

v0.10.15

e252e56

MediaPipe v0.10.15

Build changes

Fix unwanted dependency on GPU libraries.
Adds TwoTapFirFilterCalculator.
Add public visibility to graph_service headers.
Disable ASAN, TSAN and MSAN tests which take more than 10 minutes.

Framework and core calculator improvements

Update PointToForeign with an optional cleanup object.
Enable BeginLoopCalculator for move-only types (e.g. Tensor) without Packet::Consume usage and copyable types without copying unless it's a fundamental type.
Ensure proper release of resources in case of multiple AHWB reads.
Enables the configuration of GpuBufferPool options via GpuResources::Create();
Bugfix to correctly handle landmark projection in the non-square case.
Add utility to wait for a sync (represented by FD)
Change a RET_CHECK to RET_CHECK_EQ
KinematicPathSolver: Avoid overshooting target
Introduce GetDefaultGpuExecutor(GpuResources) to allow executing all calculators on MP GPU thread.
No destruction for static ahwb_usage_track_.
Unbind framebuffer in Affine Transformation Runner GL
Move/isolate ahwb_usage_track_ into tensor_ahwb
Guard ahwb_tensor_track_ with mutex.
Add SidePacketConnectionTest
Update C++ Graph Builder to support executors and support input/output stream handlers.
Node::Input/OutputStreamHandler -> Node::SetInput/OutputStreamHandler
Add Packet::Share() method in replacement of SharedPtrWithPacket() function.
Default to high-performance power preference hint for WebGL contexts. For some computers with dual GPUs (like MBP2019), this will more frequently give us the higher performance GPU, which is generally preferable for most of our use cases (realtime rendering and ML), since speed is more critical than power consumption. If necessary, the user can override this setting by requesting their canvas' WebGL context manually before initializing the graph.
Introduce input_scale parameter to SpectrogramCalculator.
Improve documentation of graph options
Add an option to PackMediaSequenceCalculator to add empty clip labels instead of ignoring them. This is useful when we want to distinguish processing errors from no-detections.
Updates language detection headers
Fix dangling error reporter pointer in memory mapped models
Fix for possible infinite stall using setOptions immediately before a loadLoraModel call.
Add relu1p5 op, abs op, Log op, mdspan and Lhs Broadcast Sub with test
Fix missing member move in Tensor class
Add support for single Tensor output streams for ImageToTensorCalculator.
Fix some compilation errors in WebGPU code. These changes are all minor.
Add single tensor output support to tensor_converter_calculator.
Replace QCHECK with ABSL_QCHECK and CHECK with ABSL_CHECK.
Fix a bug in TensorAHWB that triggers a crash with multiple delayed AHWB readers followed by a CPU reader.
Fixes an unnecessary allocation of GraphServiceManager in case it is adopted from the calculator context.
Fix triggering of DFATAL message.
Remove xnn_enable_avx512fp16=false from .bazelrc
Replace uses of TfLiteOperatorCreate with TfLiteOperatorCreateWithData
Compile with '--keep_going' in setup.py
Update ndk version so that our open source users get the best possible performance out of mediapipe.
Correct address of android ndk
Replace absl::make_unique with std::make_unique in tensor.cc and tensor_ahwb.cc.
LLM decode benchmarks fill the cache with a predefined number of tokens before starting decoding.
Add logic to drop the offending non-monotonically increasing timestamp in the MicrophoneHelper.
Make packet payload const.
Pass flag to indicate that consuming op may support prepacked GEMM.
Get timestamp from OpenCV VideoCapture after first frame is read.
Update XNNPack and cpuinfo
Update TensorFlow to 2024-07-18.
Remove deprecated TfLiteOperatorCreateWithData function
Add option to use shifted window in SpectrogramCalculator.
Move AhwbUsage struct and helper methods into a separate library.
Make fields in PacketGetter.Pair public.
The GraphProfiler may be destroyed before the task is executed in the executor.
Introduce flag in MicrophoneHelper to drop non-increasing timestamps.
llm_test - add batch size of 8 for BM_Llm_QCINT8/512/128
Add method to create MP Tensor from TfLite tensor specs
Refactors AHardwareBufferView class to be instantiated with a TensorAhwbUsage pointer.
Refactor LlmBuilder to have one graph
Add expected_seq_len param to ComputeLogits()
Fix mediapipe::file::Exists() for >2GB files on Windows.
Bump XNNPACK and KleidiAI versions.
Update MP demo app to acquire wake lock
Replace mediapipe::StatusOr with absl::StatusOr
Sync on ssbo_writte_ before mapping an AHWB to a CpuReadView.
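
Two entries above concern dropping packets with non-monotonically increasing timestamps in the MicrophoneHelper. The following Python sketch illustrates that general idea only; the function name drop_non_increasing and the (timestamp, payload) tuple layout are hypothetical and are not taken from the MediaPipe sources.

```python
from typing import Iterable, List, Tuple


def drop_non_increasing(
    packets: Iterable[Tuple[int, bytes]],
) -> List[Tuple[int, bytes]]:
    """Keep only packets whose timestamp is strictly greater than the last kept one."""
    kept: List[Tuple[int, bytes]] = []
    last_ts = None
    for ts, payload in packets:
        if last_ts is not None and ts <= last_ts:
            # Offending packet: its timestamp did not increase, so drop it.
            continue
        kept.append((ts, payload))
        last_ts = ts
    return kept


packets = [(0, b"a"), (10, b"b"), (10, b"c"), (5, b"d"), (20, b"e")]
print(drop_non_increasing(packets))
```

Here the packets at timestamps 10 (repeated) and 5 (going backwards) are dropped, so the stream that remains is strictly increasing.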

MediaPipe Tasks update

This section should highlight the changes that are done specifically for any platform and don't propagate to
other platforms.

Android

Bump targetSdkVersion to 34 throughout MediaPipe.

iOS

Updated documentation in iOS audio classifier
Added iOS holistic landmarker to vision framework build
Changed method name in MPPAudioClassifierResult
Added audio classifier options helpers
Added audio classifier result helpers
Added method to create audio record MPPAudioTaskRunner
Removed unused imports in MPPAudioTaskRunner
Added iOS audio embedder result, classifier result, classifier options, embedder options, embedder options helpers, classifier header and embedder result helpers
Add missing argument for num_draft_tokens.

Javascript

Set quantization bits for LoRA weight conversion to match those specified
Warn on adding packets to a closed input stream instead of silently dropping packets.
Enable experimental support for Chromium WGSL subgroups in LLM API, when available.
Support multi-response generation.

Python

Add prompt template to llm bundler.

Bug fixes

class_weights flag causes a crash for multiclass case

Model Maker changes

Rename old BinaryAUC metric to BinarySparseAUC(used by text_classifier) and create a new BinaryAUC metric which does not expect sparse inputs.
Allow configuration of num_parallel_calls and cycle_length in hparams
Improve python code format.
Use tf.io.gfile.GFile for writing metadata file in image classifier.
Change SparsePrecision metric to BinarySparsePrecision metric, and likewise SparseRecall to BinarySparseRecall, in the core library. We only care about these metrics in the binary case, so this change makes the metric class names more accurate for their intended usage.
Support multilabel model training in text classifier
Create and add metrics for multi-class case
Support a customized best model monitor for multiclass cases

MediaPipe Dependencies

Update WASM files

13 May 17:40

ayushgdev

v0.10.14

4cf89a7

MediaPipe v0.10.14

Framework and core calculator improvements

Expose Lora ranks.
Update C API documentation to make it clear that the callback is invoked multiple times
Do not free response in PredictAsync callback
Enable usage of DRISHTI_PROFILING from non mediapipe namespaces.
Add model type to ImageGeneratorOptions.
Allow casting Stream->Stream

MediaPipe Tasks update

This section should highlight the changes that are done specifically for any platform and don't propagate to other platforms.

iOS

Added iOS audio data tests
Removed unused methods in AVAudioPCMBufferTestUtils
Added read at offset tests to MPPAudioRecordTests
Renamed property in MPPAudioData
Added iOS Audio Packet Creator
Added iOS audio running mode
Added iOS Packet Creator
Added iOS audio task runner
Updated documentation of MPPAudioPacketCreator

Javascript

Allow models to be uploaded via ReadableStreamDefaultReader
Allow all tasks to use a ReadableStreamDefaultReader
Expose Web LoRA API.
Raise WebGPU errors to JavaScript.
Update GenAI Experimental README
Update GenAI README

Python

Fixed result_callback() argument

MediaPipe Dependencies

Flatbuffers upgrade to 24.3.7
Update TF and FlatBuffer dependency to latest.

03 May 23:23

ayushgdev

v0.10.13

272c8d4

MediaPipe v0.10.13

Build changes

Make Holistic C++ graph public until we have a C++ API
Added a test image to tasks/testdata/vision
Update dependency in inference_calculator_metal to make it OSS compatible
Add build rule for gpt2_unicode_mapping_calculator to ODML repo
Update TF patch to match new version
Make model_asset_bundle_resources public

Framework and core calculator improvements

Added Interactive Segmenter C Tasks API and updated Image Segmenter + Pose Landmarker API/tests
Moved some utility functions used by the segmenter APIs to a shared test namespace
Added Face Stylizer C API
Add config options to disable default service in mediapipe vision tasks.
Fix race condition in GetCurrentThreadId
Update base Docker image to Ubuntu 22.04
Adding support for boolean tensor inputs to InferenceInterpreterDelegateRunner
Updates text_embedder_cosine_similarity signature to use Embedding pointers
Fix mediapipe/framework/packet.h build failure on C++20.
Finish allowing "direct Tensor" inputs and outputs in all InferenceCalculator variants.
Add SizeInTokens API to C layer
Remove dependency on "torch" for MediaPipe Python package
Add option for allowing cropping beyond image borders in ContentZoomingCalculator
Update XNNPACK
Add support for loading models from memory mapped files
InferenceCalculator: Add option to use mmap for model loading
Workaround the flaky status of XNN_FLAG_KEEP_DIMS
Add an HasError method and a test for ErrorReporter
Update cached_kernel_path option doc
Adds MemoryManager to several Tensor-generating calculators
Add support for mmapping models to more inference calculators
Removes InferenceRunner interface from InferenceCalculatorNodeImpl
ContentZoomingCalculator: Fix initial state for "last measured" rect
Reduce memory usage of LLM Web API
Propagate packet timestamps to Android surfaces
Make previous_log_index_ atomic to fix some race condition issues in the MediaPipe Profiler
Adds MemoryManager to TensorConverter Calculator
Fix template_parser's crash when destructing stowed_messages_ for proto3.
ContentZoomingCalculator: Don't clamp when allow_cropping_outside_frame is set
Refactor Metal path out of TensorsToSegmentationCalculator main file.
Update Protobuf dependency to 4.x
Expand AssetManager docs to provide JNI initialization method and proper usage pattern through GetResourceContents.
Enables reordering of input and output tensor streams in InferenceCalculators
Add CopyCpuInputIntoTfLiteTensor
Add CopyTfLiteTensorIntoCpuOutput
Add int64_t to MP tensor.
Make it clear sinks should outlive graph initialized with the corresponding config
Update initializeNativeAssetManager docs - singleton + MediaPipe usage
Deprecated ImageSource in favor of standard TexImageSource.
Add support for additional tensor_data_type for tensor conversion calculators.
Fix TensorsToSegmentationConverterMetal RunInGlContext().
Change the naming of converters of ImageToTensorCalculator and TensorsToSegmentationCalculator.
Update WebGL2 on OffscreenCanvas support check to include Safari 17+
Use TextFormat for serialization
Add IsConnected() to graph builder SideOut
Use "ahwb" prefix for "release_callback" to disambiguate ahwb vs. non ahwb callbacks.
Allow multiple AHWB release callbacks.
Add itemized loop calculators
Add support for a Vector string packet to the constant_side_packet_calculator.
Fix an issue in BeginItemLoopCalculator
Allow arbitrary timestamp changes in BeginItemLoopCalculator
Add unsigned int type to Mediapipe-Web binding.
Add the ability to load a drishti graph template from a byte array.
Add error handling to CreateSession in C API
Report received dims size in the error.
Adds conditional TFLITE_CONDITIONAL_NAMESPACE namespace to .cc implementations
Adds support for tensor scalar output to VectorIntToTensorCalculator.
Parse num classes per detection from TFLite_Detection_PostProcess op.
Output error status int in case AHWB allocation fails.
Support more types for inference_calculator_util tensor copying functions.
Upgrade TensorFlow
Fix ASAN error by removing tensor data filling for kNone in test.
Added warning when MultiPoolOptions.keep_count is reached
Updated the safetensor converter to support Gemma 7B model
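
One entry above changes the signature of the text embedder's cosine similarity function. For readers unfamiliar with the measure itself, here is a minimal Python sketch of cosine similarity between two float embeddings. It is a generic illustration, not the C API; the function name and the plain-list embedding representation are assumptions made for the example.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # orthogonal vectors
print(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # parallel vectors
```

Values near 1 indicate embeddings pointing in the same direction, values near 0 indicate unrelated directions.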

MediaPipe Tasks update

This section should highlight the changes that are done specifically for any platform and don't propagate to
other platforms.

Android

Add sizeInTokens API to the Java LLM Inference Engine
Set a default empty lora path for LlmInference.

iOS

Updated iOS vision task runner to support tasks without norm rect stream
Added iOS holistic landmarker result helpers and implementation
Add async stream API to LlmInference for better Swift compatibility.
Remove duplicate symbols from MediaPipeTasksGenAIC
Apply iOS build fixes
Revert avoid_deps in MediaPipeTasksGenAI_framework and MediaPipeTasksGenAIC_framework.
Fixed missing method in iOS vision task runner
Fixed condition check in MPPVisionTaskRunner
Fixed incorrect types in MPPHolisticLandmarkerResult
Added init with proto utility to MPPHolisticLandmarkerResult
Added MPPHolisticLandmarker helper for initialization from protobuf text file
Added Holistic Landmarker Objective C Tests
Added size in tokens API to iOS LlmInference
Fixed type of holistic landmarker pose segmentation mask
Added optional initialization of face blendshapes from protobuf file
Added video mode and option tests to MPPHolisticLandmarker tests
Updated documentation of MPPHolisticLandmarkerResult+Helpers.h
Added iOS face stylizer implementation, options helpers, Result Helpers
Updated iOS MPPImage Utils to support creation of output images from C++ RGB images
Updated constants in MPPImage+Utils
Added iOS Face Stylizer tests
Updated documentation of iOS MPPFaceStylizer
Added missing connections to iOS Pose Landmarker
Added iOS MPPFloatBuffer, MPPFloatRingBuffer, ring buffer tests, MPPAudioData
Update swift name of MPPAudioDataFormat
Added live stream mode tests to iOS holistic landmarker
Updated method signature in MPPHolisticLandmarkerResult+Helpers
Added test for nil modelPath to face stylizer
Fixed memory deallocation issues when creating images using MPPImage+Utils
Exposed iOS Face Stylizer headers in xcframework build
Added iOS MPPAudioRecord
Updated method signature in MPPFloatRingBuffer
Added audio error codes to MPPCommon.h
Move MPPAudioDataFormat to a new file
Updated method signature in MPPAudioRecord
Added test utils for AVAudioPCMBuffer
Added basic failure tests for MPPAudioRecord
Add support for static LoRA on iOS.
Updated AVAudioPCMBuffer convenience initializer to a class method
Fix LlmTaskRunner.swift
Added buffer loading tests to MPPAudioRecord
Added method to load from audio record in MPPAudioData
Fix LoRA integration in LlmInference.swift
Add error handling to GenAI's Swift API

Javascript

Add export to GenAI Fileset API
Return the full string from the model
Fix code snippet in NPM Readme
Add Holistic Landmarker to NPM Readme
Update npm README
Add tokenizer normalization node to LLM web graph.
Add Matrix to vision.d.ts

Python

Fix API documentation link for ImageProcessing Options
Disable text_embedder and text_classifier tests for Python
Update safetensors converter for LoRA weights conversion for GEMMA 2B.
Update model converter to support Phi-2 LoRA
Mark Optional for landmark_drawing_spec argument
Add LoRA options to converter.

Model Maker changes

Keep tensorflow and tf-models-official to be <2.16. tensorflow-addons breaks with tensorflow 2.16.
Read from default checkpoint path when training MobileBERT.
Add checkpoint_frequency in model maker.
Add repeat field in hyperparameters in model maker classifier.
Add mobilenet_v2 keras model spec.
Only use auc, precision, and recall for binary classification problems.
Drop remainder from datasets in text_classifier. This helps deal with issues on TPU training that results in NaN loss.
Disable object detector oss test due to flakiness
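
One entry above drops the remainder from datasets in text_classifier. The plain-Python sketch below shows what dropping the remainder means when batching: a final batch smaller than the batch size is discarded so every batch has the same shape. The batch helper is hypothetical and does not reproduce Model Maker's code; in tf.data the equivalent switch is the drop_remainder argument of Dataset.batch.

```python
from typing import List, Sequence


def batch(
    examples: Sequence[int], batch_size: int, drop_remainder: bool
) -> List[List[int]]:
    """Split examples into consecutive batches, optionally dropping a short last batch."""
    batches = [
        list(examples[i:i + batch_size])
        for i in range(0, len(examples), batch_size)
    ]
    if drop_remainder and batches and len(batches[-1]) < batch_size:
        batches.pop()  # discard the incomplete final batch
    return batches


examples = list(range(10))
print(batch(examples, 4, drop_remainder=False))
print(batch(examples, 4, drop_remainder=True))
```

With 10 examples and a batch size of 4, the first call yields three batches (the last with 2 examples) and the second yields only the two full batches.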

MediaPipe Dependencies

Update WASM files for 0.10.13 release
Update WASM files to fix issues in the LLM Inference API

08 Mar 05:30

ayushgdev

v0.10.11

98942d5

MediaPipe v0.10.11

Build changes

Updated genai C package visibility

Framework and core calculator improvements

Prevent UnpackMediaSequenceCalculator from segfaulting on a type of malformed input.
Updated import statements of llm_inference_engine.h to support C
Refactor OpenGL 3.1 path out of TensorsToSegmentationCalculator main file.
Add 'addRawDataSpanToInputSidePacket' and addRawDataSpanToInputStream binding functions.
Update DotAttention interface to take SelfAttentionWeights
Remove customized DotAttention
Update Tensorflow dependency to latest release
Update InferenceCalculator documentation on DELEGATE side input.

MediaPipe Tasks update

This section should highlight the changes that are done specifically for any platform and don't propagate to other platforms.

Android

Add LlmInference stats logging
Changed max_sequence_length naming to max_tokens

iOS

Added iOS LlmInferenceError, LlmInference, LlmTaskRunner
Removed init with default params in LlmInferenceOptions
Updated iOS task runner to delete C LLM Session on deallocation
Updated variable names in iOS LlmTaskRunner
Updated access of C LLMSession to fileprivate
Updated access of some constants in iOS LlmInference
Removed unwanted iOS BUILD targets
Added iOS Gen AI build scripts
Added iOS Gen AI files
Updated parameter names in LlmSessionConfig create
Added asynchronous predict and generate function to iOS LLM Task Runner
Removed decoded response from iOS LlmTaskRunner
Updated iOS error enum cases
Updated response generation state logic in iOS LlmInference
Fixed error handling in iOS LlmInference
Updated error message in iOS GenAiInferenceError
Fixed uninitialized response array in iOS LlmTaskRunner
Added podspec templates of the iOS Gen AI framework

Javascript

Add visibility field in landmark.d.ts
Update landmark_result.ts with visibility support

Python

Register model_ckpt_util in Python framework
Expose HolisticLandmarker module as other modules
Create empty module if ENABLE_ODML_CONVERTER is not set
Optimized memory usage for conversion script

Model Maker changes

Remove jax and torch from model maker requirements.txt
Use tf model optimization < 0.8.0 due to tf.keras and tf_keras compat issues

MediaPipe Dependencies

Update WASM files for 0.10.11 release

23 Feb 06:25

ayushgdev

v0.10.10

d2bc9e5

MediaPipe v0.10.10

Build changes

Fix TensorsToSegmentationCalculator gpu dependencies.
Open Source build rules for quantization_util
Added the binary and the converter factory to run the model weight conversion.
Integrates the kMemoryManagerService into ImageToTensorCalculator and InferenceCalculatorDarwinn.
Updated iOS OpenCV source build to exclude highgui and videoio
Open source some BUILD rules for Converter package

Framework and core calculator improvements

Added Face Landmarker C Tasks API and tests
Added Pose Landmarker C Tasks API
Use memcpy now for copying data and indicate how the data is stored
Remove superfluous glFlush().
Added Face Detector C Tasks API
Add mediapipe::file::IsDirectory helper
Deprecate ImageFrame::ByteDepth
Added files for the Image Segmenter C Tasks API
Add general support for PathToResourceAsFile to TfLiteModelLoader
Add CalculatorGraph::SetErrorCallback to receive errors in case of async graph use cases.
Add JAX as requirements for MediaPipe python package
Introduces HardwareBufferPool based on the ReusablePool and MultiPool
Added the base classes for the LLM weight converter.
Add stdbool import to C API
Introduces MemoryManagerService with HardwareBufferPool and integrates it into the Tensor class.
Added the model writer that writes to the weight binary files.
Fix GlContext (attachments) cleanup in case of a failing GlContext initialization.
Add option for using variable XNNPACK operators to MediaPipe XNNPACK flags
Make InferenceCalculatorDarwinn support float and int32 as input data type.
Adds VectorToTensorCalculator
Enable HardwareBufferPool only if MEDIAPIPE_TENSOR_USE_AHWB is enabled
Enables MultiPool and ReusablePool to pass on absl::Status returns originating from object factory methods.
Adding TENSOR to InferenceCalculatorCpu to remove vector encumbrances
Update Clang to version 16
Add ability to preserve output format to GlScalerCalculator
Add explicit dependency on XNNPACK & cpuinfo
Update TensorFlow and Android NDK dependency
Adds MEDIAPIPE_ANDROID_LINK_NATIVE_WINDOW condition to hardware_buffer_android
Support interpolate flags in image_to_tensor_converter_opencv.
Add MODEL_VIEW side input to tflite_model_calculator

MediaPipe Tasks update

This section should highlight the changes that are done specifically for any platform and don't propagate to other platforms.

Android

Added HolisticLandmarker
Migrate TextGenerator Java API to C Wrapper
Add LlmTaskRunner to TextGenerator sources
Don't cache the JNI environment for async calls
Simplified api interface
Updates to LLM JNI Layer
Handle model loading on Android
Support custom cache dir.
Pass cacheDir to LLM engine from Java API
Add "done" field to the Java LLM API
Removed backend option from JNI layer

iOS

Updated supported pixel formats in iOS image classifier Documentation
Removed support for CVPixelBuffer of type 32RGBA
Added support for creating CVPixelBuffer from C++ Images to iOS MPPImage Utils
Updated implementation of MPPImage Utils to reduce lines of code
Added iOS interactive segmenter options, helpers, implementation and basic tests
Added iOS Image Embedder API
Enabled stream mode on iOS pose landmarker
Fixed issue with iOS Language Detector Prediction Count
Added iOS language detector to Cocoapods build
Added cosine similarity method to iOS MPPImageEmbedder
Updated method signature of MPPImageEmbedderResult initializer
Added packet validation in MPPImageEmbedderResult+Helpers
Added tests for creating MPPImage with source type UIImage from C++ Image
Renamed methods in MPPImageUtilsTests
Added a new class for iOS Interactive Segmenter Results
Added iOS interactive segmenter to cocoapods build
Added method for initializing MPPImages of all source types from MPPImage+TestUtils
Added support for creating MPImages of sample buffer source type from C++ Images in MPPImage+Utils
Added provision to initialize MPImages of source type sample buffer to MPImage+TestUtils
Added tests for initialization of MPImages with source type sample buffer from MPImage+Utils
Updated documentation of iOS hand landmarker, image embedder, image segmenter, interactive segmenter, object detector
Added provision to create graph config from task options that use any proto for iOS tasks
Added new methods to MPPTaskOptionsProtocol.h
Updated iOS task runner to initialize tasks using MPPTaskInfo
Updated iOS text task runner to initialize tasks from task Info.
Updated iOS vision task runner to use new methods from MPPTaskRunner
Updated MPPTaskOptionsProtocol
Updated MPPTaskRunner initializer
Fixed iOS framework conflicts with TensorFlowLiteC and OpenCV CocoaPods
Fixed issue in installing iOS tasks text and vision libraries in a single project
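
The cosine similarity method listed above for the iOS image embedder compares two embedding vectors. As a rough illustration of the underlying math only, and not of the MediaPipe API, a plain-Python sketch could look like the following. The function name `cosine_similarity` and the choice to raise an error on zero-length vectors are assumptions made for this example.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Return the cosine of the angle between two equally sized vectors.

    Hypothetical helper for illustration; not part of MediaPipe.
    """
    if len(u) != len(v):
        raise ValueError("Vectors must have the same number of elements.")

    # Dot product and Euclidean norms of both vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))

    # Cosine similarity is undefined when either vector has zero norm.
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("Cosine similarity is undefined for zero vectors.")

    return dot / (norm_u * norm_v)
```

Values close to 1 indicate embeddings pointing in nearly the same direction, values near 0 indicate unrelated embeddings, and values close to -1 indicate opposite directions.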

Javascript

Extend verifyGraph to be compatible with proto3.
Add Holistic Landmarker Web API
Add export declarations to PoseLandmarkerResult
TypeScript: adding VideoFrame typings support to video input
Guard WaitOnGpu with extra OpenGL checks.
Explicitly cast at callsite of WebGL context creation to avoid compilation errors with newer Emscripten versions.

Python

Added Holistic Landmarker Python API
Support both proto2 and proto3 in task subgraph options configuration, and revised the Holistic Landmarker API's implementation
Update holistic_landmarker.py
Documented HolisticLandmarker
Fixing delegate passing argument in BaseOptions
Add model_ckpt_util to Python Build script
Use pybind_library for GenAI Converter build

MediaPipe Dependencies

Expose MediaPipe's ABSL and Sentencepiece as shared dependencies
Remove Sentencepiece's LOG function
Removed unwanted headers from opencv_ios_xcframework_files.bzl
Update WASM files for 0.10.10 release
Upgrade TypeScript to 5.3.3


13 Dec 13:46

ayushgdev

v0.10.9

4237b76

MediaPipe v0.10.9

Build changes

Add libtext and libvision build rules
Add lib targets for all C vision tasks

Framework and core calculator improvements

Added files for the Image Embedder C API and tests
Pass Model Asset Buffer as byte array + length
Drop default arguments in C API
Updated the Image Embedder C API and added tests for cosine similarity
Drop default arguments in Image Embedder C API
Remove const from input types of C API
Resolved issues and added a common header to hold all the necessary structures for the vision tasks
Refactor OpenCV path out of TensorsToSegmentationCalculator main file.
Add some convenience getters to EglManager.
Added files for the Object Detector C Tasks API
Explicitly delete some copy operations to improve compile errors.
Extract CPU conversion methods into a separate library & add test
Updated components and their tests in the C Tasks API
Ensure that releaseGl() is called if prepareGl throws
Adding a GpuTestWithParamBase test class to support value parameterized tests
Added Gesture Recognizer C API and tests
Holistic Landmarker C++ Graph
Revised Gesture Recognizer API implementation and associated tests
Added FreeMemory test for GestureRecognizerResult
Refactor GestureRecognizerResult conversion for default initialization
Move LanguageDetectorResult converter to LanguageDetector task
Add TensorsToSegmentationCalculator test utilities.
Added Hand Landmarker C Tasks API and tests
Export java package for hand_roi_refinement_graph_options.
Fix naming in different files
Create an explicit GlRuntimeException class

MediaPipe Tasks update

This section should highlight the changes that are done specifically for any platform and don't propagate to other platforms.

Android

Create shared utilities to construct category lists
Create shared utilities to construct landmark lists
Add the result class for the HolisticLandmarker Java API
HolisticLandmarker Java API
Add dependency on hand_roi_refinement_graph_options_proto
Use Java Proto Lite Target for Hand ROI Refinement proto
Move hand_roi_refinement_graph_options_java_proto_lite to vision lib

iOS

Added iOS MPPPoseLandmarker.mm
Added null check for segmentation masks in pose landmarker helper initializer
Added pose landmarker protobuf utils
Fixed graph name in iOS language detector
Added iOS language detector tests
Added iOS Objective C Pose Landmarker Tests
Added iOS interactive segmenter options
Added iOS region of interest
Added iOS region of interest helpers
Updated iOS vision/core to add methods for processing region of interest
Added iOS interactive segmenter header

Javascript

Creates GpuBuffers around pre-allocated AHardware_Buffer objects.
Add drawConfidenceMask() to our public API
Use gl.LINEAR interpolation for confidence masks
Add missing export declarations to DrawingUtils

Python

Example updated for mp.Image in documentation
Added image classifier benchmark
Updated copyright
Documented the return value and added percentile to argparser
Allowed a default value for the model argument
Added more benchmark scripts for the Tasks Python API
Code cleanup and revised benchmarking API
Removed unused param

Model Maker changes

Add option to omit the checkpoint callback in text classifier.
Add BinaryAUC metric and Best Checkpoint callback to Text Classifier
Remove batch dimension from the output of tflite_with_tokenizer in text classifier.


09 Nov 19:02

ayushgdev

v0.10.8

1c46e43

MediaPipe v0.10.8

Build changes

Allow Python to be built on Mac with GPU support

Bazel changes

Adds an empty skeleton project for iOS docgen.
Remove pinned versions from deps
Added files for the Language Detector C API and tests
Add OnCameraBoundListener and support for landscape orientation to CameraXPreviewHelper
Removed language_detection_result and moved the necessary containers to language_detector.h
Detection postprocessing supports quantized tensors.
Adding vector versions of input calls to TS GraphRunner API
Introduce AlignHandToPoseInWorldCalculator
Add check to avoid doing illegal memory access from an invalid iterator from std::prev()
GPU_ORIGIN configurable through base options proto.
Introduce FixGraphBackEdges utils function.
Migrate ParseTagAndName to use absl::string_view
Plumb an optional default Executor and set of input side packets
Add implementation and tests for Image Classifier C API
Allow GPU Origin Proto to be built by Maven
Add a field to GPUBuffer C struct so FFIGen can handle it
Add scaling support to surface view renderer.
Remove objc_library from Python build path for Mac GPU build
Fix internal inconsistency in parsing code
Add CPU tests for TensorsToSegmentationCalculator
Speed up Python build by only building binary graph
Don't drop status message in ConvertFromImageFrame
Use designated initializers for TensorsToSegmentationCalculator tests.
Adding two new immutable texture GpuBufferFormat types
TensorsToDetectionsCalculator supports multiple classes for a bbox.
Move filtering logic of score to ConvertToDetection.
Add video and live stream processing and tests for Image Classifier C API
Upgrade to use Gradle 8.4
Add AT_FIRST_TICK processing to SidePacketToStreamCalculator.

MediaPipe Tasks update

This section should highlight the changes that are done specifically for any platform and don't propagate to other platforms.

Android

Add GPU Origin proto to Java Tasks Library

iOS

Added iOS Pose Landmarker result, options and helpers
Added iOS language detector options, results, and helpers
Added property to get labels from iOS Image Segmenter
Added a test for getting labels from iOS image segmenter
Updated iOS Image Segmenter documentation to use Swift names
Added pose landmarker result helpers
Added iOS pose landmarks connections
Added iOS pose landmarker header
Updated documentation
Added language detector result helpers
Added iOS language detector implementation
Fixed extra condition check in iOS Image Segmenter Result Helper
Added iOS Image Segmenter to CocoaPods build
Fixed deletion of iOS output MPImage buffer in MPImage Utils
Added GPU support

Javascript

Add drawCategoryMask() to our public API
Creates GpuBuffers around pre-allocated AHardware_Buffer objects.
Allow OffscreenCanvas to be used by DrawingUtils

Python

Added files for Face Stylizer Unit Tests
Allow Mac to use GPU Delegate
Use mp.ImageFormat instead of just ImageFormat
Support 3-channel RGB images for Mac Python
Added GPU support on Mac and Linux

Dependency changes

Update WASM files for 0.10.8 release


10 Oct 05:53

ayushgdev

v0.10.7

69fe645

MediaPipe v0.10.7

Framework and core calculator improvements

Fix win32 build break in mediapipe.
Remove 'awaiting' labels when user issue/PR updated.
Fix GlScalerCalculator not clearing background in FIT mode
Add cc_binary target for C Libraries
Only recreate immutable texture when necessary for Android TensorsToSegmentationCalculator.
Update PackMediaSequenceCalculator to support index feature inputs on the CLIP_MEDIA_ input tag.
Added concatenate stream, get_vector_item stream, landmarks_to_tensor stream, tensor_to_joints stream utility functions.
Introduce TensorToJointsCalculator and LandmarksTransformationCalculator
smoothing stream utility function.
Don't convert nullptr to std::string in C layer
Fix memory access issue in C layer
segmentation smoothing stream utility function.
Populate the classification result output param instead of a copy
Add tests for C API containers
Add unit tests for C layer for the input types of Text Classifier
Add End to End test for Text Classifier C API
Add error handling to C API
Added files for the TextEmbedder C API and tests
Set memory of freed result to nullptr
Smooth pose landmarks
GlSurfaceViewRenderer: Capture graph output texture
Prefix status macro implementation with MP_.
Introduce CombineJointsCalculator and SetJointsVisibilityCalculator
Add stream API presence utils.
Fixed some issues with documentation
Add stream API merge utils.
Update glog to latest commit

MediaPipe Tasks update

This section should highlight the changes that are done specifically for any platform and don't propagate to other platforms.

Android

Do not convert milliseconds to microseconds twice
Fix bug missing SHOW_RESULT in image generator
Fix depth condition bug when only depth condition is configured.

iOS

Added iOS face stylizer result, options and header
Added iOS MPPFileInfo for tests
Added new initializers for iOS MPPImage in test utils
Added iOS MPPMask test utils
Added iOS image segmenter basic Objective C tests
Updated multiply function in iOS Image Segmenter tests to use C++ vectors
Fixed premature deallocation of C++ masks in iOS Image Segmenter
Updated interface of iOS image segmenter
Added selfie segmentation and running mode tests to image segmenter
Uncommented live stream test in iOS image segmenter tests
Updated iOS Face Detector Objective C API names
Updated iOS Face Landmarker, Hand Landmarker, Object Detector Objective C API names
Added iOS Image Segmenter tests for methods with completion handlers
Added methods to create iOS MPImage with source type UIImage from a C++ image.
Changed de-allocation method in data provider release callback
Fixed error messages
Updated error messages in MPPImage Utils

Javascript

Add helper to create Connection array
Add export declaration for FaceDetector
Add export declaration to FaceDetector.detect()
Do not use full filename when FileLocator decides which asset to load

Bug fixes

Fixed Pose Landmarker jittering issue

Model Maker changes

Add export_model_with_tokenizer to Text Classifier API.

MediaPipe Dependencies

Update WASM files for 0.10.7 release

