Streamline the fine-tuning process for multimodal models: PaliGemma, Florence-2, and Qwen2-VL
Updated Jan 14, 2025 (Python)
Tag manager and captioner for image datasets
Use Segment Anything 2, grounded with Florence-2, to auto-label data for use in training vision models.
A VLM-driven tool that processes surveillance videos, extracts frames, and generates annotations using a fine-tuned Florence-2 vision-language model. Includes a Gradio-based interface for querying and analyzing video footage.
AI-Powered Watermark Remover using Florence-2 and LaMa models: a Python application that uses state-of-the-art deep learning models to remove watermarks from images, with a user-friendly PyQt6 interface.
Use Florence-2 to auto-label data for training fine-tuned object detection models.
Rem-WM: a watermark remover tool that leverages Microsoft Florence and Lama Cleaner models.
A Python-based CLI tool for captioning images with WD-series, Joy-caption-pre-alpha, Meta Llama 3.2 Vision Instruct, and Qwen2-VL Instruct models.
Florence-2 is a novel vision foundation model with a unified, prompt-based representation for a variety of computer vision and vision-language tasks.
Vision-language model fine-tuning notebooks and use cases (PaliGemma, Florence, ...)
Run the state-of-the-art (SOTA) vision-language model Florence-2 on your data!
Simple video summarization using Text-to-Segment Anything (Florence-2 + SAM 2). This project provides a video processing tool that uses Florence-2 and SAM 2 to detect and segment specific objects or activities in a video based on textual descriptions.
A simple Gradio application integrated with Hugging Face multimodal models, supporting a visual question answering chatbot and other features.
Local LLM Discord Bot
The Ultimate Local LLM Discord Bot!!!
ecko-cli is a simple CLI tool that processes the images in a directory, generates captions for them, and saves the captions as text files. It can also create a JSONL file from the images in a directory you specify. Images are captioned with the Microsoft Florence-2-large model using ONNX.
TextSnap: a demo of the Florence-2 model for OCR tasks, extracting and visualizing text from images.
Video Synopsis: intelligent video object summarization using Florence/OWL-ViT and SAM. It uses OWL-ViT or Florence-2 for object detection, SAM for segmentation, and a custom video synopsis algorithm to produce optimized outputs.
A sample of Microsoft's lightweight VLM, Florence-2, running on Google Colaboratory.
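Several of the tools listed above (for example ecko-cli) share the same captioning workflow: caption every image in a directory, save each caption as a sidecar text file, and collect the captions into a JSONL file. The sketch below shows that file-handling skeleton only. It is not taken from any of these repositories: `caption_directory` and `caption_fn` are hypothetical names, `caption_fn` stands in for a real model such as Florence-2, and the `{"file_name", "text"}` record layout is an assumption borrowed from the Hugging Face ImageFolder `metadata.jsonl` convention.

```python
import json
from pathlib import Path
from typing import Callable, Dict, List

# Extensions treated as images (an assumption; adjust as needed).
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp", ".bmp"}


def caption_directory(
    directory: str,
    caption_fn: Callable[[Path], str],
    jsonl_name: str = "metadata.jsonl",
) -> List[Dict[str, str]]:
    """Hypothetical helper: caption each image in `directory`, write a .txt
    sidecar next to it, and write all captions to one JSONL file.

    `caption_fn` is a placeholder for a real captioning model; it receives
    an image path and returns a caption string.
    """
    root = Path(directory)
    records: List[Dict[str, str]] = []
    # sorted() materializes the listing first, so files written below
    # do not affect the iteration.
    for image_path in sorted(root.iterdir()):
        if image_path.suffix.lower() not in IMAGE_EXTENSIONS:
            continue  # skip non-images, e.g. existing .txt captions
        caption = caption_fn(image_path)
        # Sidecar caption file: photo.jpg -> photo.txt
        image_path.with_suffix(".txt").write_text(caption, encoding="utf-8")
        records.append({"file_name": image_path.name, "text": caption})
    # JSONL: one JSON object per line.
    with open(root / jsonl_name, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return records
```

In practice `caption_fn` would wrap model inference; passing it in as a function keeps the directory and file handling independent of which captioning model (Florence-2, Qwen2-VL, and so on) is used.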