
YALR - Visual Lip Reading with AI

[Pipeline overview diagram]

YALR (Yet Another Lip Reader) is a computer vision–based lip reading system for sentence-level speech recognition from visual input only. It combines MediaPipe-based mouth ROI extraction with a pretrained AV-HuBERT model and evaluates its applicability to real-world scenarios. The project explores the practical challenges of visual-only speech recognition, including viseme ambiguity, non-labial sounds, and real-world recording conditions, and includes a web-based demonstrator with video transcription.
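The mouth-ROI extraction step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not YALR's actual implementation: it assumes MediaPipe Face Mesh has already produced normalized (x, y) landmarks for a frame, and the landmark indices used here are common lip-region points (mouth corners 61/291 plus upper/lower lip midpoints), which may differ from the set YALR uses.

```python
# Sketch: compute a padded pixel bounding box around the lips from
# normalized face-mesh landmarks. Indices are illustrative MediaPipe
# lip landmarks; the real pipeline may use a different set.
LIP_IDX = [61, 291, 0, 17, 13, 14]  # corners, upper/lower lip points

def mouth_roi(landmarks, frame_w, frame_h, margin=0.15):
    """Return a padded (x0, y0, x1, y1) pixel box around the lips.

    `landmarks` maps landmark index -> (x, y) in normalized [0, 1]
    coordinates, as produced by MediaPipe Face Mesh.
    """
    xs = [landmarks[i][0] for i in LIP_IDX]
    ys = [landmarks[i][1] for i in LIP_IDX]
    # Tight bounding box in normalized coordinates, expanded by `margin`
    # of its own width/height, clamped to the frame.
    w, h = max(xs) - min(xs), max(ys) - min(ys)
    x0 = max(0.0, min(xs) - margin * w)
    y0 = max(0.0, min(ys) - margin * h)
    x1 = min(1.0, max(xs) + margin * w)
    y1 = min(1.0, max(ys) + margin * h)
    # Scale to pixel coordinates for cropping the video frame.
    return (int(x0 * frame_w), int(y0 * frame_h),
            int(x1 * frame_w), int(y1 * frame_h))
```

Cropping every frame to this box yields the fixed mouth region that the downstream AV-HuBERT model consumes as its visual stream.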

Installation Guide

Requirements

  • Ubuntu (20.04 / 22.04 recommended)
  • Python 3.10
  • Node.js 22

Clone the Repository and Submodules

git clone https://github.com/ricgoe/YALR.git
cd YALR
git submodule update --init

Python Setup

Install Python 3.10 (Ubuntu)

sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt update
sudo apt install python3.10 python3.10-dev python3.10-venv python3.10-distutils build-essential ffmpeg

Create Virtual Environment

python3.10 -m venv .venv

Activate Virtual Environment

source .venv/bin/activate

Downgrade pip (newer pip versions may fail to install some of the pinned dependencies)

pip install pip==24

Install Dependencies

pip install -r requirements.txt

Frontend Setup (Node.js 22 via nvm)

Install nvm

curl -fsSL https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.7/install.sh | bash

Reload your shell (or run source ~/.bashrc), then verify the nvm installation:

nvm --version

Install Node.js 22

nvm install 22
nvm use 22
cd ./frontend && npm install

Verify installation:

node -v
npm -v

Usage

[Web demo screenshot]

Important

Two terminal instances are required: one for the backend and one for the frontend.

Inside Backend Terminal

cd YALR
source .venv/bin/activate
uvicorn api:app --host 0.0.0.0

By default, uvicorn listens on port 8000.

Inside Frontend Terminal

cd YALR/frontend
npm run dev
