
YALR - Visual Lip Reading with AI

[Pipeline overview diagram]

YALR (Yet Another Lip Reader) is a computer vision–based lip reading system for sentence-level speech recognition from visual input only. It combines MediaPipe-based mouth ROI extraction with a pretrained AV-HuBERT model and evaluates its applicability to real-world scenarios. The project explores the practical challenges of visual-only speech recognition, including viseme ambiguity, non-labial sounds, and real-world recording conditions, and includes a web-based demonstrator with video transcription.
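The mouth-ROI extraction step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not YALR's actual implementation: it assumes MediaPipe Face Mesh has already produced normalized (x, y) landmarks for a frame, and the landmark indices used here are common lip-region points (mouth corners 61/291 plus upper/lower lip midpoints), which may differ from the set YALR uses.

```python
# Sketch: compute a padded pixel bounding box around the lips from
# normalized face-mesh landmarks. Indices are illustrative MediaPipe
# lip landmarks; the real pipeline may use a different set.
LIP_IDX = [61, 291, 0, 17, 13, 14]  # corners, upper/lower lip points

def mouth_roi(landmarks, frame_w, frame_h, margin=0.15):
    """Return a padded (x0, y0, x1, y1) pixel box around the lips.

    `landmarks` maps landmark index -> (x, y) in normalized [0, 1]
    coordinates, as produced by MediaPipe Face Mesh.
    """
    xs = [landmarks[i][0] for i in LIP_IDX]
    ys = [landmarks[i][1] for i in LIP_IDX]
    # Tight bounding box in normalized coordinates, expanded by `margin`
    # of its own width/height, clamped to the frame.
    w, h = max(xs) - min(xs), max(ys) - min(ys)
    x0 = max(0.0, min(xs) - margin * w)
    y0 = max(0.0, min(ys) - margin * h)
    x1 = min(1.0, max(xs) + margin * w)
    y1 = min(1.0, max(ys) + margin * h)
    # Scale to pixel coordinates for cropping the video frame.
    return (int(x0 * frame_w), int(y0 * frame_h),
            int(x1 * frame_w), int(y1 * frame_h))
```

Cropping every frame to this box yields the fixed mouth region that the downstream AV-HuBERT model consumes as its visual stream.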

Installation Guide

Requirements

  • Ubuntu (20.04 / 22.04 recommended)
  • Python 3.10
  • Node.js 22

Clone the Repository and Submodules

git clone https://github.com/ricgoe/YALR.git
cd YALR
git submodule update --init

Python Setup

Install Python 3.10 (Ubuntu)

sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt update
sudo apt install python3.10 python3.10-dev python3.10-venv python3.10-distutils build-essential ffmpeg

Create Virtual Environment

python3.10 -m venv .venv

Activate Virtual Environment

source .venv/bin/activate

Downgrade pip (newer pip versions may fail to install some of the pinned dependencies)

pip install pip==24

Install Dependencies

pip install -r requirements.txt

Frontend Setup (Node.js 22 via nvm)

Install nvm

curl -fsSL https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.7/install.sh | bash

Reload your shell (or run source ~/.bashrc), then verify the nvm installation:

nvm --version

Install Node.js 22

nvm install 22
nvm use 22
cd ./frontend && npm install

Verify installation:

node -v
npm -v

Usage

[Web demo screenshot]

Important

Two terminal instances are required: one for the backend and one for the frontend.

Inside Backend Terminal

cd YALR
source .venv/bin/activate
uvicorn api:app --host 0.0.0.0

By default, uvicorn listens on port 8000.

Inside Frontend Terminal

cd YALR/frontend
npm run dev
