This project evaluates various Speech-to-Text (STT) configurations for both streaming and batch (offline) transcription, testing different model variants, container environments, model serving frameworks, deployment platforms, and hardware configurations.
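For orientation, a batch (offline) transcription with the openai-whisper Python package can be as small as the sketch below; the model size and audio file name are illustrative, not the project's defaults.

```python
import whisper

# Load a Whisper model variant; "base" is illustrative -- the benchmarks
# described below exercise multiple model sizes.
model = whisper.load_model("base")

# Offline (batch) transcription of a single audio file.
result = model.transcribe("harvard.wav")
print(result["text"])
```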
The project's purpose is twofold:
- Demonstrate how to move from individual experimentation to operational enterprise scale.
- Collect benchmarking data to guide architectural and operational decisions on model performance, system efficiency, and container security.
The journey unfolds in four stages:

- START HERE -> Crawl README — Experiment locally using Ubuntu and UBI9-minimal containers on a single server.
- Walk README — Scale from a single server to a Kubernetes cluster.
- Run README — Shift from embedded inference to decoupled model serving.
- Sprint — Integrate Model Registries and optimize for production-scale serving.
Along the journey, we introduce automation and answer common performance, security, and scalability questions.
Benchmarking captures metrics across the following dimensions (a timing sketch follows the list):
- Models: OpenAI Whisper
- Containers: Ubuntu, UBI9-minimal
- Platforms: Linux, Kubernetes
- Model Servers: OpenAI Whisper, vLLM
- CPUs: Intel Cascade Lake, AWS Graviton3, AMD EPYC, Intel Sapphire Rapids
- GPUs: T4, L4, A10, H100
- Instance Types: g4dn.12xlarge, g6.12xlarge, g5.12xlarge, p5.48xlarge
- Command Modes: basic, hyperparameters
- Start Modes: cold, warm
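As one illustration of the Start Modes and Command Modes dimensions, the hedged sketch below times a cold run (the first call after model load) against a warm run, and contrasts a basic call with one passing decoding hyperparameters. The timing approach and the specific hyperparameter values are assumptions, not the project's benchmark harness.

```python
import time
import whisper

model = whisper.load_model("base")  # first load pulls weights into memory

def timed_transcribe(path: str, **decode_options) -> float:
    """Return wall-clock seconds for a single transcription."""
    start = time.perf_counter()
    model.transcribe(path, **decode_options)
    return time.perf_counter() - start

# Cold vs. warm: the first call pays one-time initialization costs;
# repeat calls benefit from warmed caches.
cold = timed_transcribe("harvard.wav")
warm = timed_transcribe("harvard.wav")

# Basic vs. hyperparameters: whisper's transcribe() forwards decoding
# options such as beam_size and temperature (values here are illustrative).
tuned = timed_transcribe("harvard.wav", beam_size=5, temperature=0.0)

print(f"cold: {cold:.2f}s  warm: {warm:.2f}s  tuned: {tuned:.2f}s")
```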
Accuracy is scored against the following reference texts (a sample WER computation follows the list):

- Harvard.txt
- JFK Inaugural Address (official transcript)
- JFK Rice University Speech (official transcript)
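The accuracy comparison is conceptually a word error rate (WER) calculation against these official texts. Below is a minimal sketch using the jiwer library; the file names are hypothetical placeholders, and the project's compare_transcripts.py may compute additional metrics.

```python
from jiwer import wer

# Reference: the official transcript; hypothesis: Whisper's output.
# Both file names are hypothetical placeholders.
with open("jfk_inaugural_official.txt") as f:
    reference = f.read()
with open("jfk_inaugural_whisper.txt") as f:
    hypothesis = f.read()

# WER = (substitutions + deletions + insertions) / reference word count
print(f"WER: {wer(reference, hypothesis):.3f}")
```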
Key scripts (a resource-sampling sketch follows the list):

- `whisper-functional-batch-metrics.sh` - Batch Transcription Benchmarking
- `compare_transcripts.py` - Accuracy Metrics Scoring
- `system_non_functional_monitoring.py` - System Resource Monitoring
- `cleanup-benchmark-results.sh` - Benchmark Workspace Cleanup
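For a sense of what system resource monitoring involves, the sketch below samples CPU and memory utilization with psutil during a benchmark window. It is an assumption about the general approach, not the actual contents of system_non_functional_monitoring.py.

```python
import time
import psutil

def sample_resources(duration_s: float = 60.0, interval_s: float = 1.0) -> list[dict]:
    """Poll CPU and memory utilization once per interval for duration_s seconds."""
    samples = []
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        samples.append({
            "ts": time.time(),
            # cpu_percent(interval=...) blocks for the interval,
            # so the loop naturally paces itself.
            "cpu_percent": psutil.cpu_percent(interval=interval_s),
            "mem_percent": psutil.virtual_memory().percent,
        })
    return samples

if __name__ == "__main__":
    for s in sample_resources(duration_s=10.0):
        print(s)
```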