Improve AI Transcripts Accuracy with Automated Post-Processing #122

kouloumos · 2024-07-24T09:47:43Z

Our transcriber tool processes source material to produce AI-generated transcripts using various transcription services (mainly Deepgram). These AI transcripts reach around 90% accuracy, but they still require human review to get to near-perfect accuracy, especially given the technical nature of Bitcoin-related content.

Through our review process, we have observed recurring AI transcription errors. We took a first step to address them by creating the style guide with bitcointranscripts/bitcointranscripts#489. The next step is to create a machine-readable JSON format that lists these common mistakes together with their corrections, so that we can fix them automatically during post-processing.
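To make the idea concrete, here is a minimal sketch of what such a JSON file and the matching post-processing step could look like. The schema (`corrections`, `wrong`, `right`), the function names, and the sample entries are all hypothetical and only meant for illustration; the actual format is still to be decided.

```python
import json
import re

# Hypothetical format: each entry pairs a known mis-transcription
# with its correction. The sample entries are illustrative only.
CORRECTIONS_JSON = """
{
  "corrections": [
    {"wrong": "bit coin", "right": "Bitcoin"},
    {"wrong": "lightening network", "right": "Lightning Network"},
    {"wrong": "tap root", "right": "Taproot"}
  ]
}
"""


def load_corrections(raw):
    """Parse the JSON document into a list of (wrong, right) tuples."""
    data = json.loads(raw)
    return [(entry["wrong"], entry["right"]) for entry in data["corrections"]]


def apply_corrections(text, corrections):
    """Replace every known error in `text`, matching whole words only."""
    for wrong, right in corrections:
        # \b keeps us from rewriting substrings of longer words;
        # IGNORECASE catches sentence-initial capitalisation.
        pattern = re.compile(r"\b" + re.escape(wrong) + r"\b", re.IGNORECASE)
        # A function replacement avoids interpreting backslashes in `right`.
        text = pattern.sub(lambda _match, r=right: r, text)
    return text


corrections = load_corrections(CORRECTIONS_JSON)
fixed = apply_corrections(
    "The lightening network uses bit coin and tap root.", corrections
)
print(fixed)
```

A plain find-and-replace like this is deliberately conservative: it only touches phrases that are explicitly listed, so everything else is still left for the human reviewer.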

Steps to Implement:

1. Create JSON Format: Develop a JSON format to list common AI transcription errors and their corrections.
2. Post-Processing Logic: Implement logic to use this JSON file during post-processing to automatically fix known errors.
3. Autogenerate Error List: After the initial implementation, enhance the system to autogenerate this list based on previous corrections. This can be achieved by comparing AI-generated transcripts with the final reviewed versions stored in source control.
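The autogeneration step could be sketched roughly as follows, using a word-level diff between the AI transcript and its reviewed version. This is only one possible approach: the function names, the `min_count` threshold, and the output schema are assumptions, and the transcripts are passed in as plain strings (loading them from source control is left out).

```python
import difflib
from collections import Counter


def extract_corrections(ai_text, reviewed_text):
    """Return (wrong, right) phrase pairs where the reviewer replaced words."""
    ai_words = ai_text.split()
    reviewed_words = reviewed_text.split()
    matcher = difflib.SequenceMatcher(
        a=ai_words, b=reviewed_words, autojunk=False
    )
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Only "replace" spans are substitutions; pure insertions and
        # deletions are skipped because they have no wrong/right pair.
        if tag == "replace":
            wrong = " ".join(ai_words[i1:i2])
            right = " ".join(reviewed_words[j1:j2])
            pairs.append((wrong, right))
    return pairs


def frequent_corrections(transcript_pairs, min_count=2):
    """Keep only corrections seen at least `min_count` times overall."""
    counts = Counter()
    for ai_text, reviewed_text in transcript_pairs:
        counts.update(extract_corrections(ai_text, reviewed_text))
    return [
        {"wrong": wrong, "right": right}
        for (wrong, right), n in counts.most_common()
        if n >= min_count
    ]


# Illustrative inputs only; real pairs would come from the repository history.
history = [
    ("we use the lightening network today", "we use the Lightning Network today"),
    ("the lightening network is fast", "the Lightning Network is fast"),
    ("send some bit coin", "send some Bitcoin"),
]
print(frequent_corrections(history))
```

Requiring a correction to appear more than once is a simple guard against turning one-off editorial changes into global rules; the candidates would likely still need a manual look before being added to the JSON file.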

This approach will help us improve the accuracy of AI transcripts and reduce the workload for human reviewers.

kouloumos added this to The Bitcoin Development Project Roadmap Jul 29, 2024

kouloumos moved this to 🏗 In Progress in The Bitcoin Development Project Roadmap Jul 29, 2024

kouloumos self-assigned this Jul 29, 2024
