Improve AI Transcripts Accuracy with Automated Post-Processing #122

Open
kouloumos opened this issue Jul 24, 2024 · 0 comments

Our transcriber tool processes source material to produce AI-generated transcripts using various transcription services (mainly Deepgram). These AI transcripts reach around 90% accuracy, but they still require human review to get close to perfect, especially given the technical nature of Bitcoin-related content.

We have observed common AI transcription errors through our review process. As a first step, we created a style guide in bitcointranscripts/bitcointranscripts#489. The next step is to create a machine-readable JSON file listing these common mistakes, so that we can correct them automatically during post-processing.

Steps to Implement:

  1. Create JSON Format: Develop a JSON format to list common AI transcription errors and their corrections.
  2. Post-Processing Logic: Implement logic to use this JSON file during post-processing to automatically fix known errors.
  3. Autogenerate Error List: After the initial implementation, enhance the system to autogenerate this list based on previous corrections. This can be achieved by comparing AI-generated transcripts with the final reviewed versions stored in source control.
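
To make steps 1 and 2 concrete, here is a minimal sketch in Python. It assumes a hypothetical JSON format (a list of objects with `wrong` and `correct` keys); the key names, the sample entries, and the function names `load_corrections` and `apply_corrections` are illustrative only and not an agreed design.

```python
import json
import re

# Hypothetical format: each entry pairs a known mis-transcription
# with the spelling the style guide prefers. Entries are examples only.
CORRECTIONS_JSON = """
[
  {"wrong": "light ning network", "correct": "Lightning Network"},
  {"wrong": "segwit", "correct": "SegWit"},
  {"wrong": "tap root", "correct": "Taproot"}
]
"""


def load_corrections(raw):
    """Parse the JSON text into a list of (wrong, correct) tuples."""
    return [(entry["wrong"], entry["correct"]) for entry in json.loads(raw)]


def apply_corrections(text, corrections):
    """Replace every known error with its correction.

    Matching is case-insensitive and limited to whole words, so that
    a short entry does not rewrite the inside of a longer word.
    """
    for wrong, correct in corrections:
        pattern = re.compile(r"\b" + re.escape(wrong) + r"\b", re.IGNORECASE)
        # A function replacement keeps backslashes in `correct` literal.
        text = pattern.sub(lambda match: correct, text)
    return text


if __name__ == "__main__":
    corrections = load_corrections(CORRECTIONS_JSON)
    sample = "We use segwit and tap root on the light ning network."
    print(apply_corrections(sample, corrections))
```

Plain whole-word replacement is the simplest option and is easy to audit. Context-dependent errors (a phrase that is wrong only in some sentences) would need a richer format than this sketch assumes.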

This approach will help us improve the accuracy of AI transcripts and reduce the workload for human reviewers.
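
For step 3, one possible way to derive candidate entries is a word-level diff between the AI transcript and its reviewed version, counting how often each replacement recurs. The sketch below uses Python's standard `difflib`; the function names, the `min_count` threshold, and the output keys are assumptions for illustration, and fetching the two versions from source control is left out.

```python
import difflib
from collections import Counter


def extract_corrections(ai_text, reviewed_text):
    """Return (wrong, correct) phrase pairs where the reviewer replaced words."""
    ai_words = ai_text.split()
    reviewed_words = reviewed_text.split()
    matcher = difflib.SequenceMatcher(a=ai_words, b=reviewed_words, autojunk=False)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Only "replace" spans are candidates; pure insertions and
        # deletions are usually editorial rather than transcription errors.
        if tag == "replace":
            wrong = " ".join(ai_words[i1:i2])
            correct = " ".join(reviewed_words[j1:j2])
            pairs.append((wrong, correct))
    return pairs


def frequent_corrections(transcript_pairs, min_count=2):
    """Count replacements across many transcripts and keep the recurring ones."""
    counts = Counter()
    for ai_text, reviewed_text in transcript_pairs:
        counts.update(extract_corrections(ai_text, reviewed_text))
    return [
        {"wrong": wrong, "correct": correct, "count": n}
        for (wrong, correct), n in counts.most_common()
        if n >= min_count
    ]


if __name__ == "__main__":
    history = [
        ("the tap root upgrade is live", "the Taproot upgrade is live"),
        ("we like tap root a lot", "we like Taproot a lot"),
    ]
    print(frequent_corrections(history))
```

A frequency threshold filters out one-off rewording by reviewers, but the resulting list would likely still need a human pass before it is used for automatic replacement.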
