Skip to content
/ transcribe Public

A Swift parser for output files from automated transcription services. An experiment inspired by the Swift Community Podcast.

License

Notifications You must be signed in to change notification settings

ole/transcribe

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

48 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Transcribe

A Swift parser for output files from automated transcription services.

Created by Ole Begemann, January 2019.

Status

Very unstable and incomplete. I wrote this as an experiment to transcribe episode 1 of the Swift Community Podcast.

Supported File Formats

The only supported input file format is the JSON produced by the Amazon Transcribe API. It should be possible to add support for other formats (such as Google Cloud Speech-to-Text) and come up with a universal data structure that understands multiple input formats.

Requirements

Swift 4.2. I only tested on macOS 10.14, but it should also run on other platforms supported by Swift.

Dependencies

None.

Components

The package consists of two targets:

  • Transcribe: the library that implements the parsing of transcription files and conversion to other formats.
  • TranscribeCLI: a command line tool that exposes some of the Transcribe functionality on the command line.

Usage

If you want to create a new transcription, you must first create and run a transcription job on Amazon Transcribe API. This step is not part of this tool. When the transcription job completes, Amazon Transcribe will provide you with a JSON file with the transcription results. This file can be used as the input for this tool.

In Code

To use the library in a SwiftPM package, add this to your Package.swift:

let package = Package(
    ...
    dependencies: [
        .package(url: "https://github.com/ole/transcribe", .branch("master")),
    ],
    targets: [
        .target(name: "YOUR_TARGET", dependencies: ["Transcribe"]),
    ]
)

Import the module with import Transcribe.

Sample code:

import Transcribe

let inputFile = URL(fileURLWithPath: "input.json") // Change path to your input file
var transcript = try AmazonTranscribe.Transcript(file: inputFile)

// Print some statistics
print("Number of speakers:", transcript.speakers.count)
print("Speaker labels:", transcript.speakers.map { $0.speakerLabel })
print("Number of segments:", transcript.segments.count)
if let speechBegan = transcript.segments.first?.time.lowerBound,
    let speechEnded = transcript.segments.last?.time.upperBound
{
    let formatter = DateComponentsFormatter()
    formatter.allowedUnits = [.hour, .minute, .second]
    formatter.unitsStyle = .positional
    formatter.zeroFormattingBehavior = .pad
    print("Speaking began at:", formatter.string(from: speechBegan.seconds) ?? "(unable to format timecode)")
    print("Speaking ended at:", formatter.string(from: speechEnded.seconds) ?? "(unable to format timecode)")
}

// Set/change speaker names
transcript[speaker: "spk_0"]?.name = "Alice"
transcript[speaker: "spk_1"]?.name = "Bob"

// Save as Markdown
let markdown = transcript.makeMarkdown()
let outputFile = URL(fileURLWithPath: "output.md") // Change path to your output file
try Data(markdown.utf8).write(to: outputFile)

// Save as WebVTT
let webvtt = transcript.makeWebVTT()
let outputFile = URL(fileURLWithPath: "output.vtt") // Change path to your output file
try Data(webvtt.utf8).write(to: outputFile)

On the Command Line

The command line tool takes an input file and converts it to Markdown and WebVTT.

Usage:

swift run -c release TranscribeCLI --json transcript.json

This will print a WebVTT-formatted transcript to standard out.

Run the command line tool with --help for the documentation of all options:

swift run TranscribeCLI --help

Overview of the Main Data Structures

The base data structure for a transcript. It contains a list of segments and a list of speakers:

struct AmazonTranscribe.Transcript {
    public var segments: [Segment]
    public var speakers: [Speaker]
}

A speaker has a label (which identifies the speaker in the transcript) and a name (which can be used when formatting a transcript for output, e.g. to Markdown).

struct AmazonTranscribe.Speaker {
    /// The speaker label in the original `RawTranscript`
    public var speakerLabel: String
    /// The speaker's name as it should appear in the formatted output.
    public var name: String
}

A segment is a segment of spoken text, e.g. a sentence or paragraph. A segment has a time range (when it was spoken), a speaker (who spoke), and a list of fragments:

/// A list of consecutive fragments by the same speaker
struct AmazonTranscribe.Segment {
    var time: Range<Timecode>
    var speakerLabel: String
    var fragments: [Fragment]
}

A fragment is a single unit of speech, like a single word or a punctuation character. The Amazon Transcribe API collects timecodes on this granularity.

/// A fragment of transcribed speech. Could be a word or punctuation.
struct AmazonTranscribe.Fragment {
    var kind: Kind
    var speakerLabel: String

    enum Kind {
        case pronunciation(Pronunciation)
        case punctuation(String)
    }

    struct Pronunciation {
        var time: Range<Timecode>
        var content: String
    }
}

There is also a struct AmazonTranscribe.RawTranscript type, which is a 1-to-1 mapping between the Amazon Transcribe JSON format and Swift data types. When you call AmazonTranscribe.Transcript.init(file:), we parse the JSON into a RawTranscript value and then transform that into the Transcript data structures, which are easier to work with. Users of the library shouldn't need to deal with RawTranscript directly.

License

MIT.

About

A Swift parser for output files from automated transcription services. An experiment inspired by the Swift Community Podcast.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 3

  •  
  •  
  •  

Languages