# Transcribe Parser

The Transcribe Parser Python Lambda function is triggered on completion of an Amazon Transcribe job, although other sources will be supported in the future. It reads the Transcribe job information, downloads the relevant transcription output JSON into local storage, and then writes the parsed JSON to a configured S3 location. The top-level structure of the parsed output is:

```json
{
  "ConversationAnalytics": {},
  "SpeechSegments": []
}
```
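
For orientation, here is a minimal sketch of reading a parsed output file back from S3 and inspecting the two top-level keys; the bucket and key names are hypothetical placeholders for your configured output location:

```python
import json

import boto3

s3 = boto3.client("s3")

# Hypothetical output location - substitute your configured parser output bucket/prefix
response = s3.get_object(
    Bucket="my-pca-output-bucket",
    Key="parsedFiles/example-call.wav.json",
)
parsed = json.loads(response["Body"].read())

analytics = parsed["ConversationAnalytics"]  # header-level analytics
segments = parsed["SpeechSegments"]          # per-turn transcript data
print(f"Duration: {analytics['Duration']}s, turns: {len(segments)}")
```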

## ConversationAnalytics

### Section Structure

Contains header-level information about the analytics that have been generated, along with information specific to the type of source input that the conversation came from. Most of the analytics have their own detailed sections later.

"ConversationAnalytics": {
  "Agent": "string",
  "Agents": [ ],
  "GUID": "string",
  "ConversationTime": "string",
  "ConversationLocation": "string",
  "ProcessTime": "string",
  "Duration": "float",
  "LanguageCode": "string",
  "EntityRecognizerName": "string",
  "SpeakerLabels": [ ],
  "SentimentTrends": { },
  "SpeakerTime": { },
  "CustomEntities": [ ],
  "CategoriesDetected": [ ],
  "IssuesDetected": [ ],
  "ActionItemsDetected": [ ],
  "OutcomesDetected": [ ],
  "Telephony": [ ],
  "SourceInformation": [ ],
  "Summary": { },
  "ContactSummary": { }
}

| Field | Type | Description |
| --- | --- | --- |
| Agent | string | An identifier for the agent that was involved in the conversation or, on a multi-agent call, the agent (as identified by the telephony system) that was the top talker |
| Agents | [string] | [Optional - telephony only] A list of the names of all agents that participated in this call, ordered by when they first spoke on the call |
| Cust | string | Customer name or telephony-internal customer ID |
| GUID | string | A unique GUID that identifies the input source for the conversation |
| ConversationTime | string | A timestamp that shows when the conversation occurred |
| ConversationLocation | string | The TZ database name for the source location of the calls, which can be looked up in https://en.wikipedia.org/wiki/List_of_tz_database_time_zones |
| ProcessTime | string | A timestamp that shows when the analytics job was completed |
| Duration | float | Duration of the call in seconds |
| LanguageCode | string | The language code for the input data; e.g. "en-US" for voice or just "en" for text |
| EntityRecognizerName | string | The name of the Comprehend custom entity recognizer used when processing the transcription job |
| SpeakerLabels | - | List of speaker labels to use for display purposes |
| SentimentTrends | - | List of sentiment trends per speaker, both at the summary level and per call quarter |
| SpeakerTime | - | Total talk time per speaker, plus non-talk time |
| CustomEntities | - | Summary of the custom entities detected throughout the conversation |
| CategoriesDetected | - | A list of categories detected by Call Analytics |
| IssuesDetected | - | A list of issues detected by Call Analytics |
| ActionItemsDetected | - | A list of action items detected by Call Analytics |
| OutcomesDetected | - | A list of outcomes detected by Call Analytics |
| Telephony | - | [Optional] A list of telephony-specific metadata fields extracted from the CTR files (only present if the chosen telephony CTR parser chooses to write this information out) |
| SourceInformation | - | Source-specific details for the conversation. Contains just one of any of the possible supported sources |
| Summary | - | Key/value pairs that define summary topics and values. These are rendered inside the GenAI Call Summary panel in the user interface; the key is rendered as the title and the value as the body |
| ContactSummary | - | Generative call summarization output from Transcribe Call Analytics. This is a nested structure |

### SpeakerLabels

The system generates internal speaker markers, but you can assign (and change) explicit display text for each one. The DisplayText field is user-configurable, but Transcribe Call Analytics has its own configured names, and these override the customer definitions.

"SpeakerLabels": [
  {
    "Speaker": "string",
    "DisplayText": "string",
    "UserId": "string"
  }
]

| Field | Type | Description |
| --- | --- | --- |
| Speaker | string | Internal speaker name in the format spk_0, spk_1, up to spk_n |
| DisplayText | string | Text label to display for this speaker |
| UserId | string | [Optional - telephony only] Telephony system's user ID reference for this speaker, whether it's a customer or an agent |
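
A consumer of the file would typically turn this list into a lookup table before rendering the transcript; a minimal sketch, assuming `parsed` holds the loaded output document:

```python
# Map internal speaker markers (spk_0, spk_1, ...) to their display names
speaker_names = {
    label["Speaker"]: label["DisplayText"]
    for label in parsed["ConversationAnalytics"]["SpeakerLabels"]
}
print(speaker_names.get("spk_0", "spk_0"))  # e.g. "Agent"
```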

### SentimentTrends

Sentiment trends for each caller. The sentiment score is either:

- The sum of any positive and negative scores divided by the number of turns (standard Transcribe)
- The pre-calculated value based upon the number of sentiment markers (Call Analytics)

The per-quarter sentiment scores are similar - we either use the values provided by Call Analytics or we calculate them on the fly.

"SentimentTrends": {
  "<SpeakerLabels|Speaker>": {
    "SentimentScore": "float",
    "SentimentChange": "float",
    "NameOverride": "string",
    "SentimentPerQuarter": [
      {
        "Quarter": "int",
        "Score": "float",
        "BeginOffsetSecs": "float",
        "EndOffsetSecs": "float"
      }
    ]
  }
}

| Field | Type | Description |
| --- | --- | --- |
| Speaker | string | Internal speaker name in the format spk_n, starting with n=0 |
| SentimentScore | float | The sentiment for this speaker across the whole call, in the range [-5.0, 5.0] |
| SentimentChange | float | Change in sentiment from the start to the end of the call |
| NameOverride | string | [Optional] Overrides the normal display name for this speaker |
| Quarter | int | Period number, in the range [1, 4] |
| Score | float | The sentiment for this speaker in this period, in the range [-5.0, 5.0] |
| BeginOffsetSecs | float | Start time for this speaker talking in this period |
| EndOffsetSecs | float | End time for this speaker talking in this period |
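
To make the standard-Transcribe calculation above concrete, here is an illustrative sketch that recomputes a per-speaker score from the per-turn fields documented under SpeechSegments below (the parser performs this internally; nothing here is part of the output contract):

```python
from collections import defaultdict

def per_speaker_sentiment(parsed):
    """Sum each speaker's signed positive/negative turn scores,
    then divide by that speaker's total number of turns."""
    totals = defaultdict(float)
    turns = defaultdict(int)
    for seg in parsed["SpeechSegments"]:
        spk = seg["SegmentSpeaker"]
        turns[spk] += 1
        if seg["SentimentIsPositive"] or seg["SentimentIsNegative"]:
            totals[spk] += seg["SentimentScore"]  # signed, in [-5.0, +5.0]
    return {spk: totals[spk] / count for spk, count in turns.items()}
```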

### CustomEntities

A list of the custom entities that have been detected via Amazon Comprehend Custom Entity Detection, or via the string-matching algorithm. This is summarised by entity type, and doesn't give any indication as to where the entity lies in the text - that is part of the SpeechSegments structure.

"CustomEntities": [
  {
    "Name": "string",
    "Instances": "integer",
    "Values": [ "string" ]
  }  
]

| Field | Type | Description |
| --- | --- | --- |
| Name | string | The name of the custom entity type; e.g. Product |
| Instances | int | The number of times that this entity was identified in the transcript |
| Values | [string] | An array of literal text values for this type that have been flagged; e.g. Kit Kat |

### SpeakerTime

Each speaker on the call has their total talk time in seconds included in this block. Additionally, the total amount of quiet, non-talk time is also recorded, along with the location within the call of each of those quiet periods.

The sum of talk times for all speakers may add up to more than the duration of the call, as it only reports the time each speaker was talking, which could overlap with when other speakers were talking.

Note: for non-Analytics calls we do not yet have quiet time.

"SpeakerTime": {
  "<SpeakerLabels|Speaker>": {
    "TotalTimeSecs": "float"
  },
  "NonTalkTime": {
    "TotalTimeSecs": "float",
    "Instances": [
      {
        "BeginOffsetSecs": "float",
        "EndOffsetSecs": "float",
        "DurationSecs": "float"
      }
    ]
  }
}

| Field | Type | Description |
| --- | --- | --- |
| Speaker | string | Internal speaker name in the format spk_n, starting with n=0 |
| TotalTimeSecs | float | The amount of time that this speaker was talking on the call, or the total amount of non-talk time |
| BeginOffsetSecs | float | Starting point in the call of a period of non-talk time |
| EndOffsetSecs | float | End point in the call of a period of non-talk time |
| DurationSecs | float | Duration of a period of non-talk time |
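
Because speaker talk times can overlap, per-speaker percentages are best computed against the combined talk time rather than against Duration; a sketch:

```python
def talk_time_shares(parsed):
    """Each speaker's share of the combined talk time.
    Computed against the sum of talk times, not Duration,
    because overlapping speech can push the sum past Duration."""
    speaker_time = parsed["ConversationAnalytics"]["SpeakerTime"]
    talk = {
        spk: block["TotalTimeSecs"]
        for spk, block in speaker_time.items()
        if spk != "NonTalkTime"
    }
    total = sum(talk.values()) or 1.0  # guard against an all-silent file
    return {spk: secs / total for spk, secs in talk.items()}
```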

### CategoriesDetected

Note: this information is only available from Call Analytics calls.

Within Amazon Transcribe Call Analytics the customer can define a number of categories, which can be based upon multiple rules across different aspects of the call - speaker, sentiment, point in call, interruption status, etc. Whenever these categories are detected in the call, this header entry is populated with summary information, with more detail further down in the SpeechSegments list.

"CategoriesDetected": [
  {
    "Name": "string",
    "Instances": "integer",
    "Timestamps": [
      {
        "BeginOffsetSecs": "float",
        "EndOffsetSecs": "float"
      }
    ]
  }  
]

| Field | Type | Description |
| --- | --- | --- |
| Name | string | Name of the detected category |
| Instances | int | Number of instances of this category in the call |
| BeginOffsetSecs | float | Beginning of the text that identified this category |
| EndOffsetSecs | float | End of the text that identified this category |

Note that the Timestamps block can be empty, as some categories are triggered by the absence of data in the call; those categories will have an Instances count but no Timestamps.

### IssuesDetected | ActionItemsDetected | OutcomesDetected

Note: this information is only available from Call Analytics calls.

The issue detection model in Call Analytics will highlight text in the transcript that it recognises as an issue, an outcome or an action item. It does not assign a category to any of these items, just the text and its location. This information is listed here, and is repeated further down in the SpeechSegments list against the relevant line.

"IssuesDetected | ActionItemsDetected | OutcomesDetected": [
  {
    "Text": "string",
    "BeginOffset": "integer",
    "EndOffset": "integer"
  }  
]

| Field | Type | Description |
| --- | --- | --- |
| Text | string | Text that triggered the detection |
| BeginOffset | int | Beginning position of the text that identified this item in the transcript line |
| EndOffset | int | End position of the text that identified this item in the transcript line |

### Telephony | Genesys

A set of data points extracted from a telephony CTR file by the selected CTR processor; if no telephony CTR processing is done, or the processor chooses not to write these out, then this section is missing. Note that each telephony system's CTR records are distinct, and each may contain different values - for the definition of each field you are referred to the telephony provider's documentation.

"Telephony": {
  "Genesys": {
    "id": "string",
    "conversationId": "string",
    "startTime": "string",
    "endTime": "string",
    "conversationStart": "string",
    "originatingDirection": "string",
    "queueIds": [ "string" ]
  }
}

### SourceInformation | TranscribeJobInfo

Present when the source of the conversation is Amazon Transcribe. This is a mixture of information around the transcription job itself; some of it comes directly from the service, but some is generated by the parser and stored here, as it is high-level, transcription-wide information.

"SourceInformation": [
  {
    "TranscribeJobInfo": {
      "TranscriptionJobName": "string",
      "TranscribeApiType": "string",
      "StreamingSession": "string",
      "CompletionTime": "string",
      "CLMName": "string",
      "VocabularyName": "string",
      "VocabularyFilter": "string",
      "MediaFormat": "string",
      "MediaSampleRateHertz": "integer",
      "MediaFileUri": "string",
      "MediaOriginalUri": "string",
      "RedactedTranscript": "boolean",
      "ChannelIdentification": "boolean",
      "AverageWordConfidence": "float",
      "CombinedAnalyticsGraph": "string"
    }
  }
]

| Field | Type | Description |
| --- | --- | --- |
| TranscriptionJobName | string | The name of the transcription job (audio file input) or the name of the transcription file (transcription file input) |
| TranscribeApiType | string | The Transcribe API used; must be one of: standard, analytics |
| StreamingSession | string | ID for any associated Transcribe Streaming session |
| CompletionTime | string | A timestamp that shows when the job was completed |
| CLMName | string | The name of the Custom Language Model used in the transcription job |
| VocabularyName | string | The name of the vocabulary used in the transcription job |
| VocabularyFilter | string | The name and mask method of the vocabulary filter used in the transcription job |
| MediaFormat | string | The format of the input media file, as determined by Amazon Transcribe |
| MediaSampleRateHertz | int | The sample rate, in Hertz, of the audio track in the input audio |
| MediaFileUri | string | The S3 object location of the media file to use during playback, as we may play back an audio-redacted version, or a version in a format that not all browsers can play with the HTML5 audio control |
| MediaOriginalUri | string | The S3 object location of the original input audio file |
| RedactedTranscript | bool | Indicates that the transcript has been redacted |
| ChannelIdentification | bool | Indicates whether the transcription job used channel separation (true) or speaker separation (false) |
| AverageWordConfidence | float | Percentage value between 0.00 and 1.00 indicating the overall word confidence score for this job |
| CombinedAnalyticsGraph | string | S3 URL for the pre-generated combined Call Analytics chart |
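
MediaFileUri identifies the object a front end should play back. Assuming it arrives in s3://bucket/key form, a presigned URL for the HTML5 audio control could be generated like this (a sketch, not part of the parser itself):

```python
from urllib.parse import urlparse

import boto3

def playback_url(parsed, expires_in=3600):
    """Generate a presigned GET URL for the playback media object."""
    job_info = parsed["ConversationAnalytics"]["SourceInformation"][0]["TranscribeJobInfo"]
    uri = urlparse(job_info["MediaFileUri"])  # assumed s3://bucket/key form
    return boto3.client("s3").generate_presigned_url(
        "get_object",
        Params={"Bucket": uri.netloc, "Key": uri.path.lstrip("/")},
        ExpiresIn=expires_in,
    )
```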

### Summary

This is a set of key/value pairs that make up the summary of the call. It is defined as zero or more key/value pairs because summaries can cover more than one thing; for example, we can summarize the entire call and also summarize its next steps.

"Summary": {
  "Key1": "Value1",
  ...
}

Example:

"Summary": {
  "Summary": "The caller called to renew their drivers license.",
  "Agent Sentiment": "Positive",
  "Caller Sentiment": "Negative",
  "Call Category": "DRIVERS_LICENSE_RENEWAL"
  ...
}

## SpeechSegments

### Section Structure

Contains a single line - or turn - of transcribed text, along with sentiment indicators and any other analytics that have been calculated or provided by Transcribe.

"SpeechSegments": [
  {
    "SegmentStartTime": "float",
    "SegmentEndTime": "float",
    "SegmentSpeaker": "string",
    "SegmentInterruption": "boolean",
    "IVRSegment": "boolean",
    "OriginalText": "string",
    "DisplayText": "string",
    "TextEdited": "boolean",
    "SentimentIsPositive": "boolean",
    "SentimentIsNegative": "boolean",
    "SentimentScore": "float",
    "LoudnessScores": [ "float" ],
    "CategoriesDetected": [ "string" ],
    "FollowOnCategories": [ "string" ],
    "BaseSentimentScores": { },
    "EntitiesDetected": [ ],
    "IssuesDetected": [ ],
    "WordConfidence": [ ]
  }
]

| Field | Type | Description |
| --- | --- | --- |
| SegmentStartTime | float | Start time in the conversation for this segment in seconds |
| SegmentEndTime | float | End time in the conversation for this segment in seconds |
| SegmentSpeaker | string | Internal speaker name in the format spk_n, starting with n=0 |
| SegmentInterruption | bool | Indicates if this segment was an interruption by the speaker |
| IVRSegment | bool | Indicates if this segment was spoken by a telephony IVR |
| OriginalText | string | Original text string generated by the conversation source |
| DisplayText | string | Text to be displayed by the front-end application |
| TextEdited | bool | Indicates if the text has been edited |
| SentimentIsPositive | bool | Indicates if the sentiment of this turn is positive |
| SentimentIsNegative | bool | Indicates if the sentiment of this turn is negative |
| SentimentScore | float | Sentiment score in the range [-5.0, +5.0] |
| LoudnessScores | [float] | A list of loudness scores in decibels, one per second of the segment |
| CategoriesDetected | [string] | A list of categories triggered by or just prior to this segment. Note that negative rules are always tagged to the first segment, as they have no start time |
| FollowOnCategories | [string] | A list of categories triggered after this segment, typically only on the final segment, to catch categories like silence detection after the final piece of speech |
| BaseSentimentScores | - | Set of base sentiment scores from Amazon Comprehend |
| EntitiesDetected | - | List of custom entities that were detected in this speech segment |
| IssuesDetected | - | List of caller issues that were detected in this speech segment |
| WordConfidence | - | List of word/confidence pairs for the whole turn |
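
Combining these per-segment fields with the SpeakerLabels lookup from earlier gives a readable transcript; a minimal sketch:

```python
def print_transcript(parsed):
    """Print each turn with its display name, start time, and display text."""
    names = {
        label["Speaker"]: label["DisplayText"]
        for label in parsed["ConversationAnalytics"]["SpeakerLabels"]
    }
    for seg in parsed["SpeechSegments"]:
        who = names.get(seg["SegmentSpeaker"], seg["SegmentSpeaker"])
        print(f"[{seg['SegmentStartTime']:8.2f}s] {who}: {seg['DisplayText']}")
```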

### BaseSentimentScores

Amazon Comprehend will generate a confidence score between 0.0 and 1.0 for each of four different sentiment types, and we use several of these. If the sentiment comes from a source that only returns tags rather than scores, such as Amazon Transcribe Call Analytics, then the confidence levels will be set to just 0.0 or 1.0.

"BaseSentimentScores": {
  "Positive": "float",
  "Negative": "float",
  "Neutral": "float"
}

| Field | Type | Description |
| --- | --- | --- |
| Positive | float | Confidence that this turn has positive sentiment |
| Negative | float | Confidence that this turn has negative sentiment |
| Neutral | float | Confidence that this turn has neutral sentiment |

### EntitiesDetected

List of the custom entities detected in this speech segment - the offset text markers from Comprehend are present, but they only make sense if the text has not been edited.

"EntitiesDetected": [
  {
    "Type": "string",
    "Text": "string",
    "BeginOffset": "integer",
    "EndOffset": "integer",
    "Score": "float"
  }
]

| Field | Type | Description |
| --- | --- | --- |
| Type | string | The type of the custom entity |
| Text | string | The text that has been identified as the custom entity |
| BeginOffset | int | A character offset in the input text that shows where the entity begins (starting from 0) |
| EndOffset | int | A character offset in the input text that shows where the entity ends, i.e. the character after the entity |
| Score | float | The level of confidence that Amazon Comprehend has in the accuracy of the detection |
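
Since the offsets are only reliable for unedited text, a consumer should check TextEdited before applying them. A sketch that wraps each detected entity in markers, assuming the offsets index into OriginalText:

```python
def highlight_entities(segment, open_mark="[", close_mark="]"):
    """Wrap each detected entity in markers, working right-to-left
    so that earlier offsets remain valid as the string grows."""
    if segment.get("TextEdited"):
        return segment["DisplayText"]  # offsets are no longer reliable
    text = segment["OriginalText"]
    for ent in sorted(segment["EntitiesDetected"],
                      key=lambda e: e["BeginOffset"], reverse=True):
        begin, end = ent["BeginOffset"], ent["EndOffset"]
        text = text[:begin] + open_mark + text[begin:end] + close_mark + text[end:]
    return text
```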

### IssuesDetected

Issue text and timestamps are called out in the ConversationAnalytics header, but in the SpeechSegments we include more detailed information. The presence of data here indicates that text on this segment triggered issue detection, what that text is, and where it can be found within the segment.

"IssuesDetected": [
  {
    "Text": "string",
    "BeginOffset": "integer",
    "EndOffset": "integer"
  }
]

| Field | Type | Description |
| --- | --- | --- |
| Text | string | The text string that triggered the issue detection |
| BeginOffset | int | A character offset in the input text that shows where the detected issue begins (starting from 0) |
| EndOffset | int | A character offset in the input text that shows where the detected issue ends, i.e. the character after the issue text |

### WordConfidence

Amazon Transcribe will generate a word-confidence score for every word in the transcription output, allowing a front-end application to highlight potential inaccuracies in the transcription.

"WordConfidence": [
  {
    "Text": "string",
    "Confidence": "float",
    "StartTime": "float",
    "EndTime": "float"
  }
]

| Field | Type | Description |
| --- | --- | --- |
| Text | string | Word that this score applies to. Note that this may include a leading space as well as trailing punctuation |
| Confidence | float | Word confidence score between 0.00 and 1.00 for this word |
| StartTime | float | Time in seconds in the call at which the word starts |
| EndTime | float | Time in seconds in the call at which the word finishes |
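
For example, a front end might flag words beneath a confidence threshold; a minimal sketch over a single speech segment:

```python
def low_confidence_words(segment, threshold=0.5):
    """Return (word, confidence) pairs beneath the given threshold."""
    return [
        (word["Text"].strip(), word["Confidence"])  # strip the leading space
        for word in segment["WordConfidence"]
        if word["Confidence"] < threshold
    ]
```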