Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ingest Client - Multichannel support added #2186

Open
wants to merge 3 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 3 additions & 2 deletions samples/ingestion/ingestion-client/Connector/BatchClient.cs
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,7 @@ namespace Connector
using System.Threading;
using System.Threading.Tasks;
using Connector.Serializable.TranscriptionFiles;
using Microsoft.Extensions.Logging;
using Newtonsoft.Json;
using Polly;
using Polly.Retry;
Expand Down Expand Up @@ -72,11 +73,11 @@ public static Task DeleteTranscriptionAsync(string transcriptionLocation, string
return DeleteAsync(transcriptionLocation, subscriptionKey, DefaultTimeout);
}

public static async Task<Uri> PostTranscriptionAsync(TranscriptionDefinition transcriptionDefinition, string hostName, string subscriptionKey)
public static async Task<Uri> PostTranscriptionAsync(TranscriptionDefinition transcriptionDefinition, string hostName, string subscriptionKey, ILogger? logger)
{
var path = $"{hostName}{TranscriptionsBasePath}";
var payloadString = JsonConvert.SerializeObject(transcriptionDefinition);

logger.LogInformation($"Sending payload {payloadString} to {path}");
return await PostAsync(path, subscriptionKey, payloadString, PostTimeout).ConfigureAwait(false);
}

Expand Down
2 changes: 2 additions & 0 deletions samples/ingestion/ingestion-client/Connector/Constants.cs
Original file line number Diff line number Diff line change
Expand Up @@ -30,5 +30,7 @@ public static class Constants
public const int DefaultConversationAnalysisMaxChunkSize = 5000;

public const string SummarizationSupportedLocalePrefix = "en";

public static readonly int[] Channels = new int[] { 0, 1 };
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure that I understand the purpose of this PR. Multichannel support should be enabled by default in batch transcription. Only if a customer wants to transcribe a specific channel, he needs to specify the channel number in request property. For example, he only wants to transcribe channel 1, but not channel 0, of a stereo audio.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please test the default settings - it always returns "Invalid Data" for multi Channel audios, it does not do that by default.

}
}
Original file line number Diff line number Diff line change
Expand Up @@ -15,7 +15,7 @@ private TranscriptionDefinition(
string description,
string locale,
IEnumerable<string> contentUrls,
Dictionary<string, string> properties,
Dictionary<string, object> properties,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks weird. Why do we need to change this to "object"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Properties for transcription request not always just string , for example

"properties": {
    "diarizationEnabled": false,
    "wordLevelTimestampsEnabled": true,
    "channels": [
      0,
      1
    ],
    "punctuationMode": "DictatedAndAutomatic",
    "profanityFilterMode": "Masked",
    "languageIdentification": {
      "candidateLocales": [
        "en-US",
        "de-DE",
        "es-ES"
      ]
    }
  },

https://learn.microsoft.com/en-us/azure/ai-services/speech-service/batch-transcription-create?pivots=rest-api#create-a-transcription-job

ModelIdentity model)
{
this.DisplayName = name;
Expand All @@ -36,14 +36,14 @@ private TranscriptionDefinition(

public ModelIdentity Model { get; set; }

public IDictionary<string, string> Properties { get; }
public IDictionary<string, object> Properties { get; }

public static TranscriptionDefinition Create(
string name,
string description,
string locale,
string contentUrl,
Dictionary<string, string> properties,
Dictionary<string, object> properties,
ModelIdentity model)
{
return new TranscriptionDefinition(name, description, locale, new[] { contentUrl }.ToList(), properties, model);
Expand All @@ -54,7 +54,7 @@ public static TranscriptionDefinition Create(
string description,
string locale,
IEnumerable<string> contentUrls,
Dictionary<string, string> properties,
Dictionary<string, object> properties,
ModelIdentity model)
{
return new TranscriptionDefinition(name, description, locale, contentUrls, properties, model);
Expand Down
64 changes: 24 additions & 40 deletions samples/ingestion/ingestion-client/Setup/ArmTemplateBatch.json
Original file line number Diff line number Diff line change
Expand Up @@ -136,6 +136,13 @@
"description": "A value indicating whether diarization (speaker separation) is requested."
}
},
"Channels": {
"defaultValue": "1",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the default value of transcription is [0,1]. i.e. If no channels is specified, both channel 0 and channel 1 of stereo will be transcribed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not what happens if we do not pass Channels in the Transcription request properties , it results with Invalid Data error .Tested on multichannel audios.

"channels": [
      0,
      1
    ],

"type": "int",
"metadata": {
"description": "A value indicating whether multiple channels are processed. 0 - Mono, 1 - Stereo ."
}
},
"AddWordLevelTimestamps": {
"defaultValue": false,
"type": "bool",
Expand All @@ -150,35 +157,11 @@
"description": "The key for the Text Analytics subscription."
}
},
"TextAnalyticsRegion": {
"defaultValue": "None",
"TextAnalyticsEndpoint": {
"defaultValue": "",
"type": "String",
"allowedValues": [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we support TextAnalytics in any region now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please refer to latest release of ARM in this repo yes Endpoint is supported specifically to support Private Endpoints that are not regional. Not sure why latest Tag is not merged in master

https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/ingestion-v2.0.11/samples/ingestion/ingestion-client/Setup/ArmTemplateBatch.json

"None",
"centralus",
"eastus",
"eastus2",
"northcentralus",
"southcentralus",
"westcentralus",
"westus",
"westus2",
"canadacentral",
"brazilsouth",
"eastasia",
"southeastasia",
"australiaeast",
"centralindia",
"japaneast",
"japanwest",
"koreacentral",
"northeurope",
"westeurope",
"francecentral",
"uksouth"
],
"metadata": {
"description": "The region the Text Analytics subscription is associated with. If none is selected, no text analysis will be performed."
"description": "The endpoint the Text Analytics subscription is associated with (format should be like https://{resourceName}.cognitiveservices.azure.com or https://{region}.api.cognitive.microsoft.com or similar). If empty, no text analysis will be performed."
}
},
"SentimentAnalysis": {
Expand All @@ -194,15 +177,15 @@
}
},
"PiiRedaction": {
"defaultValue": "None",
"type": "String",
"allowedValues": [
"None",
"UtteranceAndAudioLevel"
],
"metadata": {
"description": "A value indicating whether personally identifiable information (PII) redaction is requested. Will only be performed if a Text Analytics Key and Region is provided."
}
"defaultValue": "None",
"type": "String",
"allowedValues": [
"None",
"UtteranceAndAudioLevel"
],
"metadata": {
"description": "A value indicating whether personally identifiable information (PII) redaction is requested. Will only be performed if a Text Analytics Key and Region is provided."
}
},
"SqlAdministratorLogin": {
"type": "string",
Expand All @@ -227,7 +210,7 @@
}
},
"variables": {
"Version": "v2.0.5",
"Version": "v2.0.12",
"AudioInputContainer": "audio-input",
"AudioProcessedContainer": "audio-processed",
"ErrorFilesOutputContainer": "audio-failed",
Expand Down Expand Up @@ -844,6 +827,7 @@
},
"properties": {
"AddDiarization": "[parameters('AddDiarization')]",
"Channels": "[parameters('Channels')]",
"AddWordLevelTimestamps": "[parameters('AddWordLevelTimestamps')]",
"APPLICATIONINSIGHTS_CONNECTION_STRING": "[reference(resourceId('Microsoft.Insights/components', variables('AppInsightsName')), '2020-02-02-preview').ConnectionString]",
"AudioInputContainer": "[variables('AudioInputContainer')]",
Expand Down Expand Up @@ -932,7 +916,7 @@
"RetryLimit": "[variables('RetryLimit')]",
"StartTranscriptionServiceBusConnectionString": "[listKeys(variables('AuthRuleCT'),'2015-08-01').primaryConnectionString]",
"TextAnalyticsKey": "[concat('@Microsoft.KeyVault(VaultName=', variables('KeyVaultName'), ';SecretName=', variables('TextAnalyticsKeySecretName'), ')')]",
"TextAnalyticsRegion": "[parameters('TextAnalyticsRegion')]",
"TextAnalyticsEndpoint": "[parameters('TextAnalyticsEndpoint')]",
"UseSqlDatabase": "[variables('UseSqlDatabase')]",
"WEBSITE_RUN_FROM_PACKAGE": "[variables('FetchTranscriptionBinary')]",
"CreateConsolidatedOutputFiles": "[variables('CreateConsolidatedOutputFiles')]",
Expand All @@ -942,8 +926,8 @@
"PiiCategories": "[variables('PiiCategories')]",
"ConversationPiiCategories": "[variables('ConversationPiiCategories')]",
"ConversationPiiInferenceSource": "[variables('ConversationPiiInferenceSource')]",
"ConversationPiiSetting":"[variables('ConversationPiiRedaction')]",
"ConversationSummarizationOptions":"[variables('ConversationSummarizationOptions')]"
"ConversationPiiSetting": "[variables('ConversationPiiRedaction')]",
"ConversationSummarizationOptions": "[variables('ConversationSummarizationOptions')]"
}
}
],
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -55,5 +55,7 @@ public static class StartTranscriptionEnvironmentVariables
public static readonly string PunctuationMode = Environment.GetEnvironmentVariable(nameof(PunctuationMode), EnvironmentVariableTarget.Process);

public static readonly string StartTranscriptionServiceBusConnectionString = Environment.GetEnvironmentVariable(nameof(StartTranscriptionServiceBusConnectionString), EnvironmentVariableTarget.Process);

public static readonly int[] Channels = int.TryParse(Environment.GetEnvironmentVariable(nameof(Channels), EnvironmentVariableTarget.Process), out int result) && result == 1 ? Constants.Channels : new int[] { result };
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading channels from environment variables looks weird. Do we ask customers to change the environment variable value for each audio to be transcribed? How about if multiple audios are submitted but with different channel settings?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's global setting for Azure Function applicable to all audios

}
}
Original file line number Diff line number Diff line change
Expand Up @@ -221,7 +221,8 @@ private async Task StartBatchTranscriptionJobAsync(IEnumerable<ServiceBusReceive
var transcriptionLocation = await BatchClient.PostTranscriptionAsync(
transcriptionDefinition,
StartTranscriptionEnvironmentVariables.AzureSpeechServicesEndpointUri,
this.subscriptionKey).ConfigureAwait(false);
this.subscriptionKey,
this.logger).ConfigureAwait(false);

this.logger.LogInformation($"Location: {transcriptionLocation}");

Expand Down Expand Up @@ -309,9 +310,9 @@ private async Task WriteFailedJobLogToStorageAsync(IEnumerable<Connector.Service
}
}

private Dictionary<string, string> GetTranscriptionPropertyBag()
private Dictionary<string, object> GetTranscriptionPropertyBag()
{
var properties = new Dictionary<string, string>();
var properties = new Dictionary<string, object>();

var profanityFilterMode = StartTranscriptionEnvironmentVariables.ProfanityFilterMode;
properties.Add("ProfanityFilterMode", profanityFilterMode);
Expand All @@ -330,6 +331,9 @@ private Dictionary<string, string> GetTranscriptionPropertyBag()
properties.Add("WordLevelTimestampsEnabled", addWordLevelTimestamps.ToString(CultureInfo.InvariantCulture));
this.logger.LogInformation($"Setting word level timestamps enabled to {addWordLevelTimestamps}");

var channels = StartTranscriptionEnvironmentVariables.Channels;
properties.Add("Channels", channels);
this.logger.LogInformation($"Setting channels to {string.Join(",", channels)}");
return properties;
}

Expand Down