Description
Description of the feature request:
Hi all,
I looked through the cookbooks and examples for a quick way to transcribe 10-15 seconds of audio (PCM, mono, 16 kHz). The Live API seems fast in conversation, so I thought I could get it to do transcription only. I was testing this about 2-3 months ago. With the example below, it takes around 1 s (±100-200 ms) to get the whole text back. I also tried the Live API examples, but they either took about the same time to return the transcription or failed with errors, as the API seems to have changed a bit. Could anybody share an example that is faster than the code below, or tell me what the fastest way to do this is?
response = await self.client.aio.models.generate_content(
    model='models/gemini-2.5-flash-lite',
    contents=[
        prompt,
        types.Part.from_bytes(
            data=processed_audio_bytes,
            mime_type='audio/wav',
        )
    ]
)
transcribed_text = response.text
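The snippet above sends `processed_audio_bytes` as `audio/wav`, but how those bytes are produced is not shown. As a rough sketch, assuming the source is raw 16-bit little-endian PCM (the bit depth is an assumption; only mono and 16 kHz are stated), the standard-library `wave` module can wrap it in a WAV container in memory. The helper name `pcm16_to_wav` is hypothetical.

```python
import io
import wave


def pcm16_to_wav(pcm_bytes: bytes, sample_rate: int = 16000, channels: int = 1) -> bytes:
    """Wrap raw PCM samples in a WAV container, entirely in memory.

    Assumes 16-bit (2-byte) little-endian samples; adjust setsampwidth
    if the real audio uses a different bit depth.
    """
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)      # mono, as described in the issue
        wav.setsampwidth(2)             # assumption: 16-bit samples
        wav.setframerate(sample_rate)   # 16 kHz, as described in the issue
        wav.writeframes(pcm_bytes)
    return buf.getvalue()


if __name__ == "__main__":
    one_second_of_silence = b"\x00\x00" * 16000
    wav_bytes = pcm16_to_wav(one_second_of_silence)
    print(len(wav_bytes), wav_bytes[:4])
```

Building the container in memory avoids a disk round trip, so this step should add very little to the overall latency compared with the network call.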
I also tried
response = client.models.generate_content(
    model="gemini-2.5-flash-lite",
    contents=[
        file_upload,
        "Transcribe this audio exactly as spoken. Output only the text."
    ],
    config={
        "response_modalities": ["TEXT"],  # We want text back, not audio
    }
)
What problem are you trying to solve with this feature?
No response
Any other information you'd like to share?
No response