
Live API for just quick transcription? Or is gemini-2.5-flash-lite the fastest possible? #1123

@lukaLLM

Description of the feature request:

Hi all,
I looked through the cookbooks and examples to get a quick transcription of 10-15 seconds of mono-channel 16 kHz PCM audio. The Live API seems quick in conversation, so I thought I could use it just for transcription. I was testing it about 2-3 months ago. With the example below I get the whole text back in around 1 s ± 100-200 ms. I tried the Live API examples, but they gave me the same time to get the transcription, or some errors, as the API seems to have changed a bit. Could anybody show me an example of how to do this faster than below, or is this already the fastest way?

response = await self.client.aio.models.generate_content(
    model='models/gemini-2.5-flash-lite',
    contents=[
        prompt,
        types.Part.from_bytes(
            data=processed_audio_bytes,
            mime_type='audio/wav',
        ),
    ],
)
transcribed_text = response.text
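
For context, prompt and processed_audio_bytes come from my own preprocessing. A hypothetical version of it is below (raw_pcm_bytes is just a placeholder for the captured 16-bit mono PCM); it wraps the raw PCM in a WAV container with the standard-library wave module so that the audio/wav mime type matches:

import io
import wave

prompt = "Transcribe this audio exactly as spoken. Output only the text."

def pcm_to_wav(pcm_bytes: bytes, sample_rate: int = 16000) -> bytes:
    # Wrap raw 16-bit mono PCM in a WAV header so it can be sent as audio/wav.
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)            # mono
        wf.setsampwidth(2)            # 16-bit samples
        wf.setframerate(sample_rate)  # 16 kHz
        wf.writeframes(pcm_bytes)
    return buf.getvalue()

processed_audio_bytes = pcm_to_wav(raw_pcm_bytes)  # raw_pcm_bytes: placeholder for the captured PCM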

I also tried

response = client.models.generate_content(
    model="gemini-2.5-flash-lite",
    contents=[
        file_upload,
        "Transcribe this audio exactly as spoken. Output only the text."
    ],
    config={
        "response_modalities": ["TEXT"], # We want text back, not audio
    }
)
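
And this is roughly the Live API path I was trying to get working, using input audio transcription instead of a conversational reply. It is only a sketch based on the Live API docs from when I tested: the model name (gemini-2.0-flash-live-001), the input_audio_transcription config field, and the send_realtime_input / audio_stream_end parameters are assumptions on my side and may not match the current SDK:

import asyncio
from google import genai
from google.genai import types

client = genai.Client()

async def transcribe_live(pcm_bytes: bytes) -> str:
    # Only the transcription of the input audio is collected; the model's
    # conversational reply is ignored.
    config = types.LiveConnectConfig(
        response_modalities=["TEXT"],
        input_audio_transcription=types.AudioTranscriptionConfig(),
    )
    transcript = []
    async with client.aio.live.connect(
        model="gemini-2.0-flash-live-001",  # assumed Live-capable model name
        config=config,
    ) as session:
        # Stream the raw 16 kHz mono PCM, then signal that the audio is finished.
        await session.send_realtime_input(
            audio=types.Blob(data=pcm_bytes, mime_type="audio/pcm;rate=16000")
        )
        await session.send_realtime_input(audio_stream_end=True)
        async for message in session.receive():
            sc = message.server_content
            if sc and sc.input_transcription and sc.input_transcription.text:
                transcript.append(sc.input_transcription.text)
            if sc and sc.turn_complete:  # stop once the server closes the turn
                break
    return "".join(transcript)

# transcribed_text = asyncio.run(transcribe_live(raw_pcm_bytes))

Even with something like this I was seeing roughly the same latency as the one-shot generate_content call above, which is why I'm asking whether there is a faster pattern.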

What problem are you trying to solve with this feature?

No response

Any other information you'd like to share?

No response
