Accurate Transcription

Enable accurate transcription to give your agent a tool it can call when it needs a higher-accuracy transcript of the conversation. This is particularly useful for alphanumeric inputs (codes, reference numbers, policy numbers, and spelled-out names), where the standard real-time transcription may mishear individual characters.

When enabled, the agent gets access to a request_accurate_transcript function. It calls this function automatically based on the instructions you provide, requesting either the last minute of audio or the full conversation, then uses the result to correct its understanding before responding.

Default: Disabled

When to Enable It

Consider enabling accurate transcription if your agent:

  • Collects reference numbers, order numbers, or booking codes
  • Handles spelled-out names, postcodes, or alphanumeric identifiers
  • Needs to recover gracefully when callers say "that's not what I said"

Instructions

Define when the agent should request an accurate transcript. These instructions are included in the tool description, so they directly shape when the agent decides to call the tool.

Default instructions:

Use this function when the user corrects you about something they said, indicates you misheard them, or when you need clarification on alphanumeric information like codes, reference numbers, or spelling.

Tip: Be specific about the types of inputs that benefit from re-transcription, and include examples relevant to your use case (e.g., "policy numbers", "flight codes"). Keep it concise, since the agent reads this every turn.

Example:

Use this when the caller provides a booking reference, loyalty number, or spelled-out name,
or when they indicate you misheard them.
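
To make the relationship between instructions and the tool description concrete, here is a minimal Python sketch. It assumes a generic function-calling layout with "name" and "description" fields. Those field names and the build_tool_definition helper are hypothetical and are not taken from the product's API; only the function name request_accurate_transcript and the default instruction text come from this page.

```python
# Hypothetical sketch: the dict layout mirrors a common function-calling
# convention and is an assumption, not the product's actual schema.
DEFAULT_INSTRUCTIONS = (
    "Use this function when the user corrects you about something they said, "
    "indicates you misheard them, or when you need clarification on "
    "alphanumeric information like codes, reference numbers, or spelling."
)


def build_tool_definition(instructions: str = DEFAULT_INSTRUCTIONS) -> dict:
    """Return an illustrative tool definition.

    The instructions become the tool description, which is why their wording
    shapes when the agent decides to call the tool.
    """
    return {
        "name": "request_accurate_transcript",
        "description": instructions.strip(),
    }


# Custom instructions replace the default description.
custom = build_tool_definition(
    "Use this when the caller provides a booking reference, loyalty number, "
    "or spelled-out name, or when they indicate you misheard them."
)
```

Because the description travels with every turn, shorter instructions keep the per-turn overhead low.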

Transcript Window

When the agent calls request_accurate_transcript, it selects one of two audio windows to transcribe:

  • Recent: transcribes approximately the last minute of conversation. Best for re-checking something said moments ago, such as a correction or a short alphanumeric input the caller just provided. This is faster and cheaper.
  • Full conversation: transcribes the entire conversation from the beginning. Best for end-of-call verification steps where the agent needs to confirm multiple details collected throughout the call, or when the relevant information may have been provided much earlier.

The agent chooses which window to use based on context. You can guide this behavior through your instructions, for example by telling the agent to prefer recent for corrections and full_conversation to review all details at the end of the call.
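
The window guidance above can be summarized as a small decision rule. The Python sketch below is illustrative only: in practice the agent makes this choice from context, not from code you write. The trigger labels, the suggest_window helper, and the fallback to recent are all assumptions; only the two values recent and full_conversation come from this page.

```python
from enum import Enum


class Window(str, Enum):
    RECENT = "recent"                        # roughly the last minute of audio
    FULL_CONVERSATION = "full_conversation"  # the whole call from the start


# Hypothetical trigger labels, invented for this sketch.
RECENT_TRIGGERS = {"caller_correction", "just_provided_code"}
FULL_TRIGGERS = {"end_of_call_verification", "detail_given_earlier"}


def suggest_window(trigger: str) -> Window:
    """Map a situation to the window the guidance above recommends."""
    if trigger in RECENT_TRIGGERS:
        return Window.RECENT
    if trigger in FULL_TRIGGERS:
        return Window.FULL_CONVERSATION
    # Assumption: fall back to the faster, cheaper window when unsure.
    return Window.RECENT
```

A rule like this can be phrased in plain language inside your instructions, such as "prefer recent for corrections; use full_conversation for the final review".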

Guiding the Agent in Your Dialogue

The tool instructions above control when the agent calls request_accurate_transcript globally, but you can also add guidance in individual dialogue state instructions. This is useful when a specific state always warrants accurate transcription.

Example state instruction:

When the caller provides their policy number, use request_accurate_transcript to ensure you have the correct number before continuing.
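
One way to picture how the two layers combine is the Python sketch below, in which state-level guidance is added on top of the global tool instructions and does not replace them. The state name collect_policy_number, the dictionary layout, and the guidance_for_state helper are hypothetical and serve only to illustrate the layering.

```python
# Hypothetical sketch: state names and this data layout are illustrative only.
TOOL_INSTRUCTIONS = (
    "Use this when the caller provides a booking reference, loyalty number, "
    "or spelled-out name, or when they indicate you misheard them."
)

STATE_INSTRUCTIONS = {
    "collect_policy_number": (
        "When the caller provides their policy number, use "
        "request_accurate_transcript to ensure you have the correct number "
        "before continuing."
    ),
}


def guidance_for_state(state: str) -> list[str]:
    """Global tool instructions always apply; state guidance is added on top."""
    guidance = [TOOL_INSTRUCTIONS]
    if state in STATE_INSTRUCTIONS:
        guidance.append(STATE_INSTRUCTIONS[state])
    return guidance
```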