Voice

Overview

Voice mode allows your agent to listen and speak. The Voice tab under Style & Behavior lets you configure both behaviors. There are two distinct modes:

  • Speech to Text → User speaks, agent listens (input only).
  • Voice Chat → Agent also talks back (two-way conversation).

Voice Settings in Agent Studio

The Voice tab under Style & Behavior has four settings that control how your agent interacts with users through spoken conversations.

1. Speech to Text

Toggle on/off.

  • On → Users can talk to the agent, and their speech is transcribed into text input.
  • Off → Users can only type.
  • This setting only captures input; the agent still responds with text on screen.

2. Voice Chat

Toggle on/off.

  • On → Agent replies with synthesized voice, in addition to text.
  • Off → Agent responds only in text (even if Speech to Text is enabled).
  • Requires Pro plan.

Tip:

Use Speech to Text when you want quick dictation or hands-free input. Use Voice Chat when you want a natural, conversational, two-way voice experience.
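
The two toggles can be read as independent switches: one for input, one for output. The sketch below enumerates the four combinations. It is only an illustration, not an Agent Studio API: the `VoiceSettings` class and `describe_interaction` function are hypothetical names, and it assumes that typing stays available when Speech to Text is on and that Voice Chat can be enabled on its own.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class VoiceSettings:
    """Hypothetical model of the two toggles; not a real Agent Studio type."""
    speech_to_text: bool = False  # input side: can the user speak?
    voice_chat: bool = False      # output side: does the agent talk back?


def describe_interaction(settings: VoiceSettings) -> str:
    """Summarize how the user and agent communicate for a given toggle state."""
    # Assumption: enabling Speech to Text adds speech input without removing typing.
    user_input = "typed or spoken" if settings.speech_to_text else "typed only"
    # Voice Chat adds synthesized voice on top of the on-screen text reply.
    agent_output = "text and voice" if settings.voice_chat else "text only"
    return f"input: {user_input}; output: {agent_output}"


# Walk through all four toggle combinations.
for stt, vc in product((False, True), repeat=2):
    print(f"Speech to Text={stt}, Voice Chat={vc} -> "
          f"{describe_interaction(VoiceSettings(stt, vc))}")
```

Reading the output row by row makes the difference concrete: Speech to Text alone never changes how the agent replies, and Voice Chat alone never changes how the user provides input.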


3. Voice Chat Type

Choose which voice the agent uses for spoken responses.

  • Alloy (default) → Neutral, professional, multi-purpose.
  • Additional voices may be available depending on your plan.
  • Click the 🔊 icon to preview the selected voice.

4. Voice Instructions

Provide optional custom guidance for how the agent should sound.

Examples:

  • “Use a friendly, upbeat tone.”
  • “Speak slowly and clearly, with short sentences.”
  • “Adopt a formal, executive style suitable for leadership updates.”
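
If you keep a record of your agents' voice settings outside Agent Studio (for example, in a review checklist), the four settings above can be captured in a small structure. This is a minimal sketch under stated assumptions: `VoiceConfig`, its field names, `validate`, and the plan strings are all hypothetical and do not correspond to an Agent Studio API. Only the default voice (Alloy) and the Pro plan requirement for Voice Chat come from this page.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VoiceConfig:
    """Hypothetical record of the Voice tab settings (field names are assumptions)."""
    speech_to_text: bool = False
    voice_chat: bool = False
    voice: str = "Alloy"          # documented default voice
    voice_instructions: str = ""  # optional free-text guidance


def validate(config: VoiceConfig, plan: str) -> List[str]:
    """Return a list of problems; an empty list means no issue was found."""
    problems = []
    # Voice Chat is documented as requiring the Pro plan.
    # Assumption: the plan is passed in as a plain string such as "pro".
    if config.voice_chat and plan.strip().lower() != "pro":
        problems.append("Voice Chat requires the Pro plan.")
    return problems


config = VoiceConfig(
    speech_to_text=True,
    voice_chat=True,
    voice_instructions="Speak slowly and clearly, with short sentences.",
)
print(validate(config, plan="free"))
print(validate(config, plan="Pro"))
```

Keeping the instructions as a single free-text field mirrors the examples above, which are plain sentences rather than structured options.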

Best Practices

  • Clarify the mode: Enable Speech to Text if you want voice input; enable Voice Chat if you want voice output.
  • Match your audience: For enterprise use, keep the voice professional and concise. For consumer apps, use a warmer, more conversational tone.
  • Test voice speed: Faster speech saves time, while slower speech increases clarity.
  • Document tone and style in Voice Instructions to keep experiences consistent.