Overview

Voice mode lets your agent listen to users and, optionally, speak back. There are two distinct modes:

Speech to Text → User speaks, agent listens (input only).
Voice Chat → Agent also talks back (two-way conversation).

Voice Settings in Agent Studio

The Voice tab under Style & Behavior lets you configure how your agent interacts with users through spoken conversations.

1. Speech to Text

Toggle on/off.

On → Users can talk to the agent, and their speech is transcribed into text input.
Off → Users can only type.
This setting only captures input; the agent still responds with text on screen unless Voice Chat is also enabled.

2. Voice Chat

Toggle on/off.

On → Agent replies with synthesized voice, in addition to text.
Off → Agent responds only in text (even if Speech to Text is enabled).
Requires the Pro plan.

⚡ Tip: Use Speech to Text when you want quick dictation or hands-free input. Use Voice Chat when you want a natural, conversational, two-way voice experience.
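To make the interaction between the two toggles concrete, here is a minimal sketch that models the On/Off behavior described above as a pure function. It is illustrative only: `VoiceSettings`, `resolveModalities`, and the `isProPlan` flag are hypothetical names, not part of Agent Studio, and the sketch assumes the two toggles act independently (Speech to Text controls input, Voice Chat controls output).

```typescript
// Hypothetical model of the Voice tab toggles. These names are
// illustrative only and are NOT an Agent Studio API.
type Channel = "text" | "voice";

interface VoiceSettings {
  speechToText: boolean; // voice input: user speech is transcribed to text
  voiceChat: boolean;    // voice output: agent also speaks its replies
  isProPlan: boolean;    // Voice Chat requires the Pro plan
}

interface Modalities {
  input: Channel[];  // how the user can talk to the agent
  output: Channel[]; // how the agent replies
}

function resolveModalities(s: VoiceSettings): Modalities {
  // Typing is always available; Speech to Text adds spoken input.
  const input: Channel[] = s.speechToText ? ["text", "voice"] : ["text"];

  // Replies always appear as text. Voice Chat adds synthesized speech,
  // but only on the Pro plan (assumption: a non-Pro account is treated
  // as text-only output).
  const speaks = s.voiceChat && s.isProPlan;
  const output: Channel[] = speaks ? ["text", "voice"] : ["text"];

  return { input, output };
}

// Usage: Speech to Text on, Voice Chat off.
const dictationOnly = resolveModalities({
  speechToText: true,
  voiceChat: false,
  isProPlan: true,
});
console.log(JSON.stringify(dictationOnly));
// prints {"input":["text","voice"],"output":["text"]}
```

The dictation-only case matches the Voice Chat "Off" behavior above: the user can speak, but the agent still answers only in text.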

3. Voice Chat Type

Choose which voice the agent uses for spoken responses.

Alloy (default) → Neutral, professional, multi-purpose.
Additional voices may be available depending on your plan.
Click the 🔊 icon to preview the selected voice.

4. Voice Instructions

Provide optional custom guidance for how the agent should sound.

Examples:

“Use a friendly, upbeat tone.”
“Speak slowly and clearly, with short sentences.”
“Adopt a formal, executive style suitable for leadership updates.”

Best Practices

Choose the right mode: Enable Speech to Text if you want voice input; enable Voice Chat if you want voice output.
Match your audience: For enterprise, keep the tone professional and concise. For consumer apps, use a warmer, more conversational tone.
Test voice speed: Faster speech saves time; slower speech improves clarity.
Keep it consistent: Document tone and style in Voice Instructions so the experience stays consistent.