Voice
Overview
Voice mode allows your agent to listen to users and, optionally, speak back. The Voice tab under Style & Behavior lets you configure both behaviors through two distinct modes:
- Speech to Text → User speaks, agent listens (input only).
- Voice Chat → Agent also talks back (two-way conversation).
Voice Settings in Agent Studio
The Voice tab contains four settings, described below.

1. Speech to Text
Toggle on/off.
- On → Users can talk to the agent, and their speech is transcribed into text input.
- Off → Users can only type.
- This setting only captures input — the agent still responds with text on screen.
2. Voice Chat
Toggle on/off.
- On → Agent replies with synthesized voice, in addition to text.
- Off → Agent responds only in text (even if Speech to Text is enabled).
- Requires the Pro plan.
Tip: Use Speech to Text when you want quick dictation or hands-free input. Use Voice Chat when you want a natural, conversational, two-way voice experience.
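The way the two toggles combine can be summarized in a small sketch. This is only an illustration, not an Agent Studio API: the `VoiceToggles` class and `modalities` function are hypothetical names, and the sketch assumes the toggles act independently (Speech to Text controls voice input, Voice Chat controls voice output), as the descriptions above suggest.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class VoiceToggles:
    """Hypothetical stand-in for the two toggles on the Voice tab."""
    speech_to_text: bool  # assumed to control voice input only
    voice_chat: bool      # assumed to control voice output only


def modalities(toggles: VoiceToggles) -> Dict[str, List[str]]:
    """Return the input and output channels available to the user."""
    # Typing and on-screen text are always available.
    inputs = ["text"]
    outputs = ["text"]
    if toggles.speech_to_text:
        inputs.append("speech")   # speech is transcribed into text input
    if toggles.voice_chat:
        outputs.append("voice")   # synthesized voice, in addition to text
    return {"input": inputs, "output": outputs}


# Speech to Text on, Voice Chat off: the user can speak, the agent answers in text.
print(modalities(VoiceToggles(speech_to_text=True, voice_chat=False)))
```

With both toggles on, the same function reports speech input and voice output, which corresponds to the two-way voice experience described in the tip.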
3. Voice Chat Type
Choose which voice the agent uses for spoken responses.
- Alloy (default) → Neutral, professional, multi-purpose.
- Additional voices may be available depending on your plan.
- Click the 🔊 icon to preview the selected voice.
4. Voice Instructions
Provide optional custom guidance for how the agent should sound.
Examples:
- “Use a friendly, upbeat tone.”
- “Speak slowly and clearly, with short sentences.”
- “Adopt a formal, executive style suitable for leadership updates.”
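If you track agent configurations outside the studio (for example, in a review checklist), the four settings can be captured in a simple record. The sketch below is an assumption-laden illustration: `VoiceSettings` and `check_settings` are hypothetical names, not part of any product API, and the plan check assumes the plan is identified by the string "Pro". It also assumes that Voice Instructions only matter when the agent actually speaks.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VoiceSettings:
    """Hypothetical record mirroring the four settings on the Voice tab."""
    speech_to_text: bool
    voice_chat: bool
    voice: str = "Alloy"    # Alloy is the documented default voice
    instructions: str = ""  # optional Voice Instructions


def check_settings(settings: VoiceSettings, plan: str) -> List[str]:
    """Return human-readable warnings for a settings/plan combination."""
    warnings = []
    # Voice Chat is documented as requiring the Pro plan.
    if settings.voice_chat and plan.strip().lower() != "pro":
        warnings.append("Voice Chat requires the Pro plan.")
    # Assumption: instructions shape spoken replies, so they do nothing
    # while Voice Chat is off.
    if settings.instructions and not settings.voice_chat:
        warnings.append("Voice Instructions only affect spoken replies; Voice Chat is off.")
    return warnings


settings = VoiceSettings(
    speech_to_text=True,
    voice_chat=True,
    instructions="Use a friendly, upbeat tone.",
)
print(check_settings(settings, plan="Free"))
```

A check like this is a convenient place to record the tone guidance alongside the toggles, so reviewers see the full voice configuration in one place.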
Best Practices
- Clarify the mode: Enable Speech to Text if you want voice input, enable Voice Chat if you want voice output.
- Match your audience: For enterprise → professional, concise. For consumer apps → conversational, warmer tone.
- Test voice speed: Faster responses save time; slower speech increases clarity.
- Document tone/style in Voice Instructions to keep experiences consistent.