Evaluations
The Evaluations tab is where you test whether a user’s natural language prompt is correctly translated into the action you’ve defined, and whether the right input parameters are extracted based on the schema.
Goal
- Confirm the AI routes the user intent to the correct action.
- Check that the schema fields (parameters) are filled correctly from the prompt.
- Validate how the agent handles the server response, using the Response Mock you defined.
Generating Test Prompts
- Open the Evaluations tab.
- Click Generate Prompts to auto-create a set of test prompts based on your action schema.
- Example (for the Create Campaign action):
  - “Create a new campaign named ‘Summer Sale’ starting from July 1 to July 31 with a budget of 5000 and status ACTIVE.”
  - “I want to create a campaign called ‘Holiday Promo’.”
  - “Create campaign starting on August 1 with a budget of 2000 and status PAUSED.”
You can also click + Add New Test to write your own custom prompt.
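If you keep custom tests outside the UI, it can help to pair each prompt with the extraction you expect. A minimal sketch in Python (the prompts come from the examples above; the `expected` structure is an assumption for illustration, not part of the product):

```python
# Hypothetical structure pairing a test prompt with the schema fields
# we expect the agent to extract for the Create Campaign action.
TEST_PROMPTS = [
    {
        "prompt": ("Create a new campaign named 'Summer Sale' starting from "
                   "July 1 to July 31 with a budget of 5000 and status ACTIVE."),
        "expected": {
            "campaignName": "Summer Sale",
            "budget": 5000,
            "status": "ACTIVE",
        },
    },
    {
        # Minimal input: only the campaign name is given, so we expect
        # only that field to be extracted.
        "prompt": "I want to create a campaign called 'Holiday Promo'.",
        "expected": {"campaignName": "Holiday Promo"},
    },
]

for case in TEST_PROMPTS:
    print(case["prompt"], "->", sorted(case["expected"]))
```

Keeping the expected extraction next to the prompt makes it obvious which field mappings a failing test was supposed to exercise.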
Running a Test
- Click Run next to a test prompt.
- The agent will process the input and attempt to:
- Match the correct action.
- Extract values for each field in the Schema.
- Return the Response Mock you defined.
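Conceptually, a single run performs the three steps above. The sketch below illustrates them with naive stand-ins (keyword matching and a regex); the real agent is model-driven, and all names here are hypothetical:

```python
import re

# Hypothetical action registry: one action with its required field
# and the mock response defined for it.
ACTIONS = {
    "create_campaign": {
        "required": ["campaignName"],
        "mock_response": {"message": "Campaign created"},
    }
}

def run_test(prompt: str) -> dict:
    """Simplified stand-in for one evaluation run."""
    # 1. Match the correct action (here: a naive keyword check).
    if "campaign" not in prompt.lower():
        raise ValueError("no matching action")
    action = ACTIONS["create_campaign"]

    # 2. Extract values for each schema field (here: only a quoted name).
    match = re.search(r"'([^']+)'", prompt)
    args = {"campaignName": match.group(1)} if match else {}

    # 3. Return the Response Mock you defined, plus the extracted arguments.
    return {"arguments": args, "response": action["mock_response"]}

result = run_test("I want to create a campaign called 'Holiday Promo'.")
```

The point of the sketch is the shape of the output: every run yields both the extracted arguments and the mocked response, which is exactly what you review in the next step.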
Reviewing Results
- On the right-hand side, you’ll see the Agent Testing output.
- Example:
  Campaign Created: Spring Sale
  Name: Spring Sale
  Start Date: 2024-05-01
  End Date: 2024-05-31
  Budget: 10,000
  Status: ACTIVE
- In the Agent Testing panel, click the action link (e.g., Create Campaign was executed).
- This opens the Arguments view, which shows the raw schema extraction:
  {
    "startDate": "2024-05-01",
    "endDate": "2024-05-31",
    "budget": 10000,
    "status": "ACTIVE",
    "campaignName": "Spring Sale"
  }
This lets you confirm that user language (e.g., “budget of 10k”) is mapped into structured schema fields.
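For instance, a phrase like “budget of 10k” has to end up as the number 10000 in the `budget` field. A rough illustration of that kind of normalization (not the platform’s actual extraction logic, which is model-driven):

```python
import re

def parse_budget(text: str) -> int:
    """Turn a human budget phrase like '10k' or '10,000' into an integer.

    Illustrative only -- shown to make the language-to-field mapping concrete.
    """
    match = re.search(r"([\d,.]+)\s*(k)?", text.lower())
    if not match:
        raise ValueError(f"no number found in {text!r}")
    value = float(match.group(1).replace(",", ""))
    if match.group(2) == "k":
        value *= 1000
    return int(value)
```

When an evaluation fails on a field like this, comparing the user phrasing against the Arguments view tells you whether the schema description needs to spell out the expected format.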
Best Practices
- Create tests that cover:
- All required fields provided (happy path).
- Only required fields provided (minimal input).
- Missing required fields (should fail validation).
- Partial optional fields (some extras given, others missing).
- Update your schema descriptions or instructions if the AI is misinterpreting user prompts.
- Always re-run evaluations after editing the schema or response mock.
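The coverage matrix above can be captured as a small table of cases plus a required-field check. A sketch, assuming a hypothetical Create Campaign schema where `campaignName` and `startDate` are required:

```python
REQUIRED = ["campaignName", "startDate"]    # assumed required fields
OPTIONAL = ["endDate", "budget", "status"]  # assumed optional fields

def validate(args: dict) -> bool:
    """Pass only when every required schema field was extracted."""
    return all(field in args for field in REQUIRED)

cases = {
    # Happy path: all required and optional fields provided.
    "happy_path": {"campaignName": "Spring Sale", "startDate": "2024-05-01",
                   "endDate": "2024-05-31", "budget": 10000, "status": "ACTIVE"},
    # Minimal input: only required fields provided.
    "minimal": {"campaignName": "Spring Sale", "startDate": "2024-05-01"},
    # Missing required field: should fail validation.
    "missing_required": {"campaignName": "Spring Sale"},
    # Partial optional fields: some extras given, others missing.
    "partial_optional": {"campaignName": "Spring Sale",
                         "startDate": "2024-05-01", "budget": 2000},
}

results = {name: validate(args) for name, args in cases.items()}
```

Enumerating the four cases this way makes it easy to confirm, at a glance, that every evaluation run covers each row of the matrix.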
✅ Use Evaluations before publishing to ensure your action works reliably across different ways a user might phrase their request.