Evaluations

The Evaluations tab is where you test whether a user's natural-language prompt is correctly matched to the action you've defined, and whether the right input parameters are extracted according to the schema.
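
For reference, the examples on this page assume a Create Campaign action whose schema looks roughly like the sketch below. The field names (campaignName, startDate, endDate, budget, status) come from the Arguments view shown later on this page; the JSON Schema style, the required list, and the descriptions are illustrative, so adapt them to however your action's schema is actually defined.

  {
    "type": "object",
    "properties": {
      "campaignName": { "type": "string", "description": "Display name of the campaign" },
      "startDate": { "type": "string", "format": "date", "description": "First day the campaign runs" },
      "endDate": { "type": "string", "format": "date", "description": "Last day the campaign runs" },
      "budget": { "type": "number", "description": "Total budget for the campaign" },
      "status": { "type": "string", "enum": ["ACTIVE", "PAUSED"], "description": "Initial campaign status" }
    },
    "required": ["campaignName"]
  }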

Goal

  • Confirm the AI routes the user intent to the correct action.
  • Check that the schema fields (parameters) are filled correctly from the prompt.
  • Validate how the agent handles the server response, using the Response Mock you defined.
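
For example, a Response Mock for the Create Campaign action might look like the snippet below. The campaignId and message fields are hypothetical; mirror whatever your real server actually returns so the agent's handling is tested realistically.

  {
    "campaignId": "cmp_001",
    "campaignName": "Spring Sale",
    "status": "ACTIVE",
    "message": "Campaign created successfully"
  }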

Generating Test Prompts

  1. Open the Evaluations tab.
  2. Click Generate Prompts to auto-create a set of test prompts based on your action schema.
    • Example (for Create Campaign action):
      • “Create a new campaign named ‘Summer Sale’ starting from July 1 to July 31 with a budget of 5000 and status ACTIVE.”
      • “I want to create a campaign called ‘Holiday Promo’.”
      • “Create campaign starting on August 1 with a budget of 2000 and status PAUSED.”

You can also click + Add New Test to write your own custom prompt.

Running a Test

  • Click Run next to a test prompt.
  • The agent will process the input and attempt to:
    1. Match the correct action.
    2. Extract values for each field in the Schema.
    3. Return the Response Mock you defined.
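
For example, the minimal prompt above (“I want to create a campaign called ‘Holiday Promo’.”) should still match Create Campaign but extract only what the prompt supplies, roughly:

  {
    "campaignName": "Holiday Promo"
  }

How the agent treats the unspecified fields (omits them, applies defaults, or asks a follow-up question) depends on your schema's required fields and descriptions.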

Reviewing Results

  • On the right-hand side, you’ll see the Agent Testing output.

    • Example:
      Campaign Created: Spring Sale
      Name: Spring Sale
      Start Date: 2024-05-01
      End Date: 2024-05-31
      Budget: 10,000
      Status: ACTIVE
  • In the Agent Testing panel, click the action link (e.g., “Create Campaign was executed”).

    • This opens the Arguments view, which shows the raw schema extraction:
      {
        "startDate": "2024-05-01",
        "endDate": "2024-05-31",
        "budget": 10000,
        "status": "ACTIVE",
        "campaignName": "Spring Sale"
      }

This lets you confirm that user language (e.g., “budget of 10k”) is mapped into structured schema fields.
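
A few illustrative mappings of this kind (the phrasings and inferred values here are hypothetical):

  • “budget of 10k” → "budget": 10000
  • “from May 1 to May 31” → "startDate": "2024-05-01", "endDate": "2024-05-31" (year inferred from context)
  • “make it active” → "status": "ACTIVE"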

Best Practices

  • Create tests that cover the following cases (sample prompts for each appear after this list):

    • All required fields provided (happy path).
    • Only required fields provided (minimal input).
    • Missing required fields (should fail validation).
    • Partial optional fields (some optional fields provided, others omitted).
  • Update your schema descriptions or instructions if the AI is misinterpreting user prompts.

  • Always re-run evaluations after editing the schema or Response Mock.
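
As a concrete sketch, a Create Campaign test suite covering the four cases above might include prompts like these (assuming, for illustration, that campaignName is the only required field):

  • Happy path: “Create a campaign named ‘Spring Sale’ from May 1 to May 31 with a budget of 10000 and status ACTIVE.”
  • Minimal input: “Create a campaign called ‘Spring Sale’.”
  • Missing required field: “Create a campaign starting May 1 with a budget of 10000.” (no name, so validation should fail)
  • Partial optional fields: “Create ‘Spring Sale’ starting May 1 with status PAUSED.” (no end date or budget)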

✅ Use Evaluations before publishing to ensure your action works reliably across different ways a user might phrase their request.