Evaluation rules

Evaluation rules automatically assess conversation quality after each call completes. They help you measure performance, identify areas for improvement, and ensure consistent service quality across the dimensions that matter to you.

What evaluation rules do

After a conversation ends, evaluation rules analyze the transcript and assign scores based on criteria you define. In effect, an AI evaluator assesses your AI agent. This automated quality assessment helps you:

  • Measure agent performance consistently
  • Identify successful conversations vs. ones that need improvement
  • Track performance trends over time
  • Find patterns in high-quality vs. low-quality interactions

Creating evaluation rules

To create an evaluation rule:

  1. Navigate to the Evaluation tab in the agent editor
  2. Click Add evaluation rule
  3. Configure the rule settings

Evaluation rule configuration

Name and description

  • Name: Clear identifier for this rule (e.g., "Customer Satisfaction", "Issue Resolution")
  • Description: Explain what this rule measures and why it matters

Primary rule

Toggle this on to designate the rule as the primary evaluation rule. Only one rule can be primary at a time.

The primary rule is:

  • Featured prominently in conversation details
  • Used for filtering and sorting conversations
  • Highlighted in analytics dashboards

Choosing a primary rule

Your primary rule should measure the most important aspect of conversation success. For customer support, this might be "Issue resolution." For sales, it might be "Lead qualification quality."

Score values

Define the possible scores for this evaluation. You can use:

Pass/fail approach:

Pass: Issue was resolved
Fail: Issue was not resolved

Numeric scale:

1: Very poor
2: Poor
3: Satisfactory
4: Good
5: Excellent

Custom categories:

Resolved: Customer's issue was fully resolved
Partially Resolved: Progress made but follow-up needed
Not Resolved: Issue remains unresolved
Escalated: Transferred to human agent
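
If it helps to reason about these options programmatically, the sketch below models each approach as a list of score labels with descriptions. It is written in Python purely for illustration; the names are hypothetical and do not reflect the platform's API.

# Illustrative data model for the three score-value approaches above.
# Class and field names are hypothetical, not the platform's schema.
from dataclasses import dataclass

@dataclass
class ScoreValue:
    label: str        # the score attached to a conversation, e.g. "Pass" or "4"
    description: str  # what that score means, used when evaluating

pass_fail = [
    ScoreValue("Pass", "Issue was resolved"),
    ScoreValue("Fail", "Issue was not resolved"),
]

numeric_scale = [
    ScoreValue(str(n), meaning)
    for n, meaning in enumerate(
        ["Very poor", "Poor", "Satisfactory", "Good", "Excellent"], start=1
    )
]

custom_categories = [
    ScoreValue("Resolved", "Customer's issue was fully resolved"),
    ScoreValue("Partially Resolved", "Progress made but follow-up needed"),
    ScoreValue("Not Resolved", "Issue remains unresolved"),
    ScoreValue("Escalated", "Transferred to human agent"),
]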

Scoring logic

Describe in natural language how to evaluate conversations and assign scores. Be specific about what constitutes each score level.

Example: Customer satisfaction rule

Evaluate the conversation for overall customer satisfaction based on these factors:

Score 5 (Excellent) when:
- Customer's question or issue was fully addressed
- Agent was helpful, polite, and efficient
- Customer expressed satisfaction or gratitude
- No frustration or negative sentiment from customer
- Conversation ended positively

Score 4 (Good) when:
- Customer's question or issue was addressed
- Agent was professional and helpful
- Minor hiccups but overall positive interaction
- No significant customer frustration

Score 3 (Satisfactory) when:
- Basic issue addressed but not ideal
- Some customer confusion or repetition
- Agent eventually helped but took longer than ideal
- Neutral customer sentiment

Score 2 (Poor) when:
- Issue not fully resolved
- Customer expressed frustration
- Agent made errors or provided incomplete information
- Required escalation or multiple attempts

Score 1 (Very poor) when:
- Issue not resolved
- Customer clearly frustrated or dissatisfied
- Agent failed to help effectively
- Conversation ended negatively

Example: Issue resolution rule

Evaluate whether the customer's issue was resolved:

Resolved:
- Customer explicitly confirmed their issue is fixed
- Agent successfully completed requested actions (e.g., refund processed, information provided)
- Customer expressed satisfaction with the outcome

Partially resolved:
- Agent provided helpful information but complete resolution requires follow-up
- Some progress made but customer needs additional assistance
- Transferred to appropriate department for completion

Not resolved:
- Customer's issue remains unaddressed
- Agent unable to help with the specific request
- Customer ended conversation still needing help

Escalated:
- Conversation transferred to human agent
- Issue too complex for AI agent
- Customer requested human assistance

How evaluation works

  1. Conversation completes: When a call ends, evaluation begins automatically
  2. Analysis: The AI reviews the full transcript
  3. Scoring: Based on your scoring logic, a score is assigned
  4. Results stored: Scores are saved with the conversation
  5. Visibility: Results appear in conversation details and analytics
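
As a rough mental model, the sketch below shows one plausible shape for the data produced in steps 3 to 5: one score and rationale per active rule, stored with the conversation. The field names are assumptions for illustration, not the platform's actual schema.

# Hypothetical shape of stored evaluation results for a single conversation.
# Field names and identifiers are illustrative only.
evaluation_results = {
    "conversation_id": "conv_123",
    "results": {
        "issue_resolution": {   # primary rule in this example
            "score": "Resolved",
            "rationale": "Customer confirmed the refund was processed.",
        },
        "customer_satisfaction": {
            "score": "4",
            "rationale": "Polite, efficient interaction with one minor repetition.",
        },
    },
}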

Viewing evaluation results

When you view a conversation:

  • Evaluation results appear near the top
  • Primary rule is prominently displayed
  • All evaluation rule scores are visible
  • You can see the reasoning behind each score

Your primary score is also shown prominently in the Conversations overview and can be used to sort and filter conversations.
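
If you export or fetch conversation records for your own analysis, the primary score can drive the same kind of triage. The sketch below flags low-scoring conversations for review, assuming a numeric 1 to 5 primary rule and the illustrative result shape from the previous section; none of it is a real platform API.

# Sketch: surface conversations whose primary score is at or below a threshold,
# lowest scores first, so the weakest interactions are reviewed first.
def conversations_needing_review(conversations, primary_rule="customer_satisfaction", threshold=3):
    flagged = [
        c for c in conversations
        if int(c["results"][primary_rule]["score"]) <= threshold
    ]
    return sorted(flagged, key=lambda c: int(c["results"][primary_rule]["score"]))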

Multiple evaluation rules

You can create multiple evaluation rules to measure different aspects:

Example evaluation rules for a customer support agent:

  1. Issue resolution (Primary) - Was the problem solved?
  2. Customer satisfaction - How happy was the customer?
  3. Efficiency - Was the conversation efficient or too lengthy?
  4. Policy compliance - Did the agent follow company policies?

Each conversation receives a score for every active evaluation rule.
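
Because every conversation is scored against every active rule, you can also aggregate results per rule across many conversations. The sketch below tallies score distributions, again assuming the illustrative result shape used earlier rather than any real platform schema.

from collections import Counter, defaultdict

# Count how often each score appears, per evaluation rule, across conversations.
def score_distribution(conversations):
    tallies = defaultdict(Counter)
    for conv in conversations:
        for rule_name, result in conv["results"].items():
            tallies[rule_name][result["score"]] += 1
    return tallies

# Example: tallies["issue_resolution"] might look like
# Counter({"Resolved": 42, "Partially Resolved": 7, "Escalated": 3, "Not Resolved": 2})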

Best practices

Start with one rule

Begin with a single, clear primary rule that measures overall success. Add more rules once you understand how evaluation works.

Be specific

Vague scoring criteria lead to inconsistent results:

Good: "Score 5 if customer explicitly confirms issue is resolved and expresses satisfaction"

Too vague: "Score 5 if conversation went well"

Use clear score definitions

Each score level should have distinct, observable criteria that can be identified from the transcript.

Test your rules

After creating evaluation rules:

  1. Review several conversations with known outcomes
  2. Check if the evaluation scores match your assessment
  3. Refine the scoring logic based on results
  4. Iterate until scores are consistently accurate
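
If you prefer to run this check outside the dashboard, the sketch below compares automated scores against your own labels for conversations with known outcomes and collects mismatches to guide refinement. The retrieval function is a hypothetical stand-in for however you access results, not a real platform API.

# Sketch: find conversations where the automated score disagrees with your label.
# `get_evaluation_score` is a placeholder you would implement yourself.
def find_mismatches(labeled_conversations, rule_name, get_evaluation_score):
    mismatches = []
    for conv_id, expected in labeled_conversations.items():
        actual = get_evaluation_score(conv_id, rule_name)
        if actual != expected:
            mismatches.append((conv_id, expected, actual))
    return mismatches

# Review each mismatch, then tighten the scoring logic and re-test (steps 3 and 4).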

Focus on actionable metrics

Choose evaluation criteria that:

  • Measure what actually matters to your business
  • Provide insights you can act on
  • Help you improve agent performance

Combine quantitative and qualitative

Use both:

  • Quantitative: Numeric scores for trending and comparison
  • Qualitative: Descriptive categories for understanding context

Common evaluation rules

  • Customer satisfaction (CSAT): Measures overall customer happiness with the interaction
  • Issue resolution rate: Tracks how often customer problems are successfully solved
  • First call resolution: Measures if the issue was resolved in the first conversation (no follow-up needed)
  • Compliance score: Evaluates adherence to company policies, legal requirements, or script requirements
  • Efficiency score: Assesses whether conversations are appropriately brief without rushing customers
  • Tone and professionalism: Measures agent politeness, empathy, and professional communication

Limitations and considerations

AI evaluation

Evaluation is performed by AI, which may occasionally misjudge nuanced situations. Spot-check evaluation results regularly to ensure accuracy and refine rules for better performance.

Improving evaluation accuracy

If evaluation results don't match your expectations:

  1. Review scoring logic: Make criteria more specific and detailed
  2. Add examples: Include example conversations in your scoring logic
  3. Refine score definitions: Clarify boundaries between score levels
  4. Test systematically: Run multiple known conversations through evaluation
  5. Iterate: Continuously refine based on results

Next steps

After setting up evaluation rules:

  • Monitor evaluation results across conversations
  • Use scores to identify training opportunities
  • Track score trends to measure improvement over time
  • Adjust agent configuration based on evaluation insights
  • Filter conversations by score to review successes and failures