Guardrails

Guardrails are safety and compliance rules that monitor conversations and take action when certain conditions are met. They help ensure your agent behaves appropriately and stays within defined boundaries.

Stellar runs all guardrail checks in parallel with the conversation flow. Your agent can therefore maintain natural, low-latency interactions while still being monitored for safety and compliance.
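As a rough illustration of what running a check in parallel with the conversation means, the sketch below uses Python's `asyncio` to start a reply and a guardrail check at the same time. The functions `generate_reply`, `check_guardrail`, and `handle_turn` are hypothetical stand-ins, not part of Stellar's API, and the keyword match is only a placeholder for whatever evaluation the platform actually performs.

```python
import asyncio


async def generate_reply(user_message: str) -> str:
    # Hypothetical stand-in for the agent producing a response.
    await asyncio.sleep(0.05)
    return f"Reply to: {user_message}"


async def check_guardrail(user_message: str) -> bool:
    # Hypothetical stand-in for a guardrail evaluation.
    # Returns True when the placeholder rule is violated.
    await asyncio.sleep(0.05)
    return "diagnose" in user_message.lower()


async def handle_turn(user_message: str) -> tuple[str, bool]:
    # gather() schedules both coroutines concurrently, so the check
    # overlaps with reply generation instead of running after it.
    reply, violated = await asyncio.gather(
        generate_reply(user_message),
        check_guardrail(user_message),
    )
    return reply, violated
```

Calling `asyncio.run(handle_turn("Can you diagnose my rash?"))` returns the reply together with a flag saying whether the placeholder rule matched.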

What guardrails do

Guardrails continuously monitor conversations for:

Inappropriate language or topics
Policy violations
Sensitive information disclosure
Customer frustration or dissatisfaction
Agent behavior outside defined parameters

When a guardrail detects a violation, it can take one of the following actions:

Nudge the agent back on track
Transfer to a human agent
End the conversation
Simply log the violation for review

Creating a guardrail

To create a guardrail:

Navigate to the Guardrails tab
Click Add Guardrail
Configure the guardrail settings

Guardrail configuration

Name and rules

Name: Descriptive name for this guardrail (e.g., "No Medical Advice")
Rules: Natural language description of what to monitor for

Example rules:

Monitor for:
- Agent providing medical advice or diagnosis
- Agent recommending specific medications
- Agent making health-related claims

The agent should NOT:
- Diagnose medical conditions
- Recommend treatments or medications
- Provide medical advice of any kind

If the customer asks medical questions, the agent should politely decline and suggest consulting a healthcare professional.

Type

Choose who the guardrail monitors:

Agent only: Monitor only what the agent says
User only: Monitor only what the customer says
Agent and user: Monitor both sides of the conversation
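One way to picture the three types is as a filter over the speakers in a conversation. The sketch below is an assumption-laden illustration: the `Message` class, the speaker labels, and the type identifiers (`agent_only`, `user_only`, `agent_and_user`) are hypothetical and are not taken from the product.

```python
from dataclasses import dataclass


@dataclass
class Message:
    speaker: str  # "agent" or "user" (hypothetical labels)
    text: str


# Hypothetical mapping from guardrail type to the speakers it covers.
SPEAKERS_BY_TYPE = {
    "agent_only": {"agent"},
    "user_only": {"user"},
    "agent_and_user": {"agent", "user"},
}


def messages_in_scope(messages: list[Message], guardrail_type: str) -> list[Message]:
    # Keep only the messages whose speaker the guardrail type monitors.
    speakers = SPEAKERS_BY_TYPE[guardrail_type]
    return [m for m in messages if m.speaker in speakers]
```

With this model, a "user only" guardrail never sees the agent's messages, which is why it suits rules about customer behavior.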

Examples:

"No Medical Advice" → Agent only (monitor what the agent says)
"No Abusive Language" → User only (monitor customer behavior)
"Stay On Topic" → Agent and user (monitor both sides)

Window

The window defines how many of the most recent messages the guardrail evaluates. For example:

-1: All messages (entire conversation history)
1: Only the most recent message
3: Last 3 messages
5: Last 5 messages
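The window values above behave like a slice over the message history. The following is a minimal sketch of that idea; `select_window` is a hypothetical helper, and rejecting values other than -1 or a positive integer is an assumption rather than documented behavior.

```python
def select_window(messages: list[str], window: int) -> list[str]:
    # -1 means the entire conversation history.
    if window == -1:
        return list(messages)
    # Assumption: any other value must be a positive message count.
    if window < 1:
        raise ValueError("window must be -1 or a positive integer")
    # A positive window keeps only the most recent `window` messages.
    return messages[-window:]
```

For a four-message history, `select_window(history, 3)` keeps the last three messages and `select_window(history, -1)` keeps all four.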

When to use different windows:

All messages (-1): For cumulative behavior (e.g., customer frustration building over time)
Recent messages (1-5): For immediate violations (e.g., inappropriate language)

Shorter windows for prompt injection protection

Shorter windows are particularly effective for detecting prompt injection attacks. By only analyzing the most recent messages, the guardrail won't be influenced or manipulated by earlier conversation context that an attacker might try to use.

Strategy

Choose the evaluation strategy:

Latency priority (recommended): Faster processing, suitable for most guardrails
Accuracy priority: More thorough analysis using a more sophisticated model. Use only for critical compliance rules.

Strategy Selection

Use Latency priority for most guardrails. It is faster and works well for general use cases. Reserve Accuracy priority for legal compliance or other high-risk areas where thoroughness is critical.

Actions on trigger

Define what happens when the guardrail is triggered:

Ignore

Flag the violation silently, without intervening
Log the violation
Continue the conversation normally
Useful for monitoring without disrupting conversation flow

Nudge agent

Send a steering message to the agent to correct its behavior
The agent receives guidance but the conversation continues
Customer doesn't know a guardrail triggered

Example nudge message:

Remember: Do not provide medical advice. Suggest that the customer consult a healthcare professional instead.

Handover Call

Immediately transfer to a human agent
Use when the situation requires human intervention
Configure handover type (Phone or Queue) and destination

Configuration options:

Phone: Transfer to a specific phone number
Queue: Transfer to a queue in your contact center system. Enter the queue identifier that matches your contact center configuration (case-sensitive).

When to use handover:

Customer becomes abusive or threatening
Issue is too complex or sensitive
Compliance violation that requires human attention

Terminate Call

End the conversation immediately
Optionally send a message to the customer before ending

Example termination message:

I'm sorry, but I'm unable to continue this conversation. Please contact our support team at support@example.com for assistance.

When to use termination:

Severe policy violations
Threatening or abusive behavior
Fraudulent activity detected
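The four actions take different settings: a nudge carries a message, a handover needs a type and a destination, and a termination message is optional. The sketch below checks a configuration against those requirements. The `ActionConfig` class, the action identifiers, and the field names are hypothetical, and treating the nudge message as mandatory is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical identifiers for the four actions and two handover types.
ACTIONS = {"ignore", "nudge_agent", "handover_call", "terminate_call"}
HANDOVER_TYPES = {"phone", "queue"}


@dataclass
class ActionConfig:
    action: str
    message: Optional[str] = None        # nudge or termination message
    handover_type: Optional[str] = None  # "phone" or "queue"
    destination: Optional[str] = None    # phone number or queue identifier


def validate_action(cfg: ActionConfig) -> list[str]:
    """Return a list of problems; an empty list means the config looks complete."""
    problems = []
    if cfg.action not in ACTIONS:
        problems.append(f"unknown action: {cfg.action}")
    # Assumption: a nudge without a steering message is not useful.
    if cfg.action == "nudge_agent" and not cfg.message:
        problems.append("nudge_agent needs a nudge message")
    if cfg.action == "handover_call":
        if cfg.handover_type not in HANDOVER_TYPES:
            problems.append("handover_call needs a handover type of phone or queue")
        if not cfg.destination:
            problems.append("handover_call needs a destination")
    # terminate_call and ignore need no further settings; the message is optional.
    return problems
```

A check like this can catch an incomplete handover, such as a queue transfer with no queue identifier, before the guardrail is saved.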

Example guardrails

Example 1: No Medical Advice

Name: No medical advice
Type: Agent only
Window: 3 (last 3 messages)
Strategy: Accuracy priority

Rules:
The agent must never provide medical advice, diagnose conditions, or recommend treatments.

If a customer asks medical questions, the agent should:
- Politely explain it cannot provide medical advice
- Suggest consulting a healthcare professional
- Offer to help with non-medical product questions instead

Action: Nudge agent
Nudge Message: "Reminder: Do not provide medical advice. Direct the customer to consult a healthcare professional."

Example 2: Detect customer frustration

Name: Customer frustration
Type: User only
Window: -1 (all messages)
Strategy: Latency priority

Rules:
Monitor for signs of customer frustration, including:
- Expressing anger or dissatisfaction
- Repeating the same question multiple times
- Using frustrated language ("this is ridiculous", "I've told you this already")
- Requesting to speak with a manager or human

Action: Handover call
Handover Action: transfer_to_human_supervisor

Example 3: No discounts or pricing changes

Name: No unauthorized discounts
Type: Agent only
Window: 1 (most recent message)
Strategy: Accuracy priority

Rules:
The agent must never:
- Offer discounts or price reductions
- Promise refunds without following the proper process
- Change pricing or terms
- Make financial commitments on behalf of the company

Only authorized personnel can approve pricing changes.

Action: Nudge agent
Nudge Message: "Do not offer discounts or change pricing. Inform the customer that pricing decisions require manager approval."

Example 4: Stay on topic

Name: Stay on topic
Type: Agent and user
Window: 5 (last 5 messages)
Strategy: Latency priority

Rules:
Conversations should stay focused on customer support topics.

Off-topic includes:
- Personal opinions on politics, religion, or controversial topics
- Unrelated small talk beyond brief pleasantries
- Topics unrelated to our products or services

If the conversation goes off-topic, politely redirect to how you can help with product-related questions.

Action: Nudge agent
Nudge Message: "The conversation has gone off-topic. Politely redirect to how you can assist with their actual needs."

Test cases

Each guardrail can have automated test cases that verify it works correctly. They are especially useful for checking that a guardrail still performs well after you edit it.

Adding test cases

Edit your guardrail
Add test cases with example conversation flows
Specify whether each test should trigger the guardrail or not

Example test cases for "No Medical Advice":

✅ Should trigger:

"Agent: Based on your symptoms, you might have the flu"
"Agent: I recommend taking ibuprofen for the pain"
"Agent: That sounds like a vitamin D deficiency"

❌ Should NOT trigger:

"Agent: I'm not able to provide medical advice, but I can help you with product questions"
"Agent: You should consult your doctor about that"
"Agent: Our product contains vitamin D, but please check with your healthcare provider"

Running tests

After adding test cases:

Click Run tests
Review results to ensure your guardrail correctly identifies violations
Adjust rules if needed and test again
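Conceptually, each test case pairs a conversation snippet with an expected outcome, and a test run reports the cases where the guardrail disagrees. The sketch below models that loop. `GuardrailTestCase`, `run_tests`, and `toy_evaluate` are hypothetical names, and the phrase-matching evaluator is a deliberately crude placeholder for the platform's real evaluation.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class GuardrailTestCase:
    conversation: str
    should_trigger: bool


def run_tests(
    evaluate: Callable[[str], bool], cases: list[GuardrailTestCase]
) -> list[GuardrailTestCase]:
    """Return the cases where the evaluator disagrees with the expectation."""
    return [c for c in cases if evaluate(c.conversation) != c.should_trigger]


# Placeholder evaluator: flags a few phrases that suggest medical advice.
FLAGGED_PHRASES = ("you might have", "i recommend taking", "sounds like a")


def toy_evaluate(conversation: str) -> bool:
    text = conversation.lower()
    return any(phrase in text for phrase in FLAGGED_PHRASES)


cases = [
    GuardrailTestCase("Agent: Based on your symptoms, you might have the flu", True),
    GuardrailTestCase("Agent: I recommend taking ibuprofen for the pain", True),
    GuardrailTestCase("Agent: You should consult your doctor about that", False),
]
failures = run_tests(toy_evaluate, cases)
```

An empty `failures` list means every case behaved as expected. Any entry in it points to a rule that is too strict or too lenient for that conversation.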

Iterate on guardrails

Guardrails may need refinement after deployment. Review triggered guardrails in real conversations to see if they're too strict or too lenient, then adjust accordingly.

You can also turn real conversations into test cases.

Best practices

Start with critical rules

Begin with guardrails for:

Legal compliance requirements
Safety and liability issues
Company policy violations
Customer protection

Use clear, specific rules

Vague rules lead to inconsistent enforcement:

✅ Good: "Agent must not promise delivery dates. Instead, provide estimated delivery ranges and note that delays may occur."

❌ Too vague: "Don't make promises about timing."

Test before deploying

Always test guardrails thoroughly with various scenarios before using them in production.

Monitor triggered guardrails

Regularly review:

How often each guardrail triggers
Whether triggers are accurate (true positives vs false positives)
Whether adjustments are needed
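If you record the outcome of each review, trigger frequency and accuracy can be summarized per guardrail. The sketch below assumes a hypothetical export of `(guardrail name, was the trigger correct)` pairs; the `trigger_stats` function and this data shape are illustrative and not a Stellar feature.

```python
from collections import Counter


def trigger_stats(reviews: list[tuple[str, bool]]) -> dict[str, dict[str, float]]:
    """Summarize reviewed triggers as a count and a precision per guardrail.

    Precision here is true positives divided by all reviewed triggers.
    """
    totals: Counter = Counter()
    correct: Counter = Counter()
    for name, was_correct in reviews:
        totals[name] += 1
        if was_correct:
            correct[name] += 1
    return {
        name: {"triggers": totals[name], "precision": correct[name] / totals[name]}
        for name in totals
    }
```

A guardrail that triggers often with low precision is a candidate for tighter rules or additional test cases.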

Layer multiple guardrails

Use multiple guardrails for comprehensive coverage:

Agent behavior guardrails
Customer behavior guardrails
Topic-specific guardrails
Compliance guardrails

Balance safety and experience

Overly aggressive guardrails can disrupt the conversation flow for the customer. Find the right balance between safety and customer experience.

Reviewing triggered guardrails

When reviewing conversations:

Check which guardrails triggered
Verify the trigger was appropriate
See what action was taken
Adjust guardrail settings if needed
Consider adding the situation as an automated test for the guardrail

Guardrail triggers appear in:

The Playground event feed during testing
Conversation details after completion
Analytics dashboards for aggregate patterns

Next steps

After configuring guardrails:

Test each guardrail with various scenarios in the Playground
Set up evaluation rules to measure conversation quality
Monitor real conversations to tune guardrail sensitivity
