Guardrails
Guardrails are safety and compliance rules that monitor conversations and take action when certain conditions are met. They help ensure your agent behaves appropriately and stays within defined boundaries.
Stellar takes a unique approach to guardrails by running all checks in parallel with the conversation flow. This means your agent can maintain natural, low-latency interactions while still being monitored for safety and compliance, without adding delays to response times.
What guardrails do
Guardrails continuously monitor conversations for:
- Inappropriate language or topics
- Policy violations
- Sensitive information disclosure
- Customer frustration or dissatisfaction
- Agent behavior outside defined parameters
When a guardrail detects a violation, it can either:
- Nudge the agent back on track
- Transfer to a human agent
- End the conversation
- Simply log the violation for review
Creating a guardrail
To create a guardrail:
- Navigate to the Guardrails tab
- Click Add Guardrail
- Configure the guardrail settings
Guardrail configuration
Name and rules
- Name: Descriptive name for this guardrail (e.g., "No Medical Advice")
- Rules: Natural language description of what to monitor for
Example rules:
Monitor for:
- Agent providing medical advice or diagnosis
- Agent recommending specific medications
- Agent making health-related claims
The agent should NOT:
- Diagnose medical conditions
- Recommend treatments or medications
- Provide medical advice of any kind
If the customer asks medical questions, the agent should politely decline and suggest consulting a healthcare professional.
Type
Choose who the guardrail monitors:
- Agent only: Monitor only what the agent says
- User Only: Monitor only what the customer says
- Agent and User: Monitor both sides of the conversation
Examples:
- "No Medical Advice" → Agent Only (monitor what agent says)
- "No Abusive Language" → User Only (monitor customer behavior)
- "Stay On Topic" → Agent and User (monitor both sides)
Window
The window defines how much conversation history to consider. For example:
- -1: All messages (entire conversation history)
- 1: Only the most recent message
- 3: Last 3 messages
- 5: Last 5 messages
When to use different windows:
- All messages (-1): For cumulative behavior (e.g., customer frustration building over time)
- Recent messages (1-5): For immediate violations (e.g., inappropriate language)
Shorter windows are particularly effective for detecting prompt injection attacks. By only analyzing the most recent messages, the guardrail won't be influenced or manipulated by earlier conversation context that an attacker might try to use.
Strategy
Choose the evaluation strategy:
- Latency priority: Faster processing, good for most use cases
- Quality priority: More thorough analysis using a more sophisticated model, use for critical compliance rules
Use Latency priority for general guardrails. Reserve Quality priority for legal compliance or high-risk areas where accuracy is critical.
Actions on trigger
Define what happens when the guardrail is triggered:
Ignore
- Silently enforce the rule
- Log the violation
- Continue the conversation normally
- Useful for monitoring without disrupting conversation flow
Nudge agent
- Send a steering message to the agent to correct its behavior
- The agent receives guidance but the conversation continues
- Customer doesn't know a guardrail triggered
Example nudge message:
Remember: Do not provide medical advice. Suggest the customer to consult a healthcare professional instead.
Handover Call
- Immediately transfer to a human agent
- Use when the situation requires human intervention
- Select which handover action to use
When to use handover:
- Customer becomes abusive or threatening
- Issue is too complex or sensitive
- Compliance violation that requires human attention
Terminate Call
- End the conversation immediately
- Optional message to the customer before ending
Example termination message:
I'm sorry, but I'm unable to continue this conversation. Please contact our support team at support@example.com for assistance.
When to use termination:
- Severe policy violations
- Threatening or abusive behavior
- Fraudulent activity detected
Example guardrails
Example 1: No Medical Advice
Name: No medical advice
Type: Agent only
Window: 3 (last 3 messages)
Strategy: Quality priority
Rules:
The agent must never provide medical advice, diagnose conditions, or recommend treatments.
If a customer asks medical questions, the agent should:
- Politely explain it cannot provide medical advice
- Suggest consulting a healthcare professional
- Offer to help with non-medical product questions instead
Action: Nudge agent
Nudge Message: "Reminder: Do not provide medical advice. Direct the customer to consult a healthcare professional."
Example 2: Detect customer frustration
Name: Customer frustration
Type: User only
Window: -1 (all messages)
Strategy: Latency priority
Rules:
Monitor for signs of customer frustration, including:
- Expressing anger or dissatisfaction
- Repeating the same question multiple times
- Using frustrated language ("this is ridiculous", "I've told you this already")
- Requesting to speak with a manager or human
Action: Handover call
Handover Action: transfer_to_human_supervisor
Example 3: No discounts or pricing changes
Name: No unauthorized discounts
Type: Agent only
Window: 1 (most recent message)
Strategy: Quality priority
Rules:
The agent must never:
- Offer discounts or price reductions
- Promise refunds without following the proper process
- Change pricing or terms
- Make financial commitments on behalf of the company
Only authorized personnel can approve pricing changes.
Action: Nudge agent
Nudge Message: "Do not offer discounts or change pricing. Inform the customer that pricing decisions require manager approval."
Example 4: Stay on topic
Name: Stay on topic
Type: Agent and user
Window: 5 (last 5 messages)
Strategy: Latency priority
Rules:
Conversations should stay focused on customer support topics.
Off-topic includes:
- Personal opinions on politics, religion, or controversial topics
- Unrelated small talk beyond brief pleasantries
- Topics unrelated to our products or services
If conversation goes off-topic, politely redirect to how you can help with product-related questions.
Action: Nudge agent
Nudge Message: "The conversation has gone off-topic. Politely redirect to how you can assist with their actual needs."
Test cases
Each guardrail can have automated test cases to verify it works correctly. This is useful to test if a guardrail still performs well after making edits to it.
Adding test cases
- Edit your guardrail
- Add test cases with example conversation flows
- Specify whether each test should trigger the guardrail or not
Example test cases for "No Medical Advice":
✅ Should trigger:
- "Agent: Based on your symptoms, you might have the flu"
- "Agent: I recommend taking ibuprofen for the pain"
- "Agent: That sounds like a vitamin D deficiency"
❌ Should NOT trigger:
- "Agent: I'm not able to provide medical advice, but I can help you with product questions"
- "Agent: You should consult your doctor about that"
- "Agent: Our product contains vitamin D, but please check with your healthcare provider"
Running tests
After adding test cases:
- Click Run tests
- Review results to ensure your guardrail correctly identifies violations
- Adjust rules if needed and test again
Guardrails may need refinement after deployment. Review triggered guardrails in real conversations to see if they're too strict or too lenient, then adjust accordingly.
You can take real conversations and use them as test cases.
Best practices
Start with critical rules
Begin with guardrails for:
- Legal compliance requirements
- Safety and liability issues
- Company policy violations
- Customer protection
Use clear, specific rules
Vague rules lead to inconsistent enforcement:
✅ Good: "Agent must not promise delivery dates. Instead, provide estimated delivery ranges and note that delays may occur."
❌ Too vague: "Don't make promises about timing."
Test before deploying
Always test guardrails thoroughly with various scenarios before using them in production.
Monitor triggered guardrails
Regularly review:
- How often each guardrail triggers
- Whether triggers are accurate (true positives vs false positives)
- If adjustments are needed
Layer multiple guardrails
Use multiple guardrails for comprehensive coverage:
- Agent behavior guardrails
- Customer behavior guardrails
- Topic-specific guardrails
- Compliance guardrails
Balance safety and experience
Overly aggressive guardrails can disrupt the conversation flow for the customer. Find the right balance between safety and customer experience.
Reviewing triggered guardrails
When reviewing conversations:
- Check which guardrails triggered
- Verify the trigger was appropriate
- See what action was taken
- Adjust guardrail settings if needed
- Consider adding the situation as an automated test for the guardrail
Guardrail triggers appear in:
- The Playground event feed during testing
- Conversation details after completion
- Analytics dashboards for aggregate patterns
Next Steps
After configuring guardrails:
- Test each guardrail with various scenarios in the Playground
- Set up evaluation rules to measure conversation quality
- Monitor real conversations to tune guardrail sensitivity