Testing, Refining, and Knowing When It's Good Enough
Most people test once, see a reasonable answer, and ship. Those agents fail quietly, slightly wrong answers, poor edge-case handling, used once and abandoned.
Lesson 4
Fifteen queries before you share.
Before sharing, run at least 10–15 queries on the Try It tab, not only questions you are sure it handles.
Testing before you share
Core principles
- Easy queries, must work immediately; if not, fix knowledge or instructions.
- Edge queries, adjacent but out of scope; should decline cleanly (e.g. performance review question to an onboarding agent).
- Hard queries, synthesis or ambiguity; "I am not sure if this is benefits or payroll, who should I ask?"
- When wrong: diagnose, missing knowledge, instructions gap, or ambiguous description, then fix that field and re-test adjacent queries.
- Add uncertainty language to every agent: if not confident from sources, say so explicitly; never speculate.
- Good enough matches deployment scope, personal top-five use cases vs team edge cases vs org-wide soft launch with 2–3 colleagues who try to break it.
Go deeper: Getting Started: verification
Check yourself
What is the right response when a test query produces a wrong answer?
Missing knowledge, an instructions gap, and an ambiguous description each produce different failure signatures. Fixing the wrong field wastes time and can break what was working. Fix one field, re-run the failing query, then test the neighbors.
Do this in Copilot
15 Try It queries including 3+ edge cases; document weak answers; fix at least two via instructions.
Paste each test query into the Try It tab in Agent Builder and work through the full set before moving on.
Edge-case test
For an onboarding-focused agent, this should decline or redirect, not hallucinate. Also test:
Can you help me with my performance review?
- Edge-case testing
- Easy: What do I need to complete in my first week?
- Hard: I am not sure if this is about benefits or payroll, who should I ask?
- After a bad answer: Which field failed, knowledge, instructions, or description? Edit one field, re-run, test three neighbors.
Did you run this in Copilot? Mark complete when you have tried it.
RecordedComplete the practice task before moving on, it changes what you do in the next lesson.
Next lesson: Sharing Your Agent and Building for Others →
Navigate: press j for next lesson, k for previous.