Compare Outputs Without Fooling Yourself
Humans are bad at neutral comparison. We favor longer answers, confident tone, and the version we saw most recently. Experiments need guardrails.
Lesson 3
Your brain prefers the last thing it read.
Score against criteria you wrote before seeing outputs: time to edit, factual errors, fit for audience, and whether you would send it as is.
If you cannot tell which output is better without heavy editing, the experiment is telling you the task needs clearer prompts or more context — not that AI failed.
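One way to make the criteria concrete is to write them down as a small scoring record before running anything. The sketch below is illustrative only: the `Score` fields, the 1 to 5 scale for audience fit, and the `compare` helper are assumptions made for this example, not part of the lesson.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Score:
    """One output scored against criteria fixed before the run (hypothetical fields)."""
    minutes_to_edit: float  # lower is better
    factual_errors: int     # lower is better
    audience_fit: int       # assumed scale: 1 (poor) to 5 (excellent)
    would_send: bool        # would I send this as is?


def compare(a: Score, b: Score) -> dict[str, str]:
    """Return the winner per criterion: 'A', 'B', or 'tie'."""
    def pick(a_val, b_val, lower_is_better: bool) -> str:
        if a_val == b_val:
            return "tie"
        a_wins = a_val < b_val if lower_is_better else a_val > b_val
        return "A" if a_wins else "B"

    return {
        "minutes_to_edit": pick(a.minutes_to_edit, b.minutes_to_edit, True),
        "factual_errors": pick(a.factual_errors, b.factual_errors, True),
        "audience_fit": pick(a.audience_fit, b.audience_fit, False),
        "would_send": pick(a.would_send, b.would_send, False),
    }
```

If most criteria come back as `tie`, or every winner still needs heavy editing, that is the signal described above: the prompt or the context needs work.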
Core principles
- Pre-register criteria: write how you will score before you run the variant.
- Compare side by side when possible; hide which prompt produced which if you can.
- Note surprises: where did each version invent facts, hedge, or miss your audience?
- One win does not mean "always use AI"; it means this prompt worked on this task under these conditions.
- Share results with one colleague; solo experiments inherit solo blind spots.
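Hiding which prompt produced which output can be done with a few lines of code. This is a minimal sketch, assuming the two outputs are already saved as strings; `blind_pair` is a hypothetical helper name, and the `seed` parameter exists only so a run can be reproduced.

```python
import random


def blind_pair(control: str, variant: str, seed=None):
    """Shuffle two outputs into labels A and B.

    Returns (drafts, key). Read and score `drafts` first;
    open `key` only after the scores are written down.
    """
    rng = random.Random(seed)
    items = [("control", control), ("variant", variant)]
    rng.shuffle(items)
    drafts = {"A": items[0][1], "B": items[1][1]}
    key = {"A": items[0][0], "B": items[1][0]}
    return drafts, key
```

Having a colleague run the shuffle and hold the key keeps you from peeking, and it also covers the last principle in the list.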
Check yourself
Why write scoring criteria before you see the outputs?
Without pre-registered criteria, you unconsciously reward confident prose or whichever draft you read last. Decide what good means first; then judge both outputs against that bar.
Do this in Copilot
Run your control and variant. Score both against your pre-registered criteria. Note the winner and why.
After running your experiment, paste both outputs into the prompt below. Review Copilot's scores, but you still own the final call.
Blind comparison helper
I will paste two drafts labeled A and B (you do not know which prompt produced which). Score each on: (1) clarity for [AUDIENCE], (2) factual risks to verify, (3) edits needed before send. Then recommend A or B with one sentence of reasoning.
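If you reuse this prompt often, filling the placeholder by hand invites copy-paste slips. The sketch below only assembles the text to paste and does not call Copilot or any API; `build_prompt` and the "Draft A:" / "Draft B:" layout are assumptions for illustration.

```python
PROMPT_TEMPLATE = (
    "I will paste two drafts labeled A and B (you do not know which prompt "
    "produced which). Score each on: (1) clarity for {audience}, "
    "(2) factual risks to verify, (3) edits needed before send. "
    "Then recommend A or B with one sentence of reasoning."
)


def build_prompt(audience: str, draft_a: str, draft_b: str) -> str:
    """Fill in the audience and append both drafts under neutral labels."""
    header = PROMPT_TEMPLATE.format(audience=audience)
    return f"{header}\n\nDraft A:\n{draft_a}\n\nDraft B:\n{draft_b}"
```

Pass the drafts in their shuffled A/B order so the labels carry no hint about which prompt produced which.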