Compare Outputs Without Fooling Yourself · Practical AI Experimentation

Your brain tends to favor whichever output it read last, so a gut-feel comparison is not a fair test.

Score each output against criteria you wrote before seeing either one, for example: time to edit, number of factual errors, fit for the audience, and whether you would send it as is.

If you cannot tell which output is better without heavily editing both, the experiment is telling you the task needs clearer prompts or more context, not that AI failed.

Core principles

Pre-register criteria: write how you will score before you run the variant.
Compare side by side when possible; hide which prompt produced which if you can.
Note surprises: where did each version invent facts, hedge, or miss your audience?
One win does not mean "always use AI"; it means this prompt worked better on this task under these conditions.
Share results with one colleague; solo experiments inherit solo blind spots.
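If you have both drafts as plain text, the blinding step can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a Copilot feature: `blind_pair` is a hypothetical helper that shuffles the control and variant drafts into labels A and B and returns a separate key you open only after scoring.

```python
import random

def blind_pair(control, variant, seed=None):
    """Shuffle two drafts into labels A and B (hypothetical helper).

    Returns (drafts, key): show `drafts` to the reviewer and keep `key`
    hidden until both drafts have been scored.
    """
    rng = random.Random(seed)  # pass a seed only for reproducible demos
    pair = [("control", control), ("variant", variant)]
    rng.shuffle(pair)
    drafts = {"A": pair[0][1], "B": pair[1][1]}
    key = {"A": pair[0][0], "B": pair[1][0]}
    return drafts, key

drafts, key = blind_pair("Draft from prompt 1", "Draft from prompt 2")
for label in ("A", "B"):
    print(f"--- Draft {label} ---\n{drafts[label]}\n")
# Score A and B first; look at `key` only afterwards.
```

Having someone else run the shuffle and hold the key gives you a stronger blind than hiding it from yourself.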

Check yourself

Why write scoring criteria before you see the outputs?

Do this in Copilot

Run your control and variant. Score both against your pre-registered criteria. Note the winner and why.
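Once both drafts are scored, a simple tally keeps the decision tied to the criteria you wrote in advance. The Python sketch below is hypothetical: the criterion names and scales stand in for your own list, and `compare` just counts how many criteria each draft wins, refusing any scores that add or drop a criterion after the fact.

```python
# Pre-registered criteria: written down BEFORE running either prompt.
# Names and scales are illustrative; "lower" means a smaller value is better.
CRITERIA = {
    "minutes_to_edit": "lower",
    "factual_errors": "lower",
    "audience_fit_1_to_5": "higher",
    "would_send": "higher",  # True beats False
}

def compare(scores_a, scores_b, criteria=CRITERIA):
    """Return (winner, wins) where winner is "A", "B", or "tie"."""
    for scores in (scores_a, scores_b):
        if set(scores) != set(criteria):
            # Refuse criteria added or dropped after seeing the outputs.
            raise ValueError("scores must match the pre-registered criteria")
    wins = {"A": 0, "B": 0}
    for name, direction in criteria.items():
        a, b = scores_a[name], scores_b[name]
        if a == b:
            continue  # a tie on one criterion counts for neither draft
        a_better = a < b if direction == "lower" else a > b
        wins["A" if a_better else "B"] += 1
    if wins["A"] == wins["B"]:
        return "tie", wins
    return ("A" if wins["A"] > wins["B"] else "B"), wins

winner, wins = compare(
    {"minutes_to_edit": 5, "factual_errors": 0, "audience_fit_1_to_5": 4, "would_send": True},
    {"minutes_to_edit": 12, "factual_errors": 1, "audience_fit_1_to_5": 4, "would_send": False},
)
print(winner, wins)  # A {'A': 3, 'B': 0}
```

Counting criterion wins is only one choice; you could weight criteria instead. Either way, still write down why the winner won, because the tally alone will not tell a colleague what to change.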

Paste both outputs after running your experiment. Review Copilot's scores, but you still make the final call.

Blind comparison helper

I will paste two drafts labeled A and B (you do not know which prompt produced which). Score each on: (1) clarity for [AUDIENCE], (2) factual risks to verify, (3) edits needed before send. Then recommend A or B with one sentence of reasoning.

