
Compare Outputs Without Fooling Yourself

Humans are bad at neutral comparison. We favor longer answers, a confident tone, and whichever version we saw most recently. Experiments need guardrails against these biases.

Lesson 3

Your brain prefers the last thing it read.

Score against criteria you wrote before seeing the outputs: time to edit, factual errors, fit for the audience, and "Would I send this?"

If you cannot tell which output is better without heavy editing, the experiment is telling you that the task needs clearer prompts or more context, not that the AI failed.

Core principles

  1. Pre-register criteria: write how you will score before you run the variant.
  2. Compare side by side when possible; hide which prompt produced which if you can.
  3. Note surprises: where did each version invent, hedge, or miss your audience?
  4. One win does not mean "always use AI." It means this prompt worked on this task under these conditions.
  5. Share results with one colleague; solo experiments inherit solo blind spots.

Check yourself

Why write scoring criteria before you see the outputs?

Do this in Copilot

Run your control and variant. Score both against your pre-registered criteria. Note the winner and why it won.

Paste both outputs after running your experiment. Review Copilot's scores, but you still own the final call.

Blind comparison helper

I will paste two drafts labeled A and B (you do not know which prompt produced which). Score each on: (1) clarity for [AUDIENCE], (2) factual risks to verify, (3) edits needed before send. Then recommend A or B with one sentence of reasoning.

