Design a Test You Can Finish This Week · Practical AI Experimentation

Small enough to run; honest enough to trust.

Your control is how you work today (or your weakest usual prompt). Your variant is the change you hypothesize will help: a GCSE-filled prompt, added context, chain-of-thought, or a saved library pattern.

Plan for one sitting: same document, same audience, same success criteria. If the test needs a week of setup, shrink it.

Core principles

Hold constant: the task, source material, audience, and how you score results.
Change one thing: usually the prompt or the context you provide.
Score simply: time to usable draft, number of factual fixes, would-you-send rating (1–5).
Run each version once before optimizing. If you iterate on one version before you have a baseline for both, you can no longer compare them fairly.
Document both outputs. You will forget which was better by tomorrow.
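
The scoring and record-keeping above can be sketched in a few lines of Python. This is a minimal sketch, not a prescribed tool: the names `RunScore` and `compare` are hypothetical, the three fields mirror the measures listed above, and the numbers are made up for illustration rather than taken from a real run.

```python
from dataclasses import dataclass


@dataclass
class RunScore:
    """One run of a prompt, scored with the three simple measures."""
    label: str                # "control" or "variant"
    minutes_to_usable: float  # time to a usable draft
    factual_fixes: int        # number of factual corrections needed
    would_send: int           # would-you-send rating, 1 to 5

    def __post_init__(self):
        # Reject ratings outside the 1-5 scale so a typo cannot skew the result.
        if not 1 <= self.would_send <= 5:
            raise ValueError("would_send must be between 1 and 5")


def compare(control: RunScore, variant: RunScore) -> dict:
    """Return variant minus control for each measure.

    Negative minutes and fixes favor the variant; a positive rating delta favors it.
    """
    return {
        "minutes_delta": variant.minutes_to_usable - control.minutes_to_usable,
        "fixes_delta": variant.factual_fixes - control.factual_fixes,
        "rating_delta": variant.would_send - control.would_send,
    }


# Illustrative numbers only, not real results.
control = RunScore("control", minutes_to_usable=18, factual_fixes=4, would_send=2)
variant = RunScore("variant", minutes_to_usable=11, factual_fixes=1, would_send=4)
print(compare(control, variant))
```

Writing the scores down at the time of the run, in whatever form, is the point; a notebook page or spreadsheet row with the same three fields works equally well.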

Prerequisite: Copilot Basics — GCSE prompting

Check yourself

What must stay constant when comparing two prompts on the same task?

Do this in Copilot

Draft a control and variant prompt for your question from Lesson 1. List what stays constant.

Paste this into Copilot Chat and work through it before moving on.

Draft an experiment plan

I want to test whether Copilot helps with [TASK]. Audience: [WHO]. Success looks like: [METRIC]. Design a 30-minute experiment: control prompt, variant prompt, what I hold constant, and how I score results. Keep it doable in one sitting.
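
If you reuse this template across several tasks, filling the bracketed placeholders can be scripted. The sketch below is one hedged way to do it: `fill_plan_prompt` is a hypothetical helper, and the example values are illustrative, not part of the lesson.

```python
# The template text is copied from the lesson; only the bracketed parts change.
TEMPLATE = (
    "I want to test whether Copilot helps with [TASK]. Audience: [WHO]. "
    "Success looks like: [METRIC]. Design a 30-minute experiment: control prompt, "
    "variant prompt, what I hold constant, and how I score results. "
    "Keep it doable in one sitting."
)


def fill_plan_prompt(task: str, who: str, metric: str) -> str:
    """Replace [TASK], [WHO], and [METRIC] and refuse empty values."""
    values = {"[TASK]": task, "[WHO]": who, "[METRIC]": metric}
    for placeholder, value in values.items():
        if not value.strip():
            raise ValueError(f"{placeholder} is empty")
    prompt = TEMPLATE
    for placeholder, value in values.items():
        prompt = prompt.replace(placeholder, value)
    return prompt


# Hypothetical example values.
prompt = fill_plan_prompt(
    task="turning meeting notes into a follow-up email",
    who="the project team",
    metric="usable without major rewrites",
)
print(prompt)
```

Rejecting empty values keeps a half-filled template from reaching Copilot, where a leftover placeholder would make the control and variant harder to compare.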

GCSE framework
Constraints

Example constant: same meeting notes, same recipient, same length limit.
Example score: "Usable without major rewrites" yes/no plus minutes spent editing.
