ACTIVITY 08.20.02 · 5 MIN READ

A/B test, measured.

Also called: Split test · Controlled experiment · Conversion test · A/B/n test

Show two versions to comparable traffic, change one thing, and let conversion rate decide, once you have enough data to trust the number.

— TL;DR

Pick one variable. Split traffic evenly. Wait until the sample is large enough that the gap is unlikely to be noise. Then call the winner. For a physical product the test usually lives on the landing page, price, imagery and packaging, not the hardware.

• • •

What an A/B test is

An A/B test shows version A to half your visitors and version B to the other half, at the same time, drawn from the same source, and compares one outcome you actually care about. For a pre-order page that outcome is conversion: of the people who landed, how many committed. Everything else is decoration.
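One common way to get that even, simultaneous split is to hash a visitor identifier, so the same person always lands on the same version. The sketch below is a minimal illustration, not how the proofing box page was built; the function name, the `experiment` label and the `visitor-N` ids are all hypothetical.

```python
import hashlib


def assign_variant(visitor_id: str, experiment: str = "price-test") -> str:
    """Return "A" or "B" for a visitor, with the same answer on every call."""
    # Including the experiment name means a visitor is not stuck in the
    # same arm across every test you ever run (hypothetical naming scheme).
    key = f"{experiment}:{visitor_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # The low bit of the first hash byte behaves like a fair coin.
    return "A" if digest[0] % 2 == 0 else "B"


if __name__ == "__main__":
    counts = {"A": 0, "B": 0}
    for i in range(10_000):
        counts[assign_variant(f"visitor-{i}")] += 1
    print(counts)  # the two counts should land close to 5,000 each
```

Hashing rather than alternating visitors keeps returning visitors in one arm, which matters when people come back to a pre-order page several times before committing.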

Two rules carry the whole method. First, change one variable. If B has a new price and a new headline and a new photo, and B wins, you have learned that the bundle beat the old bundle. You have not learned which change did the work, so you cannot reuse it. One variable per test, every time.

Second, gather enough data before you call it. A conversion rate built on 30 visitors is almost pure noise. Flip a fair coin 30 times and you will see a split of 60/40 or more lopsided more than a third of the time; that does not make the coin biased. The same statistics apply to your two pages. You need a sample large enough that the difference you are seeing is unlikely to be chance, and you need to decide that sample size before you start, not when the result starts looking the way you hoped.
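The coin-flip point can be checked exactly rather than taken on trust. This short sketch sums the binomial probabilities for a fair coin; the function name and the 3,000-flip comparison are my own illustration.

```python
from math import comb


def prob_lopsided(n_flips: int, majority: int) -> float:
    """Chance a fair coin gives at least `majority` heads OR at least
    `majority` tails in `n_flips` flips."""
    if 2 * majority <= n_flips:
        return 1.0  # every outcome has one side at or above the threshold
    upper_tail = sum(comb(n_flips, k) for k in range(majority, n_flips + 1))
    # Heads and tails are symmetric, so double the upper tail.
    return 2 * upper_tail / 2 ** n_flips


if __name__ == "__main__":
    # 18 of 30 is a 60/40 split: this comes out at about 0.36.
    print(round(prob_lopsided(30, 18), 2))
    # The same 60/40 split over 3,000 flips is vanishingly unlikely.
    print(prob_lopsided(3000, 1800))
```

The same 60/40 gap that means nothing at 30 flips would be overwhelming evidence at 3,000. That is the whole argument for fixing the sample size first.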

In my experience the most expensive A/B mistake is not picking the wrong winner. It is peeking early, seeing B ahead after a day, switching everyone to B, and never finding out the lead was random. Set the stopping rule first. Then hold to it.
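A small simulation makes the cost of peeking concrete. It runs an A/A test (both pages convert at the same true rate, so any "winner" is a false alarm) and compares a tester who checks at ten evenly spaced looks with one who reads the result once at the end. Every number here is an assumption chosen for illustration: the 4% true rate, 1,000 visitors per arm, ten looks and 400 simulated tests.

```python
import random
from math import sqrt
from statistics import NormalDist


def p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value from a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    if pooled in (0.0, 1.0):
        return 1.0  # no variation at all, so nothing to detect
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (conv_a / n_a - conv_b / n_b) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_positive_rates(true_rate=0.04, per_arm=1000, looks=10,
                         runs=400, seed=1):
    """Return (rate with peeking, rate with one final read) for A/A tests."""
    rng = random.Random(seed)
    step = per_arm // looks
    peeking_hits = final_hits = 0
    for _ in range(runs):
        conv_a = conv_b = 0
        ever_significant = False
        significant = False
        for look in range(1, looks + 1):
            conv_a += sum(rng.random() < true_rate for _ in range(step))
            conv_b += sum(rng.random() < true_rate for _ in range(step))
            n = look * step
            significant = p_value(conv_a, n, conv_b, n) < 0.05
            if significant:
                ever_significant = True  # a peeker would have called it here
        peeking_hits += ever_significant
        final_hits += significant  # only the last look counts
    return peeking_hits / runs, final_hits / runs


if __name__ == "__main__":
    peeking, single_read = false_positive_rates()
    print(f"false winners with peeking:   {peeking:.1%}")
    print(f"false winners with one read:  {single_read:.1%}")
```

The peeking rate can never be lower than the single-read rate, because the final look is one of the peeks. In runs like this it is typically several times higher.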

A worked test · the proofing box

We ran three tests on the proofing box DTC pre-order page, one variable at a time, splitting the Sourdough School traffic so each version saw a comparable audience. Here is the price test in full, so you can see the shape of a result worth trusting rather than a generic template.

A/B tests · the proofing box
What we varied: Price only. Version A at £149, version B at £159. Same hero image, same headline, same page.
Sample: 1,840 visitors from the Sourdough School list, split close to 50/50, run over eleven days until the planned sample was reached.
Result: £149 pre-ordered at 4.1%; £159 at 2.8%. The higher price lost roughly a third of the conversions, on near-identical traffic.
Significance: At that sample the gap clears the usual 95% confidence bar (p well under 0.05). Unlikely to be chance, so worth acting on.
Decision: Launch at £149. The extra £10 cost more in lost orders than it earned per unit on a £38–55 bill of materials.

The in-use hero image then beat the studio shot, and the no-app headline beat the reliability headline, each tested on its own afterwards. Three clean tests, three single variables, three numbers we could defend. Run together they would have told us nothing.

Calling it on noise, or doing it properly

✕  Calling a winner on 30 visits
  • Stops the test after 30 visitors because B is “clearly” ahead.
  • Changes price, photo and headline in the same version.
  • Peeks daily and switches the moment a lead appears.
  • Reports a percentage with no idea if it is significant.
✓  One variable, enough data, real signal
  • Sets the target sample size before the test starts.
  • Changes exactly one thing between A and B.
  • Runs to the planned sample, then reads the result once.
  • Checks the gap clears a 95% confidence bar before acting.

The left column feels faster and is usually wrong. The right column is slower by a few days and tells you something you can actually build on. On a launch decision worth thousands of units, the few days are cheap.
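Setting the target sample before the test starts usually means a power calculation. The sketch below uses the standard normal-approximation formula for comparing two proportions. The 5% baseline, 7% target, 80% power and 300 visitors a day are hypothetical inputs, not figures from the proofing box tests.

```python
from math import ceil, sqrt
from statistics import NormalDist


def visitors_per_arm(baseline: float, target: float,
                     alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate visitors needed in EACH version to detect a move from
    `baseline` to `target` conversion with a two-sided test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # 95% confidence bar
    z_power = NormalDist().inv_cdf(power)          # chance of catching a real gap
    p_bar = (baseline + target) / 2
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_power * sqrt(baseline * (1 - baseline) + target * (1 - target))
    ) ** 2
    return ceil(numerator / (target - baseline) ** 2)


def days_to_reach(per_arm: int, daily_visitors: int) -> int:
    """Days of traffic needed to fill both arms at an even split."""
    return ceil(2 * per_arm / daily_visitors)


if __name__ == "__main__":
    n = visitors_per_arm(0.05, 0.07)  # hypothetical: 5% baseline, hoping for 7%
    print(n, "visitors per arm")      # a little over 2,200 with these inputs
    print(days_to_reach(n, 300), "days at 300 visitors a day")
```

Smaller gaps need far more traffic, because the required sample grows with the inverse square of the difference you want to detect. If the answer is months rather than days, that tells you something before you have spent a single visitor.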

How it fits the bigger picture

A/B test is activity 08.20.02 in the framework, inside Stage 08 Develop. It feeds straight into fit, form & function (08.20.03), where the validated price, image and message inform how the physical product and its presentation are finalised before manufacture.

The ten stages: 01 Idea · 02 Discover · 03 Innovate · 04 Evaluate · 05 Define · 06 Design · 07 Engineer · 08 Develop (you are here) · 09 Manufacture · 10 Deliver

What it can do

It replaces an argument about price, imagery or wording with a number drawn from real buyers. Done one variable at a time, it tells you not just which version won but by how much, so you can weigh a 1.3-point conversion lift against a £10 margin difference and decide on evidence rather than instinct.

What it can’t do

It can’t test more than one change at once and still give you a clean answer, and it can’t manufacture a signal that isn’t there. If your traffic is too thin to reach a meaningful sample in a sensible window, an A/B test will only dress up a coin-flip as a finding. Sometimes the honest answer is that you don’t have the volume to test yet.

See the full 10-stage process →

Try it yourself

Take your pre-order or product page. Pick the one variable you most want resolved: price, hero image, headline or packaging shot. Build two versions identical but for that one thing. Decide your target sample and stopping rule before you launch. Split traffic evenly, run it to the planned size, and read conversion once. If the gap doesn’t clear a 95% confidence bar, treat it as “no difference proven” and move on.
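For the final read, a pooled two-proportion z-test is one common way to check whether a gap clears the 95% bar. This is a minimal sketch, not necessarily the test used on the proofing box page. The counts in the example are hypothetical, since a real check needs the raw conversions and visitors per version rather than rounded percentages.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Return (z, two-sided p-value) for the gap between two conversion rates."""
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0, 1.0  # nobody (or everybody) converted: no evidence either way
    z = (rate_a - rate_b) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


if __name__ == "__main__":
    # Hypothetical counts: 120 of 2,000 on version A, 90 of 2,000 on version B.
    z, p = two_proportion_test(120, 2000, 90, 2000)
    verdict = "clears the 95% bar" if p < 0.05 else "no difference proven"
    print(f"z = {z:.2f}, p = {p:.3f}: {verdict}")
```

Run it once, at the planned sample. Running it daily and stopping on the first p below 0.05 is the peeking mistake in a different form.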

Want help framing what to test first? Start the Free Sprint → and the GPT will help you find the riskiest assumption worth a split test.

Project notes: £149 beat £159

  From the notebook · optional reading

Three split tests on the proofing box pre-order page in Stockport, the temptation to call it early, and the £10 that nearly cost a third of the orders.

3 min read

Dan and Anna Hartley wanted to launch the proofing box at £159. The hunch was reasonable: premium ceramic, a UKCA and BS EN 61010 build, why not. We pushed back. “Don’t guess the price. Split it.” They had the Sourdough School audience to draw on, which is exactly the controlled traffic source a split test needs.

Test 1: price

One variable, price, everything else held still. £149 against £159. After two days £149 was ahead and Dan wanted to call it. I asked him to wait: “We agreed 1,800-odd visitors and eleven days. We’re at 300. That lead could vanish.” It didn’t, as it turned out, but it could have, and the discipline mattered more than this one outcome. At the planned sample £149 sat at 4.1% and £159 at 2.8%, comfortably past the 95% bar.

The maths was blunt. On a £38–55 bill of materials the extra £10 of price added margin per unit, but lost roughly a third of the orders. Fewer units at a fatter margin lost to more units at £149. We launched at £149.
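That arithmetic can be laid out per 1,000 visitors. The sketch assumes gross margin is simply price minus bill of materials, ignoring VAT, shipping, payment fees and returns, which is a simplification of mine rather than the project's actual costing. The prices, conversion rates and the £38–55 range come from the test above.

```python
def margin_per_1000_visitors(price: float, conversion_rate: float,
                             bom_cost: float) -> float:
    """Gross margin earned from 1,000 visitors (price minus BOM only)."""
    orders = 1000 * conversion_rate
    return orders * (price - bom_cost)


if __name__ == "__main__":
    for bom in (38, 55):  # both ends of the bill-of-materials range
        at_149 = margin_per_1000_visitors(149, 0.041, bom)
        at_159 = margin_per_1000_visitors(159, 0.028, bom)
        print(f"BOM £{bom}: £149 earns £{at_149:,.0f}, £159 earns £{at_159:,.0f}")
```

At a £38 bill of materials that is roughly £4,551 against £3,388 per 1,000 visitors, and at £55 roughly £3,854 against £2,912. The lower price wins at both ends of the range.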

Tests 2 and 3: image, then headline

Only after the price was settled did we test the hero image: the box in use on a kitchen counter against a clean studio shot. In-use won. Then, separately, the headline: the no-app promise against the reliability promise. No-app won, and not by a little.

The thing I keep coming back to: if we’d changed price, image and headline all at once to “save time”, we’d have a winning page and no idea why. Three clean tests cost about three weeks and gave us three reusable findings. Anna now tests packaging the same way for the year-two range.

— Develop stage, project notes, 2026

— Next in Develop → Fit, form & function