FIELD NOTES · 2026-05-29

The centroid problem:
why AI design regresses to the mean.

Open v0, Bolt, Lovable, Cursor, or Copilot, type a different prompt into each, and you get the same page back: Inter, a violet-to-indigo gradient, three equal cards, a centered hero, rounded-2xl everything. The sameness is not a coincidence and it is not a taste failure. It has a mechanism: the model is returning the centroid of its training set. Once you see that, the fix is obvious, and it is not "prompt better." (For the tool-by-tool version with a real export lint report, see the companion piece on AI-built websites.)

The sameness has a shape

Before the cause, the symptom. Across tools and prompts, AI-generated frontends converge on a recognizable centroid:

Inter (or the system stack) as the only typeface, on every surface, brand or not.
A violet → indigo gradient, usually #6366f1 → #8b5cf6 or a near neighbor.
Three equal cards in a row, each with an icon, a heading, and two lines of body.
A centered hero: centered eyebrow, centered headline, centered subhead, centered button.
rounded-2xl on everything, a soft drop shadow, a faint glass blur.
Emoji standing in for icons. Filler copy built from "leverage," "seamless," "empower."

We catalogue these as machine-checkable rules: the visual fingerprints of generated design. The point here is simpler: this is a distribution with a peak, and almost everything lands on the peak.

Why models converge on one look

1. Regression to the training mean

A generative model is, at heart, a function that returns the most probable continuation. For a landing page, "most probable" is roughly the average of every landing page it has ever seen. The average of millions of Tailwind starter templates, Dribbble shots, and shadcn demos is the violet-gradient, three-card, centered-hero page. The model is not being lazy. It is doing exactly what it was trained to do: return the centroid. Variance is penalized; the mean is rewarded.
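As a toy illustration of the centroid claim, assume each page can be reduced to a small vector of numeric features. The feature names and values below are invented for the sketch, not measured from any corpus. The arithmetic mean is the point with the smallest total squared distance to the samples, so an objective that penalizes squared error pulls toward it:

```python
from statistics import fmean

# Hypothetical pages as (gradient_hue_degrees, corner_radius_px, card_count).
# Hue is treated as a plain number here for simplicity.
pages = [
    (250.0, 16.0, 3.0),
    (245.0, 16.0, 3.0),
    (260.0, 12.0, 3.0),
    (30.0, 0.0, 5.0),   # one outlier: warm, square, five cards
    (255.0, 16.0, 3.0),
]

def centroid(points):
    """Component-wise arithmetic mean of equal-length tuples."""
    return tuple(fmean(axis) for axis in zip(*points))

def total_squared_distance(candidate, points):
    """Sum of squared Euclidean distances from candidate to every point."""
    return sum(
        sum((c - p) ** 2 for c, p in zip(candidate, point))
        for point in points
    )

mean_page = centroid(pages)
loss_at_mean = total_squared_distance(mean_page, pages)

# Loss for each real page, including the outlier, when used as the answer.
losses = [total_squared_distance(page, pages) for page in pages]
```

Every entry in losses is larger than loss_at_mean: under this kind of objective, no individual page, however distinctive, beats the average as an answer.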

2. Everyone shares the same scaffolding

shadcn/ui, Tailwind's default scale, and a handful of component libraries are the substrate under almost every AI builder. They are excellent, and they ship with defaults. When nobody overrides --primary, every app inherits the same primary. When nobody re-spaces the scale, every app breathes at the same rhythm. Shared defaults are shared identity. The tools converge because their foundations already did.

3. Identical briefs produce identical output

"Build me a landing page for my SaaS" carries almost no differentiating signal. A fintech dashboard and a kids' learning app submitted with that prompt are, to the model, the same request, so they get the same answer. The brief is where differentiation has to enter, and a one-line prompt has none to give.

4. There is no memory

Every generation starts from zero. The tool does not know what you accepted last time, what your brand actually is, or that it produced this exact hero for three other users this morning. With no memory, there is no way to move away from the centroid over time. Each run snaps back to the mean.

The sameness is not the model failing. It is the model succeeding at the wrong objective: return the average, with no brief to differentiate and no memory to learn from.

Why "just prompt better" doesn't fix it

Prompt engineering can nudge a single output off the peak. Ask for "brutalist, monochrome, no gradients" and you will get something less generic, once. But three problems remain:

It is not reproducible. The same prompt next week, or in a different tool, drifts back toward the mean.
It gives variance, not direction. You are pushing randomly away from the centroid, not toward a coherent identity derived from your actual brand.
It does not compound. Nothing is learned. The next page is as generic as the first, and you re-fight the same battle every time.

You can win individual skirmishes with prompting. You cannot win the war with it, because the pull toward the mean is structural.

What actually breaks the sameness

If the cause is "no brief, shared defaults, no memory, regress to the mean," the fix is the inverse of each. This is the approach ux-skill takes (deterministic, offline, no LLM in the loop):

Differentiate from a brief, not a prompt

Reduce the brief to seven continuous values (warmth, contrast, density, geometry, formality, motion, type personality) and a fintech dashboard and a kids' app stop being the same request. Those values then compile to a palette, a type ladder, a spacing scale, radii, and motion timings. Two different briefs produce two genuinely different systems, every time, because the math is different.

How the 7 axes map a brief to tokens
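A minimal sketch of what "the math is different" can look like, assuming each axis is a number in [0, 1]. The Axes class, the compile_tokens function, and every formula and threshold below are hypothetical stand-ins chosen for illustration; they are not ux-skill's actual mapping:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Axes:
    """Seven brief axes, each assumed to lie in [0, 1]."""
    warmth: float
    contrast: float
    density: float
    geometry: float
    formality: float
    motion: float
    type_personality: float

# Common modular type-scale ratios (illustrative choice).
TYPE_RATIOS = (1.2, 1.25, 1.333, 1.5)

def compile_tokens(axes: Axes) -> dict:
    """Map axes to tokens with pure arithmetic: same input, same output.

    formality and type_personality are carried but unused in this toy mapping.
    """
    # Higher contrast selects a steeper type ratio.
    ratio_index = min(int(axes.contrast * len(TYPE_RATIOS)), len(TYPE_RATIOS) - 1)
    return {
        # Warmth 0 -> cool blue (220 degrees), warmth 1 -> warm orange (30 degrees).
        "accent_hue": round(220 - 190 * axes.warmth),
        "type_ratio": TYPE_RATIOS[ratio_index],
        # Dense briefs get a 4px spacing base, airy ones 8px.
        "spacing_base_px": 4 if axes.density >= 0.6 else 8,
        # Geometry 0 -> square corners, geometry 1 -> 16px radius.
        "radius_px": round(16 * axes.geometry),
        # More motion -> longer transitions, from 80ms up to 320ms.
        "transition_ms": round(80 + 240 * axes.motion),
    }

# Two hypothetical briefs already reduced to axis values.
fintech = Axes(warmth=0.2, contrast=0.7, density=0.85, geometry=0.2,
               formality=0.9, motion=0.1, type_personality=0.3)
kids_app = Axes(warmth=0.9, contrast=0.4, density=0.3, geometry=0.95,
                formality=0.1, motion=0.8, type_personality=0.9)

fintech_tokens = compile_tokens(fintech)
kids_tokens = compile_tokens(kids_app)
```

Because compile_tokens has no randomness and no hidden state, calling it again with the same Axes returns the same tokens, and the two briefs diverge on spacing, type ratio, radius, and hue rather than collapsing into one answer.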

Use real brands as training data, not templates

The trap is to ship N brand templates and let users pick one: that just moves the centroid; it doesn't remove it. Instead, 160 documented brand systems act as training data: the synthesizer learns the relationships between axes and tokens from them, then generates an unbounded space of new systems. When you name a brand, you get a system faithful to that brand; when you don't, you get something new that was informed by all of them.

Why brand specs are training data, not templates

Lint the fingerprints out

The centroid look is enumerable, so it is checkable. A deterministic linter scans output for the violet gradient, the three-equal-card row, Inter-on-a-brand-surface, emoji icons, centered-everything, and the rest, and fails the build when it finds them. You cannot ship the average if the average does not pass the linter.

A regex linter for AI design slop · Browse the anti-pattern catalogue
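A rough sketch of such a linter using only Python's standard re module. The rule names and patterns are illustrative assumptions built from the fingerprints listed earlier (the two hex values, rounded-2xl, the filler words, emoji); the real catalogue's rules may differ:

```python
import re

# Illustrative rules only; not the actual ux-skill rule set.
RULES = {
    "violet-indigo-gradient": re.compile(r"#6366f1|#8b5cf6", re.IGNORECASE),
    "rounded-2xl": re.compile(r"\brounded-2xl\b"),
    "filler-copy": re.compile(r"\b(leverage|seamless(ly)?|empower(s|ing)?)\b",
                              re.IGNORECASE),
    # Broad emoji and pictograph ranges; coarse, but enough for a sketch.
    "emoji-icon": re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]"),
}

def lint(source: str) -> list[tuple[int, str]]:
    """Return (line_number, rule_id) for every fingerprint found."""
    findings = []
    for line_number, line in enumerate(source.splitlines(), start=1):
        for rule_id, pattern in RULES.items():
            if pattern.search(line):
                findings.append((line_number, rule_id))
    return findings

sample = '''<section class="bg-gradient-to-r from-[#6366f1] to-[#8b5cf6] rounded-2xl">
  <h1>🚀 Empower your team</h1>
  <p>Plain, specific copy about invoices.</p>
</section>'''

findings = lint(sample)
# A build step would fail on a non-zero exit code.
exit_code = 1 if findings else 0
```

Line-level patterns like these cover the token-shaped fingerprints. Structural ones, such as three equal cards in a row or centered-everything, would need checks that look across elements rather than at a single line.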

Close the loop so it compounds

Record which designs scored well and were accepted; let that history re-rank future recommendations. The system moves away from the mean over time instead of snapping back to it on every run. Memory is what turns one good output into a direction.

What closed-loop actually means
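One way to picture the loop, as a sketch under stated assumptions: each design is tagged with a direction label, each outcome carries a score between 0 and 1, and candidates are re-ranked by the average outcome of their direction. The AcceptanceHistory class and the direction labels are hypothetical, not ux-skill's data model:

```python
from collections import defaultdict

class AcceptanceHistory:
    """Records outcomes per design direction and re-ranks candidates by them."""

    def __init__(self):
        self._scores = defaultdict(list)

    def record(self, direction: str, score: float, accepted: bool) -> None:
        # Rejected designs count as zero so they pull the direction down.
        self._scores[direction].append(score if accepted else 0.0)

    def prior(self, direction: str) -> float:
        past = self._scores.get(direction)
        # Unseen directions get a neutral 0.5 so they are still explored.
        return sum(past) / len(past) if past else 0.5

    def rerank(self, candidates: list[str]) -> list[str]:
        # sorted() is stable, so ties keep the generator's original order.
        return sorted(candidates, key=self.prior, reverse=True)

history = AcceptanceHistory()
history.record("violet-gradient-centered", 0.4, accepted=False)
history.record("cool-dense-left-anchored", 0.9, accepted=True)
history.record("cool-dense-left-anchored", 0.8, accepted=True)

ranked = history.rerank([
    "violet-gradient-centered",
    "warm-playful-rounded",
    "cool-dense-left-anchored",
])
```

Here the accepted direction rises to the top, the untried one sits in the middle, and the rejected centroid look sinks to the bottom. The ordering depends on recorded history, which is the part a stateless generator cannot supply.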

A worked contrast

Same request, two paths. The brief: a payments dashboard, tone "serious, dense, trustworthy."

Dimension	Centroid output	Synthesized output
Type	Inter, one size up for h1	Geometric display + Inter body, 1.333 type-scale ratio driven by high contrast
Color	Violet → indigo gradient	Cool, low-warmth palette, single saturated accent
Spacing	Default Tailwind scale	4px base: dense wins over formal in the interaction matrix
Layout	Three equal cards, centered hero	Asymmetric grid, data-dense, left-anchored
Reproducible?	Only by luck	Identical inputs → identical output, every run

The second column is not "more creative." It is more specific, derived from the brief instead of regressed from the corpus. Specificity is the opposite of sameness.

The point

AI builders are not bad at design. They are good at returning the average, and the average has a look. To get something that is yours, you need three things the average can't supply: a brief that differentiates, a generator that creates instead of selects, and a memory that learns. Add those and the violet gradient stops being your default.

To check whether an existing site was generated by AI, see the nine visible tells documented in "How to tell if a website was AI generated"; no code access is required.

ux-skill is MIT-licensed, runs offline, ships no telemetry, and calls no model. Install it into Claude Code, Cursor, Windsurf, and 14 more tools:

pip install uxskill
# or
npx uxskill init