v3.0 · THE BRAIN · 2026-05-28

A design engine that learns offline,
and never calls an LLM.

Every other "generative design" tool reaches for an API. ux-skill v3.0 doesn't. Same brief always produces the same output. Reproducible across machines. Yet the system genuinely learns from its own decisions log over time. Here's how the math holds.

Why no LLM in the loop

An LLM in the generation path means:

Non-determinism: same prompt, slightly different output today vs. yesterday
Cost: every uxskill synthesize call costs money
Latency: network roundtrip, ~1-3 seconds per call
Rate limits: your CI breaks when you hit the cap
Vendor lock-in: your "design engine" is actually OpenAI's or Anthropic's
Privacy: your brief content goes to a third party

v3.0 has none of this. uxskill synthesize finishes in single-digit milliseconds. No network. No API key required. Works offline on a plane.

How a "generative" engine works without an LLM

The trick is in the layers:

Layer 1: Brief to axes (pure math)

Industry has a 7-axis seed dictionary. Tone tags push axes by ±0.10 to ±0.30. Forbidden tags clamp axes. All deterministic dict lookups + float adds. ~50 microseconds.

brief = Brief(industry="fintech-payments", tone=["bold", "serious"])
axes = compute_axes(brief)
# AxisValues(warmth=0.35, contrast=0.80, density=0.6,
#            geometry=0.4, formality=0.95, motion=0.55,
#            type_personality=0.4)
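To make Layer 1 concrete, here is a minimal sketch of the seed-plus-deltas idea. Everything in it is an assumption for illustration: the table names (INDUSTRY_SEEDS, TONE_DELTAS, FORBIDDEN_CAPS), the forbidden field on Brief, the "playful" tag, and every number, which were picked only so the example brief lands on the axis values shown above. The real dictionaries may look quite different.

```python
from dataclasses import dataclass, field

AXES = ("warmth", "contrast", "density", "geometry",
        "formality", "motion", "type_personality")

# Hypothetical tables: illustrative values, not the shipped dictionaries.
INDUSTRY_SEEDS = {
    "fintech-payments": {
        "warmth": 0.35, "contrast": 0.60, "density": 0.60, "geometry": 0.40,
        "formality": 0.75, "motion": 0.45, "type_personality": 0.40,
    },
}
TONE_DELTAS = {
    "bold": {"contrast": 0.20, "motion": 0.10},
    "serious": {"formality": 0.20},
}
# Assumed shape: a forbidden tag caps an axis at a maximum value.
FORBIDDEN_CAPS = {"playful": {"motion": 0.50}}


@dataclass
class Brief:
    industry: str
    tone: list = field(default_factory=list)
    forbidden: list = field(default_factory=list)


def compute_axes(brief):
    """Seed lookup, then tone deltas, then forbidden caps, then clamp to [0, 1]."""
    axes = dict(INDUSTRY_SEEDS[brief.industry])
    for tag in brief.tone:
        for axis, delta in TONE_DELTAS.get(tag, {}).items():
            axes[axis] += delta
    for tag in brief.forbidden:
        for axis, cap in FORBIDDEN_CAPS.get(tag, {}).items():
            axes[axis] = min(axes[axis], cap)
    # Rounding hides float-add noise so equal briefs print equal values.
    return {a: round(min(1.0, max(0.0, axes[a])), 4) for a in AXES}


print(compute_axes(Brief(industry="fintech-payments", tone=["bold", "serious"])))
```

No randomness, no I/O, no clock: the same Brief goes in, the same dict comes out.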

Layer 2: Axes to exemplars (sorting)

Compute 7-D Euclidean distance from the brief's axes to each brand's category-seed axes. Sort by distance. Tie-break by brand id alphabetically (deterministic). Top 8 win.

exemplars = pick_exemplars_by_axes(axes, n=8)
# [{"id": "stripe", ...}, {"id": "linear", ...}, {"id": "datadog", ...}, ...]
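A rough sketch of what the ranking step could look like. Assumptions: this version takes the brand list as an explicit argument and returns ids only (the call above takes just axes and n and returns dicts), the flat helper is made up for building test data, and the brand axis values are invented.

```python
import math

AXES = ("warmth", "contrast", "density", "geometry",
        "formality", "motion", "type_personality")


def axis_distance(a, b):
    """Euclidean distance across the seven axes."""
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in AXES))


def pick_exemplars_by_axes(axes, brands, n=8):
    """Rank brands by distance to the brief, break ties by id, keep the top n."""
    scored = [(axis_distance(axes, b["axes"]), b["id"]) for b in brands]
    scored.sort(key=lambda t: (t[0], t[1]))
    return [brand_id for _, brand_id in scored[:n]]


def flat(value, **overrides):
    """Hypothetical helper: all axes at one value, with a few overrides."""
    axes = {k: value for k in AXES}
    axes.update(overrides)
    return axes


# Invented axis values; stripe and linear tie on purpose.
brands = [
    {"id": "stripe", "axes": flat(0.5, contrast=0.8)},
    {"id": "linear", "axes": flat(0.5, contrast=0.8)},
    {"id": "datadog", "axes": flat(0.5, contrast=0.2)},
]
target = flat(0.5, contrast=0.8)

print(pick_exemplars_by_axes(target, brands))
print(pick_exemplars_by_axes(target, list(reversed(brands))))
```

Both calls print the same list, because the tied pair is ordered by id rather than by position in the input.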

Layer 3: Exemplars to vocabulary (extraction)

For each chosen exemplar, pull palette anchors, type stack, radius signal, spacing signal. Combine into a Vocabulary. Pure dict access. No NLP.
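As a sketch of how plain that extraction can be, assuming each exemplar is a dict with hypothetical keys palette, type_stack, radius, and spacing (the real field names are not shown here, and the sample values are invented):

```python
def build_vocabulary(exemplars):
    """Collect the four signals from each exemplar, in exemplar order."""
    vocab = {"palette_anchors": [], "type_stacks": [], "radius": [], "spacing": []}
    for ex in exemplars:
        vocab["palette_anchors"].extend(ex.get("palette", []))
        if "type_stack" in ex:
            vocab["type_stacks"].append(ex["type_stack"])
        if "radius" in ex:
            vocab["radius"].append(ex["radius"])
        if "spacing" in ex:
            vocab["spacing"].append(ex["spacing"])
    return vocab


# Hypothetical exemplar records for illustration.
exemplars = [
    {"id": "stripe", "palette": ["#635bff", "#0a2540"], "type_stack": "sans",
     "radius": 8, "spacing": 4},
    {"id": "linear", "palette": ["#5e6ad2"], "type_stack": "sans", "radius": 6},
]
vocab = build_vocabulary(exemplars)
print(vocab["palette_anchors"])
```

Because the exemplars arrive already sorted, the vocabulary lists come out in a fixed order too.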

Layer 4: Vocabulary + axes to tokens (math)

Weighted RGB mixing of the palette anchors, plus a warmth shift, yields the canvas / ink / primary hex codes. A modular scale at ratio 1.2, 1.25, or 1.333 yields the size ladder. The spacing base comes from interactions.spacing_base_for(axes) (the documented conflict-resolution matrix). All deterministic.
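The two pieces of arithmetic here, weighted RGB mixing and a modular scale, fit in a few lines. This is a sketch under assumptions: the function names mix and modular_scale are made up, the weights are arbitrary, and the warmth shift is left out because its formula isn't given.

```python
def hex_to_rgb(hex_code):
    """'#rrggbb' -> (r, g, b) as ints 0..255."""
    h = hex_code.lstrip("#")
    return tuple(int(h[i:i + 2], 16) for i in (0, 2, 4))


def rgb_to_hex(rgb):
    return "#{:02x}{:02x}{:02x}".format(*rgb)


def mix(anchors):
    """Weighted per-channel average of (hex_code, weight) pairs."""
    total = sum(weight for _, weight in anchors)
    channels = [0.0, 0.0, 0.0]
    for hex_code, weight in anchors:
        for i, value in enumerate(hex_to_rgb(hex_code)):
            channels[i] += value * weight / total
    return rgb_to_hex(tuple(round(c) for c in channels))


def modular_scale(base_px, ratio, steps):
    """Size ladder: base * ratio**i for i in 0..steps-1."""
    return [round(base_px * ratio ** i, 2) for i in range(steps)]


print(mix([("#000000", 3), ("#ffffff", 1)]))  # three parts black, one part white
print(modular_scale(16, 1.25, 4))
```

Mixing three parts black with one part white gives #404040, and a 16px base at ratio 1.25 gives 16, 20, 25, 31.25.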

Where determinism could leak (and why it doesn't)

If load_brands() returned brands in filesystem order, two brands at the same distance would keep whatever order the directory listing happened to give them, and that order differs between machines. v3.0 fixes this with an explicit tie-break:

scored.sort(key=lambda t: (t[0], t[1]))  # (distance, brand_id_alphabetical)

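If compute_axes() depended on hash values or on set iteration order (Python randomizes string hashes per process by default), the axis values could drift between runs. It doesn't: it does plain keyed dict lookups and float adds.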

Two brands can tie on 7-D distance, but they can never tie on id, because ids are unique. So the sort key (distance, id) is a total order, and the ranking never depends on the order the brands were loaded in.
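One way to convince yourself: feed the sort every possible input order and count the distinct results. The distances below are invented and the brand ids are only sample strings; the point is the comparison between the two sort keys.

```python
import itertools

# (distance, brand_id) pairs; stripe and linear tie on distance.
scored = [(0.25, "stripe"), (0.25, "linear"), (0.10, "datadog")]


def rank_with_tiebreak(items):
    """Sort by (distance, id): a total order because ids are unique."""
    return tuple(sorted(items, key=lambda t: (t[0], t[1])))


def rank_distance_only(items):
    """Sort by distance alone: ties keep their input order (stable sort)."""
    return tuple(sorted(items, key=lambda t: t[0]))


with_tiebreak = {rank_with_tiebreak(p) for p in itertools.permutations(scored)}
distance_only = {rank_distance_only(p) for p in itertools.permutations(scored)}

print(len(with_tiebreak))  # one ranking, whatever the load order
print(len(distance_only))  # more than one: the tie leaks input order
```

With the id in the key, all six input orders collapse to a single ranking. Without it, the tied pair comes out in whichever order it went in.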

How does it "learn" then?

The decisions ledger. Every call appends one JSONL line to .ux/decisions.jsonl:

{"_v": 1, "ts": 1727380123.4, "command": "design",
 "industry": "fintech-payments", "ui_type": "dashboard",
 "picked_brand": "stripe", "picked_style": "swiss-grid",
 "lint_score": 92, "user_accepted": true, ...}

The recommender, on the next call, reads this log filtered by (industry, ui_type) bucket. For each candidate, it counts how often that candidate's id appears in lines where lint_score >= 80 AND user_accepted = true. Each match adds +5 to the candidate's score before sorting.

Cold-start safe: fewer than 3 priors in a bucket = no-op. Behaves exactly like v2.0. Above the threshold, history bumps winning combinations to the top.
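A minimal sketch of that scoring bump, under stated assumptions: the function name history_bonus is made up, "priors" is read as all logged rows in the bucket (it could equally mean qualifying rows only), and candidates are matched against the picked_brand and picked_style fields from the example line above.

```python
import json

MIN_PRIORS = 3   # below this, history is ignored (cold start)
BONUS = 5        # added per qualifying match


def history_bonus(log_lines, industry, ui_type, candidate_ids):
    """Return {candidate_id: bonus} from JSONL decision lines."""
    bucket = []
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if record.get("industry") == industry and record.get("ui_type") == ui_type:
            bucket.append(record)

    bonus = {cid: 0 for cid in candidate_ids}
    if len(bucket) < MIN_PRIORS:
        return bonus  # cold start: no-op

    for record in bucket:
        if record.get("lint_score", 0) >= 80 and record.get("user_accepted") is True:
            for key in ("picked_brand", "picked_style"):
                if record.get(key) in bonus:
                    bonus[record[key]] += BONUS
    return bonus


def make_line(brand, score, accepted):
    """Hypothetical helper that builds one log line for the demo."""
    return json.dumps({"_v": 1, "industry": "fintech-payments", "ui_type": "dashboard",
                       "picked_brand": brand, "picked_style": "swiss-grid",
                       "lint_score": score, "user_accepted": accepted})


log = [make_line("stripe", 92, True), make_line("stripe", 85, True),
       make_line("linear", 70, True)]
print(history_bonus(log, "fintech-payments", "dashboard", ["stripe", "linear"]))
```

Here stripe earns two matches and linear earns none, since its one run scored below 80. The bonus is a pure function of the log contents, so two machines with the same log still agree.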

This isn't "ML" in the formal sense. There's no gradient descent. No labeled training data. No model checkpoint. But it IS learning, in the precise sense of "the system's output changes over time in response to feedback."

What the system can't do (be honest)

Generate copy. You write the words. The engine ships the tokens that frame them.
Source imagery. Picsum placeholder URLs in dev; you replace them before shipping.
Judge "modern aesthetic" subjectively. We replaced T's proposed LLM-judged axes with deterministic heuristics. The system measures what's measurable.
Predict virality. Every other tool that claims this is lying.

What the system CAN do

Synthesize a complete design language (palette + type + spacing + radius + motion + 12+ token names) in <10ms per call.
Score outputs 0-100 on 7 axes deterministically.
Auto-iterate polish until score ≥ 90 or 5 rounds.
Refuse to ship below quality gate (65). Force with --force.
Compose responsive-by-construction layouts (auto-fit minmax, container queries, zero media queries needed).
Learn over time which token combinations work for your specific industry + UI type.
Run offline on a plane, in a CI sandbox, in an air-gapped network.
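The auto-iterate and quality-gate items above boil down to a bounded loop and a threshold check. A sketch, assuming hypothetical score_fn and polish_fn callables stand in for the real linter and polisher (neither is shown here), with toy stand-ins for the demo:

```python
def evolve(doc, score_fn, polish_fn, target=90, max_rounds=5):
    """Polish until the score reaches the target or the round budget runs out."""
    score = score_fn(doc)
    rounds = 0
    while score < target and rounds < max_rounds:
        doc = polish_fn(doc)
        score = score_fn(doc)
        rounds += 1
    return doc, score, rounds


def may_ship(score, gate=65, force=False):
    """Quality gate: block anything below the gate unless forced."""
    return force or score >= gate


# Toy stand-ins: the "document" is just its score, each polish adds 10.
final_doc, final_score, rounds_used = evolve(60, score_fn=lambda d: d,
                                             polish_fn=lambda d: d + 10)
print(final_score, rounds_used)
print(may_ship(60), may_ship(60, force=True))
```

The round cap is what keeps the loop from spinning forever on a document that never reaches 90.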

Try it

pip install uxskill
uxskill stats --decisions     # what your install has logged
uxskill stats --html          # local dashboard at .ux/stats.html
uxskill lint . --score-only   # just the int
uxskill evolve out.html       # auto-loop to score 90+