Automated Design Critique

An expert reviewer that uses your app.

An LLM-powered agent opens a real browser, follows your use case scenarios, and provides expert critique on information architecture, interaction design, visual hierarchy, and accessibility — like having a UX reviewer on call.

mix touchpoints.ux_review --headless false
Found 10 use cases with scenarios
UC-TPW-PersonaWorkspace: Creating a new persona
visit: Navigated to /personas
[strength] Clear call to action for “New persona” — prominent button placement
[suggestion] The empty state could include a brief explanation of what personas are for
[suggestion] After saving, list doesn’t scroll to the newly created item
OK — 3 observations, 12.4K tokens, 8 iterations
UC-TPW-Preview: Previewing the selected widget

Real browser

Wallaby + ChromeDriver

Expert eye

IA, interaction, visual, a11y

Use cases

Scenarios from your product cycle

Report

Categorized markdown output

How it works

A cognitive walkthrough, automated.

The agent reads your Gherkin use cases, opens a browser, logs in as the right persona, and attempts each scenario. Along the way it records expert observations about the design — not pass/fail assertions, but the kind of critique you’d get from a senior designer doing a walkthrough.

  1. Load scenarios

    The agent parses docs/USE_CASES.md and extracts Gherkin scenarios with their Given/When/Then steps. Each scenario becomes a task prompt for the reviewer.
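The parsing step could be sketched roughly as follows. This is an illustrative reconstruction, not the tool's actual source — the module and struct names are assumptions:

```elixir
defmodule UxReview.Parser do
  # Hypothetical scenario struct; field names are assumptions.
  defmodule Scenario do
    defstruct [:title, steps: []]
  end

  @doc "Extracts Gherkin scenarios and their steps from a use-cases markdown file."
  def load(path \\ "docs/USE_CASES.md") do
    path
    |> File.read!()
    |> String.split("\n")
    |> Enum.map(&String.trim/1)
    |> Enum.reduce([], &collect/2)
    |> Enum.reverse()
    |> Enum.map(fn s -> %{s | steps: Enum.reverse(s.steps)} end)
  end

  # "Scenario: ..." starts a new struct; Given/When/Then/And lines
  # attach to the scenario currently being built.
  defp collect("Scenario: " <> title, acc), do: [%Scenario{title: title} | acc]

  defp collect(line, [current | rest] = acc) do
    if String.match?(line, ~r/^(Given|When|Then|And)\b/) do
      [%{current | steps: [line | current.steps]} | rest]
    else
      acc
    end
  end

  defp collect(_line, []), do: []
end
```

Each resulting struct is then rendered into a task prompt for the reviewer.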

  2. Authenticate

    In interactive mode, the browser opens and waits for you to log in manually — you can review as any user. In auto mode, the agent creates a user with the right role and authenticates via magic link.
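The auto-login path might look something like this sketch, assuming a typical Phoenix app with magic-link auth. The `MyApp.Accounts` functions are placeholders for whatever your app exposes:

```elixir
defmodule UxReview.Auth do
  use Wallaby.DSL

  # Auto mode: create a synthetic user with the role the scenario
  # needs, then visit the magic link directly — no email round trip.
  def auto_login(session, role) do
    {:ok, user} =
      MyApp.Accounts.create_user(%{
        email: "reviewer+#{role}@example.com",
        role: role
      })

    token = MyApp.Accounts.generate_magic_link_token(user)
    visit(session, "/auth/magic/#{token}")
  end
end
```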

  3. Navigate and critique

    The LLM receives the current page HTML and decides what to do next — click a link, fill a form, or record an observation. It thinks like a design expert: labeling, hierarchy, flow, affordances, accessibility.
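One iteration of that loop could be sketched as below. The tool names and client module are illustrative, based on the description above:

```elixir
defmodule UxReview.Agent do
  # One iteration: send the current HTML to the model, apply its chosen tool.
  def step(session, state) do
    html = Wallaby.Browser.page_source(session)

    case UxReview.ClaudeClient.decide(html, state) do
      {:tool, "click", %{"selector" => css}} ->
        Wallaby.Browser.click(session, Wallaby.Query.css(css))
        {:cont, state}

      {:tool, "fill", %{"selector" => css, "value" => value}} ->
        Wallaby.Browser.fill_in(session, Wallaby.Query.css(css), with: value)
        {:cont, state}

      {:tool, "observe", %{"category" => cat, "severity" => sev, "note" => note}} ->
        # Observations are recorded in state, not asserted against the page
        obs = %{category: cat, severity: sev, note: note}
        {:cont, update_in(state.observations, &[obs | &1])}

      {:done, summary} ->
        {:halt, summary}
    end
  end
end
```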

  4. Generate report

    When all scenarios are done, the agent writes a categorized markdown report with observations grouped by category and severity — strengths, suggestions, concerns, and blockers.
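The grouping logic is straightforward to sketch — observations bucketed by category, then ordered by severity within each bucket. The data shape here is an assumption:

```elixir
defmodule UxReview.Reporter do
  # Severity ordering for sorting within each category.
  @severities ~w(blocker concern suggestion strength)

  def render(observations) do
    observations
    |> Enum.group_by(& &1.category)
    |> Enum.map_join("\n\n", fn {category, notes} ->
      body =
        notes
        |> Enum.sort_by(fn n ->
          Enum.find_index(@severities, &(&1 == n.severity)) || 99
        end)
        |> Enum.map_join("\n", fn n -> "- **#{n.severity}**: #{n.note}" end)

      "## #{String.capitalize(category)}\n\n" <> body
    end)
  end
end
```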

What the reviewer looks for

Expert observations across six design dimensions.

Information architecture

Page structure, content grouping, labeling systems, navigation paths, and findability. Can users build a mental model of where things are?

Interaction design

Click targets, form flows, feedback after actions, error states, loading indicators, and progressive disclosure.

Visual design

Hierarchy, contrast, spacing, alignment, consistency of patterns, and whether the visual weight guides attention correctly.

Accessibility

Semantic HTML, heading order, form labels, color contrast, focus management, and screen reader considerations.

Content

Clarity of labels, instructions, error messages, and confirmation text. Is the language plain and actionable?

Navigation

Wayfinding cues, breadcrumbs, back links, and whether users always know where they are and how to get where they need to go.

Output

Actionable observations, not noise.

The report groups observations by category and severity. Each note references what the reviewer actually saw on the page — specific enough to act on, contextual enough to understand why it matters.

Strengths are noted too. Good design decisions deserve recognition, and they help the team understand which patterns to replicate.

Sample observations

Strength: The persona workspace uses a clear card layout with name, nickname, and description visible at a glance. Good information density without clutter.
Suggestion: The form builder’s “Add question” button sits below the fold after 4+ questions. Consider a sticky action bar or a floating add button.
Concern: After submitting the feedback form, the success message uses the same visual weight as the form instructions. Users may not realize their submission went through.
Get started

One command, full review.

The reviewer runs as a Mix task. Start your dev server, run the command, and watch it work — or run headless and read the report after.

Interactive mode pauses for you to log in manually, so you can review as any user with any auth method. Auto mode creates a synthetic user and handles login via magic link.

# Watch the browser as it reviews
$ mix touchpoints.ux_review --headless false
# Review a specific use case
$ mix touchpoints.ux_review --scenario UC-TPW-PersonaWorkspace
# Auto-login for headless/CI runs
$ mix touchpoints.ux_review --login auto
# Set a token budget
$ mix touchpoints.ux_review --max-tokens 100000
=== UX Review Complete ===
8/10 scenarios completed
23 design observations recorded
142,000 tokens used
Report: reports/ux_review_2026-03-21_143022.md
Architecture

Pure Elixir. No new dependencies.

Built on Wallaby (browser automation) and Req (HTTP) — both already in the project. The Claude API provides the design expertise via tool use. Each module is a pure function with explicit state, making it easy to test and extend.

The architecture is designed for extraction. The core loop — observe page, decide action, record critique — has no Mix-specific code inside it. It could live in its own repo and work against any Phoenix app.
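A Req-based Claude client could look roughly like this. The endpoint, headers, and response shape follow Anthropic's Messages API; the tool definition and module name are illustrative:

```elixir
defmodule UxReview.ClaudeClient do
  @url "https://api.anthropic.com/v1/messages"

  def decide(html, _state) do
    resp =
      Req.post!(@url,
        headers: [
          {"x-api-key", System.fetch_env!("ANTHROPIC_API_KEY")},
          {"anthropic-version", "2023-06-01"}
        ],
        json: %{
          model: "claude-sonnet-4-5",
          max_tokens: 1024,
          tools: tools(),
          messages: [%{role: "user", content: "Current page HTML:\n" <> html}]
        }
      )

    parse_tool_use(resp.body)
  end

  # One example tool; the real agent would expose click, fill, etc.
  defp tools do
    [
      %{
        name: "observe",
        description: "Record a design observation about the current page",
        input_schema: %{
          type: "object",
          properties: %{
            category: %{type: "string"},
            severity: %{type: "string"},
            note: %{type: "string"}
          },
          required: ["category", "severity", "note"]
        }
      }
    ]
  end

  defp parse_tool_use(%{"content" => blocks}) do
    case Enum.find(blocks, &(&1["type"] == "tool_use")) do
      %{"name" => name, "input" => input} -> {:tool, name, input}
      nil -> {:done, blocks}
    end
  end
end
```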

Module structure

Parser
Reads USE_CASES.md → Gherkin scenario structs
Claude Client
Messages API with tool use via Req
Browser Tools
Wallaby actions exposed as LLM tools
Agent Loop
Observe → act → critique → repeat
Reporter
Categorized markdown report output
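How these modules might compose: the extractable core is plain recursion over explicit state, with the browser session passed in. Names here are illustrative, matching the module list above:

```elixir
defmodule UxReview do
  # No Mix or Phoenix inside this loop — just session + state in,
  # report out, which is what makes it extractable.
  def review(session, state \\ %{observations: [], iterations: 0}) do
    case UxReview.Agent.step(session, state) do
      {:halt, _summary} ->
        UxReview.Reporter.render(state.observations)

      {:cont, next} ->
        review(session, %{next | iterations: next.iterations + 1})
    end
  end
end
```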

Let your app review itself.

Write use cases, run the reviewer, improve the design. Expert UX feedback on every iteration.