Automated Design Critique
An expert reviewer
that uses your app.
An LLM-powered agent opens a real browser, follows your use case scenarios, and provides expert critique on information architecture, interaction design, visual hierarchy, and accessibility — like having a UX reviewer on call.
Real browser
Wallaby + ChromeDriver
Expert eye
IA, interaction, visual, a11y
Use cases
Scenarios from your product cycle
Report
Categorized markdown output
A cognitive walkthrough, automated.
The agent reads your Gherkin use cases, opens a browser, logs in as the right persona, and attempts each scenario. Along the way it records expert observations about the design — not pass/fail assertions, but the kind of critique you’d get from a senior designer doing a walkthrough.
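The use cases it reads are ordinary Gherkin. A scenario might look like this (illustrative — feature names and steps are made up, not taken from any real project file):

```gherkin
Feature: Project creation
  Scenario: Admin creates a project
    Given I am logged in as an admin
    When I visit the new project page
    And I submit the form with a valid project name
    Then I see the new project on the dashboard
```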
- Load scenarios: The agent parses docs/USE_CASES.md and extracts Gherkin scenarios with their Given/When/Then steps. Each scenario becomes a task prompt for the reviewer.
- Authenticate: In interactive mode, the browser opens and waits for you to log in manually — you can review as any user. In auto mode, the agent creates a user with the right role and authenticates via magic link.
- Navigate and critique: The LLM receives the current page HTML and decides what to do next — click a link, fill a form, or record an observation. It thinks like a design expert: labeling, hierarchy, flow, affordances, accessibility.
- Generate report: When all scenarios are done, the agent writes a categorized markdown report with observations grouped by category and severity — strengths, suggestions, concerns, and blockers.
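The steps above reduce to a small recursive loop. Here is a minimal Elixir sketch, assuming a hypothetical DesignCritique.Agent module; the Wallaby calls (page_source/1, click/2, Query.css/1) are real Wallaby APIs, but decide/2 is a stand-in for the actual LLM call:

```elixir
# Illustrative sketch of the observe/decide/record loop, not the
# project's actual implementation. Module and function names are
# hypothetical.
defmodule DesignCritique.Agent do
  alias Wallaby.{Browser, Query}

  @doc "Run one scenario, returning the recorded design observations."
  def run_scenario(session, scenario, observations \\ []) do
    html = Browser.page_source(session)

    case decide(html, scenario) do
      # The model chose a browser action, e.g. clicking a link.
      {:click, css_selector} ->
        session
        |> Browser.click(Query.css(css_selector))
        |> run_scenario(scenario, observations)

      # The model recorded a critique note instead of acting.
      {:observe, note} ->
        run_scenario(session, scenario, [note | observations])

      # The scenario is finished; return notes in recorded order.
      :done ->
        Enum.reverse(observations)
    end
  end

  # Ask the LLM what to do next given the current page HTML.
  # A real implementation would POST to the Claude messages API via
  # Req, with tool definitions for click, fill, observe, and done.
  defp decide(_html, _scenario), do: :done
end
```

Because each step is a pure function of session state plus page HTML, the loop is straightforward to test with canned HTML and a stubbed decide/2.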
What the reviewer looks for
Expert observations across six design dimensions.
Information architecture
Page structure, content grouping, labeling systems, navigation paths, and findability. Can users build a mental model of where things are?
Interaction design
Click targets, form flows, feedback after actions, error states, loading indicators, and progressive disclosure.
Visual design
Hierarchy, contrast, spacing, alignment, consistency of patterns, and whether the visual weight guides attention correctly.
Accessibility
Semantic HTML, heading order, form labels, color contrast, focus management, and screen reader considerations.
Content
Clarity of labels, instructions, error messages, and confirmation text. Is the language plain and actionable?
Navigation
Wayfinding cues, breadcrumbs, back links, and whether users always know where they are and how to get where they need to go.
Actionable observations, not noise.
The report groups observations by category and severity. Each note references what the reviewer actually saw on the page — specific enough to act on, contextual enough to understand why it matters.
Strengths are noted too. Good design decisions deserve recognition, and they help the team understand which patterns to replicate.
Sample observations
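An entry in the report might read like the following (illustrative output only — the routes, wording, and severity labels are invented for this example):

```markdown
## Interaction design

### Concern: No feedback after form submission
On /projects/new, submitting the form navigates away with no confirmation
message. Users cannot tell whether the project was created.

### Strength: Clear primary action
The "Create project" button carries the strongest visual weight on the
page, matching the page's single primary task.
```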
One command, full review.
The reviewer runs as a Mix task. Start your dev server, run the command, and watch it work — or run headless and read the report after.
Interactive mode pauses for you to log in manually, so you can review as any user with any auth method. Auto mode creates a synthetic user and handles login via magic link.
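A typical session might look like this — the task name and flags are illustrative assumptions, not the tool's documented interface:

```shell
# Start the dev server, then run the reviewer.
# (Task name and flags are hypothetical; check `mix help` in your project.)
mix phx.server &
mix design.review --mode interactive   # pauses for you to log in manually
mix design.review --mode auto          # synthetic user, magic-link login
```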
Pure Elixir. No new dependencies.
Built on Wallaby (browser automation) and Req (HTTP) — both already in the project. The Claude API provides the design expertise via tool use. Each module is a pure function with explicit state, making it easy to test and extend.
The architecture is designed for extraction. The core loop — observe page, decide action, record critique — has no Mix-specific code inside it. It could live in its own repo and work against any Phoenix app.
Module structure
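A plausible layout, given the responsibilities described above (directory and file names are hypothetical, sketched for orientation only):

```
lib/design_critique/
├── scenario.ex   # parses Gherkin scenarios from docs/USE_CASES.md
├── auth.ex       # interactive login pause or magic-link authentication
├── agent.ex      # observe/decide/record loop against the LLM
├── browser.ex    # thin wrapper over Wallaby actions
└── report.ex     # groups observations, renders the markdown report
```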
Let your app review itself.
Write use cases, run the reviewer, improve the design. Expert UX feedback on every iteration.