October 22, 2024 · 7 min read · 1,387 words

How To Design Interfaces When AI Agents Are Users

The interface that answers itself, written for two audiences

An interface designed for one audience can pretend the other does not exist. An interface designed for two cannot.

A user opens an app, types a request, and walks away from the laptop. The agent operates the interface for the next twenty minutes, navigating through screens the user will never see, making decisions the user trusted it to make, and assembling a result the user will read when they come back. The interface served two audiences in those twenty minutes. The human at the start and end. The agent in the middle. The same screens, the same flows, the same buttons. Two completely different ways of being read.

This pattern is becoming common enough that it deserves a category, and the category challenges most of the assumptions a designer brings to a brief.

Why this is a different design problem

A traditional interface has one user. The user is human. The user reads the screen, interprets it through accumulated context, and acts on it through fingers and voice. The designer’s job is to anticipate what the user wants, present it legibly, and make the action easy. Decades of practice have built up the rules for how to do this well.

An interface with an agent in the loop has two users. One reads the screen with eyes that have been looking at screens for thirty years. The other parses the markup with no aesthetic sense at all and no patience for visual cues that do not have a textual equivalent. The two users are not the same user. They want different things and they fail in different ways.

The human user wants the interface to feel coherent, reduce cognitive load, and reward attention with information. The agent user wants the interface to be machine-readable, semantically structured, and predictable across screens. Designs optimised for the human alone often produce interfaces that are visually beautiful and structurally opaque. Designs optimised for the agent alone produce interfaces that are technically clean and aesthetically dead. Neither works for the dual-audience case, which is where the design problem actually lives.

What agents need that humans rarely notice

Many small things that humans tolerate, because the brain compensates, are blocking errors for an agent.

Visual hierarchy that depends only on size and weight, without semantic markup underneath, is invisible to an agent. The agent cannot tell which heading outranks another. The h1, h2, h3 structure exists for the agent. The visual styling exists for the human. Most modern interfaces have one or the other, and many have both, but the proportion whose semantic structure actually matches the visual structure is much smaller than designers usually assume.
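As a rough illustration of the heading point, the sketch below uses Python's standard html.parser to recover the outline a markup-reading agent could extract from a page. The class names (title-xl, title-md) and the sample markup are hypothetical, and a real agent may read the page quite differently; the only claim here is that styled divs contribute nothing to a heading outline.

```python
from html.parser import HTMLParser


class HeadingOutline(HTMLParser):
    """Collects (level, text) for every h1-h6 element, in document order."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.outline = []
        self._level = None  # level of the heading currently being read
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self._level = int(tag[1])
            self._text = []

    def handle_data(self, data):
        if self._level is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in self.HEADINGS and self._level is not None:
            self.outline.append((self._level, "".join(self._text).strip()))
            self._level = None


def heading_outline(html):
    parser = HeadingOutline()
    parser.feed(html)
    parser.close()
    return parser.outline


# Hypothetical markup: the two snippets could be styled to look identical.
styled_only = '<div class="title-xl">Billing</div><div class="title-md">Invoices</div>'
semantic = '<h1 class="title-xl">Billing</h1><h2 class="title-md">Invoices</h2>'

print(heading_outline(styled_only))  # []
print(heading_outline(semantic))     # [(1, 'Billing'), (2, 'Invoices')]
```

The same visual design can sit on top of either snippet. Only the second one gives the agent a hierarchy to work with.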

Status indicators that rely on colour alone are unparseable. A button that is grey instead of blue does not signal “disabled” to an agent unless an aria-disabled attribute, a textual label, or some other machine-readable signal accompanies the colour change. A tag that is red instead of green does not signal “error” without a text equivalent. The agent can read the colour value, but the colour value does not carry meaning unless the markup attaches one.
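A minimal sketch of the disabled-button case, assuming an agent that decides from markup alone. It treats a button as disabled only when the native disabled attribute or aria-disabled="true" is present. The btn-grey class and the sample buttons are hypothetical, and a vision-based agent might behave differently.

```python
from html.parser import HTMLParser


class ButtonStates(HTMLParser):
    """Records (label, disabled) for each button, using markup signals only."""

    def __init__(self):
        super().__init__()
        self.buttons = []
        self._disabled = None  # None means we are not inside a button
        self._label = []

    def handle_starttag(self, tag, attrs):
        if tag == "button":
            a = dict(attrs)
            # A bare `disabled` attribute arrives with the value None.
            self._disabled = "disabled" in a or a.get("aria-disabled") == "true"
            self._label = []

    def handle_data(self, data):
        if self._disabled is not None:
            self._label.append(data)

    def handle_endtag(self, tag):
        if tag == "button" and self._disabled is not None:
            self.buttons.append(("".join(self._label).strip(), self._disabled))
            self._disabled = None


def button_states(html):
    parser = ButtonStates()
    parser.feed(html)
    parser.close()
    return parser.buttons


# Hypothetical markup: all three buttons are rendered grey.
page = (
    '<button class="btn-grey">Submit</button>'
    '<button class="btn-grey" aria-disabled="true">Submit</button>'
    '<button class="btn-grey" disabled>Submit</button>'
)
print(button_states(page))
# [('Submit', False), ('Submit', True), ('Submit', True)]
```

The first button looks disabled to a human and reads as enabled to this parser, which is exactly the mismatch described above.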

Hover states hide information from agents the same way they hide information from keyboard users and screen readers. A tooltip that appears on hover is, for an agent, often a tooltip that does not exist. The agent has to know to trigger the hover, and many agents will not. The information needs an alternative path, either by being visible without hover or by being available through a focus state that the agent can reach more reliably.

Modal dialogs that interrupt the flow are usable for humans, who recognise the visual disruption immediately, and confusing for agents, which may attempt to interact with elements behind the modal because the markup did not clearly indicate the disruption. The aria-modal attribute exists for exactly this case, and many modal implementations on the web still omit it.
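One way a team might look for this gap is a small audit that lists dialog containers lacking aria-modal="true". The sketch below is an assumption about how such a check could work, not a standard tool, and the element ids are hypothetical. Non-modal dialogs legitimately omit the attribute, so the output is a list to review, not a list of bugs.

```python
from html.parser import HTMLParser


class ModalAudit(HTMLParser):
    """Lists dialog containers that do not declare aria-modal="true"."""

    DIALOG_ROLES = {"dialog", "alertdialog"}

    def __init__(self):
        super().__init__()
        self.to_review = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if a.get("role") in self.DIALOG_ROLES and a.get("aria-modal") != "true":
            # Fall back to the tag name when the element has no id.
            self.to_review.append(a.get("id") or tag)


def dialogs_to_review(html):
    parser = ModalAudit()
    parser.feed(html)
    parser.close()
    return parser.to_review


# Hypothetical markup: both overlays cover the page visually.
page = (
    '<div id="confirm" role="dialog" aria-modal="true">Delete file?</div>'
    '<div id="promo" role="dialog">Try the new plan</div>'
)
print(dialogs_to_review(page))  # ['promo']
```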

Multi-step flows that hold state across pages are tricky for agents, which often do not maintain a clear model of where they are in the flow. A breadcrumb that humans use to orient does the same job for the agent, if the breadcrumb is marked up semantically rather than presented as a visual sequence of styled spans.
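To make the breadcrumb point concrete, here is a sketch of how an agent could locate itself in a flow from markup, assuming the trail marks the active item with aria-current (for example "page" or "step"). The checkout steps and class names are hypothetical, and the parser is simplified: it assumes the marked element does not contain another element with the same tag.

```python
from html.parser import HTMLParser


class CurrentLocation(HTMLParser):
    """Finds the first element marked with a truthy aria-current value."""

    def __init__(self):
        super().__init__()
        self.current = None  # (aria-current value, text) once found
        self._tag = None
        self._kind = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        value = dict(attrs).get("aria-current")
        if self.current is None and self._tag is None and value not in (None, "", "false"):
            self._tag, self._kind, self._text = tag, value, []

    def handle_data(self, data):
        if self._tag is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._tag is not None and tag == self._tag:
            self.current = (self._kind, "".join(self._text).strip())
            self._tag = None


def current_location(html):
    parser = CurrentLocation()
    parser.feed(html)
    parser.close()
    return parser.current


# Hypothetical markup for the same three-step checkout trail.
styled_spans = (
    '<div><span class="crumb">Cart</span>'
    '<span class="crumb active">Shipping</span>'
    '<span class="crumb">Payment</span></div>'
)
semantic_nav = (
    '<nav aria-label="Breadcrumb"><ol>'
    '<li><a href="/cart">Cart</a></li>'
    '<li><a href="/shipping" aria-current="step">Shipping</a></li>'
    '<li>Payment</li></ol></nav>'
)
print(current_location(styled_spans))  # None
print(current_location(semantic_nav))  # ('step', 'Shipping')
```

In the styled version the only hint is a class name the agent has no reason to trust. In the semantic version the position in the flow is stated outright.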

Each of these is a common pattern in modern interfaces, each is solvable, and each tends to be invisible to designers who have not yet had to test their interface with an agent in the loop.

The accessibility surprise

A lot of the work to make an interface usable by an agent overlaps almost completely with the work to make an interface usable by a screen reader. The semantic structure agents need is the same structure assistive technology has needed for years. The colour-with-text-label rule is the same WCAG requirement that accessibility advocates have championed for decades. The keyboard-reachable focus states are the same ones blind users have always relied on.

This is good news. Designers and engineers who have invested in accessibility have most of the work for agent-readability already done. Those who have not face a much steeper learning curve than they expect, because the accessibility debt and the agent-readability debt are largely the same debt.

The accessibility advocates have been right for a long time, and the arrival of agents has produced a different kind of audience that benefits from the same investments. The companies that took accessibility seriously are now several years ahead of the companies that did not, on a problem the latter group did not see coming.

What the dual-audience design actually looks like

A useful frame for designing for both audiences is to treat the interface as having two layers. A presentation layer that the human reads, with all the visual richness, motion, hierarchy, and craft the human deserves. A semantic layer underneath, with consistent markup, machine-readable labels, predictable structure, and explicit state indicators. The two layers are produced by the same team in the same workflow, but they are reviewed against different criteria.

A design review for a human-only interface typically asks whether the screen is legible, whether the hierarchy is clear, whether the action is obvious, and whether the visual language is consistent with the rest of the product. A design review for a dual-audience interface asks all of those questions and adds a parallel set. Whether the semantic structure under the visual surface is clean. Whether every state has a textual signal. Whether every interactive element has a stable accessible name. Whether every flow can be completed without relying on visual-only cues.
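Some of the added review questions can be partly automated. As one example, the sketch below flags buttons and links that have no obvious name. It is a deliberately simplified stand-in for the real accessible-name computation, checking only text content, aria-label, aria-labelledby, title, and the alt text of a nested image. The ids and markup are hypothetical.

```python
from html.parser import HTMLParser


class UnnamedControls(HTMLParser):
    """Flags buttons and links with no obvious name (simplified heuristic)."""

    CONTROLS = {"button", "a"}
    NAMING_ATTRS = ("aria-label", "aria-labelledby", "title")

    def __init__(self):
        super().__init__()
        self.unnamed = []
        self._open = None  # the control currently being read, if any

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if self._open is None and tag in self.CONTROLS:
            named = any((a.get(k) or "").strip() for k in self.NAMING_ATTRS)
            self._open = {"tag": tag, "id": a.get("id") or f"<{tag}>", "named": named}
        elif self._open is not None and tag == "img" and (a.get("alt") or "").strip():
            self._open["named"] = True  # alt text of a nested image counts

    def handle_data(self, data):
        if self._open is not None and data.strip():
            self._open["named"] = True  # visible text counts

    def handle_endtag(self, tag):
        if self._open is not None and tag == self._open["tag"]:
            if not self._open["named"]:
                self.unnamed.append(self._open["id"])
            self._open = None


def unnamed_controls(html):
    parser = UnnamedControls()
    parser.feed(html)
    parser.close()
    return parser.unnamed


# Hypothetical toolbar: two icon-only buttons and one text link.
toolbar = (
    '<button id="save"><svg></svg></button>'
    '<button id="close" aria-label="Close"><svg></svg></button>'
    '<a id="docs" href="/docs">Docs</a>'
)
print(unnamed_controls(toolbar))  # ['save']
```

A check like this does not replace the review. It only catches the mechanical cases, leaving the reviewers free for the judgement calls.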

The expanded review takes longer. It also catches issues that a human-only review would miss, including issues that affect human users with assistive technology. The expansion is one of those investments that looks like overhead in the moment and pays off across multiple audiences over time.

The teams I have watched adopt this kind of review process tend to converge on a checklist that lives next to their design system documentation. The checklist evolves as the team learns what specific patterns trip up the agents they care about. The checklist is also a quiet form of institutional memory, because new designers join with the checklist as a starting point and do not have to learn the lessons individually.

The longer-term shift this implies

A designer entering the field in 2024 will spend significant portions of their career designing interfaces with agents in the loop. The skill set that this requires is not separate from the skill set the field already values. It is an extension of it, plus a deeper investment in semantic discipline that has historically been treated as a specialist concern. The next decade of interface design will be the decade in which the specialist concern becomes a generalist concern, and the designers who make this transition early will be the ones who feel most at home in the new shape of the work.

An interface designed for one audience can pretend the other does not exist. An interface designed for two cannot.

The pretending is going to stop. The interfaces will adjust, or the agents will fail at them. The teams will learn this through the failures. The teams that learn it before the failures get expensive will be ahead of the ones that learn it after.

Terms / explained

Described terms.

AI agent: A software system that perceives an interface, reasons about how to accomplish a goal within it, and takes actions on behalf of a user without requiring step-by-step instructions for each interaction.
Semantic interface: An interface in which each element carries a clear, machine-readable meaning through its markup, ARIA attributes, and accessible name, distinct from an interface where meaning is conveyed only through visual presentation.
Agent loop: The cycle in which an agent observes the current state of an interface, decides on a next action, executes it, and observes the new state, repeated until the goal is reached or the agent decides it cannot proceed.
Hover state: An interface state revealed when a pointer is held over an element, common in desktop interfaces and a frequent source of accessibility and agent-readability issues because the information is hidden until activated.
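The agent loop defined above can be sketched in a few lines. This is a toy illustration, not how any particular agent is built: the observe, decide, act, and is_done callables and the checkout flow are hypothetical, and the max_steps cap is an added assumption so the loop always terminates.

```python
def run_agent_loop(observe, decide, act, is_done, max_steps=20):
    """Observe, decide, act, repeat. Returns (reached_goal, steps_taken)."""
    for step in range(max_steps):
        state = observe()
        if is_done(state):
            return True, step
        action = decide(state)
        if action is None:  # the agent decides it cannot proceed
            return False, step
        act(action)
    # Step budget exhausted (an assumption of this sketch): check one last time.
    return is_done(observe()), max_steps


# Hypothetical four-screen flow the agent walks through.
flow = ["cart", "shipping", "payment", "confirmation"]
position = {"index": 0}


def observe():
    return flow[position["index"]]


def decide(state):
    return "next" if state != "confirmation" else None


def act(action):
    position["index"] += 1


print(run_agent_loop(observe, decide, act, lambda s: s == "confirmation"))
# (True, 3)
```

Every pass through the loop starts with an observation, which is why the quality of what the interface exposes to observe() matters so much.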

FAQ / questions

Frequently asked.

What does it mean for an AI agent to be a user?

An AI agent operating an interface on a user's behalf reads the same screens the user does and makes decisions about what to click, type, or extract. The agent acts as an intermediary that may take a single instruction from the user and execute many interface steps to fulfil it. The interface is being used twice for each task, once by the agent and once by the user reviewing what the agent did.

How does designing for an AI agent differ from designing for a human user?

Agents read structure where humans read aesthetics. Agents need stable, semantic, machine-readable identifiers for interface elements, where humans can tolerate visual ambiguity if the meaning is clear from context. Agents struggle with information conveyed only through layout or colour, where humans rely on those signals constantly. The same interface that delights a human may cause an agent to fail, and vice versa, unless the design accounts for both.

What are the most common design mistakes when an AI agent is a user?

Hiding important information in interactive states the agent cannot easily reach, like hover tooltips or expanded sections. Using purely visual signals for status, like colour without an accompanying text label. Building flows that depend on timing or animation cues. Naming interactive elements with marketing copy rather than functional labels. Each of these is a defensible choice for a human-only audience and a serious obstacle for an agent.
