Designing AI Products
Letting go of interfaces, certainty, and familiar metrics.
By Hazel Maria Bala
The Design Promise That Didn’t Materialize
During graduate school at Parsons Design & Technology, I was convinced I was training for a future of spatial computing, something closer to Minority Report than mobile screens. Intelligence would be visible. Interaction would be embodied. Design would feel closer to choreography than layout.
What materialized instead was something more familiar. After graduating, I became a UX designer, then a product designer. It was a rigorous and legitimate discipline, and for a long time, it made sense. We optimized usability. We ran double diamonds. We mapped flows. We built design systems. We collaborated—sometimes contentiously—within the holy trinity of Product, Design, and Engineering.
Even in interviews, legitimacy was proven through fluency in process: how you framed trade-offs, how you facilitated workshops, how you performed problem-solving on whiteboards or take-home exercises. Mastery meant showing that you understood how products get built.
I was good at this. And for a while, it felt sufficient.
Designing “AI” Before It Was the Product
My first experience designing AI products came at IBM Watson Health, in the context of oncology. The stakes were real. The environment was regulated. Patient health information, clinical workflows, and institutional trust all shaped what the product could be.
The AI was not there to replace oncologists. It was there to assist—automating repetitive tasks, surfacing insights, and improving efficiency. From a design perspective, the work still centered on interfaces: dashboards, visualizations, navigation, interaction quality. The intelligence lived behind the scenes.
My role was to make that intelligence legible and credible. Pixel-precise layouts. Clear information hierarchy. Smooth transitions. Interfaces that felt intelligent. To a user, this translated into trust.
I collaborated closely with product managers and engineers. We debated trade-offs. I ran user interviews. I maintained a design system that allowed the team to move quickly. It was coherent, professional work—and I was comfortable there.
In those days, the AI generated signals that informed human judgment but did not mediate decisions, trigger actions, or shape system behavior. Designing for that required interface clarity, not behavioral governance.
My Inflection Point: When the AI Changed
After the pandemic, AI became something different in kind.
Natural language processing (NLP) shifted to large language models (LLMs). Machine learning outputs became generative. Models no longer just ranked or classified—they reasoned, synthesized, and responded. To design at this level, understanding screens was no longer enough. I had to understand models, orchestration, retrieval, and system behavior.
At first, conversational AI felt familiar. A prompt. A response. A smarter user on the other side. I designed clean interfaces with speech bubbles, thoughtful calls to action, and clear affordances. From the outside, it looked like the same job with new vocabulary.
That wasn’t true.
Where Traditional Design Thinking Broke
At face value, designing conversational AI looks like a simple question-and-answer exchange between a human and an intelligent machine.
But I discovered a distinction: the experience no longer lives in the interface; it lives in the AI.
When the AI’s response is the experience, interaction design changes fundamentally. These systems are meant to be intelligent, yes, but not human. They converse fluently but lack judgment. They respond confidently, even when they are only partially correct.
Users, meanwhile, ask vague questions because they perceive AI as ‘all-knowing’ and smart. The model answers anyway, with a river of plausible answers that overwhelms the user.
This was the moment my traditional product design instincts failed. The problem was no longer how information was displayed or where to place a call to action. It was how the system behaved when it didn’t fully know how to respond.
I had to design for the AI to ask clarifying questions. To defer. To escalate. Because the ramifications go beyond a bad user experience: they include legal exposure, reputational damage, and real harm.
So the design questions changed (the sketch after this list shows roughly how they translate into system behavior):
When should the AI refuse to answer?
When should it escalate to a human?
What threshold is acceptable?
How does trust persist without false confidence?
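None of these questions has a single right answer, but it helped me to treat them as explicit, tunable behaviors rather than interface decisions. Below is a minimal sketch in Python of how refusal, clarification, and escalation can hinge on confidence; the thresholds, field names, and routing labels are illustrative assumptions, not the values of any real system.

```python
from dataclasses import dataclass

# Illustrative thresholds; real values come out of evaluation and risk review.
REFUSE_BELOW = 0.40      # too uncertain to answer at all
ESCALATE_BELOW = 0.65    # uncertain enough to route to a human
CLARIFY_BELOW = 0.80     # answerable, but the question is too vague

@dataclass
class ModelOutput:
    answer: str
    confidence: float             # model or verifier confidence, 0.0-1.0
    question_specificity: float   # how well-scoped the user's question was

def route(output: ModelOutput) -> str:
    """Decide how the system behaves, not how the screen looks."""
    if output.confidence < REFUSE_BELOW:
        return "refuse"                  # say "I don't know" instead of guessing
    if output.confidence < ESCALATE_BELOW:
        return "escalate_to_human"       # hand off, with context attached
    if output.question_specificity < CLARIFY_BELOW:
        return "ask_clarifying_question" # narrow the question before answering
    return "answer"                      # respond, with confidence made visible
```

The specific numbers are the least important part. The point is that each question in the list becomes a behavior the team has to name, test, and own.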
There was no design-thinking workshop for this. No empathy map exercise prepared me for designing how an AI reasons with a human.
I learned by doing, and by having a lot of conversations with machine learning engineers.
Predictive and Analytical Systems: Where It Got Harder
Predictive analytics introduced a different kind of tension. Numbers imply precision. Uncertainty reads as weakness.
Here, transparency became contentious. Business stakeholders worried that exposing confidence scores would undermine credibility. ML engineers argued the opposite, that transparency protects it. Domain experts, particularly in oncology, were skeptical of both. Years of lived experience outweighed any black-box output.
All three perspectives were valid. None resolved the problem.
This is where traditional product design truly stopped being enough for me.
Wireframes were no longer useful for what I needed; diagrams were. The double diamond was no longer the framework. The work shifted to system-level questions: confidence thresholds, hallucination risk, human-in-the-loop requirements, escalation paths, refusal behaviors, and accountability. Static Figma screens for user testing weren’t enough either, because they test the wrong thing once AI becomes part of system behavior.
My role became less about shaping intuitive screens and more about stewarding the behavior between an AI and a human. I need to calibrate trust over time, adapt the experience as models mature, and ensure that human judgment is never displaced by artificial confidence.
Figma, design systems, and collaboration with PMs still matter. But they no longer sit at the center of my decision-making. They support the work, but they don’t define it.
How I’m Designing Now & What Changed for Me
User problems and empathy still matter. But in AI products, the problem is no longer just task completion and intuitive screens to navigate.
What matters now is how humans interpret, trust, and act on the AI’s answers, which are probabilistic rather than fixed, and that changes user testing. Once an AI output becomes probabilistic, static Figma mocks used for testing stop being honest. We need live behavior to see how trust actually forms, or breaks. That is why I now require, or at least strongly prefer, a low-code prototype.
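By low-code prototype I mean something closer to a thin harness around the live model than a polished app. A minimal sketch, assuming a placeholder generate() call standing in for whatever model endpoint a team actually uses, and an illustrative confidence cutoff:

```python
import json
from collections import Counter

def generate(prompt: str) -> dict:
    """Placeholder for the real model call. Swap in your model or orchestration
    endpoint; the return shape (answer + confidence) is an assumption of this sketch."""
    return {"answer": f"(stub response to: {prompt})", "confidence": 0.5}

def run_test_session(prompts: list[str], log_path: str = "session_log.jsonl") -> Counter:
    """Send real prompts to the live system and record its behavior, not its pixels."""
    outcomes = Counter()
    with open(log_path, "w") as log:
        for prompt in prompts:
            result = generate(prompt)
            # Classify the behavior the tester actually experienced (0.65 is an illustrative cutoff).
            behavior = "answered" if result["confidence"] >= 0.65 else "hedged_or_escalated"
            outcomes[behavior] += 1
            log.write(json.dumps({"prompt": prompt, **result, "behavior": behavior}) + "\n")
    return outcomes

if __name__ == "__main__":
    print(run_test_session(["What does this result mean for my treatment plan?"]))
```

Even a throwaway harness like this surfaces what a static mock never will: how often the system hedges, where it escalates, and whether testers keep trusting it after the first wrong answer.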
The product I’m designing for is the relationship between uncertainty, authority, and decision-making. My engineering knowledge has also expanded; it now leans more toward ML terms and principles than toward how my designs get built in React.
As AI systems became less deterministic, my work shifted away from polishing interfaces and toward making system behavior legible. Diagrams replaced wireframes early on, because the AI’s behavior and reasoning had to be agreed on before they appeared in an interface.
Reading research papers has become part of my practice; it stopped being optional once design decisions could compromise system safety. Understanding model limits became part of my design responsibility.
This is the work I choose to do now. I’m comfortable leaving parts of traditional product design behind to do it well.