AI prototyping for product managers

A PM leader at DeepHow, an industrial training/knowledge platform, described the exact thing that makes most AI prototyping tools fail a PM working on a real product:

“I’ve used Replit, but it’s kind of imaginary. It’s not real. I can give it background and context on what I’m trying to do, but ultimately, it’s not my product.” – Product Leadership, DeepHow

That is the gap most writing on AI prototyping skips. The tools generate something that looks like the product and is not the product. For a PM who has to defend two weeks of effort to an engineering team and a CEO, that distance is the difference between a prototype that ships and a prototype the team abandons.

Every search result for “ai prototyping for product managers” is a tool roundup. Roundups are written for readers who have not picked a tool yet. Most PMs have picked one, used it, run into the same wall, and need something else: the playbook that PMs who are actually shipping follow. So here is that playbook, drawn from 70 interviews with PMs and designers at mid-market B2B SaaS companies, with notes on where the tools fit inside it.

The playbook, in three moves

1. Pick the hypothesis before you pick the tool

The first move is testing whether an idea is worth the engineering hours it will consume. The PM leader at DeepHow calls this the juice-worth-the-squeeze test – validating with customers before you ask engineering to commit. For the full protocol on this, see the pattern post.

Before you open Lovable or v0 or FigmaMake, the question is what hypothesis the prototype is testing. Not “what would this look like if it existed” – that is a visualisation problem, and Figma already solves it. The question is: what decision am I trying to make with this, and what would have to be true for my team to commit the engineering hours?

The playbook that avoids wasted engineering cycles starts with a hypothesis narrow enough that the prototype can actually falsify it. “Will customers understand this flow?” is testable from a prototype. “Is this a better experience overall?” is a meeting agenda, and the prototype you build to answer it will fail in review because it was never pointed at anything specific.

This step looks administrative and feels easy to skip. But PMs who skip it end up picking a tool based on what is trending on LinkedIn that week, then discovering on Friday that the tool cannot answer the question they needed answered on Monday.

2. Match the product, or lose the handoff

Once the hypothesis is clear, the next move is choosing a tool that matches the real product closely enough that the handoff does not require engineering to rebuild it from scratch. This is where the “it’s not my product” problem takes out most tools.

Drew Muller is a PM at Ferry International. He runs a lean team, no embedded designer, and has tested more AI prototyping tools than anyone else I talked to across 2025 and early 2026. When I asked him what keeps going wrong, he landed on one specific thing:

“Tools really struggle with replicating quickly and easily and efficiently the design system of your app. And for me personally, visual congruency is really important. I don’t want it to look like something else. I want it to look like an embedded part of my app.” – Drew Muller, Ferry International

This is where design-system rejection becomes predictable. PMs who build outside the system find that design review turns into a rescope conversation, not a release conversation. For the full pattern and why it happens, see the design-said-start-over post.

There is a split in the category worth naming. Some AI prototyping tools optimise for producing a beautiful standalone artifact outside the real product – Lovable, v0, Bolt, and Replit all live in this camp. Others optimise for output that looks, behaves, and lives like the real product. FigmaMake leans in that direction because it draws on the team’s Figma design system. Cursor and Claude Code live in the real codebase but require engineering-grade fluency to drive. Else runs inside the team’s actual frontend code and reuses the components already there.

Which side of the split matters depends on what the prototype is for. If the hypothesis is “is this concept worth exploring at all”, a standalone artifact is fine. If the hypothesis is “will customers use this inside the product we actually ship”, a prototype that does not look or behave like the product is going to lie to you. For the side-by-side on this specific tradeoff: Else vs Lovable.

3. If you make it right, protect the polish

The third move is often learned the hard way. Design leaders I interviewed described the same pain over and over. You prototype, publish, and fix until the thing is right. Then the prototype is handed off to engineering, and engineering re-implements it. Most of what you built does not survive the rebuild. The designer goes back to the plan, works out what broke, and the team puts in another round of effort to recover what already existed in the prototype. The frustration is the same in every conversation: the work was already done. They want it real, not rebuilt.

The implicit design of most AI prototyping tools is: generate a prototype, show it to engineering, throw away the code, engineering rebuilds it. That model works when the prototype is disposable. It stops working the minute a PM or designer has spent a week getting interactions, states, empty-state behavior, error copy, and microcopy to where the team actually wants them.

The playbook I see working is to pick a tool that either produces engineering-grade output or stops at the point where engineering-grade output is someone else’s job. The middle zone – close enough to demo, too rough to ship – is where polish dies and where engineering rebuild cycles eat whole quarters.

This is the half of the decision that is easy to get wrong when picking a tool. Learn the tool’s limits before the work starts. Once engineering is staring at what was handed over and asking which parts of it they are allowed to keep, the week is already gone. Tools in the “engineering-grade output” camp include Else, which opens a pull request on the frontend repo so the prototype is the thing that gets merged, rather than a visual reference for something engineering has to rebuild. For the full use case on shipping PRs without code rewrite, see the use-case guide.

Field note: the shape of a prototype that ships

Drew Muller was the first PM to ship two pull requests to production through Else with zero changes to the code output. The path there ran through the same wall most PMs hit.

Ferry International had let its embedded designer role lapse. Drew and the other PMs had absorbed the design work using FigmaMake and Magic Patterns – two-dimensional prototypes for visual flows, a bit of vibe coding for interactive work. The gap that would not close was the one described above. Every time the team built something that looked right in a prototyping tool, engineering had to rebuild it.

When Drew connected the Ferry International frontend repo to Else, the prototype started inheriting the existing components and the existing CSS. The output looked like the product, because it was the product, plus the change Drew wanted to test. When the engineering team reviewed the PR, they merged it. Twice.

Drew wrote this about the process on the Else homepage:

“With Else, we went from validated prototype to production PR in a fraction of the time it used to take. Engineering didn’t touch it until it was ready to merge.” – Drew Muller, Ferry International