Visual QA for Vibe-Coded Apps: Screenshots, Diffs, and Regression Checks

Hey there! Laura and Heidi here from SCRNIFY!

Vibe-coded apps are easy to start and easy to change. That is the fun part.

The hard part is knowing what broke after the fifth prompt, the copied component, the quick CSS fix, the regenerated layout, or the agent run that touched three files you did not expect it to touch.

Ask us how we know. The first four prompts feel like magic. The fifth one quietly moves the button under a footer.

Functional tests help. Type checks help. A quick manual click-through helps too. But visual bugs often slip through because the app still technically works. The form submits. The route loads. The button exists. It is just covered by a sticky footer, pushed off-screen on mobile, hidden behind a modal, or wrapped into a layout nobody intended.

This guide is about visual QA for vibe-coded apps: when to capture screenshots, when to use diffs, and how to turn confirmed UI issues into regression checks that do not make your test suite miserable.

Why vibe-coded apps need visual QA

Vibe coding changes the shape of frontend work.

You can ask an AI agent to build a dashboard, wire a pricing page, adjust copy, add a modal, or fix a mobile layout. It may do most of that correctly. It may also make visual changes outside the narrow thing you asked for.

Common failure modes:

A generated component looks fine on desktop but collapses badly on mobile
A card grid works with sample content but breaks with real labels
A modal has no usable scroll state
A dark mode change leaves one panel unreadable
A sticky header covers anchor links
A loading state shifts the page after the screenshot was taken
A regenerated page drops spacing, focus styles, or empty states

Most of these are not logic bugs. They are product bugs users can see, which makes them the bugs people remember.

So visual QA should be part of the loop, especially when changes come from agents, prompt edits, or large generated diffs.

Review the generated diff before the page

Before opening the app, look at what the agent changed.

Vibe-coded UI bugs often come from files adjacent to the requested change: a shared layout, a design token, a wrapper component, a global stylesheet, or a copied component that now affects more routes than expected.

Give the code diff a quick visual-risk pass:

Change in the diff	Visual QA risk
Global CSS or theme tokens	Many pages can change at once
Layout wrappers	Headers, sidebars, and page width can shift
Shared buttons or inputs	Forms may still work but look wrong
Conditional rendering	Empty, loading, and error states may disappear
Generated copy changes	Long labels can break cards and nav
New fixed or sticky elements	Mobile actions may get covered

This tells you where to point screenshots. If an agent only changed a settings form, but the diff also touched AppShell, check at least one unrelated route that uses the shell. Shared wrappers love making surprise cameos.
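Here is a minimal sketch of that risk pass, assuming the changed file paths come from something like `git diff --name-only`. The path patterns and the `visualRisks` helper are hypothetical and only cover the first three rows of the table, so treat them as a starting point for your own repo layout.

```typescript
// Hypothetical path patterns; adjust them to match your own repo layout.
type RiskRule = {pattern: RegExp; risk: string}

const riskRules: RiskRule[] = [
    {pattern: /(global|theme|tokens)[^/]*\.(css|scss|ts)$/i, risk: 'Many pages can change at once'},
    {pattern: /(layout|shell|wrapper)/i, risk: 'Headers, sidebars, and page width can shift'},
    {pattern: /components\/(ui\/)?(button|input)/i, risk: 'Forms may still work but look wrong'},
]

// Returns one entry per changed file that matches a rule (first matching rule wins).
function visualRisks(changedFiles: string[]): {file: string; risk: string}[] {
    const findings: {file: string; risk: string}[] = []
    for (const file of changedFiles) {
        const rule = riskRules.find((r) => r.pattern.test(file))
        if (rule) findings.push({file, risk: rule.risk})
    }
    return findings
}

const risks = visualRisks([
    'src/styles/globals.css',
    'src/components/AppShell.tsx',
    'src/pages/settings/SettingsForm.tsx',
])
console.log(risks)
```

In this example the settings form itself raises no flag, but the `AppShell` change does. That is the cue to screenshot an unrelated route that uses the shell.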

Start with screenshots, not diffs

Visual diffs are useful, but they are not the first step.

Start by capturing screenshots at known checkpoints. Screenshots answer a simpler question: what did the app look like after this change?

For a small app, a first pass might cover:

Home page
Main authenticated dashboard
Primary form
Empty state
Error state
Mobile version of the most important route

For a product flow, capture before and after the action:

onboarding-01-start.png
onboarding-02-profile-form.png
onboarding-03-after-submit.png
onboarding-04-dashboard-empty-state.png

Those files do not need to be clever. They need to be easy to sort, attach, compare, and discuss. Clever filenames are funny once and annoying forever.
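If you would rather generate those names than type them, a small helper is enough. This is a sketch; `checkpointName` is a hypothetical name, and the flow-index-label pattern is simply the one used in the list above.

```typescript
// Builds names like onboarding-02-profile-form.png from a flow, a 1-based step index, and a label.
function checkpointName(flow: string, index: number, label: string): string {
    const slug = (text: string) =>
        text.toLowerCase().replace(/[^a-z0-9]+/g, '-').replace(/^-+|-+$/g, '')
    // Zero-pad the index so files sort in capture order.
    const step = String(index).padStart(2, '0')
    return `${slug(flow)}-${step}-${slug(label)}.png`
}

const names = ['Start', 'Profile form', 'After submit', 'Dashboard empty state'].map(
    (label, i) => checkpointName('onboarding', i + 1, label),
)
console.log(names)
```

The zero-padded index is the part that matters: it keeps step 10 from sorting before step 2.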

This matters more for vibe-coded apps because the implementation may be moving faster than your memory of it. A screenshot gives you a record of what the UI actually looked like at a point in time.

Capture the risky states

Do not capture only the happy path.

Generated apps often look strongest in the default state because that is what the prompt described. The weaker states are usually the ones with less explicit instruction.

Capture these states early:

State	Why it breaks
Empty state	AI-generated layouts often assume data exists
Long content	Cards, tables, and nav labels wrap in odd ways
Error state	Validation copy may overflow or appear in the wrong place
Loading state	Skeletons and spinners can shift layout
Mobile	Desktop-first generated UI often hides core actions
Dark mode	One missed token can make text unreadable
Modal open	Scroll, focus, and fixed positioning bugs show up here

You do not need a huge screenshot matrix on day one. Please do not build a ceremonial QA cathedral for a two-page prototype. Pick the states where a visible bug would embarrass you in a demo or block a user from finishing the task.

Define "ready" once per route

For each route you capture often, write down what proves the UI is ready. Do this once, then reuse it.

Example:

await page.goto('https://preview.example.com/dashboard')
// Wait for app-specific ready signals, not just the page load.
await page.getByTestId('dashboard-shell').waitFor({state: 'visible'})
await page.getByText('Recent activity').waitFor({state: 'visible'})
// Capture only after both signals are visible.
await page.screenshot({path: 'dashboard-loaded.png', fullPage: true})

For vibe-coded apps, this catches two problems at once. It prevents screenshots taken before the page has finished rendering, and it shows where generated code lacks stable test hooks. If there is no reliable selector or visible ready state, add one before building a visual QA habit around that page.
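One way to "do this once, then reuse it" is a small lookup of ready checks per route. This is a sketch under assumptions: the `/dashboard` entry reuses the test id and text from the example above, while the `/settings` entry and the `readyChecks` and `waitUntilReady` names are hypothetical. The `PageLike` type only mirrors the Playwright calls used above (`getByTestId`, `getByText`, `waitFor`) so the block stands alone; in a real suite you would pass a Playwright `Page`.

```typescript
// Structural subset of Playwright's Page and Locator, so this sketch needs no imports.
type LocatorLike = {waitFor(options: {state: 'visible'}): Promise<void>}
type PageLike = {
    getByTestId(id: string): LocatorLike
    getByText(text: string): LocatorLike
}

type ReadyCheck = {kind: 'testId' | 'text'; value: string}

// Written once per route, then reused by every capture of that route.
const readyChecks: Record<string, ReadyCheck[]> = {
    '/dashboard': [
        {kind: 'testId', value: 'dashboard-shell'},
        {kind: 'text', value: 'Recent activity'},
    ],
    // Hypothetical second route.
    '/settings': [{kind: 'testId', value: 'settings-form'}],
}

async function waitUntilReady(page: PageLike, route: string): Promise<void> {
    const checks = readyChecks[route]
    // Refuse to capture a route that has no agreed ready state yet.
    if (!checks) throw new Error(`No ready checks defined for ${route}`)
    for (const check of checks) {
        const locator =
            check.kind === 'testId' ? page.getByTestId(check.value) : page.getByText(check.value)
        await locator.waitFor({state: 'visible'})
    }
}
```

Throwing on an unknown route is deliberate. It turns "nobody defined ready for this page" into a visible failure instead of an early screenshot.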

Make visual claims prove themselves

If an AI agent reviews your app, do not accept a bare summary.

Weak report:

The app looks good. I found one minor mobile issue.

Better report:

Desktop dashboard loaded without visible overlap. Evidence: 01-desktop-dashboard.png.
Mobile settings page has a blocked Save button at 390x844. Evidence: 04-mobile-settings-save-blocked.png.
I did not verify dark mode because the preview did not expose a theme toggle.

The second report is easier to trust because the visual claim points to evidence.

Use this prompt pattern for a vibe-coded PR review:

Review this deploy preview after an AI-generated UI change.
First inspect the code diff and identify routes/components at visual risk.
Use 1440x900 and 390x844 viewports.
Capture screenshots for affected routes, one adjacent route, and every visual issue.
Check empty, loading, error, mobile, long-content, and modal states when the changed component can render them.
For each finding, include URL, viewport, steps, expected behavior, actual behavior, severity, and screenshot filename.
If a screenshot does not prove the finding, mark it unverified.
Do not make pass/fail claims without evidence.

This keeps the agent from writing a polished report with thin proof. Pretty summaries are cheap. Evidence costs a little more and is worth it.
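If the agent returns findings as structured data, you can enforce the "mark it unverified" rule mechanically. A minimal sketch, assuming a hypothetical `Finding` shape whose fields mirror the prompt above; the sample steps and expected/actual text are made up for illustration. It only checks that the evidence fields are present. Whether the screenshot actually proves the finding still needs a human look.

```typescript
type Finding = {
    summary: string
    url?: string
    viewport?: string
    steps?: string[]
    expected?: string
    actual?: string
    severity?: 'low' | 'medium' | 'high'
    screenshot?: string
}

// Fields the prompt asks for; a finding missing any of them stays unverified.
const requiredFields = [
    'url', 'viewport', 'steps', 'expected', 'actual', 'severity', 'screenshot',
] as const

function missingEvidence(finding: Finding): string[] {
    return requiredFields.filter((field) => {
        const value = finding[field]
        return value === undefined || value.length === 0
    })
}

function evidenceStatus(finding: Finding): 'verified' | 'unverified' {
    return missingEvidence(finding).length === 0 ? 'verified' : 'unverified'
}

const report: Finding[] = [
    {summary: 'The app looks good. I found one minor mobile issue.'},
    {
        summary: 'Mobile settings page has a blocked Save button',
        url: 'https://preview.example.com/settings',
        viewport: '390x844',
        steps: ['Open /settings', 'Scroll to the end of the form'],
        expected: 'Save button is visible and tappable',
        actual: 'Save button sits under the sticky footer',
        severity: 'high',
        screenshot: '04-mobile-settings-save-blocked.png',
    },
]
for (const finding of report) {
    console.log(evidenceStatus(finding), finding.summary)
}
```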

Add diffs after screenshots become routine

Once you have reliable screenshots, visual diffs can catch changes nobody noticed.

A visual diff compares a new screenshot against a baseline. If enough pixels changed, it flags the result. That can be useful for generated UI work because small prompt changes can create large layout changes.

But a diff is only as reliable as the page underneath it is stable.
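To make "if enough pixels changed" concrete, here is a deliberately simplified sketch of the comparison on raw RGBA buffers. The function names and the per-channel tolerance are assumptions for illustration. Real tools such as pixelmatch or Playwright's `toHaveScreenshot` compare colors more carefully and try to ignore anti-aliasing, so use this to understand the idea, not to replace them.

```typescript
// Fraction of pixels whose RGBA channels differ by more than `tolerance` (0-255).
function changedPixelRatio(baseline: Uint8Array, current: Uint8Array, tolerance = 0): number {
    if (baseline.length !== current.length || baseline.length % 4 !== 0) {
        throw new Error('Screenshots must be RGBA buffers with identical dimensions')
    }
    const totalPixels = baseline.length / 4
    let changed = 0
    for (let i = 0; i < baseline.length; i += 4) {
        for (let channel = 0; channel < 4; channel++) {
            if (Math.abs(baseline[i + channel] - current[i + channel]) > tolerance) {
                changed++
                break // count each pixel once
            }
        }
    }
    return totalPixels === 0 ? 0 : changed / totalPixels
}

// Flag the screenshot when more than `maxRatio` of its pixels changed.
function isFlagged(baseline: Uint8Array, current: Uint8Array, maxRatio: number): boolean {
    return changedPixelRatio(baseline, current, 2) > maxRatio
}

// Two 2x2 images; one pixel differs.
const before = new Uint8Array(16).fill(255)
const after = before.slice()
after[4] = 0
console.log(changedPixelRatio(before, after)) // 0.25
```

Notice that the function has no idea why a pixel changed. A live timestamp and a broken layout look the same to it.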

Good candidates for visual diffs:

Marketing page hero
Pricing table
Docs page layout
Static empty state
Component examples
Logged-out forms

Bad candidates:

Pages with live timestamps
Dashboards with changing data
Feeds with user-generated content
Ads or third-party widgets
Animations captured mid-transition
Personalized pages without stable test data

If the page changes every run, the diff will teach your team to ignore visual testing. Nothing kills a useful check faster than making everyone click past it.

Choose the right check

Not every visual concern belongs in a screenshot diff.

Use this as a starting point:

Issue type	Best check
Whole page unexpectedly changed	Screenshot diff
Button hidden on mobile	Playwright visibility and viewport assertion
Card spacing drift after theme edit	Component or page screenshot diff
Modal cannot scroll	Interaction test plus screenshot on failure
Empty state missing	Text/role assertion, then screenshot evidence
Long generated label wraps badly	Screenshot at fixed viewport with long test data
Dark mode contrast looks wrong	Accessibility check plus screenshot review
Animation glitches	Short video or manual review

The pattern is simple: use diffs for broad visual drift, assertions for rules you can state, screenshots for evidence, and manual review for judgment.

Reduce diff noise before raising the threshold

The tempting fix for noisy visual diffs is a higher threshold.

Sometimes that is right. Usually, it is better to remove the noise first.

Try this before relaxing the diff too much:

Use seeded test data
Freeze dates and clocks
Disable animations during test runs
Hide third-party widgets outside the test scope
Capture one component instead of the full page
Wait for app-specific ready states
Use the same browser, viewport, and font setup each run

Thresholds should absorb tiny rendering differences. They should not hide real layout movement.
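As one example of removing noise instead of raising the threshold, here is a sketch of masking a dynamic region before two screenshots are compared. The `maskRegions` helper and the tiny 4x1 "images" are hypothetical. Playwright's screenshot options include `mask` and `animations: 'disabled'` for the same purpose, so in practice you would likely reach for those first.

```typescript
type Region = {x: number; y: number; width: number; height: number}

// Returns a copy of an RGBA buffer with each region painted solid magenta,
// so whatever renders there (timestamps, widgets) cannot affect a comparison.
function maskRegions(pixels: Uint8Array, imageWidth: number, regions: Region[]): Uint8Array {
    const masked = pixels.slice()
    const imageHeight = pixels.length / 4 / imageWidth
    for (const region of regions) {
        // Clamp the region to the image so an oversized mask cannot write out of bounds.
        const xEnd = Math.min(region.x + region.width, imageWidth)
        const yEnd = Math.min(region.y + region.height, imageHeight)
        for (let y = Math.max(region.y, 0); y < yEnd; y++) {
            for (let x = Math.max(region.x, 0); x < xEnd; x++) {
                masked.set([255, 0, 255, 255], (y * imageWidth + x) * 4)
            }
        }
    }
    return masked
}

// Two runs of a 4x1 image that differ only where a timestamp would render.
const runA = new Uint8Array(16).fill(200)
const runB = runA.slice()
runB[12] = 10
const clock: Region = {x: 3, y: 0, width: 1, height: 1}
const sameAfterMask = maskRegions(runA, 4, [clock]).join() === maskRegions(runB, 4, [clock]).join()
console.log(sameAfterMask) // true
```

The threshold can then stay tight for the rest of the page, because the part that changes every run is no longer in the comparison.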

Turn confirmed visual bugs into regression checks

Not every visual issue needs an automated diff.

The useful loop is:

Find the issue with screenshot review
Confirm it matters
Fix it
Add the smallest regression check that would catch it again

For example, suppose an agent finds that the mobile settings page hides the Save button behind a sticky footer.

You might not need a full-page pixel diff. A focused check can be better:

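import {test, expect} from '@playwright/test'

test('mobile settings keeps Save reachable', async ({page}) => {
    await page.setViewportSize({width: 390, height: 844})
    await page.goto('https://preview.example.com/settings')
    const saveButton = page.getByRole('button', {name: 'Save'})
    await expect(saveButton).toBeVisible()
    await expect(saveButton).toBeInViewport()
    // Neither assertion detects overlap, so also confirm the button can receive a click.
    await saveButton.click({trial: true})
})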

Then capture a screenshot only on failure:

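test.afterEach(async ({page}, testInfo) => {
    // Capture only when the test did not end in its expected status.
    if (testInfo.status !== testInfo.expectedStatus) {
        // Titles can contain spaces or slashes, so make them filename-safe first.
        const name = testInfo.title.replace(/[^a-z0-9]+/gi, '-')
        await page.screenshot({
            path: `artifacts/${name}.png`,
            fullPage: true,
        })
    }
})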

This kind of regression check is less fragile than comparing the whole page forever.
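One caveat for the sticky-footer case: `toBeInViewport` checks that an element intersects the viewport, not that nothing is painted over it. If you want to assert on overlap directly, a geometry check is one option. A sketch, assuming you have both bounding boxes (Playwright's `locator.boundingBox()` resolves to this shape) and that the footer is painted above the button; the box values below are made up.

```typescript
// Same shape as the box Playwright's locator.boundingBox() resolves to.
type Box = {x: number; y: number; width: number; height: number}

// Fraction of `target` that sits under `overlay`, from 0 (clear) to 1 (fully covered).
function coveredFraction(target: Box, overlay: Box): number {
    const overlapWidth =
        Math.min(target.x + target.width, overlay.x + overlay.width) - Math.max(target.x, overlay.x)
    const overlapHeight =
        Math.min(target.y + target.height, overlay.y + overlay.height) - Math.max(target.y, overlay.y)
    if (overlapWidth <= 0 || overlapHeight <= 0) return 0
    const targetArea = target.width * target.height
    return targetArea === 0 ? 0 : (overlapWidth * overlapHeight) / targetArea
}

// Hypothetical boxes at a 390x844 viewport: a Save button and a 64px sticky footer.
const saveButtonBox: Box = {x: 16, y: 800, width: 358, height: 44}
const stickyFooterBox: Box = {x: 0, y: 780, width: 390, height: 64}
console.log(coveredFraction(saveButtonBox, stickyFooterBox)) // 1
```

A regression check could then assert that the covered fraction stays at 0 for the mobile viewport. This is pure geometry and knows nothing about stacking order, so it only makes sense for elements you already know sit on top, like a fixed footer.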

Use diffs for surfaces, assertions for behavior

Visual diffs are good at catching surface changes. Assertions are better for specific rules.

Use visual diffs when the question is:

Did this page unexpectedly change?
Did the component still render like the approved baseline?
Did a generated layout drift after a prompt edit?

Use assertions when the question is:

Is the CTA visible?
Is the error message shown after invalid input?
Is the modal scrollable?
Is the focused element inside the dialog?
Is the mobile nav reachable?

For vibe-coded apps, mix both. Use screenshots and diffs to notice visual drift. Use focused assertions to lock down bugs you already understand.

Keep baselines under review

Baselines are not sacred. They are approved references.

When you intentionally change the UI, update the baseline as part of the same review. Do not let old baselines linger until everyone starts treating the diff output as noise.

A practical baseline review asks:

Is this visual change intentional?
Does the new screenshot match the design or product decision?
Did the change affect other viewports?
Are dynamic areas masked or stabilized?
Should this become a focused assertion instead of a broad diff?

This is where human review still matters. A diff can tell you something changed. It cannot tell you whether the change is better.

A concrete first-week setup

If you do not have visual QA yet, do not start with a giant baseline suite.

Start with one workflow:

Pick three routes: home, primary product route, and one form-heavy route
Add stable test data for empty, normal, and long-content states
Capture each route at 1440x900 and 390x844
Save screenshots as PR artifacts, not permanent tests yet
In review, label each visual change as intentional, bug, or unclear
Add screenshot diffs only for the surfaces that stayed stable across a few runs
Convert confirmed bugs into focused assertions

For example, after an agent changes billing copy:

1. Capture pricing page desktop and mobile
2. Capture checkout form mobile
3. Diff pricing table against previous approved baseline
4. Human confirms mobile CTA is now covered
5. Fix layout
6. Add assertion that CTA is visible and in viewport at 390x844
7. Keep screenshot on failure for debugging

This gives you a useful loop without making CI fail on every harmless pixel shift.
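The three-route, three-state, two-viewport setup above is small enough to enumerate in code. A sketch with placeholder routes (`/`, `/dashboard`, `/settings`) and a hypothetical `capturePlan` helper; swap in your own home, primary product, and form-heavy routes.

```typescript
type Viewport = {width: number; height: number}
type Capture = {route: string; state: string; viewport: Viewport; file: string}

// Placeholders for "home, primary product route, and one form-heavy route".
const routes = ['/', '/dashboard', '/settings']
const states = ['empty', 'normal', 'long-content']
const viewports: Viewport[] = [
    {width: 1440, height: 900},
    {width: 390, height: 844},
]

// One entry per route, state, and viewport combination.
function capturePlan(): Capture[] {
    const captures: Capture[] = []
    for (const route of routes) {
        for (const state of states) {
            for (const viewport of viewports) {
                const slug = route === '/' ? 'home' : route.slice(1).replace(/\//g, '-')
                const size = `${viewport.width}x${viewport.height}`
                captures.push({route, state, viewport, file: `${slug}-${state}-${size}.png`})
            }
        }
    }
    return captures
}

const firstWeekPlan = capturePlan()
console.log(firstWeekPlan.length) // 18
console.log(firstWeekPlan[0].file) // home-empty-1440x900.png
```

Eighteen screenshots per PR is still reviewable by a human. If the list grows much past that, it is probably time to move the stable surfaces to diffs.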

Save enough context to debug later

A screenshot folder without context becomes junk quickly. At minimum, save URL, viewport, commit, branch, timestamp, runner, and step name.

That can be a small manifest:

{
  "url": "https://preview.example.com/settings",
  "viewport": "390x844",
  "commit": "abc1234",
  "branch": "agent/settings-layout-fix",
  "capturedAt": "2026-06-04T12:00:00.000Z",
  "runner": "playwright",
  "step": "settings-mobile-save-button"
}

That context helps when someone finds an old screenshot in Slack, a CI artifact, or an agent report and asks, "What version was this?"
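Building the manifest can be one small function called next to each capture. A sketch, assuming the field names from the JSON above; `buildManifest` is a hypothetical helper, and where the commit and branch values come from (CI environment variables, a git call) is left to your setup.

```typescript
type CaptureManifest = {
    url: string
    viewport: string
    commit: string
    branch: string
    capturedAt: string
    runner: string
    step: string
}

function buildManifest(input: {
    url: string
    viewport: {width: number; height: number}
    commit: string
    branch: string
    step: string
    runner?: string
    capturedAt?: Date
}): CaptureManifest {
    return {
        url: input.url,
        viewport: `${input.viewport.width}x${input.viewport.height}`,
        commit: input.commit,
        branch: input.branch,
        // Default to "now", stored as an ISO 8601 UTC timestamp.
        capturedAt: (input.capturedAt ?? new Date()).toISOString(),
        runner: input.runner ?? 'playwright',
        step: input.step,
    }
}

const manifest = buildManifest({
    url: 'https://preview.example.com/settings',
    viewport: {width: 390, height: 844},
    commit: 'abc1234',
    branch: 'agent/settings-layout-fix',
    step: 'settings-mobile-save-button',
    capturedAt: new Date('2026-06-04T12:00:00.000Z'),
})
// Store this next to the screenshot, for example as settings-mobile-save-button.json.
console.log(JSON.stringify(manifest, null, 2))
```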

When a visual change is intentional, leave a short review note too:

Approved baseline update for pricing page.
Reason: annual plan copy changed and card height increased as expected.
Checked: desktop 1440x900, mobile 390x844.
Follow-up: added CTA viewport assertion for mobile checkout.

That note is boring. It will save time later.

Where remote Capture fits

Local browser automation is best when the test needs session state, DOM inspection, accessibility trees, or exact interaction steps.

Remote Capture is useful when you need repeatable screenshots or videos without managing browser infrastructure in every environment: deploy previews, marketing pages, docs captures, visual records for agent reports, and scheduled monitoring of public pages.

With the scrnify CLI, a Capture is sent to the SCRNIFY API:

scrnify capture https://preview.example.com/pricing \
    --type image \
    --format png \
    --full-page

That works well when the page can be reached from a URL and the desired state is reproducible. For deep authenticated flows, keep the screenshot inside the browser session unless you have a clean way to recreate the state remotely.

A practical visual QA loop

If you are adding visual QA to a vibe-coded app, start small:

Inspect the generated diff for visual-risk files
Pick affected routes plus one adjacent route
Capture desktop and mobile screenshots for each
Add screenshots to agent QA reports and PR reviews
Stabilize noisy data, dates, animations, and widgets
Add visual diffs only for stable surfaces
Convert confirmed bugs into focused regression checks

The point is not to automate every visual judgment. That way lies flaky tests and ignored reports.

The point is to make UI changes visible, reviewable, and harder to accidentally repeat.

Checklist

Use this before merging a vibe-coded UI change:

Did you capture the main route on desktop and mobile?
Did you check empty, loading, error, and long-content states where relevant?
Did visual claims include screenshot evidence?
Did you wait for a real ready state before capturing?
Did you remove dynamic noise before using screenshot diffs?
Did confirmed bugs become focused regression checks?
Did you save URL, viewport, commit, and step metadata with artifacts?

Screenshots are not a substitute for tests. Diffs are not a substitute for judgment. Regression checks are not a substitute for looking at the app.

Together, they give vibe-coded apps a feedback loop that can keep up with how fast the UI changes.

Try SCRNIFY and review current pricing at scrnify.com.

If you are building visual QA workflows around AI-generated apps, we'd like to hear what keeps breaking. Drop us a line at support@scrnify.com or find us on Twitter @scrnify.

Cheers, Laura & Heidi