Blog

Notes on AI QA — practical testing guides, the engineering behind Prufa, and QA for the agentic era. Machine-verified product, same standard for the writing: no claims we can't back. Every number here comes from our own audits or a cited source, every product claim is checked against the code that ships, and where a competitor or alternative does something better, we say so. The guides are runnable checklists, the engineering notes show the actual mechanism — deterministic checks over LLM opinions — and the data posts publish the methodology and its limits alongside the findings.

2026-07-22 Engineering notes

AI testing false positives: 4 of our 6 findings were wrong

Our chaos agent flagged 6 findings on a live production app in July 2026. Four were wrong. The guards, the labeled corpus, and the CI precision floor.
2026-07-10 Engineering notes

Why AI agents fail at login: what we found fixing ours

Our QA agent kept failing a customer login behind reCAPTCHA. The browser passed 10/10 — the LLM's driving failed 5/5. The three fixes that shipped.
2026-07-05 Engineering notes

Idempotency keys for AI agents: what shipping ours taught us

A worked example of idempotency keys for AI agents: client-supplied vs server-derived keys, a Redis-down fallback, and a CLI key that only half-works.
2026-07-01 Engineering notes

Our AI QA agent caught our own landing page contradicting itself

We pointed Prufa at prufa.dev. It found our meta tag promising a 60-second audit while the homepage said 2 minutes — a real trust bug on our own site.
2026-07-01 Engineering notes

State of Launch QA 2026: we audited 63 launches. Here's what breaks.

We ran real-browser QA audits on 63 product launches (Show HN + r/SideProject) in June 2026. 78% recorded zero analytics on launch day. The full data.
2026-07-01 Practical QA guides

Test your analytics before launch

A loaded analytics snippet can still record nothing. What breaks before launch — and how to check a beacon actually fires, not just that the tag is there.
2026-06-30 Practical QA guides

Test your website on mobile before launch

What breaks on mobile before launch — sideways scroll, tiny tap targets, mobile-only JS errors — and how to get a verdict, not a phone-frame preview.
2026-06-29 Practical QA guides

Test your login flow before launch

What breaks login flows before launch, ranked — and how to verify the whole loop: you sign in, the session sticks, locked-out users recover.
2026-06-28 Practical QA guides

Test your password reset flow before launch

What breaks password reset flows, ranked — and how to verify the whole loop: the email arrives, the one-time link works, the new password logs you in.
2026-06-27 Practical QA guides

Test your contact form before launch

What breaks contact and lead forms, ranked — and the one pre-launch check no external tool can do: confirm the email actually arrived.
2026-06-26 Practical QA guides

Test your checkout flow before launch

What breaks checkout flows, ranked — and the safe way to verify each before launch: walk the flow up to the payment wall, no real card charged.
2026-06-24 Comparisons

No-code e2e testing: what it can and can't do

No-code e2e testing covers ~80–90% of common flows, but it's two different things — and one hits a wall. What it can't do, and how to choose.
2026-06-24 Comparisons

Synthetic monitoring vs real user monitoring: which you need

Synthetic monitoring vs real user monitoring: you almost always need both. When you can skip one, the deployment order, and what running both costs.
2026-06-23 Practical QA guides

AI QA agents vs traditional e2e suites: when each wins

Decision guide for teams with a Playwright or Cypress suite: 6 questions, 5 places where hand-written e2e still wins, 5 where agentic QA wins, and when to switch, stay, or layer.
2026-06-23 Comparisons

Is uptime monitoring enough? What a 200 can't tell you

No — uptime monitoring confirms a page responds, not that signup or checkout still completes. Where pinging is enough, and where it's blind.
2026-06-23 Engineering notes

Website monitoring without analytics pollution

Synthetic and uptime monitoring quietly inflate your GA4 pageviews and conversions. Why GA4 won't save you, and why exclusion should be the monitor's job.
2026-06-21 Comparisons

AI QA tools: an honest roundup (and how to choose)

AI QA tools are really four different categories. An honest roundup of where bug0, QA Wolf, Momentic, Rainforest QA — and Prufa — each genuinely win.
2026-06-21 AI QA, explained

Agentic testing vs scripted testing: can you trust an LLM?

Agentic tests needn't be flaky if plain code does the verifying. The honest hybrid: the LLM navigates, deterministic code grades. Plus how to tell the tools apart.
2026-06-21 Practical QA guides

Vibe coding exposed API keys: how they leak, how to check

Vibe-coded apps leak API keys via client-bundle hardcodes, git history, and the Supabase service_role mix-up. How to check, fix, and what a scan misses.
2026-06-21 Practical QA guides

Vibe coding support after launch: keep your app running

Vibe coding support after launch: the four things that silently rot a working app from the outside, and the periodic functional re-check that catches them.
2026-06-21 Comparisons

How much does a QA engineer cost (and when you don't need one)

What a QA engineer costs in 2026 — base salary, fully-loaded, and contractor rates, sourced and dated — plus the honest case for not hiring one.
2026-06-19 Practical QA guides

Client site QA checklist: the agency pre-delivery list

A repeatable pre-delivery QA checklist for agencies, ordered by what breaks across 49 launches — with the flow checks others skip and a forwardable report.
2026-06-19 Practical QA guides

How to test a vibe-coded app: the 6-step process

The ordered process to test a vibe-coded app before launch, sequenced by a June 2026 audit of 49 launches where 78% shipped a critical bug.
2026-06-19 Practical QA guides

Website launch checklist for indie hackers

In a June 2026 audit of 49 launches, 78% shipped a critical bug on day one. An 8-item launch checklist for indie hackers, ordered by what breaks.
2026-06-16 Engineering notes

We audited 14 side-project launches. Zero critical bugs, same quiet flaws.

Real-browser audits on 14 side-project launches from r/SideProject. None had a critical bug — but 11 of 14 shared the same quiet, growth-capping flaws.
2026-06-15 Engineering notes

We pointed our chaos-QA agent at our own site. It found a shipped bug.

We aimed Prufa's chaos tester at prufa.dev. It caught a 103px mobile overflow our green CI had just shipped — plus two of its own false positives.
2026-06-15 AI QA, explained

Chaos testing for web apps: what it is, and what an LLM adds

What chaos testing for web apps is, how it relates to monkey testing, and what an LLM adds — plus where it fits next to your end-to-end tests.
2026-06-14 Practical QA guides

Is my vibe-coded app production ready? A scored assessment from 49 launches

Score your app against a 49-launch audit. 22% ship with no criticals; 73% ship with one; 4% need to hold. Six questions, ten minutes, real data.
2026-06-13 Practical QA guides

Test your signup flow before launch

Five things that break signup flows, ranked by how often they show up, with a 60-second audit that walks the flow end-to-end.
2026-06-13 Practical QA guides

Vibe coding testing checklist: 10 things to verify before you ship

Ten things to verify before you ship a vibe-coded app, grounded in a 49-launch audit. For each: the failure mode, the Prufa check, the signal, and a 5-minute manual fallback.
2026-06-12 Practical QA guides

QA for vibe-coded apps: what actually breaks

What actually breaks in vibe-coded web apps, in what order, and what to do before launch. Data from 49 fresh launches, June 2026.
2026-06-11 Engineering notes

How Prufa verifies a signup flow (and why the LLM never grades the result)

One signup run, end to end: an LLM-backed agent drives the browser, plain code grades the outcome against a public flow-spec. Same input, same verdict.
2026-06-11 For agents (and their humans)

llms.txt in 2026: who actually reads it (and why we shipped one anyway)

Major AI crawlers don't fetch llms.txt — coding agents do. The honest state of the spec in 2026, and the 30-minute version that's still worth shipping.
2026-06-11 Engineering notes

We audited 49 Show HN launches. 38 had a critical bug on day one.

Free QA audits on 49 fresh Show HN launches: 78% had a critical finding. The full breakdown of what actually breaks at launch — analytics, links, cookies.
2026-06-11 Practical QA guides

Website QA checklist before launch, ordered by what actually breaks

A pre-launch QA checklist ordered by real failure data from 49 audited launches: analytics first, then forms, links, SEO basics, consent, and user flows.
2026-06-11 AI QA, explained

Why Prufa exists: QA built for the agentic era

AI writes 10× more code than humans can validate. The QA gap isn't a missing test suite — it's surface area. Here's the architecture we bet on.