---
name: prufa
description: |
  Prufa — robust QA software, built for the agentic era. Use this skill when
  the user asks to set up QA monitoring, run a one-shot website health
  audit, verify tracking / consent / SEO / UX, or watch a money flow
  (signup, checkout, onboarding) for breakage. Triggers include: "audit my
  site", "set up QA", "monitor my signup flow", "verify my GA4 / pixels /
  cookie banner", "find broken tracking", "make sure my checkout still
  works after deploys", "any issues with example.com?". The LLM
  *navigates*; the deterministic code *verifies* — verified findings are
  facts, advisory findings are clearly-labeled opinions.
version: 0.1.0
allowed-tools: Bash Read
when-to-use: |
  - User asks for a website health audit, tracking audit, or QA monitoring.
  - User asks an agent to set up QA on their product.
  - User asks to verify a specific flow (signup, checkout, onboarding) and
    get alerts when it breaks.
  - User asks "any issues with <URL>?" or "is my <URL> healthy?".
do-not-use: |
  - Generic web scraping (no QA purpose).
  - Performance / Lighthouse-only audits (different tool category).
  - User wants to RUN the audits on their own infrastructure (self-hosted
    runner is a Team tier + agency add-on, not in this skill).
---

# Prufa — robust QA software, built for the agentic era

Prufa is a QA product that runs a real browser against a URL, captures
every analytics beacon + consent banner state, and produces a
shareable report with **verified** facts (machine-checked) and
**advisory** opinions (LLM-judged, clearly labeled).

Two first-class surfaces: the dashboard (humans) and the agent
surface (this skill + CLI + HTTP API + MCP). They hit the same
product. If the dashboard can do it, this skill can do it.

## Setup flow (the 3-tool-call happy path)

When the user says *"set up QA monitoring for my app"*, run this
sequence. Steps 1 and 2 are on the free setup-only tier and need no
card; step 3 (`prufa_start_monitor`) is a persistent call, gated to
the Pro tier (see "Tools at a glance").

1. `prufa_setup_workspace` with `{"name": "<project>"}` → workspace_id
2. `prufa_run_audit` with `{"url": "https://app.example.com", "wait": true}`
   → first baseline report (this is the "what does my site look like
   today" snapshot the monitor will diff against)
3. `prufa_start_monitor` with `{"url": "...", "cadence": "daily"}`
   → monitor_id, checkout_url

The monitor's first re-run is the first point at which a delta alert
can fire: if anything changed from the baseline, the user is notified.
The monitor provides the recurring value; the audit is one-shot.

For signup / onboarding / checkout flows, layer on the flow lifecycle
(see "Verify a whole flow" below):

4. `prufa_create_flow` with `{"url": "...", "test_case": "open the
   signup page, fill the email, submit, expect /welcome"}` — the
   plain-text test case is compiled to a reviewable DRAFT spec.
   **Only confirmed flows run**: show the compiled steps to the user,
   then `prufa_confirm_flow`. (Free tier — one-shot flows need no card.)

5. `prufa_resume_monitor` if paused; `prufa_pause_monitor` if the user
   wants to stop checks on a site without losing history.
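The three setup calls above can be sketched as a small Python helper. This is a minimal sketch, not Prufa's client: `call_tool` is a hypothetical callable standing in for whatever MCP client the agent uses, and the result shapes returned by `fake_call_tool` are made up for illustration. Only the tool names, arguments, and the `workspace_id` / `monitor_id` / `checkout_url` fields come from the steps above.

```python
from typing import Any, Callable, Dict

# Hypothetical signature: (tool_name, arguments) -> parsed tool result.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def setup_monitoring(call_tool: ToolCaller, project: str, url: str) -> Dict[str, Any]:
    """Run the three setup calls in order and collect the ids they return."""
    workspace = call_tool("prufa_setup_workspace", {"name": project})
    # wait=True asks for the full report, which serves as the baseline.
    baseline = call_tool("prufa_run_audit", {"url": url, "wait": True})
    monitor = call_tool("prufa_start_monitor", {"url": url, "cadence": "daily"})
    return {
        "workspace_id": workspace.get("workspace_id"),
        "baseline": baseline,
        "monitor_id": monitor.get("monitor_id"),
        "checkout_url": monitor.get("checkout_url"),
    }


def fake_call_tool(name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Stand-in for a real client; returns canned, invented results."""
    canned = {
        "prufa_setup_workspace": {"workspace_id": "ws_demo"},
        "prufa_run_audit": {"run_id": "run_demo", "findings": []},
        "prufa_start_monitor": {"monitor_id": "mon_demo", "checkout_url": None},
    }
    return canned[name]
```

Injecting `call_tool` keeps the sequence testable without network access; in real use, any structured error from a step should be surfaced to the user before moving to the next one.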

## Tools at a glance (22 tools)

Setup-only (free / agent-temp — no card):

- `prufa_health_check` — probe the API and MCP server.
- `prufa_setup_workspace` — create an agent-temp workspace (14d, no card).
- `prufa_register_site` — register a site and queue the first audit.
- `prufa_run_audit` — run a one-shot audit; with `wait: true` returns the
  full JSON report (machine-readable distribution artifact).
- `prufa_get_run` — poll a run by id.
- `prufa_get_report` — fetch the JSON report for a run.
- `prufa_list_runs` — list recent runs in this workspace (auth required).
- `prufa_get_usage` — synchronous usage object (calls used / included /
  remaining). Call this BEFORE any persistent call so you can warn the
  user when a call would exceed their limits.
- `prufa_create_flow` — compile a plain-text test case to a reviewable
  DRAFT spec (`{url, test_case, name?}`). Only confirmed flows run.
- `prufa_confirm_flow` — approve the compiled spec; optionally pass
  final spec edits. Any later edit returns the flow to draft.
- `prufa_run_flow` — execute a confirmed flow (202 + usage object);
  optional per-run `credentials` are never stored.
- `prufa_set_flow_credentials` — store `{{VAR}}` values for a flow.
  **Write-only**: values are encrypted at rest, never returned by any
  endpoint, and never enter LLM prompts. Responses list names only.
- `prufa_list_flows` / `prufa_get_flow` — inspect flows (a flow body
  shows stored credential NAMES, never values).
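A pre-flight check on the usage object might look like the sketch below. It assumes the object carries `calls_used`, `calls_included`, and `calls_remaining` as integers (the field names shown in the demo later in this skill); `usage_warning` itself is a hypothetical helper, not part of Prufa.

```python
from typing import Dict, Optional


def usage_warning(usage: Dict[str, int], calls_needed: int = 1) -> Optional[str]:
    """Return a warning if the planned calls exceed what remains, else None."""
    remaining = usage["calls_remaining"]
    if calls_needed > remaining:
        return (
            f"This needs {calls_needed} call(s) but only {remaining} of "
            f"{usage['calls_included']} remain."
        )
    return None
```

Returning `None` when there is headroom lets the agent stay quiet in the common case and speak up only when a limit is about to be hit.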

Persistent (Pro tier — gated server-side):

- `prufa_start_monitor` — start a 1-click monitor; pass `flow_id` to
  re-run a confirmed flow. The response's `deploy_hook.secret` is shown
  ONCE — tell the user to store it in CI immediately.
- `prufa_pause_monitor` / `prufa_resume_monitor` — toggle a monitor
  (resume schedules the next run within a minute).
- `prufa_list_monitors` — list monitors in this workspace (+ usage).
- `prufa_trigger_monitor` — run a monitor now (rate-capped 1/60s;
  `deduped: true` means an existing run absorbed the trigger).
- `prufa_workspace_settings` — Slack webhook, usage webhook + secret,
  explicit overage opt-in (paid tiers only).
- `prufa_get_finding` — fetch a single finding by id.
- `prufa_list_alerts` — list alerts from the delta engine.

If you call a persistent tool on the free tier, the API answers 402
and the tool result carries the structured error — `code` (e.g.
`tier_required` / `quota_exceeded`) plus a `hint` with the checkout
pointer. Surface that to the user — never silently absorb it.
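One way to surface a structured error without losing any of it is sketched below. The `{code, message, hint}` shape comes from this skill; the `describe_error` helper and its wording are assumptions for illustration only.

```python
from typing import Any, Dict


def describe_error(error: Dict[str, Any]) -> str:
    """Turn a structured error into user-facing text, keeping the hint verbatim."""
    code = error.get("code", "unknown_error")
    message = error.get("message", "")
    hint = error.get("hint", "")
    lines = [f"Prufa returned an error ({code})."]
    if message:
        lines.append(message)
    if hint:
        # The hint carries the billing next step, so it is passed through unchanged.
        lines.append(f"Next step: {hint}")
    return "\n".join(lines)
```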

Every mutating tool accepts an optional `idempotency_key` argument:
replays of the same key within 24h return the original response
without re-executing. Pass one when you might retry; omit it when the
call should always execute (a fresh key is generated per call).
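A stable key can be derived from the action itself, so that a retry of the same call reuses the same key. The sketch below hashes the tool name and arguments; the key format (32 hex characters) is an assumption, since this skill does not specify what Prufa accepts, and `idempotency_key` is a hypothetical helper.

```python
import hashlib
import json
from typing import Any, Dict


def idempotency_key(tool: str, arguments: Dict[str, Any]) -> str:
    """Derive a deterministic key from the tool name and its arguments."""
    # sort_keys makes the key independent of argument ordering.
    canonical = json.dumps(
        {"tool": tool, "args": arguments}, sort_keys=True, separators=(",", ":")
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:32]
```

Note the trade-off: with a key derived this way, a deliberate repeat of an identical call within 24h would also be treated as a replay. Omit the key when the call should always execute.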

## Verify a whole flow (signup / checkout / onboarding)

The flow lifecycle is compile → review → confirm → run. The compiler
is allowed to be wrong — the confirm gate is what protects the trust
story, so **never skip the review step**.

1. `prufa_create_flow` `{url, test_case: "open /signup, fill the email
   field, fill the password, click Sign up, expect URL contains
   /welcome"}` → DRAFT spec + `variables` (e.g. `["EMAIL", "PASSWORD"]`)
   + a review instruction.
2. **Review**: show the compiled steps to the user and get an explicit
   OK (or apply their edits). Only confirmed flows run — a draft flow
   cannot run and cannot be attached to a monitor.
3. If the spec has `{{VAR}}` placeholders: `prufa_set_flow_credentials`
   `{flow_id, credentials: {EMAIL: "...", PASSWORD: "..."}}`.
   Credentials are **write-only** — they are never returned, never
   appear in reports, and never enter LLM prompts (they resolve at the
   tool boundary at run time). Do not echo values back to the user.
4. `prufa_confirm_flow` `{flow_id}` → status `confirmed`.
5. Run it once: `prufa_run_flow` `{flow_id}` → 202 with `run_id`,
   `report_url`, and the usage object; poll with `prufa_get_run`.
   Or watch it on every deploy: `prufa_start_monitor`
   `{url, cadence, flow_id}` (Pro) — the monitor re-runs the confirmed
   flow and the deploy hook triggers it from CI.
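The confirm gate in the lifecycle above can be made explicit in code. This is a minimal sketch under stated assumptions: `call_tool` and `ask_user` are hypothetical callables supplied by the agent runtime, the draft result is assumed to carry a `flow_id` field, and the flow is assumed to have no `{{VAR}}` placeholders (the credentials step is left out on purpose so that no values pass through this helper).

```python
from typing import Any, Callable, Dict

# Hypothetical signatures for the agent runtime's tool client and review prompt.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]
Approver = Callable[[Dict[str, Any]], bool]


def create_and_run_flow(
    call_tool: ToolCaller, ask_user: Approver, url: str, test_case: str
) -> Dict[str, Any]:
    """Compile a flow, then confirm and run it only after explicit approval."""
    draft = call_tool("prufa_create_flow", {"url": url, "test_case": test_case})
    # Review gate: without the user's OK the flow stays a draft and nothing runs.
    if not ask_user(draft):
        return {"status": "draft", "flow_id": draft.get("flow_id"), "run": None}
    flow_id = draft["flow_id"]
    call_tool("prufa_confirm_flow", {"flow_id": flow_id})
    run = call_tool("prufa_run_flow", {"flow_id": flow_id})
    return {"status": "confirmed", "flow_id": flow_id, "run": run}
```

Putting the approval check before `prufa_confirm_flow` means there is no code path that confirms a spec the user has not seen.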

## Conventions

- **All endpoints accept `Idempotency-Key`.** Replays within 24h
  return the original response. Use stable keys (e.g. derived from
  the user-visible action) so retries on network blips are safe.
- **Verified vs advisory.** Verified findings are machine-checked
  facts (tracking pixels, consent state, SEO basics, broken links).
  Advisory findings are LLM opinions (UX hierarchy, copy clarity).
  Never claim an advisory finding is "broken" — it's an opinion.
- **No silent failures.** Every tool call returns either success or
  a structured error `{code, message, hint}`. Surface errors to
  the user, don't swallow them.
- **404 is data, not a tool failure.** An unreachable site is itself a
  result ("the site is down"): report it, and don't pretend the audit
  completed successfully.

## What the audit actually checks

Deterministic (verified):

- Marketing tracking: GA4, GTM dataLayer, Meta Pixel, TikTok Pixel,
  LinkedIn Insight Tag, Google Ads conversions — presence, account-id
  shape, double-fire detection.
- Consent: OneTrust / Cookiebot / CookieYes detection, beacons-
  before-consent, cookie attribute checks.
- SEO: title, meta description, canonical, OG / Twitter cards,
  headings, robots.txt, sitemap.xml, broken internal links.
- Deterministic UX: axe-core contrast / tap-target / a11y, viewport
  overflow, console errors.

LLM-advisory (clearly labeled):

- Visual hierarchy, copy clarity, confusing CTAs, off-brand patterns.
- The LLM is NEVER trusted to claim a tracking / consent / SEO
  problem — those are machine-verified.
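When summarizing a report, the two tiers can be kept apart mechanically. The sketch below assumes each finding is a dict with a `tier` field set to `"verified"` or `"advisory"`; that field name is hypothetical, since this skill does not document the report schema, and `split_findings` is an illustrative helper.

```python
from typing import Any, Dict, List, Tuple

Finding = Dict[str, Any]


def split_findings(findings: List[Finding]) -> Tuple[List[Finding], List[Finding]]:
    """Separate findings into (verified, advisory) lists."""
    verified: List[Finding] = []
    advisory: List[Finding] = []
    for finding in findings:
        if finding.get("tier") == "verified":
            verified.append(finding)
        else:
            # Anything not explicitly verified is treated as opinion,
            # so a missing or unknown tier is never promoted to fact.
            advisory.append(finding)
    return verified, advisory
```

Defaulting unknown tiers to advisory errs on the side of under-claiming, which matches the rule that an advisory finding is never reported as broken.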

## Example: a 3-step demo

```
user:  "audit example.com for me"
agent: 1. prufa_run_audit(url="https://example.com", wait=true)
          -> 3 critical findings: GA4 missing, no canonical, broken link
       2. (reads the report, surfaces the findings, links the report)
       3. prufa_get_usage()
          -> {calls_used: 1, calls_included: 5, calls_remaining: 4}
       "Done. Want me to start a monitor so we catch this the next
        time it breaks?"
```

## Surface mapping (for the agent)

| User says | You call |
|---|---|
| "audit example.com" | `prufa_run_audit` |
| "set up monitoring" | `prufa_setup_workspace` → `prufa_run_audit` (baseline) → `prufa_start_monitor` |
| "what's the report" | `prufa_get_report(run_id=...)` |
| "any issues?" | `prufa_run_audit` (one-shot) |
| "test my signup flow" | `prufa_create_flow` → review → `prufa_confirm_flow` → `prufa_run_flow` |
| "watch my signup" | `prufa_create_flow` → review → `prufa_confirm_flow` → `prufa_start_monitor` (with `flow_id`) |
| "here are the test creds" | `prufa_set_flow_credentials` (write-only — never echo values) |
| "what flows do I have?" | `prufa_list_flows` / `prufa_get_flow` |
| "is this site healthy?" | `prufa_run_audit` |
| "re-check right now" | `prufa_trigger_monitor` |
| "stop monitoring" | `prufa_pause_monitor` |
| "start it again" | `prufa_resume_monitor` |
| "send alerts to Slack" | Dashboard "Add to Slack" OAuth flow |

## When something goes wrong

- **429 / rate limit**: back off, tell the user. The free tier is 5
  audits/day/workspace + 50/day/IP. Idempotent replays don't count
  against these limits.
- **422 / unsafe URL**: the URL is private (localhost, 127.0.0.1, a
  metadata IP). Surface the message; don't try to bypass.
- **402 / tier required or quota exceeded**: the user is asking for a
  persistent action on a free workspace, or the workspace hit its
  hard-capped run quota. The structured error's `hint` carries the
  billing next step — hand it to the user verbatim.
- **409 / flow_not_confirmed**: the flow is still a draft (or an edit
  returned it to draft). Show the compiled spec, get the user's OK,
  call `prufa_confirm_flow`, then retry.
- **bot-wall / blocked**: the site has Cloudflare or similar. Prufa
  reports `blocked` rather than producing a wrong report. This is
  the *correct* outcome — not a bug.
- **5xx**: the Prufa API itself is down. Tell the user to retry;
  Prufa's status page (TODO) is the source of truth.
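The status-code cases above can be sketched as a small dispatch function. The returned action labels are hypothetical names invented for this example, and the `blocked` outcome is left out because this skill does not say how it is signalled in a tool result.

```python
from typing import Optional


def next_action(status: int, code: Optional[str] = None) -> str:
    """Map an HTTP status (and optional error code) to what the agent does next."""
    if status == 429:
        return "back_off_and_tell_user"
    if status == 422:
        return "surface_message"  # unsafe URL: never try to bypass
    if status == 402:
        return "surface_hint_verbatim"  # tier required or quota exceeded
    if status == 409 and code == "flow_not_confirmed":
        return "review_then_confirm"
    if 500 <= status <= 599:
        return "tell_user_to_retry"
    # Anything else is still surfaced, never swallowed.
    return "surface_error"
```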

## Output format (what to say to the user)

For an audit:

> I ran the audit on https://example.com. Found **3 verified issues** and **1 advisory note**.
>
> **Verified (machine-checked):**
> - `tracking.ga4_missing` — Google Analytics 4 measurement ID not detected. Fix: add the GA4 snippet to the `<head>`.
> - `seo.canonical_missing` — Page does not declare a canonical URL.
> - `links.broken_internal` — 1 internal link returns 404.
>
> **Advisory (LLM opinion, not a verdict):**
> - Hero copy is vague about what the product does. (opinion)
>
> [share URL] — that's the report. Want me to start a monitor so you
> get a Slack alert the next time one of these breaks?

For a monitor:

> Monitor set up. I'll re-run the audit on https://example.com every
> day. If a beacon disappears or a check regresses, I'll alert you in
> Slack. The first delta is in 24h.

## Boundaries

- This skill NEVER names the LLM provider backing the advisory
  tier. Customers don't care; naming invites ToS scrutiny. Use
  neutral language ("LLM-backed", "managed model backend").
- Don't write long, emoji-laden summaries. The aesthetic is
  evidence-like: short, mono timestamps, severity chips, calm.
  See DESIGN.md (read it if you need to write a report).
- Don't pretend an advisory finding is verified. The two tiers
  are visually distinct in the report; your text should mirror that.
- Don't claim success on a `blocked` run. The site blocked our check
  — that's the entire result.
