
Flow-spec format v1 — Public Spec

Status: versioned, public. Compiles from constrained-vocabulary natural-language test cases (step 3 of the build order).

Stability: v1 is the contract. Backwards-compatible additions only; breaking changes require a v2 the user opts into.

Why a separate spec

Plain-text test cases are written in natural language, but the runner needs a deterministic, reviewable spec. The flow-spec format is the bridge: it's the artifact the user confirms *before* it runs on a monitor.

Two design choices follow from the trust invariant (PLAN § E-H3):

• Natural-language cases compile to a reviewable spec; the user can pin selectors, add waits, override assertions, and mark steps optional.
• The format is versioned: v1 accepts only backwards-compatible additions. A v2 requires a new format, and the user opts in.

The machine-readable contract is flow_spec.schema.json (JSON Schema, Draft 2020-12), generated from the reference implementation's models, so the schema and the runner cannot drift.

Top-level shape


{
  "spec_version": "1",
  "name": "signup",
  "description": "Sign up a new user end-to-end on the marketing site.",
  "url": "https://example.com",
  "allowed_hosts": ["checkout.stripe.com"],
  "steps": [
    { "type": "goto", "url": "https://example.com" },
    { "type": "act", "action": "click", "target": "the sign-up button", "selector": "text=Sign up" },
    { "type": "act", "action": "fill", "target": "the email field", "selector": "input[name=email]", "value": "{{EMAIL}}" },
    { "type": "act", "action": "fill", "target": "the password field", "selector": "input[name=password]", "value": "{{PASSWORD}}" },
    { "type": "act", "action": "click", "target": "the submit button", "selector": "button[type=submit]" },
    { "type": "wait", "for": "[data-testid=dashboard]" },
    { "type": "expect", "kind": "url_contains", "value": "/dashboard" },
    { "type": "expect", "kind": "beacon", "vendor": "ga4", "event": "sign_up" }
  ],
  "assertions": [
    { "kind": "no_console_errors", "severity": "warning" },
    { "kind": "beacon_fires", "vendor": "ga4", "event": "sign_up", "severity": "critical" }
  ]
}

Unknown fields are refused everywhere: at the top level and inside every step and assertion. A spec that says something the consumer doesn't understand must not run with the unknown part silently dropped (the "no silent failures" invariant applied to parsing).
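A minimal sketch of that refusal, assuming the spec has already been parsed into a Python dict. The helper name `refuse_unknown_fields` and the `TOP_LEVEL_FIELDS` set are hypothetical; the authoritative field list is whatever flow_spec.schema.json declares.

```python
# Hypothetical field table, copied from the top-level example above.
TOP_LEVEL_FIELDS = {
    "spec_version", "name", "description", "url",
    "allowed_hosts", "steps", "assertions",
}


def refuse_unknown_fields(obj: dict, allowed: set[str], where: str) -> None:
    """Raise instead of silently dropping keys the consumer does not know."""
    unknown = sorted(set(obj) - allowed)
    if unknown:
        raise ValueError(f"{where}: unknown field(s): {', '.join(unknown)}")
```

The same helper could be called once per step and per assertion with that object's own field set, so an unknown key is reported wherever it appears.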

Step types

| type | Fields | Effect |
|---|---|---|
| goto | url | Navigate. The URL must pass the public-URL guard and the host allowlist. |
| act | action, selector?, target?, value? | One browser action: click, fill, press, hover, scroll. |
| expect | kind, ... | Deterministic verification: url_contains, url_matches, text_contains, beacon. |
| wait | ms *or* for | Sleep 1–30000 ms, or wait for a selector to appear. |
| extract | selector, into | Pull the element's visible text into a run variable. |

Every step also accepts optional (boolean, default false); see Failure semantics.

act carries a pinned selector (a CSS or text= selector, never raw XPath; a deliberate DX choice, DX7) and/or a target: a short human description ("the sign-up button") that the agent loop uses to re-resolve against the live page when the pinned selector fails. At least one of the two is required. value is required for fill and press and refused for click, hover and scroll.
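The cross-field rules for act can be sketched as a small check. This is an illustration under stated assumptions, not the reference validator: `check_act` and the two action sets are hypothetical names, and the error wording is invented.

```python
ACTIONS_WITH_VALUE = {"fill", "press"}
ACTIONS_WITHOUT_VALUE = {"click", "hover", "scroll"}


def check_act(step: dict) -> list[str]:
    """Return the cross-field violations for one act step (empty list = valid)."""
    errors = []
    action = step.get("action")
    if action not in ACTIONS_WITH_VALUE | ACTIONS_WITHOUT_VALUE:
        errors.append(f"unknown action: {action!r}")
    # At least one of selector / target must be present.
    if "selector" not in step and "target" not in step:
        errors.append("act needs a selector, a target, or both")
    if action in ACTIONS_WITH_VALUE and "value" not in step:
        errors.append(f"{action} requires value")
    if action in ACTIONS_WITHOUT_VALUE and "value" in step:
        errors.append(f"{action} refuses value")
    return errors
```

Returning a list rather than raising on the first problem lets a reviewer see every violation in one pass.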

expect kinds:

| kind | Fields | Passes when |
|---|---|---|
| url_contains | value | The current URL contains value. |
| url_matches | value | The current URL matches value as a regular expression (validated at parse time; invalid regex is refused). |
| text_contains | value, selector? | The element at selector (default: body) contains value, case-insensitive. |
| beacon | vendor, event? | A matching BeaconEvent has been captured at any point in the flow up to this step. Event-name match is case-insensitive. |

The LLM never judges an expect: every kind is plain code over the page or the captured beacon stream.
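As a sketch of what "plain code" could look like, the function below evaluates each kind without any model call. Several details are assumptions rather than part of the spec: the name `check_expect`, the `text_at` callback standing in for a page lookup, beacons modelled as dicts with `vendor` and `event` keys, an exact vendor comparison, and `re.search` (rather than a full match) for url_matches.

```python
import re


def check_expect(step: dict, url: str, text_at, beacons: list[dict]) -> bool:
    """Evaluate one expect step deterministically.

    text_at(selector) is a hypothetical callback returning an element's text;
    beacons holds the events captured so far as {"vendor", "event"} dicts.
    """
    kind = step["kind"]
    if kind == "url_contains":
        return step["value"] in url
    if kind == "url_matches":
        # Assumption: an unanchored search; the spec only says "matches".
        return re.search(step["value"], url) is not None
    if kind == "text_contains":
        text = text_at(step.get("selector", "body"))
        return step["value"].lower() in text.lower()
    if kind == "beacon":
        wanted = step.get("event")
        return any(
            b["vendor"] == step["vendor"]
            and (wanted is None or b["event"].lower() == wanted.lower())
            for b in beacons
        )
    raise ValueError(f"unknown expect kind: {kind!r}")
```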

wait takes exactly one of ms (integer, 1–30000) or for (a selector). Both or neither is refused.
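The exactly-one rule is an exclusive-or over the two fields. A minimal sketch, with `check_wait` as a hypothetical helper name:

```python
def check_wait(step: dict) -> None:
    """Refuse a wait step that has both, or neither, of ms and for."""
    has_ms, has_for = "ms" in step, "for" in step
    if has_ms == has_for:
        raise ValueError("wait takes exactly one of ms or for")
    if has_ms:
        ms = step["ms"]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(ms, bool) or not isinstance(ms, int) or not 1 <= ms <= 30000:
            raise ValueError("wait.ms must be an integer in 1..30000")
```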

extract stores the element's trimmed inner text under the name into (an identifier: [A-Za-z_][A-Za-z0-9_]*). Later steps reference it as {{name}}; see Variables.

Allowed hosts

allowed_hosts (top-level, optional, max 10 entries) lists bare hostnames the flow may navigate to *beyond* the entry URL's host and its subdomains, e.g. a hosted checkout. Entries are bare hostnames only: no scheme, no port, no path (refused at parse time). Each entry also admits its subdomains.

The allowlist is enforced by the runner outside the LLM (T7), on every goto and after every act that causes navigation. A hostile page, or a prompt-injected instruction, can never widen it at run time: a navigation outside the allowlist fails the step with spec_step_unresolvable.
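The host check itself is a suffix comparison on hostnames. The sketch below assumes a hypothetical `host_allowed` helper and uses only the standard library; it is not the runner's actual code. Matching on a leading dot is what keeps a lookalike such as evilexample.com from passing as a subdomain of example.com.

```python
from urllib.parse import urlsplit


def host_allowed(url: str, entry_url: str, allowed_hosts: list[str]) -> bool:
    """True when url's host is the entry host, an allowed host, or a subdomain of either."""
    host = (urlsplit(url).hostname or "").lower()
    entry_host = (urlsplit(entry_url).hostname or "").lower()
    bases = [entry_host, *(h.lower() for h in allowed_hosts)]
    # Exact match, or a subdomain match anchored on the dot.
    return any(b and (host == b or host.endswith("." + b)) for b in bases)
```

A runner would call a check like this on every goto and again after any act that navigates, failing the step when it returns False.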

Limits

A spec is a reviewable artifact, not a program. Hard ceilings:

| Field | Limit |
|---|---|
| name | 1–120 chars after trimming outer whitespace |
| description | ≤ 500 chars |
| url (top-level and goto) | 1–2048 chars |
| allowed_hosts | ≤ 10 entries |
| steps | 1–30 |
| assertions | ≤ 10 |
| selector (any step) | ≤ 512 chars |
| target | ≤ 200 chars |
| value (act / expect) | ≤ 2048 chars |
| wait.ms | 1–30000 |
| extract.into | identifier, ≤ 64 chars |
| vendor / event | ≤ 64 / ≤ 128 chars |

Assertions

The assertions block is independent of step order and is evaluated after the flow completes. v1 defines exactly two kinds:

| kind | Fields | Fails when |
|---|---|---|
| no_console_errors | (none) | Any console error or page error occurred during the flow. |
| beacon_fires | vendor, event? | No matching BeaconEvent was captured during the flow. |

Every assertion has a severity: critical, warning (the default) or info. Failed critical and warning assertions become verified findings in the report; info becomes advisory. Assertion failures never halt the run; the flow has already completed, and they grade it.
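A sketch of post-flow grading under stated assumptions: `grade_assertions` is a hypothetical name, beacons are modelled as dicts with `vendor` and `event` keys, and the event comparison is case-insensitive by analogy with the beacon expect (the spec does not say so for assertions).

```python
def grade_assertions(assertions: list[dict], console_errors: list[str],
                     beacons: list[dict]) -> list[dict]:
    """Return one finding per failed assertion; never raises to halt the run."""
    findings = []
    for a in assertions:
        severity = a.get("severity", "warning")  # warning is the default
        if a["kind"] == "no_console_errors":
            failed = bool(console_errors)
        elif a["kind"] == "beacon_fires":
            wanted = a.get("event")
            failed = not any(
                b["vendor"] == a["vendor"]
                and (wanted is None or b["event"].lower() == wanted.lower())
                for b in beacons
            )
        else:
            raise ValueError(f"unknown assertion kind: {a['kind']!r}")
        if failed:
            # info failures are advisory; critical and warning are verified.
            status = "advisory" if severity == "info" else "verified"
            findings.append({"kind": a["kind"], "severity": severity, "status": status})
    return findings
```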

Variables

{{VARIABLE}} placeholders come from two sources with different rules:

• Secrets are resolved from the credential vault ONLY in the value position of fill and press steps, at the tool boundary, outside the LLM context (E-S1). Credentials never enter a prompt, and every string that leaves the run (summaries, details, logs, events) is redacted. A secret placeholder anywhere else is not resolved.
• Run variables set by extract steps are substituted anywhere a string appears: url, selector, value. They are NOT secrets: they may appear in summaries and logs. Exception: url_matches values are treated as a literal regex and are never substituted.

A spec that references a variable the run cannot supply fails *before the browser starts* with the credential_rejected failure class, never mid-flow with a half-completed run.
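The pre-flight check can be sketched as a single pass over the steps. Everything named here is hypothetical (`missing_variables`, the `PLACEHOLDER` pattern, the set of vault names), and the sketch simplifies one thing: it only checks that each name is supplied by the vault or by an earlier extract, and does not enforce where a secret may appear.

```python
import re

# Placeholder names follow the identifier rule given for extract.into.
PLACEHOLDER = re.compile(r"\{\{([A-Za-z_][A-Za-z0-9_]*)\}\}")


def missing_variables(spec: dict, vault_names: set[str]) -> list[str]:
    """List placeholders that neither the vault nor an earlier extract supplies."""
    known = set(vault_names)
    missing: list[str] = []
    for step in spec["steps"]:
        # url_matches values are literal regexes and are never substituted.
        is_regex = step["type"] == "expect" and step.get("kind") == "url_matches"
        for field in ("url", "selector", "value"):
            text = step.get(field)
            if not isinstance(text, str) or (is_regex and field == "value"):
                continue
            for name in PLACEHOLDER.findall(text):
                if name not in known and name not in missing:
                    missing.append(name)
        if step["type"] == "extract":
            known.add(step["into"])  # available to later steps only
    return missing
```

A runner following the contract above would refuse to start the browser and report credential_rejected whenever this list is non-empty.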

Failure semantics

Steps are required by default. The contract:

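• A failed required step halts the run: the remaining steps are reported as skipped, and the run fails with that step's failure class.
• A failed optional step records a verified warning finding and the run continues to the next step.
• A failed required expect maps to assertion_failed, the only class that alerts. A failed *optional* expect still emits a verified finding; it just doesn't halt the run.
• […] evidence.
• A failed assertion is reported at its severity; the run still completes.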
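The halting rules in this section can be sketched as a loop over the steps. The `run_steps` name, the `execute` callback (returning None on success or a failure-class string) and the result labels are assumptions made for illustration.

```python
def run_steps(steps: list[dict], execute):
    """Apply required/optional semantics; execute(step) returns None or a failure class."""
    results, findings = [], []
    run_failure = None
    for step in steps:
        if run_failure is not None:
            results.append("skipped")  # everything after a required failure
            continue
        failure = execute(step)
        if failure is None:
            results.append("passed")
        elif step.get("optional", False):
            # Optional failure: record a verified warning and keep going.
            results.append("failed")
            findings.append({"severity": "warning", "status": "verified", "class": failure})
        else:
            # Required failure: the run fails with this step's class.
            results.append("failed")
            run_failure = failure
    return results, findings, run_failure
```

For example, an optional act that cannot be resolved followed by a failing required expect yields one warning finding, a run failure of assertion_failed, and every later step marked skipped.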

Run-failure classes (DX3) drive distinct behavior downstream:

| Class | Meaning | Alert behavior |
|---|---|---|
| credential_rejected | A {{VAR}} the run can't supply, or a login rejected. | Ask the user; never alert. |
| spec_step_unresolvable | Selector/target can't be resolved, navigation blocked by the allowlist, step timeout. | Re-review CTA on the spec; no alert. |
| agent_uncertain | The managed model backend was unavailable while re-resolving a target. | Retry later; no alert. |
| assertion_failed | A required expect failed: the site verifiably misbehaved. | The only class that alerts. |
| quota_exceeded | The run hit its hard LLM-call cap. | Pause the monitor and tell the human (hard-cap contract); no alert. |
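One way a consumer might encode the table is an enum plus a lookup, so that the alert decision is a single comparison. The `RunFailure`, `DOWNSTREAM` and `should_alert` names are hypothetical; the class strings and behaviors are taken from the table.

```python
from enum import Enum


class RunFailure(Enum):
    CREDENTIAL_REJECTED = "credential_rejected"
    SPEC_STEP_UNRESOLVABLE = "spec_step_unresolvable"
    AGENT_UNCERTAIN = "agent_uncertain"
    ASSERTION_FAILED = "assertion_failed"
    QUOTA_EXCEEDED = "quota_exceeded"


# Downstream behavior per class, paraphrasing the table above.
DOWNSTREAM = {
    RunFailure.CREDENTIAL_REJECTED: "ask the user",
    RunFailure.SPEC_STEP_UNRESOLVABLE: "show a re-review CTA on the spec",
    RunFailure.AGENT_UNCERTAIN: "retry later",
    RunFailure.ASSERTION_FAILED: "alert",
    RunFailure.QUOTA_EXCEEDED: "pause the monitor and tell the human",
}


def should_alert(failure: RunFailure) -> bool:
    """Only a verified site misbehavior alerts."""
    return failure is RunFailure.ASSERTION_FAILED
```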

Conformance

A consumer of flow-spec v1 MUST:

1. Validate `spec_version == "1"` and refuse anything else.
2. Refuse unknown step `type` values and unknown fields; surface the error to the user verbatim.
3. Treat steps as required by default; halt on a failed required step, report the remainder as skipped, continue past failed optional steps with a verified warning finding.
4. Map a failed required `expect` to the `assertion_failed` run-failure class (DX3), which has distinct alert behavior from agent_uncertain or credential_rejected.
5. Resolve variables OUTSIDE the LLM context. The agent's tool API receives already-resolved values; credentials never enter the prompt and only resolve in fill/press values.
6. Enforce `allowed_hosts` outside the LLM, on every navigation.

Versioning

• Additive: new step `type`, new `expect` kind, new assertion kind, new optional field → v1.x.
• Breaking: rename, remove, change semantics → v2 with `Sunset` header on v1.
• The JSON Schema is generated from the reference implementation and committed; a CI contract test asserts the two never drift.

Authoring

Validate against the published schema (/docs/specs/flow_spec.schema.json):

            
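import json
from pathlib import Path

import jsonschema

schema = json.loads(Path("flow_spec.schema.json").read_text())
# Example path (an assumption); point this at the spec you are authoring.
spec = json.loads(Path("signup.flow.json").read_text())
# Raises jsonschema.ValidationError when the spec violates the schema.
jsonschema.validate(spec, schema)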
            

The schema enforces shape and limits; cross-field rules (act needs selector or target, wait needs exactly one of ms / for, value only on fill/press, regex validity) are enforced by the reference validator and listed above.

Changelog

• **2026-06-11: v1 finalized.** Semantics corrected before the first public consumer: steps are required by default (a failed required step halts; optional: true continues with a verified warning), expect kinds named url_contains / url_matches / text_contains / beacon, allowed_hosts added (runner-enforced navigation allowlist), no_404 expect and within_steps assertion field dropped. flow_spec.schema.json is now generated from the reference implementation.