Vibe testing

Vibe testing is the rapid exploratory mode in AssureAgent. Instead of authoring scenarios one by one, you provide a high-level intent — “stress-test the agent’s hold-music handling”, “find places it leaks PII” — and the platform generates a batch of scenarios on the fly, runs them, and surfaces the most interesting failures.

The Anthropic key

Vibe testing is the only AssureAgent feature that requires an end-user key. You bring your own Anthropic API key; the platform uses it to power the scenario generation.

This is by design. Vibe testing can produce a large volume of generated scenarios on demand; routing those through a key you own keeps your usage and billing transparent.

Adding the key

Open Settings.
In Vibe testing key, paste your Anthropic API key.
Click Save.

The key is stored encrypted at rest and never logged. You can rotate it at any time; the new key takes effect on the next vibe test.

You don’t need this key for any other AssureAgent feature. Test runs, custom functions, reports — all of those work without it.

Running a vibe test

From the left nav, click Vibe testing.
Pick the target (an existing agent, number, or endpoint you want to probe).
Write the intent — 1–3 sentences describing what you want to find.
Pick the batch size — how many scenarios to generate and run.
Click Run.

The platform generates the scenarios using your Anthropic key, runs them sequentially or in parallel (per your batch settings), and produces a summary report.
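As a rough mental model of that flow, the sketch below generates a batch from an intent and runs it either sequentially or in parallel. It is not AssureAgent's implementation or API: `Scenario`, `generate_scenarios`, `run_scenario`, `run_batch`, and `max_workers` are hypothetical names, and both the generation and the run steps are stubbed out.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Scenario:
    scenario_id: int
    description: str


def generate_scenarios(intent: str, batch_size: int) -> list[Scenario]:
    # Stub (assumption): a real generator would call a model using your key.
    return [
        Scenario(i, f"{intent} (variation {i})")
        for i in range(1, batch_size + 1)
    ]


def run_scenario(scenario: Scenario) -> dict:
    # Stub (assumption): a real run would call the target agent or endpoint.
    return {"scenario_id": scenario.scenario_id, "completed": True}


def run_batch(intent: str, batch_size: int, parallel: bool,
              max_workers: int = 4) -> list[dict]:
    scenarios = generate_scenarios(intent, batch_size)
    if not parallel:
        # Sequential mode: one run at a time, in generation order.
        return [run_scenario(s) for s in scenarios]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map() returns results in input order, so results
        # still line up with the generated scenarios.
        return list(pool.map(run_scenario, scenarios))


results = run_batch("find places it leaks PII", batch_size=5, parallel=True)
print(len(results), "runs completed")
```

In this sketch the parallel and sequential paths return results in the same order, which keeps a summary step simple regardless of the mode chosen.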

What the report shows

Vibe test reports group runs into outcomes:

Behavior matched intent — runs that surfaced what you asked about.
Notable but off-topic — runs that found something interesting but unrelated.
No signal — runs that completed without surfacing anything notable.

Click any group to see the underlying runs, full transcripts, and audio. Anything you find worth keeping can be promoted to a permanent scenario for future regression coverage.
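To make the three outcome groups concrete, here is a minimal sketch of how runs could be bucketed. The run fields `found_issue` and `on_topic` are assumptions made for illustration; the source does not say how the platform actually decides which group a run belongs to.

```python
# Group names taken from the report; the classification rule is assumed.
OUTCOMES = (
    "Behavior matched intent",
    "Notable but off-topic",
    "No signal",
)


def classify(run: dict) -> str:
    # Hypothetical rule: an issue related to the intent matches it,
    # an unrelated issue is off-topic, and no issue means no signal.
    if run["found_issue"] and run["on_topic"]:
        return OUTCOMES[0]
    if run["found_issue"]:
        return OUTCOMES[1]
    return OUTCOMES[2]


def group_runs(runs: list[dict]) -> dict[str, list[int]]:
    # Start with every group present so empty groups still appear.
    groups: dict[str, list[int]] = {name: [] for name in OUTCOMES}
    for run in runs:
        groups[classify(run)].append(run["run_id"])
    return groups


sample_runs = [
    {"run_id": 1, "found_issue": True, "on_topic": True},
    {"run_id": 2, "found_issue": True, "on_topic": False},
    {"run_id": 3, "found_issue": False, "on_topic": False},
    {"run_id": 4, "found_issue": True, "on_topic": True},
]
grouped = group_runs(sample_runs)
for name, run_ids in grouped.items():
    print(f"{name}: {run_ids}")
```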

When to use vibe testing

Early in agent development when you don’t know what to test for yet.
After a vendor change — quickly probe whether the new model behaves differently.
During pre-launch — find weird edge cases that hand-authored scenarios miss.
When triaging a customer complaint — paste the complaint as the intent, see what the platform reproduces.

When NOT to use vibe testing

For your regression baseline. Hand-authored scenarios are deterministic and reproducible; vibe tests are exploratory and the generated personas vary across runs. Use authored scenarios for the test suite that runs every night.
When you need exact reproducibility. Two vibe-test runs with the same intent will not produce identical scenarios.

Cost

You pay Anthropic directly for the API usage; AssureAgent doesn't mark up or proxy those charges. Generation cost depends on intent complexity and batch size; a typical batch of 10 scenarios costs a few cents in tokens.
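For a back-of-the-envelope estimate, generation cost scales with batch size times the tokens used per scenario. The sketch below shows the arithmetic only; the token counts and per-million-token prices are placeholder assumptions, not Anthropic's actual rates, so substitute the current pricing for the model in use.

```python
def estimate_generation_cost(
    batch_size: int,
    input_tokens_per_scenario: int,
    output_tokens_per_scenario: int,
    input_price_per_mtok: float,
    output_price_per_mtok: float,
) -> float:
    """Return the estimated generation cost in USD for one batch."""
    input_cost = (
        batch_size * input_tokens_per_scenario * input_price_per_mtok / 1_000_000
    )
    output_cost = (
        batch_size * output_tokens_per_scenario * output_price_per_mtok / 1_000_000
    )
    return input_cost + output_cost


# Placeholder numbers (assumptions): 500 input and 800 output tokens per
# scenario, priced at 1.00 and 5.00 USD per million tokens.
cost = estimate_generation_cost(
    batch_size=10,
    input_tokens_per_scenario=500,
    output_tokens_per_scenario=800,
    input_price_per_mtok=1.00,
    output_price_per_mtok=5.00,
)
print(f"Estimated generation cost: ${cost:.3f}")
```

Doubling the batch size doubles the estimate, since each scenario is assumed to cost the same to generate.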

The call time for the resulting test runs counts against your AssureAgent test minutes quota the same as any other run.