How it works
The big picture
Every AssureAgent test has two AI agents in the conversation:
- Your agent — the one you’re testing. AssureAgent doesn’t see its prompt, weights, or internals. It only talks to it.
- Our caller — a persona we materialize from your scenario. It plays the customer.
We put them in a real conversation, over a real phone line (for voice) or a real WebSocket / HTTP channel (for chat), then record everything and grade it.
```mermaid
flowchart LR
    S[Scenario] --> P[Path generation]
    P --> R[Test run]
    R --> C[Caller LLM]
    C <-. live conversation .-> A[Your agent]
    R --> T[Transcript + audio]
    T --> RPT[Report]
```

Stage 1 — Scenarios
A scenario describes one situation you want to test. “A frustrated cardholder disputing a charge”, “a new patient scheduling an appointment”, “an IVR pressing 1 for English then 2 to file a claim”. You author scenarios in one of two ways:
- Graph designer. A visual canvas of nodes and branches for flows where every turn is scripted.
- Free-flow. A natural-language description. We turn that into a structured caller persona at runtime — opening line, objectives, behaviors, edge cases, escalation triggers.
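To make the free-flow path concrete, here is a minimal sketch of what a materialized caller persona might look like. The `CallerPersona` class and its field names are illustrative only — they mirror the fields listed above (opening line, objectives, behaviors, edge cases, escalation triggers), not AssureAgent's real internal schema.

```python
from dataclasses import dataclass, field

@dataclass
class CallerPersona:
    """Hypothetical shape of a persona materialized from a free-flow scenario."""
    opening_line: str
    objectives: list[str]
    behaviors: list[str] = field(default_factory=list)
    edge_cases: list[str] = field(default_factory=list)
    escalation_triggers: list[str] = field(default_factory=list)

# A description like "a frustrated cardholder disputing a charge"
# might materialize into something like:
persona = CallerPersona(
    opening_line="Hi, I'm calling about a charge on my card I don't recognize.",
    objectives=["Get the charge explained or reversed"],
    behaviors=["Gives bare values when asked", "Grows impatient if put on hold"],
    escalation_triggers=["Agent refuses to open a dispute"],
)
```

The point of the structured form is that the caller LLM can follow it turn by turn instead of improvising from raw prose.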
Stage 2 — Test paths
One scenario almost always has multiple realistic conversations. From a free-flow scenario about a billing dispute, the platform generates a handful of distinct paths — one where the customer accepts the explanation, one where they escalate, one where the agent fails to verify them, etc. Each path is a separately runnable test.
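A sketch of the fan-out described above, using the billing-dispute example. The path IDs and the `outcome_hint` field are made up for illustration — the real generated paths carry whatever structure the platform produces.

```python
# Hypothetical sketch: one scenario expanding into separately runnable paths.
scenario = "A customer disputes a billing charge"

paths = [
    {"id": "accepts-explanation",
     "outcome_hint": "customer accepts the agent's explanation"},
    {"id": "escalates",
     "outcome_hint": "customer rejects the explanation and demands a supervisor"},
    {"id": "verification-fails",
     "outcome_hint": "agent never verifies the caller's identity"},
]

# Each path is independently runnable, so one failing path
# doesn't block the others.
for path in paths:
    print(f"{scenario} -> path '{path['id']}'")
```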
Stage 3 — Test runs
When you run a path:
- We materialize the caller persona from the scenario description (and any custom functions you declared).
- For voice: we dial your agent’s phone number from a number we own. For chat: we open a session against your chat endpoint.
- The caller LLM holds the conversation, following the persona’s rules — speak bare values when asked, press digits via the IVR rule, invoke custom functions at the right turns.
- We record audio (voice) or messages (chat) end-to-end.
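The steps above boil down to a turn loop between the caller LLM and your agent. This is a minimal sketch for the chat case under stated assumptions: `caller_reply` and `agent_reply` stand in for the caller LLM and your chat endpoint, and neither is a real AssureAgent API.

```python
# Hypothetical sketch of the chat run loop. The caller opens, then the two
# sides alternate until the persona decides to end the conversation
# (modeled here as caller_reply returning None) or the turn cap is hit.
def run_chat_test(persona_opening, caller_reply, agent_reply, max_turns=10):
    transcript = [("caller", persona_opening)]
    for _ in range(max_turns):
        agent_msg = agent_reply(transcript)    # your agent answers
        transcript.append(("agent", agent_msg))
        caller_msg = caller_reply(transcript)  # caller LLM follows persona rules
        if caller_msg is None:                 # persona hangs up
            break
        transcript.append(("caller", caller_msg))
    return transcript
```

For voice the loop is the same shape, with audio in and out of each turn instead of messages.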
Stage 4 — Reports
After the run completes:
- Transcript with role labels and timestamps. For voice tests, an audio player synced to the transcript.
- Tool invocations are highlighted inline — including any custom functions the caller called.
- Success criteria are evaluated against the conversation; you see a pass / fail with the reasoning.
- Aggregations roll up across runs to surface trends — pass rate over time, regressions vs. last week, slowest paths.
Reports can be shared via tokenized links so stakeholders without accounts can see them. More on reports →
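The aggregations bullet above amounts to simple rollups over run results. A sketch, assuming a made-up per-run record with `path`, `passed`, and `duration_s` fields — illustrative names, not the real report schema.

```python
# Hypothetical run results for three paths of one scenario.
runs = [
    {"path": "accepts-explanation", "passed": True,  "duration_s": 41},
    {"path": "escalates",           "passed": False, "duration_s": 95},
    {"path": "verification-fails",  "passed": True,  "duration_s": 63},
]

# Pass rate across runs, and the slowest path — two of the
# trend signals a report surfaces.
pass_rate = sum(r["passed"] for r in runs) / len(runs)
slowest = max(runs, key=lambda r: r["duration_s"])["path"]
print(f"pass rate: {pass_rate:.0%}, slowest path: {slowest}")  # → pass rate: 67%, slowest path: escalates
```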
Two principles that explain everything
- AssureAgent simulates customers, not engineers. We don’t reach into your agent’s code; we test it from outside, the way a real call comes in. This means our tests catch what users actually hit — not what your unit tests cover.
- Authoring is the bottleneck, not running. Writing a good scenario is the hard part. Once written, runs are cheap and the platform expands one scenario into many paths automatically. Our docs are heavily focused on getting authoring right.