How it works
The big picture
Every AssureAgent test has two AI agents in the conversation:
- Your agent — the one you’re testing. AssureAgent doesn’t see its prompt, weights, or internals. It only talks to it.
- Our caller — a persona we materialize from your scenario. It plays the customer.
We put them in a real conversation, over a real phone line (for voice) or a real WebSocket / HTTP channel (for chat), then record everything and grade it.
```mermaid
flowchart LR
    S[Scenario] --> P[Path generation]
    P --> R[Test run]
    R --> C[Caller LLM]
    C <-. live conversation .-> A[Your agent]
    R --> T[Transcript + audio]
    T --> RPT[Report]
```

Stage 1 — Scenarios
A scenario describes one situation you want to test. “A frustrated cardholder disputing a charge”, “a new patient scheduling an appointment”, “an IVR pressing 1 for English then 2 to file a claim”. You author scenarios in one of two ways:
- Graph designer. A visual canvas of nodes and branches for flows where every turn is scripted.
- Free-flow. A natural-language description. We turn that into a structured caller persona at runtime — opening line, objectives, behaviors, edge cases, escalation triggers.
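To make the free-flow path concrete, here is a minimal sketch of what a materialized caller persona might look like. The `CallerPersona` class and its field names are illustrative only — they mirror the fields listed above (opening line, objectives, behaviors, edge cases, escalation triggers), not AssureAgent's real internal schema.

```python
from dataclasses import dataclass, field

@dataclass
class CallerPersona:
    """Hypothetical shape of a persona materialized from a free-flow scenario."""
    opening_line: str
    objectives: list[str]
    behaviors: list[str] = field(default_factory=list)
    edge_cases: list[str] = field(default_factory=list)
    escalation_triggers: list[str] = field(default_factory=list)

# A description like "a frustrated cardholder disputing a charge"
# might materialize into something like:
persona = CallerPersona(
    opening_line="Hi, I'm calling about a charge on my card I don't recognize.",
    objectives=["Get the charge explained or reversed"],
    behaviors=["Gives bare values when asked", "Grows impatient if put on hold"],
    escalation_triggers=["Agent refuses to open a dispute"],
)
```

The point of the structured form is that the caller LLM can follow it turn by turn instead of improvising from raw prose.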
Stage 2 — Test paths
One scenario almost always has multiple realistic conversations. From a free-flow scenario about a billing dispute, the platform generates a handful of distinct paths — one where the customer accepts the explanation, one where they escalate, one where the agent fails to verify them, etc. Each path is a separately runnable test.
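A sketch of the fan-out described above, using the billing-dispute example. The path IDs and the `outcome_hint` field are made up for illustration — the real generated paths carry whatever structure the platform produces.

```python
# Hypothetical sketch: one scenario expanding into separately runnable paths.
scenario = "A customer disputes a billing charge"

paths = [
    {"id": "accepts-explanation",
     "outcome_hint": "customer accepts the agent's explanation"},
    {"id": "escalates",
     "outcome_hint": "customer rejects the explanation and demands a supervisor"},
    {"id": "verification-fails",
     "outcome_hint": "agent never verifies the caller's identity"},
]

# Each path is independently runnable, so one failing path
# doesn't block the others.
for path in paths:
    print(f"{scenario} -> path '{path['id']}'")
```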
Stage 3 — Test runs
When you run a path:
- We materialize the caller persona from the scenario description (and any custom functions you declared).
- For voice: we dial your agent’s phone number from a number we own. For chat: we open a session against your chat endpoint.
- The caller LLM holds the conversation, following the persona’s rules — speak bare values when asked, press digits via the IVR rule, invoke custom functions at the right turns.
- We record audio (voice) or messages (chat) end-to-end.
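The steps above boil down to a turn loop between the caller LLM and your agent. This is a minimal sketch for the chat case under stated assumptions: `caller_reply` and `agent_reply` stand in for the caller LLM and your chat endpoint, and neither is a real AssureAgent API.

```python
# Hypothetical sketch of the chat run loop. The caller opens, then the two
# sides alternate until the persona decides to end the conversation
# (modeled here as caller_reply returning None) or the turn cap is hit.
def run_chat_test(persona_opening, caller_reply, agent_reply, max_turns=10):
    transcript = [("caller", persona_opening)]
    for _ in range(max_turns):
        agent_msg = agent_reply(transcript)    # your agent answers
        transcript.append(("agent", agent_msg))
        caller_msg = caller_reply(transcript)  # caller LLM follows persona rules
        if caller_msg is None:                 # persona hangs up
            break
        transcript.append(("caller", caller_msg))
    return transcript
```

For voice the loop is the same shape, with audio in and out of each turn instead of messages.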
Stage 4 — Reports
After the run completes:
- Transcript with role labels and timestamps. For voice tests, an audio player synced to the transcript.
- Tool invocations are highlighted inline — including any custom functions the caller called.
- Success criteria are evaluated against the conversation; you see a pass / fail with the reasoning.
- Aggregations roll up across runs to surface trends — pass rate over time, regressions vs. last week, slowest paths.
Reports can be shared via tokenized links so stakeholders without accounts can see them. More on reports →
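The aggregations bullet above amounts to simple rollups over run results. A sketch, assuming a made-up per-run record with `path`, `passed`, and `duration_s` fields — illustrative names, not the real report schema.

```python
# Hypothetical run results for three paths of one scenario.
runs = [
    {"path": "accepts-explanation", "passed": True,  "duration_s": 41},
    {"path": "escalates",           "passed": False, "duration_s": 95},
    {"path": "verification-fails",  "passed": True,  "duration_s": 63},
]

# Pass rate across runs, and the slowest path — two of the
# trend signals a report surfaces.
pass_rate = sum(r["passed"] for r in runs) / len(runs)
slowest = max(runs, key=lambda r: r["duration_s"])["path"]
print(f"pass rate: {pass_rate:.0%}, slowest path: {slowest}")  # → pass rate: 67%, slowest path: escalates
```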
Two principles that explain everything
- AssureAgent simulates customers, not engineers. We don’t reach into your agent’s code; we test it from outside, the way a real call comes in. This means our tests catch what users actually hit — not what your unit tests cover.
- Authoring is the bottleneck, not running. Writing a good scenario is the hard part. Once written, runs are cheap and the platform expands one scenario into many paths automatically. Our docs are heavily focused on getting authoring right.