The Problem with AI ROI Claims
Every AI vendor has a case study. Every case study claims transformational results. Most of them are vague, unverifiable, or quietly cherry-picked from the best-case scenario across hundreds of deployments.
"40% reduction in support costs." No methodology. No baseline. No timeframe. Cool.
This article is meant to be different. These are patterns we see consistently across real deployments, with specific numbers, specific timeframes, and honest notes on what conditions are needed to get them. Where we've described specific scenarios, they're representative of real client types and real outcomes — not outliers, not best cases, not the one project that went unusually well.
What Actually Happens in the First 90 Days
Most businesses deploying their first AI agent go through three distinct phases. Skipping any of them tends to backfire: the plan looks shorter on paper, and the actual rollout takes longer.
Days 1–14: Calibration. The agent goes live. Real users interact with it. You discover the edge cases you didn't anticipate in testing. The agent handles most things well, struggles with a handful of scenarios, and escalates more than you expected. This is normal. The calibration period is where you tune — it's the worst time to disengage and assume the project's done.
Days 15–45: Stabilisation. The main edge cases have been addressed. Escalation rate drops. The agent's responses get more consistent. Users start trusting it — they engage with it directly rather than trying to find a human first. The metrics start showing a clear pattern.
Days 46–90: Optimisation. You know what works and what doesn't. You expand scope carefully — adding query types the agent can handle, tightening escalation triggers, connecting additional data sources. ROI becomes clearly measurable.
Scenario 1: E-commerce Support Agent
Business type: Online retailer, 800–1,200 orders per month
Problem: Support inbox handling 600–900 tickets per month, primarily order status queries, return requests, and product questions. Two part-time support staff spending most of their time on repetitive queries.
Agent scope: Order status lookups connected to fulfilment system, return eligibility checks, product FAQ answers, escalation for disputes and complex cases.
Results at 90 days:
| Metric | Before | After | Change |
|---|---|---|---|
| Tickets resolved without human | ~8% | 64% | +56pp |
| Average first response time | 6.2 hours | 38 seconds | -99% |
| Support staff hours/week on tier-1 | 28 hours | 9 hours | -68% |
| Customer satisfaction score | 3.8/5 | 4.4/5 | +16% |
| Monthly support operational cost | ~$3,800 | ~$1,200 | -68% |
Payback period: The agent build cost $7,500. Monthly savings of ~$2,600. Full payback in 2.9 months.
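The payback figure is simple arithmetic, and it is worth being able to reproduce it. A minimal sketch using the Scenario 1 numbers, assuming the monthly saving is constant from the first month and that the "after" cost already includes the agent's running costs (neither is stated explicitly above). The function name `payback_months` is ours, for illustration only:

```python
def payback_months(build_cost: float, monthly_saving: float) -> float:
    """Months needed for cumulative savings to cover a one-off build cost."""
    if monthly_saving <= 0:
        raise ValueError("monthly_saving must be positive")
    return build_cost / monthly_saving


# Approximate figures from the Scenario 1 table and payback note.
cost_before = 3_800  # monthly support cost before the agent, USD
cost_after = 1_200   # monthly support cost after the agent, USD
build_cost = 7_500   # one-off build cost, USD

monthly_saving = cost_before - cost_after
months = payback_months(build_cost, monthly_saving)

print(f"Monthly saving: ${monthly_saving:,}")  # Monthly saving: $2,600
print(f"Payback: {months:.1f} months")         # Payback: 2.9 months
```

Because savings ramp up through the calibration and stabilisation phases rather than arriving in full on day one, a real payback curve will usually lag this straight-line estimate slightly.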
What drove the result: Clean integration with the fulfilment system was the critical factor. The agent could give real answers about real orders, not generic responses. Without live data access, the deflection rate would have been much lower, probably around half of the 64% achieved here.
Scenario 2: B2B Lead Follow-Up Agent
Business type: SaaS company, selling to SMBs, 150–200 inbound leads per month via website forms
Problem: Average response time to new leads was 4.5 hours. Most after-hours leads (roughly 35% of total) weren't contacted until the next morning; only 22% were reached the same day. Conversion from lead to booked demo call was 8%.
Agent scope: Immediate response to form submissions, three qualifying questions, routing of hot leads to sales team with urgency flag, automated follow-up sequence for warm leads over 14 days.
Results at 90 days:
| Metric | Before | After | Change |
|---|---|---|---|
| Average lead response time | 4.5 hours | 52 seconds | -99% |
| After-hours leads contacted same day | 22% | 100% | +78pp |
| Lead to demo conversion rate | 8% | 14% | +75% |
| Sales team time on lead admin | ~12 hrs/week | ~3 hrs/week | -75% |
| Monthly demos booked | 12–16 | 22–28 | ~+79% |
Payback period: Agent build $9,500. Additional demos generated: roughly 10–12/month. At the company's close rate and deal value, incremental revenue covered the build cost within 6 weeks.
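The demo counts can be cross-checked against the lead volume and conversion rates given in the scenario. A rough sketch, assuming demos are simply leads multiplied by the conversion rate (the helper `demos_per_month` is hypothetical, not part of any tool mentioned here):

```python
def demos_per_month(leads: int, conversion_pct: int) -> float:
    """Expected booked demos for one month's inbound leads."""
    return leads * conversion_pct / 100


lead_range = (150, 200)        # inbound leads per month, from the scenario
before_pct, after_pct = 8, 14  # lead-to-demo conversion, in percent

before = [demos_per_month(n, before_pct) for n in lead_range]
after = [demos_per_month(n, after_pct) for n in lead_range]
extra = [a - b for a, b in zip(after, before)]

print(before)  # [12.0, 16.0]
print(after)   # [21.0, 28.0]
print(extra)   # [9.0, 12.0]
```

The model gives 21–28 demos and 9–12 additional demos per month, close to the reported 22–28 and 10–12. The small gap at the low end is what you would expect from approximate ranges. Turning extra demos into revenue needs the close rate and deal value, which are not given here, so the sketch stops at demos.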
What drove the result: Speed, mostly. The same leads, handled faster, converted at 1.75 times the rate (8% to 14%). The content of the agent's messages mattered less than the timing, though it still mattered.
Scenario 3: Healthcare Practice — Admin Agent
Business type: GP practice, 4 GPs, 3,500 registered patients
Problem: Reception team handling 80–100 calls per day. Roughly 65% were appointment bookings, appointment queries, and FAQ-type questions about the practice. Staff burnout was high. Patients frequently waited on hold.
Agent scope: Appointment booking via website and WhatsApp, appointment reminders with reschedule option, FAQ responses (opening hours, prescription request process, referral status queries), escalation to reception for clinical queries.
Results at 90 days:
| Metric | Before | After | Change |
|---|---|---|---|
| Calls requiring receptionist per day | 80–100 | 30–40 | -60% |
| Average hold time | 4.2 minutes | 0 (for agent-handled queries) | N/A |
| No-show rate | 18% | 11% | -39% |
| After-hours appointment requests handled | 0% | 100% | — |
| Reception staff admin hours freed/week | — | ~18 hours | — |
Payback period: At a cost of £8,000 to build, the practice recovered that cost within the first 3 months in reclaimed staff time alone. The no-show reduction generated meaningful revenue recovery too — each avoided no-show kept a billable appointment slot filled.
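The hourly cost of reception time is not stated, so this payback claim cannot be reproduced directly. One way to sanity-check it against your own figures is to work backwards to the hourly value at which the freed time covers the build cost. A sketch, assuming "3 months" means 13 weeks and that the ~18 hours per week applies from the start (both are our assumptions; the function name is hypothetical):

```python
def break_even_hourly_value(build_cost: float, hours_per_week: float, weeks: int) -> float:
    """Hourly value of freed staff time at which the build cost is recovered in `weeks`."""
    return build_cost / (hours_per_week * weeks)


build_cost = 8_000  # GBP, from the scenario
hours_freed = 18    # reception hours freed per week, from the table
weeks = 13          # assumption: "3 months" taken as 13 weeks

rate = break_even_hourly_value(build_cost, hours_freed, weeks)

print(f"Hours freed over {weeks} weeks: {hours_freed * weeks}")  # Hours freed over 13 weeks: 234
print(f"Break-even value per hour: GBP {rate:.2f}")              # Break-even value per hour: GBP 34.19
```

If your fully loaded cost per staff hour is below the break-even value, payback from staff time alone takes longer than the window you entered, and other gains such as fewer no-shows have to carry more of the case.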
What drove the result: Appointment reminder automation with easy reschedule was the biggest single win. Patients who would have simply not shown up instead rescheduled, keeping the practice's calendar full.
Scenario 4: Professional Services — Internal Knowledge Agent
Business type: 45-person consulting firm
Problem: Consultants were spending real time looking for internal documents, process guides, and client templates — or asking colleagues who had to stop their own work to answer. Estimated 2–3 hours per consultant per week on internal knowledge hunting.
Agent scope: Trained on internal document library (proposals, process guides, templates, past project summaries), integrated into Slack as a bot, natural language queries only, no system actions.
Results at 90 days:
| Metric | Before | After | Change |
|---|---|---|---|
| Time spent on internal knowledge queries | ~2.5 hrs/consultant/week | ~40 min | -73% |
| "Pinging a colleague" for internal info | Daily for most staff | Rare | Significant |
| Document retrieval accuracy | N/A (manual search) | 84% first-try accuracy | — |
| Staff satisfaction with internal tools | 2.9/5 | 4.1/5 | +41% |
Value calculation: 45 consultants × 1.8 hours saved per week (2.5 hours down to about 40 minutes) × 48 working weeks = 3,888 consultant-hours per year recovered. Even if only part of that time converts to billable work, that's meaningful leverage at consulting billing rates.
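The same calculation as a short sketch, using the table's before and after times. The 1.8 hours is a rounding of 110 minutes, so the unrounded total comes out slightly higher. Variable names are ours, for illustration:

```python
consultants = 45
working_weeks = 48
minutes_before = 150  # ~2.5 hours per consultant per week
minutes_after = 40    # ~40 minutes per consultant per week

saved_exact = (minutes_before - minutes_after) / 60  # about 1.83 hours
saved_rounded = 1.8                                  # the rounded figure used in the text

annual_rounded = consultants * saved_rounded * working_weeks
annual_exact = consultants * saved_exact * working_weeks

print(round(annual_rounded))  # 3888
print(round(annual_exact))    # 3960
```

To put a money value on this, multiply by your own billing rate and the share of recovered hours you realistically expect to bill. Neither figure is given here, so both are left out of the sketch.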
What drove the result: Document quality mattered more than anything else. Firms with well-organised, up-to-date internal documentation got dramatically better results than those with outdated or inconsistently formatted docs. The AI is only as good as the information it has access to — and there's no clever workaround.
What These Scenarios Have in Common
Looking across these deployments, a few patterns show up consistently:
Speed beats sophistication every time. In lead follow-up and customer support, response time is the single biggest driver of outcomes. A fast, simple agent outperforms a sophisticated slow one.
Live data access is the line between useful and useless. Agents that can look up real information — real orders, real availability, real account data — deflect far more than agents that can only answer from static content.
The first 14 days are the hardest. Every deployment surfaces edge cases that weren't anticipated. The businesses that see the best 90-day results are the ones that engage actively in the calibration period — not the ones that go live and wait. We can tell within the first week which group a client is going to land in.
Start narrow. The highest-ROI deployments in each scenario had a clearly scoped first phase. Not "handle everything" — "handle these six specific query types well, and escalate everything else."
And the honest caveat we'd attach to all of the above: these results assume reasonably clean underlying data. If your orders, leads, or documents are a mess, you'll spend the first month of any project cleaning that up before you see the headline numbers. We've watched it cut both ways.
What You Can Expect From Your First AI Agent
Based on these patterns, here's a realistic expectation for a well-scoped first deployment:
- Response time improvement: Near-instant for covered query types (hours to seconds)
- Deflection or automation rate: 50–70% of in-scope volume within 90 days
- Payback period: 2–4 months for most deployments, faster for high-volume scenarios
- Staff time recovered: 40–70% of time previously spent on the automated workflows
These are not guarantees. They're what well-built, well-scoped agents consistently deliver. Poorly scoped agents, built without clear success metrics, deliver much less and feel like a sunk cost six months in.
Ready to See What Your Numbers Could Look Like?
The best way to estimate your ROI is to map it against your actual volumes and costs. That's a 30-minute conversation, not a long sales process.
Talk to us about your business — we'll give you an honest projection based on your real numbers, including the cases where the math doesn't work and we'd rather tell you so.