Why Most AI Projects Go Wrong Before They Start
Most AI agent projects that fail or disappoint do so for reasons that were visible before a line of code was written. A scope that was too vague. Integrations that were assumed but not specified. Success metrics that were never agreed. Edge cases nobody surfaced in the kickoff.
A good scope of work prevents most of these failures. It forces clarity before money changes hands. It creates a shared understanding between client and developer. And — maybe most importantly — it gives both parties a reference point when something gets disputed three months in.
What follows is a complete template for an AI agent SOW. Use it to brief a developer, to evaluate a proposal you've received, or to structure the pre-project conversation that determines whether a build is going to succeed.
AI Agent Scope of Work Template
Section 1: Project Overview
1.1 Background
Describe the business context in two to three paragraphs:
- What does the business do?
- What problem is this agent solving?
- How is this handled today?
- What's prompting this project now?
Example: "Acme Retail is a UK-based online clothing retailer processing 2,000 orders per month. Customer support is currently handled by two part-time staff who spend approximately 35 hours per week responding to support emails. The majority of queries are order status checks, return requests, and product questions. Response times average 8–12 hours. The business is growing and cannot scale support headcount proportionally."
1.2 Project Goal
One sentence. What's the agent supposed to achieve?
Example: "Build and deploy an AI customer support agent that handles at least 65% of inbound support emails automatically, reducing average response time to under 5 minutes."
1.3 Success Criteria
List three to five measurable outcomes that define project success. These become the basis for acceptance at delivery.
| Metric | Target | Measurement method | Measurement window |
|---|---|---|---|
| Deflection rate | ≥65% | System logs | 90 days post-launch |
| Average first response time | <5 minutes | Email timestamp log | 90 days post-launch |
| CSAT score | ≥4.0 / 5 | Post-conversation survey | 90 days post-launch |
| Weekly human hours on in-scope queries | <10 hours | Team time tracking | 90 days post-launch |
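To make the "measurement method" column concrete, here is a minimal sketch of how two of these metrics could be computed from ticket logs. The `Ticket` record and its field names are assumptions for illustration; real system logs will look different, and the SOW should say exactly what counts as "resolved by the agent".

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Ticket:
    # Hypothetical log record; field names are illustrative only.
    received_at: datetime
    first_response_at: Optional[datetime]
    resolved_by_agent: bool  # True if no human touched the ticket


def deflection_rate(tickets: List[Ticket]) -> float:
    """Share of tickets resolved without human involvement (0.0 to 1.0)."""
    if not tickets:
        return 0.0
    return sum(t.resolved_by_agent for t in tickets) / len(tickets)


def average_first_response(tickets: List[Ticket]) -> Optional[timedelta]:
    """Mean time to first response, ignoring tickets with no response yet."""
    delays = [
        t.first_response_at - t.received_at
        for t in tickets
        if t.first_response_at is not None
    ]
    if not delays:
        return None
    return sum(delays, timedelta()) / len(delays)


def meets_targets(
    tickets: List[Ticket],
    min_deflection: float = 0.65,                      # target from the table
    max_response: timedelta = timedelta(minutes=5),    # target from the table
) -> bool:
    """Check the deflection and first-response targets together."""
    avg = average_first_response(tickets)
    return (
        deflection_rate(tickets) >= min_deflection
        and avg is not None
        and avg < max_response
    )
```

Agreeing on a calculation like this up front avoids a later argument about whether, say, a ticket the agent drafted and a human approved counts as deflected.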
Section 2: Agent Scope
2.1 In Scope — What the Agent Handles
List every interaction type the agent is expected to handle. Be specific.
| Interaction type | Description | Resolution path |
|---|---|---|
| Order status query | Customer asks where their order is | Agent retrieves from Shopify, provides tracking link |
| Return request | Customer wants to return an item | Agent checks eligibility, generates return label if eligible |
| Product question | Customer asks about product details | Agent answers from product catalogue |
| Delivery issue | Customer reports non-arrival or damage | Agent logs report, escalates to human team |
2.2 Out of Scope — What the Agent Does NOT Handle
Be equally specific about what the agent won't do. Ambiguity here is what causes scope creep — usually three months in, when someone says "but I assumed the agent would also handle…"
- Complaints involving legal claims or formal dispute processes
- Requests for partial refunds outside the defined return policy
- Custom orders or bespoke product requests
- Any query not listed in the In Scope table above
2.3 Escalation Definition
Define exactly when and how the agent escalates to a human:
| Trigger | Escalation action | Escalation destination |
|---|---|---|
| Query type not in scope | Agent acknowledges, forwards to human queue | Support inbox |
| Customer expresses significant frustration | Agent acknowledges, offers human callback | Priority queue |
| Return outside policy window | Agent explains, offers exception request form | Exceptions inbox |
| Classification confidence below threshold | Agent drafts response for human review | Draft queue |
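An escalation table like this translates almost directly into routing logic. The sketch below is one possible reading of it, not a prescribed implementation: the query type labels, the `Query` fields, the 0.8 confidence threshold, and the order in which triggers are checked are all assumptions that the SOW itself should pin down.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Queue(Enum):
    SUPPORT_INBOX = "Support inbox"
    PRIORITY_QUEUE = "Priority queue"
    EXCEPTIONS_INBOX = "Exceptions inbox"
    DRAFT_QUEUE = "Draft queue"


# Labels for the in-scope interaction types (illustrative names).
IN_SCOPE = {"order_status", "return_request", "product_question", "delivery_issue"}

# Assumed value; the table only says "below threshold".
CONFIDENCE_THRESHOLD = 0.8


@dataclass
class Query:
    query_type: str
    confidence: float
    customer_frustrated: bool = False
    return_outside_window: bool = False


def escalation_destination(q: Query) -> Optional[Queue]:
    """Return the queue to escalate to, or None if the agent can proceed."""
    # Checked first because a low-confidence classification makes
    # the query type itself unreliable (precedence is an assumption).
    if q.confidence < CONFIDENCE_THRESHOLD:
        return Queue.DRAFT_QUEUE
    if q.query_type not in IN_SCOPE:
        return Queue.SUPPORT_INBOX
    if q.customer_frustrated:
        return Queue.PRIORITY_QUEUE
    if q.query_type == "return_request" and q.return_outside_window:
        return Queue.EXCEPTIONS_INBOX
    return None
```

Writing the rules out this way tends to expose gaps in the table, such as which trigger wins when a frustrated customer asks about something out of scope.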
Section 3: Technical Specification
3.1 Communication Channels
List every channel the agent will operate on:
- Website chat widget (specify platform: Intercom / Crisp / custom)
- Email (specify: inbound address, send-from address)
- WhatsApp (specify: WhatsApp Business account required)
- SMS
- Other: ___
3.2 Integrations Required
List every system the agent needs to connect to, with read/write requirements:
| System | Purpose | Access required | Read / Write |
|---|---|---|---|
| Shopify | Order data, customer data | Admin API key | Read + limited write |
| Royal Mail API | Tracking information | API key | Read only |
| Return portal | Return eligibility and label generation | API key | Read + write |
| Gmail | Inbound email reading and outbound sending | OAuth | Read + write |
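The read/write column is worth enforcing, not just documenting. As a rough sketch, the table can be expressed as a permission manifest that the agent checks before touching any system. The system keys and the `is_allowed` helper are hypothetical, and "limited write" for Shopify is modelled here as plain write access because the table does not say what the limits are.

```python
from enum import Flag, auto


class Access(Flag):
    READ = auto()
    WRITE = auto()


# Mirrors the integrations table; keys are illustrative names.
INTEGRATIONS = {
    "shopify": Access.READ | Access.WRITE,   # "limited write": limits still to be specified
    "royal_mail": Access.READ,
    "return_portal": Access.READ | Access.WRITE,
    "gmail": Access.READ | Access.WRITE,
}


def is_allowed(system: str, requested: Access) -> bool:
    """True only if the system is listed and every requested permission is granted."""
    granted = INTEGRATIONS.get(system)
    if granted is None:
        return False  # systems not in the SOW are denied by default
    return (requested & granted) == requested
```

A deny-by-default check like this means an integration nobody scoped cannot quietly appear in the build.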
3.3 Data and Authentication
How will the agent verify customer identity before accessing account data?
Example: "Customer identity verified by matching email address in the incoming email against the Shopify customer record. For sensitive actions (refund initiation), additional verification via order number required."
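The rule in that example could be sketched roughly as follows. `CustomerRecord` is a hypothetical stand-in for whatever the Shopify lookup returns, and the action names are made up for illustration; the lookup itself is not shown.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

# Actions that need the extra order-number check (illustrative name).
SENSITIVE_ACTIONS = {"refund_initiation"}


@dataclass
class CustomerRecord:
    # Hypothetical shape of a customer record fetched from the store.
    email: str
    order_numbers: Set[str] = field(default_factory=set)


def is_verified(
    sender_email: str,
    record: Optional[CustomerRecord],
    action: str,
    order_number: Optional[str] = None,
) -> bool:
    """Apply the two-step rule: email match, plus order number for sensitive actions."""
    if record is None:
        return False  # no matching customer record found
    if sender_email.strip().lower() != record.email.strip().lower():
        return False
    if action in SENSITIVE_ACTIONS:
        return order_number is not None and order_number in record.order_numbers
    return True
```

The useful part for the SOW is the list of sensitive actions: it should be explicit, and agreed, rather than left to the developer's judgement.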
3.4 Knowledge Base
What information will the agent use to answer questions?
| Content type | Source | Format | Update frequency |
|---|---|---|---|
| Product information | Shopify product catalogue | Live API | Real-time |
| Return policy | Policy document | Static document | Updated as needed |
| Shipping information | Internal FAQ document | Static document | Monthly |
| Brand FAQs | Existing help centre | Static documents | As needed |
3.5 LLM and Infrastructure
Specify (or ask the developer to specify):
- Which LLM provider and model
- Hosting infrastructure (cloud provider, region)
- Data residency requirements
- Expected query volume (queries per day/month)
- Expected response time requirement (p95 target)
Section 4: Non-Functional Requirements
4.1 Security
- Data encryption requirements (in transit, at rest)
- Authentication and authorisation requirements
- Audit logging requirements
- Third-party data processing agreements
- Penetration testing requirements (if applicable)
4.2 Compliance
List applicable regulatory requirements:
- GDPR / UK GDPR data handling requirements
- Industry-specific requirements (FCA, HIPAA, etc.)
- Consumer protection requirements
4.3 Performance
| Requirement | Target |
|---|---|
| Response time (p50) | <2 seconds |
| Response time (p95) | <5 seconds |
| Uptime | ≥99.5% |
| Error rate | <1% |
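Percentile targets are easy to misread, so it helps to agree on how they are calculated. The sketch below uses the nearest-rank method, which is only one of several common percentile definitions, and checks a batch of latency samples against the response time and error rate targets in the table. The function names and input shapes are assumptions.

```python
import math
from typing import Dict, List


def percentile(samples: List[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def check_performance(
    latencies_s: List[float], error_count: int, total_requests: int
) -> Dict[str, bool]:
    """Compare measured values with the targets from the performance table."""
    return {
        "p50_ok": percentile(latencies_s, 50) < 2.0,           # p50 under 2 seconds
        "p95_ok": percentile(latencies_s, 95) < 5.0,           # p95 under 5 seconds
        "error_rate_ok": error_count / total_requests < 0.01,  # under 1% errors
    }
```

A p95 target of 5 seconds means one request in twenty may be slower than that, which is worth saying out loud to stakeholders who expect every reply to arrive in under 5 seconds.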
4.4 Monitoring and Alerting
Define what monitoring is required and who receives alerts:
- Real-time error rate monitoring — alert if error rate exceeds 5% in any 30-minute window
- Daily performance dashboard accessible to client
- Weekly automated performance report delivered to [email]
- Monitoring access: client has read-only access to all monitoring dashboards
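The error rate alert above can be sketched as a trailing-window counter. This is an illustration under stated assumptions, not a replacement for a proper monitoring tool: the class name is hypothetical, requests are assumed to arrive in timestamp order, and the `min_requests` floor (to avoid alerting on a handful of requests) is an addition that the requirement does not mention.

```python
from collections import deque
from datetime import datetime, timedelta


class ErrorRateMonitor:
    """Tracks the error rate over a trailing time window."""

    def __init__(
        self,
        window: timedelta = timedelta(minutes=30),
        threshold: float = 0.05,   # alert when the rate exceeds 5%
        min_requests: int = 20,    # assumed floor to avoid noisy alerts
    ):
        self.window = window
        self.threshold = threshold
        self.min_requests = min_requests
        self._events = deque()     # (timestamp, is_error), oldest first
        self._errors = 0

    def record(self, timestamp: datetime, is_error: bool) -> bool:
        """Record one request and return True if an alert should fire."""
        self._events.append((timestamp, is_error))
        if is_error:
            self._errors += 1
        # Drop events that have fallen out of the trailing window.
        cutoff = timestamp - self.window
        while self._events and self._events[0][0] <= cutoff:
            _, was_error = self._events.popleft()
            if was_error:
                self._errors -= 1
        total = len(self._events)
        if total < self.min_requests:
            return False
        return self._errors / total > self.threshold
```

Note that the 5% alert threshold is looser than the 1% error rate target in 4.3. That can be deliberate, since alerts are for incidents rather than drift, but the SOW should say so.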
Section 5: Deliverables
Every deliverable the developer is responsible for at the end of the project:
- Deployed AI agent on [channels specified in 3.1]
- All integrations listed in 3.2 connected and tested
- Knowledge base loaded with content listed in 3.4
- Monitoring dashboards configured and accessible to client
- Handover documentation (architecture, how to update knowledge base, how to adjust escalation rules)
- Source code transferred to client repository
- Post-launch support period: [X weeks] of monitoring and rapid response included
Section 6: Timeline and Milestones
| Milestone | Description | Target date |
|---|---|---|
| Kick-off | Scope confirmed, access granted | Week 1 |
| Integration complete | All API connections live in staging | Week ___ |
| Knowledge base complete | All content loaded and reviewed | Week ___ |
| Internal testing complete | Developer testing against test set | Week ___ |
| Client UAT | Client testing period | Week ___ |
| Launch | Agent live in production | Week ___ |
| Post-launch review | 30-day performance review | Week ___ |
Section 7: Ownership and IP
Explicit statements on:
7.1 Code ownership: All code, prompts, configuration, and documentation created for this project are the sole property of [Client name] upon final payment.
7.2 Data ownership: All customer data, conversation data, and knowledge base content are the sole property of [Client name]. The developer has no rights to use this data for any purpose other than delivering the project.
7.3 Third-party licences: List any third-party tools, APIs, or services used, and confirm the client has or will have their own accounts for these.
Section 8: Acceptance Criteria
Define how delivery will be accepted:
Technical acceptance: All integrations pass the automated test suite. Error rate stays below 1% over a 48-hour monitoring period.
Functional acceptance: Agent correctly handles [X] representative test scenarios covering all in-scope interaction types.
Performance acceptance: Response time p95 below [X] seconds under [Y] concurrent users.
Success criteria acceptance: Measured against success metrics in Section 1.3 at [30/60/90] days post-launch.
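Functional acceptance is easier to sign off when the test scenarios are written down as data. The sketch below shows one way to run them; the `Scenario` fields, the outcome labels, and the idea that the agent can be called as a function returning an outcome label are all assumptions for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Set


@dataclass
class Scenario:
    interaction_type: str   # should match an in-scope interaction type
    message: str            # the customer message to send to the agent
    expected_outcome: str   # illustrative label, e.g. "tracking_link_sent"


def run_acceptance(
    agent: Callable[[str], str],
    scenarios: Iterable[Scenario],
    required_types: Set[str],
) -> Dict[str, object]:
    """Run every scenario and report failures and interaction types with no coverage."""
    results = defaultdict(lambda: [0, 0])  # type -> [passed, total]
    for s in scenarios:
        results[s.interaction_type][1] += 1
        if agent(s.message) == s.expected_outcome:
            results[s.interaction_type][0] += 1
    uncovered = set(required_types) - set(results)
    failed = {t: (p, n) for t, (p, n) in results.items() if p < n}
    return {
        "accepted": not uncovered and not failed,
        "uncovered": uncovered,
        "failed": failed,
    }
```

The `uncovered` check matters as much as the pass rate: a test set that never exercises return requests cannot show that return requests work.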
Where Even a Good SOW Fails You
Two honest caveats before you treat this as a complete safety net. First, an SOW only protects you if both sides actually read it after signing. We've watched projects where the SOW was beautifully detailed and then immediately ignored, because the day-to-day conversation drifted into Slack and decisions got made that contradicted what was on paper. The SOW is a reference point, not a contract that runs itself.
Second, the level of specificity this template asks for can paralyse a kickoff if you treat it as a checklist to fill in. The point isn't to answer every question perfectly on day one. It's to surface the questions you can't answer yet, because those are the ones that will bite you in week six if they stay hidden. A "we don't know yet, we'll figure it out by week 2" answer in Section 3.5 is more useful than a confident wrong answer.
Using This Template
Share this template with any AI developer you're evaluating. Ask them to complete Sections 3 and 4 as part of their proposal. The quality and specificity of their answers tell you more about their capability than their portfolio does.
If a developer can't fill in Section 3 specifically (can't tell you which LLM they plan to use and why, can't describe the integration architecture), that's important information to have before you sign anything.
If you want help drafting a scope of work before you decide who to work with — even if you ultimately don't work with us — we're happy to do that.