The Agent Is Live. Now What?
Most AI agent projects get a lot of attention during the build — scoping, development, testing, launch. The launch is the milestone. There are demos, announcements, the satisfaction of something going live.
Then the team moves on. The agent runs. Nobody watches it closely. Six months later, someone notices it's giving outdated information, struggling with a new query type that emerged after launch, or producing responses that have felt slightly off ever since the product line was updated.
This is how AI agents quietly become liabilities instead of assets.
Maintenance isn't glamorous. It doesn't generate announcements. But it's what decides whether your agent's performance at month twelve is better than at month one, or visibly worse.
What Changes After Launch
Several things shift in the environment around an AI agent after launch, and each creates maintenance work:
Your content changes. Products get added, removed, or modified. Prices change. Policies get updated. Opening hours shift. The agent's knowledge base reflects a snapshot of your business at the time it was built — and that snapshot ages.
User behaviour evolves. Real users ask things you didn't anticipate. New query types emerge. Seasonal patterns appear. Edge cases arise that your test set didn't cover.
Integrated systems change. The CRM you integrated updates its API. The courier you use changes their tracking URL format. The booking system adds a new field. Each change is a potential point of failure.
The underlying models change. LLM providers update their models periodically. An update that improves average performance can change behaviour on specific queries in ways that need prompt adjustment.
Your business changes. New products, new markets, new policies, new team members to route to — the agent needs to reflect the current state of your business, not the state it was in when you built it.
The Four Types of Maintenance Work
1. Knowledge Base Updates
The most frequent maintenance task is keeping the knowledge base current:
- Adding new product information when you expand your catalogue
- Updating pricing and specifications when they change
- Removing discontinued products or outdated policies
- Adding new FAQ content as new questions emerge from real conversations
- Keeping documents accurate and consistent — contradictions in the knowledge base are a common source of agent errors
Frequency: Weekly or fortnightly for businesses with regularly changing content. Monthly for businesses with stable content.
Effort: Low to moderate depending on content volume. The process should be documented and transferable — someone on your team, not just the developer, should be able to update the knowledge base.
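To make the staleness and contradiction checks concrete, here is a minimal sketch. It assumes each knowledge-base document can be exported with a last-reviewed date and a set of key facts (such as a price per SKU). The `KBEntry` schema, the fact-key format and the 90-day review window are hypothetical. Real knowledge bases are usually free text, so extracting comparable facts is the hard part, and this sketch skips it.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class KBEntry:
    """One knowledge-base document (hypothetical schema)."""
    doc_id: str
    last_reviewed: date
    facts: dict[str, str] = field(default_factory=dict)  # e.g. {"price:SKU-123": "£49"}


def find_stale(entries: list[KBEntry], today: date, max_age_days: int = 90) -> list[str]:
    """Return IDs of documents nobody has reviewed within max_age_days."""
    cutoff = today - timedelta(days=max_age_days)
    return [e.doc_id for e in entries if e.last_reviewed < cutoff]


def find_contradictions(entries: list[KBEntry]) -> dict[str, dict[str, list[str]]]:
    """Return fact keys stated with different values in different documents.

    Maps fact key -> {value: [IDs of documents stating that value]}.
    """
    seen: dict[str, dict[str, list[str]]] = {}
    for e in entries:
        for key, value in e.facts.items():
            seen.setdefault(key, {}).setdefault(value, []).append(e.doc_id)
    # Only keys with more than one distinct value count as contradictions.
    return {key: values for key, values in seen.items() if len(values) > 1}


# Example: an old FAQ still quotes last year's price.
entries = [
    KBEntry("pricing-page", date(2024, 5, 1), {"price:SKU-123": "£49"}),
    KBEntry("faq-2023", date(2023, 9, 1), {"price:SKU-123": "£45"}),
]
print(find_stale(entries, today=date(2024, 6, 1)))  # ['faq-2023']
print(find_contradictions(entries))
# {'price:SKU-123': {'£49': ['pricing-page'], '£45': ['faq-2023']}}
```

Running both checks before each batch of knowledge-base updates surfaces conflicts like the price mismatch above before the agent repeats them to customers.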
2. Conversation Review and Prompt Tuning
Regular review of real conversations is the most valuable maintenance activity. It surfaces:
- Query types the agent is handling poorly
- Patterns in escalations that could be handled automatically with prompt adjustment
- Responses that are technically correct but feel off-brand or unhelpful
- New query categories that have emerged since launch
Prompt tuning based on conversation review improves performance cycle by cycle: an agent reviewed and tuned monthly for a year has had twelve rounds of fixes that an untouched agent never gets.
Frequency: Weekly in the first 90 days, then fortnightly or monthly as the agent stabilises.
Effort: 1–3 hours per review cycle for a focused agent. The reviewer reads a sample of 30–50 conversations, categorises issues, and implements prompt adjustments.
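A plain random sample of 30–50 conversations tends to be dominated by routine, successful exchanges. One option is to stratify by outcome so escalations and abandoned conversations reliably reach the reviewer. This is a minimal sketch under assumptions: the `Conversation` record, the outcome labels and `review_sample` are hypothetical, and conversation IDs are assumed to be unique. Adapt it to whatever your conversation logs actually record.

```python
import random
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Conversation:
    """Minimal conversation record (hypothetical schema)."""
    conv_id: str  # assumed unique
    outcome: str  # e.g. "resolved", "escalated", "abandoned"


def review_sample(convs: list[Conversation], size: int = 40,
                  seed: Optional[int] = None) -> list[Conversation]:
    """Draw a review sample with an equal share per outcome, topped up at random.

    Each outcome gets up to size // n_outcomes slots, so rare outcomes such as
    escalations are not crowded out by the much larger "resolved" group.
    Any unused slots are then filled from conversations not yet picked.
    """
    rng = random.Random(seed)  # a fixed seed makes a review cycle reproducible
    by_outcome: dict[str, list[Conversation]] = defaultdict(list)
    for c in convs:
        by_outcome[c.outcome].append(c)
    if not by_outcome:
        return []

    per_group = max(1, size // len(by_outcome))
    sample: list[Conversation] = []
    for group in by_outcome.values():
        sample.extend(rng.sample(group, min(per_group, len(group))))

    # Top up from any outcome if some groups were too small to fill their share.
    chosen = {c.conv_id for c in sample}
    remaining = [c for c in convs if c.conv_id not in chosen]
    shortfall = min(size - len(sample), len(remaining))
    if shortfall > 0:
        sample.extend(rng.sample(remaining, shortfall))
    return sample


# Example: 500 resolved conversations and only 12 escalations this week.
convs = [Conversation(f"r{i}", "resolved") for i in range(500)]
convs += [Conversation(f"e{i}", "escalated") for i in range(12)]
picked = review_sample(convs, size=40, seed=7)
print(len(picked), sum(c.outcome == "escalated" for c in picked))  # 40 12
```

Here every escalation lands in the sample, whereas a plain random draw of 40 from 512 conversations would be expected to include fewer than one.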
3. Integration Maintenance
The integrations connecting your agent to your systems are the most fragile part of the architecture. Third-party APIs change without notice. Authentication tokens expire. Rate limits get hit unexpectedly. Webhook endpoints go down.
Integration maintenance involves:
- Monitoring for failures and responding when they happen
- Updating API integrations when providers release new versions or deprecate old ones
- Testing integrations periodically against real data to confirm accurate information is coming back
- Managing API keys and tokens, including rotation when required
Frequency: Monitoring should be continuous. Active review of integration health monthly.
Effort: Low in stable periods, potentially significant during a major API change. We've seen a courier API change that landed on a peak-season Monday morning consume an entire week, so it's worth budgeting for.
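Integrations often fail silently, returning an empty or fallback result rather than an error. A failure-rate check over a rolling window can catch problems like the broken courier tracking in the month 7–9 scenario later in this piece. This is a minimal sketch assuming each integration call is logged with a success flag that is set to `False` whenever no usable data came back, not only on HTTP errors. `IntegrationCall`, `failure_rate_alerts` and the default window, threshold and minimum-volume values are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class IntegrationCall:
    """One logged call from the agent to an integration (hypothetical schema)."""
    integration: str     # e.g. "courier_tracking"
    timestamp: datetime
    ok: bool             # False on errors AND on empty/"unavailable" results


def failure_rate_alerts(calls: list[IntegrationCall], now: datetime,
                        window: timedelta = timedelta(hours=1),
                        threshold: float = 0.2,
                        min_calls: int = 20) -> dict[str, float]:
    """Return {integration: failure_rate} for integrations at or above threshold.

    Only calls inside the rolling window count, and integrations with fewer
    than min_calls calls are skipped so a single bad call cannot fire an alert.
    """
    totals: dict[str, int] = {}
    failures: dict[str, int] = {}
    for call in calls:
        if now - call.timestamp > window:
            continue  # outside the rolling window
        totals[call.integration] = totals.get(call.integration, 0) + 1
        if not call.ok:
            failures[call.integration] = failures.get(call.integration, 0) + 1

    alerts: dict[str, float] = {}
    for name, total in totals.items():
        rate = failures.get(name, 0) / total
        if total >= min_calls and rate >= threshold:
            alerts[name] = rate
    return alerts


# Example: every tracking lookup in the last half hour came back empty.
now = datetime(2024, 11, 25, 9, 0)
calls = [IntegrationCall("courier_tracking", now - timedelta(minutes=i), ok=False)
         for i in range(30)]
calls += [IntegrationCall("crm_lookup", now - timedelta(minutes=i), ok=True)
          for i in range(30)]
print(failure_rate_alerts(calls, now))  # {'courier_tracking': 1.0}
```

Run on a schedule and routed to whoever owns integrations, a 100% failure rate like this one surfaces within the hour instead of going unnoticed for weeks.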
4. Performance Monitoring
Quantitative monitoring tracks whether the agent is meeting its performance targets:
- Response volume and deflection rate
- CSAT scores from post-conversation surveys
- Escalation rate and escalation categories
- Response time
- Error rate
- Knowledge base hit rate (what percentage of queries find relevant content)
Monitoring without action is not useful. The value is in spotting when a metric moves in the wrong direction and investigating why.
Frequency: Dashboard review weekly. Automated alerts for threshold breaches in real time.
Effort: Low once dashboards are set up. The work is in the investigation and remediation when alerts fire.
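The metrics listed above can be computed from per-conversation logs and checked against targets automatically. This is a minimal sketch, assuming each conversation is logged with an escalation flag, a knowledge-base hit flag, an error flag and an optional CSAT score. `ConvStats`, the threshold values, and treating deflection as "not escalated" are assumptions; deflection is defined in several ways, so substitute your own definition. Response time is left out for brevity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConvStats:
    """Per-conversation fields assumed to be logged (hypothetical schema)."""
    escalated: bool
    kb_hit: bool            # retrieval found relevant content
    error: bool             # the agent hit an error while responding
    csat: Optional[int]     # 1-5 survey score, None if no survey answered


def weekly_metrics(convs: list[ConvStats]) -> dict[str, Optional[float]]:
    """Aggregate one week of conversations into headline rates."""
    if not convs:
        return {}
    n = len(convs)
    rated = [c.csat for c in convs if c.csat is not None]
    return {
        # Assumption: a conversation counts as deflected if it was not escalated.
        "deflection_rate": sum(not c.escalated for c in convs) / n,
        "escalation_rate": sum(c.escalated for c in convs) / n,
        "kb_hit_rate": sum(c.kb_hit for c in convs) / n,
        "error_rate": sum(c.error for c in convs) / n,
        "avg_csat": sum(rated) / len(rated) if rated else None,
    }


# Hypothetical targets: ("min", x) alerts below x, ("max", x) alerts above x.
THRESHOLDS = {
    "deflection_rate": ("min", 0.60),
    "kb_hit_rate": ("min", 0.80),
    "error_rate": ("max", 0.02),
    "avg_csat": ("min", 4.0),
}


def breaches(metrics: dict[str, Optional[float]],
             thresholds: dict[str, tuple[str, float]] = THRESHOLDS) -> list[str]:
    """Return the names of metrics outside their target range."""
    out = []
    for name, (kind, limit) in thresholds.items():
        value = metrics.get(name)
        if value is None:
            continue  # metric missing this week (e.g. no CSAT responses)
        if (kind == "min" and value < limit) or (kind == "max" and value > limit):
            out.append(name)
    return out


week = [ConvStats(escalated=False, kb_hit=True, error=False, csat=5) for _ in range(6)]
week += [ConvStats(escalated=True, kb_hit=False, error=False, csat=2) for _ in range(4)]
m = weekly_metrics(week)
print(breaches(m))  # ['kb_hit_rate', 'avg_csat']
```

A breach is a prompt to read the relevant conversations, not a diagnosis: here the low knowledge-base hit rate and the low CSAT both come from the same four escalated conversations.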
What Happens If You Don't Maintain
The consequences of neglected maintenance compound over time:
Month 1–3: The agent performs well. It was built well and the business hasn't changed much yet. Maintenance seems unnecessary.
Month 4–6: A few products have changed. The knowledge base is slightly out of date. Some responses are subtly wrong. CSAT starts drifting down. Nobody has reviewed the conversations yet.
Month 7–9: A major product line changed significantly. The agent is confidently providing wrong information about it. Integration with the courier API broke silently three weeks ago — the agent has been saying "tracking information unavailable" for every order. Escalation rate has doubled. The support team is frustrated.
Month 10–12: The business has grown. New use cases have emerged. The original developer has moved on. Nobody fully understands the system. The agent gets described internally as "not working well." Plans are made to replace it.
The agent isn't broken. It just wasn't maintained.
Who Should Own Maintenance
Knowledge base updates should be owned internally — by whoever owns your content, products, or policies. They know when things change and can update the knowledge base directly. This needs a documented process and someone who understands how to add and edit content in the system.
Conversation review and prompt tuning are best done jointly: your team identifies issues (they know what good responses should look like), and the developer implements the prompt changes (they know how to do it without breaking other things).
Integration maintenance typically needs developer involvement. API changes and authentication issues require someone who understands the technical architecture.
Performance monitoring should be shared — dashboards accessible to your team, and alerts routed to whoever will actually act on them.
Where Maintenance Goes Wrong Even With a Cadence
Two honest caveats worth naming. First, "maintenance" sometimes becomes a euphemism for "watching dashboards while doing nothing." A team will set up the alerts, review the metrics weekly, and never actually open a conversation transcript. Monitoring without sample review misses the most useful information, because the problems that matter often don't move the headline numbers in the first month. Read the conversations.
Second, ownership drift is the silent killer. The person responsible for knowledge base updates leaves the company; the courier API change happens during a quarter when nobody had been assigned monitoring; the developer who built the agent moves on without a proper handover. We've inherited several "the agent isn't working" projects where the actual diagnosis was "nobody has touched it in nine months." Bake ownership into the role, not the person.
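One way to bake ownership into the role is to route every alert through a role, and to resolve that role to a person from a single assignment table that gets updated at each handover. This is a minimal sketch; the alert categories, role names, email addresses and the `route_alert` and `vacant_roles` helpers are all hypothetical. The design choice that matters is that a vacant role raises an error, or shows up in an audit, instead of letting alerts go nowhere.

```python
# Alert category -> owning role (hypothetical categories and roles).
ROUTING = {
    "knowledge_base": "content_owner",
    "integration": "agent_developer",
    "metrics": "support_lead",
}


def route_alert(category: str, role_assignments: dict[str, str]) -> str:
    """Resolve an alert to the person currently holding the owning role.

    Raises LookupError if no role owns the category or the role is vacant,
    so an unowned alert fails loudly instead of silently going nowhere.
    """
    role = ROUTING.get(category)
    if role is None:
        raise LookupError(f"no role owns alerts of category {category!r}")
    person = role_assignments.get(role)
    if not person:
        raise LookupError(f"role {role!r} is vacant; {category!r} alerts have no owner")
    return person


def vacant_roles(role_assignments: dict[str, str]) -> list[str]:
    """List roles that own alerts but have nobody assigned (for a periodic audit)."""
    return sorted({role for role in ROUTING.values() if not role_assignments.get(role)})


# Example: the developer left and nobody took over the role.
assignments = {
    "content_owner": "priya@example.com",
    "agent_developer": "",
    "support_lead": "sam@example.com",
}
print(route_alert("knowledge_base", assignments))  # priya@example.com
print(vacant_roles(assignments))                   # ['agent_developer']
```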
Building a Maintenance Cadence
A sustainable maintenance schedule for most agents:
Weekly:
- Review automated monitoring alerts
- Check key performance metrics (deflection rate, CSAT, escalation categories)
- Review a sample of 20–30 conversations, flag issues
Monthly:
- Update knowledge base with content changes from the past month
- Implement prompt adjustments based on conversation review findings
- Review integration health
- Check API deprecation notices for upcoming changes
Quarterly:
- Review overall performance trends against targets
- Assess whether the agent's scope should expand or contract
- Comprehensive knowledge base audit
- Evaluate new capabilities that have become relevant
What a Maintenance Retainer Includes
If you're working with an AI development team on an ongoing basis, a maintenance retainer should explicitly cover:
- Conversation review sessions (how many per month, how many conversations reviewed)
- Knowledge base update capacity (how many update requests per month)
- Integration monitoring and response time SLA
- Prompt tuning as needed based on review findings
- Monthly performance report
- Priority response for production issues
Be specific about what's included and what triggers additional billing. Retainers that are vague about scope tend to end in either underdelivery or billing arguments — we've seen both.
The Compounding Return on Maintenance
A properly maintained AI agent doesn't plateau at launch-day performance. It improves continuously as:
- The knowledge base grows more complete
- The prompts are tuned to handle more query types well
- New capabilities get added as they become relevant
- Prompts and content increasingly reflect patterns seen in real conversations
The agent at month twelve should be meaningfully better than at month one — faster, more accurate, handling a wider scope, with fewer escalations. That compounding improvement is only possible with consistent maintenance.
Without it, the agent declines as its knowledge ages and the environment changes around it.
If you want help building a maintenance cadence that actually gets executed — or fixing one that's already drifting — we'd be happy to map it out with you.
Talk to us about maintaining your AI agent — no commitment, just a conversation.