Woyce

AI Development

How We Built a Support Agent for an E-commerce Brand: A Case Study

An AI support agent case study: a UK fashion retailer was spending up to 40 hours a week on support emails. We built an agent that auto-resolved 71% of them. Here are the build details and the 90-day numbers.

Woyce Technologies

AI & Engineering Team

Published Apr 10, 2026 · Topic: AI Development

The Client

A UK-based fashion retailer selling through their own website and two marketplaces. Monthly order volume: 1,800–2,400 orders. Support team: two part-time staff plus the founder handling overflow.

They came to us in January with a clear problem: support was consuming 35–40 hours per week across the team, and the volume was growing faster than they could manage. Their average email response time was 11 hours. Reviews were starting to mention slow support. The founder was spending Sunday evenings answering order queries instead of working on the business.

Their ask: build something that reduces the manual support load without making the customer experience worse. That second clause mattered a lot to them — and to us.

The Scope We Agreed

Before writing a line of code, we spent a week with the client mapping their actual support volume. They gave us access to their support inbox and we categorised every email received over the previous 30 days.

The results:

Category	Volume	% of total
Order status / tracking	312	34%
Return requests	187	21%
Product questions (size, material, care)	143	16%
Delivery issues (damaged, missing)	96	11%
Account queries (login, address change)	74	8%
Other / miscellaneous	91	10%

The first four categories — 82% of volume — had clear, automatable resolution paths. We scoped the agent to handle those. We left "other / miscellaneous" and any delivery issues requiring compensation to the human team.

Agreed success metric before build: 65% deflection rate within 90 days of launch.

What We Built

Integration Layer

The agent integrates with:

Shopify — for order data, tracking information, customer details, and order status
Royal Mail and DPD APIs — for real-time tracking data
Return portal — their existing returns management system (a third-party tool) via API
Gmail — reading inbound support emails and sending replies via their support address

The email integration was the most complex piece. We built a system that reads new emails in the support inbox, classifies the query type, retrieves relevant context from Shopify and courier APIs, generates a response, and either sends it automatically (for high-confidence resolutions) or drafts it for human review (for lower-confidence or policy-edge cases).
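The flow described above can be sketched roughly as follows. This is an illustrative outline under stated assumptions, not the production code: `Email`, `Classification`, `handle_email`, and the three injected callables are hypothetical names standing in for the classifier, the Shopify and courier lookups, and the response generator.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Email:
    sender: str
    subject: str
    body: str


@dataclass
class Classification:
    category: str
    confidence: float
    action: str  # "auto_respond", "draft_for_review", or "escalate"


def handle_email(
    email: Email,
    classify: Callable[[Email], Classification],
    fetch_context: Callable[[Email, Classification], dict],
    generate_reply: Callable[[Email, dict], str],
) -> Tuple[str, Optional[str]]:
    """Run one email through the stages and return (outcome, reply_text)."""
    result = classify(email)
    if result.action == "escalate":
        # Nothing is generated; a human picks the email up directly.
        return ("escalated", None)
    context = fetch_context(email, result)   # order, tracking, customer data
    reply = generate_reply(email, context)
    if result.action == "auto_respond":
        return ("sent", reply)               # high-confidence path
    return ("drafted", reply)                # queued for human approval


# Usage with stub stages (all values here are made up for illustration).
email = Email("sam@example.com", "Where is my order?", "Order 1001 has not arrived.")
outcome, reply = handle_email(
    email,
    classify=lambda e: Classification("order_status", 0.93, "auto_respond"),
    fetch_context=lambda e, c: {"name": "Sam", "status": "out for delivery"},
    generate_reply=lambda e, ctx: f"Hi {ctx['name']}, your order is {ctx['status']}.",
)
print(outcome, "|", reply)
```

Passing the stages in as callables keeps the control flow testable without touching Gmail, Shopify, or a courier API.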

The Classification System

Every incoming email is classified before any response is attempted. The classifier uses the email subject, body, and any Shopify order data linked to the customer's email address to determine:

Query category (from our taxonomy above)
Confidence score (how certain we are about the classification)
Recommended action (auto-respond, draft for review, escalate immediately)

Auto-respond threshold: 85%+ confidence on categories we know the agent handles well (order status, return eligibility). Below that, it drafts for human review. The threshold was deliberately conservative — we'd rather have a human approve a perfect response than send a confident wrong one.
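A minimal sketch of that routing rule, assuming the classifier yields a category label and a confidence between 0 and 1. The category labels, the function name `recommend_action`, and the choice to escalate out-of-scope categories immediately are assumptions made for illustration; only the 85% threshold and the two auto-send categories come from the description above.

```python
AUTO_RESPOND_THRESHOLD = 0.85  # threshold stated in the article

# Hypothetical labels. The article names order status and return
# eligibility as the categories the agent handles well.
AUTO_SEND_CATEGORIES = {"order_status", "return_request"}
IN_SCOPE_CATEGORIES = AUTO_SEND_CATEGORIES | {"product_question", "delivery_issue"}


def recommend_action(category: str, confidence: float) -> str:
    """Map a classification to one of the three recommended actions."""
    if category not in IN_SCOPE_CATEGORIES:
        # Assumption: anything outside the agreed scope goes straight to a human.
        return "escalate"
    if category in AUTO_SEND_CATEGORIES and confidence >= AUTO_RESPOND_THRESHOLD:
        return "auto_respond"
    return "draft_for_review"


for category, confidence in [
    ("order_status", 0.91),
    ("order_status", 0.80),
    ("product_question", 0.95),
    ("other", 0.99),
]:
    print(category, confidence, "->", recommend_action(category, confidence))
```

Keeping the auto-send categories in a set makes it easy to widen coverage later without touching the threshold.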

Response Generation

For order status queries: the agent retrieves the order from Shopify, gets the current tracking status from the relevant courier API, and generates a response that includes the specific tracking link, last scan location, and estimated delivery date. The response is personalised — it uses the customer's name and references the specific items ordered.

For return requests: the agent checks the delivery date against the return policy (28 days from delivery), determines eligibility, and if eligible, generates a return label from the returns portal and sends it with instructions. If the order is outside the window, it explains this clearly and provides the contact for exception requests.
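The eligibility check itself is simple date arithmetic. The sketch below assumes the 28-day window is counted from the delivery date and that day 28 is still inside the window; the function name and the inclusive boundary are illustrative choices, not details confirmed by the client's policy.

```python
from datetime import date, timedelta

RETURN_WINDOW_DAYS = 28  # return policy stated in the article


def is_return_eligible(delivered_on: date, requested_on: date) -> bool:
    """True if the request falls within 28 days of delivery (inclusive)."""
    deadline = delivered_on + timedelta(days=RETURN_WINDOW_DAYS)
    return delivered_on <= requested_on <= deadline


delivered = date(2026, 1, 1)
print(is_return_eligible(delivered, date(2026, 1, 29)))  # True: exactly day 28
print(is_return_eligible(delivered, date(2026, 1, 30)))  # False: day 29
```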

For product questions: the agent searches the product catalogue for the relevant item and answers from the product specifications. Size guide queries reference the actual measurement tables. Care instructions come from the product metadata.

Tone and Style Calibration

We spent more time on this than clients typically expect. The client had a distinct brand voice — warm, direct, slightly irreverent. Generic AI responses would have felt off-brand and undermined the customer experience they'd worked hard to build.

We assembled 50 examples of good and bad responses from the existing support inbox, each annotated with what made it good or bad. That set informed the prompt design and gave us a reference set for tone evaluation during testing.

The Human Review Queue

Not everything is auto-sent. The agent drafts responses for:

Classifications below the confidence threshold
Return requests outside standard policy (partial returns, condition disputes)
Delivery issues involving potential compensation
Any email containing words indicating distress or dissatisfaction with the brand
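The four rules in the list above can be expressed as a single check that returns the reasons a draft needs review. This is a sketch under assumptions: the distress word list, the flag names, and the function name `review_reasons` are hypothetical, and a plain keyword match is only one possible way to detect distress.

```python
import re
from typing import List

CONFIDENCE_THRESHOLD = 0.85  # auto-respond threshold from the article

# Hypothetical word list; the real one would be tuned on the client's inbox.
DISTRESS_TERMS = ("disappointed", "unacceptable", "appalling", "furious", "never again")
_DISTRESS_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(term) for term in DISTRESS_TERMS) + r")\b",
    re.IGNORECASE,
)


def review_reasons(
    body: str,
    confidence: float,
    *,
    nonstandard_return: bool = False,
    compensation_possible: bool = False,
) -> List[str]:
    """Return every reason this email must go to the human queue."""
    reasons = []
    if confidence < CONFIDENCE_THRESHOLD:
        reasons.append("low_confidence")
    if nonstandard_return:
        reasons.append("nonstandard_return")
    if compensation_possible:
        reasons.append("possible_compensation")
    if _DISTRESS_RE.search(body):
        reasons.append("distress_language")
    return reasons


print(review_reasons("Thanks, where is my parcel?", 0.95))
print(review_reasons("I am really disappointed with this.", 0.95))
```

An empty list means the reply can be sent automatically; any entry routes the draft to the queue, and the reasons can be shown to the reviewer alongside it.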

The human team sees a queue in their inbox tool with draft responses pre-populated. For most drafts, they read, approve, and send in under 30 seconds. The workload shifts from reading-researching-writing to reviewing-and-approving.

What Broke in the First Three Weeks

Problem 1: Marketplace order numbers. Customers who ordered through the marketplaces (not the direct site) used marketplace order numbers in their emails. Our Shopify integration looked up orders by Shopify order ID or customer email. Marketplace order IDs didn't match.

Fix: Added a mapping layer that extracts marketplace order numbers from email body text and looks them up via the marketplace APIs.
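A rough sketch of the extraction half of that mapping layer. The marketplaces are not named here, so the two ID formats below are placeholders invented for illustration; real patterns would come from each marketplace's documented order-number format, and the lookup against the marketplace APIs is left out.

```python
import re
from typing import List, Tuple

# Placeholder formats for two unnamed marketplaces (assumptions, not real specs).
MARKETPLACE_PATTERNS = {
    "marketplace_a": re.compile(r"\b\d{3}-\d{7}-\d{7}\b"),
    "marketplace_b": re.compile(r"\bMB\d{8}\b"),
}


def extract_marketplace_orders(body: str) -> List[Tuple[str, str]]:
    """Return (marketplace, order_number) pairs found in the email body."""
    found = []
    for name, pattern in MARKETPLACE_PATTERNS.items():
        for order_number in pattern.findall(body):
            found.append((name, order_number))
    return found


body = "Hi, my order 123-1234567-1234567 hasn't arrived yet."
print(extract_marketplace_orders(body))
# [('marketplace_a', '123-1234567-1234567')]
```

Each extracted pair would then be resolved through the matching marketplace API before falling back to the Shopify lookup by customer email.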

Problem 2: Bundle product descriptions. Some orders included bundle items — three products listed as one SKU. The agent was describing the bundle SKU number when asked about specific items within the bundle, which was confusing for customers.

Fix: Updated the product catalogue mapping to expand bundle SKUs into their component products before the response generation step.
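The expansion step can be as small as a dictionary lookup applied before response generation. The SKU codes and the `BUNDLE_COMPONENTS` mapping below are made up for illustration; in practice the mapping would be loaded from the product catalogue.

```python
from typing import Dict, List

# Hypothetical bundle definition: one bundle SKU maps to its component SKUs.
BUNDLE_COMPONENTS: Dict[str, List[str]] = {
    "BUNDLE-001": ["TEE-WHT-M", "TEE-BLK-M", "TEE-GRY-M"],
}


def expand_line_items(skus: List[str]) -> List[str]:
    """Replace each bundle SKU with its components; leave other SKUs as-is."""
    expanded: List[str] = []
    for sku in skus:
        expanded.extend(BUNDLE_COMPONENTS.get(sku, [sku]))
    return expanded


print(expand_line_items(["BUNDLE-001", "SCARF-RED"]))
# ['TEE-WHT-M', 'TEE-BLK-M', 'TEE-GRY-M', 'SCARF-RED']
```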

Problem 3: Overly formal tone on escalations. When the agent escalated a query to the human queue, it sent the customer a holding message. The initial version sounded corporate and cold — "Your enquiry has been received and will be addressed by a member of our team."

The founder flagged this immediately: "That sounds like it's from a bank, not us."

Fix: Rewrote the holding messages in the client's brand voice. Small change, big difference in how escalations felt to customers.

None of these were catastrophic, but all three reinforced something we already believed: shadow mode and tight monitoring in the first month catch the problems that demos never will.

The Results at 90 Days

Metric	Before	After (90 days)	Change
Auto-resolved without human	~5%	71%	+66pp
Average first response time	11.2 hours	4.1 minutes	-99%
Weekly support hours (human)	37–40 hours	9–12 hours	-74%
Customer satisfaction (CSAT)	3.9 / 5	4.5 / 5	+15%
Negative reviews mentioning support	3–4/month	0–1/month	-75%

We exceeded the agreed 65% deflection target by week eight (68%) and reached 71% by week twelve as we tuned the confidence thresholds and expanded product question coverage.

The CSAT improvement was the most surprising result. We'd expected deflection to improve satisfaction (faster responses) but not to that degree. Post-survey comments consistently mentioned speed — "got an answer in minutes, amazing" — and personalisation — "the reply actually referenced my specific order."

The founder's Sunday evenings are no longer spent in the support inbox.

The Cost and Payback

Build cost: £9,500

Monthly running cost: £220 (hosting, API costs, email processing)

Monthly maintenance retainer: £650 (weekly review, prompt updates as the product catalogue changes, integration maintenance)

Total first-year cost: £9,500 + (12 × £870) = £19,940

Value from time saved: 25–28 hours per week recovered at an effective rate of £18/hour = approximately £24,000 per year in recovered productive time.

Additional value: Reduction in negative reviews and associated brand damage. Not easily quantifiable but clearly meaningful for a direct-to-consumer brand.

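Payback: approximately 5 months, measured as the build cost against the gross value of time saved (£9,500 against roughly £2,000 per month). Net of the £870 monthly running and maintenance costs, the build cost is recovered in closer to 8 months.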
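The arithmetic behind these figures can be reproduced as below. The sketch assumes 52 working weeks per year and uses the rounded £24,000 annual figure for the payback calculation; the variable names are illustrative. The five-month figure corresponds to setting the build cost against gross time savings, and the net figure is shown for comparison.

```python
BUILD_COST = 9_500
MONTHLY_RUNNING = 220
MONTHLY_RETAINER = 650
HOURLY_RATE = 18
WEEKS_PER_YEAR = 52          # assumption: no weeks excluded
ANNUAL_TIME_VALUE = 24_000   # the article's rounded estimate

monthly_costs = MONTHLY_RUNNING + MONTHLY_RETAINER   # 870
first_year_cost = BUILD_COST + 12 * monthly_costs    # 19,940


def annual_value(hours_per_week: float) -> float:
    """Value of recovered time per year at the effective hourly rate."""
    return hours_per_week * HOURLY_RATE * WEEKS_PER_YEAR


low, high = annual_value(25), annual_value(28)       # 23,400 and 26,208

monthly_value = ANNUAL_TIME_VALUE / 12               # 2,000
gross_payback_months = BUILD_COST / monthly_value
net_payback_months = BUILD_COST / (monthly_value - monthly_costs)

print(f"First-year cost: £{first_year_cost:,}")
print(f"Annual value range: £{low:,} to £{high:,}")
print(f"Payback on build cost (gross): {gross_payback_months:.2f} months")
print(f"Payback net of monthly costs: {net_payback_months:.1f} months")
```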

What We Would Do Differently

Start with better tone calibration. We got to the right place on tone, but the holding-message issue should have been caught in testing, not after launch. Tone review across every message type — not just the primary response — belongs on the pre-launch checklist.

Build the marketplace order lookup earlier. It was a predictable requirement given their sales channels. We should have scoped it into the build rather than retrofitting it in week two.

Set up monitoring dashboards on day one. We had logging from launch, but the client-facing dashboard took three weeks to build. The first three weeks of operational data were available but not easily visible to the client. Early visibility accelerates the tuning cycle and we know that now.

A fair caveat: this kind of result is realistic for businesses with clean order data and well-defined policies. If the underlying systems are messy — duplicate customers, inconsistent product data, undocumented exceptions — you'll spend half the project cleaning that up, and the deflection numbers come more slowly.

If This Sounds Like Your Business

The pattern we built — email triage, order lookup, return processing, auto-response with human review queue — is repeatable across e-commerce businesses at this scale. The specific integrations vary; the architecture is consistent.

Talk to us about your support volume — we'll map your current inbox against what the agent can handle and give you a realistic projection of what deflection rate you could expect. If the numbers don't work, we'll tell you.


Woyce Technologies

AI & Engineering Team · Woyce

Woyce Technologies builds AI chatbots, LLM integrations, voice AI, and full-stack web applications for businesses in the US, UK, Europe & APAC. Based in Rajkot, Gujarat.


