Every AI Company Has the Same Website
Open ten AI agency websites in tabs and read them side by side. We've done it. They are remarkably interchangeable. "We build cutting-edge AI solutions that transform your business." Same hero shot of an abstract neural network. Same process diagram with arrows pointing to "Discovery → Build → Deploy → Magic." Same "trusted by" logos you've never heard of.
This makes choosing an AI development partner genuinely hard, not because there aren't good companies out there (there are), but because the bad ones have gotten very good at looking like the good ones. The websites won't tell you the difference. The pitch deck won't either. The proposals have all been copy-pasted from the same five LinkedIn templates.
What works is asking the right questions on the first call and listening for specifics versus vibes. Below are eight questions we'd ask if we were the ones hiring. The "what good looks like" answer under each shows the rough shape of a real answer. The answers you hear don't need to match word for word, but if a vendor can't get within that shape, take that as data.
1. "Can you show me something you built that's in production?"
Not a demo. Not a mockup. Not a case study with no metrics. A real thing, running, that real users are using right now.
Good AI development companies have production work. They can show you a live agent, share performance numbers, or put you on a call with a client who'll talk openly about the project. If the answer is "we have case studies on our website," ask to speak with that client directly. Hesitation is data. Outright refusal is more data.
What good looks like: "Here's a support agent we built for a logistics company. It handles around 800 tickets a month with a 72% deflection rate. Their operations manager is happy to do a quick call if you'd like to hear it from her instead of me."
2. "What goes wrong with AI agents in production, and how do you handle it?"
This is probably the single most useful question in this list. Anyone who has actually shipped AI to production has a war-stories list — models that hallucinated, tools that timed out, edge cases that nobody could have predicted, the one Tuesday the OpenAI API was down for four hours.
A team with real experience answers this fast and specifically. They'll have the story about the time an agent went into a loop, or the edge case where a customer's input broke the qualifier logic, or how they caught a prompt injection attempt before it embarrassed anyone. A team without real experience gives a generic answer about "robust testing" and "comprehensive monitoring." Those phrases have all the texture of a press release because that's where they came from.
What good looks like: "On a recent project we had an agent that would confidently answer questions it had no data for: classic hallucination. We added an explicit confidence check that runs before any answer is sent, and routed the low-confidence answers to a human queue. Within a week, hallucination went from a problem we were watching to a non-issue."
3. "Who owns the code when we're done?"
This should be non-negotiable, but it often gets glossed over in proposals. Some agencies build on proprietary platforms you cannot export from. Some quietly retain IP over the "core" agent logic. Some charge an ongoing licensing fee for software you already paid them to build.
You should own the code entirely. Full stop. If there are third-party tools in the stack (OpenAI, Anthropic, whatever the database is), the accounts and keys should be yours, not theirs. You should be able to fire your developer and hand the code to anyone competent without it falling over.
What good looks like: "Everything is in your repo from day one. You own it completely. We use standard, well-supported libraries — nothing proprietary — so any competent developer can pick it up and keep going if we get hit by a bus."
4. "How do you measure whether the agent is working?"
If a team can't answer this before they've built anything, they won't be able to answer it after either. Real AI development starts with defining what success looks like in numbers, not adjectives.
For a support agent: deflection rate, CSAT, escalation rate, average resolution time. For a lead follow-up agent: response time, qualification rate, conversion lift. The metrics aren't exotic — what matters is whether the team agreed to them upfront and built toward them.
If the answer is some shape of "we'll monitor it and make adjustments," push harder. Ask: what number would tell you the agent is failing? Silence here is the answer.
What good looks like: "We'd agree on target metrics before we start. For a support agent that usually means 60–70% deflection in the first 90 days, escalation rates under 15%, and a CSAT that matches or beats your current human-handled average. If we miss those, we keep tuning until we hit them — that's part of the engagement, not a separate scope."
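To make "numbers, not adjectives" concrete, here is a minimal Python sketch of the kind of check we'd expect a team to be able to run at the 30- or 90-day mark. The field names, the rate definitions (resolved-by-agent and escalated tickets as shares of all tickets), and the default thresholds are our assumptions for illustration, loosely based on the targets quoted above. Your agreed metrics may be defined differently.

```python
from dataclasses import dataclass


@dataclass
class MonthlyStats:
    # All field names are hypothetical; map them to whatever your helpdesk exports.
    total_tickets: int
    resolved_by_agent: int      # tickets closed without a human touching them
    escalated: int              # tickets the agent handed to a human
    agent_csat: float           # CSAT for agent-handled tickets
    human_csat_baseline: float  # your current human-handled CSAT


def check_targets(stats: MonthlyStats,
                  min_deflection: float = 0.60,
                  max_escalation: float = 0.15) -> dict:
    """Compare one month of stats against targets agreed upfront.

    Assumed definitions: deflection = resolved_by_agent / total_tickets,
    escalation = escalated / total_tickets.
    """
    if stats.total_tickets <= 0:
        raise ValueError("total_tickets must be positive")

    deflection = stats.resolved_by_agent / stats.total_tickets
    escalation = stats.escalated / stats.total_tickets

    return {
        "deflection_ok": deflection >= min_deflection,
        "escalation_ok": escalation < max_escalation,
        "csat_ok": stats.agent_csat >= stats.human_csat_baseline,
    }


# Example with made-up numbers: 800 tickets, 576 deflected, 80 escalated.
healthy = MonthlyStats(800, 576, 80, agent_csat=4.4, human_csat_baseline=4.3)
report = check_targets(healthy)
```

The point is less the code than the fact that every value in it can be agreed before the build starts. If a vendor can't tell you which of these flags they'd expect to be true by day 90, that is the "silence" the question is designed to surface.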
5. "How do you handle the cases the agent can't handle?"
Every AI agent, no matter how well built, hits situations it wasn't designed for. What happens in those moments matters more than how the agent behaves on the easy cases.
A well-designed agent escalates gracefully — it recognises its own uncertainty, hands off to a human with full context, and doesn't strand the customer mid-conversation. A badly designed one either guesses (badly, often expensively) or fails silently with a generic "sorry, something went wrong." Ask specifically how the fallback path works. Who gets notified? What information gets passed along? What does the handoff feel like from the customer's side?
What good looks like: "When the agent's confidence drops below a threshold — or it detects an emotional signal or a complex edge case — it hands off to a human in your helpdesk with the full conversation transcript and a one-paragraph summary of what the customer is trying to do. The customer doesn't have to repeat themselves. We've found that's actually more important than the deflection rate for keeping CSAT up."
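As a rough illustration of the fallback path described above, the sketch below decides whether to hand off and, if so, bundles the transcript and a summary so the customer doesn't have to repeat themselves. Everything here is hypothetical: the `Handoff` shape, the 0.7 threshold, and the `distress_detected` flag stand in for whatever confidence score and signal detection a real agent would provide, and the summary is assumed to be generated elsewhere.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Handoff:
    # Hypothetical payload a helpdesk integration might receive.
    transcript: list  # full conversation so far, one string per turn
    summary: str      # one-paragraph summary of what the customer wants
    reason: str       # why the agent gave up, for the human's benefit


def maybe_hand_off(transcript: list,
                   summary: str,
                   confidence: float,
                   distress_detected: bool = False,
                   threshold: float = 0.7) -> Optional[Handoff]:
    """Return a Handoff if the agent should escalate, otherwise None.

    `confidence` is assumed to be a 0..1 score from the agent pipeline;
    the threshold is illustrative and would be tuned on real conversations.
    """
    if distress_detected:
        reason = "emotional signal"
    elif confidence < threshold:
        reason = "low confidence"
    else:
        return None  # agent keeps handling the conversation

    # Copy the transcript so later turns don't mutate the handed-off record.
    return Handoff(transcript=list(transcript), summary=summary, reason=reason)


turns = ["Customer: My shipment is missing.", "Agent: Let me check that."]
result = maybe_hand_off(turns, "Customer reports a missing shipment.", confidence=0.4)
```

When you ask a vendor how the handoff works, the answer should map onto something like this: what triggers it, what gets passed along, and who receives it.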
6. "What's your process for the first 30 days after launch?"
Shipping the agent isn't the end of the project. Month one in production is where you learn what you didn't know during development: real users behave differently from test users, edge cases emerge that nobody thought of, and the prompt needs tuning based on the conversations actually happening.
A good team builds this period into the original scope. They watch real conversations, identify where the agent struggles, and adjust prompts and logic until it's stable. A bad team hands over the repo and moves on, which is how you end up running a half-tuned agent for six months wondering why it isn't living up to the pitch deck.
What good looks like: "There's a 4-week stabilisation period after go-live built into our engagement. We review conversations daily for the first two weeks and make adjustments. We share a KPI report at the 30-day mark and tune from there. It's not an optional add-on."
7. "What shouldn't we automate?"
This is the closest thing to a trick question in the list. The right answer is a list of things they'd push back on, not enthusiastic agreement that AI can handle everything.
Ethical AI developers know not everything should be automated. Complaints involving real distress. Medical advice. Legal guidance with binding implications. Situations where the human relationship is genuinely the product. A vendor who tells you AI can handle anything is a vendor optimising for the sale, not your outcome. The best AI developers will push back on parts of your brief — that's a feature, not a bug, and it's the clearest single signal that you're talking to people who actually care whether the project succeeds.
What good looks like: "We'd push back on automating your high-value enterprise sales conversations — the relationship matters too much there and you'd be optimising the wrong thing. And anything involving mental health, crisis situations, or genuine ethical weight should always have a human path. Happy to help you draw that line carefully."
8. "Can we start small?"
A good AI partner is not threatened by a scoped-down first project. They'll actively encourage it. Starting with one focused agent — one workflow, one integration, one clear success metric — lets you validate the approach before betting bigger. It also lets you validate them before betting bigger.
Be cautious of companies that pitch a sweeping, comprehensive AI transformation on the first call before they understand your business. That's an aspirational statement of work (SOW) pretending to be a strategy. The best first AI project is usually narrow and fast, and it produces a clear, measurable result in six to eight weeks, which then earns the right to a bigger conversation.
What good looks like: "We'd recommend starting with your support FAQ agent. It's the fastest path to a measurable result, and it gives us both a chance to figure out whether we work well together before either of us commits to anything bigger."
The Honest Version
We wrote this list because the AI market is genuinely hard to navigate right now. A lot of money is chasing a hot category, and not everyone chasing it can actually deliver. We've cleaned up after a few projects where the previous vendor handed over a demo dressed up as a product — and the client didn't realise until they tried to scale it.
At Woyce, we work with businesses that want to start specific, measure carefully, and build from there. We show you production work. We tell you when something isn't a good fit. And honestly, the times we've told a prospect not to hire us have ended up being the calls that earned us referrals later.
If that sounds like the kind of team you want to work with, let's talk.
Talk to us about your business — we'll give you honest answers, including if we're not the right choice.