The Market Is Full of Confident Amateurs
The AI development market has grown faster than the talent pool. Demand for AI agents, LLM integrations, and chatbot work has outpaced the supply of teams that can actually ship production-quality systems.
The gap is filled by agencies that rebranded from web dev to AI dev with no real change in capability, freelancers whose entire AI résumé is a ChatGPT wrapper, and offshore teams whose proposals look polished while their production work doesn't.
Spotting the difference before you sign anything is the skill this piece is trying to give you. What follows is a list of the specific red flags, in proposals, demos, and conversations, that we've consistently watched precede disappointing or outright failed AI projects. (For full disclosure: we're an AI development team. We'd rather lose a project to a competent competitor than win one we can't deliver well, so this list is also a tool we hope our prospective clients use on us.)
Red Flag 1: The Demo Works, But They Can't Explain Why
Every developer has a demo. The demo always works. What reveals technical depth is whether they can explain the architecture behind it.
After any demo, ask: "Can you walk me through how this actually works? What happens between the user sending a message and the response appearing?"
A team that can ship production AI should be able to explain:
- How the query is processed (what LLM, what prompt structure)
- How relevant information is retrieved (vector search? what vector database? why?)
- How the response is generated and filtered
- What happens when the LLM API is slow or unavailable
The red flag: Vague answers. "We use advanced AI techniques." "It's powered by GPT." "We have a proprietary system." Genuine technical depth produces specific, confident answers — including admissions of where a trade-off was made.
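To make the four points above concrete, here is a minimal sketch of the flow a team should be able to narrate: retrieve context, build a prompt, call the model, and fall back when the API is slow or down. Every name in it (`retrieve`, `build_prompt`, `answer_query`, `FALLBACK_TEXT`) is hypothetical, the keyword-overlap ranking is only a stand-in for a real vector search, and `llm` is any callable you supply rather than a particular vendor's client.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical fallback message; a real system would choose its own wording.
FALLBACK_TEXT = "Sorry, I can't answer that right now. A team member will follow up."


@dataclass
class Answer:
    text: str
    sources: list[str]
    degraded: bool  # True when the fallback path was used


def retrieve(query: str, documents: Sequence[str], top_k: int = 2) -> list[str]:
    """Toy keyword-overlap ranking, standing in for a vector search."""
    terms = set(query.lower().split())
    scored = [(len(terms & set(doc.lower().split())), doc) for doc in documents]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in ranked[:top_k] if score > 0]


def build_prompt(query: str, context: Sequence[str]) -> str:
    """Assemble a grounded prompt from the retrieved passages."""
    joined = "\n".join(f"- {passage}" for passage in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"


def answer_query(
    query: str,
    documents: Sequence[str],
    llm: Callable[[str], str],
    max_attempts: int = 2,
) -> Answer:
    context = retrieve(query, documents)
    prompt = build_prompt(query, context)
    for _ in range(max_attempts):
        try:
            raw = llm(prompt)
        except (TimeoutError, ConnectionError):
            continue  # slow or unavailable API: retry, then fall back
        text = raw.strip()
        if text:  # minimal output filter: reject empty responses
            return Answer(text=text, sources=context, degraded=False)
    return Answer(text=FALLBACK_TEXT, sources=[], degraded=True)
```

A developer with real depth can point to the equivalent of each function in their own system and say what they chose there and what they gave up by choosing it.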
Red Flag 2: No Production References
A demo is easy. A live system handling real users in production is not.
Ask specifically: "Can you show me an AI agent you've built that's currently in production, handling real users?" Then ask: "Can I speak with that client?"
The combination — live production work plus a reference you can actually contact — is the single most reliable signal of real capability. Teams that can't provide it have almost certainly not shipped production AI.
The red flag: "We have several projects in progress." "Our clients prefer to stay confidential." "We can share case studies." Case studies written by the developer are not references. A client you can call is a reference.
Red Flag 3: They Agree With Everything You Say
AI development involves real constraints. Timelines, scope, technical complexity, edge cases, compliance — all of these create friction between what a client wants and what's achievable.
A competent developer pushes back when something isn't possible, isn't advisable, or isn't realistic. They tell you that a 4-week timeline for a complex integration won't happen. They tell you the feature you want to add mid-project will add three weeks. They tell you your proposed use case has a compliance problem you need to address before building.
The red flag: A developer who agrees with everything. Never challenges your assumptions. Says your timeline is fine, your scope is achievable, your idea is great — with no qualifications. That developer is managing the sale, not the project. We've watched several clients come to us after exactly this experience, six months into a build that was always going to overrun.
Red Flag 4: "It's Fully Automated" Without Qualification
No production-grade AI agent runs entirely without human involvement. Every well-built AI system has:
- Scenarios where it escalates to a human
- Edge cases it handles with explicit uncertainty
- Topics that are explicitly out of scope
- A monitoring and review process for catching errors
A developer who claims their agent is "fully automated," "handles everything automatically," or "never needs human involvement" is either describing a very narrow scope you haven't fully understood, or overpromising.
The red flag: Automation claims without scope qualifications. Ask: "What does the agent do when it encounters something it can't handle?" If there's no clear escalation path, no uncertainty handling, no explicit out-of-scope definition — the agent has not been designed for production.
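As a rough illustration of what a "clear escalation path" can look like in code, the sketch below routes each request to one of four outcomes. The topic sets, the confidence thresholds (0.5 and 0.8), and the names `Action` and `route` are all assumptions made for this example; a real agent would derive them from its own scope document and evaluation data.

```python
from enum import Enum


class Action(Enum):
    ANSWER = "answer"
    ANSWER_WITH_CAVEAT = "answer_with_caveat"
    ESCALATE = "escalate_to_human"
    DECLINE = "decline_out_of_scope"


# Hypothetical scope definitions, agreed with the client before the build.
OUT_OF_SCOPE = {"legal advice", "medical advice"}
ALWAYS_ESCALATE = {"complaint", "account closure"}


def route(topic: str, confidence: float) -> Action:
    """Decide what the agent does with a request it may not handle well."""
    if topic in OUT_OF_SCOPE:
        return Action.DECLINE
    if topic in ALWAYS_ESCALATE or confidence < 0.5:
        return Action.ESCALATE
    if confidence < 0.8:
        return Action.ANSWER_WITH_CAVEAT  # answer, but state the uncertainty
    return Action.ANSWER


for topic, confidence in [("billing", 0.9), ("billing", 0.6), ("complaint", 0.95)]:
    print(topic, confidence, route(topic, confidence).value)
```

The specific numbers matter less than the fact that they exist: a team that has designed for production can tell you where these boundaries sit and how they were chosen.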
Red Flag 5: The Proposal Has No Success Metrics
A professional AI development proposal defines what success looks like — specific, measurable outcomes agreed before work begins.
Deflection rate. Response time. CSAT. Human hours saved. Cost per resolved ticket. These are the numbers that tell you whether the agent is working.
A proposal without success metrics is a proposal without accountability. If there's no agreed definition of success, any delivered system can be declared a success.
The red flag: A proposal that describes features, timelines, and cost — but not measurable outcomes. Ask: "What metrics will we use to evaluate whether this is performing?" If the developer can't answer, or gives you vague qualitative measures, that's a problem you'll feel later.
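For a sense of how little is needed to make these metrics concrete, here is a small sketch that computes four of them from ticket records. The `Ticket` fields and the definitions used (for example, deflection as "resolved without a human") are assumptions for illustration; definitions vary between teams, which is exactly why they should be written into the proposal.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional, Sequence


@dataclass
class Ticket:
    resolved: bool
    escalated_to_human: bool
    response_seconds: float
    csat: Optional[int]  # 1-5 rating, or None if the user gave none


def summarise(tickets: Sequence[Ticket], total_cost: float) -> dict:
    """Compute headline metrics; total_cost is the agent's running cost for the period."""
    if not tickets:
        raise ValueError("no tickets to summarise")
    deflected = [t for t in tickets if t.resolved and not t.escalated_to_human]
    resolved = [t for t in tickets if t.resolved]
    ratings = [t.csat for t in tickets if t.csat is not None]
    return {
        "deflection_rate": len(deflected) / len(tickets),
        "avg_response_seconds": mean(t.response_seconds for t in tickets),
        "avg_csat": mean(ratings) if ratings else None,
        "cost_per_resolved_ticket": total_cost / len(resolved) if resolved else None,
    }


sample = [
    Ticket(True, False, 2.0, 5),
    Ticket(True, False, 4.0, 4),
    Ticket(True, True, 6.0, None),
    Ticket(False, True, 8.0, 3),
]
print(summarise(sample, total_cost=300.0))
```

If a developer can't say which of these numbers they would report, or how each would be calculated from your data, the proposal has no real accountability behind it.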
Red Flag 6: Security and Compliance Are Afterthoughts
When you ask about security, data handling, and compliance, listen carefully to when these topics appear in the conversation.
Experienced AI developers raise compliance and security proactively. They ask about your data handling requirements, your user base (are there vulnerable users?), your regulatory environment. They design the architecture around those requirements from the start.
Inexperienced developers address security when prompted — and then only superficially. "We follow best practices." "It's secure." These phrases mean nothing without specifics.
The red flag: Security and compliance discussed only when you raise them, or addressed with vague reassurances. For any AI system handling sensitive data, ask: "Who can access the conversation data? Where is it stored? What happens if I want to delete a user's data? What's the system prompt and who can see it?" The quality of those answers tells you most of what you need to know.
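The access and deletion questions above have concrete answers in a well-designed system. The sketch below is a deliberately simplified, in-memory illustration: `ConversationStore`, its role names, and its audit log format are hypothetical, and a production system would also have to cover backups, third-party logs, and data held by the LLM provider.

```python
from collections import defaultdict


class ConversationStore:
    """Toy store showing role-checked reads, an audit trail, and per-user deletion."""

    def __init__(self, allowed_roles: set[str]) -> None:
        self._allowed_roles = set(allowed_roles)
        self._messages: dict[str, list[str]] = defaultdict(list)
        self.audit_log: list[tuple[str, str, str]] = []  # (action, actor, user_id)

    def append(self, user_id: str, message: str) -> None:
        self._messages[user_id].append(message)

    def read(self, user_id: str, role: str) -> list[str]:
        """Return a user's messages, but only to roles that are explicitly allowed."""
        if role not in self._allowed_roles:
            self.audit_log.append(("read_denied", role, user_id))
            raise PermissionError(f"role {role!r} may not read conversations")
        self.audit_log.append(("read", role, user_id))
        return list(self._messages.get(user_id, []))

    def delete_user(self, user_id: str, actor: str) -> int:
        """Erase everything held for one user and report how many messages went."""
        removed = len(self._messages.pop(user_id, []))
        self.audit_log.append(("delete", actor, user_id))
        return removed


store = ConversationStore(allowed_roles={"support_lead"})
store.append("user-1", "My order hasn't arrived.")
print(store.read("user-1", role="support_lead"))
print(store.delete_user("user-1", actor="support_lead"))
```

A developer who has thought this through can tell you where the equivalent of each method lives in their architecture.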
Red Flag 7: The Cheapest Quote Came With the Fastest Timeline
Price and timeline are the two dimensions clients most often optimise on. They're also the two most commonly manipulated to win projects.
A developer who quotes the lowest price and the shortest timeline is usually in one of two positions: they're planning to cut scope quietly, or they've underestimated the work and will ask for more money later.
Real AI development has real costs. Scoping takes time. Integrations take time. Testing against actual edge cases takes time. Monitoring and tuning after launch takes time.
A 3-week, £2,500 proposal for a multi-integration agent with conversation memory and CRM sync is either narrower in scope than you think, or it isn't going to ship as described.
The red flag: The combination of lowest price AND fastest timeline. Either alone can be legitimate. Together they're almost always a sign that something's off — in scope understanding, in experience, or in intent.
Where the "Red Flag" Framing Can Mislead You
One honest caveat: a smaller, less polished team that gives uncomfortable answers isn't automatically a worse choice than a slick agency with all the right talking points. We've seen credentialed-looking shops fail and scrappy two-person teams ship beautifully. The signals above are signals, not a scoring rubric. Use them to ask better questions, not to instantly disqualify anyone who fumbles one of them. Sometimes the best developer for your situation is the one who said "I don't know yet, let me look at your data first", which from the outside can look like the vague answers of Red Flag 1 but is really the opposite of Red Flag 3.
The Positive Signals to Look For
For balance, the signals that indicate a team is ready to ship production AI:
Specific production references. Live systems, client contacts you can call, actual metrics from deployed agents.
They push back on scope. They ask hard questions. They tell you some things aren't feasible as described. They recommend a narrower first scope.
They define success metrics before building. The proposal includes specific, measurable outcomes both parties agree to before work starts.
They raise security and compliance proactively. Without being prompted.
They talk about maintenance. They explain the agent will need ongoing monitoring, tuning, and knowledge base updates — and they have a plan for it.
They have a point of view. They recommend specific technologies for specific reasons. They explain trade-offs. They have opinions earned from experience.
We Welcome Scrutiny
This article is on our website because clients who ask hard questions tend to end up with better outcomes — including the ones who ask us those questions. And because we'd rather you go in with the list than discover it the painful way.
If you want to bring the questions from this piece to a conversation with us, we'd be happy to answer them specifically and put you in touch with clients you can speak to directly.
Talk to us about your project — no commitment, just a conversation.