Two Real Options, Different Trade-Offs
When a developer or technical founder decides to build an AI agent, there's a genuine architectural decision waiting early on: use the OpenAI Assistants API and its built-in capabilities, or build a custom agent architecture on the raw Chat Completions API with your own orchestration.
Both approaches produce real, production AI agents. The difference is in what you control, what you depend on, and what it takes to keep alive after launch. We've built both, and the recommendation depends on the situation more than either option's fans would tell you.
What the OpenAI Assistants API Gives You
The Assistants API is OpenAI's framework for building AI agents with persistent memory, tool use, and file handling. It manages several things that would otherwise require custom engineering:
Threads. Persistent conversation history stored by OpenAI. You don't manage conversation context yourself — you add messages to a thread and the API maintains the history.
File Search. Upload files (PDFs, documents, spreadsheets) and the Assistant can search them to answer questions. OpenAI handles chunking, embedding, and vector search for you.
Code Interpreter. A sandboxed Python environment the Assistant can use to execute code, process data, and generate files. Useful for analytical applications.
Function Calling. Define tools (functions) the Assistant can call, and OpenAI's orchestration handles the tool-calling loop — recognising when to call a tool, parsing arguments, and continuing after the tool result.
Built-in Runs management. The API manages the execution loop (runs) for you — polling for completion, handling tool calls, managing state transitions.
When the Assistants API makes sense:
- Rapid prototyping. You can have a working agent with memory and file search in hours rather than days.
- Small teams or solo developers. Less infrastructure to manage means less surface area to mess up.
- Applications where OpenAI's tool implementations are sufficient. If you need file search over a moderate document set and some built-in code execution, the Assistants API gets you there without custom development.
- Projects where OpenAI lock-in is acceptable. You're comfortable building on OpenAI's proprietary API with its associated pricing and terms.
What Custom Agent Architecture Gives You
A custom agent uses the Chat Completions API directly, with your own orchestration layer managing conversation state, tool calls, and memory. Usually built with LangChain, LlamaIndex, or custom code.
What you control:
Model choice. You can use any model — OpenAI, Anthropic Claude, Google Gemini, open-source models via Ollama or Together AI. You're not locked to OpenAI.
Memory architecture. You decide how conversation history is stored, compressed, and retrieved. Redis, PostgreSQL, vector databases, or in-memory — based on your requirements, not someone else's defaults.
Retrieval strategy. Full control over chunking, embedding models, vector databases, hybrid search, re-ranking, and query decomposition. You can optimise for retrieval quality instead of accepting whatever OpenAI ships.
Orchestration logic. Complex agent behaviours — multi-agent coordination, conditional routing, parallel tool calls, custom retry logic — need custom orchestration that the Assistants API can't accommodate.
Cost control. Custom architectures let you optimise token usage, use cheaper models for specific steps, and implement caching strategies that reduce cost at scale. The difference shows up on the bill.
Data residency. With a custom architecture, you can self-host models or choose providers with specific data residency guarantees. Conversation data can stay within your infrastructure.
When custom architecture makes sense:
- Production systems at scale. Cost optimisation, caching, and performance tuning that the Assistants API doesn't support.
- Multi-model or multi-provider requirements. Different models for different tasks.
- Complex agent orchestration. Multi-agent systems, complex conditional logic, or behaviours that go beyond the Assistants API's run management.
- High retrieval quality requirements. The quality of document retrieval is a primary product differentiator and you need control over every part of the pipeline.
- Data sovereignty. Conversation data can't leave a specific jurisdiction or infrastructure.
- Regulated environments. Financial services, healthcare, legal — where you need full auditability and control over data processing.
The Honest Trade-Offs
| Factor | Assistants API | Custom Architecture |
|---|---|---|
| Time to first working prototype | Hours | Days to weeks |
| Control over retrieval quality | Low | High |
| Model flexibility | OpenAI only | Any model |
| Cost at scale | Higher (OpenAI pricing) | Lower (optimisable) |
| Maintenance overhead | Low | Higher |
| Data residency control | Limited | Full |
| Complex orchestration | Limited | Full |
| Debugging transparency | Limited | Full |
| Vendor lock-in | High | Low |
The Assistants API Limitations Worth Knowing
Rate limits and latency. Assistants API runs are asynchronous — you submit a run and poll for completion. That adds latency compared to a direct Chat Completions call. Under load, run queue times can be unpredictable, and we've felt this in client projects.
File search quality ceiling. Built-in file search works well for simple document Q&A. For production RAG where retrieval quality is the whole game — large document sets, complex queries, domain-specific content — custom retrieval consistently outperforms.
Pricing at scale. Assistants API includes costs for storage and processing that compound at high volume. Custom architectures can be meaningfully cheaper when volume is high and someone has done the cost optimisation work.
Debugging difficulty. When an Assistant behaves unexpectedly, diagnosing the cause means working through OpenAI's tooling rather than your own logs. Custom architectures give you full visibility into every step, which matters more than it sounds until you're trying to chase a weird bug at 11pm.
Dependency risk. The Assistants API has changed significantly since launch. Building a production system on a proprietary, evolving API creates dependency risk that custom architectures avoid.
Our Recommendation
Use the Assistants API for: Prototypes, MVPs, internal tools with moderate requirements, and applications where getting to "working" quickly outweighs the need for optimisation and control.
Use custom architecture for: Production customer-facing agents, any application with meaningful scale, systems requiring high retrieval quality, regulated environments, and multi-agent orchestration.
The hybrid approach: Many teams prototype with the Assistants API, validate the use case, then migrate to custom when production requirements become clearer. This is a reasonable path — just plan the migration before you're under pressure to execute it. We've inherited "we'll migrate later" projects where "later" arrived in the form of a billing surprise.
What We Build With
We build custom agent architectures for production deployments. We use the Assistants API for rapid prototyping and proof-of-concept work. The decision is made explicitly at the start of every project, with the trade-offs written down — not handed off as a "choice we'll figure out later."
Talk to us about your agent — we'll tell you which approach fits your specific requirements and why, including the cases where the Assistants API is genuinely the right call.