How to Build an AI Chatbot with LangChain and OpenAI

A practical guide to building a production-ready AI chatbot using LangChain, OpenAI GPT-4, and Next.js — with RAG for grounding answers in your own data.

Woyce Technologies

AI & Engineering Team

Published Mar 15, 2026 · Topic: AI Development

Introduction

Building an AI chatbot that actually works for your business takes more than calling the OpenAI API. You need conversation memory, grounding in your own data, and a sensible way to handle edge cases. LangChain gives you the scaffolding for all of that — though you'll still write the parts that matter most yourself.

In this guide, we'll build a production-ready chatbot using LangChain, OpenAI GPT-4, and Next.js with Retrieval-Augmented Generation (RAG) so your bot answers from your business's actual documents — not just whatever the model picked up in training.

The distinction matters more than most people realise. A raw GPT-4 call will answer customer questions confidently and incorrectly if the answer isn't in its training data. A 12-person law firm that deployed a plain GPT-4 chatbot for client intake found it citing statutes that had been amended two years prior — correct-sounding, completely wrong. RAG solves this by pulling from documents you control and update. The model's job becomes synthesis, not recall.

What You'll Build

A Next.js API route that handles chat messages
A LangChain conversation chain with memory
A RAG pipeline using Pinecone as a vector store
A simple React chat UI

The architecture is deliberately minimal. A real deployment will need auth, rate limiting, logging, and error boundaries — but this gives you a working foundation you can extend without unpicking someone else's decisions.

Prerequisites

Node.js 18+
OpenAI API key
Pinecone account (free tier works)

You don't need prior LangChain experience, but you should be comfortable with async TypeScript and Next.js API routes. If you've never used a vector database, skim the Vector databases explained post first — it'll make the Pinecone step much clearer.

Step 1: Install Dependencies

npm install langchain @langchain/openai @langchain/pinecone @pinecone-database/pinecone

Pin your versions. LangChain's API surface changes frequently, even across minor versions. At the time of writing, langchain@0.3, @langchain/openai@0.3, and @pinecone-database/pinecone@3.x are stable together. Mixing packages from different release cycles is the single fastest way to lose an afternoon to TypeScript errors that should not exist.

Step 2: Set Up the RAG Pipeline

Create src/lib/chatbot.ts to initialise your LangChain chain with Pinecone retrieval and conversation memory.

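The chain uses ConversationalRetrievalQAChain, which retrieves the top 4 relevant document chunks from Pinecone, injects them into the prompt, and tracks conversation history through BufferMemory. Recent LangChain releases mark this chain as deprecated in favour of createHistoryAwareRetriever and createRetrievalChain; it still works on the 0.3 line, but check the deprecation notices before you build on it.

Here is what happens under the hood. When a user sends a message and there is earlier conversation, the chain first uses the chat history to rewrite the message as a standalone question, so a follow-up such as "and for annual plans?" carries its context with it. LangChain then embeds that question with the same OpenAI embeddings model you used during ingestion, runs a similarity search against your Pinecone index, and returns the four highest-scoring chunks (by cosine similarity, if that is the metric the index was created with). Those chunks are inserted into the prompt as context before the question reaches GPT-4. This is why the bot can answer follow-up questions without losing the thread.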
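To make the similarity search concrete, here is a minimal sketch of top-k retrieval by cosine similarity over an in-memory array. It is a brute-force illustration only: the StoredChunk type and the topK and cosineSimilarity helpers are our own names, not LangChain or Pinecone APIs, and a vector database performs this search for you at scale.

```typescript
// A stored chunk: its text plus the embedding computed at ingestion time.
interface StoredChunk {
  id: string;
  text: string;
  embedding: number[];
}

// Cosine similarity: dot product divided by the product of the vector lengths.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Embeddings must have the same dimension");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every chunk against the query embedding and keep the k best.
function topK(queryEmbedding: number[], chunks: StoredChunk[], k = 4): StoredChunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(queryEmbedding, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((scored) => scored.chunk);
}
```

The query and the stored chunks must be embedded with the same model, otherwise the dimensions (and the meaning of each dimension) will not line up.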

A few configuration choices worth making deliberately:

k: 4 — the number of retrieved chunks. Four is a reasonable default. Too few and you miss relevant context; too many and you burn tokens on noise, which degrades answer quality and raises cost. A 50-page product manual benefits from k=6. A tightly scoped FAQ chatbot often does better at k=3.

The embedding model. text-embedding-3-small is significantly cheaper than text-embedding-ada-002 and performs comparably on most retrieval tasks. Unless you have a specific reason to use ada-002, start with small.

The system prompt. This is where most projects go wrong. Do not skip it. Tell the model its role, what it should and should not answer, and how to handle questions that fall outside the retrieved context. A well-written system prompt cuts hallucinations more than any retrieval tuning.
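As a starting point for the system prompt, the sketch below assembles one from a role statement, a few guardrails, and the retrieved chunks. The wording of the instructions, the RetrievedChunk shape, and the buildSystemPrompt name are all assumptions for illustration; expect to iterate on the text for your own domain.

```typescript
interface RetrievedChunk {
  source: string; // e.g. the filename stored as metadata at ingestion
  text: string;
}

function buildSystemPrompt(businessName: string, chunks: RetrievedChunk[]): string {
  // Number each chunk and label it with its source so answers can be traced back.
  const context = chunks
    .map((chunk, i) => `[${i + 1}] (${chunk.source})\n${chunk.text}`)
    .join("\n\n");

  return [
    `You are the support assistant for ${businessName}.`,
    "Answer only from the context below.",
    "If the context does not cover the question, say you don't know and offer to connect the user with a person.",
    "Do not make commitments about pricing, refunds, or legal matters.",
    "",
    "Context:",
    context.length > 0 ? context : "(no relevant documents found)",
  ].join("\n");
}
```

Keeping the prompt in one function makes it easy to version and to test against the adversarial queries you collect over time.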

Step 3: Create the API Route

Create src/app/api/chat/route.ts as a Next.js API route that accepts POST requests with a message field and returns the AI response.

Keep the route thin. Its job is to validate input, call the chain, and return a response — not to contain business logic. If you find yourself writing more than 60 lines here, something belongs in src/lib/chatbot.ts instead.
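A thin route along those lines might look like the sketch below. It uses only the standard Request and Response objects available in Node.js 18+, and takes the chatbot function as a parameter so the handler can be tested without calling OpenAI. The AskChatbot signature, the makePostHandler name, and the 2000-character limit are assumptions, not part of Next.js or LangChain.

```typescript
// Hypothetical signature for the function exported by src/lib/chatbot.ts.
type AskChatbot = (message: string) => Promise<{ answer: string }>;

const MAX_MESSAGE_LENGTH = 2000; // assumed limit; tune for your use case

function jsonResponse(body: unknown, status: number): Response {
  return new Response(JSON.stringify(body), {
    status,
    headers: { "Content-Type": "application/json" },
  });
}

function makePostHandler(ask: AskChatbot) {
  return async function POST(req: Request): Promise<Response> {
    // 1. Validate input.
    let payload: unknown;
    try {
      payload = await req.json();
    } catch {
      return jsonResponse({ error: "Request body must be valid JSON" }, 400);
    }
    const message = (payload as { message?: unknown } | null)?.message;
    if (typeof message !== "string" || message.trim().length === 0) {
      return jsonResponse({ error: "`message` must be a non-empty string" }, 400);
    }
    if (message.length > MAX_MESSAGE_LENGTH) {
      return jsonResponse({ error: "`message` is too long" }, 413);
    }

    // 2. Call the chain. 3. Return a response.
    try {
      return jsonResponse(await ask(message.trim()), 200);
    } catch (err) {
      console.error("chat route failed", err);
      return jsonResponse({ error: "The assistant is unavailable right now" }, 502);
    }
  };
}

// In src/app/api/chat/route.ts this would be wired up roughly as:
// export const POST = makePostHandler(askChatbot);
```

Everything specific to retrieval and prompting stays behind the ask function, which keeps the route itself well under the 60-line budget.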

One pattern worth adding from the start: return the source documents alongside the answer. LangChain's ConversationalRetrievalQAChain includes sourceDocuments on the response object when you create it with returnSourceDocuments: true. Surfacing these in your UI as citations lets users verify answers rather than simply trusting them, which matters particularly for legal, medical, or financial contexts where a wrong answer has real consequences.
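One way to turn source documents into citations is sketched below. The SourceDocument type mirrors the pageContent and metadata fields that LangChain documents carry; the source and page metadata keys are assumptions that depend on what you stored at ingestion, and toCitations is our own helper name.

```typescript
// Mirrors the two fields LangChain documents carry: pageContent and metadata.
interface SourceDocument {
  pageContent: string;
  metadata: Record<string, unknown>;
}

interface Citation {
  source: string;
  page: number | null;
  snippet: string;
}

function toCitations(docs: SourceDocument[], snippetLength = 160): Citation[] {
  const seen = new Set<string>();
  const citations: Citation[] = [];
  for (const doc of docs) {
    // Metadata is untyped, so check each field before trusting it.
    const source = typeof doc.metadata.source === "string" ? doc.metadata.source : "unknown";
    const page = typeof doc.metadata.page === "number" ? doc.metadata.page : null;
    const key = `${source}#${page ?? ""}`;
    if (seen.has(key)) continue; // one citation per source/page pair
    seen.add(key);
    citations.push({ source, page, snippet: doc.pageContent.slice(0, snippetLength).trim() });
  }
  return citations;
}
```

Returning a short snippet rather than the full chunk keeps the API response small while still giving users something to check the answer against.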

Error handling is not optional. The OpenAI API returns HTTP 429 when you hit rate limits, and Pinecone queries can time out under load. Handle both explicitly. A generic 500 response with no context makes debugging much harder at 2am when something breaks in production.
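The two failure modes above can be handled with a pair of small wrappers, sketched here. This assumes the thrown error carries a numeric status property (the HttpLikeError shape is our assumption; check what your client actually throws), and the attempt count and delays are placeholder values.

```typescript
// Hypothetical error shape: assumes the client attaches the HTTP status to the error.
interface HttpLikeError {
  status?: number;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Retry only on 429, doubling the wait after each attempt.
async function withRetryOn429<T>(
  call: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 500,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      const status = (err as HttpLikeError | null)?.status;
      if (status !== 429 || attempt >= maxAttempts) throw err;
      await sleep(baseDelayMs * 2 ** (attempt - 1)); // 500ms, then 1s, ...
    }
  }
}

// Reject if the wrapped promise takes longer than timeoutMs (e.g. a slow vector query).
function withTimeout<T>(promise: Promise<T>, timeoutMs: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${timeoutMs}ms`)), timeoutMs);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```

Naming the operation in the timeout error means the log line tells you which dependency was slow, rather than leaving you with a bare 500.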

Step 4: Ingest Your Documents

Before users can chat, embed your documents into Pinecone: use RecursiveCharacterTextSplitter to chunk the text, OpenAIEmbeddings to create vectors, and PineconeStore.fromDocuments to store them. Note that RecursiveCharacterTextSplitter measures chunkSize in characters by default, not tokens, so convert any token-based target accordingly. Tune the chunk size to your content: there's no universal right answer, and we've ended up adjusting it on almost every project.
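To show what chunk size and overlap actually do, here is a deliberately simplified fixed-window splitter. It is not how RecursiveCharacterTextSplitter works internally (that splitter also prefers to break on paragraph and sentence separators); splitWithOverlap is a hypothetical helper that only illustrates the two parameters, measured in characters.

```typescript
function splitWithOverlap(text: string, chunkSize: number, overlap: number): string[] {
  if (chunkSize <= 0) throw new Error("chunkSize must be positive");
  if (overlap < 0 || overlap >= chunkSize) {
    throw new Error("overlap must be >= 0 and smaller than chunkSize");
  }
  const chunks: string[] = [];
  const step = chunkSize - overlap; // how far the window advances each time
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // this window reached the end
  }
  return chunks;
}

// Each chunk repeats the last `overlap` characters of the previous one,
// so a sentence cut at a boundary still appears whole in one of them.
const example = splitWithOverlap("abcdefghij", 4, 1);
// example is ["abcd", "defg", "ghij"]
```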

What that adjustment looks like in practice: a property management company we worked with had lease agreements that ran 40–60 pages each. Their first pass used a 1000-token chunk size, which was splitting mid-clause. Retrieval was returning partial sentences, and the model was filling gaps with plausible but incorrect lease terms. Dropping to 512 tokens with a 50-token overlap fixed the clause fragmentation completely. A software documentation project ran in the opposite direction — short API reference entries were getting split across chunks, losing the pairing between a function signature and its explanation. Larger chunks (1500 tokens) and higher overlap (200 tokens) solved it.

The ingestion script should be idempotent. If you run it twice, you do not want duplicate vectors. Pinecone supports namespacing and upsert by ID — use both. Assign a stable ID to each document chunk (hash the source path plus chunk index, for example) so re-running the ingestion script updates existing vectors rather than adding duplicates.
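A stable ID of the kind described above can be derived with Node's built-in crypto module. This is a sketch: the chunkId and toRecords helpers and the ChunkRecord shape are our own names, and truncating the hash to 32 hex characters is an arbitrary choice to keep IDs short.

```typescript
import { createHash } from "node:crypto";

// Same path and index always produce the same ID, so re-running ingestion
// overwrites the existing vector instead of adding a new one.
function chunkId(sourcePath: string, chunkIndex: number): string {
  return createHash("sha256")
    .update(`${sourcePath}#${chunkIndex}`)
    .digest("hex")
    .slice(0, 32);
}

interface ChunkRecord {
  id: string;
  text: string;
  metadata: { source: string; chunkIndex: number };
}

function toRecords(sourcePath: string, chunks: string[]): ChunkRecord[] {
  return chunks.map((text, chunkIndex) => ({
    id: chunkId(sourcePath, chunkIndex),
    text,
    metadata: { source: sourcePath, chunkIndex },
  }));
}
```

One caveat with index-based IDs: if a document gets shorter, the chunks at the old higher indexes are never overwritten, so the ingestion job also needs to delete IDs that no longer exist for that source.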

Store metadata with every vector: source filename, page number, last modified date. You will want to filter by these during retrieval, and adding metadata later means updating every existing vector or re-ingesting.
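Here is a sketch of what that metadata and a matching retrieval filter could look like. Pinecone filters use MongoDB-style operators such as $in and $gte; the field names, the ChunkMetadata type, and the toMetadata and buildFilter helpers are assumptions for illustration. The date is stored as a Unix timestamp because range filters need a numeric value.

```typescript
interface ChunkMetadata {
  source: string;       // source filename
  page: number;         // page number within the source
  lastModified: number; // seconds since the Unix epoch
}

const toEpochSeconds = (date: Date) => Math.floor(date.getTime() / 1000);

function toMetadata(source: string, page: number, lastModified: Date): ChunkMetadata {
  return { source, page, lastModified: toEpochSeconds(lastModified) };
}

// Builds a filter in Pinecone's operator syntax; omitted arguments add no condition.
function buildFilter(sources: string[], modifiedSince?: Date): Record<string, unknown> {
  const filter: Record<string, unknown> = {};
  if (sources.length > 0) filter.source = { $in: sources };
  if (modifiedSince) filter.lastModified = { $gte: toEpochSeconds(modifiedSince) };
  return filter;
}
```

In LangChain, a filter like this is typically passed through the retriever options alongside k; check the exact option name against the version you have pinned.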

Approach	Off-the-shelf chatbot (e.g. Intercom AI)	Custom LangChain + RAG build
Setup time	Hours	2–6 weeks depending on data complexity
Data grounding	Limited to public knowledge or simple FAQ upload	Full control — any document format, any update cadence
Answer accuracy on proprietary content	Low to moderate	High when tuned correctly
Conversation memory	Session-only, vendor-managed	Configurable — Redis, DB, or in-memory
Monthly cost at scale	$300–$2,000+ (per-seat or usage pricing)	OpenAI API + Pinecone (~$50–$400 depending on volume)
Customisation	Theme and copy only	Full control over prompts, retrieval logic, UI
Maintenance burden	Vendor-managed	Internal or outsourced

What to Expect in Practice

The first working version typically takes one to two days to get running locally. Getting it production-ready is a different scope.

A mid-size e-commerce brand with 8,000 SKUs and 5 years of customer support transcripts took three weeks to go from proof-of-concept to production. The technical build was the smaller part. Most of the time went into deciding which documents to ingest (and which to exclude — outdated return policies created more confusion than no policy at all), writing the system prompt through iteration, and setting up logging so the team could monitor what the bot was getting wrong.

A recruitment firm used a similar stack to build an internal chatbot over their candidate database and job specs. Their main challenge was retrieval relevance: a query for "senior engineer with payments experience" was matching on "engineer" and "experience" but missing the "payments" context. Adding metadata filters on industry tags at the Pinecone query stage improved accuracy meaningfully.

Response latency is worth thinking about early. A RAG pipeline with GPT-4 typically takes 2–5 seconds end-to-end: embedding the query, running the vector search, and generating the response. For a support chatbot this is acceptable. For an internal tool used dozens of times per hour by staff, it adds up. Streaming the response (OpenAI supports SSE) makes the perceived latency much shorter even when the total time is the same.
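Streaming to the browser can be sketched with the standard ReadableStream and Response objects. The example below wraps any async stream of text tokens in Server-Sent Events framing; where the tokens come from (your model client's streaming API) is left out, and the [DONE] sentinel and the toSseStream and sseResponse names are assumptions.

```typescript
// Wraps an async stream of text tokens in a Server-Sent Events body.
function toSseStream(tokens: AsyncIterable<string>): ReadableStream<Uint8Array> {
  const encoder = new TextEncoder();
  return new ReadableStream<Uint8Array>({
    async start(controller) {
      try {
        for await (const token of tokens) {
          // JSON-encode each token so newlines inside it cannot break SSE framing.
          controller.enqueue(encoder.encode(`data: ${JSON.stringify(token)}\n\n`));
        }
        // Assumed end-of-stream marker so the client knows when to stop listening.
        controller.enqueue(encoder.encode("data: [DONE]\n\n"));
        controller.close();
      } catch (err) {
        controller.error(err);
      }
    },
  });
}

function sseResponse(tokens: AsyncIterable<string>): Response {
  return new Response(toSseStream(tokens), {
    headers: {
      "Content-Type": "text/event-stream",
      "Cache-Control": "no-cache",
    },
  });
}
```

On the client, each data line is parsed with JSON.parse and appended to the message as it arrives, which is what makes the wait feel shorter.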

Common Mistakes

Using BufferMemory in production without a persistence layer. This is the most common oversight. BufferMemory holds conversation history in the Node.js process. When the process restarts (every deployment, crash, or scaling event), the history is gone, and on serverless hosting consecutive requests may not even reach the same process. Users mid-conversation lose all context. Use Redis or a database-backed memory store from the start.
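One way to keep that swap cheap is to have the chatbot depend on a small storage interface from day one. The sketch below defines such an interface with an in-memory implementation for local development; HistoryStore, ChatTurn, and InMemoryHistoryStore are hypothetical names, and a Redis- or database-backed class would implement the same two methods.

```typescript
interface ChatTurn {
  role: "user" | "assistant";
  content: string;
}

// The contract the chatbot depends on, keyed by a session identifier.
interface HistoryStore {
  load(sessionId: string): Promise<ChatTurn[]>;
  append(sessionId: string, turn: ChatTurn): Promise<void>;
}

// Development-only implementation: state lives in this process and is lost on restart.
class InMemoryHistoryStore implements HistoryStore {
  private readonly sessions = new Map<string, ChatTurn[]>();
  private readonly maxTurns: number;

  constructor(maxTurns = 20) {
    this.maxTurns = maxTurns;
  }

  async load(sessionId: string): Promise<ChatTurn[]> {
    // Return a copy so callers cannot mutate stored history by accident.
    return [...(this.sessions.get(sessionId) ?? [])];
  }

  async append(sessionId: string, turn: ChatTurn): Promise<void> {
    const turns = this.sessions.get(sessionId) ?? [];
    turns.push(turn);
    // Keep only the most recent turns so prompts stay bounded.
    this.sessions.set(sessionId, turns.slice(-this.maxTurns));
  }
}
```

Because both methods are async, moving to a networked store later does not change any calling code.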

Ingesting everything without curation. More data is not always better. Ingesting five years of internal Slack messages alongside product documentation creates retrieval noise. A chatbot asked about refund policy should not be retrieving a three-year-old thread about office snacks. Be deliberate about what goes in.

Forgetting to update the index. Documents change. If your Pinecone index is a snapshot from six months ago, your chatbot is answering from stale data. Build the ingestion pipeline as a scheduled job, not a one-time script.

Not testing adversarial queries. Ask your chatbot questions it should not answer. Questions outside its domain, leading questions, attempts to get it to make commitments it should not make. Find these before your users do.

Skipping structured logging. You will not know what's failing without logs. At minimum, log the user message, the retrieved chunk IDs, and the final response. This data is also how you improve the system over time — it's the only way to know whether your retrieval is working.
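A minimal log record covering those fields could look like the sketch below, written as one JSON object per line so it can be searched and aggregated later. The ChatLogEntry shape and formatLogEntry name are assumptions; sessionId and latencyMs are extra fields we find useful, not requirements.

```typescript
interface ChatLogEntry {
  timestamp: string;
  sessionId: string;
  userMessage: string;
  retrievedChunkIds: string[];
  response: string;
  latencyMs: number;
}

// Serialises one chat turn as a single JSON line.
// `now` is injectable so the output is deterministic in tests.
function formatLogEntry(
  entry: Omit<ChatLogEntry, "timestamp">,
  now: Date = new Date(),
): string {
  const full: ChatLogEntry = { timestamp: now.toISOString(), ...entry };
  return JSON.stringify(full);
}

// Usage: write the line to stdout and let your log collector pick it up.
console.log(
  formatLogEntry({
    sessionId: "demo-session",
    userMessage: "What is your refund policy?",
    retrievedChunkIds: ["chunk-1", "chunk-2"],
    response: "Refunds are issued within 14 days.",
    latencyMs: 2300,
  }),
);
```

Logging the chunk IDs rather than the chunk text keeps entries small while still letting you reconstruct exactly what the model was shown.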

Key Takeaways

RAG grounds your chatbot in real business data, which is the single biggest defence against hallucinated answers
BufferMemory gives the chatbot conversation history within a session — fine for dev, not enough for production
For production, store conversation history in Redis or a database rather than in-memory, or you'll lose context every time the process restarts
Woyce Technologies builds custom LangChain chatbots — contact us if you'd like to talk through what's right for your use case

Frequently Asked Questions

How much does it cost to run a LangChain chatbot in production?

At moderate volume (roughly 10,000 queries per month), expect $30–$120/month in OpenAI API costs depending on message length and GPT-4 vs GPT-4o. Pinecone's free tier handles up to 1 million vectors, which is sufficient for most small business document sets. The biggest cost variable is how many tokens your retrieved chunks add to each prompt — optimising chunk size and k directly reduces spend.
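To see how chunk size and k feed into the bill, here is a back-of-the-envelope estimator. Every number in the usage example is a placeholder, including the per-million-token prices; look up current pricing for your model before relying on the result. The function and field names are our own.

```typescript
interface CostInputs {
  queriesPerMonth: number;
  promptTokensPerQuery: number;     // system prompt + retrieved chunks + history + question
  completionTokensPerQuery: number;
  inputPricePerMillion: number;     // USD per 1M input tokens (placeholder)
  outputPricePerMillion: number;    // USD per 1M output tokens (placeholder)
}

// Prompt size grows linearly with k: fixed overhead plus k retrieved chunks.
function promptTokensFor(k: number, tokensPerChunk: number, overheadTokens: number): number {
  return overheadTokens + k * tokensPerChunk;
}

function estimateMonthlyCost(c: CostInputs): number {
  const inputCost = ((c.queriesPerMonth * c.promptTokensPerQuery) / 1_000_000) * c.inputPricePerMillion;
  const outputCost = ((c.queriesPerMonth * c.completionTokensPerQuery) / 1_000_000) * c.outputPricePerMillion;
  return inputCost + outputCost;
}

// Example with made-up numbers: k=4 chunks of 300 tokens plus 300 tokens of overhead.
const monthly = estimateMonthlyCost({
  queriesPerMonth: 10_000,
  promptTokensPerQuery: promptTokensFor(4, 300, 300), // 1500 tokens
  completionTokensPerQuery: 300,
  inputPricePerMillion: 2.5,
  outputPricePerMillion: 10,
});
// monthly is 67.5 with these placeholder inputs
```

Under the same placeholder prices, dropping k from 4 to 3 removes 300 prompt tokens per query, which is 3 million input tokens a month at this volume.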

Do I need a vector database, or can I use a simpler approach?

For fewer than 50 short documents, you can get away with loading them all into the prompt at query time — it's simpler and has no infrastructure to manage. Beyond that, retrieval becomes necessary both for accuracy (the context window has limits) and cost (sending every document with every message gets expensive fast). Pinecone is the easiest managed option; pgvector is worth considering if you already run PostgreSQL and want to keep your stack simple.

How long does it take to build a production-ready AI chatbot with LangChain?

A focused team with prior LangChain experience typically needs 3–6 weeks for a production deployment: 1 week for the core build and ingestion pipeline, 1–2 weeks for prompt tuning and retrieval accuracy work, and 1–2 weeks for productionising (auth, logging, error handling, deployment). If your data requires significant cleaning or your use case has compliance requirements, add time.

Can this chatbot handle multiple languages?

GPT-4 handles multilingual input and output well out of the box. The retrieval step is the limiting factor — if your Pinecone index contains only English documents, a query in French will still find relevant chunks (OpenAI embeddings are multilingual), but the retrieved context will be in English, which affects answer quality. For genuinely multilingual deployments, store documents in each target language separately and route queries to the matching namespace.

How do I prevent the chatbot from making things up?

Three things work together: a tight system prompt that instructs the model to say "I don't know" when the retrieved context doesn't cover the question, returning source documents alongside answers so users can verify, and logging responses so you can identify and fix patterns of hallucination over time. RAG reduces hallucination significantly compared to a bare API call, but it does not eliminate it — the model can still interpolate beyond what the retrieved chunks say.

Is LangChain the right choice, or should I use the OpenAI Assistants API instead?

The OpenAI Assistants API is simpler to get started with and handles file retrieval natively. LangChain gives you more control: over the retrieval logic, the vector store, the memory implementation, and how prompts are constructed. If you need to integrate multiple data sources, run custom retrieval logic, or avoid vendor lock-in on your document storage, LangChain is the better foundation. If you want something working in a day and your use case is straightforward, the Assistants API is worth evaluating first. See our detailed comparison for a side-by-side breakdown.

What happens if the OpenAI API goes down?

Without a fallback strategy, your chatbot goes down with it. For production deployments, implement a graceful degradation path: a clear error message to users, a fallback to a simpler keyword-search over your documents, or routing to a human. OpenAI's API has a published SLA and status page — monitoring it and alerting on degraded performance is worth setting up from day one rather than finding out from users.
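A graceful degradation path of that kind can be sketched as a wrapper that tries the model first and falls back to a naive keyword search. This is an illustration under simple assumptions: the Answerer type, the HelpDoc shape, and both helper names are hypothetical, and the keyword scoring is far cruder than real search.

```typescript
type Answerer = (question: string) => Promise<string>;

interface HelpDoc {
  title: string;
  text: string;
}

// Try the primary answerer; if it throws, log the failure and use the fallback.
function withFallback(primary: Answerer, fallback: Answerer): Answerer {
  return async (question) => {
    try {
      return await primary(question);
    } catch (err) {
      console.warn("primary answerer failed, using fallback", err);
      return fallback(question);
    }
  };
}

// Naive keyword search: score each document by how many query words it contains.
function keywordFallback(docs: HelpDoc[]): Answerer {
  return async (question) => {
    const words = question.toLowerCase().split(/\W+/).filter((w) => w.length > 3);
    let best: HelpDoc | undefined;
    let bestScore = 0;
    for (const doc of docs) {
      const haystack = doc.text.toLowerCase();
      const score = words.filter((w) => haystack.includes(w)).length;
      if (score > bestScore) {
        best = doc;
        bestScore = score;
      }
    }
    return best
      ? `The assistant is temporarily unavailable. This document may help: ${best.title}`
      : "The assistant is temporarily unavailable. A member of the team will follow up.";
  };
}
```

The fallback never pretends to be the model: both messages say the assistant is unavailable, so users know they are getting a degraded answer.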

Tags: build AI chatbot, LangChain tutorial, OpenAI chatbot, GPT-4 chatbot, RAG chatbot, LangChain Next.js

Woyce Technologies

AI & Engineering Team · Woyce

Woyce Technologies builds AI chatbots, LLM integrations, voice AI, and full-stack web applications for businesses in the US, UK, Europe & APAC. Based in Rajkot, Gujarat.

READY TO BUILD?

Let's build something
that actually works.

Tell us about your project. We'll be honest about whether we're the right fit — and if we are, we move fast.
