How Voice AI Works
When a user calls your business number, Twilio answers the call and streams the caller's audio to your application. Amazon Lex performs automatic speech recognition (ASR) to convert that speech to text, then natural language understanding (NLU) to extract the caller's intent and any slot values (such as a date or an order number). Your backend handles the business logic, and Amazon Polly turns the response back into speech.
The flow: caller speaks → Twilio captures audio → Lex ASR + NLU → your logic → Polly TTS → Twilio plays the audio back.
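Between "Twilio captures audio" and "Lex ASR" sits a small bridge. With `<Connect><Stream>`, Twilio Media Streams delivers the caller's audio as JSON messages over a WebSocket, and each `media` event carries base64-encoded 8 kHz G.711 mu-law audio. The sketch below shows only the decoding side of such a bridge. It assumes your Lex integration expects 16-bit linear PCM. `CallAudioBuffer` is a hypothetical name. The WebSocket server itself and the actual Lex call (for example through the Lex V2 runtime API in the AWS SDK) are left out.

```typescript
// Sketch: decoding Twilio Media Streams audio before it is sent to Lex.
// Only the message fields used below are typed; real messages carry more.
type MediaStreamMessage =
  | { event: "connected" }
  | { event: "start"; start: { streamSid: string; callSid: string } }
  | { event: "media"; media: { payload: string } }
  | { event: "stop" };

// Decode one G.711 mu-law byte into a signed 16-bit linear PCM sample.
export function mulawToPcm16(byte: number): number {
  const u = ~byte & 0xff; // mu-law bytes are stored bit-inverted
  const sign = u & 0x80;
  const exponent = (u >> 4) & 0x07;
  const mantissa = u & 0x0f;
  // 0x84 (132) is the mu-law bias added during encoding.
  const magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84;
  return sign ? -magnitude : magnitude;
}

// Turn a base64 "media" payload into 16-bit little-endian PCM (still 8 kHz).
export function decodeMediaPayload(payloadBase64: string): Buffer {
  const mulaw = Buffer.from(payloadBase64, "base64");
  const pcm = Buffer.alloc(mulaw.length * 2);
  for (let i = 0; i < mulaw.length; i++) {
    pcm.writeInt16LE(mulawToPcm16(mulaw[i]), i * 2);
  }
  return pcm;
}

// Hypothetical per-call buffer: collects decoded audio until the bridge
// decides an utterance is complete and forwards it to Lex (not shown).
export class CallAudioBuffer {
  streamSid: string | null = null;
  private chunks: Buffer[] = [];

  handle(raw: string): void {
    const msg = JSON.parse(raw) as MediaStreamMessage;
    switch (msg.event) {
      case "start":
        this.streamSid = msg.start.streamSid;
        break;
      case "media":
        this.chunks.push(decodeMediaPayload(msg.media.payload));
        break;
      default:
        break; // "connected" and "stop" carry no audio
    }
  }

  // Return everything buffered so far and start a new utterance.
  takeUtterance(): Buffer {
    const out = Buffer.concat(this.chunks);
    this.chunks = [];
    return out;
  }
}
```

Deciding when an utterance ends (silence detection, or letting a streaming Lex API do it) is the main design choice a real bridge has to make; this sketch leaves it to the caller of `takeUtterance`.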
Setting Up Twilio Voice with Amazon Lex
Step 1: Create a Lex Bot
In the AWS Console, create an Amazon Lex bot with intents for your use case, such as BookAppointment and CheckStatus, plus a fallback intent (AMAZON.FallbackIntent) for anything unrecognised. The fallback isn't optional: it's what catches the long tail of real calls.
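Before clicking through the console, it can help to write the intents down as data and sanity-check them. The shape below is a hypothetical simplification, not the Lex API's own schema. Each intent lists its sample utterances, which reference slots with Lex's `{SlotName}` syntax, and the slots it declares. The slot names are illustrative. The fallback intent is not listed because it has no sample utterances; Lex routes to it when nothing else matches.

```typescript
// Hypothetical, simplified intent definitions (not the Lex API schema).
interface IntentSpec {
  name: string;
  sampleUtterances: string[]; // slot references use {SlotName}
  slots: string[]; // slot names this intent declares
}

export const intents: IntentSpec[] = [
  {
    name: "BookAppointment",
    sampleUtterances: [
      "I want to book an appointment",
      "Book me in for {AppointmentDate}",
      "Can I come in on {AppointmentDate} at {AppointmentTime}",
    ],
    slots: ["AppointmentDate", "AppointmentTime"],
  },
  {
    name: "CheckStatus",
    sampleUtterances: ["What's the status of order {OrderNumber}", "Where is my order"],
    slots: ["OrderNumber"],
  },
];

// Report every sample utterance that references a slot the intent doesn't declare.
export function findUndeclaredSlots(specs: IntentSpec[]): string[] {
  const problems: string[] = [];
  for (const spec of specs) {
    for (const utterance of spec.sampleUtterances) {
      const refs = utterance.match(/\{\w+\}/g) || [];
      for (const ref of refs) {
        const slotName = ref.slice(1, -1); // strip the braces
        if (spec.slots.indexOf(slotName) === -1) {
          problems.push(`${spec.name}: "${utterance}" uses undeclared slot ${ref}`);
        }
      }
    }
  }
  return problems;
}
```

Utterances without slot references (like "Where is my order") are fine: Lex prompts for any required slot the caller didn't mention.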
Step 2: Create a Twilio Phone Number
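In your Twilio console, buy a number and, in its voice configuration, set the webhook for incoming calls to your Next.js API route (for example https://your-domain.example/api/voice, a placeholder) with the method set to HTTP POST.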
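With the webhook set to POST, each incoming call arrives as a form-encoded request containing parameters such as `CallSid`, `From`, `To`, and `CallStatus`. A minimal sketch of reading them, assuming a Node runtime with the global `URLSearchParams`; `parseVoiceWebhook` is a hypothetical helper. In a Next.js route handler you could pass it `await req.text()`.

```typescript
// A subset of the parameters Twilio sends to a voice webhook.
export interface VoiceWebhookParams {
  callSid: string;
  from: string;
  to: string;
  callStatus: string;
}

// Twilio POSTs application/x-www-form-urlencoded bodies to the voice webhook.
export function parseVoiceWebhook(body: string): VoiceWebhookParams {
  const params = new URLSearchParams(body);
  const callSid = params.get("CallSid");
  if (!callSid) {
    throw new Error("Missing CallSid in voice webhook body");
  }
  return {
    callSid,
    from: params.get("From") ?? "",
    to: params.get("To") ?? "",
    callStatus: params.get("CallStatus") ?? "",
  };
}
```

Checking for `CallSid` only confirms the payload's shape, not where it came from. Twilio signs each webhook request with an `X-Twilio-Signature` header, and the `twilio` package's `validateRequest` function can verify it.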
Step 3: Handle Inbound Calls
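Create a Next.js API route that returns TwiML. Use twilio.twiml.VoiceResponse to greet callers with an Amazon Polly voice, then use <Connect><Stream> to open a bidirectional Media Stream to a WebSocket server you run. Lex doesn't expose a WebSocket endpoint that Twilio can stream to directly, so that server acts as a bridge: it forwards the caller's audio to Amazon Lex and plays Lex's spoken responses back into the call. Run the bridge as a long-lived process, because a serverless API route can't hold a call-length WebSocket open.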
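A minimal sketch of that route as a Next.js App Router handler. The `app/api/voice/route.ts` path, the greeting, and the bridge URL are assumptions. To keep the sketch dependency-free it builds the TwiML string by hand; with the `twilio` package, `VoiceResponse`'s `say()` and `connect().stream()` methods generate equivalent markup.

```typescript
// Hypothetical bridge server URL; replace with your own WebSocket endpoint.
const BRIDGE_URL = "wss://voice-bridge.example.com/media";

// Escape characters that are special in XML text and attribute values.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Greet the caller with a Polly neural voice, then hand the call's audio to the bridge.
export function buildInboundTwiml(greeting: string, streamUrl: string): string {
  return (
    '<?xml version="1.0" encoding="UTF-8"?>' +
    "<Response>" +
    `<Say voice="Polly.Joanna-Neural">${escapeXml(greeting)}</Say>` +
    "<Connect>" +
    `<Stream url="${escapeXml(streamUrl)}" />` +
    "</Connect>" +
    "</Response>"
  );
}

// Next.js App Router handler for the voice webhook (e.g. app/api/voice/route.ts).
export async function POST(): Promise<Response> {
  const twiml = buildInboundTwiml("Thanks for calling. How can I help you today?", BRIDGE_URL);
  return new Response(twiml, { headers: { "Content-Type": "text/xml" } });
}
```

`Polly.Joanna-Neural` selects one of the Amazon Polly voices Twilio offers for `<Say>`; once `<Connect><Stream>` starts, the conversation's audio travels over the stream instead.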
Step 4: Handle Lex Fulfillment
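When Lex has recognised an intent and filled all of its required slots, it invokes your fulfillment Lambda (assuming a fulfillment code hook is enabled for that intent). The Lambda runs the business logic, such as booking the appointment or checking order status, and returns a response message for Lex to speak back to the caller.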
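A minimal sketch of that Lambda, assuming Lex V2's Lambda event and response format. Only the fields used here are typed; the real event carries much more. It closes the dialog with `dialogAction.type` set to `Close`, marks the intent `Fulfilled` (or `Failed` for an intent it doesn't handle), and returns a plain-text message. The slot names (`AppointmentDate`, `AppointmentTime`, `OrderNumber`) and the business-logic stubs are hypothetical.

```typescript
// Minimal subset of the Lex V2 Lambda event and response shapes.
interface LexSlot {
  value?: { interpretedValue?: string; originalValue?: string };
}

interface LexIntent {
  name: string;
  state?: string;
  slots: Record<string, LexSlot | null>;
}

interface LexEvent {
  sessionState: { intent: LexIntent };
}

interface LexResponse {
  sessionState: {
    dialogAction: { type: "Close" };
    intent: LexIntent;
  };
  messages: { contentType: "PlainText"; content: string }[];
}

// Read a slot's interpreted value, or "" if Lex left it empty.
function slotValue(intent: LexIntent, name: string): string {
  return intent.slots[name]?.value?.interpretedValue ?? "";
}

// Hypothetical business-logic stubs: replace with real booking and order lookups.
function bookAppointment(date: string, time: string): string {
  return `You're booked for ${date} at ${time}.`;
}

function lookUpOrder(orderNumber: string): string {
  return `Order ${orderNumber} is on its way.`;
}

export function fulfill(event: LexEvent): LexResponse {
  const intent = event.sessionState.intent;
  let content: string;
  let state = "Fulfilled";

  switch (intent.name) {
    case "BookAppointment":
      content = bookAppointment(slotValue(intent, "AppointmentDate"), slotValue(intent, "AppointmentTime"));
      break;
    case "CheckStatus":
      content = lookUpOrder(slotValue(intent, "OrderNumber"));
      break;
    default:
      content = "Sorry, I can't help with that request yet.";
      state = "Failed";
  }

  // Close ends this intent's dialog; Lex delivers the message to the caller.
  return {
    sessionState: {
      dialogAction: { type: "Close" },
      intent: { ...intent, state },
    },
    messages: [{ contentType: "PlainText", content }],
  };
}

// Lambda entry point.
export const handler = async (event: LexEvent): Promise<LexResponse> => fulfill(event);
```

Keeping `fulfill` synchronous and free of AWS types makes the routing logic easy to unit-test outside Lambda.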
Production Considerations
- Use Amazon Polly Neural voices (for example Joanna or Matthew) for noticeably more natural-sounding TTS; callers tend to recognise the standard voices as a bot immediately
- Implement a DTMF fallback: let callers press keys when speech recognition fails, especially in noisy environments
- Log every conversation for quality review; you'll want this the first time something goes wrong
- Add a human escalation intent that transfers to a live agent via Twilio TaskRouter
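Several of these points meet in the TwiML a fallback path returns. A sketch, assuming the Lex conversation has already failed and the call has been redirected to a plain menu (how that redirect happens is not shown). `<Gather input="dtmf speech">` accepts either a keypress or speech, Twilio posts the result to the `action` URL as `Digits` or `SpeechResult`, and escalation hands the call to TaskRouter with `<Enqueue workflowSid>`. Gather's speech input uses Twilio's own recognizer, not Lex. The Workflow SID, route paths, menu wording, and task attributes are all placeholders.

```typescript
// Hypothetical values: substitute your TaskRouter Workflow SID and route paths.
const WORKFLOW_SID = "WWxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx";
const MENU_URL = "/api/voice/menu";
const MENU_CHOICE_URL = "/api/voice/menu/choice";

// Escape text placed inside an XML element.
function escapeText(text: string): string {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Fallback menu: accepts speech or a single keypress, spoken with a neural voice.
export function menuTwiml(): string {
  return (
    '<?xml version="1.0" encoding="UTF-8"?><Response>' +
    `<Gather input="dtmf speech" numDigits="1" speechTimeout="auto" action="${MENU_CHOICE_URL}">` +
    '<Say voice="Polly.Matthew-Neural">' +
    "To book an appointment, say book or press 1. " +
    "To check an order, say status or press 2. " +
    "To speak to a person, say agent or press 0." +
    "</Say></Gather>" +
    // Only reached if the caller gives no input: replay the menu.
    `<Redirect>${MENU_URL}</Redirect>` +
    "</Response>"
  );
}

export type MenuChoice = "book" | "status" | "agent" | "unknown";

// Map the Gather callback's Digits or SpeechResult parameter to a choice.
export function routeChoice(params: URLSearchParams): MenuChoice {
  const digits = params.get("Digits");
  const speech = (params.get("SpeechResult") ?? "").toLowerCase();
  if (digits === "1" || speech.includes("book")) return "book";
  if (digits === "2" || speech.includes("status")) return "status";
  if (digits === "0" || speech.includes("agent") || speech.includes("person")) return "agent";
  return "unknown";
}

// Escalation: queue the call for a live agent through a TaskRouter Workflow.
export function escalateTwiml(reason: string): string {
  const taskAttributes = JSON.stringify({ type: "voice_escalation", reason });
  return (
    '<?xml version="1.0" encoding="UTF-8"?><Response>' +
    '<Say voice="Polly.Matthew-Neural">Connecting you to a member of our team.</Say>' +
    `<Enqueue workflowSid="${WORKFLOW_SID}"><Task>${escapeText(taskAttributes)}</Task></Enqueue>` +
    "</Response>"
  );
}
```

The `reason` stored in the task attributes is also a natural field to include in the conversation logs you keep for quality review.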
A note on what we've watched go wrong: teams skip the fallback and DTMF paths because the demo worked. The demo always works. Real callers have accents, background noise, and bad signal, and an agent that can't hand off gracefully to a human or a keypad ends calls abruptly enough to lose the customer.
Talk to us if you want help shipping a Twilio + Amazon Lex voice AI system, or if you just want a second opinion on an architecture you've already drafted.