- n8n v1.121 or later, self-hosted (Docker on a Hetzner CX22 at ₹740/month is fine)
- Twilio account with one Indian DID purchased (~₹110 setup + ₹55/month rental)
- OpenAI API key with Whisper access (no separate billing — same key as GPT)
- Anthropic API key with Claude Haiku 4.5 enabled
- Slack workspace with Bot Token scopes: chat:write, channels:read
- An SMTP credential (we use Zoho Mail, ₹190/user/month) or Gmail OAuth2 in n8n
- A public HTTPS URL for your n8n instance — Twilio will not call HTTP webhooks
The workflow needs two webhook paths: /voice/incoming (handles the initial call) and /voice/recording (handles the recording-complete callback).
Here is the actual Webhook node JSON for the incoming path:
{
  "parameters": {
    "httpMethod": "POST",
    "path": "voice/incoming",
    "responseMode": "responseNode",
    "options": {
      "rawBody": false
    }
  },
  "id": "a1b2c3d4-incoming",
  "name": "Twilio Incoming Webhook",
  "type": "n8n-nodes-base.webhook",
  "typeVersion": 2,
  "position": [240, 300]
}

The next node returns the TwiML; its recordingStatusCallback URL points at your second webhook path.
{
  "parameters": {
    "respondWith": "text",
    "responseBody": "Welcome to Softechinfra. Please describe your reason for calling after the beep. Press hash when done.",
    "options": {
      "responseCode": 200,
      "responseHeaders": {
        "entries": [
          { "name": "Content-Type", "value": "application/xml" }
        ]
      }
    }
  },
  "id": "b2c3d4e5-twiml",
  "name": "Return TwiML",
  "type": "n8n-nodes-base.respondToWebhook",
  "typeVersion": 1,
  "position": [460, 300]
}

We turn off Twilio's built-in transcription (transcribe="false") because Whisper handles Hindi/English code-switching far better; Twilio's transcription drops to about 60% accuracy on a Pune accent, based on our 200-call sample.
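Twilio expects the response body to be a TwiML document, while the body printed above carries only the spoken text. The sketch below shows one way the full document might be assembled in a Code node. It is a minimal illustration, not the exact markup used in production: `buildGreetingTwiml`, the `maxLengthSeconds` cap, and the callback URL (the n8n host from the verification step plus the `/voice/recording` path) are assumptions.

```javascript
// Escape characters that are special in XML text and attribute values.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Build a TwiML document: speak a greeting, then record the caller.
// maxLengthSeconds is an assumed cap, not a value taken from the article.
function buildGreetingTwiml({ greeting, callbackUrl, maxLengthSeconds = 120 }) {
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<Response>',
    `  <Say>${escapeXml(greeting)}</Say>`,
    `  <Record transcribe="false" finishOnKey="#" playBeep="true" maxLength="${maxLengthSeconds}"`,
    `          recordingStatusCallback="${escapeXml(callbackUrl)}" />`,
    '</Response>',
  ].join('\n');
}

const twiml = buildGreetingTwiml({
  greeting:
    'Welcome to Softechinfra. Please describe your reason for calling after the beep. Press hash when done.',
  // Assumed URL: same host as the incoming webhook, second webhook path.
  callbackUrl: 'https://n8n.softechinfra.com/webhook/voice/recording',
});
console.log(twiml);
```

Escaping the greeting matters as soon as the company name or prompt contains an ampersand; unescaped, Twilio would reject the document as malformed XML.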
Verification: after deploying, configure the DID's "A Call Comes In" webhook to https://n8n.softechinfra.com/webhook/voice/incoming. Test by calling the number from your phone. You should hear the greeting. If Twilio's call log shows "11200 — HTTP retrieval failure," your n8n instance is not publicly reachable on HTTPS.
## Block 2 — Whisper transcription
When the recording is ready, Twilio POSTs to your second webhook with RecordingUrl in the body. The URL has no file extension: append .wav to fetch the audio, make sure the request goes over HTTPS, and add Basic Auth built from your Twilio account SID and auth token.
Three nodes here: Webhook → Set (build the audio URL + auth header) → HTTP Request (POST to Whisper).
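The Set step can also be written as a small function in a Code node. The sketch below is one possible version, assuming a Node.js runtime where `Buffer` is available; `buildRecordingRequest`, the sample recording URL, and the `ACxxxx` / `secret` values are placeholders, not real credentials.

```javascript
// Build the download URL and Authorization header for a Twilio recording.
// accountSid and authToken are placeholders; in n8n they would come from a credential.
function buildRecordingRequest(recordingUrl, accountSid, authToken) {
  // Force HTTPS and drop any existing audio extension before appending .wav.
  const base = recordingUrl
    .replace(/^http:\/\//i, 'https://')
    .replace(/\.(wav|mp3)$/i, '');
  // Basic Auth is base64("username:password"), here SID and auth token.
  const token = Buffer.from(`${accountSid}:${authToken}`).toString('base64');
  return {
    audioUrl: `${base}.wav`,
    headers: { Authorization: `Basic ${token}` },
  };
}

const req = buildRecordingRequest(
  'http://api.twilio.com/2010-04-01/Accounts/ACxxxx/Recordings/RExxxx',
  'ACxxxx',
  'secret',
);
console.log(req.audioUrl);
```

Keeping the URL rewrite and the header in one place means the download node only has to read two fields.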
The HTTP Request to Whisper:
{
  "parameters": {
    "method": "POST",
    "url": "https://api.openai.com/v1/audio/transcriptions",
    "authentication": "predefinedCredentialType",
    "nodeCredentialType": "openAiApi",
    "sendBody": true,
    "contentType": "multipart-form-data",
    "bodyParameters": {
      "parameters": [
        {
          "parameterType": "formBinaryData",
          "name": "file",
          "inputDataFieldName": "data"
        },
        { "name": "model", "value": "whisper-1" },
        { "name": "language", "value": "en" },
        { "name": "response_format", "value": "json" },
        { "name": "temperature", "value": "0" }
      ]
    },
    "options": { "timeout": 30000 }
  },
  "id": "c3d4e5f6-whisper",
  "name": "Whisper Transcribe",
  "type": "n8n-nodes-base.httpRequest",
  "typeVersion": 4.2,
  "position": [900, 300]
}

We force language: "en" even for Hindi-English code-switched calls. Whisper-1 was trained on multilingual data; forcing English actually produces cleaner Hinglish transliterations that the Claude classifier handles better than romanised Devanagari.
Gotcha we hit twice: Twilio's recording URL returns a 302 redirect to S3. n8n's HTTP Request node v4.2 follows redirects by default, but the binary download flag must be on or you get the redirect HTML, not the audio. Set "Response → Response Format" to "File" and the field name to data.
## Block 3 — Claude Haiku classifier
This is the node that earns its keep. We feed Claude a strict JSON schema, the call transcript, and the routing context. Output is a clean object with intent, confidence, summary, suggested owner, and urgency; the intent then decides the destination Slack channel.
The HTTP Request node hitting Anthropic:
{
  "parameters": {
    "method": "POST",
    "url": "https://api.anthropic.com/v1/messages",
    "sendHeaders": true,
    "headerParameters": {
      "parameters": [
        { "name": "x-api-key", "value": "={{ $credentials.anthropicApi.apiKey }}" },
        { "name": "anthropic-version", "value": "2023-06-01" },
        { "name": "content-type", "value": "application/json" }
      ]
    },
    "sendBody": true,
    "specifyBody": "json",
    "jsonBody": "={\n \"model\": \"claude-haiku-4-5-20251001\",\n \"max_tokens\": 400,\n \"system\": \"You are a call-routing classifier for Softechinfra, an Indian IT services firm. Read the inbound caller transcript and return strict JSON only, with no preamble. Schema: { intent: 'sales' | 'support' | 'accounts' | 'careers' | 'spam', confidence: number 0-1, summary: string max 140 chars, suggested_owner_email: string, urgency: 'now' | 'today' | 'this_week' }. Spam includes telemarketing, IVR test calls, and abusive language. If the transcript is silent or under 6 words, return intent='spam' with confidence 0.95.\",\n \"messages\": [\n {\"role\": \"user\", \"content\": {{ JSON.stringify('Transcript: ' + $json.text) }} }\n ]\n}",
    "options": { "timeout": 20000 }
  },
  "id": "d4e5f6g7-claude",
  "name": "Claude Classify",
  "type": "n8n-nodes-base.httpRequest",
  "typeVersion": 4.2,
  "position": [1140, 300]
}

The transcript goes through JSON.stringify so that quotes or line breaks in the caller's words cannot break the request body. We call the API through a plain HTTP Request rather than the native Anthropic node: claude-haiku-4-5-20251001 shipped in October 2025, but the native node only got the constant in Feb. For production we hit the API directly so we are not blocked on n8n releases.
Cost math, per call: about 600 input tokens (system + transcript), 80 output tokens. At Haiku 4.5 list rates ($1 per million input tokens, $5 per million output tokens) that is (600 × $1 + 80 × $5) / 1M = $0.001, or roughly ₹0.09.
A Code node parses the JSON safely and pushes the result downstream:
// Parse Claude's response; the first content block should hold the JSON text
const raw = $input.first().json.content?.[0]?.text ?? '';
let parsed;
try {
  parsed = JSON.parse(raw);
} catch (e) {
  // Fallback: dump to manual triage with zero confidence
  parsed = {
    intent: 'support',
    confidence: 0,
    summary: 'CLASSIFIER FAILED, review manually: ' + raw.slice(0, 100),
    suggested_owner_email: 'contact@softechinfra.com',
    urgency: 'today'
  };
}
// Attach the Whisper transcript so the Slack and email nodes can show it
return [{ json: { ...parsed, transcript: $('Whisper Transcribe').first().json.text } }];

A Switch node on {{ $json.intent }} routes to four Slack channels: #sales-inbound, #support-tickets, #accounts-inbound, #spam-quarantine. Each branch has a Slack node with a formatted message block.
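The routing decision can be expressed as a lookup, which is handy for testing it outside n8n. The sketch below uses the four channel names from the text. The fallback channel for intents without a listed channel (such as careers) and the `MIN_CONFIDENCE` threshold are assumptions made for illustration, not settings described in this article.

```javascript
// Channel names come from the article; the fallback and threshold are assumptions.
const CHANNEL_BY_INTENT = {
  sales: '#sales-inbound',
  support: '#support-tickets',
  accounts: '#accounts-inbound',
  spam: '#spam-quarantine',
};
const FALLBACK_CHANNEL = '#support-tickets'; // assumed manual-triage destination
const MIN_CONFIDENCE = 0.5; // hypothetical cut-off for trusting the classifier

// Return the Slack channel for a classification, falling back when the
// intent has no channel or the classifier was not confident enough.
function pickChannel({ intent, confidence }) {
  const known = Object.prototype.hasOwnProperty.call(CHANNEL_BY_INTENT, intent);
  if (!known || typeof confidence !== 'number' || confidence < MIN_CONFIDENCE) {
    return FALLBACK_CHANNEL;
  }
  return CHANNEL_BY_INTENT[intent];
}

console.log(pickChannel({ intent: 'sales', confidence: 0.92 }));
console.log(pickChannel({ intent: 'careers', confidence: 0.88 }));
```

With this shape, the zero-confidence object produced by the parse fallback lands in the triage channel without a special case.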
The Slack message uses Block Kit so the rep sees structured info, not a paragraph:
Inbound call routed: {{ $json.intent }} ({{ $json.confidence }})
{{ $json.summary }}
From: {{ $('Twilio Incoming Webhook').first().json.From }}
Urgency: {{ $json.urgency }} | Suggested owner: {{ $json.suggested_owner_email }}
> _Transcript:_ {{ $json.transcript }}

An email also goes to the suggested_owner_email Claude returned, with the transcript in the body and a Twilio recording link for evidence. We do not trust Claude's email choice blindly: a separate Set node maps intent to a hard-coded allowlist of real reps' emails, so even if Claude hallucinates an email, the actual destination is safe.
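The allowlist idea can be sketched as follows. The addresses under `example.com` are placeholders rather than real reps, and `resolveOwner` is a hypothetical helper name; only the fallback inbox comes from the Code node shown earlier.

```javascript
// Hypothetical allowlist: these example.com addresses are placeholders, not real reps.
const OWNER_BY_INTENT = {
  sales: 'sales-lead@example.com',
  support: 'support-lead@example.com',
  accounts: 'accounts-lead@example.com',
  careers: 'hr@example.com',
};
// Fallback inbox, matching the address used in the Code node's failure branch.
const DEFAULT_OWNER = 'contact@softechinfra.com';

// Pick the recipient from the allowlist only; the model's suggestion is kept
// for auditing but never used as the destination.
function resolveOwner(classification) {
  const intent = classification.intent;
  const allowed = Object.prototype.hasOwnProperty.call(OWNER_BY_INTENT, intent);
  return {
    ...classification,
    owner_email: allowed ? OWNER_BY_INTENT[intent] : DEFAULT_OWNER,
    model_suggested_email: classification.suggested_owner_email ?? null,
  };
}

const routed = resolveOwner({ intent: 'sales', suggested_owner_email: 'ceo@unknown.test' });
console.log(routed.owner_email);
```

Because the destination is derived from the intent alone, a hallucinated or injected address in the model output can at worst show up in the audit field.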
## Cost comparison — n8n vs the alternatives
For 800 calls/month (typical 6-person Indian SMB):
The self-hosted n8n number includes the Hetzner CX22 (~₹740) plus the per-call AI + telephony spend (₹1.45 × 800 = ₹1,160). Make.com pricing pulled from their May 2026 site; Zapier from theirs. n8n Cloud Starter would hit its 2,500-execution limit if every call fires 4+ nodes — we usually recommend Pro (~₹4,200) if a client refuses to self-host.
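The self-hosted figure is simple enough to check in a few lines. This sketch uses only the numbers quoted above (₹740 server, ₹1.45 per call, 800 calls) and works in paise to avoid floating-point drift; `monthlyCostPaise` is a hypothetical helper name.

```javascript
// Monthly cost of the self-hosted setup, in paise (1 rupee = 100 paise).
function monthlyCostPaise({ serverPaise, perCallPaise, calls }) {
  return serverPaise + perCallPaise * calls;
}

const totalPaise = monthlyCostPaise({
  serverPaise: 74000, // Hetzner CX22, about ₹740/month
  perCallPaise: 145,  // AI + telephony spend, ₹1.45 per call
  calls: 800,
});

const totalRupees = totalPaise / 100;                 // 1900
const effectivePerCallRupees = totalPaise / 800 / 100; // fixed cost spread over calls
console.log(totalRupees, effectivePerCallRupees);
```

Changing `calls` shows how quickly the fixed server cost stops mattering as volume grows.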
## When NOT to do this — three real cases we declined
Case 1 — A clinic that takes appointments over the phone. They wanted the AI to book the slot, not just route. That is a different workflow with Cal.com or Google Calendar integration plus DTMF input. We quoted it; they wanted ₹15k and a 3-day build. Real cost is closer to ₹65k because you need fallback for missed slots, slot-conflict handling, and an SMS confirmation loop.
Case 2 — A Surat textile exporter. 80% of their calls are in Gujarati. Whisper-1's Gujarati WER is ~25% at conversational speed. We told them to wait for Whisper-3 or use a specialised Indic-ASR like Sarvam. Better to be honest than ship a 60%-accuracy router.
Case 3 — Any business under 50 calls a month. A receptionist app on a phone (₹0/month, costs your sister's time) beats the AI workflow. The break-even is around 200 calls / month at our blended billing.
Set EXECUTIONS_DATA_MAX_AGE=720 (the value is in hours, so 30 days) and you are clean.
## FAQ
### How long does the n8n receptionist take to build?
For a competent n8n user with all credentials ready, the 47-node workflow takes about 6 working hours including the Twilio TwiML loop testing. We allow 7 working days end-to-end for client builds because Twilio India DID verification takes 2–3 business days, and TRAI OSP disclosure copy review eats another day.
### Can it transfer the live call to a human?
Yes, replace the Record verb with a Dial verb in the TwiML. You lose the AI routing because the AI runs after recording finishes, but you can run a hybrid: Claude classifies the first 8 seconds, then conditionally dials the right rep. We have built it twice; it adds about 12 nodes and ₹0.18 per call.
### Does Whisper handle Hindi-English mixed calls?
Yes. Whisper-1 is multilingual and we force language: "en" so it transliterates Hindi tokens phonetically (e.g., "delivery kab aayegi" stays readable). For pure Hindi calls, switch to language: "hi". Code-switching at conversational speed: about 88% word-accuracy in our test set, good enough for intent classification.
### What if Claude returns invalid JSON?
Three protections. First, we instruct it to return strict JSON only in the system prompt. Second, the Code node wraps JSON.parse in a try/catch and routes to a manual-triage channel on failure. Third, we run a daily check on the failure channel and prompt-tune monthly. Real failure rate in production: ~0.4%.
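A slightly more tolerant parser can cut the failure rate further by accepting replies wrapped in Markdown fences or a short preamble, and by rejecting objects that parse but break the schema. The sketch below is one way to do that, not the production Code node; `extractJson` and `parseClassification` are hypothetical names, and the intent list mirrors the schema in the system prompt.

```javascript
const INTENTS = ['sales', 'support', 'accounts', 'careers', 'spam'];

// Try strict parsing first, then fall back to the outermost {...} span,
// which handles replies wrapped in Markdown fences or a short preamble.
function extractJson(raw) {
  try {
    return JSON.parse(raw);
  } catch (_) {
    const start = raw.indexOf('{');
    const end = raw.lastIndexOf('}');
    if (start === -1 || end <= start) return null;
    try {
      return JSON.parse(raw.slice(start, end + 1));
    } catch (_) {
      return null;
    }
  }
}

// Accept only objects whose intent and confidence match the schema.
function parseClassification(raw) {
  const obj = extractJson(String(raw ?? ''));
  const valid =
    obj !== null &&
    typeof obj === 'object' &&
    INTENTS.includes(obj.intent) &&
    typeof obj.confidence === 'number' &&
    obj.confidence >= 0 &&
    obj.confidence <= 1;
  return valid ? { ok: true, value: obj } : { ok: false, value: null };
}

console.log(parseClassification('{"intent":"sales","confidence":0.9}').ok);
console.log(parseClassification('Sorry, I cannot help with that.').ok);
```

When `ok` is false, the caller can substitute the same manual-triage object the try/catch fallback produces.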
### Is this DPDP Act compliant?
It can be. You need three things: a disclosure jingle at call start, automated transcript retention deletion (we use 30 days), and a data-processing agreement with each subprocessor (Twilio, OpenAI, Anthropic). All three publish DPAs. Your DPO should review the chain before go-live.
### Can I run this on n8n Cloud instead of self-hosting?
Yes — for under ~1,500 calls/month, n8n Cloud Pro (~₹4,200/mo) makes sense because you skip the ops overhead. Above that, self-hosting on Hetzner saves ₹2k–₹5k/month and gives you better latency to the Twilio US edge. Most of our clients self-host.
### What is the alternative if I do not want OpenAI?
Replace Whisper with Deepgram Nova-2 (₹0.36/min, faster) or Sarvam Saaras (~₹0.42/min, much better Indic). Replace Claude Haiku with Mistral Small 24B (₹0.05/call, self-hosted) or Gemini Flash 2.0 (₹0.06/call, faster). The n8n graph stays the same — only the HTTP Request bodies change.
Want this exact n8n receptionist built and deployed for your office?
We ship the full 47-node workflow above, configured to your Twilio DID and Slack workspace, in 7 working days. Typical cost: ₹38,000–₹62,000 depending on call volume and language mix. Suitable if you take 200+ calls a month and lose business to missed routing. No slides — just your problem and our honest take on whether we can help.
