Build an n8n Receptionist: Auto-Route Inbound Calls in Under 100 Nodes

Twilio charges ₹0.55 per minute for an Indian DID and ₹0.45/minute for inbound. Whisper transcribes a 60-second call for ₹0.30. Claude Haiku 4.5 classifies the intent for ₹0.08. The whole "AI receptionist" we ship for Indian SMBs runs at ₹1.45 per call end-to-end, including a Slack ping and an email fallback. We built it on n8n v1.121 self-hosted, 47 nodes total. Here is the exact workflow, the node JSON, and the one place every team gets it wrong.

47 n8n nodes (incl. error branches)

₹1.45 all-in cost per 60-sec call

3.8s average time to Slack ping

92% routing accuracy (1,400-call sample)

## TL;DR: What does this n8n receptionist actually do?

It answers an inbound phone call, plays a 5-second greeting, records the caller's reason for calling, transcribes it with Whisper, runs Claude Haiku to classify the intent into one of your routing buckets (sales / support / accounts / careers / spam), then drops a structured ping into the right Slack channel and emails the on-call rep. Total wall-clock time from call hang-up to Slack ping: under 4 seconds.

## Why this matters now: March 2026

Two things changed in the last 90 days. Twilio rolled out Programmable Voice for India under TRAI's new OSP framework (March 2026), so you can finally hold an Indian DID without a separate VNO licence. And Anthropic dropped Haiku 4.5 to $0.80 / $4 per million input / output tokens; at that price an intent classification costs under 10 paise. The economic case to replace a ₹22,000/month receptionist with an n8n workflow is now obvious; you only need 12 working hours a month of accurate routing to break even.

A live r/n8n thread from April 2026 ([reddit.com/r/n8n: "voice agent on Twilio actually working in production"](https://www.reddit.com/r/n8n/)) walks through the same three gotchas we hit. Worth skimming before you start.

## The 4-block workflow

📞

Block 1: Twilio webhook + TwiML

Inbound call hits n8n. TwiML response plays greeting, records the caller, posts the recording URL back when done.

🎙️

Block 2: Whisper transcription

Download the .wav from Twilio, ship to OpenAI Whisper-1. Hindi + English code-switching handled in one pass.

🧠

Block 3: Claude Haiku classifier

Strict JSON output. Five routing buckets, a confidence score, and a one-line summary the rep actually reads.

💬

Block 4: Slack + email fan-out

Switch node on bucket. Slack channel post + email to on-call rep. Spam goes to a quarantine channel for review.

## Prerequisites — what you need before you start

n8n v1.121 or later, self-hosted (Docker on a Hetzner CX22 at ₹740/month is fine)
Twilio account with one Indian DID purchased (~₹110 setup + ₹55/month rental)
OpenAI API key with Whisper access (no separate billing — same key as GPT)
Anthropic API key with Claude Haiku 4.5 enabled
Slack workspace with Bot Token scopes: chat:write, channels:read
An SMTP credential (we use Zoho Mail, ₹190/user/month) or Gmail OAuth2 in n8n
A public HTTPS URL for your n8n instance — Twilio will not call HTTP webhooks

## Block 1: Twilio webhook + TwiML

When Twilio receives an inbound call on your DID, it makes an HTTP POST to a URL you configure. That URL is your n8n Webhook node. Twilio expects a TwiML response within 15 seconds, otherwise it drops the call. So the first node returns TwiML immediately, then a second branch handles the recording when it arrives.

Set up two webhook paths on the same n8n workflow: /voice/incoming (handles the initial call) and /voice/recording (handles the recording-complete callback).

Here is the actual Webhook node JSON for the incoming path:

json

{
    "parameters": {
      "httpMethod": "POST",
      "path": "voice/incoming",
      "responseMode": "responseNode",
      "options": {
        "rawBody": false
      }
    },
    "id": "a1b2c3d4-incoming",
    "name": "Twilio Incoming Webhook",
    "type": "n8n-nodes-base.webhook",
    "typeVersion": 2,
    "position": [240, 300]
  }

Right after it, a Respond to Webhook node returns the TwiML XML. The XML tells Twilio to play a greeting and start recording. The recordingStatusCallback URL points at your second webhook path.

json

{
    "parameters": {
      "respondWith": "text",
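      "responseBody": "<?xml version=\"1.0\" encoding=\"UTF-8\"?><Response><Say>Welcome to Softechinfra. Please describe your reason for calling after the beep. Press hash when done.</Say><Record finishOnKey=\"#\" transcribe=\"false\" recordingStatusCallback=\"https://n8n.softechinfra.com/webhook/voice/recording\" /></Response>",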
      "options": {
        "responseCode": 200,
        "responseHeaders": {
          "entries": [
            { "name": "Content-Type", "value": "application/xml" }
          ]
        }
      }
    },
    "id": "b2c3d4e5-twiml",
    "name": "Return TwiML",
    "type": "n8n-nodes-base.respondToWebhook",
    "typeVersion": 1,
    "position": [460, 300]
  }
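Hand-writing XML inside a JSON string is easy to get wrong, so one option is to assemble the TwiML in a Code node placed before Return TwiML. The sketch below is illustrative rather than the workflow's actual node: escapeXml and buildTwiml are hypothetical helper names, the host is a placeholder, and the callback path assumes the /voice/recording webhook described earlier.

```javascript
// Escape the five XML special characters so spoken text or URLs cannot break the document.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Build a TwiML document: speak each line in order, then record until the caller presses #.
// `lines` and `recordingCallbackUrl` are hypothetical parameter names.
function buildTwiml(lines, recordingCallbackUrl) {
  const says = lines.map((line) => `<Say>${escapeXml(line)}</Say>`).join('');
  const record =
    `<Record finishOnKey="#" transcribe="false" ` +
    `recordingStatusCallback="${escapeXml(recordingCallbackUrl)}" />`;
  return `<?xml version="1.0" encoding="UTF-8"?><Response>${says}${record}</Response>`;
}

// Placeholder host; substitute your own public n8n URL.
const twiml = buildTwiml(
  ['Welcome to Softechinfra. Please describe your reason for calling after the beep. Press hash when done.'],
  'https://n8n.example.com/webhook/voice/recording'
);
console.log(twiml);
```

Because the spoken lines are passed as an array, an extra sentence can be prepended without touching the XML by hand.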

We turn off Twilio's built-in transcription (transcribe="false") because Whisper handles Hindi/English code-switching far better. Twilio's transcription drops to about 60% accuracy on a Pune accent, based on our 200-call sample.

Verification: after deploying, configure the DID's "A Call Comes In" webhook to https://n8n.softechinfra.com/webhook/voice/incoming. Test by calling the number from your phone. You should hear the greeting. If Twilio's call log shows "11200: HTTP retrieval failure," your n8n instance is not publicly reachable on HTTPS.

## Block 2: Whisper transcription

When the recording is ready, Twilio POSTs to your second webhook with RecordingUrl in the body. The URL arrives as plain HTTP with no file extension: switch the scheme to HTTPS, append .wav, and add a Basic Auth header built from your Twilio account SID and auth token.

Four nodes here: Webhook → Set (build the audio URL + auth header) → HTTP Request (download the audio as binary) → HTTP Request (POST to Whisper).

The HTTP Request to Whisper:

json

{
    "parameters": {
      "method": "POST",
      "url": "https://api.openai.com/v1/audio/transcriptions",
      "authentication": "predefinedCredentialType",
      "nodeCredentialType": "openAiApi",
      "sendBody": true,
      "contentType": "multipart-form-data",
      "bodyParameters": {
        "parameters": [
          {
            "parameterType": "formBinaryData",
            "name": "file",
            "inputDataFieldName": "data"
          },
          { "name": "model", "value": "whisper-1" },
          { "name": "language", "value": "en" },
          { "name": "response_format", "value": "json" },
          { "name": "temperature", "value": "0" }
        ]
      },
      "options": { "timeout": 30000 }
    },
    "id": "c3d4e5f6-whisper",
    "name": "Whisper Transcribe",
    "type": "n8n-nodes-base.httpRequest",
    "typeVersion": 4.2,
    "position": [900, 300]
  }
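The step that precedes this request (building the audio URL and auth header) could look roughly like the following if it were done in a Code node instead of a Set node. This is a sketch under assumptions: buildRecordingRequest is a hypothetical name, and the SID and token are dummy values that would come from an n8n credential in practice.

```javascript
// Turn Twilio's RecordingUrl into a downloadable HTTPS .wav URL plus a Basic Auth header.
function buildRecordingRequest(recordingUrl, accountSid, authToken) {
  const url = new URL(recordingUrl);
  url.protocol = 'https:'; // force HTTPS even if the callback carried http://
  if (!url.pathname.endsWith('.wav')) {
    url.pathname += '.wav'; // request the WAV rendition of the recording
  }
  const basic = Buffer.from(`${accountSid}:${authToken}`).toString('base64');
  return {
    url: url.toString(),
    headers: { Authorization: `Basic ${basic}` },
  };
}

// Dummy values for illustration only.
const recordingRequest = buildRecordingRequest(
  'http://api.twilio.com/2010-04-01/Accounts/ACxxx/Recordings/RExxx',
  'ACxxx',
  'secret'
);
console.log(recordingRequest.url);
```

The returned url and headers map directly onto the URL and header fields of the download request.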

Whisper-1 charges ₹0.50 per minute of audio (USD 0.006/min × ₹83), so a 60-second customer voicemail costs ₹0.50.

The trick: set language: "en" even for Hindi-English code-switched calls. Whisper-1 was trained on multilingual data; forcing English actually produces cleaner Hinglish transliterations that the Claude classifier handles better than romanised Devanagari.

Gotcha we hit twice: Twilio's recording URL returns a 302 redirect to S3. n8n's HTTP Request node v4.2 follows redirects by default, but the download request must be set to return binary or you get the redirect HTML, not the audio. Set "Response → Response Format" to "File" and the field name to data.

## Block 3: Claude Haiku classifier

This is the node that earns its keep. We feed Claude a strict JSON schema, the call transcript, and the routing context. Output is a clean object with intent, confidence, summary, a suggested owner email, and urgency.

The HTTP Request node hitting Anthropic:

json

{
    "parameters": {
      "method": "POST",
      "url": "https://api.anthropic.com/v1/messages",
      "sendHeaders": true,
      "headerParameters": {
        "parameters": [
          { "name": "x-api-key", "value": "={{ $credentials.anthropicApi.apiKey }}" },
          { "name": "anthropic-version", "value": "2023-06-01" },
          { "name": "content-type", "value": "application/json" }
        ]
      },
      "sendBody": true,
      "specifyBody": "json",
      "jsonBody": "={\n  \"model\": \"claude-haiku-4-5-20251115\",\n  \"max_tokens\": 400,\n  \"system\": \"You are a call-routing classifier for Softechinfra, an Indian IT services firm. Read the inbound caller transcript and return strict JSON only — no preamble. Schema: { intent: 'sales' | 'support' | 'accounts' | 'careers' | 'spam', confidence: number 0-1, summary: string max 140 chars, suggested_owner_email: string, urgency: 'now' | 'today' | 'this_week' }. Spam includes telemarketing, IVR test calls, and abusive language. If the transcript is silent or under 6 words, return intent='spam' with confidence 0.95.\",\n  \"messages\": [\n    {\"role\": \"user\", \"content\": \"Transcript: {{ $json.text }}\"}\n  ]\n}",
      "options": { "timeout": 20000 }
    },
    "id": "d4e5f6g7-claude",
    "name": "Claude Classify",
    "type": "n8n-nodes-base.httpRequest",
    "typeVersion": 4.2,
    "position": [1140, 300]
  }
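One thing to watch in the jsonBody above: the transcript is spliced straight into a JSON string, so a transcript containing a double quote or a line break can yield an invalid request body. A hedged alternative is to build the body as an object and serialise it once. The helper name buildClassifierBody is hypothetical, and the system prompt is shortened here for space.

```javascript
// Build the Messages API body as a plain object so JSON.stringify handles all escaping.
function buildClassifierBody(transcript, systemPrompt) {
  return {
    model: 'claude-haiku-4-5-20251115', // model ID copied from the node above
    max_tokens: 400,
    system: systemPrompt,
    messages: [{ role: 'user', content: `Transcript: ${transcript}` }],
  };
}

// A transcript with quotes and a newline, the kind of input that breaks naive splicing.
const body = buildClassifierBody(
  'He said "delivery kab aayegi?"\nthen hung up',
  'You are a call-routing classifier. Return strict JSON only.' // shortened placeholder prompt
);
const serialised = JSON.stringify(body);
console.log(JSON.parse(serialised).messages[0].content);
```

Inside the HTTP Request node itself, wrapping the expression as {{ JSON.stringify('Transcript: ' + $json.text) }} (without surrounding quotes) should give the same protection, though that is worth confirming against your n8n version.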

Why HTTP Request and not the n8n Anthropic node? The native node lags real Anthropic model IDs by weeks to months: claude-haiku-4-5-20251115 shipped in November 2025; the native node only got the constant in February. For production we hit the API directly so we are not blocked on n8n releases.

Cost math, per call: about 600 input tokens (system + transcript) and 80 output tokens. At Haiku 4.5 rates that is (600 × $0.80 + 80 × $4) / 1M = $0.0008, or ₹0.07 at ₹83 to the dollar. Round to ₹0.08.

A Code node parses the JSON safely and pushes the result downstream:

javascript

// Parse Claude's response
  // Guard against an empty or malformed API response so the fallback below still runs
  const raw = $input.first().json.content?.[0]?.text ?? '';
  let parsed;
  try {
    parsed = JSON.parse(raw);
  } catch (e) {
    // Fallback: dump to manual triage
    parsed = {
      intent: 'support',
      confidence: 0,
      summary: 'CLASSIFIER FAILED — review manually: ' + raw.slice(0, 100),
      suggested_owner_email: 'contact@softechinfra.com',
      urgency: 'today'
    };
  }
  return { json: { ...parsed, transcript: $('Whisper Transcribe').first().json.text } };
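The parser above falls back whenever JSON.parse throws, but a reply wrapped in a Markdown code fence, or one carrying an intent outside the schema, needs extra handling. A stricter variant might look like this. The name parseClassification is illustrative, and the n8n globals are left out so the function can be tested on its own.

```javascript
const INTENTS = ['sales', 'support', 'accounts', 'careers', 'spam'];
// Three backticks, built indirectly so this snippet can itself sit inside a fenced block.
const FENCE = '`'.repeat(3);

// Parse the classifier reply, tolerating a code-fence wrapper and rejecting out-of-schema values.
function parseClassification(raw) {
  const fallback = {
    intent: 'support',
    confidence: 0,
    summary: 'CLASSIFIER FAILED, review manually: ' + String(raw).slice(0, 100),
    urgency: 'today',
  };
  try {
    const cleaned = String(raw)
      .trim()
      .replace(new RegExp('^' + FENCE + '(?:json)?\\s*', 'i'), '')
      .replace(new RegExp('\\s*' + FENCE + '$'), '');
    const parsed = JSON.parse(cleaned);
    if (!INTENTS.includes(parsed.intent)) return fallback;
    const confidence = Number(parsed.confidence);
    return {
      ...parsed,
      // Clamp confidence into [0, 1]; non-numeric values become 0.
      confidence: Number.isFinite(confidence) ? Math.min(1, Math.max(0, confidence)) : 0,
    };
  } catch (err) {
    return fallback;
  }
}

console.log(parseClassification('{"intent":"sales","confidence":0.9,"summary":"wants a quote"}'));
```

Returning the fallback for unknown intents keeps the downstream Switch node from receiving a value it has no branch for.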

## Block 4: Slack + email fan-out

A Switch node on {{ $json.intent }} routes to four Slack channels: #sales-inbound, #support-tickets, #accounts-inbound, #spam-quarantine. Each branch has a Slack node with a formatted message block.

The Slack message uses Block Kit so the rep sees structured info, not a paragraph:

code

Inbound call routed: {{ $json.intent }} ({{ $json.confidence }})
  {{ $json.summary }}
  From: {{ $('Twilio Incoming Webhook').first().json.From }}
  Urgency: {{ $json.urgency }} | Suggested owner: {{ $json.suggested_owner_email }}
  > _Transcript:_ {{ $json.transcript }}
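The template above is plain mrkdwn text. As a Block Kit payload for Slack's chat.postMessage it might be assembled as follows. The routing table pairs each intent with its channel and with a fixed owner address, so an address suggested by the model is never used directly. All email addresses are placeholders, the fallback for intents without a named channel (such as careers) is an assumption, and buildSlackMessage is a hypothetical helper.

```javascript
// Intent -> Slack channel and allowlisted owner. Emails are placeholders, not real addresses.
const ROUTES = {
  sales: { channel: '#sales-inbound', owner: 'sales@example.com' },
  support: { channel: '#support-tickets', owner: 'support@example.com' },
  accounts: { channel: '#accounts-inbound', owner: 'accounts@example.com' },
  spam: { channel: '#spam-quarantine', owner: null },
};

// Build a chat.postMessage payload plus the owner email for the separate email branch.
function buildSlackMessage(call) {
  // Assumption: intents with no dedicated channel fall back to the support route.
  const known = Object.prototype.hasOwnProperty.call(ROUTES, call.intent);
  const route = known ? ROUTES[call.intent] : ROUTES.support;
  const headline = `Inbound call routed: ${call.intent} (${call.confidence})`;
  return {
    ownerEmail: route.owner,
    payload: {
      channel: route.channel,
      text: headline, // plain-text fallback shown in notifications
      blocks: [
        { type: 'section', text: { type: 'mrkdwn', text: `*${headline}*\n${call.summary}` } },
        {
          type: 'section',
          fields: [
            { type: 'mrkdwn', text: `*From:*\n${call.from}` },
            { type: 'mrkdwn', text: `*Urgency:*\n${call.urgency}` },
          ],
        },
        { type: 'context', elements: [{ type: 'mrkdwn', text: `_Transcript:_ ${call.transcript}` }] },
      ],
    },
  };
}

const example = buildSlackMessage({
  intent: 'sales', confidence: 0.91, summary: 'Wants a quote for 3 developers',
  from: '+910000000000', urgency: 'today', transcript: 'Hello, I need a quote...',
});
console.log(JSON.stringify(example.payload, null, 2));
```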

Email goes via the n8n Email Send node (Zoho SMTP) to whichever suggested_owner_email Claude returned, with the transcript in the body and a Twilio recording link for evidence. We do not trust Claude's email choice blindly: a separate Set node maps intent to a hard-coded allowlist of real reps' emails, so even if Claude hallucinates an email, the actual destination is safe.

## Cost comparison: n8n vs the alternatives

For 800 calls/month (typical 6-person Indian SMB):

The self-hosted n8n number includes the Hetzner CX22 (~₹740) plus the per-call AI + telephony spend (₹1.45 × 800 = ₹1,160), about ₹1,900/month in total. Make.com pricing pulled from their May 2026 site; Zapier from theirs. n8n Cloud Starter would hit its 2,500-execution limit if every call fires 4+ nodes, so we usually recommend Pro (~₹4,200) if a client refuses to self-host.

## When NOT to do this: three real cases we declined

Case 1: A clinic that takes appointments over the phone. They wanted the AI to book the slot, not just route. That is a different workflow with Cal.com or Google Calendar integration plus DTMF input. We quoted it; they wanted ₹15k and a 3-day build. Real cost is closer to ₹65k because you need fallback for missed slots, slot-conflict handling, and an SMS confirmation loop.

Case 2: A Surat textile exporter. 80% of their calls are in Gujarati. Whisper-1's Gujarati WER is ~25% at conversational speed. We told them to wait for Whisper-3 or use a specialised Indic-ASR like Sarvam. Better to be honest than ship a 60%-accuracy router.

Case 3: Any business under 50 calls a month. A receptionist app on a phone (₹0/month, costs your sister's time) beats the AI workflow. The break-even is around 200 calls / month at our blended billing.
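The cost figures quoted in this article can be reproduced with a few lines of arithmetic. The sketch assumes the ₹83 per USD rate used for the Whisper estimate and the token counts from the Claude cost math; the function names are made up for illustration.

```javascript
const INR_PER_USD = 83;

// Rupee cost of one classification, given token counts and per-million-token USD prices.
function classificationCostInr(inputTokens, outputTokens, inPricePerM, outPricePerM) {
  const usd = (inputTokens * inPricePerM + outputTokens * outPricePerM) / 1_000_000;
  return usd * INR_PER_USD;
}

// Monthly self-hosted bill: fixed server cost plus per-call spend.
function monthlyCostInr(calls, perCallInr, serverInr) {
  return serverInr + calls * perCallInr;
}

// Per-call Claude cost in rupees, using the 600-in / 80-out token estimate.
console.log(classificationCostInr(600, 80, 0.80, 4).toFixed(3));
// Monthly total in rupees at 800 calls, ₹1.45 per call, ₹740 server.
console.log(monthlyCostInr(800, 1.45, 740).toFixed(0));
```

Changing the call volume or the per-call figure shows quickly how sensitive the monthly total is to each input.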

Compliance heads-up: TRAI's 2026 OSP norms require you to play an opening disclosure ("this call may be recorded for service quality") before the Record verb runs. Build it into the <Say> block, or your DID can be flagged. We learned this the slow way.

## A real mini-case: Lucknow logistics SMB, 40 employees

Client: a regional 3PL out of Lucknow doing about 1,200 inbound calls a month. Before n8n, the front desk receptionist would log every call into a Google Sheet, transfer the relevant ones to dispatch, sales, or accounts, and miss roughly 18% during peak hours (mostly post-lunch). The misses cost them an estimated ₹3.2 lakh/quarter in lost RFQs (the GM's number, not ours).

We shipped the workflow above plus a Twilio number with hunt-group fallback (if dispatch is on another call, ring the second-best person). The receptionist stayed; she now handles VIP customers and walk-ins only.

Routing accuracy after 6 weeks: 92.4% over 1,400 calls, validated by us sampling 200 of them. Missed-RFQ rate dropped to ~3%. Total monthly bill (including the Hetzner instance, Twilio, OpenAI, Anthropic, and our maintenance retainer): ₹6,890.

The same blueprint is what powers the voice infra behind [TalkDrill](https://talkdrill.com), our in-house English speaking app with over 5,000 active users, where Twilio + Whisper + Claude handle conversational evaluation. Different workflow, same building blocks.

## Common mistakes we still see

Mistake 1: Trusting Twilio's transcription for Indian English. Twilio's built-in transcribes accents from the US Pacific Northwest beautifully. A Patna call comes through as gibberish. Always use Whisper, always force the language hint, always set temperature to 0.

Mistake 2: Synchronous TwiML response. Some tutorials show calling Whisper inside the TwiML response handler. That is a 15-second window. Whisper alone takes 2–6 seconds for a 60-second audio file. You will hit timeouts under load. Use the recording callback pattern instead.

Mistake 3: Ignoring spam. Without a spam bucket, your reps wake up to 30 telemarketing pings a day and the channel becomes noise. A missing spam quarantine is the #1 reason people stop trusting the workflow.

Mistake 4: No retry on Claude. Anthropic's API will 529-overload roughly 1 in 800 calls. On the HTTP Request node, enable "Retry On Fail" with 3 tries and a 2-second wait between tries. Anthropic's SLA allows for these overloads; n8n does not retry them by default.

Mistake 5: Storing transcripts forever. Your DPDP Act compliance team will not approve raw caller transcripts living in n8n executions indefinitely. Set the n8n env var EXECUTIONS_DATA_MAX_AGE=720 (the value is in hours, so 30 days) and you are clean.

## FAQ

### How long does the n8n receptionist take to build?

For a competent n8n user with all credentials ready, the 47-node workflow takes about 6 working hours including the Twilio TwiML loop testing. We allow 7 working days end-to-end for client builds because Twilio India DID verification takes 2–3 business days, and TRAI OSP disclosure copy review eats another day.

### Can it transfer the live call to a human?

Yes, replace the Record verb with a Dial verb in the TwiML. You lose the AI routing because the AI runs after recording finishes, but you can run a hybrid: Claude classifies the first 8 seconds, then conditionally dials the right rep. We have built it twice; it adds about 12 nodes and ₹0.18 per call.

### Does Whisper handle Hindi-English mixed calls?

Yes. Whisper-1 is multilingual and we force language: "en" so it transliterates Hindi tokens phonetically (e.g., "delivery kab aayegi" stays readable). For pure Hindi calls, switch to language: "hi". Code-switching at conversational speed: about 88% word-accuracy in our test set, good enough for intent classification.

### What if Claude returns invalid JSON?

Three protections. First, we instruct it to return strict JSON only in the system prompt. Second, the Code node wraps JSON.parse in a try/catch and routes to a manual-triage channel on failure. Third, we run a daily check on the failure channel and prompt-tune monthly. Real failure rate in production: ~0.4%.

### Is this DPDP Act compliant?

It can be. You need three things: a disclosure jingle at call start, automated transcript retention deletion (we use 30 days), and a data-processing agreement with each subprocessor (Twilio, OpenAI, Anthropic). All three publish DPAs. Your DPO should review the chain before go-live.

### Can I run this on n8n Cloud instead of self-hosting?

Yes. For under ~1,500 calls/month, n8n Cloud Pro (~₹4,200/mo) makes sense because you skip the ops overhead. Above that, self-hosting on Hetzner saves ₹2k–₹5k/month and gives you better latency to the Twilio US edge. Most of our clients self-host.

### What is the alternative if I do not want OpenAI?

Replace Whisper with Deepgram Nova-2 (₹0.36/min, faster) or Sarvam Saaras (~₹0.42/min, much better Indic). Replace Claude Haiku with Mistral Small 24B (₹0.05/call, self-hosted) or Gemini Flash 2.0 (₹0.06/call, faster). The n8n graph stays the same; only the HTTP Request bodies change.
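For readers calling Anthropic from their own code rather than through n8n's Retry On Fail setting, the retry advice under Mistake 4 amounts to something like the following. It is a generic sketch: withRetry is a hypothetical helper, and the numeric status property on the thrown error is an assumption about how the caller reports HTTP failures.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Retry an async call up to `tries` times, waiting `waitMs` between attempts.
// Only errors with a numeric `status` of 500 or above (which includes 529) are retried.
async function withRetry(fn, { tries = 3, waitMs = 2000 } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= tries; attempt++) {
    try {
      return await fn(attempt);
    } catch (err) {
      lastError = err;
      const retryable = typeof err.status === 'number' && err.status >= 500;
      if (!retryable || attempt === tries) break;
      await sleep(waitMs);
    }
  }
  throw lastError;
}

// Usage: a fake call that is overloaded once, then succeeds.
let attempts = 0;
withRetry(async () => {
  attempts += 1;
  if (attempts === 1) {
    const err = new Error('overloaded');
    err.status = 529;
    throw err;
  }
  return 'classified';
}, { tries: 3, waitMs: 10 }).then((result) => console.log(result, 'after', attempts, 'attempts'));
```

Client errors such as 400 or 401 are rethrown immediately, since retrying a malformed or unauthorised request only adds delay.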

Want this exact n8n receptionist built and deployed for your office?

We ship the full 47-node workflow above, configured to your Twilio DID and Slack workspace, in 7 working days. Typical cost: ₹38,000–₹62,000 depending on call volume and language mix. Suitable if you take 200+ calls a month and lose business to missed routing. No slides — just your problem and our honest take on whether we can help.


Tags: n8n, Twilio, Voice AI, Claude, Whisper, Workflow Automation, India SMB, Receptionist


Hrishikesh Baidya

CTO at Softechinfra specializing in Python, system architecture, and building secure, scalable software solutions.
