# A WhatsApp + OpenAI Support Bot in 6 Hours: The Stack and What Breaks in Week 2
WhatsApp Cloud API + gpt-4o-mini + Pinecone + n8n. We shipped this for a Bangalore D2C client in 6 hours. Then three things broke in week 2 — here is what they were and the fixes.
Hrishikesh Baidya
November 6, 2025 · 11 min read
A 9-person Bangalore D2C skincare brand wanted a WhatsApp bot to answer "where is my order," "what is the return policy," and "is this safe for sensitive skin." We shipped a working v1 in 6 hours on a Tuesday afternoon. Stack: WhatsApp Cloud API, OpenAI gpt-4o-mini, Pinecone, and n8n self-hosted on a ₹740/month Hetzner CX22. Then week 2 started — and three things broke that nobody warns you about. This post is the exact stack, the build steps, and the gotchas we now bake into every WhatsApp bot from day one.
- 6 hrs: from kickoff to first reply
- ₹0.16: cost per utility conversation (India)
- 3: things that always break in week 2
- ~₹4,200: monthly run cost at 1,800 conversations
## TL;DR — what this post delivers
A copy-runnable build of a WhatsApp customer-support bot using the official Meta Cloud API (no third-party BSP markup), gpt-4o-mini for reply generation, Pinecone for product/policy retrieval, and n8n as the orchestrator. Build time: 6 hours for one engineer. Monthly cost: roughly ₹4,200 at 1,800 conversations. Three failure modes show up consistently in week 2 — the 24-hour customer service window expiring, the gpt-4o-mini context-stuffing problem, and the IST timezone bug in Meta's webhook timestamps. We list each with the fix.
## Why this matters now (Nov 2025)
Meta updated WhatsApp pricing on July 1, 2025 — the model is now per-message, with utility templates inside a 24-hour service window now free, and a 300-call-per-minute API rate limit on the upgrade tier per [Meta's official pricing page](https://developers.facebook.com/docs/whatsapp/pricing/updates-to-pricing/). For Indian SMBs, the new structure means a properly-built bot is materially cheaper than a third-party BSP (Business Solution Provider) wrapper. The same bot that cost ₹12,000–₹18,000/month on Wati or Interakt last year now costs ₹4,000–₹6,000/month direct. The stack we describe here is the cheap path; you trade convenience for unit economics.
## The exact stack (versions and prices, Nov 2025)
WhatsApp Cloud API (direct)
Meta's official API. ₹0.882 per marketing conversation, ₹0.16 per utility, ₹0.129 per authentication, service messages free in the 24-hour window. No BSP markup.
OpenAI gpt-4o-mini
$0.15 per 1M input, $0.60 per 1M output. Fast enough (avg 800ms TTFB on India-routed calls) for chat. Cheaper alternative to Claude Haiku for short replies.
Pinecone (Starter tier)
Free tier holds 100k vectors and 5GB. Plenty for a product catalog of 400 SKUs + 80 policy docs. Switched to Standard ($70/mo) only after 3,000 SKUs.
n8n self-hosted (v1.62)
Hetzner CX22 (2 vCPU, 4GB RAM, ₹740/month). Webhook node receives Meta callbacks, OpenAI + Pinecone nodes do the work, WhatsApp HTTP Request node sends the reply.
## The 6-hour build (step by step)
This is the actual sequence we ran for the Bangalore client. Everything else (analytics, dashboards, agent handoff) was added in week 2.
### Hour 1: Meta Business + WhatsApp Cloud API setup
Meta Business Suite > WhatsApp Manager > add a phone number (use a fresh number, not the founder's personal). Verify business via the OTP. Generate a permanent access token via System Users (NOT a temporary token — those expire in 24h and you will lose Friday night). Note your Phone Number ID and WABA ID. Time spent: 35 minutes if your business is already Meta-verified, 3 hours if not.
### Hour 2: n8n on Hetzner
Spin up Hetzner CX22 (₹740/month). Install Docker. Run `docker run -d --name n8n -p 5678:5678 -v n8n_data:/home/node/.n8n n8nio/n8n`. Add Caddy in front for HTTPS; Meta requires HTTPS on the webhook URL, no exceptions. Test the webhook endpoint with curl before pointing Meta at it.
### Hour 3: Pinecone seeding
Create a Pinecone index, dimension 1536 (matches text-embedding-3-small), metric cosine. Embed your product catalog and FAQ docs. We use a 400-token chunk with 80-token overlap. For our client: 400 SKUs + 80 FAQ docs = 1,840 chunks total. Embedding cost: under ₹15 one-time.
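The 400-token chunk with 80-token overlap can be sketched as a sliding window. This is a minimal illustration, not the exact script we ran: `chunk_tokens` and `chunk_text` are hypothetical helper names, and whitespace-separated words stand in for real model tokens (a proper tokenizer would give different counts).

```python
from typing import List


def chunk_tokens(tokens: List[str], size: int = 400, overlap: int = 80) -> List[List[str]]:
    """Split a token list into windows of `size` that share `overlap` tokens."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap  # each window starts this many tokens after the previous one
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reached the end of the document
    return chunks


def chunk_text(text: str, size: int = 400, overlap: int = 80) -> List[str]:
    # Assumption: words approximate tokens closely enough for an illustration.
    return [" ".join(chunk) for chunk in chunk_tokens(text.split(), size, overlap)]
```

With these defaults a 1,000-word document yields three chunks, and the last 80 words of each chunk reappear at the start of the next, so a sentence cut at a boundary is still retrievable whole.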
### Hour 4: n8n flow: webhook -> retrieve -> answer
Webhook node listens on /whatsapp. Function node parses the Meta payload (it is nested 4 levels deep — extract entry[0].changes[0].value.messages[0].text.body). OpenAI Embedding node embeds the user message. Pinecone node retrieves top 5 chunks. OpenAI Chat node calls gpt-4o-mini with retrieved context + system prompt. HTTP Request node POSTs to Meta to send the reply.
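The parsing and reply steps are easier to reason about outside n8n. The sketch below shows the same logic in Python under a few assumptions: the function names are ours, only plain text messages are handled, and the reply dict is the JSON body you would POST to the Graph API `/{phone-number-id}/messages` endpoint (the HTTP call itself is left out).

```python
from typing import Optional, TypedDict


class InboundText(TypedDict):
    sender: str      # customer's WhatsApp number, as sent by Meta
    body: str        # message text
    timestamp: int   # Unix epoch seconds


def extract_text_message(payload: dict) -> Optional[InboundText]:
    """Pull the first text message out of a Meta webhook payload, or None."""
    try:
        message = payload["entry"][0]["changes"][0]["value"]["messages"][0]
        if message.get("type") != "text":
            return None  # images, stickers, etc. need their own handling
        return {
            "sender": message["from"],
            "body": message["text"]["body"],
            "timestamp": int(message["timestamp"]),  # Meta sends this as a string
        }
    except (KeyError, IndexError, TypeError, ValueError):
        # Delivery-status callbacks carry "statuses" instead of "messages".
        return None


def build_text_reply(to: str, body: str) -> dict:
    """JSON body for a free-form text reply inside the service window."""
    return {
        "messaging_product": "whatsapp",
        "to": to,
        "type": "text",
        "text": {"body": body},
    }
```

Returning `None` for anything that is not a text message keeps status callbacks and media from crashing the flow; they can be routed to a separate branch.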
### Hour 5: Webhook verification + first message
Meta verifies your webhook with a GET request carrying hub.challenge. Echo it back as plain text or the verification fails silently. Then send "hi" from your personal WhatsApp to the business number. Reply should land in 2-4 seconds.
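For reference, here is the verification handshake as a framework-free function. It is a sketch: `verify_webhook` is a hypothetical name, `params` is whatever your web framework exposes as the GET query parameters, and `expected_token` is the verify token you typed into the Meta dashboard.

```python
import hmac
from typing import Mapping, Tuple


def verify_webhook(params: Mapping[str, str], expected_token: str) -> Tuple[int, str]:
    """Return (HTTP status, plain-text body) for Meta's verification GET."""
    mode = params.get("hub.mode", "")
    token = params.get("hub.verify_token", "")
    challenge = params.get("hub.challenge", "")
    # Constant-time comparison avoids leaking the token through timing.
    token_ok = hmac.compare_digest(token.encode(), expected_token.encode())
    if mode == "subscribe" and token_ok:
        return 200, challenge  # echo the challenge back verbatim, as plain text
    return 403, "verification failed"
```

Whatever wraps this must send the body with a plain-text content type; wrapping the challenge in JSON is a common way to make the verification fail.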
### Hour 6: Templates + handoff
Create one utility template ("order_status") and one marketing template ("welcome_offer") in WhatsApp Manager. Wait for Meta approval (15 minutes to 24 hours). Add a fallback in your n8n flow: if the bot's confidence score is below 0.6 OR the user types "agent" / "human" / "manager", forward the conversation to a Slack channel where your support person picks up.
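The fallback rule reduces to a single predicate. In this sketch `should_hand_off` is a hypothetical name, and `confidence` is assumed to be a score between 0 and 1 that your flow already computes (for example the top retrieval similarity); the post does not prescribe how to derive it.

```python
import re

# Keywords from the flow described above; extend per language as needed.
HANDOFF_KEYWORDS = ("agent", "human", "manager")
_KEYWORD_PATTERN = re.compile(
    r"\b(" + "|".join(HANDOFF_KEYWORDS) + r")\b", re.IGNORECASE
)


def should_hand_off(user_message: str, confidence: float, threshold: float = 0.6) -> bool:
    """True when the conversation should be forwarded to a human."""
    asked_for_human = bool(_KEYWORD_PATTERN.search(user_message))
    return confidence < threshold or asked_for_human
```

Matching on word boundaries means "management" does not trigger a handoff while "talk to a Manager" does.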
Free shortcut: n8n.io has a community workflow template for this exact stack. Import it, swap your credentials, save 90 minutes.
## The system prompt that worked (first try)
We iterate on this every project. The version below is what shipped to our Bangalore client and survived contact with real users.
```text
You are a customer-support assistant for [BRAND], a D2C skincare brand based in Bangalore.
Rules:
- Reply in the language the customer used. If they mix Hindi + English, mirror that.
- Maximum 4 lines per reply. WhatsApp users hate walls of text.
- If the customer asks about an order, ask for the order ID first. Never invent details.
- If asked for product safety / allergy advice, refuse and recommend a dermatologist.
- If you do not have the answer in the retrieved context, say so plainly and offer to connect to a human.
- Never invent prices, never invent availability, never quote a delivery date.
You have access to retrieved snippets from product pages and FAQ docs (provided below). Use only those.
Retrieved context:
{pinecone_chunks}
Customer message:
{user_message}
```
The "never invent" lines are not optional. Without them, gpt-4o-mini will make up tracking numbers — we tested it. With them, hallucinations dropped from 8% to under 1% in our internal evals.
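Filling the two placeholders is worth doing carefully, because customer text can itself contain braces. A minimal sketch, assuming the prompt is held as a Python string: `render_prompt` is a hypothetical helper, and the "(no snippets retrieved)" fallback is our assumption, not part of the shipped prompt.

```python
import re
from typing import List

# Only these two placeholders are substituted; any other braces are left alone.
_PLACEHOLDER = re.compile(r"\{(pinecone_chunks|user_message)\}")


def render_prompt(template: str, chunks: List[str], user_message: str) -> str:
    """Substitute retrieved snippets and the customer message in one pass."""
    context = "\n".join(
        f"[{i}] {chunk.strip()}" for i, chunk in enumerate(chunks, start=1)
    )
    values = {
        "pinecone_chunks": context or "(no snippets retrieved)",
        "user_message": user_message.strip(),
    }
    # A single regex pass means substituted text is never re-scanned, so a
    # customer typing "{pinecone_chunks}" cannot pull the context into their turn.
    return _PLACEHOLDER.sub(lambda match: values[match.group(1)], template)
```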
## The 3 things that always break in week 2
Every WhatsApp bot we have shipped — and we have shipped 14 of them in 2025 — hits these three problems within 14 days. We now build defenses against them on day one.
### Break #1 — The 24-hour service window expires
Symptom: Customer messaged you 25 hours ago. Bot tries to reply. Meta returns error 131047: "Message failed to send because more than 24 hours have passed since the customer last replied."
Cause: WhatsApp's free-messaging service window only covers messages sent within 24 hours of the customer's last inbound message. After that, you must use a pre-approved template (utility, marketing, or authentication) — and templates cost money.
Fix: In your n8n flow, check the timestamp of the last inbound message before sending. If it is more than 23 hours ago, switch to a template send instead. We use a "follow_up_unanswered" utility template for support cases that drift past the window. Approval takes 15 minutes to a few hours.
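The window check can be sketched as a function that picks the payload type before sending. Assumptions here: `build_outbound` is a hypothetical name, the template takes no variables, and the `language_code` value must match whatever language the template was approved under.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# 23 hours rather than 24 leaves a safety margin before the window closes.
SAFE_WINDOW = timedelta(hours=23)


def build_outbound(
    to: str,
    reply_text: str,
    last_inbound_epoch: str,
    now: Optional[datetime] = None,
    template_name: str = "follow_up_unanswered",
    language_code: str = "en",  # assumption: must match the approved template
) -> dict:
    """Free-form text inside the service window, template send outside it."""
    if now is None:
        now = datetime.now(timezone.utc)
    last_inbound = datetime.fromtimestamp(int(last_inbound_epoch), tz=timezone.utc)
    if now - last_inbound <= SAFE_WINDOW:
        return {
            "messaging_product": "whatsapp",
            "to": to,
            "type": "text",
            "text": {"body": reply_text},
        }
    return {
        "messaging_product": "whatsapp",
        "to": to,
        "type": "template",
        "template": {"name": template_name, "language": {"code": language_code}},
    }
```

Passing `now` explicitly keeps the function testable; in production it defaults to the current UTC time.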
The cost trap: If your bot accidentally fires marketing templates instead of utility templates, your template spend jumps about 5.5x: ₹0.882 vs ₹0.16 per conversation in India. Always verify your template's category in WhatsApp Manager before using it programmatically.
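The rate gap is easy to check with the two India rates quoted in this post. The calculation below is a worst-case illustration that assumes every one of 1,800 monthly conversations is billed as a template; `monthly_template_cost` is a hypothetical helper.

```python
from decimal import Decimal

# Rates in rupees per conversation, as quoted in this post (India, Nov 2025).
UTILITY_RATE = Decimal("0.16")
MARKETING_RATE = Decimal("0.882")


def monthly_template_cost(conversations: int, rate: Decimal) -> Decimal:
    """Template spend if every conversation is billed at `rate`."""
    return rate * conversations


utility_bill = monthly_template_cost(1800, UTILITY_RATE)      # 288 rupees
marketing_bill = monthly_template_cost(1800, MARKETING_RATE)  # 1587.6 rupees
ratio = MARKETING_RATE / UTILITY_RATE                         # 5.5125
```

At this volume a miscategorised template turns a ₹288 line item into roughly ₹1,588.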
### Break #2 — gpt-4o-mini stuffs context with stale chat history
Symptom: A customer who asked about Product A on Monday gets confused replies about Product B on Friday — because the bot still has Monday's context in the conversation history.
Cause: Naive implementations stuff the entire chat history into every prompt. By day 7, you are paying for 4,000 tokens of irrelevant context per turn and the model genuinely loses focus.
Fix: Cap conversation history to the last 6 messages OR the last 30 minutes, whichever is shorter. Store history in Postgres or Redis keyed by phone number with a TTL. Reset context when you detect a topic change (run a tiny classifier — gpt-4o-mini itself works as a classifier with a 50-token prompt).
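The cap itself is a few lines once history is stored with timestamps. A sketch under assumptions: `Turn` and `recent_history` are hypothetical names, and the list is whatever you loaded from Postgres or Redis for that phone number. The topic-change reset is not shown.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Turn:
    role: str    # "user" or "assistant"
    text: str
    epoch: int   # Unix epoch seconds, UTC


def recent_history(
    turns: List[Turn],
    now_epoch: int,
    max_messages: int = 6,
    max_age_seconds: int = 30 * 60,
) -> List[Turn]:
    """Keep turns that are both recent enough and among the last few."""
    if max_messages <= 0:
        return []
    fresh = [t for t in turns if now_epoch - t.epoch <= max_age_seconds]
    fresh.sort(key=lambda t: t.epoch)  # oldest first, ready to append to a prompt
    return fresh[-max_messages:]
```

Applying both limits gives the "whichever is shorter" behaviour: a burst of ten messages in five minutes is trimmed to six, and Monday's thread is gone by Friday.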
### Break #3 — IST timezone bug in Meta's webhook timestamps
Symptom: Your "last message timestamp" check is off by 5.5 hours. Customers who messaged at 11pm IST get treated as if they messaged at 5:30pm IST. The 24-hour window calculation breaks. You either send templates when you didn't need to (waste money) or skip templates when you did (silent failures).
Cause: Meta's webhook payload sends timestamps in Unix epoch UTC. Half the n8n templates floating around assume the server is in UTC. Hetzner servers default to UTC, but if you ever migrate to AWS Mumbai (which defaults to IST in some setups) or run n8n on a developer's laptop, the mismatch creates phantom bugs.
Fix: Always parse Meta timestamps as UTC, do all math in UTC, only convert to IST for display. Add a regression test: ingest a fixture webhook payload, compute the service window expiry, assert it matches a known-good value. This took us 4 hours to debug at a Pune client; never again.
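A regression test along these lines might look like the sketch below. The function names are hypothetical and the fixture timestamp is one we picked for illustration: `1762450200` is 2025-11-06 17:30 UTC, which is 11pm IST.

```python
from datetime import datetime, timedelta, timezone

IST = timezone(timedelta(hours=5, minutes=30))


def window_expiry_utc(meta_timestamp: str) -> datetime:
    """Service-window expiry, computed entirely in UTC."""
    received = datetime.fromtimestamp(int(meta_timestamp), tz=timezone.utc)
    return received + timedelta(hours=24)


def to_ist_display(moment: datetime) -> str:
    """Convert to IST only at the edge, for humans to read."""
    return moment.astimezone(IST).strftime("%Y-%m-%d %H:%M IST")


def test_window_expiry_fixture() -> None:
    # Fixture: a message received at 2025-11-06 17:30 UTC (23:00 IST).
    expiry = window_expiry_utc("1762450200")
    assert expiry == datetime(2025, 11, 7, 17, 30, tzinfo=timezone.utc)
    assert to_ist_display(expiry) == "2025-11-07 23:00 IST"
```

Because every datetime carries an explicit `tzinfo`, the test gives the same result whether the server clock is set to UTC, IST, or anything else.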
## Real numbers — what 1,800 conversations a month cost
The Bangalore client crossed 1,800 conversations in their second month. Cost breakdown:
Compare this with [Wati's Standard plan](https://www.wati.io/pricing/) (~$49/month base + ~₹0.80 per session) or Interakt (~₹2,499/month base + per-message). For 1,800 conversations, you save roughly ₹9,000/month going direct. Over a year, that's enough to fund the engineer who built the bot.
## Pre-launch checklist
- Permanent access token generated via System Users (NOT temporary token)
- Webhook on HTTPS with a valid SSL cert (Caddy auto-provisions Let's Encrypt)
- Webhook verification GET handler echoes hub.challenge as plain text
- System prompt has explicit "never invent" instructions for prices, dates, tracking
- Fallback to human via Slack on low-confidence replies OR explicit "agent" keyword
- 24-hour service window check before every send
- Conversation history capped to 6 messages OR 30 minutes
- Timestamps parsed as UTC, converted to IST only for display
- One utility template + one marketing template approved in WhatsApp Manager
- Smoke test from 3 different phone numbers before go-live
## A real example — the Bangalore D2C client
Sector: D2C skincare. Team: 9 people. Product catalog: 400 SKUs. Pre-bot support load: 220-280 customer DMs per day, handled by two part-time assistants to the founders, with spillover on Fridays and Sundays. Pain: the founder was answering 60+ DMs at 11pm and the team was burning out.
We shipped the bot on day 1 (6 hours). On day 9, the founder stopped manually answering — bot caught 71% of inbound, escalated 18% to the human channel, ignored 11% as noise (gibberish, stickers, single emojis). By month 2, the bot was handling 1,800 conversations, the founder's evening DM load was down to 12-15 per day, and the human handoff queue was being cleared in batches twice a day instead of all night.
We have shipped this same architecture for a Surat textile exporter, a Kolkata coaching center, and a Coimbatore D2C food brand — same 6-hour build, same 3 week-2 problems, same fixes. The pattern ports.
## When NOT to build this yourself
Skip the DIY if (a) your team has zero engineering capacity and you'd be bottlenecked on every change, (b) your conversation volume is over 50,000/month — you'll need queueing, multi-region, observability that n8n on a single Hetzner box won't deliver, or (c) you're in a regulated sector (banking, insurance) where call recording + audit trails are statutory. For (a), use Wati or Interakt. For (b) and (c), you need a custom production stack — talk to Hrishikesh, our CTO.
## Why we use this stack at Softechinfra
This is the same baseline we use on conversational-AI projects across our AI automation team. The lessons we learned shipping voice AI for our in-house product TalkDrill (an English-fluency app for Indian adults, 5,000+ active users) feed back into how we build chat bots for clients — particularly around context management, language detection, and human handoff. For the deeper voice-side architecture, see our earlier deep-dive on how TalkDrill hits 800ms voice round-trip latency.
We saw similar wins building Radiant Finance's lead-capture pipeline — same n8n + LLM pattern, different vertical. The architecture is generic.
Reddit threads worth reading before you ship: [r/n8n](https://www.reddit.com/r/n8n/) for workflow patterns, [r/LangChain](https://www.reddit.com/r/LangChain/) for retrieval tradeoffs, and the [Meta WhatsApp Cloud API GitHub issues](https://github.com/whatsapp-cloud-api) for the obscure bugs.
## FAQ
### Do I need a Business Solution Provider (BSP) like Wati or Interakt?
No. As of November 2025, Meta's direct Cloud API onboarding is straightforward — verify your business, add a number, generate a token. BSPs add value if you want a no-code dashboard for your support team, but they cost 3–5x more for the same throughput. For SMBs with engineering, direct is the right answer.
### How do WhatsApp Cloud API rate limits work in practice?
The default rate limit is 80 messages per second on a registered phone number, with a 300-call-per-minute API ceiling that can be raised to 600/min on request per [Meta's docs](https://developers.facebook.com/docs/whatsapp/cloud-api/calling/pricing/). For an SMB doing under 10,000 conversations a month, you will never hit these limits.
### Can gpt-4o-mini handle Hindi or Hinglish replies?
Yes, well enough for short customer-support replies. For longer-form Hindi content or proper code-switching with Devanagari, Sarvam-M or Claude Sonnet 4.5 perform noticeably better. For a typical D2C support flow, gpt-4o-mini is fine.
### What if Pinecone is too expensive?
Their free tier holds 100,000 vectors and 5GB. We have shipped 11 production bots without ever paying Pinecone. If you outgrow it, the alternatives are Postgres pgvector self-hosted (free, your own ops), Qdrant Cloud (cheaper at scale), or Weaviate. We default to Pinecone Starter for early projects because it eliminates one operational concern.
### How do I avoid getting banned by Meta?
Three rules: (a) don't send marketing templates to customers who haven't opted in, (b) maintain a >70% delivery rate (low quality phone numbers tank your sender score), (c) respond within 24 hours when customers initiate. Most bans we have seen come from (a) — bulk marketing without opt-in. Don't do it.
### Can n8n handle 50k conversations a month?
On a single Hetzner CX22, comfortably up to about 15k. Past that, you want a queue (RabbitMQ or Redis Streams) between the webhook receive and the LLM call so spikes don't drop messages. Past 50k, n8n is the wrong tool — switch to a custom FastAPI service with Celery workers.
### How do I add agent handoff cleanly?
When the bot's confidence drops below 0.6 OR the user explicitly asks for a human, send a Slack message via webhook to a #support channel. The agent picks up by replying in Slack — your n8n flow forwards their message to WhatsApp. Mark the conversation as "human-controlled" for the next 30 minutes and pause the bot in that thread.
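The "human-controlled" flag is a per-number timer. Below is an in-memory sketch; `HandoffRegistry` is a hypothetical class, and a production flow would more likely keep this in Redis with a TTL so it survives restarts.

```python
from typing import Dict


class HandoffRegistry:
    """Tracks which phone numbers are currently handled by a human agent."""

    def __init__(self, pause_seconds: int = 30 * 60) -> None:
        self._pause_seconds = pause_seconds
        self._paused_until: Dict[str, int] = {}  # phone number -> epoch seconds

    def mark_human_controlled(self, phone: str, now_epoch: int) -> None:
        """Start (or restart) the pause for this conversation."""
        self._paused_until[phone] = now_epoch + self._pause_seconds

    def bot_may_reply(self, phone: str, now_epoch: int) -> bool:
        """False while a human owns the thread, True otherwise."""
        until = self._paused_until.get(phone)
        if until is None:
            return True
        if now_epoch >= until:
            del self._paused_until[phone]  # pause expired; the bot resumes
            return True
        return False
```

Calling `mark_human_controlled` again each time the agent replies from Slack keeps the bot quiet for as long as the human is active.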