A 3-Day Build: Customer-Support Chatbot for a Diwali D2C Brand on Claude Haiku 4 + Freshdesk + WhatsApp
A Bengaluru candle D2C brand needed a Diwali support bot in 72 hours. Claude Haiku 4 + Freshdesk API + WhatsApp Cloud. Retrieval pipeline + 11 escalation triggers that protect brand voice.
Hrishikesh Baidya
September 24, 202513 min read
0%
A Bengaluru D2C candle brand — 18 SKUs, 11k Instagram followers, ₹3.4 crore FY25 revenue — was 16 days from Diwali with a 4-person support team and a Freshdesk inbox already at 280 unresolved tickets. Their founder texted us on a Saturday: "Diwali bhaag raha hai, 3 din me kuch live kar do." We shipped a Claude Haiku 4 support chatbot wired into Freshdesk and WhatsApp Cloud API by Tuesday evening. By Diwali eve it was handling 71% of inbound support without a human. Build cost: ₹78,000. Run cost during the festive fortnight: ₹6,400.
3 days
Saturday brief to Tuesday production
71%
Auto-resolved tickets in Diwali fortnight
11
Hard escalation triggers (brand-voice safety)
₹6.4k
Run cost — 14-day Diwali window
## The Answer in 60 Words
Inbound message hits WhatsApp or the website widget. We classify intent in 1 Haiku call (40ms, ₹0.02), retrieve 4 chunks from a 220-page Postgres+pgvector knowledge base, generate a brand-voice reply with Haiku 4 (avg 1.1s, ₹0.18), and write a Freshdesk ticket either way. 11 escalation rules block the bot from answering anything that could damage the brand — refunds, defects, allergic reactions, influencer queries.
## Why This Matters Now
Diwali shopping behaviour in 2025 shifted measurably toward DM and WhatsApp — Meta's own [India 2025 festive report](https://about.fb.com/news/2024/10/festive-shopping-india-meta/) noted a 38% YoY jump in business-initiated WhatsApp conversations during the festive fortnight. Most D2C brands hire 2–3 temp support staff for the spike, train them in 2 days, and lose them after Bhai Dooj. A Claude Haiku 4 bot with a tight retrieval pipeline costs less than 1.5 days of one temp staffer's salary and does not quit on Day 8 when the founder yells about a misshipped order.
## The Client (Specific Details)
- Sector: Premium soy-wax candles + diffusers, online-only
- Location: Bengaluru HQ, fulfilment in Bommasandra
- Channels: WhatsApp (45% of inbound), Instagram DM (28%), website chat widget (17%), email (10%)
- Stack on day 0: Shopify, Freshdesk Growth tier, Razorpay, Shiprocket
- Pain: 280 unresolved tickets going into Diwali, founder doing 2 am replies, 1-star reviews citing "no response"
- The trigger: founder calculated 1 lost order = ₹2,400 LTV; missed-response rate during last Diwali was estimated at ~14%
## Why Claude Haiku 4 Specifically
We tested 3 models on the brand's 60-question evaluation set: GPT-4o-mini, Claude Haiku 4, and Llama 3.1 70B via Together.
Model
Cost / 1k convs
P50 latency
Brand-voice score (human)
Hallucination rate
Claude Haiku 4
₹260
1.1s
4.4 / 5
2.1%
GPT-4o-mini
₹190
0.9s
3.6 / 5
3.4%
Llama 3.1 70B (Together)
₹140
1.4s
3.1 / 5
5.7%
Brand voice is where Haiku 4 wins. The founder reviewed 40 sample replies blind — Haiku produced replies that "sound like our team", GPT-4o-mini sounded like "every other support bot". The 27% cost premium was an easy yes. Anthropic's [Haiku 4 launch notes](https://www.anthropic.com/news/) document the same observation: improved tone control on shorter outputs.
## The Architecture (One Diagram, in Words)
Inbound message → channel adapter (WhatsApp Cloud / IG DM / Freshdesk web widget) → unified webhook → intent classifier (Haiku, 50-token output) → retrieval over Postgres + pgvector (4 chunks, 800 tokens) → composer call to Haiku 4 with system prompt + brand FAQ + escalation rules → 11-rule guard checks the output → Freshdesk ticket written either way → reply sent.
CL
Claude Haiku 4 (claude-haiku-4-20250514)
Two calls per turn — intent classifier (cheap) + composer with retrieval context. Total avg 1.1s, ₹0.20 per conversation.
VEC
Postgres + pgvector for retrieval
220-page knowledge base (FAQs, shipping policy, fragrance notes, ingredient list, COD policy). 1,840 chunks of 280 tokens. text-embedding-3-small at ₹3 to embed the entire base.
FD
Freshdesk as system of record
Every interaction creates or appends to a ticket via the Freshdesk API. Bot replies are tagged "ai-handled". Human takeovers are seamless — agent sees the full bot transcript.
11
11 escalation triggers
Hard-coded rules that route the conversation to a human regardless of bot confidence. Refund requests, allergy mentions, complaints with anger language, influencer queries, B2B leads, etc.
## The 11 Escalation Triggers (Brand-Voice Safety Rules)
These are the exact rules the bot runs after every generated reply, before sending. Any rule firing = handoff.
Refund or return language — refund, return, paisa wapas, money back, exchange — bot never quotes refund policy unilaterally
Defect or quality complaint — broken, leaking, melted on arrival, fragrance off, kharab — needs photo + Shiprocket lookup
Allergic reaction or health concern — itchy, allergic, headache, asthma, baby — legal liability, never bot-handle
Bulk or B2B inquiry — bulk, corporate, gifting 100+, hotel order, wedding favour — high LTV, founder talks personally
Influencer or PR pitch — collab, barter, PR, media — marketing team handles
Wrong order delivered — wrong product received — needs order lookup + photo + reverse pickup
Anger language with a 0.7+ sentiment score — bot routes silently, agent picks up before customer escalates publicly
Order over ₹15,000 — premium customer, deserves human
Conversation past 5 turns without resolution — bot isn't going to win on turn 6
Mention of legal, consumer court, NCDRC, social media post — immediate de-escalation, founder loop-in
The 11th rule was added on day 4 — a customer threatened to "post on Twitter and tag everyone". The bot calmly continued generating replies. The founder caught it because he was watching the dashboard. We added the rule at 11 pm that night. Manvi, our QA lead, ran a regression suite of 80 hostile messages before we redeployed.
## The Retrieval Pipeline (Where Most RAG Bots Fail)
The naive answer is "embed everything, top-k retrieve, stuff context, generate". The naive answer fails on D2C support.
We chunked the 220-page knowledge base 3 different ways and reranked. First pass: semantic chunks of 280 tokens with 60-token overlap. Second pass: each chunk got a "what-question-does-this-answer" rewrite added to its embedding (HyDE-style), so user questions match better. Third pass: a small reranker (cohere/rerank-multilingual-v3.0 at ₹0.08 per query) picked the top 4 from a top-12 retrieval. Final result: 89% of generated replies cited a chunk that a human would have cited.
# Python sketch — runs in a FastAPI handler
from anthropic import Anthropic
import psycopg2
from cohere import Client as Cohere
anth = Anthropic()
cohere = Cohere()
def answer(user_msg: str, conv_history: list) -> dict:
# 1. Classify intent (cheap)
intent = anth.messages.create(
model="claude-haiku-4-20250514",
max_tokens=40,
system="Classify support intent. Return one of: order_status, product_info, shipping, fragrance_query, complaint, refund, bulk, other.",
messages=[{"role": "user", "content": user_msg}],
).content[0].text.strip()
if intent in ("complaint", "refund", "bulk"):
return {"action": "escalate", "intent": intent, "reason": "intent-rule"}
# 2. Retrieve + rerank
q_emb = embed_text(user_msg)
candidates = pg_vector_search(q_emb, k=12)
reranked = cohere.rerank(
model="rerank-multilingual-v3.0",
query=user_msg, documents=[c.text for c in candidates], top_n=4
)
context = "
".join([candidates[r.index].text for r in reranked.results])
# 3. Compose with brand voice
reply = anth.messages.create(
model="claude-haiku-4-20250514",
max_tokens=400,
system=BRAND_VOICE_PROMPT + "
The two cheap improvements that mattered most: HyDE-style "question-rewriting" of each chunk (took 90 minutes, raised retrieval precision by 14 points) and the Cohere reranker (₹0.08 per query, raised final answer quality by another 9 points in human eval).
## The 3-Day Build Plan
1
Day 1 morning — Knowledge base ingest
Founder dumped 220 pages from Notion + Shopify + a CA-supplied PDF policy. We extracted, cleaned, chunked, embedded with text-embedding-3-small (₹3 for the whole base), wrote into Postgres + pgvector. Used the HyDE rewrite trick on every chunk before the second embedding.
2
Day 1 evening — Intent classifier + first generated reply
Wrote the Haiku 4 intent classifier with a 60-example few-shot dataset. Composed first end-to-end answers on 12 test questions. Founder reviewed, approved tone on 9, asked us to "tone down the cheerfulness" on 3. Updated brand voice prompt.
3
Day 2 morning — Channel adapters
WhatsApp Cloud API webhook, Instagram DM via Meta's Messenger API (founder's IG was already a Meta business account), Freshdesk inbound email parser, plus a JS chat widget for shopify.com. All four route to the same /webhook/inbound endpoint with a normalized payload.
4
Day 2 evening — 11 escalation rules + Freshdesk integration
Coded the 11 guard rules. Wired Freshdesk: every conversation creates a ticket; bot replies append a private note "AI-handled, confidence X". Escalations re-assign to the support queue with a tag "needs-human".
5
Day 3 morning — Eval harness + 240-question regression
Built a Jupyter notebook that runs 240 historical tickets through the bot, scores each on intent-correct, retrieval-relevant, brand-voice-pass, escalation-trigger-correct. Founder ran 40 spot-checks. We tightened 4 prompt rules.
6
Day 3 afternoon — Soft launch on 10% traffic
Coin flip in the webhook — 10% of inbound got the bot, 90% went to humans as before. Founder watched the Freshdesk dashboard for 4 hours. Zero complaints. Bumped to 50%.
7
Day 3 evening — Full rollout + monitoring
100% of inbound goes through the bot. Grafana dashboard tracks: avg latency, % escalated, tickets per channel, P95 cost per conversation. PagerDuty alarm on retrieval-precision drop or escalation rate > 50%.
## The Cost Breakdown (Festive Fortnight, 14 Days)
For context: 3 temp support agents for the same period would cost ₹84,000 in salary + ₹6,000 in laptop loaner + onboarding time. Bot served 71% of those 3,400 conversations end-to-end.
## The Pre-Launch Checklist
220-page KB ingested with HyDE-style chunk rewrites
Cohere reranker key tested with 50 sample queries
11 escalation rules unit-tested against 80 hostile messages
Brand-voice prompt approved by founder on 40 spot-checks
Freshdesk ticket creation tested across all 4 inbound channels
WhatsApp Cloud API templates approved by Meta (utility category)
Coin-flip rollout flag tested in production at 10% / 50% / 100%
Grafana dashboard live with latency + escalation-rate widgets
PagerDuty alarm on retrieval precision drop > 12 points overnight
Founder trained on the kill switch (env DISABLE_BOT=1)
## When Not to Build This
Skip if (a) your support volume is under 50 inbound messages a day — humans cost less than the build amortization. (b) Your products have liability surface — anything ingested, applied to skin, or used by children should default to human in the loop, not bot. (c) Your knowledge base is in 8 different team members' heads and not written down — fix the documentation problem first; the bot will only amplify the gaps.
## A Detail That Saved the Founder a 1-Star Review
On day 6, a customer messaged "tum log ne mera order kahan bheja, address galat hai". The intent classifier flagged "order_status" — not a complaint. The retrieval pulled chunks about shipping zones. The composer drafted a reply explaining how to update the address with Shiprocket. The escalation guard rule on "anger language" caught the word "tum log" combined with sentiment > 0.7 and routed to a human. The founder personally replied within 4 minutes. The customer became a repeat buyer for Bhai Dooj. Without rule #7, that would have been a public 1-star tweet.
## How We Cross-Linked Into the Stack
This bot extends the same pattern we shipped earlier in 2025 for a [Surat textile WhatsApp + OpenAI support bot](/blog/whatsapp-openai-customer-support-bot-6-hours-stack-gotchas), and complements our [Tally + Razorpay reconcile workflow](/blog/n8n-tally-prime-razorpay-auto-reconcile-daily-settlements) that the same D2C team uses for finance ops. Hrishikesh led architecture, Manvi ran the QA suite. For festive D2C teams comparing buy vs. build, our AI automation team ships these in 3–5 working days.
For founders looking at ecosystem fit, our in-house product TalkDrill runs a similar Anthropic + Postgres + pgvector retrieval stack to deliver English-fluency feedback to 5,000+ Indian users — same architecture pattern, different domain.
## FAQ
### Why Claude Haiku 4 over GPT-4o-mini if it costs more?
Brand voice. The founder approved 88% of Haiku 4 replies blind versus 72% for GPT-4o-mini. For a premium D2C brand on Diwali, the 27% cost premium is the cheapest insurance against off-brand replies you can buy.
### Can the bot handle Hindi support?
Yes — Haiku 4 is fluent in Hinglish and Devanagari. We tested 90 Hindi tickets; brand-voice scores were 0.3 lower than English (4.1 vs 4.4) but acceptable. For pure Hindi voice (telephone), we use Sarvam + Claude as documented in our [Hindi voice bot post](/blog/hindi-voice-bot-tier-2-insurance-twilio-sarvam-claude-sonnet).
### How do you keep the knowledge base fresh?
A nightly cron pulls Notion + Shopify + Freshdesk macros, diffs against the previous snapshot, re-embeds only changed chunks. Total nightly cost: ₹2.40. Founder gets a Slack message each morning summarizing what changed.
### What happens to the human support team?
They moved from "answer 280 tickets/day" to "handle 80 escalated tickets, all of which are real". Reported job satisfaction went up. The seasonal temp hiring (3 people) was replaced with one full-time senior. Net cost saving: ₹2.1 lakh during the festive fortnight alone.
### Can the bot answer questions about specific orders?
Yes — the composer has a tool-calling step that hits the Shopify Admin API for any "where is my order" intent, then summarizes status (packed / shipped / out-for-delivery) with the AWB number. Took 90 minutes to wire on day 2.
### What's the latency on a worst-case query?
P95 was 2.4s end-to-end (intent + retrieval + reranker + compose + Freshdesk write). P50 was 1.1s. The reranker adds ~280ms — we considered dropping it, kept it for the brand-voice quality lift.
### How do you measure "brand voice"?
Two ways. Daily, the founder reviews 20 random sampled replies on a 1–5 scale; we keep a 5-week rolling average and alarm on a 0.3 drop. Quarterly, we run a blind A/B against the human team's last 100 replies and ask 3 brand-voice judges to label.
Want this Diwali support bot live this week?
We ship D2C support bots on Claude Haiku 4 + Freshdesk + WhatsApp in 3–5 working days. Fixed price ₹85k–₹1.4 lakh depending on KB size and channels. Includes the 11-rule escalation guard, the eval harness, and 30 days of post-launch tuning. Suitable if you take ≥ 60 inbound support messages a day across WhatsApp + chat + email.