A 3-Day Build: Customer-Support Chatbot for a Diwali D2C Brand on Claude Haiku 4 + Freshdesk + WhatsApp

A Bengaluru D2C candle brand (18 SKUs, 11k Instagram followers, ₹3.4 crore FY25 revenue) was 16 days from Diwali with a 4-person support team and a Freshdesk inbox already at 280 unresolved tickets. Their founder texted us on a Saturday: "Diwali bhaag raha hai, 3 din me kuch live kar do" ("Diwali is running away from us, get something live in 3 days"). We shipped a Claude Haiku 4 support chatbot wired into Freshdesk and WhatsApp Cloud API by Tuesday evening. By Diwali eve it was handling 71% of inbound support without a human. Build cost: ₹78,000. Run cost during the festive fortnight: ₹6,400.

- 3 days: Saturday brief to Tuesday production
- 71%: auto-resolved tickets in the Diwali fortnight
- 11: hard escalation triggers (brand-voice safety)
- ₹6.4k: run cost over the 14-day Diwali window

## The Answer in 60 Words

Inbound message hits WhatsApp or the website widget. We classify intent in 1 Haiku call (40ms, ₹0.02), retrieve 4 chunks from a 220-page Postgres+pgvector knowledge base, generate a brand-voice reply with Haiku 4 (avg 1.1s, ₹0.18), and write a Freshdesk ticket either way. 11 escalation rules block the bot from answering anything that could damage the brand: refunds, defects, allergic reactions, influencer queries.

## Why This Matters Now

Diwali shopping behaviour in 2025 shifted measurably toward DM and WhatsApp. Meta's own [India 2025 festive report](https://about.fb.com/news/2024/10/festive-shopping-india-meta/) noted a 38% YoY jump in business-initiated WhatsApp conversations during the festive fortnight. Most D2C brands hire 2–3 temp support staff for the spike, train them in 2 days, and lose them after Bhai Dooj. A Claude Haiku 4 bot with a tight retrieval pipeline costs less than 1.5 days of one temp staffer's salary and does not quit on Day 8 when the founder yells about a misshipped order.

## The Client (Specific Details)

- Sector: Premium soy-wax candles + diffusers, online-only
- Location: Bengaluru HQ, fulfilment in Bommasandra
- Channels: WhatsApp (45% of inbound), Instagram DM (28%), website chat widget (17%), email (10%)
- Stack on day 0: Shopify, Freshdesk Growth tier, Razorpay, Shiprocket
- Pain: 280 unresolved tickets going into Diwali, founder doing 2 am replies, 1-star reviews citing "no response"
- The trigger: founder calculated 1 lost order = ₹2,400 LTV; missed-response rate during last Diwali was estimated at ~14%

## Why Claude Haiku 4 Specifically

We tested 3 models on the brand's 60-question evaluation set: GPT-4o-mini, Claude Haiku 4, and Llama 3.1 70B via Together.

| Model | Cost / 1k convs | P50 latency | Brand-voice score (human) | Hallucination rate |
|---|---|---|---|---|
| Claude Haiku 4 | ₹260 | 1.1s | 4.4 / 5 | 2.1% |
| GPT-4o-mini | ₹190 | 0.9s | 3.6 / 5 | 3.4% |
| Llama 3.1 70B (Together) | ₹140 | 1.4s | 3.1 / 5 | 5.7% |
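Reading the table as arithmetic: Haiku 4 costs about 37% more than GPT-4o-mini per 1k conversations (equivalently, GPT-4o-mini is about 27% cheaper than Haiku 4). The short sketch below reproduces that calculation from the table's figures; the function names are ours and purely illustrative.

```python
# Per-1k-conversation costs in INR, copied from the comparison table above.
COST_PER_1K = {
    "Claude Haiku 4": 260,
    "GPT-4o-mini": 190,
    "Llama 3.1 70B (Together)": 140,
}


def premium_pct(model: str, baseline: str) -> float:
    """Extra cost of `model` relative to `baseline`, as a percentage of the baseline."""
    delta = COST_PER_1K[model] - COST_PER_1K[baseline]
    return delta / COST_PER_1K[baseline] * 100


def extra_spend(model: str, baseline: str, conversations: int) -> float:
    """Absolute extra INR spent on `model` over `baseline` for a conversation volume."""
    delta = COST_PER_1K[model] - COST_PER_1K[baseline]
    return delta * conversations / 1000


haiku_vs_mini = round(premium_pct("Claude Haiku 4", "GPT-4o-mini"), 1)
fortnight_delta = extra_spend("Claude Haiku 4", "GPT-4o-mini", 3400)
```

At the fortnight's 3,400 conversations the gap between the two models is a few hundred rupees, which is why the brand-voice difference dominated the decision.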

Brand voice is where Haiku 4 wins. The founder reviewed 40 sample replies blind: Haiku produced replies that "sound like our team", while GPT-4o-mini sounded like "every other support bot". The cost premium (₹260 vs ₹190 per 1k conversations, roughly 37%) was an easy yes. Anthropic's [Haiku 4 launch notes](https://www.anthropic.com/news/) document the same observation: improved tone control on shorter outputs.

## The Architecture (One Diagram, in Words)

Inbound message → channel adapter (WhatsApp Cloud / IG DM / Freshdesk web widget) → unified webhook → intent classifier (Haiku, 50-token output) → retrieval over Postgres + pgvector (4 chunks, 800 tokens) → composer call to Haiku 4 with system prompt + brand FAQ + escalation rules → 11-rule guard checks the output → Freshdesk ticket written either way → reply sent.

Claude Haiku 4 (claude-haiku-4-20250514)

Two calls per turn — intent classifier (cheap) + composer with retrieval context. Total avg 1.1s, ₹0.20 per conversation.

Postgres + pgvector for retrieval

220-page knowledge base (FAQs, shipping policy, fragrance notes, ingredient list, COD policy). 1,840 chunks of 280 tokens. text-embedding-3-small at ₹3 to embed the entire base.

Freshdesk as system of record

Every interaction creates or appends to a ticket via the Freshdesk API. Bot replies are tagged "ai-handled". Human takeovers are seamless — agent sees the full bot transcript.

11 escalation triggers

Hard-coded rules that route the conversation to a human regardless of bot confidence. Refund requests, allergy mentions, complaints with anger language, influencer queries, B2B leads, etc.
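To make the "Freshdesk as system of record" step concrete, here is a minimal sketch of how a ticket body could be assembled before it is sent to Freshdesk. The helper name, the transcript shape, and the priority choice are our assumptions for illustration; the field names and numeric codes (status 2 = open, priority 1 = low, 3 = high) follow the Freshdesk v2 tickets API as we understand it and should be checked against the current reference.

```python
from html import escape

STATUS_OPEN = 2      # Freshdesk status code for "Open"
PRIORITY_LOW = 1     # Freshdesk priority code for "Low"
PRIORITY_HIGH = 3    # Freshdesk priority code for "High"


def build_ticket_payload(name: str, phone: str,
                         transcript: list[tuple[str, str]],
                         escalated: bool) -> dict:
    """Build a JSON body for a new ticket (hypothetical helper, no network call).

    `transcript` is a list of (speaker, text) pairs in conversation order.
    """
    # Use the first customer message as the subject, trimmed to a sane length.
    first_customer_msg = next(
        (text for speaker, text in transcript if speaker == "customer"),
        "Support conversation",
    )
    # Ticket descriptions are HTML, so escape user text before joining.
    description = "<br>".join(
        f"<b>{escape(speaker)}:</b> {escape(text)}" for speaker, text in transcript
    )
    return {
        "name": name,
        "phone": phone,
        "subject": first_customer_msg[:80],
        "description": description,
        "status": STATUS_OPEN,
        "priority": PRIORITY_HIGH if escalated else PRIORITY_LOW,
        "tags": ["needs-human"] if escalated else ["ai-handled"],
    }


payload = build_ticket_payload(
    "Asha", "+919800000000",
    [("customer", "Where is my order?"), ("bot", "It shipped <today>.")],
    escalated=False,
)
```

The payload would then be POSTed to the account's `/api/v2/tickets` endpoint with the API key as the basic-auth username; that call is left out so the sketch stays runnable offline.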

## The 11 Escalation Triggers (Brand-Voice Safety Rules)

These are the exact rules the bot runs after every generated reply, before sending. Any rule firing = handoff.

1. Refund or return language (refund, return, paisa wapas, money back, exchange): bot never quotes refund policy unilaterally
2. Defect or quality complaint (broken, leaking, melted on arrival, fragrance off, kharab): needs photo + Shiprocket lookup
3. Allergic reaction or health concern (itchy, allergic, headache, asthma, baby): legal liability, never bot-handle
4. Bulk or B2B inquiry (bulk, corporate, gifting 100+, hotel order, wedding favour): high LTV, founder talks personally
5. Influencer or PR pitch (collab, barter, PR, media): marketing team handles
6. Wrong order delivered (wrong product received): needs order lookup + photo + reverse pickup
7. Anger language with a 0.7+ sentiment score: bot routes silently, agent picks up before customer escalates publicly
8. Customer asks for owner / manager / escalation: explicit request, always honour
9. Order over ₹15,000: premium customer, deserves human
10. Conversation past 5 turns without resolution: bot isn't going to win on turn 6
11. Mention of legal, consumer court, NCDRC, social media post: immediate de-escalation, founder loop-in
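The rules above can be sketched as a single check that returns the name of the first rule that fires. This is a simplified illustration, not the production guard: it inspects only the customer message (the real guard also sees the drafted reply), the rule names and the `check_escalation` signature are hypothetical, the anger score is assumed to come from a separate sentiment model, and "twitter" is added to the keyword list purely as an example.

```python
import re
from typing import Optional

# Keyword lists taken from the rules above; rule names are illustrative.
KEYWORD_RULES = {
    "refund_or_return": ["refund", "return", "paisa wapas", "money back", "exchange"],
    "defect": ["broken", "leaking", "melted", "fragrance off", "kharab"],
    "health": ["itchy", "allergic", "headache", "asthma", "baby"],
    "bulk_b2b": ["bulk", "corporate", "hotel order", "wedding favour"],
    "influencer_pr": ["collab", "barter", "pr", "media"],
    "wrong_order": ["wrong product", "wrong order"],
    "wants_human": ["owner", "manager", "escalation"],
    "legal_or_public": ["legal", "consumer court", "ncdrc", "social media", "twitter"],
}

# Word boundaries stop short keywords such as "pr" matching inside "price".
_COMPILED = {
    rule: re.compile(
        r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b",
        re.IGNORECASE,
    )
    for rule, keywords in KEYWORD_RULES.items()
}


def check_escalation(user_msg: str, turns: int,
                     order_value_inr: float = 0.0,
                     anger_score: float = 0.0) -> Optional[str]:
    """Return the first rule that fires, or None if the bot may reply."""
    for rule, pattern in _COMPILED.items():
        if pattern.search(user_msg):
            return rule
    if anger_score >= 0.7:          # rule 7: 0.7+ sentiment score
        return "anger"
    if order_value_inr > 15_000:    # rule 9: order over ₹15,000
        return "high_value_order"
    if turns > 5:                   # rule 10: past 5 turns without resolution
        return "too_many_turns"
    return None
```

Returning the rule name rather than a bare boolean makes it easy to log why a conversation was handed off, which helps when tuning keyword lists after launch.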

The 11th rule was added on day 4, after a customer threatened to "post on Twitter and tag everyone". The bot calmly continued generating replies. The founder caught it because he was watching the dashboard. We added the rule at 11 pm that night. Manvi, our QA lead, ran a regression suite of 80 hostile messages before we redeployed.

## The Retrieval Pipeline (Where Most RAG Bots Fail)

The naive answer is "embed everything, top-k retrieve, stuff context, generate". The naive answer fails on D2C support. We built the retrieval over the 220-page knowledge base in three passes. First pass: semantic chunks of 280 tokens with 60-token overlap. Second pass: each chunk got a "what-question-does-this-answer" rewrite added to its embedding, so user questions match better. We call this HyDE-style, although strictly it expands the document side at index time rather than the query side at search time. Third pass: a small reranker (cohere/rerank-multilingual-v3.0 at ₹0.08 per query) picked the top 4 from a top-12 retrieval. Final result: 89% of generated replies cited a chunk that a human would have cited.

    # Python sketch: runs in a FastAPI handler.
    # embed_text, pg_vector_search, guard_should_escalate and BRAND_VOICE_PROMPT
    # are project helpers defined elsewhere (not shown here).

    from anthropic import Anthropic
    import psycopg2  # used inside pg_vector_search (not shown)
    from cohere import Client as Cohere

    # Model ID as used in this build; confirm it against Anthropic's model list.
    MODEL = "claude-haiku-4-20250514"
    ESCALATE_INTENTS = {"complaint", "refund", "bulk"}

    anth = Anthropic()  # API key is read from the environment
    co = Cohere()       # API key is read from the environment

    def answer(user_msg: str, conv_history: list) -> dict:
        # 1. Classify intent (cheap); normalise case so the set lookup is reliable
        intent = anth.messages.create(
            model=MODEL,
            max_tokens=40,
            system="Classify support intent. Return one of: order_status, product_info, shipping, fragrance_query, complaint, refund, bulk, other.",
            messages=[{"role": "user", "content": user_msg}],
        ).content[0].text.strip().lower()

        if intent in ESCALATE_INTENTS:
            return {"action": "escalate", "intent": intent, "reason": "intent-rule"}

        # 2. Retrieve the top 12 candidates, then rerank down to 4
        q_emb = embed_text(user_msg)
        candidates = pg_vector_search(q_emb, k=12)
        reranked = co.rerank(
            model="rerank-multilingual-v3.0",
            query=user_msg,
            documents=[c.text for c in candidates],
            top_n=4,
        )
        # Separate chunks with blank lines so they do not run into each other
        context = "\n\n".join(candidates[r.index].text for r in reranked.results)

        # 3. Compose with brand voice
        reply = anth.messages.create(
            model=MODEL,
            max_tokens=400,
            system=BRAND_VOICE_PROMPT + "\n\nContext:\n" + context,
            messages=conv_history + [{"role": "user", "content": user_msg}],
        ).content[0].text

        # 4. Run the 11 escalation guard rules
        if guard_should_escalate(reply, user_msg):
            return {"action": "escalate", "intent": intent, "draft": reply}

        return {"action": "send", "reply": reply, "intent": intent}
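The first two retrieval passes (280-token chunks with 60-token overlap, plus a question rewrite per chunk) can be approximated with the sketch below. It is an assumption-laden simplification: it slides a fixed window over whitespace-split words instead of doing semantic chunking with a real tokenizer, and `embedding_text` is a hypothetical helper that shows one way to fold the generated question into the text that gets embedded. How the question itself is generated is not covered.

```python
def chunk_tokens(tokens: list[str], size: int = 280, overlap: int = 60) -> list[list[str]]:
    """Split a token list into windows of `size` that share `overlap` tokens."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        # Stop once a window has reached the end, to avoid a redundant tail chunk.
        if start + size >= len(tokens):
            break
    return chunks


def embedding_text(chunk_text: str, question: str) -> str:
    """Prefix the chunk with the question it answers before embedding it."""
    return f"Question: {question}\n{chunk_text}"


# Whitespace splitting stands in for a real tokenizer in this sketch.
words = ("soy wax candle " * 200).split()   # 600 pseudo-tokens
chunks = chunk_tokens(words)
```

Each window starts 220 tokens after the previous one, so the last 60 tokens of one chunk reappear at the start of the next and a sentence that straddles a boundary is still retrievable from at least one chunk.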

The two cheap improvements that mattered most: HyDE-style "question-rewriting" of each chunk (took 90 minutes, raised retrieval precision by 14 points) and the Cohere reranker (₹0.08 per query, raised final answer quality by another 9 points in human eval).

## The 3-Day Build Plan

Day 1 morning — Knowledge base ingest

Founder dumped 220 pages from Notion + Shopify + a CA-supplied PDF policy. We extracted, cleaned, chunked, embedded with text-embedding-3-small (₹3 for the whole base), wrote into Postgres + pgvector. Used the HyDE rewrite trick on every chunk before the second embedding.

Day 1 evening — Intent classifier + first generated reply

Wrote the Haiku 4 intent classifier with a 60-example few-shot dataset. Composed first end-to-end answers on 12 test questions. Founder reviewed, approved tone on 9, asked us to "tone down the cheerfulness" on 3. Updated brand voice prompt.

Day 2 morning — Channel adapters

WhatsApp Cloud API webhook, Instagram DM via Meta's Messenger API (founder's IG was already a Meta business account), Freshdesk inbound email parser, plus a JS chat widget for the Shopify storefront. All four route to the same /webhook/inbound endpoint with a normalized payload.
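As an illustration of what "normalized payload" can mean, the sketch below maps a WhatsApp Cloud API webhook body onto a small channel-agnostic record. The `InboundMessage` fields and the `from_whatsapp` name are hypothetical; the nesting (`entry` → `changes` → `value` → `messages`) follows the webhook format Meta documents for text messages, and only text messages are handled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InboundMessage:
    """Channel-agnostic shape handed to /webhook/inbound (field names are ours)."""
    channel: str
    sender_id: str
    text: str
    message_id: str


def from_whatsapp(payload: dict) -> list[InboundMessage]:
    """Extract text messages from a WhatsApp Cloud API webhook body."""
    out = []
    for entry in payload.get("entry", []):
        for change in entry.get("changes", []):
            # Delivery-status callbacks carry "statuses" instead of "messages".
            for msg in change.get("value", {}).get("messages", []):
                if msg.get("type") != "text":
                    continue  # images, audio, etc. need their own handling
                out.append(InboundMessage(
                    channel="whatsapp",
                    sender_id=msg["from"],
                    text=msg["text"]["body"],
                    message_id=msg["id"],
                ))
    return out


sample = {
    "object": "whatsapp_business_account",
    "entry": [{"changes": [{"field": "messages", "value": {
        "messages": [{"from": "919800000000", "id": "wamid.TEST",
                      "type": "text", "text": {"body": "Where is my order?"}}],
    }}]}],
}
normalized = from_whatsapp(sample)
```

The other three adapters would produce the same record type, so everything downstream of the webhook only ever sees `InboundMessage`.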

Day 2 evening — 11 escalation rules + Freshdesk integration

Coded the 11 guard rules. Wired Freshdesk: every conversation creates a ticket; bot replies append a private note "AI-handled, confidence X". Escalations re-assign to the support queue with a tag "needs-human".

Day 3 morning — Eval harness + 240-question regression

Built a Jupyter notebook that runs 240 historical tickets through the bot, scores each on intent-correct, retrieval-relevant, brand-voice-pass, escalation-trigger-correct. Founder ran 40 spot-checks. We tightened 4 prompt rules.
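A rough sketch of the scoring side of such a harness, under our own assumptions: each ticket gets four pass/fail marks matching the criteria above, the notebook aggregates them into pass rates, and a run is compared against a saved baseline. The class name, function names, and the 2-point tolerance are illustrative, not taken from the actual notebook.

```python
from dataclasses import dataclass, fields


@dataclass
class TicketScore:
    """One historical ticket scored on the four criteria from the harness."""
    intent_correct: bool
    retrieval_relevant: bool
    brand_voice_pass: bool
    escalation_correct: bool


def pass_rates(scores: list[TicketScore]) -> dict[str, float]:
    """Fraction of tickets passing each criterion."""
    if not scores:
        return {}
    total = len(scores)
    return {
        f.name: sum(getattr(s, f.name) for s in scores) / total
        for f in fields(TicketScore)
    }


def regressions(current: dict[str, float], baseline: dict[str, float],
                tolerance: float = 0.02) -> list[str]:
    """Criteria whose pass rate fell more than `tolerance` below the baseline."""
    return [
        name for name, base in baseline.items()
        if current.get(name, 0.0) < base - tolerance
    ]


run = [
    TicketScore(True, True, True, True),
    TicketScore(True, False, True, True),
]
rates = pass_rates(run)
```

Keeping the four criteria separate makes it clear whether a prompt change hurt classification, retrieval, tone, or escalation, instead of hiding the cause in one blended score.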

Day 3 afternoon — Soft launch on 10% traffic

Coin flip in the webhook — 10% of inbound got the bot, 90% went to humans as before. Founder watched the Freshdesk dashboard for 4 hours. Zero complaints. Bumped to 50%.

Day 3 evening — Full rollout + monitoring

100% of inbound goes through the bot. Grafana dashboard tracks: avg latency, % escalated, tickets per channel, P95 cost per conversation. PagerDuty alarm on retrieval-precision drop or escalation rate > 50%.

## The Cost Breakdown (Festive Fortnight, 14 Days)

For context: 3 temp support agents for the same period would cost ₹84,000 in salary + ₹6,000 in laptop loaner + onboarding time. Bot served 71% of those 3,400 conversations end-to-end.

## The Pre-Launch Checklist

220-page KB ingested with HyDE-style chunk rewrites
Cohere reranker key tested with 50 sample queries
11 escalation rules unit-tested against 80 hostile messages
Brand-voice prompt approved by founder on 40 spot-checks
Freshdesk ticket creation tested across all 4 inbound channels
WhatsApp Cloud API templates approved by Meta (utility category)
Coin-flip rollout flag tested in production at 10% / 50% / 100%
Grafana dashboard live with latency + escalation-rate widgets
PagerDuty alarm on retrieval precision drop > 12 points overnight
Founder trained on the kill switch (env DISABLE_BOT=1)
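The rollout flag and kill switch from the checklist can be combined in one small gate. This sketch swaps the per-message coin flip for a deterministic hash bucket, so a given conversation stays on the same side of the split across turns; that substitution is ours, as are the function name and the `env` parameter (included only to make the gate testable). The `DISABLE_BOT=1` variable is the one named in the checklist.

```python
import hashlib
import os
from typing import Mapping, Optional


def bot_enabled(conversation_id: str, rollout_pct: int,
                env: Optional[Mapping[str, str]] = None) -> bool:
    """Decide whether the bot handles this conversation.

    The kill switch always wins; otherwise the conversation ID is hashed
    into a bucket from 0 to 99 and compared with the rollout percentage.
    """
    env = os.environ if env is None else env
    if env.get("DISABLE_BOT") == "1":
        return False
    digest = hashlib.sha256(conversation_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < rollout_pct
```

Moving from 10% to 50% to 100% is then a config change to `rollout_pct`, and conversations already on the bot at 10% stay on it at 50% because their bucket does not change.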

## When Not to Build This

Skip if (a) your support volume is under 50 inbound messages a day, because humans cost less than the build amortization. (b) Your products have liability surface: anything ingested, applied to skin, or used by children should default to human in the loop, not bot. (c) Your knowledge base is in 8 different team members' heads and not written down. Fix the documentation problem first; the bot will only amplify the gaps.

## A Detail That Saved the Founder a 1-Star Review

On day 6, a customer messaged "tum log ne mera order kahan bheja, address galat hai" ("where did you people send my order, the address is wrong"). The intent classifier flagged "order_status", not a complaint. The retrieval pulled chunks about shipping zones. The composer drafted a reply explaining how to update the address with Shiprocket. The escalation guard rule on "anger language" caught the phrase "tum log" ("you people") combined with sentiment > 0.7 and routed to a human. The founder personally replied within 4 minutes. The customer became a repeat buyer for Bhai Dooj. Without rule #7, that would have been a public 1-star tweet.

## How We Cross-Linked Into the Stack

This bot extends the same pattern we shipped earlier in 2025 for a [Surat textile WhatsApp + OpenAI support bot](/blog/whatsapp-openai-customer-support-bot-6-hours-stack-gotchas), and complements our [Tally + Razorpay reconcile workflow](/blog/n8n-tally-prime-razorpay-auto-reconcile-daily-settlements) that the same D2C team uses for finance ops. Hrishikesh led architecture, Manvi ran the QA suite. For festive D2C teams comparing buy vs. build, our AI automation team ships these in 3–5 working days. For founders looking at ecosystem fit, our in-house product TalkDrill runs a similar Anthropic + Postgres + pgvector retrieval stack to deliver English-fluency feedback to 5,000+ Indian users: same architecture pattern, different domain.

## FAQ

### Why Claude Haiku 4 over GPT-4o-mini if it costs more?

Brand voice. The founder approved 88% of Haiku 4 replies blind versus 72% for GPT-4o-mini. For a premium D2C brand on Diwali, the roughly 37% cost premium (₹260 vs ₹190 per 1k conversations) is the cheapest insurance against off-brand replies you can buy.

### Can the bot handle Hindi support?

Yes. Haiku 4 is fluent in Hinglish and Devanagari. We tested 90 Hindi tickets; brand-voice scores were 0.3 lower than English (4.1 vs 4.4) but acceptable. For pure Hindi voice (telephone), we use Sarvam + Claude as documented in our [Hindi voice bot post](/blog/hindi-voice-bot-tier-2-insurance-twilio-sarvam-claude-sonnet).

### How do you keep the knowledge base fresh?

A nightly cron pulls Notion + Shopify + Freshdesk macros, diffs against the previous snapshot, and re-embeds only changed chunks. Total nightly cost: ₹2.40. Founder gets a Slack message each morning summarizing what changed.

### What happens to the human support team?

They moved from "answer 280 tickets/day" to "handle 80 escalated tickets, all of which are real". Reported job satisfaction went up. The seasonal temp hiring (3 people) was replaced with one full-time senior. Net cost saving: ₹2.1 lakh during the festive fortnight alone.

### Can the bot answer questions about specific orders?

Yes. The composer has a tool-calling step that hits the Shopify Admin API for any "where is my order" intent, then summarizes status (packed / shipped / out-for-delivery) with the AWB number. Took 90 minutes to wire on day 2.

### What's the latency on a worst-case query?

P95 was 2.4s end-to-end (intent + retrieval + reranker + compose + Freshdesk write). P50 was 1.1s. The reranker adds ~280ms; we considered dropping it but kept it for the brand-voice quality lift.

### How do you measure "brand voice"?

Two ways. Daily, the founder reviews 20 randomly sampled replies on a 1–5 scale; we keep a 5-week rolling average and alarm on a 0.3 drop. Quarterly, we run a blind A/B against the human team's last 100 replies and ask 3 brand-voice judges to label.
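For the knowledge-base refresh question in the FAQ above, the "diff against the previous snapshot, re-embed only changed chunks" step can be sketched with content hashes. This is a minimal illustration under our own assumptions: chunks carry stable IDs, the snapshot is a mapping from chunk ID to a SHA-256 of its text, and the function names are hypothetical. The embedding call and the database writes are left out.

```python
import hashlib


def chunk_hash(text: str) -> str:
    """Stable fingerprint of a chunk's text (surrounding whitespace ignored)."""
    return hashlib.sha256(text.strip().encode("utf-8")).hexdigest()


def diff_snapshot(previous: dict[str, str],
                  current_chunks: dict[str, str]) -> tuple[list[str], list[str], dict[str, str]]:
    """Compare tonight's chunks with last night's snapshot.

    previous:       chunk_id -> hash from the last run
    current_chunks: chunk_id -> chunk text pulled tonight
    Returns (ids to re-embed, ids to delete, snapshot to store for tomorrow).
    """
    new_snapshot = {cid: chunk_hash(text) for cid, text in current_chunks.items()}
    # New or edited chunks: the hash is missing or different in the old snapshot.
    to_embed = sorted(cid for cid, h in new_snapshot.items() if previous.get(cid) != h)
    # Chunks that disappeared from the sources should leave the vector table too.
    to_delete = sorted(cid for cid in previous if cid not in new_snapshot)
    return to_embed, to_delete, new_snapshot
```

Only the IDs in the first list are sent to the embedding model, which is what keeps the nightly cost to a few rupees; the two lists also double as the content of the morning Slack summary.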

Want this Diwali support bot live this week?

We ship D2C support bots on Claude Haiku 4 + Freshdesk + WhatsApp in 3–5 working days. Fixed price ₹85k–₹1.4 lakh depending on KB size and channels. Includes the 11-rule escalation guard, the eval harness, and 30 days of post-launch tuning. Suitable if you take ≥ 60 inbound support messages a day across WhatsApp + chat + email.

Tags: Chatbot, Claude Haiku, Freshdesk, WhatsApp, D2C, Diwali, RAG

Hrishikesh Baidya

CTO at Softechinfra specializing in Python, system architecture, and building secure, scalable software solutions.
