TalkDrill 2025: 5,000 Users, 9 Lessons We Got Wrong, and 3 Engineering Bets That Worked
A year-end review of TalkDrill, our in-house English speaking app for Indian adults. 5,000 active users, 9 lessons we learned the hard way (retention, voice latency, billing), and 3 stack changes that actually worked.
Vivek Kumar
December 27, 202513 min read
0%
[TalkDrill](https://talkdrill.com) crossed 5,000 active users in late 2025 — our in-house English-speaking app for Indian adults. Two years ago this was a side project to test our voice-AI pipeline; this year it became a real product with real retention curves and real billing problems. I am writing this on the 27th of December because the year is too fresh to gloss over the misses. Nine things we got wrong and three engineering bets that actually worked. If you are building an AI mobile app in India, parts of this should save you 6-9 months.
5,000+
Monthly Active Users (Dec 2025)
9
Lessons We Got Wrong
3
Engineering Bets That Worked
82%
Voice cost reduction (start to end of year)
## Lesson 1 — Day-1 retention is a lie if you measure it wrong
We celebrated 71% day-1 retention in February. By April we realised we were measuring "opened the app on day 1" not "completed a session." Real day-1 session-completion was 38%. The difference matters because the people who only opened never came back; the ones who completed a session retained 4x better.
Fix: changed the retention dashboard to require session completion (5+ minutes of actual conversation) as the unit. Real day-7 retention is now ~22%, real day-30 is ~9%. Smaller numbers, but they correlate with revenue. The vanity metric was hurting us because we were optimising for opens.
## Lesson 2 — Voice latency under 600ms is the floor, not the ceiling
Our first voice pipeline ran at ~1,400ms round-trip (audio in, transcript, LLM response, TTS, audio out). Users tolerated it for one session and never came back. We chased latency for 6 months. At 800ms retention improved 1.4x. At 600ms it improved another 1.6x. Below 600ms — diminishing returns.
The breakdown for our final pipeline:
The LLM streaming is what carried the rest of the gain. Once we were sending the first chunk of LLM output to TTS as it streamed (not waiting for the full response), the perceived latency collapsed.
## Lesson 3 — UPI billing is necessary; Google Play billing is not enough
For the first 9 months we ran on Google Play Billing only. International payments handled, India payments… mostly. Conversion was 3.1% trial-to-paid. In October we added a UPI flow via [Razorpay](https://razorpay.com) running in parallel — same paywall, two payment options. UPI conversion was 6.4% — more than 2x Play Store. Total trial-to-paid jumped to 4.7% (some users still chose Play Store for convenience, especially auto-renewing).
The lesson: India is UPI-first. Google Play Billing is convenient for the developer but introduces a 30% take and worse UX for Indian users. The 30% take we now pay only on the subset that prefers Google Play; the rest go through Razorpay at a much lower take rate.
## Lesson 4 — The free tier was too generous (we lost ~₹6 lakh)
Our initial free tier offered 10 minutes of conversation per day. We thought it was a sampling allowance. It turned out to be enough for casual users to never need a paid plan. Voice costs at our scale ran ~₹0.40 / minute of conversation. 5,000 users x ~6 minutes/day average x ₹0.40 = ~₹3.6 lakh / month in voice costs against minimal paid conversion.
In August we cut free tier to 5 minutes / day (still allows trial; not enough for long-term free use). Voice costs dropped ~40% in 60 days. Paid conversion went up 1.3x. The lesson: the free tier exists to convince, not to sustain.
## Lesson 5 — Pronunciation feedback is the killer feature, not the chat
We thought users came for "talk to AI in English." They came for "tell me what I am saying wrong." The pronunciation-scoring agent (built on a fork of Claude Sonnet 4.5 with a custom phoneme-comparison layer) has the highest engagement minute-per-session. The free chat features have the highest abandonment.
In October we re-organised the home screen to lead with the pronunciation drill instead of the open chat. Day-7 retention went up 18%. The free chat is now a secondary tab.
## Lesson 6 — Indian English vs American English is a real product decision
Our initial TTS used an American voice. ~30% of feedback in the first 6 months mentioned "the AI does not sound Indian." We added an Indian English voice option in May using ElevenLabs' multilingual model. 78% of new users select the Indian voice on first launch. Retention on Indian-voice users runs 1.4x the American-voice users.
The miss: we should have launched with Indian voice as default from day 1. Hard to retrofit user preferences when the product is already known for the wrong default.
## Lesson 7 — In-app purchase pricing in INR matters
We launched at $4.99 / month equivalent (~₹419 at the time). Conversion was poor. In June we re-priced to ₹299 / month direct-INR pricing. Conversion went up 2.1x for the same product. The lesson: the dollar-equivalent number is anchoring on a Western SaaS price point that does not match Indian buying power. ₹299 is a one-Netflix-month price — it lands as "small subscription" in the user's mental model.
## Lesson 8 — Push notifications are a tax, not a tool
We sent daily "practice reminder" push notifications for the first 6 months. Click-through rate dropped from 14% to 3% over that period. Worse, our app uninstalls correlated with notification frequency — heavy notifiers churned 2x faster. We cut push frequency by 80% in September. Uninstall rate dropped 40%.
The lesson: notifications are spent capital. Use them for things that genuinely matter to the user — a streak about to break, a friend's invitation, a billing failure — not for "come back and practice today."
## Lesson 9 — App Store rating mistakes compound
Our app store rating dropped from 4.6 to 4.2 over Q2 2025 because of two specific issues: (1) a billing bug that double-charged ~80 users, and (2) a feature regression in the pronunciation scorer that misjudged Indian-accented English as "incorrect." Both got fixed within 2 weeks. The rating took 4 months to recover.
The lesson: rating is a lagging indicator. Reviews from angry users persist; happy users do not write reviews. Aggressively respond to every 1-3 star review with a fix and ask the user to update — about 30% will.
## The 3 engineering bets that worked
EDGE
Edge STT routing
Route Indian users to Mumbai-region STT, US users to N. Virginia. Saved ~120ms median latency. Trivial to implement, big retention impact.
CACHE
Prompt caching on Claude
Cached the system prompt + user profile context. Cost per session dropped 38%. Anthropic's prompt caching feature paid for itself the day we shipped it.
MIX
Hybrid TTS routing
ElevenLabs Flash for short responses (<200 chars), our self-hosted XTTS-v2 for longer. Cost per session dropped 51% on top of caching gains.
## The cost curve over 2025
This is the chart that matters most to me as a founder. Voice cost per session at the start of the year vs the end.
The cost reduction came from three sources: prompt caching (Anthropic feature shipped April), hybrid TTS routing (we built it in June), and Claude Sonnet 4.5 Indic-tier pricing (effective October). Each contributed ~30% of the saving.
## What 2026 looks like
Hit 15,000 monthly active users by mid-year
Migrate Hindi/Hinglish conversation paths to Gemini 3 Pro (better Indic, see our routing post)
Ship corporate plans for IT services companies hiring junior developers (English-prep at scale)
Launch a Mumbai self-host inference cluster for the pronunciation scorer (control voice costs further)
Get the rating back to 4.6+ (currently 4.4)
Stop building features. Ship 2 polish releases per month instead.
## When this story does not transfer
Our learnings are heavily India-specific. The free-tier sizing, UPI billing, Indian English voice, ₹299 pricing — none of these would be the right calls for a US-focused product. If you are building voice AI for the US market, the lessons that do transfer: latency under 600ms matters, prompt caching works, hybrid TTS routing is real, and the killer feature is rarely the one you thought.
## Real example — what TalkDrill changed about how we ship client work
The TalkDrill experience leaks into how we ship for clients on our [mobile development practice](/services/mobile-development) and [AI automation work](/services/ai-automation). Three concrete patterns: we now insist on session-completion as the retention metric in every client analytics setup; we recommend UPI-as-default for any India-focused mobile app; and we ship voice features only after the client agrees to a 600ms p95 latency SLO. None of these were true before TalkDrill.
## Common mistakes to avoid if you are building an AI mobile app in India
Symptom: "Our trial-to-paid conversion is 1.5%." Cause: free tier too generous, or pricing in dollars. Fix: cut free tier; price in INR with India-specific anchoring.
Symptom: "Day-7 retention looks great in our dashboard." Cause: measuring opens, not session completion. Fix: change the metric.
Symptom: "Voice feels slow even though our latency is 800ms." Cause: end-to-end latency vs first-byte. Fix: stream LLM output to TTS as it generates; perceived latency drops by 40-60%.
Symptom: "App store rating is dropping." Cause: a billing or feature regression that hit a few users hard. Fix: respond to every 1-3 star review with a fix and an ask to update.
Symptom: "Push notifications are not driving comebacks." Cause: too many, too generic. Fix: cut frequency by 80%; send only for things that matter to the specific user.
## Our take
Year 2 of TalkDrill was the year the product became real — real users, real revenue, real bugs, real cost curves. We made more mistakes than we admitted in our quarterly updates. The 82% cost reduction is the number I am proudest of because it came from boring engineering work compounded over 11 months, not a single magic optimisation. The retention curve is the number I am least proud of — we are still not where I want us to be, and 2026 is the year to fix it.
If you are building an AI mobile app for the Indian market — voice, language, or otherwise — the patterns above are 80% transferable. If you want to skip the 9 lessons we paid to learn, talk to us. We do 60-min build reviews specifically for early-stage Indian app founders.
## FAQ
### What is the most important lesson from year 2 of TalkDrill?
Voice latency below 600ms is the floor, not the ceiling. Get there or your retention will not work, no matter how good the LLM responses are. We chased it for 6 months and the retention payoff was the largest single product improvement of the year.
### Should an Indian app launch with UPI or Google Play Billing?
Both, in parallel. UPI gives you 2x the conversion rate for India users, lower take rates, and better UX. Google Play Billing is convenient for users who prefer subscription auto-renewal through their Google account. Offer both; let the user choose.
### How much does it cost per session for an AI voice app at TalkDrill scale?
Our December 2025 cost was ₹0.76 per session — down 82% from January 2025 ₹4.20. The reduction came from prompt caching (Anthropic), hybrid TTS routing (mix of ElevenLabs Flash and self-hosted XTTS-v2), and Claude Sonnet 4.5 Indic-tier pricing. At your stage, expect to start higher and improve over 9-12 months of focused work.
### Is Indian English voice really that important?
Yes. 78% of our new users select Indian English voice on first launch when given a choice. Retention is 1.4x higher on Indian-voice users vs American-voice. If you launch with American as default, you anchor users on the wrong choice and lose retention.
### What is the right free tier for an Indian voice AI app?
5 minutes / day in our experience. 10 minutes was too generous (let users sustain a free habit); 2 minutes felt restrictive (user did not get to evaluate the product). 5 minutes is enough for a meaningful trial and not enough for long-term free use.
### Should I optimise for App Store rating or for revenue?
Revenue, but rating is a lagging indicator that affects revenue downstream. A rating drop from 4.6 to 4.2 takes ~4 months to recover. Aggressive response to 1-3 star reviews recovers some of that — about 30% of unhappy reviewers will update their rating if you fix the issue and ask politely.
### Will TalkDrill open up B2B / corporate plans in 2026?
Yes — that is on the 2026 roadmap. The target buyer is Indian IT services companies hiring junior developers; English-prep at scale is a real ROI for them. Expect a corporate plan to launch in Q2 2026.
Building an AI mobile app in India? Get a 60-min build review.
We run a 60-min review with founders building voice or language AI apps for the Indian market. You bring a working build (or a detailed spec); we walk through cost, latency, retention, and pricing decisions. Typical cost: ₹35,000 fixed. Half-day deeper engagement also available.