auto-invoice. The workflow downloads the PDF attachment, ships the base64-encoded pages to Claude Sonnet 4.5 with a strict-schema vision prompt, parses the JSON, normalises the GST splits (CGST/SGST for intra-state, IGST for inter-state based on supplier GSTIN's first 2 digits), then emits a Tally XML voucher file uploaded to Google Drive and pinged to the accounts team for one-click import.
## Why this matters now — April 2026
Two changes in 2025-26 made this finally clean. Claude Sonnet 4.5 (released Sep 2025) handles multi-page PDF vision in a single API call — older models needed page-by-page processing with stitching logic. And TallyPrime 6.1 (Feb 2026) shipped an updated XML import schema with cleaner GST tagging — the same XML that used to fail with "unknown GST head" now imports clean.
A [Reddit thread on r/Indian_Accounting from March 2026](https://www.reddit.com/r/IndianAccounting/) — "anyone automating Tally entries from PDF?" — has 80+ comments on this exact pain. Most CAs are still doing it manually because the available tools (Tally Connector, ClearTax SmartImport) cost ₹25k–₹50k per seat. The n8n approach is one-time setup, ₹0/month seat cost, and runs on infra you already pay for.
## The workflow at a glance
- n8n v1.121+ with Gmail OAuth2 credential configured
- A Gmail account where vendor invoices land. Create a Gmail label called auto-invoice and a filter that auto-applies it to mails from your known vendors
- Anthropic API key with Claude Sonnet 4.5 access (Haiku is not accurate enough for line-item extraction)
- Google Drive access from the same Google credential, with a folder for generated Tally XMLs
- TallyPrime 6.1+ on a Windows machine, ODBC or import-folder configured
- A clean list of your existing Tally ledger names (used in the prompt to map vendor names to ledgers)
## Step 1 — Gmail trigger
Configure the Gmail Trigger node to watch the auto-invoice label and to include attachments.
```json
{
  "parameters": {
    "pollTimes": { "item": [ { "mode": "everyMinute", "minutesInterval": 5 } ] },
    "filters": {
      "labelIds": ["Label_auto-invoice-id-from-your-account"],
      "q": "has:attachment filename:pdf is:unread"
    },
    "downloadAttachments": true,
    "options": {
      "attachmentsPrefix": "attachment_",
      "format": "full"
    }
  },
  "id": "gmail-trigger-001",
  "name": "Gmail Invoice Label",
  "type": "n8n-nodes-base.gmailTrigger",
  "typeVersion": 1.2,
  "position": [240, 300]
}
```
Each attachment arrives as a binary property named with the configured prefix (attachment_0, attachment_1, etc.). A Filter node downstream keeps only items with PDF binary, dropping spurious images/signatures.
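The PDF-only filter can also be expressed in a Code node. The following is a minimal sketch, not the workflow's actual Filter configuration: `pickPdfAttachments` is a hypothetical helper, and the `binary` argument is assumed to mirror the shape of an n8n item's binary object (`mimeType`, `fileName` per property).

```javascript
// Hypothetical helper: return the keys of binary properties that look like PDFs.
// `binary` is assumed to look like { attachment_0: { mimeType, fileName, data }, ... }.
function pickPdfAttachments(binary) {
  return Object.entries(binary || {})
    .filter(([key]) => key.startsWith('attachment_'))
    .filter(([, file]) => {
      const mime = (file.mimeType || '').toLowerCase();
      const name = (file.fileName || '').toLowerCase();
      // Accept either a PDF MIME type or a .pdf extension (some mailers mislabel the type)
      return mime === 'application/pdf' || name.endsWith('.pdf');
    })
    .map(([key]) => key);
}

// Made-up sample: a signature image followed by the actual invoice
const sample = {
  attachment_0: { mimeType: 'image/png', fileName: 'signature.png' },
  attachment_1: { mimeType: 'application/pdf', fileName: 'INV-2026-0418.pdf' },
};
console.log(pickPdfAttachments(sample)); // [ 'attachment_1' ]
```

Note that in this sample the PDF is not attachment_0, so any later node that reads a fixed key would need the PDF re-keyed or the key passed along.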
After processing, the workflow marks the email read and adds a processed-invoice label (Gmail node, modifyLabels operation) so the trigger never reprocesses the same mail.
## Step 2 — Claude Sonnet 4.5 vision extraction
This is the core node. We send the PDF bytes (base64) along with a strict-schema prompt. The 2026-vintage Anthropic API accepts PDF directly as a document content block — no page-splitting needed.
The HTTP Request node body:
```json
{
  "model": "claude-sonnet-4-5-20250929",
  "max_tokens": 3000,
  "system": "You extract invoice line items from PDFs. Return strict JSON only: no prose, no markdown. If a field is missing, return null. If the document is not an invoice, return {\"error\": \"not_an_invoice\"}.",
  "messages": [
    {
      "role": "user",
      "content": [
        { "type": "document", "source": { "type": "base64", "media_type": "application/pdf", "data": "{{ $binary.attachment_0.data }}" } },
        { "type": "text", "text": "Extract this invoice. Schema:\n{\n vendor_name: string,\n suggested_ledger: string (ledger name from the mapping below),\n vendor_gstin: string (15 char Indian GSTIN, or null),\n invoice_number: string,\n invoice_date: string (DD-MM-YYYY),\n place_of_supply: string (Indian state name),\n line_items: [{ description: string, hsn_sac: string|null, quantity: number, unit: string, rate: number, taxable_amount: number, gst_rate: number (0,5,12,18,28) }],\n tax_summary: { cgst: number, sgst: number, igst: number, cess: number },\n total_amount: number,\n amount_in_words: string,\n payment_terms: string|null,\n notes: string|null\n}\nKnown vendor → ledger mapping for our books:\n- Hetzner Online GmbH → 'Hetzner - Cloud Hosting'\n- Amazon Web Services India → 'AWS India'\n- Reliance Communications → 'Reliance - Internet'\n- Zoho Corporation → 'Zoho Mail Subscription'\nIf vendor not in this list, suggest a clean ledger name matching the pattern 'VendorName - Service'." }
      ]
    }
  ]
}
```
## Step 3 — GST normalisation
A Code node takes the parsed invoice JSON and normalises the GST split using the first two digits of the supplier's GSTIN:
```javascript
const OUR_STATE_CODE = '27'; // Maharashtra

// GST state codes: the first two digits of a GSTIN
const STATE_BY_CODE = {
  '01':'Jammu & Kashmir','02':'Himachal Pradesh','03':'Punjab','04':'Chandigarh',
  '05':'Uttarakhand','06':'Haryana','07':'Delhi','08':'Rajasthan','09':'Uttar Pradesh',
  '10':'Bihar','11':'Sikkim','12':'Arunachal Pradesh','13':'Nagaland','14':'Manipur',
  '15':'Mizoram','16':'Tripura','17':'Meghalaya','18':'Assam','19':'West Bengal',
  '20':'Jharkhand','21':'Odisha','22':'Chhattisgarh','23':'Madhya Pradesh','24':'Gujarat',
  '25':'Daman & Diu','26':'Dadra & Nagar Haveli','27':'Maharashtra','28':'Andhra Pradesh (old)',
  '29':'Karnataka','30':'Goa','31':'Lakshadweep','32':'Kerala','33':'Tamil Nadu',
  '34':'Puducherry','35':'Andaman & Nicobar','36':'Telangana','37':'Andhra Pradesh',
  '38':'Ladakh','97':'Other Territory','99':'Centre Jurisdiction'
};

const inv = $json;
const num = (v) => Number(v) || 0; // the prompt returns null for missing fields; treat as 0

const vendorStateCode = (inv.vendor_gstin || '').slice(0, 2);
const isIntraState = vendorStateCode === OUR_STATE_CODE;

// Normalise tax_summary based on supplier state
const ts = inv.tax_summary || {};
const raw = { cgst: num(ts.cgst), sgst: num(ts.sgst), igst: num(ts.igst), cess: num(ts.cess) };
let cgst = 0, sgst = 0, igst = 0;
if (isIntraState) {
  // Should be CGST + SGST. If Claude put it as IGST by mistake, redistribute.
  if (raw.igst > 0 && raw.cgst === 0) {
    cgst = raw.igst / 2;
    sgst = raw.igst / 2;
  } else {
    cgst = raw.cgst;
    sgst = raw.sgst;
  }
} else {
  // Inter-state: should be IGST. Consolidate if Claude split it.
  igst = raw.igst > 0 ? raw.igst : raw.cgst + raw.sgst;
}

// Sanity check: taxable amounts plus taxes should reproduce the invoice total
const taxableSum = (inv.line_items || []).reduce((s, li) => s + num(li.taxable_amount), 0);
const calculatedTotal = taxableSum + cgst + sgst + igst + raw.cess;
const totalDiff = Math.abs(calculatedTotal - num(inv.total_amount));
const requiresReview = totalDiff > 1; // ₹1 rounding tolerance

return {
  json: {
    ...inv,
    normalized: { cgst, sgst, igst, cess: raw.cess, total: calculatedTotal, requires_review: requiresReview },
    supplier_state: STATE_BY_CODE[vendorStateCode] || 'Unknown',
    transaction_type: isIntraState ? 'intra-state' : 'inter-state'
  }
};
```
The requires_review flag is the gate before we generate XML. If true, the workflow routes to Slack instead of Drive, and the finance team eyeballs the invoice. About 5% of invoices flag, usually because of round-off issues on the vendor side, not our extractor.
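The state lookup trusts the first two characters of vendor_gstin, so a misread GSTIN silently changes the CGST/SGST versus IGST decision. A shape check before the slice is one way to guard against that. This is a sketch under assumptions: `gstinStateCode` is a hypothetical helper, the pattern covers the regular-taxpayer GSTIN layout only, the check digit is not verified, and the sample GSTIN is made up.

```javascript
// Regular-taxpayer GSTIN layout: 2-digit state code, 10-character PAN
// (5 letters, 4 digits, 1 letter), entity code, the letter Z, check character.
// The checksum itself is NOT verified in this sketch.
const GSTIN_SHAPE = /^[0-9]{2}[A-Z]{5}[0-9]{4}[A-Z][1-9A-Z]Z[0-9A-Z]$/;

// Returns the 2-digit state code, or null when the value does not look like a GSTIN.
function gstinStateCode(raw) {
  const gstin = String(raw || '').replace(/\s+/g, '').toUpperCase();
  return GSTIN_SHAPE.test(gstin) ? gstin.slice(0, 2) : null;
}

console.log(gstinStateCode('27ABCDE1234F1Z5')); // 27
console.log(gstinStateCode('27ABCDE1234F'));    // null (truncated)
```

A null result is a reasonable trigger for the same review route as requires_review, since the intra-state or inter-state decision cannot be trusted without a state code.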
## Step 4 — Tally XML emission
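TallyPrime 6.1 imports vouchers via XML in a specific envelope. The minimum schema for a purchase voucher (element names follow Tally's standard import envelope; verify them against a voucher exported from your own company):
```xml
<ENVELOPE>
  <HEADER>
    <TALLYREQUEST>Import Data</TALLYREQUEST>
  </HEADER>
  <BODY>
    <IMPORTDATA>
      <REQUESTDESC>
        <REPORTNAME>All Masters</REPORTNAME>
        <STATICVARIABLES>
          <SVCURRENTCOMPANY>Softechinfra Pvt Ltd</SVCURRENTCOMPANY>
        </STATICVARIABLES>
      </REQUESTDESC>
      <REQUESTDATA>
        <TALLYMESSAGE xmlns:UDF="TallyUDF">
          <VOUCHER VCHTYPE="Purchase" ACTION="Create">
            <DATE>20260410</DATE>
            <NARRATION>Auto-imported from n8n - INV-2026-0418</NARRATION>
            <VOUCHERTYPENAME>Purchase</VOUCHERTYPENAME>
            <VOUCHERNUMBER>INV-2026-0418</VOUCHERNUMBER>
            <PARTYLEDGERNAME>Hetzner - Cloud Hosting</PARTYLEDGERNAME>
            <ALLLEDGERENTRIES.LIST>
              <LEDGERNAME>Hetzner - Cloud Hosting</LEDGERNAME>
              <ISDEEMEDPOSITIVE>No</ISDEEMEDPOSITIVE>
              <AMOUNT>16980.20</AMOUNT>
            </ALLLEDGERENTRIES.LIST>
            <ALLLEDGERENTRIES.LIST>
              <LEDGERNAME>Cloud Hosting Expense</LEDGERNAME>
              <ISDEEMEDPOSITIVE>Yes</ISDEEMEDPOSITIVE>
              <AMOUNT>-14390.00</AMOUNT>
            </ALLLEDGERENTRIES.LIST>
            <ALLLEDGERENTRIES.LIST>
              <LEDGERNAME>Input IGST @ 18%</LEDGERNAME>
              <ISDEEMEDPOSITIVE>Yes</ISDEEMEDPOSITIVE>
              <AMOUNT>-2590.20</AMOUNT>
            </ALLLEDGERENTRIES.LIST>
          </VOUCHER>
        </TALLYMESSAGE>
      </REQUESTDATA>
    </IMPORTDATA>
  </BODY>
</ENVELOPE>
```
- ISDEEMEDPOSITIVE controls debit/credit. The party ledger is credit (No → it is a credit entry), the expense and tax ledgers are debits (Yes). Amounts are positive on credit side, negative on debit side. This is Tally's convention, not standard double-entry vocabulary.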
- ALLLEDGERENTRIES.LIST must balance: sum of all amounts = 0. Our Code node validates this before emitting.
- PARTYLEDGERNAME must exist in your Tally company already. If a vendor is new, the import fails. Hence the vendor-mapping section in the Claude prompt: we want Claude to use ledger names that exist, not invent new ones.
- For intra-state, use two tax entries: Input CGST @ 9% and Input SGST @ 9%. Tally will figure out the GST math if your ledgers are tagged with the right tax classification at master level.
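The balance rule and the ledger-existence rule in the list above can both be checked before any XML is written. A minimal sketch, assuming a hypothetical `preflight` helper, entries in Tally's sign convention (credit positive, debit negative), and a `knownLedgers` array exported from your Tally company:

```javascript
// Hypothetical pre-flight check run before XML generation.
// entries: [{ ledger: string, amount: number }] in Tally's sign convention.
// knownLedgers: ledger names that already exist in the Tally company.
function preflight(entries, knownLedgers) {
  const problems = [];
  // Sum in paise (integers) so floating-point noise cannot fake an imbalance
  const sumPaise = entries.reduce((s, e) => s + Math.round(e.amount * 100), 0);
  if (sumPaise !== 0) problems.push(`unbalanced by ${(sumPaise / 100).toFixed(2)}`);
  const known = new Set(knownLedgers);
  for (const e of entries) {
    if (!known.has(e.ledger)) problems.push(`unknown ledger: ${e.ledger}`);
  }
  return problems; // empty array means the voucher is safe to emit
}

const ledgers = ['Hetzner - Cloud Hosting', 'Cloud Hosting Expense', 'Input IGST @ 18%'];
const voucher = [
  { ledger: 'Hetzner - Cloud Hosting', amount: 16980.20 },
  { ledger: 'Cloud Hosting Expense', amount: -14390.00 },
  { ledger: 'Input IGST @ 18%', amount: -2590.20 },
];
console.log(preflight(voucher, ledgers)); // []

// A party total of 16980.00 leaves a 20-paise gap, which is flagged
voucher[0].amount = 16980.00;
console.log(preflight(voucher, ledgers)); // [ 'unbalanced by -0.20' ]
```

Working in integer paise is a design choice: 14390.00 + 2590.20 adds up to 16980.20 on paper, but comparing raw floats to zero can report a phantom imbalance.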
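A Code node generates this XML by interpolating values (element names as in Tally's standard import envelope; confirm them against your TallyPrime version):
```javascript
const inv = $json;
const n = inv.normalized;

// Escape XML special characters in values that come from the invoice
const esc = (v) => String(v).replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');

const dateTally = inv.invoice_date.split('-').reverse().join(''); // DD-MM-YYYY → YYYYMMDD

// Tax ledger names assume one GST rate per invoice (read from the first line item)
const rate = inv.line_items[0].gst_rate;
const taxLines = [];
if (n.cgst > 0) taxLines.push({ name: `Input CGST @ ${rate / 2}%`, amount: -n.cgst });
if (n.sgst > 0) taxLines.push({ name: `Input SGST @ ${rate / 2}%`, amount: -n.sgst });
if (n.igst > 0) taxLines.push({ name: `Input IGST @ ${rate}%`, amount: -n.igst });
const expenseTotal = inv.line_items.reduce((s, li) => s + li.taxable_amount, 0);

// Builds one ALLLEDGERENTRIES.LIST element; debit entries carry ISDEEMEDPOSITIVE = Yes
const entry = (ledger, isDebit, amount) => `<ALLLEDGERENTRIES.LIST>
<LEDGERNAME>${esc(ledger)}</LEDGERNAME>
<ISDEEMEDPOSITIVE>${isDebit ? 'Yes' : 'No'}</ISDEEMEDPOSITIVE>
<AMOUNT>${amount.toFixed(2)}</AMOUNT>
</ALLLEDGERENTRIES.LIST>`;

const partyLedger = inv.suggested_ledger || inv.vendor_name;
const xml = `<ENVELOPE>
<HEADER><TALLYREQUEST>Import Data</TALLYREQUEST></HEADER>
<BODY><IMPORTDATA>
<REQUESTDESC>
<REPORTNAME>All Masters</REPORTNAME>
<STATICVARIABLES><SVCURRENTCOMPANY>Softechinfra Pvt Ltd</SVCURRENTCOMPANY></STATICVARIABLES>
</REQUESTDESC>
<REQUESTDATA><TALLYMESSAGE xmlns:UDF="TallyUDF">
<VOUCHER VCHTYPE="Purchase" ACTION="Create">
<DATE>${dateTally}</DATE>
<NARRATION>Auto-imported from n8n - ${esc(inv.invoice_number)}</NARRATION>
<VOUCHERTYPENAME>Purchase</VOUCHERTYPENAME>
<VOUCHERNUMBER>${esc(inv.invoice_number)}</VOUCHERNUMBER>
<PARTYLEDGERNAME>${esc(partyLedger)}</PARTYLEDGERNAME>
${entry(partyLedger, false, inv.total_amount)}
${entry(inv.expense_ledger || 'Indirect Expenses', true, -expenseTotal)}
${taxLines.map((tl) => entry(tl.name, true, tl.amount)).join('\n')}
</VOUCHER>
</TALLYMESSAGE></REQUESTDATA>
</IMPORTDATA></BODY>
</ENVELOPE>`;

return {
  json: { ...inv, tally_xml: xml },
  binary: { data: { data: Buffer.from(xml).toString('base64'), mimeType: 'application/xml', fileName: `${inv.invoice_number}.xml` } }
};
```
A Google Drive node uploads the file to the /Tally Imports/2026-04/ folder, and a Slack node drops a link into #accounts-imports: "{{ vendor_name }} invoice {{ invoice_number }} for ₹{{ total_amount }} ready to import. [Open]({{ drive_url }})".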
## Cost comparison
For 80 invoices/month:
## When NOT to do this
Handwritten invoices. Claude Sonnet 4.5 vision is excellent on printed PDFs, but on handwritten kirana-shop invoices accuracy tops out at about 60%. For those, run a dedicated OCR service first (Microsoft Form Recognizer, now Azure AI Document Intelligence, ~₹3/page) and pass its output to Claude.
Invoices with critical regulatory data your team must verify. Some industries (pharma, defence) require human sign-off on every entry. The workflow still saves them time on data entry but you cannot auto-import — change the destination from Drive-folder to a Slack approval flow.
You do not use Tally. If you are on Zoho Books, Marg, or BUSY, the XML schema differs. Zoho Books takes the JSON directly via their API (cleaner integration, skip the XML entirely). The first three blocks of this workflow stay the same; only step 4 changes.
EXECUTIONS_DATA_PRUNE=true and EXECUTIONS_DATA_MAX_AGE=168.
Mistake 5 — Not validating amount-in-words. Claude reads "Rupees Seventy-Five Thousand Only" and the numeric reads ₹7,50,000. Always validate against the numeric. About 1% of vendor invoices have this mismatch — usually a typo on the vendor's side, but catching it saves you arguing about it in March.
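One way to automate this check is to parse the words back into a number and compare. The sketch below is an illustration, not the workflow's own node: `wordsToRupees` and `checkAmountInWords` are hypothetical names, only whole rupees in the Indian numbering system (lakh, crore) are handled, and any text with a paise component is reported as unparsed rather than guessed.

```javascript
// Number words for the Indian system (lakh = 1e5, crore = 1e7)
const SMALL = new Map(Object.entries({
  zero: 0, one: 1, two: 2, three: 3, four: 4, five: 5, six: 6, seven: 7, eight: 8,
  nine: 9, ten: 10, eleven: 11, twelve: 12, thirteen: 13, fourteen: 14, fifteen: 15,
  sixteen: 16, seventeen: 17, eighteen: 18, nineteen: 19, twenty: 20, thirty: 30,
  forty: 40, fifty: 50, sixty: 60, seventy: 70, eighty: 80, ninety: 90,
}));
const SCALES = new Map(Object.entries({
  thousand: 1e3, lakh: 1e5, lakhs: 1e5, lac: 1e5, lacs: 1e5, crore: 1e7, crores: 1e7,
}));

// Returns whole rupees, or null when the text cannot be read confidently
// (no number words at all, or a paise component, which this sketch skips).
function wordsToRupees(text) {
  const tokens = String(text || '').toLowerCase().split(/[^a-z]+/).filter(Boolean);
  if (tokens.includes('paise') || tokens.includes('paisa')) return null;
  let total = 0; // sum of completed scale groups (thousand, lakh, crore)
  let group = 0; // value of the group currently being read
  let seen = false;
  for (const t of tokens) {
    if (SMALL.has(t)) { group += SMALL.get(t); seen = true; }
    else if (t === 'hundred') { group = (group || 1) * 100; seen = true; }
    else if (SCALES.has(t)) { total += (group || 1) * SCALES.get(t); group = 0; seen = true; }
    // anything else ("rupees", "and", "only") is ignored
  }
  return seen ? total + group : null;
}

// 'match' | 'mismatch' | 'unparsed', with a ₹1 tolerance for rounded-off words
function checkAmountInWords(inv) {
  const fromWords = wordsToRupees(inv.amount_in_words);
  if (fromWords === null) return 'unparsed';
  return Math.abs(fromWords - inv.total_amount) < 1 ? 'match' : 'mismatch';
}

console.log(wordsToRupees('Rupees Seventy-Five Thousand Only')); // 75000
console.log(checkAmountInWords({
  amount_in_words: 'Rupees Seventy-Five Thousand Only',
  total_amount: 750000,
})); // mismatch
```

Both 'mismatch' and 'unparsed' are sensible reasons to route the invoice to review instead of Drive.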
## FAQ
### How accurate is Claude Sonnet 4.5 on Indian GST invoices?
Across 240 invoices in 4 months we have 0 line-item errors that made it to Tally. Pre-review, Sonnet flags about 5% as needing human eyes — mostly because of split tax codes, photocopied scans, or vendors who innovate on layout. The 95% direct-pass rate is the bar we measure against.
### What if the vendor sends an image instead of a PDF?
The workflow has a branch: if the attachment is JPG/PNG, it runs through the same Claude vision call but as an image content block. Accuracy on a clean photo is similar to PDF; a skewed phone photo drops to ~80%.
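The branch comes down to choosing the content block type from the attachment's MIME type. A minimal sketch, assuming a hypothetical `contentBlockFor` helper placed in a Code node ahead of the HTTP Request node; the block shapes follow the Anthropic Messages API's base64 document and image sources:

```javascript
// Image MIME types accepted for image content blocks
const IMAGE_TYPES = new Set(['image/jpeg', 'image/png', 'image/gif', 'image/webp']);

// Hypothetical helper: build the first content block for the Claude request.
// Returns null for attachment types the workflow should not send.
function contentBlockFor(mimeType, base64Data) {
  let mime = String(mimeType || '').toLowerCase();
  if (mime === 'image/jpg') mime = 'image/jpeg'; // common mislabel; not a registered type
  if (mime === 'application/pdf') {
    return { type: 'document', source: { type: 'base64', media_type: mime, data: base64Data } };
  }
  if (IMAGE_TYPES.has(mime)) {
    return { type: 'image', source: { type: 'base64', media_type: mime, data: base64Data } };
  }
  return null;
}

console.log(contentBlockFor('image/png', 'AAAA').type);       // image
console.log(contentBlockFor('application/pdf', 'AAAA').type); // document
console.log(contentBlockFor('text/plain', 'AAAA'));           // null
```

The text block with the schema prompt stays the same in both branches; only this first block changes.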
### Can it handle multi-currency invoices?
Yes. Add currency and exchange_rate to the schema. Claude reads the currency symbol; n8n looks up the daily INR rate from xe.com or an FX API. Tally records in INR with a foreign-currency note in the narration.
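The conversion step itself is small. The following sketch assumes the extracted JSON carries `currency` and `exchange_rate` (INR per one unit of the foreign currency, filled in by whichever FX lookup you use); `toInr` and the narration format are hypothetical, and the rate in the example is made up.

```javascript
// Hypothetical helper: convert the invoice total to INR and build a narration note.
function toInr(inv) {
  const currency = (inv.currency || 'INR').toUpperCase();
  if (currency === 'INR') return { amount_inr: inv.total_amount, narration_note: '' };
  if (!(inv.exchange_rate > 0)) throw new Error(`missing exchange rate for ${currency}`);
  // Round to paise so the voucher amounts stay at two decimals
  const amountInr = Math.round(inv.total_amount * inv.exchange_rate * 100) / 100;
  return {
    amount_inr: amountInr,
    narration_note: `${currency} ${inv.total_amount.toFixed(2)} @ ${inv.exchange_rate}`,
  };
}

console.log(toInr({ total_amount: 200, currency: 'EUR', exchange_rate: 90.5 }));
// { amount_inr: 18100, narration_note: 'EUR 200.00 @ 90.5' }
```

Throwing on a missing rate is deliberate: a foreign-currency invoice booked at face value in INR is harder to spot later than a failed execution.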
### How do I handle reverse-charge GST?
Mark the vendor ledger with Reverse Charge classification in Tally master. The XML import respects it; you do not need to change the workflow. But the Claude prompt should detect the "Reverse Charge Applicable" note common on unregistered vendor invoices, and flag it in the narration.
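Prompt-side detection can be backed up with a plain text check on the extracted notes field. This is a rough sketch: `reverseChargeStatus` is a hypothetical helper, and the phrasings it recognises are assumptions about how invoices commonly word the declaration, so anything it cannot classify is returned as unclear rather than guessed.

```javascript
// Hypothetical helper: classify a reverse-charge declaration found in free text.
// Returns 'yes', 'no', 'unclear' (phrase present but not classifiable) or 'absent'.
function reverseChargeStatus(text) {
  const s = String(text || '');
  if (!/reverse\s+charge/i.test(s)) return 'absent';
  const m = /reverse\s+charge(?:\s+basis)?\s*[:\-]?\s*(not\s+applicable|applicable|yes|no)\b/i.exec(s);
  if (!m) return 'unclear';
  return /^(applicable|yes)$/i.test(m[1]) ? 'yes' : 'no';
}

console.log(reverseChargeStatus('Reverse Charge Applicable'));                       // yes
console.log(reverseChargeStatus('Whether tax is payable under reverse charge: No')); // no
console.log(reverseChargeStatus('Reverse charge: N/A'));                             // unclear
```

A 'yes' can be appended to the narration, while 'unclear' is a reasonable case to send to the review channel.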
### What is the recovery if the Tally import fails?
The XML upload to Drive does not auto-import — your accountant clicks Import in Tally. If Tally rejects (usually a ledger-name mismatch), the file stays in Drive untouched. The Slack message includes the error pattern to look for. We have had 3 failures in 240 invoices, all ledger-name typos in the Claude output.
### Can I trigger from a folder watch instead of Gmail?
Yes. Replace the Gmail trigger with a Google Drive Trigger (watching the "Inbox" folder). The rest of the workflow is identical. We use this for clients whose vendors send invoices to a shared Drive folder instead of email.
### Does Anthropic store our invoices?
Per Anthropic's data-use policy as of April 2026, API calls are not used to train models and are retained for 30 days for abuse monitoring only (Tier 2 paid accounts can request zero retention). If your CFO needs zero retention, request it via your Anthropic account manager.
Want this PDF-to-accounting pipeline for your finance team?
We ship the full 12-node workflow above, wired to your Gmail, your existing Tally company ledgers, and your accounts team's Slack. Includes the GSTIN edge-case mapping for your top 50 vendors. Typical cost: ₹48,000–₹85,000 depending on invoice volume and how messy your vendor PDFs are. Suitable if you process 50+ invoices a month and your senior accountant is doing data entry instead of reconciliation. No slides — send us 5 of your real invoices and we will demo on them.
