Accounting
Turned month-end close into an AI-native service for 600 SMB clients
An outsourced accounting firm rebuilt bookkeeping and month-end close as an agent-run service. Auto-coding 71% → 93%, clients per reviewer 15 → 48, 140 new clients added with zero new bookkeepers.
The brief: An outsourced accounting firm ran month-end close for roughly 600 SMB clients on a team of forty offshore bookkeepers. Margins were thin, the team turned over twice a year, and growth meant hiring — which meant onboarding, which meant errors. The founders did not want a bookkeeping copilot. They wanted the close to mostly do itself, with their people moved up to review and advisory.
What we found in the audit
- The “service” was already a process: bank-feed categorisation, supplier-invoice matching, intercompany reconciliations, accruals, and a working-paper pack a senior signed before filing.
- About 70% of transactions were repetitive and rules-driven — the same vendors, the same coding, month after month. The remaining 30% was where the judgement (and the errors) lived.
- Quality was already defined: a senior reviewer re-checked junior work against a checklist. That checklist was, in effect, an eval rubric nobody had written down.
What we built
- Wrote the eval set first. 12,000 historical transactions across forty clients, re-graded by two senior reviewers, covering categorisation, VAT treatment, and “should this have been flagged to a human.”
- A categorisation and reconciliation agent. Routed pipeline — a cheap model for the repetitive 70%, a larger model gated by confidence for the ambiguous tail, deterministic rules for anything touching VAT or fixed assets. Direct integration into Xero and QuickBooks via their APIs, client-specific chart-of-accounts mappings included.
- A working-paper generator. The agent assembles the month-end pack — reconciliations, supporting schedules, and a plain-English note on every judgement it made and every item it escalated.
- A review console. Reviewers see only the exceptions and the agent’s reasoning, with one-click accept or correct. Every correction feeds the next eval run.
- Shadow then switch. Every client ran in shadow for six weeks, graded against the human close, before clients were moved over one cohort at a time.
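The routing described in the list above can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the firm's production code: the names `Transaction`, `Coding`, and `route_transaction`, the tag names, and the 0.90 confidence bar are all hypothetical. It also assumes the cheap model's own confidence score decides whether a transaction counts as repetitive, which the engagement notes do not spell out.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical tag names and threshold, chosen for illustration only.
DETERMINISTIC_TAGS = {"vat", "fixed_asset"}
CONFIDENCE_BAR = 0.90


@dataclass
class Transaction:
    vendor: str
    amount: float
    tags: frozenset = frozenset()


@dataclass
class Coding:
    account: Optional[str]  # None means no automatic coding was applied
    route: str              # "rules", "cheap_model", "large_model", or "human"
    confidence: float


# A model returns (account, confidence); rules return an account or None.
Model = Callable[[Transaction], tuple[str, float]]
Rules = Callable[[Transaction], Optional[str]]


def route_transaction(
    txn: Transaction,
    rules: Rules,
    cheap_model: Model,
    large_model: Model,
    bar: float = CONFIDENCE_BAR,
) -> Coding:
    # 1. Anything touching VAT or fixed assets never reaches a model:
    #    it is coded by deterministic rules or escalated.
    if txn.tags & DETERMINISTIC_TAGS:
        account = rules(txn)
        if account is None:
            return Coding(None, "human", 0.0)
        return Coding(account, "rules", 1.0)

    # 2. Repetitive transactions: accept the cheap model when it is confident.
    account, confidence = cheap_model(txn)
    if confidence >= bar:
        return Coding(account, "cheap_model", confidence)

    # 3. Ambiguous tail: try the larger model, still gated by confidence.
    account, confidence = large_model(txn)
    if confidence >= bar:
        return Coding(account, "large_model", confidence)

    # 4. Below the bar everywhere: send to the review console.
    return Coding(None, "human", confidence)
```

The order matters in this sketch: the deterministic check runs first so that no model output can override a VAT or fixed-asset rule, and the human route is the default rather than an exception path.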
What changed
- Transactions auto-coded within the firm’s accuracy bar: 71% → 93%, with the rest routed to a human.
- Average close time per client down from 5.5 hours of bookkeeper time to 1.2 hours of reviewer time.
- Clients per reviewer rose from 15 to 48 without a drop in the firm’s internal QA score.
- Restated filings in the two quarters after switchover: down 38% versus the same period the prior year.
- The firm took on 140 new clients in the following two quarters without adding bookkeepers. The unit of growth stopped being headcount.
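For readers who want to see how the headline figures relate, the arithmetic can be worked through as below. The inputs come from the results above. The monthly totals are a rough estimate that assumes the per-client averages apply uniformly across roughly 600 clients, and the helper names are ours for illustration.

```python
def pct_reduction(before: float, after: float) -> float:
    """Percentage drop from `before` to `after`."""
    return (before - after) / before * 100


def multiple(before: float, after: float) -> float:
    """How many times larger `after` is than `before`."""
    return after / before


def monthly_hours(clients: int, hours_per_client: float) -> float:
    """Total close hours per month if every client takes the average time."""
    return clients * hours_per_client


# Close time per client: 5.5 bookkeeper hours down to 1.2 reviewer hours.
print(f"{pct_reduction(5.5, 1.2):.1f}% less time per close")    # about 78.2%

# Clients per reviewer: 15 up to 48.
print(f"{multiple(15, 48):.1f}x clients per reviewer")          # 3.2x

# Rough monthly workload across about 600 clients (assumed uniform).
print(f"{monthly_hours(600, 5.5):.0f} hours before")            # 3300
print(f"{monthly_hours(600, 1.2):.0f} hours after")             # 720
```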
What we left behind
An eval suite the senior reviewers now own and extend, the agent and its integrations running on the firm’s own accounts, and a back office that scales with volume instead of headcount. The forty bookkeepers were not laid off — twenty-six retrained as reviewers and client advisers on higher-margin work; the rest were redeployed as the client base grew into them. The service is the agent. The people moved up.