Case study: AI across the operations stack

The brief: A European omnichannel retailer with about £180m in revenue had a familiar problem: a tidy data warehouse, an unloved CRM, three marketing tools that did not talk to each other, an ERP that nobody outside finance dared touch, and a board that wanted “AI across the business” by the next financial year. They had tried two packaged tools the year before. Neither stuck. They asked us to scope what one team could plausibly deliver in twelve months.

What the readiness audit found

We spent two weeks on diligence across the data, marketing, finance-close, ecommerce, and operations teams. We scored the shortlist of opportunities against revenue impact, time-to-value, and how much change each underlying system could safely absorb. Eight candidate agents made the ranked list, and we agreed on the top five for year one.

What we built

CRM lead-triage agent. Read every inbound enquiry, classified intent against the company’s real product taxonomy, routed to the right team, drafted the first-touch reply, and updated the deal stage with reasoning the salesperson could review. Wrote to twenty-two custom fields the previous packaged tool had ignored.
Marketing-calendar agent. Pulled performance data from the analytics warehouse, the ad-platform APIs, and the campaign-management tool. Drafted weekly performance summaries with rank-ordered recommendations on which campaigns to scale, pause, or rework. Pushed approved decisions back into the campaign tool with an audit log.
Ad-creative production agent. Took an approved brief, generated variant copy and image-prompt drafts against brand guidelines, queued the assets for human review, and pushed approved variants straight into the ad platform with proper UTM tagging and budget caps.
Finance-close exception agent. Read the daily reconciliation output from the ERP, flagged exceptions, drafted the explanation the finance team would normally write themselves, and linked supporting evidence from the document warehouse. Did not write to the ERP — every action remained a human-confirmed entry.
ERP order-fulfilment agent. Watched the order pipeline, predicted likely stock-out and SLA-miss events twelve to forty-eight hours ahead, and surfaced them to the operations desk with the recommended action and the supporting context already assembled.

Each agent shared a common spine: the same eval framework, the same observability stack, the same retraining cadence, and the same routing layer between fast and slow models. Building five agents on one pattern is roughly twice the work of building one; building five agents on five different patterns is closer to five times the work. That discipline paid off in every later release.

How we worked with their team

A four-person Cravings pod (applied-AI engineer, ML engineer, AI product manager, senior software engineer) joined the same daily standup as the in-house data and engineering teams for nine months. We brought the agent patterns. They brought the institutional knowledge: which CRM fields were load-bearing, which finance accruals were politically sensitive, which marketing channels the CMO actually cared about.

By month six, two of the in-house engineers had taken over primary ownership of two of the agents. By month nine, four of the five. The Cravings pod rolled off at month ten on a thin retainer to support quarterly evaluation reviews.

What changed

Lead-to-first-touch time: 4 hours → 11 minutes on triaged enquiries.
Marketing campaign decisions per week: doubled; the marketing team spent the recovered time on briefs rather than reports.
Ad-creative iteration cycle: 9 days → 36 hours from brief to live variant.
Finance close: 7 working days → 4. Exception-handling time per close dropped 58%.
Stock-out incidents flagged before they happened: 81% of the events that had previously surfaced only after the SLA was missed.
Year-one total programme cost including Cravings fees, model spend, and tooling: £640k — comfortably below what the company had previously spent on two packaged tools that did not deliver, and inside the board’s year-one envelope.

Why this kind of work is hard for most teams

The five agents above touched five kinds of system: a Salesforce-style CRM, an enterprise marketing suite, two ad platforms, an ERP, and the data warehouse. Each system has its own access model, its own audit requirements, and its own "do not touch" fields that nobody documented but everyone knows about. The hardest engineering on this programme was not in the model layer. It was in the integration layer, the part packaged vendors keep generic on purpose.

This is where Cravings tends to be useful. Most of our team have spent years inside ERPs, CRMs, ad stacks, and finance systems before they ever wrote a prompt. That history is the thing that makes a custom agent ship — not the model choice, not the framework, not the prompt cleverness. The boring system knowledge, in the right hands, is what AI enablement actually looks like.
