The brief

A mid-market European insurance company had spent eleven months and just over £400,000 on its first AI hire. A talented Head of AI, recruited from a respected lab, with three concept demos to their name and not one of them in production. The board wanted to know what had gone wrong. The Head of AI wanted to know whether they could be unblocked. We were asked in for a fortnight to give an honest read.

What we found

The Head of AI was not the problem. They were excellent. They were also alone — one senior person doing the work of a team of five. They had been holding the model decisions, the prompt engineering, the data pipeline build-out, the security review responses, the vendor procurement, the product framing, and the change-management conversations with claims operations. Anyone in that role would have stalled. Most would have left already.

  • Three prototypes (claims triage, document extraction, customer-email drafting), each roughly 70% complete, none over the line.
  • No eval suite for any of them. Quality was reviewed informally, from screenshots.
  • No staging environment. The demos lived on the AI lead’s laptop and a single sandbox tenancy.
  • No documentation. If the AI lead had resigned that month, the company would have inherited three half-built black boxes.

What we did

A two-week readiness audit, then nine weeks of focused build alongside the existing Head of AI. We did not replace them. We surrounded them.

  • Picked one prototype, claims triage, and shipped it with an eval suite, observability, and an on-call runbook. It ran in shadow mode for ten days, then took over auto-routing for the queues where its accuracy was highest.
  • Wrote the architecture documentation that had not previously existed.
  • Set up the staging environment, the deployment pipeline, the secrets management, and the feature flagging that should have existed from week one.
  • Brought a second applied-AI engineer and an AI product manager onto the project as embedded Cravings staff so the Head of AI was no longer the only opinion in the room.
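
The step from shadow mode to auto-routing can be sketched roughly as follows. This is a minimal illustration, not the code used on the engagement: the record format, the queue names, and the `min_agreement` and `min_samples` thresholds are all assumptions made for the example. The idea is that during shadow mode the model's decision is logged next to the human handler's decision without being acted on, and a queue is only switched to auto-routing once the two agree often enough on enough claims.

```python
from collections import Counter


def agreement_by_queue(records):
    """Return {queue: (agreement_rate, sample_count)} from shadow-mode records.

    Each record is a hypothetical (queue, model_decision, human_decision) tuple.
    """
    totals, matches = Counter(), Counter()
    for queue, model_decision, human_decision in records:
        totals[queue] += 1
        # A bool adds as 0 or 1, so this counts agreements only.
        matches[queue] += model_decision == human_decision
    return {q: (matches[q] / totals[q], totals[q]) for q in totals}


def queues_to_auto_route(records, min_agreement=0.95, min_samples=200):
    """Pick queues whose shadow-mode agreement clears both assumed thresholds."""
    stats = agreement_by_queue(records)
    return sorted(
        queue
        for queue, (rate, count) in stats.items()
        if rate >= min_agreement and count >= min_samples
    )


# Illustrative data only: "motor" agrees 4 of 4 times, "property" 2 of 4.
example_records = [
    ("motor", "fast_track", "fast_track"),
    ("motor", "fast_track", "fast_track"),
    ("motor", "specialist", "specialist"),
    ("motor", "specialist", "specialist"),
    ("property", "fast_track", "specialist"),
    ("property", "fast_track", "fast_track"),
    ("property", "specialist", "fast_track"),
    ("property", "specialist", "specialist"),
]
selected = queues_to_auto_route(example_records, min_agreement=0.9, min_samples=3)
```

With the illustrative data above, only the queue where the model and the human handlers always agreed is selected; a real rollout would set the thresholds from the eval suite and the risk appetite of claims operations.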

What changed

  • First production AI system live at week eleven: under three months after we started, and about fourteen months after the Head of AI started.
  • −34% median claim-triage time across the queues the agent handled.
  • One promotion, one expansion — the Head of AI was promoted to VP and given budget to hire two engineers into a now-attractive role (working system, written runbooks, real backlog).
  • £180k total Cravings fee — versus the estimated £500k of runway already spent on the in-house team trying to get to the same place alone.

What we left behind

A working production agent, a documented eval suite, a runbook the on-call team has actually used, the second and third prototypes promoted to the in-house backlog with sized estimates, and a Head of AI who could finally hire. The Cravings pod rolled off at week fifteen and has not been re-engaged since. The next two systems were shipped in-house, which is exactly the outcome we wanted.