May 13, 2026

Where AI agents are actually going next

From assistants to operators, single models to routed fleets, point automation to system-of-record participation — the trajectory we see from inside the build.

The interesting question about AI agents this year is not whether they will get smarter — they will. It is what happens to your business when an agent stops being a chat window and becomes a participant in the operations team.

We have spent the last two years building agents into the unglamorous corners of real businesses: the finance close, the marketing calendar, the ad-creative pipeline, the CRM queue, the ERP exception desk. Here is the trajectory we see, from the inside of the work.

From assistant to operator

The first generation of agents drafted things for humans. Email replies, support responses, marketing briefs, code suggestions. The human stayed in the loop on every action. That generation is already commoditised — the tooling exists, the patterns are well-known, the ROI numbers are documented.

The next generation does not draft for a human. It acts, and asks for help only when its own confidence falls below a threshold. That is a different category of system — not a UI assistant but an autonomous operator with a budget, a remit, an audit trail, and a manager who happens to be the human it occasionally escalates to. The hard problems shift accordingly. The agent’s prompt becomes the smallest part of the work. The largest parts are the evals, the policies, the retraining cadence, and the failure-mode register.
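The "act, and escalate below a confidence threshold" pattern can be sketched in a few lines. This is a minimal illustration, not a description of any particular production system: the names `Decision`, `Operator`, `confidence_threshold`, and `budget_remaining` are hypothetical, and the sketch assumes the agent can report a usable confidence score for each proposed action.

```python
from dataclasses import dataclass, field


@dataclass
class Decision:
    action: str        # what the agent proposes to do
    confidence: float  # the agent's own estimate, 0.0 to 1.0 (assumed available)
    cost: float        # estimated spend for this action


@dataclass
class Operator:
    confidence_threshold: float  # below this, hand the decision to the human manager
    budget_remaining: float      # spend the agent may still commit on its own
    audit_trail: list = field(default_factory=list)

    def handle(self, decision: Decision) -> str:
        """Execute the action or escalate it, and record the outcome either way."""
        if decision.confidence < self.confidence_threshold:
            outcome = "escalated"  # not confident enough to act alone
        elif decision.cost > self.budget_remaining:
            outcome = "escalated"  # confident, but outside the remaining budget
        else:
            self.budget_remaining -= decision.cost
            outcome = "executed"
        self.audit_trail.append((decision.action, decision.confidence, outcome))
        return outcome


operator = Operator(confidence_threshold=0.8, budget_remaining=100.0)
print(operator.handle(Decision("refund order", 0.95, 40.0)))           # executed
print(operator.handle(Decision("close disputed account", 0.55, 0.0)))  # escalated
print(operator.handle(Decision("bulk refund", 0.90, 500.0)))           # escalated
```

Even in a toy version, the decision logic is a handful of lines. What the sketch leaves out is where the effort goes: deciding what the threshold should be, and checking that the confidence score means what it claims.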

From one model to a routed fleet

Single-model agents are over. The production agents we build today are routed pipelines: a small, fast model handles the easy 80%, a bigger, smarter model is reserved for genuinely ambiguous decisions, and deterministic rules handle anything regulated or high-stakes. Choosing between providers is no longer a strategy question. It is a per-call routing decision based on cost, latency, and the kind of decision being made.

This change is invisible to the user, yet it is the largest factor in unit economics. In our experience, the teams that get routing right run their AI at 30–60% of the cost of the teams that do not.
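A per-call routing decision of this kind might look like the sketch below. It is illustrative only: `Request`, `Route`, the `ambiguity` score, and the `ambiguity_cutoff` value are assumptions made for the example, and a real router would also weigh cost and latency per provider.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    RULES = "deterministic_rules"
    SMALL = "small_fast_model"
    LARGE = "large_model"


@dataclass
class Request:
    text: str
    regulated: bool   # regulated or high-stakes work never goes to a model
    ambiguity: float  # 0.0 (clear) to 1.0 (ambiguous); assumed to come from an upstream classifier


def route(request: Request, ambiguity_cutoff: float = 0.6) -> Route:
    """Pick a handler for one call; the cutoff is a hypothetical tuning knob."""
    if request.regulated:
        return Route.RULES
    if request.ambiguity >= ambiguity_cutoff:
        return Route.LARGE
    return Route.SMALL


print(route(Request("Reset my password", regulated=False, ambiguity=0.1)))        # Route.SMALL
print(route(Request("Is this contract clause ok?", regulated=False, ambiguity=0.9)))  # Route.LARGE
print(route(Request("Approve wire transfer", regulated=True, ambiguity=0.2)))     # Route.RULES
```

The regulated check runs first on purpose, so no ambiguity score can send a high-stakes request to a model.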

From point automation to system-of-record participation

The agent that summarises a meeting is a point tool. The agent that summarises the meeting, files the action items in your CRM, schedules the follow-ups in the calendar of the right account owner, and updates the deal stage based on what was said — that is a participant. The latter is much harder to build and much more valuable. It also requires deep integration into systems that were not built with agents in mind: CRMs, ERPs, ad platforms, finance close software, helpdesks. The wiring is unglamorous. It is also the wiring nobody else is doing well.
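To make the difference concrete, here is a rough sketch of the "participant" half: one meeting outcome written into several systems of record. Everything in it is a stand-in. `MeetingOutcome`, `InMemoryCRM`, and `InMemoryCalendar` are hypothetical in-memory substitutes for real CRM and calendar integrations, and the summary and deal stage are assumed to have been produced already by the agent.

```python
from dataclasses import dataclass, field


@dataclass
class MeetingOutcome:
    summary: str
    action_items: list
    account_owner: str
    new_deal_stage: str


@dataclass
class InMemoryCRM:  # stand-in for a real CRM client
    notes: list = field(default_factory=list)
    deal_stage: str = "discovery"


@dataclass
class InMemoryCalendar:  # stand-in for a real calendar client
    events: list = field(default_factory=list)


def record_meeting(outcome: MeetingOutcome, crm: InMemoryCRM, calendar: InMemoryCalendar) -> None:
    """Write one meeting's outcome into the systems of record."""
    crm.notes.append(outcome.summary)
    for item in outcome.action_items:
        crm.notes.append(f"ACTION: {item}")
        # Follow-ups land in the calendar of the account owner, not the agent's.
        calendar.events.append((outcome.account_owner, f"Follow up: {item}"))
    crm.deal_stage = outcome.new_deal_stage


crm, calendar = InMemoryCRM(), InMemoryCalendar()
outcome = MeetingOutcome(
    summary="Client wants a pilot in Q3.",
    action_items=["Send pricing", "Book technical review"],
    account_owner="dana",
    new_deal_stage="proposal",
)
record_meeting(outcome, crm, calendar)
print(crm.deal_stage, len(crm.notes), len(calendar.events))
```

The point tool stops at `summary`. The participant is everything `record_meeting` does afterwards, and in a real build each of those writes is an integration with its own permissions, failure modes, and audit requirements.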

From toy demos to evaluator-first design

The teams pulling ahead are the ones who treat the evaluator as a first-class artefact. The eval set is the contract: what good means, in writing, with examples, agreed across stakeholders. The agent is whatever passes that contract. Without it, every conversation about whether the agent is “ready” is a vibes argument. With it, the conversation is a single number against a single threshold.

If your team is still treating evaluation as something you do after the build, you are working a generation behind.
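"A single number against a single threshold" can be read quite literally. The sketch below assumes an exact-match eval set and a toy agent. `pass_rate`, `is_ready`, and `toy_agent` are hypothetical names, and real eval sets usually need richer scoring than string equality.

```python
from typing import Callable, List, Tuple

EvalCase = Tuple[str, str]  # (input, expected output) agreed with stakeholders


def pass_rate(agent: Callable[[str], str], eval_set: List[EvalCase]) -> float:
    """Fraction of eval cases where the agent's output matches the expected one."""
    if not eval_set:
        raise ValueError("eval set is empty")
    passed = sum(1 for prompt, expected in eval_set if agent(prompt) == expected)
    return passed / len(eval_set)


def is_ready(agent: Callable[[str], str], eval_set: List[EvalCase], threshold: float) -> bool:
    """The readiness conversation: one number against one threshold."""
    return pass_rate(agent, eval_set) >= threshold


def toy_agent(prompt: str) -> str:
    # Illustrative stand-in for a real agent: a keyword classifier.
    return "refund" if "refund" in prompt.lower() else "other"


eval_set = [
    ("I want a refund", "refund"),
    ("Where is my order?", "other"),
    ("Refund please", "refund"),
    ("Cancel my plan", "cancel"),
]
print(pass_rate(toy_agent, eval_set))       # 0.75
print(is_ready(toy_agent, eval_set, 0.90))  # False
```

The toy agent fails the "cancel" case, so it scores 0.75 and misses a 0.90 bar. The eval set was written before anyone argued about the agent, which is what makes it a contract.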

What this means for buyers

The market is splitting cleanly. There is a layer of packaged tools that will keep getting better at the generic problem they solve. There is a deeper layer of custom-built agents wired into the systems and data unique to your business, where the differentiation actually lives. Most of our clients run both — packaged tools where the work is generic, custom builds where the work is yours.

If you have not yet picked which side of that line each of your AI initiatives lives on, that is the conversation worth having this quarter. It does not require a vendor decision. It requires a written brief, an honest read of your operations, and a shortlist ranked by what each one is worth to you.

That is exactly what a Cravings readiness assessment produces. Two weeks. Written deliverable. Yours to keep whether or not we work together.
