May 13, 2026

The off-the-shelf AI agent trap

Packaged AI agents take twenty minutes to install and six months to remove. The five failure modes we see most often, and where custom is the cheaper answer.

Every vendor in your inbox right now is selling you “an AI agent for your business.” Most of them are a thin wrapper around a foundation model, a generic prompt library, and a handful of integrations the vendor’s solutions engineer wired up at a trade-show demo. They take twenty minutes to install. They take six months to remove. We have helped with the removal more times than we care to count.

Off-the-shelf agents are not useless. They are useful for the problem they were designed for — usually a generic problem at a generic company. They become a liability the moment they are dropped into a real operation with real data, real customers, and real consequences. Here is the pattern.

Where the chaos comes from

1. The agent does not know what it does not know. A generic support agent will confidently quote return policies it has hallucinated from training data. A generic sales agent will offer discounts that were never approved. A generic finance agent will categorise a refund as revenue. The off-the-shelf vendor’s marketing site says “tunes itself to your business in days.” In practice it tunes itself enough to pass a demo, not enough to pass an audit.

2. The integration is shallow on purpose. The vendor needs the agent to work across thousands of customers. So the connectors read a generic schema, write to a generic schema, and quietly ignore the seven custom fields that are how your business actually runs. Three months in, your team is doing the work the agent was supposed to do: reformatting outputs, re-coding categories, fixing the records the agent corrupted.
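One way to avoid the silent drop is to make the write path refuse any record it cannot map faithfully. Here is a minimal Python sketch. It assumes the agent’s output arrives as a dict, and it uses three hypothetical custom fields (`cost_centre`, `contract_tier`, `region_code`) as stand-ins for yours. `map_to_record` and `SchemaMismatch` are illustrative names, not any vendor’s API.

```python
from dataclasses import dataclass

# Hypothetical custom fields; replace with the fields your business actually runs on.
REQUIRED_CUSTOM_FIELDS = {"cost_centre", "contract_tier", "region_code"}


class SchemaMismatch(ValueError):
    """Raised when an agent output cannot be written without losing data."""


@dataclass(frozen=True)
class WriteResult:
    record: dict
    unknown_fields: frozenset


def map_to_record(agent_output: dict, known_fields: set) -> WriteResult:
    # Refuse the write if any required custom field is missing or empty,
    # rather than silently writing a partial record.
    missing = sorted(
        f for f in REQUIRED_CUSTOM_FIELDS if agent_output.get(f) in (None, "")
    )
    if missing:
        raise SchemaMismatch(f"missing required custom fields: {missing}")
    # Keep only fields the real schema knows about, but report the rest
    # so someone can review them instead of losing them without trace.
    record = {k: v for k, v in agent_output.items() if k in known_fields}
    unknown = frozenset(agent_output) - known_fields
    return WriteResult(record=record, unknown_fields=unknown)
```

The design choice is to fail loudly. A rejected write surfaces in week one. A silently truncated one surfaces in month three.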

3. There is no eval suite you can trust. The vendor’s accuracy metric is computed against the vendor’s benchmark, not against your tickets, your invoices, or the language your customers actually use. You will not know whether the model upgrade in the next release silently regresses on your hardest cases, because nobody on your side ever wrote down what your hardest cases are.
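Writing down your hardest cases can start small. The Python sketch below assumes the agent can be called as a plain function that takes input text and returns an answer. As a deliberate simplification, it counts a case as passed only on an exact string match. Real suites usually need rubric-based or field-level checks. `Case`, `pass_rate` and `gate_upgrade` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable

Agent = Callable[[str], str]


@dataclass(frozen=True)
class Case:
    case_id: str
    input_text: str
    expected: str  # the answer your own team agrees is correct


def pass_rate(agent: Agent, cases: list[Case]) -> tuple[float, list[str]]:
    """Run every case; return (pass rate, ids of failing cases)."""
    failures = [c.case_id for c in cases if agent(c.input_text).strip() != c.expected]
    return 1 - len(failures) / len(cases), failures


def gate_upgrade(
    current: Agent, candidate: Agent, cases: list[Case], tolerance: float = 0.0
) -> tuple[bool, list[str]]:
    """Decide whether a candidate model/prompt may replace the current one."""
    base_rate, base_fail = pass_rate(current, cases)
    new_rate, new_fail = pass_rate(candidate, cases)
    # A case that used to pass and now fails is a regression, even if the
    # overall rate held steady because other cases improved.
    regressions = sorted(set(new_fail) - set(base_fail))
    ok = new_rate >= base_rate - tolerance and not regressions
    return ok, regressions
```

The gate blocks an upgrade whenever a previously passing case now fails, even if the headline pass rate does not move.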

4. The audit trail is “trust us.” The agent acted on a customer record at 3:17am. Why? What was in the context window? What tools did it call? You will get a screenshot, eventually, after you raise a support ticket. You will not get a replay. For a regulated business this is not a workflow; it is a compliance finding waiting to happen.
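Here is what a replayable record might contain, as a minimal Python sketch. It writes one JSON Lines entry per model call and captures the record that was touched, the exact model version, the full context window, and every tool call. The function names, the field names and the flat-file log are illustrative assumptions. A production system would more likely write to durable, access-controlled storage.

```python
import json
import uuid
from datetime import datetime, timezone
from pathlib import Path


def record_call(
    log_path: Path, *, record_id: str, model: str,
    context: list, tool_calls: list, output: str,
) -> str:
    """Append one model call to an append-only JSON Lines log; return its trace id."""
    entry = {
        "trace_id": str(uuid.uuid4()),
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "record_id": record_id,    # which customer record the agent acted on
        "model": model,            # exact model/version string that was called
        "context": context,        # everything that was in the context window
        "tool_calls": tool_calls,  # each tool name with its arguments and result
        "output": output,
    }
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")
    return entry["trace_id"]


def load_trace(log_path: Path, trace_id: str) -> dict:
    """Return the stored entry so the decision can be inspected or re-run."""
    with log_path.open(encoding="utf-8") as f:
        for line in f:
            entry = json.loads(line)
            if entry["trace_id"] == trace_id:
                return entry
    raise KeyError(trace_id)
```

To replay a decision, you feed the stored context back to the same pinned model version. Even then, the output may differ if sampling is non-deterministic, which is why the original output is stored alongside its inputs.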

5. The exit cost is the lock-in. Every month the agent runs, more of your operational logic ends up in the vendor’s prompt library — undocumented, unportable, and quietly proprietary. Two years in, you are running on rented operations, paying an enterprise tier to access a workflow you would now find hard to describe in writing without the vendor’s help.

What custom changes

A custom agent — the kind Cravings builds — starts from the opposite end. We write down what good looks like before we write any code. We design the integration against your real schema, not a generic one. We trace every model call from the first commit, so a year in you can replay any decision the agent ever made. The runbook the on-call engineer reads at 2am was written by people who knew they would have to read it themselves at 2am.

It costs more up front than a SaaS subscription. It costs less than the cleanup, the compliance findings, and the eventual rip-and-replace. It is also yours — the prompts, the evals, the architecture documentation, the deployment pipeline — under your accounts, in your repos, owned by your team.

When off-the-shelf is fine

There are still good places to use a packaged agent. A meeting-notes summariser that touches no production system. A sales-call transcript tool. A document-comparison helper. Anything where the agent is advisory and the cost of a wrong answer is “an engineer rolled their eyes for a second.” Drop those in, get the value, move on.

Reach for custom the moment the agent is reading from, writing to, or making decisions inside a system of record. That is where Cravings spends most of its time — and where the rip-out work tends to happen for the teams that did not.

Considering a packaged agent for a critical workflow? Book a readiness audit before you sign the order form: two weeks of our diligence can save you several quarters of cleanup.