Ways I've Burned Myself Over-Delegating to AI Agents

Last updated: Mon, June 22, 2026

Used well, an AI agent does the work of more than one person. But over-delegate and it fails in its own peculiar ways — not quite like a human would. Here are four patterns from the fintech backend I work on (payments, ledger, fraud), where I handed too much to the AI and paid for it. In every case it's less "the AI screwed up" and more "I delegated wrong."

1. The AI doesn't know the world moved on

The most expensive one.

I had the AI working on a new branch — but that branch's base was hundreds of commits stale. The AI had no idea, so it diligently kept fixing code that had already been deleted. Ran fine locally, looked perfect. But that whole area had been rewritten on main ages ago, so the PR got closed outright. Hours, gone.

Lesson: the AI innocently assumes the world it's looking at is the current one. The repo is always moving; yesterday's correct answer is gone today. Before you start, sync the base to latest, and have a human confirm "does this function even still exist?" before building on it. The AI has no sense that the world moves — that part's on you.

2. It binds things backwards, with total confidence

A fintech-specific one that made me wince.

Payments are full of fiddly mappings — this kind of transaction routes here, that kind routes there. I once had the AI wire up a config and it mapped two routes perfectly, cleanly backwards. The code looked completely correct, the types checked, the tests were green at a glance. But the meaning was inverted — and inverted routing where money moves is genuinely bad.

Lesson: AI writes for plausibility, and plausible ≠ correct. Especially anywhere two similar things could be swapped and it'd still compile (route A vs B, source vs destination, debit vs credit), trust the AI's confidence exactly zero. Cross-check one-to-one against the source of truth — the spec, the production config, real data. Ship it on "probably right" and reconciliation becomes hell later.

3. Never take "the checks passed" at face value

The AI will cheerfully tell you "build passes," "tests are good." And in reality —

It ran only the fast (relaxed) check and reported "green" — never ran the strict CI one.
It fired an autofix that swept up 170 files you never asked it to touch. Diff explodes.
It never ran the tests at all and quietly swapped "should pass" for "passed."

Lesson: pin down what "done" and "verified" mean up front. Not "the build passed" but "run the same strict config as prod, actually execute it, and show me the output." Treat self-reported green as suspect, worst-case assumption on. You can let the AI do the verifying — just make it produce the evidence (the real logs) as part of the deal.

4. It "improves" things you never asked about

Ask it to "fix the bug in this function" and it'll sometimes refactor the surrounding code for free. It (the AI) means well. But in fintech code, a diff bleeding into files you didn't touch is pure risk — heavier review, more surface area for something unrelated to break.

Lesson: constrain scope explicitly. "This function only. Don't touch anything else," said up front. Then read the whole diff that comes back, not just the part you asked for — never skim past a "wait, why did this change?" The AI's helpfulness is fine, as long as you're the one absorbing it in review.

The throughline

All four share one root: there's always a gap between the AI's confidence and reality — and closing that gap is, for now, a human job.

The world moves → sync the base, confirm things exist.
Plausible ≠ correct → cross-check the source of truth.
Doubt self-reported green → demand the execution logs as evidence.
Helpful refactors → lock the scope, review the whole diff.

Delegating is still the right move. But "delegate" and "dump it and walk away" aren't the same thing. Let the AI do the typing — just don't let go of the judgment and the verification. That balance is, for now, the sweet spot.

Comments (0)

No comments yet.

You need to log in to comment.