I went looking for a clean definition of AI-agentic development on Google last week. What I got back was a wall of LinkedIn posts written by people who, as far as I can tell, have never run an agent on anything larger than a prompt that fits on one screen. Most of them used “agentic” as a synonym for “I asked ChatGPT a question.”

That’s a problem, because the term is doing real work in real engagements. When we tell a client we work in an AI-agentic way, that has to mean something specific - otherwise it’s the same kind of mush as “AI-powered” and we should stop using it.

Here is what it means when we use it.

The spectrum

There are four distinct modes of AI-assisted coding, and they get conflated constantly. From least to most autonomy:

1. Chat-assisted. You paste a problem into ChatGPT / Claude / Gemini. It gives you an answer. You copy it back into your editor. The human does all the routing, all the verification, all the integration. The AI is a Stack Overflow replacement.

2. Editor-integrated autocomplete. GitHub Copilot, Cursor’s tab-completion, Codeium. The AI sits in your editor and suggests the next 1-40 lines. You accept or reject. The AI sees little beyond the file you’re in. The human is still in every decision.

3. Tool-using assistant. Claude Code, Cursor’s compose mode, Cline. The AI can read files, run grep, run tests, edit code across the project. It’s still inside your one prompt-loop - you ask, it does, you review. But it touches the whole codebase, not one buffer.

4. Agentic. The AI runs a multi-step plan, calls tools to verify its own work, iterates when verification fails, and produces a complete deliverable across multiple agent-turns without you reviewing each step. You review the output, not every move.

The leap from 3 to 4 is the one that matters. Most of the “AI-agentic” content online is describing mode 1, 2, or 3 and calling it 4.

What “agentic” actually requires

When we say agentic, we mean the system has all of these properties at once:

Autonomy across multiple steps. Not one prompt, one answer. The agent breaks a task into sub-tasks, plans the sequence, executes, and continues when sub-task N produces input for N+1. We are not steering each step.

Tool use that includes verification. The agent doesn’t just write code - it runs it. It writes a test, runs the test, sees the failure, fixes the code, runs the test again. It has access to a shell, a compiler, a test runner, often a browser, and treats those tools’ output as ground truth above its own confidence.

Real work over real time. Sessions are minutes to hours, not seconds. Output is shipped artifacts - a working parser, a deployed app, a passing test suite - not a code snippet you still need to wire in.

Recovery from failure. The agent can encounter an error, diagnose it, and try something else. If the first approach to parsing a binary format breaks at byte 9, it adjusts and continues. If it can’t, it tells you what it tried and what didn’t work - not a generic apology.

A human at the right level of abstraction. This is the part most “agentic” marketing skips. The human is steering at the goal level - here is the file format we need to parse, here are the reference implementations, here is what success looks like - and reviewing the final artifact. The human is not in every diff.

If any of those five properties is missing, what you have is tool-use with extra steps, not agentic development.
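The verification and recovery properties can be sketched as a small loop. This is a toy illustration under stated assumptions, not our production harness: `run_agent_loop`, `Attempt`, `Outcome`, `propose_offset`, `verify_offset`, and the 16-byte `BLOB` are all hypothetical names invented for the example. The "agent" here just guesses where a 32-bit value sits in a binary blob, and the verifier's result, not the guess, decides whether the work is done.

```python
import struct
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Attempt:
    """One pass through the loop: what was tried and what the verifier said."""
    candidate: int
    passed: bool
    detail: str


@dataclass
class Outcome:
    """What the human reviews at the end: the artifact plus the full trail."""
    artifact: Optional[int] = None
    attempts: List[Attempt] = field(default_factory=list)

    def report(self) -> str:
        # On failure this lists what was tried and why it failed,
        # rather than returning a generic apology.
        status = "delivered" if self.artifact is not None else "gave up"
        lines = [f"{status} after {len(self.attempts)} attempt(s)"]
        for a in self.attempts:
            lines.append(f"  {'ok' if a.passed else 'FAIL'}: {a.detail}")
        return "\n".join(lines)


def run_agent_loop(
    propose: Callable[[List[Attempt]], Optional[int]],
    verify: Callable[[int], Tuple[bool, str]],
    max_attempts: int = 5,
) -> Outcome:
    """Propose, verify, and retry until the verifier passes or options run out."""
    outcome = Outcome()
    for _ in range(max_attempts):
        candidate = propose(outcome.attempts)  # next idea, given past failures
        if candidate is None:                  # nothing left to try
            break
        passed, detail = verify(candidate)     # tool output is ground truth
        outcome.attempts.append(Attempt(candidate, passed, detail))
        if passed:
            outcome.artifact = candidate
            break
    return outcome


# Hypothetical toy task: a little-endian uint32 with value 1234 sits at byte 8.
BLOB = bytes(8) + struct.pack("<I", 1234) + bytes(4)


def verify_offset(offset: int) -> Tuple[bool, str]:
    """Stand-in for a test runner: decode at the offset and check the value."""
    (value,) = struct.unpack_from("<I", BLOB, offset)
    return value == 1234, f"read {value} at offset {offset}"


def propose_offset(history: List[Attempt]) -> Optional[int]:
    """Stand-in for the model: pick the first offset not already tried."""
    tried = {a.candidate for a in history}
    for offset in (0, 4, 8, 12):
        if offset not in tried:
            return offset
    return None


if __name__ == "__main__":
    print(run_agent_loop(propose_offset, verify_offset).report())
```

With the defaults, the loop fails at offsets 0 and 4 and succeeds at offset 8 on the third attempt. Capping it with `max_attempts=2` instead yields no artifact and a report of the two failed reads, which is the "tells you what it tried" behavior in miniature.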

What it explicitly is not

It is not:

GitHub Copilot. Copilot is excellent autocomplete. There’s no plan, no verification loop, no autonomy past the next line. Calling that “agentic” empties the word.
“I had Claude write this function.” That’s a one-prompt interaction. If you copy-pasted the output, you did the agent’s job. The function might be excellent, but you’re working in mode 1.
A chat that uses tools. ChatGPT with code interpreter is closer, but if it stops after one tool call to wait for your next prompt, it’s a tool-using chat, not an agent.
A wrapper that calls the LLM 20 times. Multi-prompt orchestration without verification and autonomous recovery is a Rube Goldberg machine, not an agent.
An LLM with a vector database. That’s RAG. Useful, but orthogonal.

The category that gets the closest without being agentic is autopilot coding - letting Copilot or Cursor compose run unsupervised. The reason it’s not agentic is the missing verification step. Autopilot will happily write 500 lines of plausible-looking nonsense; an agent runs the tests and notices it’s nonsense.

Why the distinction matters

The reason this isn’t just terminology pedantry: the actual economics are different at each level.

Level 1 (chat-assisted) saves a developer maybe 10-20% on routine implementation. It does not change project economics. A 12-week build is still roughly a 10-week build.

Level 2 (autocomplete) saves another 15-25%. It compounds with level 1. Still doesn’t change the order of magnitude.

Level 3 (tool-using assistant) is where it starts to shift. We see 40-60% time reductions on well-bounded tasks (“refactor this module,” “write tests for this file”) because the AI can see the whole context and the human is reviewing outputs, not individual lines.

Level 4 (truly agentic) is where order-of-magnitude compression starts to happen. Not because the AI is “smarter” - it’s the same model, the same context window. The difference is that the human-in-the-loop bottleneck is removed for sub-tasks where the human’s review would have been a rubber stamp. Six hours of agent work reviewed in a 90-minute session beats six hours of co-piloted work that costs six hours of human attention.
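The arithmetic behind those figures can be worked through in a few lines. This is a rough sketch, assuming that "compounds" means each saving applies to whatever time remains after the previous one, and that the percentages apply to the whole build; `compound_savings` and `attention_leverage` are hypothetical helper names, and the inputs are the illustrative numbers from the text, not measurements.

```python
def compound_savings(baseline_weeks: float, *savings: float) -> float:
    """Apply each fractional saving to the time left after the previous one."""
    remaining = baseline_weeks
    for saving in savings:
        remaining *= 1.0 - saving
    return remaining


def attention_leverage(work_hours: float, human_hours: float) -> float:
    """Hours of work produced per hour of human attention spent."""
    return work_hours / human_hours


if __name__ == "__main__":
    # Levels 1 and 2 on a 12-week build, low and high ends of the stated ranges.
    low = compound_savings(12, 0.10, 0.15)   # 12 * 0.90 * 0.85
    high = compound_savings(12, 0.20, 0.25)  # 12 * 0.80 * 0.75
    print(f"levels 1+2: {high:.2f} to {low:.2f} weeks")

    # Level 4: six hours of agent work reviewed in 90 minutes,
    # versus six hours of co-piloted work watched for six hours.
    print(f"agentic leverage:    {attention_leverage(6, 1.5):.1f}x")
    print(f"co-piloted leverage: {attention_leverage(6, 6):.1f}x")
```

Under these assumptions the first two levels together take a 12-week build down to somewhere between 7.2 and about 9.2 weeks, which is the same order of magnitude. The example session in the text, by contrast, delivers 4 hours of work per hour of human attention against 1 for co-piloting, and that ratio grows as more of the review is safely delegated to the verification loop.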

This is the entire reason we built CADpeek in hours rather than months. The hardest part of the project - a Rust WebAssembly parser for a binary format with no JS implementation in existence - required hundreds of small decisions, each of which would have cost a human engineer a context-switch. Run the failing test, look at the hex dump, adjust the offset, run the test again, look at the next failure mode, repeat. That is not interesting work for a human. It is excellent work for an agent that can verify its own progress.

What still doesn’t work

Three places where agentic loops fall over today:

Novel architecture. If the agent has to invent a system design from a green field, the loop has nothing to verify against. There’s no test that says “this is a good architecture.” Humans still own architecture.

Cross-functional judgment. “Should this feature ship?” is not a tool-callable question. Product judgment, business tradeoffs, security risk assessment - all require humans.

Long-horizon coherence. Agents lose the thread on tasks that span days and dozens of sub-systems. Context windows are bigger than they used to be, but coherence over a week-long task is not solved. We work around this by structuring tasks to be self-contained within a session, with a written hand-off when one is needed.

If your project is mostly novel architecture and cross-functional judgment, agentic compression won’t help you much. If it’s mostly engineering work that already has a specification - parsers, integrations, custom tooling, migrations, the unglamorous middle of most software projects - agentic compression is the difference between a feasible quarter and an impossible one.

How to tell what someone actually means

When you read “AI-agentic” on a vendor website now, the test is straightforward:

Do they describe a verification loop? If not, they probably mean autocomplete.
Do they give a concrete example with timing? If not, they probably mean chat.
Do they describe the human’s role precisely? “We embed AI into your workflow” is marketing. “Our engineer reviews planning and final artifacts; the agent handles implementation sub-tasks under a tests-must-pass invariant” is engineering.
Do they ship anything? An agent that can’t produce a working artifact you can run is not yet doing useful work.
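
The four questions above can be written down as a checklist you fill in while reading a vendor page. A minimal sketch, assuming a yes/no answer per question is enough; `VendorClaim` and `red_flags` are hypothetical names, and the wording of each flag is a paraphrase of the test above rather than any formal rubric.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class VendorClaim:
    """Yes/no answers to the four questions, filled in by the reader."""
    describes_verification_loop: bool
    gives_timed_example: bool
    defines_human_role: bool
    ships_runnable_artifact: bool


def red_flags(claim: VendorClaim) -> List[str]:
    """Return one flag per question the vendor's description fails."""
    flags = []
    if not claim.describes_verification_loop:
        flags.append("no verification loop: probably means autocomplete")
    if not claim.gives_timed_example:
        flags.append("no concrete example with timing: probably means chat")
    if not claim.defines_human_role:
        flags.append("human role left vague: marketing, not engineering")
    if not claim.ships_runnable_artifact:
        flags.append("nothing you can run: not yet doing useful work")
    return flags


if __name__ == "__main__":
    page = VendorClaim(
        describes_verification_loop=False,
        gives_timed_example=True,
        defines_human_role=False,
        ships_runnable_artifact=True,
    )
    for flag in red_flags(page):
        print(flag)
```

An empty list does not prove a vendor is agentic; it only means the description passes the four checks and is worth a closer look.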

We use the term because we think it earns its weight when defined precisely. If you’ve read this far, you now have a sharper definition than most of the people selling agentic services in 2026. Use it skeptically - including on us.