Agentic coding: what it is and how engineering teams can use it today

May 22, 2026

I’ve been running engineering teams long enough to remember when code completion just meant hitting Tab in an IDE.

Then GitHub Copilot arrived, and that changed. Now we’re at another inflection point, and it’s a bigger one.

Agentic coding (or agentic development) is the term people are using to describe AI that doesn’t just complete your sentences, but actually works through a task, end to end, on your behalf.

It plans. It writes. It runs tests. It reads the error output. It tries again. That’s a meaningful change from autocomplete.

Gartner expects 75% of enterprise software engineers to use AI code assistants by 2028, up from under 10% in early 2023.

The tools are developing quickly, the terminology is getting muddled, and most of the content out there either oversells the autonomy or skips straight to tooling comparisons without helping you think through what actually changes for your team.

This article is for engineering leaders who’ve used Copilot or Cursor and are now asking: what comes next, and how do we approach it without losing control of our codebase?

Let’s get into it!

Key takeaways:

  • Agentic coding changes what engineers do, not whether you need them. The agent handles execution. Engineers still own the requirements, the architecture, and the review. That’s a different job, not a smaller one.
  • The discipline matters more than the tool. Clear task specs, rigorous review, and a well-maintained AGENTS.md separate teams that capture real value from those that generate rework.
  • Set up governance before you scale. Agents with file-write access and shell execution are a real attack surface. Sandbox execution, require human review before merge, and audit every dependency an agent introduces.

What agentic coding actually means (and what it is not)

Let’s start with a working definition.

Agentic coding is a software development approach where an AI agent is given a goal and takes a sequence of autonomous actions to achieve it: reading files, writing code, running commands, observing results, and iterating, all without a human confirming every step.

The word “agentic” is borrowed from AI research.

An agent is a system that perceives its environment, decides what to do, acts, and updates its understanding based on what it observes.

Applied to software development, that means an AI that can open your codebase, understand its structure, write changes across multiple files, run your tests, and adjust based on failures.

That’s qualitatively different from what most teams are using today.

How it differs from AI code completion and Copilot-style tools

Copilot-style tools are reactive. You type, they suggest. The model sees a small window of context, predicts what comes next, and offers it as a completion.

Useful, but fundamentally passive.

In GitHub’s own research, 88% of Copilot users reported feeling more productive, and developers in a controlled experiment completed the assigned task 55% faster. Those are real gains.

Figure: Copilot-style tools vs agentic coding

But there’s a ceiling.

The model doesn’t know your test suite failed. It doesn’t know that the function it suggested conflicts with something three files away. It can’t run anything. It waits for you.

Agentic coding removes that ceiling.

You hand the agent a task. It reads the relevant code, writes a plan, executes changes, runs the tests, reads the failure output, and tries to fix it, without you sitting in the loop for each micro-decision.

Why “vibe coding” is not the same thing

You’ve probably heard “vibe coding” too. Andrej Karpathy coined the phrase: you prompt, you watch, you keep going while the output feels right, momentum over inspection.

Vibe coding isn’t engineering.

We’re not vibe coders, and we don’t want to be. We’re engineers who use AI agents. We know exactly what we’re doing and why.
Mario Zderic
Co-founder and CTO at DECODE

It’s productive exploration, maybe fine for a prototype or a personal project, but not for production code that other engineers will maintain and users will depend on.

Agentic coding, done properly, keeps engineering judgment in the driver’s seat.

You define the goal precisely. You review the output rigorously. You own the architecture. The agent handles the tedious work.

The distinction isn’t the tooling, it’s the discipline you bring to it.

How agentic coding works in practice

Next, I’ll show you how agentic coding actually works in practice.

The agent loop: plan, act, observe, iterate

The architecture behind most agentic coding tools is a pattern called ReAct (Reason + Act), introduced in a 2022 paper by Yao et al.

The agent receives a goal, reasons about what to do, takes an action (read a file, write code, run a command), observes what happened, and reasons again.

In practice, a typical agent loop looks like this:

  1. Receive the task: a user-provided goal, with relevant context
  2. Plan: break down the goal into steps, identify the files and systems involved
  3. Act: execute a step (write code, edit a file, run a test, call a tool)
  4. Observe: read the output (did the test pass, what error appeared, what changed)
  5. Iterate: update the plan based on observations, move to the next step or retry

This cycle runs autonomously until the agent either completes the task or reaches a point where it needs human input.

The loop is visible, not hidden. Good agentic tools show each step, so you can see exactly what the agent did and why.
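To make the loop concrete, here is a minimal sketch in Python. It is not how any particular tool is implemented. The names `run_agent`, `plan_next`, `act`, `toy_planner`, and `make_toy_env` are hypothetical, and the "environment" is a toy stand-in for a real test suite. What it does show is the shape of the cycle and the visible trace of every step.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Step:
    action: str       # what the agent decided to do
    observation: str  # what the environment returned


@dataclass
class AgentRun:
    goal: str
    trace: list = field(default_factory=list)  # visible history of every Step
    done: bool = False


def run_agent(
    goal: str,
    plan_next: Callable[[str, list], Optional[str]],
    act: Callable[[str], str],
    max_steps: int = 10,
) -> AgentRun:
    """Plan, act, observe, and iterate until the planner stops or the step budget runs out."""
    run = AgentRun(goal=goal)
    for _ in range(max_steps):
        action = plan_next(goal, run.trace)  # plan: choose the next step from goal + history
        if action is None:                   # planner signals that the task is complete
            run.done = True
            break
        observation = act(action)            # act, then observe the result
        run.trace.append(Step(action, observation))  # iterate: history feeds the next plan
    return run


def toy_planner(goal: str, trace: list) -> Optional[str]:
    """Hypothetical planner: run tests, apply a fix on failure, stop on success."""
    if not trace:
        return "run_tests"
    last = trace[-1]
    if last.action == "run_tests":
        return None if last.observation == "PASS" else "apply_fix"
    return "run_tests"


def make_toy_env() -> Callable[[str], str]:
    """Hypothetical environment: tests fail until a fix has been applied."""
    state = {"fixed": False}

    def act(action: str) -> str:
        if action == "apply_fix":
            state["fixed"] = True
            return "patched"
        if action == "run_tests":
            return "PASS" if state["fixed"] else "FAIL"
        return "unknown action"

    return act


run = run_agent("make the tests pass", toy_planner, make_toy_env())
for step in run.trace:
    print(f"{step.action} -> {step.observation}")
```

The `max_steps` budget is the piece worth copying: when the agent runs out of steps without finishing, `done` stays `False`, and that is the point where a human steps back in.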

A METR study from 2025 measured what it calls the “50% time horizon” for frontier AI models: the task length at which a model succeeds roughly half the time.

For o3, that horizon is around 110 minutes of human work. A year earlier, it was well under 30 minutes. The pace of improvement is real.

Where the human still sits in the workflow

A lot of the anxiety around agentic coding is misplaced.

The question isn’t “will it replace engineers?” It’s “where does my judgment matter most, and how do I structure the workflow to preserve that?”

Here’s where human engineers remain essential:

  • Defining requirements clearly. The quality of the agent’s output depends heavily on the quality of the input. Vague tasks produce vague results. Precise specs produce precise code.
  • Reviewing outputs. You’re not reviewing every keystroke anymore, but you are reviewing every meaningful change before it lands. That review needs to be sharper, not looser.
  • Making architectural decisions. An agent can implement a solution. It shouldn’t be choosing the architecture unless you’ve constrained its options deliberately.
  • Catching what the agent can’t see. Security edge cases, implicit business rules, the thing that isn’t written down anywhere but everyone on your team knows.

Your time shifts toward thinking and reviewing, away from writing code. That’s a good trade if you use that time well.

The AI tooling landscape right now

The market is moving quickly. Here’s an honest snapshot as of mid-2026.

Autonomous agents: Claude Code and OpenAI Codex

Claude Code has accumulated over 101,000 GitHub stars and 15,500 forks since its general availability release, making it one of the most widely adopted AI coding tools in 2026.

It’s terminal-based, which matters: it runs in your environment, reads your codebase, writes and edits files, runs shell commands, and interacts with git. You see every action. You confirm or reject.

In early 2026, Claude Code expanded from a terminal assistant into a broader development platform.

Claude Opus 4.6 and Sonnet 4.6, released in February 2026, deliver extended thinking support that lets the model work through complex problems systematically.

Q1 2026 also shipped Dispatch: a task queue that lets you trigger Claude Code programmatically via API, treating it as a background worker rather than an interactive assistant.

For engineers on hard problems, this remains the high-end option for auditable, supervised autonomy.

OpenAI Codex has expanded well beyond a CLI tool. By March 2026, it had grown to more than 2 million weekly active users, and OpenAI was positioning it as a broader enterprise agent platform.

Goal mode, which lets Codex drive toward a specific objective for hours or even days, is no longer experimental and is available across the app, IDE extension, and CLI.

In March 2026, OpenAI announced plans to merge the ChatGPT desktop app, Codex, and Atlas into a single unified desktop application.

Agent-augmented IDEs: Cursor and Google Antigravity

Cursor 3, released in April 2026, brought a redesigned interface that shifts the primary model from file editing to managing parallel coding agents, with local-to-cloud agent handoff, multi-repo parallel execution, and a plugin marketplace.

Cursor 3.3, released May 7, 2026, introduced a new PR review experience and faster plan execution through parallel agents running simultaneously.

It’s still where a lot of engineering teams start, but it’s no longer just a familiar IDE. It’s a full agent orchestration layer.

Google Antigravity, launched in public preview in late 2025 and still free as of May 2026, takes an agent-first approach with a Manager view that orchestrates multiple agents working in parallel across workspaces.

It’s the most notable new entrant in the space, particularly for teams already deep in the Google and Gemini ecosystem. Worth piloting, though it’s early.

The difference between IDE-based tools and terminal-based agents still comes down to control surface and integration depth.

Cursor is better for developers who want to stay in an editor. Claude Code is better when you need deep codebase reasoning and want explicit, auditable steps.

How AGENTS.md and structured instruction files fit in

OpenAI introduced the AGENTS.md specification in August 2025.

By December 2025 it had been adopted across tens of thousands of open source projects and tools including Claude Code, Codex, and Cursor, and it’s now governed by the Linux Foundation’s Agentic AI Foundation.

The concept is simple: a Markdown file at the root of your repo that tells the agent how your project works.

Think build commands, testing requirements, code conventions, and hard boundaries (files it shouldn’t touch, dependencies that need approval, patterns to avoid).

It’s like an onboarding document for your AI agents. Without it, every new agent session starts cold.

With a well-written AGENTS.md, the agent knows your environment from the first prompt.
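One way to keep that file from going stale is a small check in CI. The sketch below assumes a team convention of four headings (Build, Testing, Conventions, Boundaries). AGENTS.md itself is free-form Markdown, so those section names, the `make` commands, and the `missing_sections` helper are all illustrative assumptions rather than part of any specification.

```python
import re

# Hypothetical team convention: AGENTS.md is free-form Markdown, so these
# required headings are an assumption made for this sketch, not a spec.
REQUIRED_SECTIONS = ("Build", "Testing", "Conventions", "Boundaries")

# Illustrative file content; the commands and paths are placeholders.
EXAMPLE_AGENTS_MD = """\
# AGENTS.md

## Build
- Install dependencies: `make install`

## Testing
- Run `make test` before proposing any change.

## Conventions
- Follow the existing module layout; no new top-level packages.

## Boundaries
- Do not edit files under `migrations/`.
- New dependencies need human approval.
"""


def missing_sections(markdown: str, required=REQUIRED_SECTIONS) -> list:
    """Return the required section names that have no matching heading."""
    headings = {
        match.group(1).strip().lower()
        for match in re.finditer(
            r"^#{1,6}[ \t]+(.+?)[ \t]*$", markdown, flags=re.MULTILINE
        )
    }
    return [name for name in required if name.lower() not in headings]


print(missing_sections(EXAMPLE_AGENTS_MD))
```

A check like this only confirms that the sections exist. Whether the content is accurate is still something a person has to review.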

For engineering teams, AGENTS.md is one of the highest-leverage things you can invest an hour in. Write it once, benefit from it on every agent task.

What changes for your engineering team when you adopt agentic coding

This is the question most guides skip. The tooling is the easy part. Adapting your organization is much harder.

Skill shifts: prompt engineering, spec quality, and code review discipline

Your developers need to get better at three things:

Writing precise task specs. An agentic tool is only as good as the task you give it. “Add authentication” will produce something.

“Add JWT-based authentication to the user service using our existing middleware pattern, with token refresh handled in the auth module, and tests covering the happy path and token expiry” will produce something much better.

Spec quality is now a core engineering skill.

Prompt engineering for agents. This is different from Copilot prompting. You’re not completing a thought, you’re delegating a task.

The best prompts include: the goal, the constraints, the approach you expect, what success looks like, and what not to do.
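Those five parts can be treated as a structure rather than a habit. Here is a rough sketch, assuming a hypothetical `TaskSpec` class of my own naming; the field names and the rendered layout are illustrative, not a format any tool requires. It uses the JWT example from above, which says nothing about what not to do, so the sketch flags that gap.

```python
from dataclasses import dataclass, field


@dataclass
class TaskSpec:
    """Hypothetical container for the five parts of a delegated task."""
    goal: str
    constraints: list = field(default_factory=list)
    approach: str = ""
    success_criteria: list = field(default_factory=list)
    out_of_scope: list = field(default_factory=list)  # "what not to do"

    def gaps(self) -> list:
        """Name the parts of the spec that are still empty."""
        parts = {
            "goal": self.goal,
            "constraints": self.constraints,
            "approach": self.approach,
            "success_criteria": self.success_criteria,
            "out_of_scope": self.out_of_scope,
        }
        return [name for name, value in parts.items() if not value]

    def render(self) -> str:
        """Render the filled-in parts as a plain-text prompt."""
        def bullets(items):
            return "\n".join(f"- {item}" for item in items)

        sections = [
            ("Goal", self.goal),
            ("Constraints", bullets(self.constraints)),
            ("Expected approach", self.approach),
            ("Success looks like", bullets(self.success_criteria)),
            ("Do not", bullets(self.out_of_scope)),
        ]
        return "\n\n".join(f"{title}:\n{body}" for title, body in sections if body)


spec = TaskSpec(
    goal="Add JWT-based authentication to the user service",
    constraints=["Use our existing middleware pattern"],
    approach="Handle token refresh in the auth module",
    success_criteria=["Tests cover the happy path", "Tests cover token expiry"],
)
print(spec.render())
print("Still missing:", spec.gaps())
```

Checking `gaps()` before handing the task over is a cheap way to catch a spec that is missing a part. A bare "Add authentication" would come back with four of the five parts empty.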

Sharper code review. Here’s the risk nobody talks about loudly enough. GitClear found that AI-assisted code churn (code added then reverted within two weeks) rose significantly between 2020 and 2024.

That’s code that looked fine at first, got merged, and had to be undone. As agents write more code faster, review discipline needs to go up, not down.

Your reviewers need to understand what the agent was trying to do and whether it actually did it correctly, not just whether the syntax is valid.

Governance and security at agent scale

Agents with file-write access and shell execution are a real attack surface. That’s not a reason to avoid them, but it is a reason to structure your adoption carefully.

Here are four practical controls to put in place:

  • Sandbox agent execution. Agents should not run against production environments. Keep them in isolated dev environments with explicit, limited permissions.
  • Require human review before merge. No agent-written code should land in your main branch without a developer approving it. The agent writes the PR, a human approves it.
  • Audit dependencies carefully. AI-generated code has a documented hallucination problem with package names. Require supply chain review for any new dependency an agent introduces.
  • Version-control your AGENTS.md. This file governs agent behavior across your team. Treat it like source code, not documentation.
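The dependency control is the easiest of these to automate. As a minimal sketch, assuming a Python project with a requirements.txt-style file and a team-maintained approved list (both assumptions of this example, as are the function names), a pre-merge check can flag any package an agent added that nobody has approved:

```python
import re


def normalize(name: str) -> str:
    """Normalize a package name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def package_names(requirements_text: str) -> set:
    """Extract normalized package names from requirements.txt-style text."""
    names = set()
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options such as -r
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.add(normalize(match.group(0)))
    return names


def unapproved_new_dependencies(before: str, after: str, approved: set) -> list:
    """Return packages present only in `after` that are not on the approved list."""
    added = package_names(after) - package_names(before)
    return sorted(added - {normalize(name) for name in approved})


# Illustrative input: the second new package is a deliberately misspelled lookalike.
before = "requests>=2.0\n"
after = "requests>=2.0\nPyJWT>=2.8\nreqeusts-toolbelt  # lookalike name\n"
print(unapproved_new_dependencies(before, after, approved={"PyJWT"}))
```

With this input the check reports `reqeusts-toolbelt` and lets the approved `PyJWT` through. It only compares names. It does not confirm that a package exists or is trustworthy, so it routes new dependencies to a human reviewer and does not replace the review.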

The OWASP Top 10 for Agentic Applications, published in December 2025, is the most practical reference I’ve found for this specific threat surface.

Read it before you go wide with agent deployments.

How team roles evolve (and which ones do not disappear)

I want to be direct here: agentic coding doesn’t make developers redundant. It changes what developers do.

The engineers who adapt well are those who:

  • Can break down complex problems into well-specified subtasks.
  • Review agent output with the same rigor they’d apply to a junior engineer’s PR.
  • Understand system architecture well enough to catch when an agent’s solution is locally correct but globally wrong.
  • Can write and maintain AGENTS.md-style instruction files for their domains.

The roles under the most pressure are those built around narrow, well-defined, repetitive tasks: boilerplate generation, simple CRUD additions, writing tests for specified behavior.

Agents can do those competently today. But complex architecture, cross-cutting concerns, security reasoning, and stakeholder-facing judgment remain human work.

McKinsey estimated software engineering could see 20-45% productivity gains from generative AI. The teams that capture those gains will be the ones who invest in the organizational adaptation, not just the tooling.

A practical adoption path for engineering teams

Here’s what I’d actually do if I were starting this from scratch today.

Start with one tool, not five. Pick either Claude Code or Cursor based on your team’s workflow.

Don’t evaluate six tools simultaneously. Get genuinely good with one before expanding.

Pick a pilot cohort. Choose three to five developers who are curious, not skeptical, and who already use Copilot or a chat-based AI tool.

Run a four-week experiment on real work, not demos.

Define your first agentic task type. Start with something bounded: writing unit tests for existing code, generating boilerplate for a new service, or migrating a module to a new API.

Make sure it has a concrete scope with clear success criteria.

Write your AGENTS.md before the pilot starts. Write down your build commands, test commands, code conventions, and any hard restrictions.

This forces you to articulate your standards explicitly, which is valuable regardless of AI.

Set review expectations upfront. Every piece of agent-generated code gets a full review before it merges.

Brief the team on what to look for: correctness, adherence to your patterns, introduced dependencies, test coverage.

Run a retrospective after four weeks. What worked, what didn’t, where did the agent save real time, where did it create rework?

Use that data to decide whether to expand.

One honest caveat: most of the adoption and productivity data that exists, including the figure of 62% of developers using or planning to use AI tools in 2024, was collected for autocomplete and chat-style tools, not agentic workflows.

Rigorous peer-reviewed studies on agentic coding productivity specifically don’t yet exist in large numbers.

That means your own team’s retrospective data is genuinely valuable. Make sure to collect it.

What to watch: where agentic coding is heading

SWE-bench Verified scores are the closest thing we have to a standardized benchmark for agentic coding ability: real GitHub issues, measured by whether the agent’s fix actually passes the original test suite.

The leaderboard leader (Claude Mythos preview) sits at 93.9% as of May 2026. A year and a half ago those numbers were below 20%. The pace of improvement is real.

One caveat worth noting, though: OpenAI stopped reporting SWE-bench Verified scores in early 2026 after confirming that frontier models had been trained on the test data.

Claude Opus 4.5 scores 80.9% on Verified and 45.9% on SWE-bench Pro, which uses contamination-resistant tasks. Same model, same type of task, 35-point drop.

Treat the numbers as a directional signal, not a precise measure.

Multi-agent workflows are the next frontier.

Instead of one agent working through a task sequentially, a coordinator agent decomposes work and dispatches specialist agents in parallel: one for writing code, one for writing tests, one for security review.
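The coordinator pattern can be sketched in a few lines of Python. This is only an outline under loud assumptions: the three specialist functions are stubs that return canned strings where a real system would call a model, and `decompose`, `coordinate`, and the role names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


# Stub specialists. Real ones would call a model or tool; these return
# canned strings so the sketch runs anywhere.
def write_code(subtask: str) -> str:
    return f"patch ready ({subtask})"


def write_tests(subtask: str) -> str:
    return f"tests ready ({subtask})"


def review_security(subtask: str) -> str:
    return f"security notes ready ({subtask})"


SPECIALISTS = {
    "code": write_code,
    "tests": write_tests,
    "security": review_security,
}


def decompose(task: str) -> dict:
    """Hypothetical decomposition: one subtask per specialist role."""
    return {
        "code": f"implement: {task}",
        "tests": f"write tests for: {task}",
        "security": f"review security of: {task}",
    }


def coordinate(task: str) -> dict:
    """Split the task, run every specialist in parallel, and collect results by role."""
    subtasks = decompose(task)
    with ThreadPoolExecutor(max_workers=len(subtasks)) as pool:
        futures = {
            role: pool.submit(SPECIALISTS[role], subtask)
            for role, subtask in subtasks.items()
        }
        return {role: future.result() for role, future in futures.items()}


results = coordinate("add token refresh")
for role, output in results.items():
    print(f"{role}: {output}")
```

Every key in `results` is a separate output that someone has to read, so the amount of review grows with the number of agents you run.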

Some teams are already running this pattern – including our teams here at DECODE.

As teams move in this direction, the cognitive load of oversight increases with the number of agents. That’s the part most discussions skip over.

The 2025 DORA report found that AI acts as a multiplier of what’s already there — it strengthens high-performing teams and exposes weaknesses in teams with weak processes.

AI adoption now correlates positively with throughput, a reversal from 2024. It still correlates with higher instability. Faster output is only part of the picture.

The teams investing in this now, with clear processes and honest retrospectives, will be much better positioned than those waiting for the tooling to mature.

Agentic coding: FAQs

Not your codebase, but your processes.

The biggest changes are in how you write task specs, how you structure code review, and how you govern what agents can access.

Adding an AGENTS.md file is the most useful structural change to the repo itself.

Yes, with appropriate controls.

Agents should run in sandboxed environments, with no direct access to production systems.

Every agent-generated change should go through your standard review process before merging. The risk isn’t the tool itself, it’s skipping reviews and best engineering practices.

That number is about right.

Treating AI output as a first draft that needs review is just good engineering practice. And it’s how you make the most of agentic coding tools.

Run your own retrospectives. Your data will tell you more than any vendor benchmark.

Looking for a team that’s already thought this through?

If you’re at the point where you’re evaluating how agentic workflows fit into your development process, you’ve probably already run the initial experiments.

You’re now asking the harder questions: How do we scale this without losing quality? What does code review look like now? How do we govern this across a team of twenty engineers?

Those are the right questions, and they’re the ones most guides don’t answer.

At DECODE, we work with engineering leaders running exactly these decisions.

Our teams work on one project at a time, which means when we’re thinking about how agentic tooling fits your specific codebase and workflow, that’s where our focus is, not spread across five parallel engagements.

We don’t have a pitch deck on “AI-powered development.” We have practical experience integrating agentic tools into real product engineering work, with the governance and review discipline that production code demands.

If that’s what you’re working through, you’re in the right place.

Written by

Mario Zderic

Chief Technology Officer

Mario makes every project run smoothly. A firm believer that people are DECODE’s most vital resource, he naturally grew into his former role as People Operations Manager. Now, his encyclopaedic knowledge of every DECODEr’s role, and his expertise in all things tech, enables him to guide DECODE's technical vision as CTO to make sure we're always ahead of the curve. Part engineer, and seemingly part therapist, Mario is always calm under pressure, which helps to maintain the office’s stress-free vibe. In fact, sitting and thinking is his main hobby. What’s more Zen than that?
