Phases Are a Roadmap; Maturity Levels Are Capability States

Published 10 May 2026 · Updated 10 May 2026

ai-engineering maturity-models platform-engineering agentic-engineering sdlc

A maturity model becomes confusing when it mixes two different things:

  • the sequence of adoption steps an organization takes; and
  • the capability state the organization can demonstrate.

The first is a roadmap. The second is an assessment.

For AI-assisted engineering, this distinction matters. A team may be “in Phase 3” because it is trying to standardize agent-assisted delivery across several repositories. That does not mean it has achieved Level 3 maturity. It may still lack reliable CI, clear issue templates, useful repository context, or a shared review discipline. The phase says what the team is attempting. The maturity level says what is actually true.

Three Concepts

I separate the model into three layers:

phases = transformation/adoption sequence
levels = capability assessment states
readiness dimensions = evidence that maturity is real

Phases are useful for planning. Levels are useful for audit and expectation setting. Readiness dimensions prevent the maturity label from becoming ceremony.
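One way to keep the layers from blurring is to represent them as distinct types, so a roadmap position can never be passed where an assessed capability is expected. A minimal sketch in Python; the names are my own shorthand, not part of any standard:

```python
from enum import Enum

class Phase(Enum):
    """Adoption roadmap: what the organization is attempting."""
    NO_PRACTICE = 0
    INDIVIDUAL_USAGE = 1
    GITHUB_NATIVE_BRIDGE = 1.5
    REPO_READINESS = 2
    TEAM_SDLC = 3
    SYSTEM_GOVERNANCE = 4
    BOUNDED_AUTONOMY = 5

class Level(Enum):
    """Capability state: what the organization can demonstrate."""
    INDIVIDUAL_ASSISTANCE = 1
    REPO_READY = 2
    TEAM_MANAGED = 3
    SYSTEM_AWARE = 4
    BOUNDED_AUTONOMOUS = 5

def expectation_for(level: Level) -> str:
    """Expectation setting keys off the assessed Level, never the Phase.
    Passing a Phase here is a type error, which is exactly the point."""
    return f"hold the team to {level.name} evidence only"
```

The separation is deliberate: a team can be at Phase 3 on the roadmap while `expectation_for` is still fed Level 1.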

The danger is that maturity labels become aspirational branding. “We are Level 4” sounds better than “we have scattered repository instructions, weak traceability, and no policy gate for infrastructure changes.” But the second statement is more useful.

The Five Levels

The levels I use are:

  1. Individual AI Assistance: developers use AI locally, but the organization has no reliable agentic engineering substrate.
  2. Repo-Ready Agentic Workflow: selected repositories are prepared for human-supervised agents.
  3. Team-Managed Agentic SDLC: the team has a shared workflow, shared templates, shared review expectations, and metrics.
  4. System-Aware Governed Agents: agents can reason across repositories, services, infrastructure, ownership, and policy boundaries.
  5. Bounded Autonomous Delivery: selected low-risk workflows can run inside verified safety envelopes.

These are not levels of model intelligence. They are levels of engineering system readiness.

The Adoption Phases

The adoption phases look similar to the maturity levels, but they are not the same:

Phase 0   no organized AI engineering practice
Phase 1   individual usage
Phase 1.5 GitHub-native, human-supervised agent workflow
Phase 2   repo readiness
Phase 3   team-managed agentic SDLC
Phase 4   system-aware governance
Phase 5   bounded autonomy

Phase 1.5 is the important bridge. It is not a formal maturity level. It is a transition pattern: the human uses GitHub Issues and pull requests as durable context, then supervises an agent through implementation, review, and CI.

That bridge matters because it turns ephemeral chat into auditable engineering artifacts. The issue contains intent and acceptance criteria. The commits show what changed. The pull request captures rationale, validation, risk, and rollback. Review comments become structured feedback. CI becomes evidence.

Without that bridge, the organization jumps from personal assistance to platform claims too early.

What Evidence Looks Like

A Level 2 repository should not be called agent-ready because somebody added a single instruction file. It should be called agent-ready when an agent can:

  • discover how to build and test the repository;
  • identify important boundaries and invariants;
  • implement a small issue without unrelated changes;
  • produce a reviewable pull request;
  • include test evidence and rollback notes;
  • stop when the requested change crosses a forbidden boundary.
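That checklist only counts if each item is recorded as verified evidence rather than asserted. A sketch of the discipline; the dimension strings paraphrase the list above and are not a standard schema:

```python
# Evidence dimensions for the "agent-ready" (Level 2) claim. Each should
# be the recorded result of an actual probe task, not a self-declaration.
LEVEL2_DIMENSIONS = [
    "discovered build and test commands",
    "identified boundaries and invariants",
    "implemented small issue without unrelated changes",
    "produced a reviewable pull request",
    "included test evidence and rollback notes",
    "stopped at a forbidden boundary",
]

def agent_ready(evidence: dict[str, bool]) -> bool:
    """Agent-ready means every dimension passed, not that a file exists."""
    return all(evidence.get(d, False) for d in LEVEL2_DIMENSIONS)
```

Note the default: a dimension with no recorded evidence counts as a failure, which is the opposite of how aspirational labels usually work.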

A Level 3 team should not be called mature because it has a few good examples. It needs shared issue taxonomy, risk classes, review rubrics, metrics, and a Jira/GitHub sync convention that avoids duplicate truth.

A Level 4 platform needs more than repo-level context. It needs system context: service ownership, dependency maps, deployment topology, dashboards, runbooks, Terraform scopes, AWS tag alignment, and policy gates.
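To make "system context" concrete, here is a toy sketch of the minimum graph a Level 4 platform carries beyond any single repository. Service names, fields, and gates are invented for illustration:

```python
# A toy system graph: ownership, dependencies, infra scope, policy gates.
SYSTEM = {
    "checkout":  {"owner": "payments", "depends_on": ["inventory", "pricing"],
                  "terraform_scope": "prod/payments", "gates": ["iam-least-privilege"]},
    "inventory": {"owner": "supply", "depends_on": ["pricing"],
                  "terraform_scope": "prod/supply", "gates": ["no-public-s3"]},
    "pricing":   {"owner": "supply", "depends_on": [],
                  "terraform_scope": "prod/supply", "gates": ["no-public-s3"]},
}

def blast_radius(target: str, graph: dict) -> set[str]:
    """Services that transitively depend on `target`: what a cross-repo
    agent must weigh before proposing a change there."""
    hit, frontier = set(), {target}
    while frontier:
        frontier = {s for s, meta in graph.items()
                    if set(meta["depends_on"]) & frontier and s not in hit}
        hit |= frontier
    return hit
```

Without a graph like this, "system-aware" reduces to an agent that reads one repository very carefully.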

Why the Distinction Prevents False Progress

The most common failure pattern is cherry-picking a later-phase practice without the earlier evidence.

Review agents before repo readiness can help, but they can also produce noisy critique against vague standards. Auto-merge before strong CI is not maturity. Agent Terraform plans without policy-as-code and IAM boundaries are a liability. Cross-repo planning without a system graph is local optimization with better prose.

Some cherry-picking is reasonable. Structured GitHub issues are useful early. AGENTS.md or CLAUDE.md files are useful before the team agrees on a full model. Review-agent rubrics can expose missing standards. But these are experiments, not maturity claims.

A Better Question

Instead of asking “which phase are we in?”, ask:

What can we safely delegate today, and what evidence proves it?

If the evidence is weak, keep the autonomy narrow. If the evidence is strong, increase the delegation boundary one change class at a time.
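That widening can be expressed as a policy. A sketch, where change classes are ordered from low risk to high and per-class evidence strength is whatever the team actually measures; the classes, scores, and threshold here are illustrative assumptions:

```python
# Change classes ordered from lowest to highest risk.
CHANGE_CLASSES = ["docs", "tests", "dependency-bump", "feature", "infra"]

def delegation_boundary(evidence: dict[str, float],
                        threshold: float = 0.9) -> list[str]:
    """Return the prefix of change classes safe to delegate today.
    Widening stops at the first weakly-evidenced class; later classes
    stay human-driven even with good scores (no cherry-picking ahead)."""
    allowed = []
    for cls in CHANGE_CLASSES:
        if evidence.get(cls, 0.0) < threshold:
            break
        allowed.append(cls)
    return allowed
```

The prefix rule encodes the article's argument directly: strong evidence for `feature` changes buys nothing while `dependency-bump` evidence is weak.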

That is slower than the usual AI adoption story. It is also much closer to how production engineering actually works.