Pilots, RAID, Misalignment, and Rollback

AI adoption is often presented as a forward-only rollout: enable tools, train developers, collect success stories, expand usage. That is not how I would run it for engineering work that touches production systems.

Agentic engineering should start with bounded pilots, SMART goals, RAID logs, misalignment signals, and rollback paths.

That sounds bureaucratic only if the alternative is imagined as frictionless. In practice, the alternative is usually invisible risk.

A Reasonable First Pilot

A first pilot should be boring:

scope: 1-2 low-risk repositories
duration: 2-4 weeks
task classes: docs, tests, small service changes, non-production config
excluded: IAM, production Terraform, auth, data migrations, CI guardrails
participants: 2-4 engineers plus relevant reviewers
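
The boundary above can be made checkable rather than aspirational. The following is a minimal sketch, assuming the excluded areas can be expressed as path patterns; the repository names and glob patterns are hypothetical and would need to be replaced with the pilot's real ones.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class PilotScope:
    """Pilot boundary: which repos are in, which paths are off limits."""

    allowed_repos: frozenset[str]
    excluded_globs: tuple[str, ...]

    def violations(self, repo: str, changed_paths: list[str]) -> list[str]:
        """Return human-readable reasons a change falls outside the pilot."""
        if repo not in self.allowed_repos:
            return [f"repo {repo!r} is not part of the pilot"]
        return [
            f"{path} matches excluded pattern {glob}"
            for path in changed_paths
            for glob in self.excluded_globs
            if fnmatchcase(path, glob)
        ]


# Hypothetical repo names and path patterns, for illustration only.
scope = PilotScope(
    allowed_repos=frozenset({"docs-site", "internal-tools"}),
    excluded_globs=("infra/prod/*", "*/iam/*", "migrations/*", ".github/workflows/*"),
)

for reason in scope.violations("docs-site", ["README.md", "infra/prod/main.tf"]):
    print(reason)
```

A check like this could run in CI against the changed files of an agent-assisted PR. Path patterns are only a rough proxy for change classes such as auth or IAM, so reviewers still need to confirm scope.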

The pilot should not try to prove that agents can do everything. It should prove where the workflow breaks.

Can issues be specified clearly enough? Can agents find repository context? Are tests reliable? Are pull requests reviewable? Does CI catch scope violations? Do reviewers trust the evidence? Does Jira/GitHub status stay coherent? Do humans remain accountable?

SMART Goals

Examples:

Within two weeks, bring two pilot repositories to Level 2 readiness by adding agent instructions, issue templates, PR templates, documented build/test commands, CODEOWNERS validation, and repo-local invariants.

Within four weeks, complete five workflows that run from GitHub issue to agent-assisted PR, where each PR includes an issue link, implementation notes, test evidence, a risk/rollback section, and no unresolved critical review findings.

During the pilot, collect cycle time, review rework rate, CI failure rate, missing-context defects, and coordination time for every pilot issue.

By pilot end, identify at least five CI or policy guardrails that must be in place before any auto-merge or non-production deploy autonomy is allowed.

These goals are intentionally modest. They measure the delivery system rather than the model demo.
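
The per-issue metrics named in the third goal only help if they are recorded in the same shape for every issue. Here is one possible sketch; the field names and the metric definitions are assumptions, not a standard. In particular, "review rework rate" is taken here to mean the share of issues that needed more than one review round.

```python
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class PilotIssue:
    """One record per pilot issue (hypothetical field names)."""

    key: str
    cycle_time_hours: float
    review_rounds: int            # review rounds before approval
    ci_runs: int
    ci_failures: int
    missing_context_defects: int  # defects traced to context the agent lacked
    coordination_hours: float


def summarize(issues: list[PilotIssue]) -> dict[str, float]:
    """Aggregate pilot metrics across all recorded issues."""
    if not issues:
        raise ValueError("no pilot issues recorded")
    total_ci_runs = sum(i.ci_runs for i in issues)
    return {
        "median_cycle_time_hours": median(i.cycle_time_hours for i in issues),
        # Assumed definition: share of issues needing more than one review round.
        "review_rework_rate": sum(i.review_rounds > 1 for i in issues) / len(issues),
        "ci_failure_rate": (
            sum(i.ci_failures for i in issues) / total_ci_runs if total_ci_runs else 0.0
        ),
        "missing_context_defects": sum(i.missing_context_defects for i in issues),
        "coordination_hours": sum(i.coordination_hours for i in issues),
    }


# Made-up numbers, only to show the shape of the record.
sample = [
    PilotIssue("PILOT-1", 10.0, 1, 4, 1, 0, 1.0),
    PilotIssue("PILOT-2", 30.0, 3, 6, 1, 2, 2.5),
]
print(summarize(sample))
```

Whatever definitions the team picks, they should be fixed before the pilot starts so that the numbers from week one and week four are comparable.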

RAID Logs

RAID means risks, assumptions, issues, and dependencies. It is a useful format because agentic engineering fails through hidden assumptions.

Examples:

Type	Item	Owner	Next action
Risk	Agent introduces architecture drift through local optimization	Architect / tech lead	Require architecture context and review rubric
Risk	Jira and GitHub status diverge	EM / PM	Define minimal sync contract
Risk	Review burden increases	Team	Track rework and narrow task classes
Assumption	Pilot repos have reliable CI	Repo owner	Verify before pilot
Issue	Repo lacks documented test commands	Repo owner	Add instructions
Dependency	Security approval for agent scope	Security	Define forbidden change classes

The RAID log should be attached to the pilot, not kept in a private notebook. It is part of the learning record.
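
One way to keep the log attachable is to store it as structured entries and render it on demand. The sketch below assumes the four columns used in the table above; the class and function names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class RaidType(Enum):
    RISK = "Risk"
    ASSUMPTION = "Assumption"
    ISSUE = "Issue"
    DEPENDENCY = "Dependency"


@dataclass(frozen=True)
class RaidEntry:
    type: RaidType
    item: str
    owner: str
    next_action: str


def to_tsv(entries: list[RaidEntry]) -> str:
    """Render the log as tab-separated text with a header row."""
    header = "Type\tItem\tOwner\tNext action"
    rows = [
        "\t".join((e.type.value, e.item, e.owner, e.next_action)) for e in entries
    ]
    return "\n".join([header, *rows])


def unowned(entries: list[RaidEntry]) -> list[RaidEntry]:
    """Entries with no owner or no next action are not actionable yet."""
    return [e for e in entries if not e.owner.strip() or not e.next_action.strip()]


log = [
    RaidEntry(RaidType.ASSUMPTION, "Pilot repos have reliable CI", "Repo owner", "Verify before pilot"),
    RaidEntry(RaidType.ISSUE, "Repo lacks documented test commands", "Repo owner", "Add instructions"),
    RaidEntry(RaidType.RISK, "Review burden increases", "", "Track rework and narrow task classes"),
]
print(to_tsv(log))
print([e.item for e in unowned(log)])
```

The `unowned` check is a design choice rather than part of RAID itself: an entry without an owner or a next action tends to stay a hidden assumption.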

Misalignment Signals

Different stakeholders see failure in different ways.

Engineers see large unreviewable diffs, hidden failing tests, drive-by refactors, and PRs without rationale. EMs see status drift, review burden, and unpredictable cycle time. PMs may worry that GitHub is replacing product planning. Architects see local optimization across system boundaries. SREs see rollback and telemetry gaps. Security sees secrets, IAM expansion, and unsafe tool access.

These signals should be named before the pilot begins. Otherwise every stakeholder discovers the failure in their own vocabulary and the conversation becomes political.

Rollback and Pivot

Every phase needs a rollback or pivot option:

Phase 1 rollback: restrict AI to local assistance only.
Phase 1.5 rollback: stop agent-created commits and PRs, but keep structured issues and PR templates.
Phase 2 rollback: remove the agent-readiness badge, but keep documentation improvements.
Phase 3 rollback: disable review agents or keep them advisory only.
Phase 4 rollback: freeze cross-repo actions, but keep the system graph for humans.
Phase 5 rollback: disable auto-merge or deploy permissions, but keep audit and eligibility checks.

Rollback is not failure. It is how the organization proves it still controls the system.
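
The rollback options listed above can also be kept as data, so that a planned phase without a documented rollback is caught before it starts. This is a rough sketch; the phase keys and the wording of each entry paraphrase the list above, and the structure itself is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rollback:
    disable: str  # what gets switched off
    keep: str     # what is worth retaining after the rollback


# Paraphrased from the phase list above; adjust to the real rollout plan.
ROLLBACKS: dict[str, Rollback] = {
    "1": Rollback("AI use beyond local assistance", "local assistance"),
    "1.5": Rollback("agent-created commits and PRs", "structured issues and PR templates"),
    "2": Rollback("agent-readiness badge", "documentation improvements"),
    "3": Rollback("review agents, or their authority beyond advice", "advisory review, if retained"),
    "4": Rollback("cross-repo actions", "system graph for humans"),
    "5": Rollback("auto-merge and deploy permissions", "audit and eligibility checks"),
}


def phases_without_rollback(planned_phases: list[str]) -> list[str]:
    """Return planned phases that have no documented rollback option."""
    return [phase for phase in planned_phases if phase not in ROLLBACKS]


print(phases_without_rollback(["1", "1.5", "2", "6"]))
```

A non-empty result means a phase is being planned without a way back, which is the situation the rule "every phase needs a rollback or pivot option" is meant to prevent.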

The Pilot Exit Review

At the end of a pilot, ask:

What tasks worked?
What tasks failed?
What context was missing?
What guardrails caught issues?
What escaped review?
How much did manual toil change?
Did review quality improve or degrade?
Should the team advance, pause, or narrow scope?

The answer may be “do less.” That is a legitimate outcome. Safe delegation capacity grows by widening boundaries only when the evidence justifies it.