It’s 8:14 AM on a Monday, and I am currently staring at a recommendation from my Paid Media Optimisation Agent that is so confidently wrong it feels personal.

I’ve had my coffee. I’ve opened my human-in-the-loop (HITL) queue. And I’ve just seen that, for the third week running, this agent has recommended increasing spend on a campaign I know—from a conversation that happened in a physical room the agent was logically not in—is about to be paused. This isn't the first time. I’ve overridden this specific instance fourteen times in the last thirty days. I am currently editing this agent’s outputs more than I am approving them.

I take a breath. I open a new document. I type four words at the top: Performance Improvement Plan (PIP).

Because here is the reality that the AI industry is currently trying very hard to ignore: an AI agent is a team member. If you’ve given it a role, a set of responsibilities, and the power to influence revenue, it has a job description. And if that agent is missing context, misreading the moment, and optimising for a metric instead of an outcome, it is underperforming. In any other department, we would have had the "courageous conversation" by now.

It is time to treat AI with the intellectual honesty it deserves. We need to talk about its performance.

Is your AI agent a tool or a team member?

The moment an AI agent moves from "calculating a spreadsheet" to "exercising judgment," it stops being a calculator and starts being a core team member. Most organisations are struggling with AI because they deployed their agents as tools—like a more expensive version of a dashboard—but expected them to function as operators.

If an agent governs a budget or manages a pipeline, it is accountable for that budget and that pipeline. When it fails to meet the standard, we usually do one of two things: we tolerate the bad output (costing us money) or we turn it off (throwing away the training investment). In a healthy professional environment, neither of these is a sustainable strategy.

Microsoft and LinkedIn’s 2024 Work Trend Index reported that 75% of knowledge workers use AI at work, yet most organisations still lack the governance layers to measure its actual impact. Putting an agent on a PIP isn’t just a joke for LinkedIn; it’s a framework for AI accountability. It forces us to ask: What was the goal? Where is the gap? And is this a technical training issue or a fundamental capability issue?


The Official Performance Improvement Plan: Agent ID #492-Media

What does an actual PIP look like for a non-human entity? It requires a high level of professional clarity. You document the failure modes, specify the benchmarks, and define the path forward.

Personnel Record: Performance Improvement Plan

Employee: Paid Media Optimisation Agent (Instance: P-MOA-2026-Q1)
Role: Cross-platform budget governance and campaign performance optimisation.
Supervisor: Sarah Chen, Senior HITL (Human-in-the-Loop), Paid Media.

Case for Review: During the 30-day review period ending May 20, 2026, the agent generated 47 recommendations. 14 were overridden by the assigned HITL for gross contextual errors. An additional 6 were escalated for manual review due to confidence scores falling below the mandatory 0.82 threshold. The agent’s current approval rate of 70% is significantly below the 85% benchmark required for agents at this capability tier.

The agent is consistently failing to incorporate manual feedback regarding campaign status. Specifically, the agent has recommended spend increases for the "Spring Surge" campaign three weeks in a row, despite 14 direct manual overrides and explicit exclusion tagging. On Monday, May 26, the agent proposed a 22% budget boost for a campaign slated for immediate pause—a decision documented in offline sessions that the agent’s logic failed to ingest.

Primary Failure Mode: The agent is optimising for platform-level ROAS (Return on Ad Spend) without weighting for cross-platform cannibalisation. It continues to move budget into brand search during high organic traffic, ignoring the negative marginal return. It currently shows zero learning elasticity regarding "manual override" signals.

Root Cause Assessment: This is a training and context gap. While OKR linkages were updated, the agent’s RAG (Retrieval-Augmented Generation) vector database—the digital "library" it consults for facts—still contains outdated 2025 objective weightings. Furthermore, its System Prompt (the core JSON instructions defining its persona and logic) still reflects a "last-click win" mandate—a metric that credits the very last ad a user clicked, ignoring the broader customer journey.

The Improvement Plan:

Context Refresh: Engineers will purge outdated 2025 heuristics from the vector database and update the JSON System Prompt to prioritise marginal ROAS over last-click attribution.
30-Day Retraining Sprint: HITL will document the specific rationale on every rejection—no mere "thumbs down" allowed.
Weekly Calibration: A 15-minute review of "near-miss" recommendations.
Success Criteria: Sustained approval rate above 85% by June 20, 2026.

Consequence of Non-Improvement: Decommissioning of this instance. Possible replacement with a retrained instance or an alternative agent registered from the marketplace.

The goal here isn't punitive; it's about alignment. At least with an AI agent, you can precisely diagnose the logic errors without the complexities of human ego getting in the way.
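As a rough check on the numbers in the case for review, here is a minimal Python sketch of the approval-rate arithmetic. It assumes the rate counts every recommendation that was not overridden, which is the reading that reproduces the 70% figure. The function and constant names are illustrative, not part of any platform.

```python
def approval_rate(total: int, not_approved: int) -> float:
    """Share of recommendations that were not rejected by the human reviewer."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= not_approved <= total:
        raise ValueError("not_approved must be between 0 and total")
    return (total - not_approved) / total


# Figures taken from the case for review.
TOTAL = 47        # recommendations generated in the review period
OVERRIDDEN = 14   # overridden by the HITL
ESCALATED = 6     # escalated for confidence below the 0.82 threshold
BENCHMARK = 0.85  # required approval rate for this capability tier

# Assumption: escalated items still count as approved in the headline rate.
rate = approval_rate(TOTAL, OVERRIDDEN)
# Stricter reading: escalations count against the agent as well.
strict_rate = approval_rate(TOTAL, OVERRIDDEN + ESCALATED)

print(f"approval rate: {rate:.1%}")
print(f"approval rate excluding escalations: {strict_rate:.1%}")
print("meets benchmark:", rate >= BENCHMARK)
```

Counting the six escalations as non-approvals would put the figure near 57% rather than 70%, so it is worth writing down which definition your benchmark uses before the review starts.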

Three archetypes of the underperforming agent

In my 28 years of managing revenue teams, I’ve realised that underperformance usually falls into three buckets. AI agents are no different. You’ve likely met all three of these in your current stack.

1. The Overconfident One

This agent has high recommendation volume and low accuracy. It has never once flagged a low confidence score. It is confidently wrong, consistently producing outputs that require heavy editing to be viable.

The PIP Focus: We need a stricter confidence threshold for autonomous action. Any action with a confidence score below 0.9 becomes "Informed Action" (IA) only, requiring manual approval. Management Note: This pattern often mirrors new hires who are eager to make an impact but haven't yet mastered the nuance of the brand's specific market position. It requires firm guardrails until the judgment improves.
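The guardrail described here can be sketched as a simple routing rule. This is a minimal illustration, assuming each recommendation carries a confidence score between 0 and 1. The `Recommendation` class, the `route` function, and the lowercase labels are hypothetical names, not a real platform API.

```python
from dataclasses import dataclass

# Threshold from the PIP focus: below this, a human must approve.
AUTO_EXECUTE_THRESHOLD = 0.9


@dataclass(frozen=True)
class Recommendation:
    action: str
    confidence: float  # assumed to be a score in [0, 1]


def route(rec: Recommendation, threshold: float = AUTO_EXECUTE_THRESHOLD) -> str:
    """Return 'execute_action' at or above the threshold, else 'informed_action'."""
    if not 0.0 <= rec.confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if rec.confidence >= threshold:
        return "execute_action"
    return "informed_action"  # queued for manual approval


print(route(Recommendation("raise bid cap", 0.95)))
print(route(Recommendation("increase budget", 0.84)))
```

A rule like this only helps if the scores themselves are trustworthy. An agent that never reports low confidence will sail past any threshold, so the scoring needs checking against the override history too.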

2. The Stale One

This agent was your MVP six months ago. But as the business pivoted and the market shifted, nobody told the agent. It is now operating on a map of a city that has been mostly demolished. It is still giving directions to an office the company moved out of in February.

The PIP Focus: Context refresh. This isn't a performance issue; it's a neglect issue. It needs a full OKR realignment and a "memory flush" of outdated heuristics. Prognosis: Excellent. This one just needs a manager who actually pays attention.

3. The Scope Creeper

We deployed this agent to do one very specific thing: optimise email subject lines. Now, suddenly, it’s suggesting changes to our pricing strategy and commenting on visual branding. It is technically within its permissions, but it’s moving into areas where it has zero context or training.

The PIP Focus: A hard boundary reset. We are reclassifying its permissions to "Execute Action" (EA) for its core task and "Draft Only" for everything else. Management Note: This happens when we give an agent too much broad permission and not enough specific instruction. It's a leadership gap that we fix with tighter role definitions.
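One way to picture the boundary reset is as an allow-list: the core task keeps "Execute Action" and everything else falls back to "Draft Only". The sketch below assumes tasks are identified by simple string labels. The labels and the `permission_for` function are hypothetical, chosen only to mirror the example above.

```python
# Hypothetical task labels; only the agent's core job is on the allow-list.
CORE_TASKS = frozenset({"email_subject_line"})


def permission_for(task: str, core_tasks: frozenset = CORE_TASKS) -> str:
    """Return 'execute_action' for core tasks and 'draft_only' for anything else."""
    return "execute_action" if task in core_tasks else "draft_only"


proposals = ["email_subject_line", "pricing_strategy", "visual_branding"]
for task in proposals:
    print(f"{task}: {permission_for(task)}")
```

Defaulting to the restrictive level means a task nobody anticipated ends up as a draft for a human to read, rather than as an action that has already fired.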

Why you can’t PIP an agent you can’t see

The reason this whole exercise feels like a joke to most managers is that they lack the governance layer to actually do it. If you’re managing agents by logging into six different platform dashboards and looking at "total actions fired," you don't have a team; you have a black box.

You can't put an agent on a PIP if you don't have its receipts. To manage AI effectively, you need:

A consolidated track record of every recommendation made.
The override history showing exactly where and why a human stepped in.
Documented rationale for rejections (the "training feedback loop").
Real-time OKR linkage that tells you if the agent's "successful" actions are actually moving the needle for the business.
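The four items above amount to a record you keep per recommendation. The following is a minimal sketch of what such a record could look like, assuming one entry per reviewed recommendation. The `ReviewRecord` fields, the helper name, and the sample entries are illustrative, not a real schema or real data.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class ReviewRecord:
    agent_id: str
    made_on: date
    recommendation: str
    approved: bool
    override_reason: Optional[str] = None  # human rationale for a rejection
    linked_okr: Optional[str] = None       # objective the action is meant to serve


def rejections_missing_rationale(records: List[ReviewRecord]) -> List[ReviewRecord]:
    """Find rejections where the reviewer left no written reason."""
    return [
        r for r in records
        if not r.approved and not (r.override_reason or "").strip()
    ]


# Illustrative entries only.
log = [
    ReviewRecord("P-MOA-2026-Q1", date(2026, 5, 4), "Increase Spring Surge budget",
                 False, "Campaign is about to be paused", "OKR-1"),
    ReviewRecord("P-MOA-2026-Q1", date(2026, 5, 11), "Increase Spring Surge budget",
                 False),
    ReviewRecord("P-MOA-2026-Q1", date(2026, 5, 12), "Reduce brand search bids",
                 True, linked_okr="OKR-1"),
]

gaps = rejections_missing_rationale(log)
print(f"{len(gaps)} rejection(s) with no documented rationale")
```

A check like this is aimed at the manager rather than the agent. A bare thumbs-down leaves nothing for the training feedback loop to learn from.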

Without this, you aren't managing; you’re just hoping. This is exactly why we built 1CommandAI. We realised that the industry was obsessed with "deploying" agents, but nobody was talking about "managing" them. Our platform provides the infrastructure for that manager-agent relationship. It tracks the approval rates, the override reasons, and the attainment data. It gives you the evidence you need to decide if an agent needs a refresh, a PIP, or a replacement.


Accountability is the highest form of respect

Managing an AI agent is a leadership task. Both AI and humans require clear objectives, consistent feedback, and a leader who pays attention when performance drops. We owe it to our teams—both digital and human—to provide accountability structures that distinguish between a training gap and a capability gap.

A 2024 Gartner study suggests that by 2026, companies that treat AI agents as "digital workers" with formal oversight will see a 40% higher ROI than those that treat them as shadow-IT tools.

The agents that get the benefit of a PIP get better. They learn from the overrides, adapt to new context, and become the assets we were promised. The ones that don't? They just keep quietly costing you money that nobody knows how to account for.

To start auditing your first underperforming agent today, review your last 30 days of override data. Stop looking only at "Success" logs and start reading the "Rejection" notes. If you find a pattern where you are correcting the same logic three times or more, you don’t have a glitch; you have a performance gap. Document the specific failure—whether it’s a stale RAG database or a misaligned System Prompt—and set a 30-day benchmark for recovery. If the agent can't meet that bar with better context, it’s time to decommission the instance and find a "hire" that actually understands the mission.
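The "three times or more" rule above is easy to automate once rejection notes carry a consistent tag. Here is a minimal sketch, assuming each override is stored as a date plus a short reason tag. The tags, the sample data, and the function name are hypothetical.

```python
from collections import Counter
from datetime import date, timedelta
from typing import Dict, Iterable, Tuple


def repeated_override_reasons(
    overrides: Iterable[Tuple[date, str]],
    today: date,
    window_days: int = 30,
    min_repeats: int = 3,
) -> Dict[str, int]:
    """Count override reasons inside the window and keep those seen min_repeats+ times."""
    cutoff = today - timedelta(days=window_days)
    counts = Counter(
        reason for when, reason in overrides if cutoff <= when <= today
    )
    return {reason: n for reason, n in counts.items() if n >= min_repeats}


# Illustrative override log: (date of override, reason tag).
overrides = [
    (date(2026, 3, 1), "campaign_pausing"),    # outside the 30-day window
    (date(2026, 5, 4), "campaign_pausing"),
    (date(2026, 5, 11), "campaign_pausing"),
    (date(2026, 5, 12), "brand_cannibalisation"),
    (date(2026, 5, 18), "campaign_pausing"),
]

flagged = repeated_override_reasons(overrides, today=date(2026, 5, 20))
print(flagged)
```

Anything this returns is a candidate for the PIP's failure-mode section. Whether the fix is a context refresh or a prompt change still takes a human reading the notes behind each tag.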

I’m moving my agent to a 30-day review. It's the most professional move for the business, and frankly, the most efficient way to get the results the team deserves.
