You open the AI coaching app again. You type your leadership challenge, receive thoughtful prompts, reflect on your answers. It feels productive. It feels like progress.
But when someone asks if it's actually helping, you pause. Is it? You've been paying for this for months. You engage regularly. You feel... something. But you couldn't point to a single concrete change in how you lead.
The investment continues. The engagement continues. And the nagging question continues: is this genuinely developing you, or just providing the illusion of progress through regular activity?
BEHIND THE CURTAIN
What most people don't see when they use coaching, whether AI or human, is the evaluation mechanism that should be running silently in the background. Think about grilling competitions. When judges evaluate your brisket, they're not relying on vague impressions. Behind their scoring is a specific measurement system: tenderness metrics, moisture retention levels, bark formation standards. The competition doesn't work without this hidden scorecard.
Effective coaching operates the same way. Research analyzing nearly 20 years of coaching studies (39 separate trials with over 2,500 participants) found that workplace coaching produces a statistically significant effect, with a standardized effect size of 0.59. That's moderate impact, not transformative magic. But here's what makes that number meaningful: it only shows up when there's a measurement mechanism actually capturing the change.
The measurement system includes baseline data (where you started), defined observable outcomes (what specific behaviors will change), consistent tracking intervals (weekly check-ins, not vague "when I remember"), and attribution methods (how you distinguish coaching effects from everything else happening in your life).
This mechanism doesn't appear automatically. It has to be deliberately built before coaching begins. It's the invisible infrastructure that separates genuine development from expensive journaling.
THE WRENCH IN THE WORKS
Here's where the system breaks down for most people: they start coaching without ever installing the measurement mechanism.
A 2024 study examining goal-setting in practice found something startling. Fewer than 25% of people identified how goal achievement would be measured when they began, and 93.5% lacked any details about monitoring progress. Not "they tracked poorly": they had no tracking plan whatsoever.
You wanted to become "a better leader-more decisive, better at delegating." You jumped straight into AI coaching sessions. But there was no measurement mechanism running in the background. No baseline count of how many decisions you made without second-guessing yourself. No record of delegation attempts. No team feedback about whether they felt trusted with responsibilities.
Without this mechanism, your brain fills the evaluation gap with feelings. The coaching feels productive because you're engaging regularly. You feel like you're developing because you're thinking about leadership more often. Your confirmation bias highlights the one time you delegated successfully while overlooking the three times you took the task back.
The coaching might be working. Or it might not be. But without the measurement mechanism, you literally cannot tell the difference between progress and the illusion of progress. The evaluation system never got installed, so there's nothing generating reliable data about actual change.
WHAT NO ONE TOLD YOU
Here's the piece almost no one mentions about measuring coaching effectiveness: there's a direct correlation between how quantifiable your goal is and whether you'll actually monitor progress toward it.
Research on goal monitoring found that the more participants thought about their goal in quantifiable terms, the more likely they were to track progress-and the easier they found monitoring to be. It's not just that measurable goals are better. It's that quantifiability itself predicts whether the measurement mechanism will function at all.
"Become a better leader" doesn't trigger monitoring behavior. Your brain doesn't know what to count. But "make decisions without reversing them"-that's quantifiable. Your brain can count decision reversals. The measurement becomes almost automatic.
And here's the second overlooked piece: you need both objective behavioral metrics AND subjective experiential measures. Neither alone gives you the complete picture.
Think about evaluating a quarterback in college football. Completion percentage is objective-you can count it. But composure under pressure? That's subjective-you have to watch and assess. If you only had completion stats, you'd miss quarterbacks who look terrible on paper but perform when it matters. If you only had scout impressions, you'd miss quarterbacks who look great but produce mediocre results.
Your coaching evaluation needs the same dual approach. Behavioral metrics: number of tasks delegated weekly, number of decisions made without second-guessing, percentage of delegated tasks completed without your intervention. Experiential metrics: your own confidence rating in decisions, team feedback about whether you seem more certain, your subjective sense of delegation comfort.
Studies comparing objective and subjective measurement confirm that both have utility; which is appropriate depends on what you're trying to understand. Objective measures add rigor but can miss personal experience that counting alone can't capture. Subjective measures offer deeper understanding but need structure to reduce bias.
Most people try to evaluate coaching using only feelings ("do I feel more confident?") or only activity ("am I engaging regularly?"). Both miss half the picture. The forgotten factor is that meaningful evaluation requires building BOTH measurement systems before you begin.
THE FLIP THAT FIXES IT
The standard approach to coaching follows this sequence: engage with coaching → hope it works → try to figure out if it worked → maybe continue or maybe quit based on vague impressions.
But research on coaching effectiveness reveals something counterintuitive: when you reverse this process and establish the measurement scorecard FIRST, then engage with coaching while tracking specific metrics, you actually get clear evidence of impact in a defined timeframe with far less uncertainty.
Here's what the reversed method looks like:
Before your next coaching session:
Identify 2-3 behavioral metrics for each leadership goal. For "better at delegating": (1) number of tasks delegated weekly, (2) percentage of delegated tasks completed successfully without your intervention, (3) number of follow-up interventions you make on delegated work.
Identify 2-3 experiential metrics for the same goal: (1) your weekly confidence rating (1-10 scale) in delegation decisions, (2) monthly team survey responses about feeling trusted with responsibilities, (3) your subjective stress level about work you've delegated.
Establish your baseline by measuring where you are RIGHT NOW before the next coaching session. Count last week's delegations. Survey your team today. Rate your current confidence.
Define what meaningful improvement looks like in 90 days. Not perfection-specific, realistic movement. Maybe delegation increases from 2 tasks/week to 5 tasks/week. Maybe your confidence rating moves from 4/10 to 7/10.
Set up the tracking mechanism: a calendar reminder every Friday to log the week's behavioral counts and experiential ratings. Five minutes, simple spreadsheet (a minimal sketch of such a tracker follows these steps).
Only then continue coaching.
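The tracker itself can be anything that captures one row per week. As a minimal sketch, here's what that could look like in Python, appending to a local CSV file. The metric names (tasks_delegated, follow_up_interventions, confidence) are hypothetical placeholders; substitute the behavioral counts and experiential ratings you defined for your own goal.

    # weekly_log.py: a minimal sketch of the Friday tracking ritual.
    # The metric names below are hypothetical examples, not prescribed
    # fields; swap in your own 2-3 behavioral and 2-3 experiential metrics.
    import csv
    import os
    from datetime import date

    LOG_FILE = "coaching_log.csv"
    FIELDS = ["week_of", "tasks_delegated", "follow_up_interventions",
              "confidence_1_to_10"]

    def log_week(tasks_delegated, follow_up_interventions, confidence):
        """Append one row of behavioral counts and experiential ratings."""
        first_run = not os.path.exists(LOG_FILE)
        with open(LOG_FILE, "a", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=FIELDS)
            if first_run:
                writer.writeheader()  # the first logged week is your baseline
            writer.writerow({
                "week_of": date.today().isoformat(),
                "tasks_delegated": tasks_delegated,
                "follow_up_interventions": follow_up_interventions,
                "confidence_1_to_10": confidence,
            })

    # Friday, five minutes: count the week, rate the week, log the week.
    log_week(tasks_delegated=2, follow_up_interventions=4, confidence=4)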
The coaching content might be identical. But now the measurement mechanism is running in the background. After 90 days, you won't be guessing. You'll have data showing either positive movement (justifying continued investment) or absence of change (providing clear basis to discontinue or fundamentally modify your approach).
The flip is this: most people start consuming coaching hoping to eventually figure out if it worked. The reversed approach installs the evaluation system first, making the answer clear and evidence-based from day one.
THE UNCOMFORTABLE TRUTH
If you actually implement this measurement framework, you have to accept something uncomfortable: you might discover the AI coaching isn't working for you.
After 90 days of rigorous tracking, your behavioral metrics might show no movement. Tasks delegated per week: still 2. Decisions reversed: same frequency as before. Team feedback: unchanged. That data won't care about how much you've invested or how productive the sessions feel. It will simply show that observable behavior hasn't shifted.
Or you might discover something equally uncomfortable: you can't clearly attribute the changes that DID happen to the coaching rather than other factors. Maybe your delegation improved-but you also got a new team member who's more competent. Maybe you're more decisive-but you also started sleeping better. The measurement mechanism reveals not just what changed, but how difficult it is to isolate coaching as the cause.
Research on coaching effectiveness is clear that coaching CAN work-meta-analyses demonstrate moderate effect sizes across thousands of participants. But that same research shows effectiveness varies significantly by individual and context. What works for most people might not work for you. What works in one leadership context might not transfer to yours.
The honest implication: establishing clear metrics means accepting clear answers, even when those answers are "this isn't working" or "I can't tell." You'll lose the comfortable ambiguity of feeling productive without having to prove it.
THE CHALLENGE
Here's your test: before your next AI coaching session, build your evaluation scorecard.
Pick one leadership goal. Write down 2-3 behavioral metrics you can count and 2-3 experiential metrics you can rate. Measure your baseline this week-actually count the behaviors, actually collect the ratings. Define what meaningful 90-day improvement would look like. Set up your Friday tracking reminder.
Then continue your AI coaching for exactly 90 days while tracking these metrics weekly. No judgment, no adjustment to the metrics midway through, no explaining away unfavorable data. Just clean tracking.
At day 90, look at your data. Calculate whether each metric moved in the direction you defined as improvement. Be honest about whether you can attribute the changes (if any) to the coaching versus other life factors.
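If you logged your weeks into a file like the tracker sketched earlier, the day-90 calculation is a few lines of arithmetic. This sketch (assuming the same hypothetical file and metric names) compares your baseline week against the average of the last four logged weeks, so a single unusually good or bad final week doesn't skew the verdict:

    # day90_review.py: compare the baseline week to the final month.
    # Assumes the coaching_log.csv written by the earlier sketch.
    import csv

    with open("coaching_log.csv", newline="") as f:
        rows = list(csv.DictReader(f))

    baseline, recent = rows[0], rows[-4:]  # first week vs. last four weeks

    for metric in ["tasks_delegated", "follow_up_interventions",
                   "confidence_1_to_10"]:
        start = float(baseline[metric])
        end = sum(float(r[metric]) for r in recent) / len(recent)
        print(f"{metric}: baseline {start:.1f} -> recent {end:.1f} "
              f"(change {end - start:+.1f})")

Remember that "improvement" runs in different directions per metric: delegated tasks and confidence should rise, while follow-up interventions should fall. Judge each number against the direction you defined on day one.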
The specific challenge: can you implement this framework for just 90 days without abandoning it when the data gets uncomfortable? Can you let the metrics tell you the truth instead of letting your feelings tell you what you want to hear?
Most people won't do this. They'll continue with vague engagement and vague evaluation, protecting themselves from clear answers. The question is whether you're willing to trade comfortable ambiguity for uncomfortable clarity.
WHAT YOU'LL PROVE
If you actually track your behavioral and experiential metrics for 90 days, you'll prove something valuable regardless of the results.
If the metrics show clear positive movement: You'll have concrete evidence that justifies your coaching investment. Not feelings, not engagement frequency, not how productive sessions seem-actual data showing specific behaviors changed and subjective experience improved. You'll know exactly which aspects of your leadership developed and by how much. You can confidently continue, knowing you're getting measurable return.
If the metrics show no movement or negative movement: You'll have equally valuable evidence that this approach isn't serving you. You can make a clear-eyed decision to stop, modify how you're using the coaching, or switch to a different development method-all based on data rather than guilt about sunk costs or vague hope that more engagement will eventually work.
If the metrics are mixed or attribution is unclear: You'll have proved something crucial about the complexity of personal development-that simple cause-and-effect is often impossible to establish, and that your investment decisions should account for that ambiguity rather than pretend it doesn't exist.
But here's what you'll prove beyond the specific coaching question: you'll demonstrate that you can evaluate your own development with the same rigor you'd expect from any other professional investment. The same standards you'd apply to evaluating grilling competition techniques (specific metrics, not vague impressions) or building a model train layout (clear plan, not hopeful tinkering) can apply to your own growth.
After 90 days of measurement, you won't wonder whether AI coaching is helping. You'll know. And that clarity-whether it confirms continued investment or provides permission to stop-is worth more than another three months of productive-feeling ambiguity.
WHAT'S NEXT
In our next piece, we'll explore how to apply these insights to your specific situation.