Most leadership programmes scale fast on assumptions that only unravel in the room. A genuine pilot surfaces hidden beliefs, resistance and the workplace conditions that make behaviour change stick, before the full investment lands. Jimmy Burroughes argues against biased, stacked pilots and shows how to test reality, not optimism.

Most leadership programmes are designed and decided before anyone enters the training room. The content is chosen, facilitators are booked, the business case clears the board or approvals committee, and the goal is to get moving. If somebody in the room asks whether to run a pilot or go straight to full deployment, the answer is almost always the same, framed around agility, delivery and the urge to get moving.

That instinct isn’t unreasonable. Pilots can feel like delay when boards are watching and budget cycles are running. Most decent L&D and OD teams are also responding to a perceived business need, and there is pressure to deliver.

Of more concern is the assumption that sits behind the decision to start at scale, one that rarely gets named openly: that the organisation already knows enough about its own people to design something that will work first time. It understands its management culture. It knows where the resistance will sit. It has done a training needs analysis. This confidence often correlates with the tenure of L&D people who have consistently rolled out programmes across the organisation, often with similar results. These assumptions tend to be wrong in ways that only become visible once the programme is already running at full scale.

What a pilot is actually for

The usual argument for piloting is a financial one. Test before you commit the full spend, whether that means taking people out of work, engaging a facilitator or L&D team, or outsourcing to a vendor. This is reasonable to a point, but it misses the more important factor: cultural texture is frequently not visible from a scoping conversation alone. There are unspoken beliefs managers hold about what development is actually for, who it is aimed at, and whether feedback is something that genuinely happens here or something people just comply with.

Another thing to watch is the degree to which the participants’ line managers, and at times the participants themselves, will reinforce new behaviours or quietly undermine them because the programme challenges how they have always operated and the unwritten norms they work by. None of this tends to surface in a needs analysis. It bubbles up when you are running sessions with real people and watching what actually happens in the room… or doesn’t.

A pilot is how you find out what you did not know you did not know, before you commit the whole investment. That is a different purpose from testing whether the content is good or whether people like it. The CIPD’s research on the effective transfer of learning makes the point clearly: the conditions required for behaviour change in the workplace are extremely context-specific. The same programme, delivered in two different organisations, can produce markedly different results, not because of what is taught but because of the environment that surrounds the teaching. A pilot surfaces those conditions, where a rollout skips them and blindly assumes the risks.

The stacked pilot

One of the most frequent pilot mistakes is the stacked pilot, partly because the intention behind it is so understandable.

This is a pattern observed across multiple organisations rather than an isolated case. Senior sponsors want the pilot to succeed because they are often already committed to the programme in principle. Facilitators want to perform well. The natural instinct of anyone organising a test is to select the environment most likely to produce a good result. What they produce instead is a proof of concept for a version of the world that the rollout will not encounter.

A genuine pilot should be as representative as the organisation can make it. That means typical managers, not the most capable ones. A business unit with real pressure on it, not the most receptive team in the building. Facilitator support that reflects what will be available at scale, not what can be assembled for a flagship cohort. If the programme only functions when everything is operating in its favour and at 11 out of 10, discovering that in a pilot is an enormously valuable finding. Discovering it after a full rollout is a much more expensive lesson. It also damages the reputation of L&D and personal development as a whole, and hurts performance and engagement.

There is a version of this objection heard from time to time, which is that using unrepresentative conditions in a pilot is unavoidable because you cannot risk the programme failing in front of a sceptical audience. This is a fair argument, but it reflects a category confusion about what a pilot is. If it cannot fail, it is not a test. And the test should teach the business what is required to make the programme succeed.

The context assumption

Every organisation holds beliefs about its management culture that have not been tested recently, if ever. They sound like this: “Leaders here are broadly self-aware. People will engage with feedback if it is handled well. The business is ready for this kind of change.” These beliefs are usually held by people senior enough to be insulated from the conditions that would challenge them. TalentLMS has found a perception gap: 90% of managers say they have a good understanding of their direct reports’ skills, compared with 69% of employees who say their manager has a good understanding of their skills.

Equally dangerous is carrying out a diagnostic with a consultancy that does not know or understand your business and peddles a one-size-fits-all diagnostic purporting to tell you everything you need to know. Or worse, not carrying out any diagnostic at all and relying on gut feel.

A non-stacked pilot breaks that insulation. It exposes the programme to managers who reflect the actual workforce and generates granular information about what people are and are not willing to do versus what they are or are not currently doing. Resistance, when it appears, is data. It might indicate a belief about leadership that the programme needs to address more directly. It might indicate a structural condition that will undermine new behaviours regardless of how good the facilitation is. It might indicate that the change being asked of people is larger than the design anticipated, or that the communication surrounding the programme has created the wrong expectations entirely.

All of that can be adjusted before the full investment is committed. None of it can be retrieved afterwards. Too many programmes have kicked off with great intentions only to find the lay of the land was not as expected when they started walking it.

The selection shortcut

Who goes into the pilot matters more than most organisations recognise. The tendency is to select on the basis of availability or the path of least resistance. That might be the business unit whose director is enthusiastic, the managers who can be released, or whoever can be assembled quickly enough to meet the timeline.

A more deliberate, tested and proven approach applies two criteria at once. First, where is the management capability gap most significant, and where would improved leadership make the biggest difference to team performance? Second, within that area, which individuals show real readiness to reflect and change, not just willingness to attend? The intersection of business impact and individual coachability surfaces people who will affect the organisation if they develop, and who are actually capable of development within the timeframe. They are not necessarily the easiest group to work with. They are the most informative. And the selection discipline that produces them in a pilot can be carried directly into the full rollout. The added benefit is that it tends to produce better outcomes than the path-of-least-resistance approach it replaces.

A diagnostic to run before you commit

Before approving a full rollout, it is worth asking a handful of honest questions. These are not intended to delay the programme, but to find out whether the pilot has done what a pilot is supposed to do.

  • Can you name the specific management behaviours the programme is trying to change, and describe in concrete terms how you will know whether they have shifted at 90 days?

    If the answer drifts towards completion rates or feedback scores, the measurement hasn’t been designed yet.

  • What did people resist during the pilot, and do you actually know why?

    Resistance that gets managed rather than understood tends to reappear at scale in a form that is harder to address.

  • What did line managers of pilot participants do once the sessions ended?

    If the honest answer is that nobody tracked this, then the single biggest predictor of whether behaviour change will stick has gone unmeasured.

  • What would you change about the programme design before running it with the full population?

    If the answer is nothing, it is worth asking whether the pilot was genuinely representative or whether it was, in some of the ways described above, arranged to succeed.

  • Who selected the pilot cohort, and on what basis?

    The logic that built the pilot group tends to replicate itself at scale unless someone has made a deliberate decision to change it.

These are not comfortable questions in most organisations. They are considerably less uncomfortable than a full rollout that delivers the same untested assumptions to the entire management population and leaves the L&D team explaining to the board why the investment didn’t move anything.

The point about scale

L&D leaders who run genuine pilots tend to arrive at the broader investment conversation in a stronger position than those who report completion rates and satisfaction scores. They have behavioural data. They have specific evidence of what delivering the pilot surfaced that the design had not anticipated. They have an answer to the question boards eventually ask, which is not whether the training happened but whether anything actually changed.

A controlled pilot isn’t a delay. It provides the information that makes the scale worth committing to in the first place, and the design confidence to do it properly when you do.


Jimmy Burroughes is Founder of JBL High Performance and AidenCoach