Why simulation alone doesn't reduce risk
Phishing simulations measure exposure, not improvement. Programs that stop at simulation see click rates plateau within a year — and miss the attacks that actually cause losses.
If you've run a security awareness program for two years or more, you've probably seen the curve: click rates fall sharply in year one, then flatten. By year two, you're testing the same population with refreshed templates, and the metric stops moving. Some quarters it ticks back up.
That plateau is not a problem with your team. It's a structural limit of simulation-only programs. Here's why — and what actually moves the needle past it.
What simulation actually measures
A phishing simulation answers exactly one question: given this specific message, would this specific person click?
That's useful. It's diagnostic. It tells you who needs more help and where your highest-exposure groups are. But it doesn't measure improvement in any durable sense, because click rate moves for reasons that have little to do with lasting skill:
- People learn to recognize that type of message
- People become more cautious in general for a few weeks after a campaign
- Cohort composition changes (new hires, attrition)
- Templates rotate in difficulty over time
None of those represent durable behavior change. The first two are transient awareness; the last two are measurement artifacts.
What durable change actually looks like
Real risk reduction in social engineering shows up as three things, from easiest to hardest to measure:
1. Increased report rate. Click rate falling is good. Report rate rising is better. A workforce that consistently reports suspicious messages — even messages that turn out to be legitimate — is a workforce that's developing the right reflex. Report rate is a leading indicator; click rate is a lagging one.
2. Faster reporting. Time-to-first-report on a campaign tells you how quickly the security team would hear about a real attack. A program with strong culture sees first reports within minutes of the first sends. A program with weak culture sees the first report after the third clicker has already submitted credentials.
3. Behavior change on real attacks. The hardest to measure but the only one that ultimately matters. When a real BEC attempt comes in, does someone catch it? When an unexpected MFA push hits, does someone deny and report?
The first two are measurable inside a simulation program. The third requires telemetry from the broader environment — SIEM data, helpdesk tickets, real reported emails.
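The first two measures fall out of data most simulation tools already collect. Here is a minimal Python sketch, assuming each simulated message is exported as a record with a send time and optional click and report times. The Recipient record and its field names are hypothetical, not any vendor's schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Recipient:
    # Hypothetical export record for one simulated message.
    sent_at: datetime
    clicked_at: Optional[datetime] = None
    reported_at: Optional[datetime] = None


def campaign_metrics(recipients: List[Recipient]) -> dict:
    """Compute click rate, report rate, and time-to-first-report for one campaign."""
    total = len(recipients)
    if total == 0:
        raise ValueError("campaign has no recipients")

    clicks = [r.clicked_at for r in recipients if r.clicked_at is not None]
    reports = [r.reported_at for r in recipients if r.reported_at is not None]

    first_send = min(r.sent_at for r in recipients)
    first_report = min(reports) if reports else None

    # How many people had already clicked when the first report arrived.
    # If nobody reported, every click counts.
    clicks_before_first_report = sum(
        1 for c in clicks if first_report is None or c < first_report
    )

    return {
        "click_rate": len(clicks) / total,
        "report_rate": len(reports) / total,
        "time_to_first_report": (first_report - first_send) if first_report else None,
        "clicks_before_first_report": clicks_before_first_report,
    }
```

The clicks_before_first_report value is one way to put a number on the weak-culture case: the more clicks that land before anyone reports, the longer a real attacker would have had to work unnoticed.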
Why simulation-only programs plateau
A few mechanics drive the plateau:
Template fatigue. Most programs run on a small library of templates. Over a year, your population sees most of them, recognizes the pattern, and clicks less, without actually getting better at recognizing new attack patterns. The pretexting data in the Verizon 2025 Data Breach Investigations Report is the proof point: real attacks are getting more contextual and multi-message, while many simulations are still single-template clickbait.
No remediation loop. The most common failure mode: someone clicks, gets a "gotcha" page, and is sent back to work. No follow-up training. No second test 30 days later to see if the behavior changed. The click was diagnosed; nothing was treated.
Wrong metrics in the dashboard. If your monthly report leads with "click rate down X%," you'll optimize for click rate. Click rate is gameable: you can drop it by sending easier templates, narrowing the audience to known cautious populations, or reducing send frequency. The number goes down; risk doesn't.
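To see how easily the headline number moves, here is a small worked example in Python. The per-tier click rates and send counts are invented for illustration, not drawn from any report. Nobody's behavior changes between the two quarters; only the template mix does.

```python
def blended_click_rate(sends_by_tier: dict, click_rate_by_tier: dict) -> float:
    """Overall click rate for a campaign mix, given a fixed click rate per difficulty tier."""
    total_sends = sum(sends_by_tier.values())
    expected_clicks = sum(
        sends * click_rate_by_tier[tier] for tier, sends in sends_by_tier.items()
    )
    return expected_clicks / total_sends


# Illustrative assumption: people click 2% of easy templates and 20% of hard ones,
# and that does not change between quarters.
rates = {"easy": 0.02, "hard": 0.20}

q1 = blended_click_rate({"easy": 500, "hard": 500}, rates)  # balanced mix
q2 = blended_click_rate({"easy": 900, "hard": 100}, rates)  # mostly easy templates

print(f"Q1 click rate: {q1:.1%}")  # 11.0%
print(f"Q2 click rate: {q2:.1%}")  # 3.8%
```

Reporting click rate per difficulty tier, rather than as one blended figure, is one way to keep this kind of drift visible.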
No connection to broader controls. Simulation in isolation doesn't tell you whether your MFA rollout is working, whether your email gateway is sandboxing the right things, or whether your detection rules fire on real attacks. It's one signal of one slice of the problem.
What to add to break the plateau
If you've been running simulation alone and want to actually reduce risk:
Adaptive remediation. When someone clicks, they should be auto-enrolled in a short, targeted training module that addresses what they missed — not a generic "phishing 101" video. The closed loop is the program.
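A closed loop can be as simple as a lookup plus a scheduled retest. The sketch below is one possible shape, assuming each template is tagged with the cue it relied on. The cue names, module names, and the plan_remediation helper are all hypothetical. The 30-day retest interval is the one mentioned earlier in this piece.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical mapping from the cue a template relied on to a short, targeted module.
MODULE_FOR_CUE = {
    "spoofed_sender": "checking-sender-domains",
    "credential_page": "spotting-fake-login-pages",
    "urgent_payment": "verifying-payment-requests",
}
# Fallback when the missed cue has no dedicated module (also an assumption).
DEFAULT_MODULE = "reporting-suspicious-messages"
RETEST_AFTER = timedelta(days=30)


@dataclass
class RemediationPlan:
    user_id: str
    module: str
    retest_on: date


def plan_remediation(user_id: str, missed_cue: str, clicked_on: date) -> RemediationPlan:
    """Pick a module for the cue the person missed and schedule a follow-up test."""
    module = MODULE_FOR_CUE.get(missed_cue, DEFAULT_MODULE)
    return RemediationPlan(user_id=user_id, module=module, retest_on=clicked_on + RETEST_AFTER)


plan = plan_remediation("u-1042", "urgent_payment", date(2025, 3, 1))
print(plan.module, plan.retest_on)  # verifying-payment-requests 2025-03-31
```

The retest is what makes it a loop: without a second measurement, you know the training was assigned but not whether the behavior changed.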
Multi-channel testing. Email is one channel. Real attacks come through SMS, voice, third-party tools (Slack, LinkedIn), and physical channels too. A rotation that includes voice and SMS phishing tests more of the actual threat landscape.
Multi-message scenarios. Modern pretexting is multi-message. A simulation that sends one email and measures the click is testing yesterday's attack pattern. Sequenced campaigns — pretext → urgency → ask — match real BEC much better.
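A sequenced campaign also changes what you record. Instead of a single click flag, you want to know where in the sequence each person broke the chain. Here is a rough sketch in Python, assuming the three stages named above and a small, made-up vocabulary of recipient actions.

```python
from dataclasses import dataclass
from typing import List

# Stage order taken from the sequence described above.
STAGES = ["pretext", "urgency", "ask"]


@dataclass
class StageEvent:
    stage: str   # one of STAGES
    action: str  # assumed vocabulary: "ignored", "replied", "complied", "reported"


def outcome(events: List[StageEvent]) -> str:
    """Summarize one recipient's path through the sequence."""
    for event in sorted(events, key=lambda e: STAGES.index(e.stage)):
        if event.action == "reported":
            # The earlier in the sequence someone reports, the better.
            return f"reported_at_{event.stage}"
        if event.stage == "ask" and event.action == "complied":
            return "complied_with_ask"
    return "no_report_no_compliance"
```

Someone who replies to the pretext but reports at the urgency stage comes out as reported_at_urgency. Someone who follows the whole chain comes out as complied_with_ask. A single-email simulation cannot tell those two people apart.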
Tie metrics to behavior, not just click rate. Track at least:
- Report rate (and trend over time)
- Time-to-first-report
- Repeat-clicker rate (the small group of people clicking everything)
- Percentage of population on phishing-resistant MFA
- Reported real attacks (from helpdesk tickets, SIEM)
A dashboard with all five of these tells a richer story than one with click rate alone.
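Repeat-clicker rate is the least standardized of these, so it helps to pin down a definition. The Python sketch below uses one possible definition: the share of the population that clicked in at least min_clicks distinct campaigns. The threshold of two and the (user, campaign) log format are assumptions to adjust for your own program.

```python
from collections import Counter
from typing import Iterable, Set, Tuple


def repeat_clicker_rate(
    click_log: Iterable[Tuple[str, str]],
    population: int,
    min_clicks: int = 2,
) -> Tuple[float, Set[str]]:
    """click_log holds one (user_id, campaign_id) pair per recorded click."""
    if population <= 0:
        raise ValueError("population must be positive")

    # Deduplicate first so several clicks in one campaign count once.
    campaigns_per_user = Counter(user for user, _campaign in set(click_log))
    repeaters = {user for user, n in campaigns_per_user.items() if n >= min_clicks}
    return len(repeaters) / population, repeaters


log = [("ana", "c1"), ("ana", "c2"), ("ana", "c2"), ("bo", "c1"), ("cy", "c3")]
rate, who = repeat_clicker_rate(log, population=10)
print(rate, who)  # 0.1 {'ana'}
```

Returning the set of people alongside the rate matters in practice, because this is the group where targeted follow-up is likely to pay off most.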
Differentiate by role. The CFO and the warehouse staff face different attacks. A program that tests both with the same content is doing neither well. Audience targeting on simulations and tailored content for high-risk roles (finance, executives, IT admins) materially improves outcomes.
The honest framing
Simulation is a lens, not a treatment. It's how you see who's at risk and what attacks land. The treatment is what you do after the click — and the culture you build around reporting and verification.
Programs that conflate "we run simulations" with "we have an awareness program" are running diagnostics without therapy. The plateau is the data telling them so.
Sources: Verizon, 2025 Data Breach Investigations Report (Executive Summary); IBM, Cost of a Data Breach Report 2025.