Every Control Is a Hypothesis
I was an undergraduate physics student before I was a compliance specialist, and the transition has taken me almost a decade to make peace with.
From the outside, the two disciplines look nothing alike. Physics is white coats and equations and measurement to seven decimal places. Compliance is policy memos and approval workflows and the occasional shouting match with a sales VP.
But underneath, the work is the same. We're both running experiments against the world to see if our model of how things should behave matches how they actually behave. We just call our equipment different names.
In physics, a model that hasn't been tested isn't science. It's speculation.
In compliance, a control that hasn't been tested isn't assurance. It's also speculation. We just don't call it that, because the word "control" sounds reassuringly definite.
I want to argue, in this essay, that the single biggest improvement most GRC teams could make to their craft is to stop treating controls as facts and start treating them as hypotheses. The shift is conceptual, not technical, and once you make it, almost every downstream practice changes shape.
What we get wrong about controls
A control, on paper, looks like a statement of fact.
"Segregation of duties is enforced between the requestor and approver of vendor master changes."
Read that sentence and notice how confident it sounds. The verb is in the present tense. The passive voice hides who, exactly, does the enforcing. The structure invites belief. By the time the auditor walks in, the control feels like a description of the world, not a claim about it.
But every word in that sentence is a hypothesis. Is segregation enforced? Between whom, precisely? On what subset of changes? Through what mechanism — system, policy, or hope? And does the mechanism behave the same in the third week of December, when half the team is on PTO and someone is trying to push through a year-end vendor change?
A control is a load-bearing assertion about how a process behaves under stress. It is exactly as good as the evidence that supports it. Most controls have evidence good enough to support the claim "this happens on Tuesdays in March." Few have evidence good enough to support the claim "this happens always."
That gap is where audit findings live.
The scientific method, applied to a vendor onboarding control
Let me show you what this looks like in practice. Suppose I want to test the control above. The naive approach is to pull a sample of vendor changes, look at the approver and requestor fields, and check that they're different humans. If 25 out of 25 are different, the control passes.
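The naive test, in code, is almost embarrassingly short. Here is a minimal sketch in Python, assuming the sample has already been pulled and each change record carries requestor and approver fields (the record shape is illustrative, not any particular ERP's schema):

```python
from collections import namedtuple

# Illustrative record shape; real vendor master extracts will differ.
Change = namedtuple("Change", ["change_id", "requestor", "approver"])

def naive_sod_test(sample):
    """Flag any change where the same person requested and approved."""
    return [c for c in sample if c.requestor == c.approver]

sample = [
    Change("VC-001", "alice", "bob"),
    Change("VC-002", "carol", "carol"),  # same person on both sides
]

print(naive_sod_test(sample))  # flags VC-002
```

Run twenty-five ordinary records through this and the control passes. Nothing in it goes looking for trouble.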
The scientific approach is different. It starts with a question that has the potential to falsify the control.
"Under what conditions would this control fail?"
I sit with this for ten minutes before I touch a system. The conditions that come to mind, roughly in order of how nervous they make me:
1. A user with both the requestor and approver role in the application
2. A shared service account that one person uses under multiple names
3. A workflow bypass available to system administrators
4. An emergency change procedure that explicitly skips segregation
5. A vendor master change routed through a different system entirely (the integration with the procurement platform, say) where segregation isn't enforced
6. A help desk ticket that allows manual edits to the vendor master
The naive test catches none of these. The naive test catches a normal Tuesday in March.
What I want is a test design that specifically goes hunting for the failure modes. Sample selection isn't random — it's adversarial. Three of my 25 samples come from the period when the controller was on vacation. Two come from year-end. Two come from changes initiated via help desk ticket. One comes from a system-administrator override, if any exist.
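That quota-first selection can be made mechanical. Below is a minimal sketch, assuming each change record carries flags for the risky periods and routes described above; the field names, flag values, and quota sizes are all illustrative assumptions, not a standard:

```python
import random

def adversarial_sample(changes, quotas, total=25, seed=0):
    """Fill the riskiest strata first, then top up at random.

    quotas is a list of (predicate, count) pairs, in priority order.
    Each predicate marks one failure mode worth hunting for.
    """
    rng = random.Random(seed)
    picked, remaining = [], list(changes)
    for condition, n in quotas:
        hits = [c for c in remaining if condition(c)]
        chosen = rng.sample(hits, min(n, len(hits)))
        picked.extend(chosen)
        remaining = [c for c in remaining if c not in chosen]
    # Fill whatever is left of the sample from ordinary-period changes.
    picked.extend(rng.sample(remaining, min(total - len(picked), len(remaining))))
    return picked

# Toy population: forty ordinary changes plus a handful of risky ones.
changes = [{"id": i, "period": "normal", "route": "workflow"} for i in range(40)]
changes += [
    {"id": 100, "period": "controller_pto", "route": "workflow"},
    {"id": 101, "period": "year_end", "route": "workflow"},
    {"id": 102, "period": "normal", "route": "help_desk"},
    {"id": 103, "period": "normal", "route": "admin_override"},
]

quotas = [
    (lambda c: c["period"] == "controller_pto", 3),
    (lambda c: c["period"] == "year_end", 2),
    (lambda c: c["route"] == "help_desk", 2),
    (lambda c: c["route"] == "admin_override", 1),
]

sample = adversarial_sample(changes, quotas)
print(len(sample))  # 25
```

The design choice that matters is the ordering: the strata are drawn before the random filler, so a rare admin override can never be crowded out by forty uneventful Tuesdays.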
If none of those samples surface segregation issues, the control survives the experiment. It still isn't a fact. It's a hypothesis that has withstood a serious attempt to break it, which is the strongest thing a control can be.
Why this matters at scale
Here's the part that ought to make every CAE nervous.
If you accept that controls are hypotheses, you have to also accept that a control library is a list of un-falsified claims, not a list of safeguards. Some of them are well-tested. Many of them have been tested only against the boring case. A few of them have never been seriously tested at all.
In most organizations, the ratio of well-tested to barely-tested controls is somewhere between 20:80 and 30:70, in favor of the barely-tested.
That doesn't mean those controls are wrong. It means we don't actually know. We're operating on confidence we haven't earned.
The implication for AI-assisted testing is significant. When an LLM can read a control description and propose adversarial test cases — failure modes the human would have taken hours to brainstorm — it changes the cost structure of doing controls testing properly. We can finally treat every control like a real hypothesis instead of reserving that rigor for the high-risk ones.
That's what the EvidenceHunter agent in modern GRC platforms is doing. The agent reads the control, generates the failure modes, designs samples that target each failure mode, and surfaces the evidence that would either confirm or break the control. The auditor's job moves up the value chain: from "did the control pass?" to "did we test the right hypothesis?"
The reframe
Try this for a week. Every time you read a control statement, mentally insert "we hypothesize that" at the start of it.
"We hypothesize that segregation of duties is enforced between the requestor and approver of vendor master changes."
Suddenly it's a sentence that wants to be tested. It feels naked. It invites disagreement. You start asking, almost automatically, "under what conditions would this fail?" And the answer to that question is your test plan.
The point
That's the shift. From facts to hypotheses. From defending control existence to interrogating control behavior. From confirmation to falsification.
Physics took this turn somewhere around the 17th century. Compliance is in the middle of it right now. The teams that lean into it are the ones whose findings actually move organizations. The teams that don't will keep writing reports that say "control operating effectively" — right up until the day it isn't.
