The Jagged Frontier Has a Floor Problem
A few weeks ago, a landmark study finally landed in Organization Science after circulating as a Harvard working paper for a couple of years. It's called "Navigating the Jagged Technological Frontier," and if you work in AI or think seriously about where knowledge work is headed, it's worth your time.
The researchers — a team from Harvard, MIT, Wharton, and BCG — ran a preregistered field experiment with 758 consultants at Boston Consulting Group. They gave participants realistic consulting tasks, randomized access to GPT-4, and measured what happened. The results were striking in both directions.
For tasks inside AI's capability frontier: consultants using AI completed 12.2% more work, finished 25.1% faster, and produced outputs rated more than 40% higher in quality by expert evaluators. Remarkable numbers. Reproducible numbers. This is the headline most people remember.
But here's the part that deserves equal attention. For tasks outside the frontier, consultants using AI were 19 percentage points less likely to produce correct solutions than the control group. The AI didn't just fail to help. It actively degraded performance. And crucially, the workers didn't know they'd crossed the line until the damage was done.
The researchers call this the "jagged" nature of the frontier. The boundary between where AI helps and where it hurts isn't smooth or obvious. Tasks that look similar in difficulty sit on opposite sides of the line. A worker moving through a real-world workflow can cross back and forth across that frontier without any warning signal.
That's the paper's core insight, and it's an important one.
But I think it stops one layer short of the real problem.
The Frontier Isn't the Issue. The Floor Is.
The jagged frontier describes AI capability in aggregate terms. For some tasks, AI is reliably helpful. For others, it isn't. The prescription the authors offer treats this as a calibration problem: train workers to recognize where they are relative to the frontier, and adjust their reliance on AI accordingly.
That's a reasonable organizational response. But it assumes that when workers are on the right side of the frontier, AI output can actually be trusted. And for a specific category of task, that assumption is wrong regardless of what side of the frontier you're on.
Math is not a capability problem. Math is an architecture problem.
When the task involves numerical computation — drug dosing, bond pricing, real estate underwriting, actuarial calculations, financial projections — "mostly right" isn't a point on the capability spectrum. It's a category failure. A consultant who produces a creative strategy brief that's 85% as good as a top performer's is still useful. A pharmacist who calculates a pediatric dose that's 85% correct has harmed a patient.
The Dell'Acqua paper found that AI helped consultants across the skill distribution on complex, high-end knowledge tasks inside the frontier. But consulting is a field where outputs are directional. A market entry recommendation can be better or worse. An analysis can be more or less insightful. Those tasks have no ground truth.
Computation does.
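To make that concrete, here's the dosing example as code. The numbers are hypothetical, not clinical guidance; the point is that the calculation has exactly one acceptable output, so "85% correct" isn't a quality score, it's an error:

```python
# Hypothetical weight-based dosing: dose = weight * dose-per-kg.
weight_kg = 20.0            # hypothetical pediatric patient
dose_mg_per_kg = 15.0       # hypothetical drug, hypothetical protocol

correct_dose = weight_kg * dose_mg_per_kg   # 300.0 mg: the only right answer
almost_dose = 0.85 * correct_dose           # 255.0 mg: "85% correct" is a 45 mg error
```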
Stochastic Systems Don't Belong in Deterministic Problems
IBM researcher Raffi Khatchadourian tested 74 configurations across 12 frontier AI models to measure something specific: decision determinism. Given the same input, does the model produce the same answer?
Even at temperature zero, the best models achieved only 88.5% decision determinism. Under stress conditions, such as data quality issues and redeployment, the numbers dropped further. His conclusion was unambiguous: non-determinism can't be eliminated through model selection or prompt engineering. It requires architectural separation.
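The measurement itself is simple to sketch, even if the study's actual harness is more involved. A minimal version might look like the following, where `call_model` is an assumed wrapper around whatever model you're testing; the function name and the example task are mine, not Khatchadourian's:

```python
from collections import Counter

def decision_determinism(call_model, prompt: str, trials: int = 100) -> float:
    """Fraction of trials that return the modal (most common) decision.

    call_model is assumed to wrap the model under test, invoked with
    temperature=0 so any variation comes from the system itself,
    not from intentional sampling.
    """
    decisions = [call_model(prompt, temperature=0.0) for _ in range(trials)]
    modal_count = Counter(decisions).most_common(1)[0][1]
    return modal_count / trials

# Example: a binary underwriting decision. A deterministic system scores
# 1.0 here by construction; the research above reports frontier models
# topping out around 0.885 even at temperature zero.
# rate = decision_determinism(my_model, "Approve or deny applicant #1042?")
```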
That's the floor problem. Not whether AI can help with math tasks inside its capability frontier. But whether the architecture underneath is capable of producing the guarantees that math-dependent decisions actually require.
The jagged frontier paper is about the ceiling: how high AI lifts performance when it's working. The determinism research is about the floor: what happens when math is embedded in a probabilistic system and nobody notices the difference.
Both problems are real. The floor problem is the one that gets people fired. Or hurt.
What the Paper's Behavioral Findings Actually Suggest
One of the more interesting findings in the Dell'Acqua study is the distinction between what the researchers call "Centaurs" and "Cyborgs." Centaurs divide labor: they do some tasks themselves and hand others to AI based on their read of where AI adds value. Cyborgs integrate AI continuously, blending human and machine effort throughout the work.
Both strategies produced gains inside the frontier. Neither reliably protected against degraded performance outside it.
What struck me about this finding is what it implies about the failure mode. It's not that workers were careless. They were experienced consultants doing real work. They developed working theories about when to trust AI. Those theories were wrong often enough to show up as a 19-point performance gap.
Now apply that dynamic to a financial analyst running bond pricing scenarios. Or a hospital pharmacist using an AI assistant for dosage calculations. Or an insurance actuary building reserve models. These workers will also develop working theories about when to trust AI output. They will also be wrong. And unlike a consulting engagement, the errors won't show up in a quality rubric. They'll show up in a P&L, a regulatory action, or a patient outcome.
The lesson from the jagged frontier research isn't that we should be cautious about AI in knowledge work. It's that we need to be architectural about where we deploy it. For tasks where ground truth exists and precision is required, "trust but verify" isn't a sufficient answer. The architecture has to guarantee the floor.
What This Means for How AI Gets Built
The jagged frontier will keep moving. Every few months, the boundary expands and tasks that used to sit outside AI's capabilities fall inside them. That's progress, and it's real.
But some tasks shouldn't be on the frontier at all. Not because AI can't assist with them, but because the execution layer underneath the AI needs to be deterministic by design. Not probabilistic-with-safeguards. Deterministic by architecture.
That's what we built TrueMath to be. Not an AI math assistant. An infrastructure layer that sits beneath the AI, handles the computation deterministically, stores every calculation for audit and recall, and locks business logic so that the same inputs produce the same outputs every time — not 88.5% of the time.
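To be clear about the pattern without claiming this is TrueMath's actual code: the shape is a registry of locked, deterministic business-logic functions that the AI layer can invoke but never perform or modify, with every calculation hashed and stored for audit. A minimal sketch, with all names illustrative:

```python
import hashlib
import json
from typing import Callable

# Illustrative sketch of the general pattern, not TrueMath's implementation:
# business logic lives in locked, deterministic functions; the AI layer can
# request a calculation but cannot perform or alter it.

REGISTRY: dict[str, Callable] = {}   # locked business logic, versioned by name
AUDIT_LOG: list[dict] = []           # every calculation stored for recall

def register(name: str):
    def wrap(fn: Callable) -> Callable:
        REGISTRY[name] = fn
        return fn
    return wrap

@register("bond_price_v1")
def bond_price(face: float, coupon_rate: float, ytm: float, years: int) -> float:
    """Plain-vanilla annual-coupon bond price: same inputs, same output, always."""
    coupon = face * coupon_rate
    pv_coupons = sum(coupon / (1 + ytm) ** t for t in range(1, years + 1))
    return pv_coupons + face / (1 + ytm) ** years

def calculate(name: str, **inputs) -> float:
    """Execute locked logic deterministically and record the call for audit."""
    result = REGISTRY[name](**inputs)
    record = {"calc": name, "inputs": inputs, "result": result}
    record["hash"] = hashlib.sha256(
        json.dumps(record, sort_keys=True).encode()
    ).hexdigest()
    AUDIT_LOG.append(record)
    return result

# The AI layer proposes which calculation to run and with what inputs;
# the number itself comes from the locked function, 100% of the time.
# price = calculate("bond_price_v1", face=1000.0, coupon_rate=0.05, ytm=0.04, years=10)
```

The separation is the point: the probabilistic layer can be wrong about intent, but it never touches the arithmetic, so the floor holds by construction.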
The consultants in the BCG study got better at knowing when to trust their AI and when to push back. That's a reasonable adaptation to a jagged frontier. But knowledge workers in regulated fields, in clinical settings, in financial compliance, can't manage the floor through calibrated skepticism. They need the floor to be guaranteed.
That's not a capability question. It's an architecture question.
And architecture doesn't move with the frontier.
Reach out: bill.kelly@truemath.ai
Learn more: truemath.ai
Sign up for early access: https://app.truemath.ai/signup