Strategy

Death on the yellow brick road, security edition

a16z drew the map for AI application companies surviving next to the labs. We scored Setu against all seven of its tests, and we are going to show the ones we fail as plainly as the ones we pass.

Setu Research

May 30, 2026 · 11 min read

a16z published a piece this month, "Avoiding Death on the Yellow Brick Road," about how AI application companies survive next to the frontier labs. The argument is simple and, we think, correct: there is a wide, smooth road of horizontal, low-complexity problems where raw model capability wins, and OpenAI and Anthropic own that road. The companies that live are the ones building in the rest of Oz — the vertical, complex, messy workflows where domain depth and integrated systems compound into something a general model can't reach.

Read it for security and the labs become the platforms. CrowdStrike, Microsoft, Wiz, Palo Alto Cortex. They have the distribution, the data positions, and the gravity to absorb any horizontal feature into the console you already own. The yellow brick road, in security, is every capability a platform can ship as a checkbox next quarter. If your product lives on that road, the platform is your roadmap, and it ends the way the article warns.

We've spent the last year building Setu off that road, and the a16z piece gives us a way to grade that claim. There are two versions of this exercise: the one vendors love, which declares a clean pass on every test and moves on, and the one that survives a diligence review. This is the second. On three of the seven we are genuinely off the road. On two we are partly sitting on it and have sometimes talked as if we weren't. Two more, the metric we judge ourselves by and the cost moat we keep describing, aren't done yet, but both are scoped and on our roadmap. We will mark each one.

The three tests

1. Tools and steps — pass, with one honest exception

The first test asks how many steps a workflow has and how messy its inputs are. "Summarize my alerts" is one step against forgiving input — that's the road, and a platform model does it for free. Real security value lives in the workflows with many steps and brutal inputs.

Setu's core is that kind of workflow. Telemetry arrives from a Palo Alto XDR tenant, an IDS Next SQL warehouse, and a fleet of Hikvision cameras over ISAPI, each with a different schema, each partly broken. It gets normalized to OCSF, resolved into entities, joined into a graph, aggregated into co-occurrence, and propagated into a risk score. Identifiers drift between databases. The same account is a finding in one tenant and the payments pipeline in the next. None of that is one step, and none of the inputs are forgiving. This part passes.
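To make the last two stages of that pipeline concrete, here is a minimal sketch of co-occurrence aggregation and one-hop risk propagation. It is an illustration under assumptions, not Setu's implementation: the function names, the entity ID format, the `damping` parameter, and the propagation rule itself are all hypothetical, and normalization and entity resolution are assumed to have already happened.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(events):
    """Count how often each pair of entities appears in the same event.

    `events` is a list of sets of entity IDs (hypothetical format), i.e. the
    output of normalization and entity resolution, which are not shown here.
    """
    counts = Counter()
    for entities in events:
        # Sort so each unordered pair is always stored under one key.
        for a, b in combinations(sorted(entities), 2):
            counts[(a, b)] += 1
    return counts


def propagate(seed_risk, counts, damping=0.5):
    """Spread each entity's seed risk one hop to its co-occurring neighbors.

    A neighbor receives damping * source risk * (edge count / the source's
    total edge count). Scores are capped at 1.0. This rule is an assumption
    chosen for illustration.
    """
    # Total co-occurrence count per entity, used to normalize edge weights.
    strength = Counter()
    for (a, b), n in counts.items():
        strength[a] += n
        strength[b] += n

    risk = dict(seed_risk)
    for (a, b), n in counts.items():
        # Each edge carries risk in both directions.
        for src, dst in ((a, b), (b, a)):
            share = n / strength[src]
            gained = damping * seed_risk.get(src, 0.0) * share
            risk[dst] = min(1.0, risk.get(dst, 0.0) + gained)
    return risk


# Hypothetical events: one risky account seen on two hosts.
events = [
    {"account:svc-pay", "host:db-1"},
    {"account:svc-pay", "host:db-1"},
    {"account:svc-pay", "host:db-2"},
]
counts = cooccurrence(events)
risk = propagate({"account:svc-pay": 0.9}, counts)
```

In this toy run the host that co-occurs with the risky account twice inherits twice the risk of the host that co-occurs once. Even the toy version needs resolved entities and a per-environment graph before it can say anything, which is the point of the test.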

Here is the exception we won't paper over. The narrative layer — the single model call that turns a cluster of events into a readable briefing — is sitting squarely on the road. A platform can write that sentence as well as we can, possibly better. We have at times pitched the narrative as if it were the product. It is not the moat; it is the last inch of a mile of messy work, and if we ever let the briefing carry the value story we will lose that argument to whoever has the bigger model. The plumbing under the narrative is what passes the test. The narrative itself does not.

2. System, not tool — pass for the graph, fail for some of what we ship around it

The second test is the sharpest: would the customer still need you if a platform shipped your feature? A tool dies on that question. A system survives it, because the system owns the workflow end to end.

The graph is a system. It owns the path from raw ingestion through normalization, the entity registry, the co-occurrence aggregation, the dispatches feed, and the audit trail, and it is built from one customer's specific environment in a way a platform's cross-tenant model never sees. On the graph, we pass.

But not everything we ship is the graph, and we should stop implying it is. A connector that pulls one vendor's data into a table and shows it back to you is a tool. We have some of those. The moment Cortex renders the same view natively inside the console the customer already owns, that connector is dead, and no amount of calling it "part of the platform" changes the outcome. The test isn't whether we describe something as a system; it's whether it would survive the platform shipping the feature. Several of our surfaces would not. The discipline we owe ourselves — and have not always kept — is that a connector earns its place only by feeding the graph. If it stands alone, it is on the road, and we should either wire it into the graph or stop pretending it's defensible.

3. The P&L test — this is the one we currently fail

The third test asks what the customer judges you on. Labs are judged on benchmarks. Off-the-road companies are judged on the customer's own outcome.

Our buyers are on the right side of this by nature. A cooperative bank cares whether it passes an RBI audit and whether credential sharing got remediated. A pharma manufacturer cares about 21 CFR Part 11 and CDSCO defensibility, not anyone's score on a public detection set. We are not fighting to be measured on P&L; our customers already measure us there.

The failure is on our side of the wire. We instrument the wrong number. Today our health signals tell us the connector ran — the adapter executed, the poll returned. They do not reliably tell us that findings landed, that risk actually propagated, that an analyst closed something. We have watched a connector report perfectly healthy while zero rows reached the store downstream. "The adapter ran" is an engineering metric wearing an outcome's clothes, and for a company whose whole thesis is that it should be judged on customer outcome, measuring uptime instead of outcome is not a small gap — it is the test, failed today.

It is also a gap with a fix that is scoped rather than aspirational, and on our roadmap. The plan is an outcome layer that splits "the adapter ran" from "rows actually landed and risk moved," and surfaces the metrics the customer keeps their own books in: findings closed, dwell time cut, credential-sharing incidents remediated, audit items retired. The first piece — separating connector liveness from data-arrival, so a healthy adapter can no longer mask an empty pipeline — is the nearest-term item. Until those land we will say plainly that we are claiming the P&L high ground while still keeping our own books in uptime. The difference between this gap and a hand-wave is that it is named and scoped.
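A minimal sketch of what the liveness-versus-arrival split could look like, assuming the adapter can report how many rows it fetched and the store can report how many rows landed. The names `PipelineState`, `PollResult`, and `classify` are hypothetical, and the four states are one possible breakdown rather than a description of the shipped design.

```python
from dataclasses import dataclass
from enum import Enum


class PipelineState(Enum):
    DOWN = "down"              # the adapter did not run
    QUIET = "quiet"            # it ran, and the source had nothing new
    STALLED = "stalled"        # it ran and fetched rows, but none landed
    DELIVERING = "delivering"  # rows reached the downstream store


@dataclass(frozen=True)
class PollResult:
    adapter_ran: bool   # liveness: the poll executed and returned
    rows_fetched: int   # rows the adapter reports pulling from the source
    rows_landed: int    # rows counted in the store after the poll


def classify(result: PollResult) -> PipelineState:
    """Report data arrival separately from adapter liveness."""
    if not result.adapter_ran:
        return PipelineState.DOWN
    if result.rows_landed > 0:
        return PipelineState.DELIVERING
    if result.rows_fetched > 0:
        # The failure described above: a healthy adapter, an empty store.
        return PipelineState.STALLED
    return PipelineState.QUIET


# A poll that "ran fine" but delivered nothing is no longer reported healthy.
state = classify(PollResult(adapter_ran=True, rows_fetched=120, rows_landed=0))
```

Under a liveness-only check, the poll in the last line reads as healthy. Here it is `STALLED`. A long run of `QUIET` polls would also deserve a look, but deciding how long is too long is a policy question this sketch leaves open.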

The four moats

Data and learning — real, with the cross-customer half constrained

The strongest moat in the article is the flywheel that compounds with exposure. The within-customer flywheel is real and running: the graph's understanding of one environment deepens in production, and per-tenant isolation makes that a sovereignty story too. A general model trained on the public internet inhabits none of these environments.

The honest qualifier is on the other half. The "we learn across pharma, banking, and OT" cross-customer flywheel is exactly the kind of claim that sounds like a moat and mostly isn't, because the same per-tenant isolation that makes our within-customer story credible also forbids freely moving learned signal between tenants. Whatever crosses tenants has to cross as carefully abstracted structure, not data, and that is slower and thinner than the unconstrained version implies. The within-customer flywheel we will claim. The cross-customer one we are still earning, and we should size it honestly rather than wave at it.

Model variability — real, and one of the genuinely unglamorous wins

The article rewards absorbing the work the labs won't: re-running evaluations, recalibrating prompts, eating the migration when a model is deprecated. We do route across vendors and tiers, with the evaluation and re-routing inside the product, so that when the model under us changes the customer's workflow doesn't. This one we will claim without an asterisk, precisely because it's the boring work that doesn't make a slide. It is also not uncopyable — anyone willing to do the unglamorous part can — so it is a moat in the sense of "incumbents won't bother for your one vertical," not "they can't."

Cost — a moat on our roadmap, not one we hold yet

Tiering intelligence by task is the moat that on-prem, cost-sensitive customers care about most, and it is the one where we are furthest from our own story. We have the pieces: a local-inference path, the ability to run bulk work on the customer's own hardware, frontier models reserved for the hard cases. What we do not yet have is a deliberate, priced, measured tiering policy that turns those pieces into an advantage a buyer can see on an invoice. Right now cost-efficiency is incidental, not engineered. Labs price the floor; a system off the road should price the inverse, the lowest cost for the intelligence a task needs, and we have written that sentence more often than we have shipped the mechanism behind it.

So the honest status is: the pieces exist; the policy that turns them into a moat is on our roadmap. Concretely, that means a routing policy that classifies each task by required intelligence and sends it to the cheapest tier that clears the bar (local for bulk, mid for the routine, frontier only for the genuinely hard), with the per-tier cost made visible to the buyer. We list this as a moat we intend to deserve, not one we currently hold.
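The routing rule itself is simple to state, and a short sketch shows its shape. Everything here is assumed for illustration: the 1-to-3 capability scale, the cost units, the tier names as identifiers, and the functions `route` and `cost_report`. The hard part, classifying what a task actually requires, is taken as given.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    capability: int     # hypothetical scale: 1 = bulk, 2 = routine, 3 = hard
    cost_per_task: int  # hypothetical cost units, not real prices


# Illustrative tiers only; the numbers are made up.
TIERS = [
    Tier("local", capability=1, cost_per_task=1),
    Tier("mid", capability=2, cost_per_task=10),
    Tier("frontier", capability=3, cost_per_task=100),
]


def route(required_capability, tiers=TIERS):
    """Return the cheapest tier whose capability clears the task's bar."""
    eligible = [t for t in tiers if t.capability >= required_capability]
    if not eligible:
        raise ValueError(f"no tier can handle capability {required_capability}")
    return min(eligible, key=lambda t: t.cost_per_task)


def cost_report(required_capabilities, tiers=TIERS):
    """Total cost per tier for a batch, the number a buyer would see."""
    totals = {}
    for required in required_capabilities:
        tier = route(required, tiers)
        totals[tier.name] = totals.get(tier.name, 0) + tier.cost_per_task
    return totals


# Two bulk tasks, one routine task, one hard task.
report = cost_report([1, 1, 2, 3])
```

For that batch the report attributes 2 units to local, 10 to mid, and 100 to frontier. The per-tier breakdown is the part that makes the policy visible on an invoice rather than incidental.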

Governance — the one we hold most firmly, no asterisk

This is the part of Oz we actually own. The article names HIPAA, SEC, and FINRA as governance no horizontal player can hold across every vertical at once. Our version is the one a global platform will not credibly touch: CDSCO, DPDP, 21 CFR Part 11, RBI norms for cooperative banks. The control plane — permissions, audit, guardrails, the compliance record — is use-case specific by definition, and it is the thing the customer is paying for. No platform owns the Indian regulated-vertical control plane across pharma, banking, and hospitality at the same time, and reversing a decade of cross-tenant cloud architecture to try is the kind of cost that keeps them out. This is counter-positioning in the strict sense. It is the moat we would lead with.

The scorecard, restated

Putting it together, without the soft pedal.

What genuinely puts us off the road:

The graph as an end-to-end system (test 2, for the graph itself).
The multi-step, messy-input core pipeline (test 1, minus the narrative).
The within-customer data flywheel.
Model-variability absorption.
The Indian regulated-vertical governance control plane — our strongest single position.

What is sitting on the road, or claimed bigger than it is:

The narrative layer — a platform writes that sentence too.
Standalone connectors that don't feed the graph — tools the platform can reclaim.
The cross-customer flywheel — constrained by the same isolation that makes the rest credible.

What we don't hold yet, but is scoped and on our roadmap — and which we should not claim as done until we do:

Outcome instrumentation. We measure that the connector ran, not that the customer's risk moved. On the test our whole thesis rests on, that is the gap that matters most, which is why it leads the roadmap, with the liveness-versus-arrival split as the nearest-term item.
Cost tiering as a priced, engineered advantage rather than an incidental property — a task-by-intelligence routing policy with per-tier cost made visible.

The reason to write it this way is the same reason the moats conversation is worth having honestly at all: fake passes absorb the effort that the real gaps need. If we tell ourselves we already pass the P&L test, nobody builds the outcome metrics. If we call every connector a system, nobody does the work to wire it into the graph. The map a16z published is useful exactly because it tells us where to dig, and the holes are more instructive than the high ground — provided you put a date on filling them.

The next wave of security software won't be built on the smooth road where the platforms walk. We are genuinely building off it — and we are two deliberate pieces of work short of being able to say so without a footnote. Both are scoped, both are on our roadmap, and naming them is the difference between a plan and an excuse.

With credit to a16z's "Avoiding Death on the Yellow Brick Road," which framed this better than we had.
