The honest case for graph physics in identity security — and where it stops

A structural prior for cold-start identity security, with the GNN roadmap that earns its way on top.

Setu Research

April 12, 2026 · 8 min read

There's a popular line in vendor decks that "modern attacks demand learned graph neural networks." It's a real trend and a defensible one. But there's a deployment reality nobody on stage talks about: the day a new tenant turns Setu on, we have zero labels, zero incident history, and zero training data. A GNN trained on someone else's environment is a worse prior than no model at all — security graphs do not transfer; what's anomalous in one tenant is the org chart in another.

So we picked a different starting point: a small, well-understood family of unsupervised graph methods rooted in the spectral structure of the identity-permission graph. This post explains exactly what those methods do, what they don't do, and the path by which the same system gets smarter as it accumulates analyst feedback.

We're not going to oversell this. The math is real but it's a structural proxy, not a probability over attacker behavior. We'll say so up front.

What the heat kernel actually computes

For an identity-permission graph $G$ with normalized Laplacian $L$, the heat kernel is

$H(t) = e^{-tL}$

Apply $H(t)$ to an indicator vector that places a unit mass on a single source identity, and you get a vector over all nodes whose value at node $j$ measures how much "heat" has spread from the source to $j$ after diffusion time $t$.

Important: this is not a probability that node $j$ is compromised. The heat kernel is a smoothing operator on graph signals. It tells you about the structural coupling between source and destination through the topology — short paths, dense connectivity, bottlenecks. It says nothing about whether an attacker could exploit that coupling, what it would cost them, or whether the destination has defenses that would stop them.

We've seen vendor blogs (including, frankly, our own earlier drafts) call $H(t)[j]$ a "probability of compromise reaching node $j$." That's a category error. The honest framing is: $H(t)[j]$ is a structural exposure score under a uniform-conductance, time-homogeneous diffusion model. It's a useful prior over which nodes warrant attention. It is not, and was never going to be, an attacker model.
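To make the operator concrete, here is a minimal sketch that applies $H(t)$ to a unit mass on a four-node path graph. It is an illustration under stated assumptions, not production code: the graph is undirected and tiny, the symmetric normalized Laplacian is diagonalized exactly with NumPy, and the function name `heat_kernel_scores` is hypothetical.

```python
import numpy as np

def heat_kernel_scores(adj: np.ndarray, source: int, t: float) -> np.ndarray:
    """Apply H(t) = exp(-t L) to a unit mass at `source`.

    Assumption: `adj` is a symmetric adjacency matrix with no isolated nodes,
    so L = I - D^{-1/2} A D^{-1/2} is symmetric and eigh applies.
    """
    deg = adj.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    lap = np.eye(len(adj)) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(lap)
    indicator = np.zeros(len(adj))
    indicator[source] = 1.0
    # exp(-t L) x = U exp(-t Lambda) U^T x
    return eigvecs @ (np.exp(-t * eigvals) * (eigvecs.T @ indicator))

# Toy path graph: 0 - 1 - 2 - 3, with the source identity at node 0.
adj = np.array(
    [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]],
    dtype=float,
)
scores = heat_kernel_scores(adj, source=0, t=1.0)
print(scores)
```

In this toy case the scores fall off with distance from the source and do not sum to one, which fits the framing above: they are exposure scores, not probabilities.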

Why a structural prior is worth shipping anyway

Two reasons.

First, day-one signal at zero training cost. A new tenant connects identity, endpoint, and access telemetry. Within hours, the graph is built and the diffusion has run. Every identity has a structural-exposure score relative to the crown-jewel set the customer has marked. Analysts have something to triage. None of this required a labeled incident, a baseline window, or a model fine-tune.

Second, structural priors are real, even if imperfect. Two decades of graph-mining research have shown that propagation-style scores correlate with what security people call "blast radius" — the set of assets reachable from a compromised seed under realistic adversary movement. Personalized PageRank has been the workhorse here since the mid-2000s, and BloodHound's reachability analysis has shipped attack-path scoring built on the same principle since 2016. The heat kernel is a smoother, frequency-decomposable cousin of PPR. Different math, similar role.
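The "cousin" relationship can be written down directly. As a sketch, assume the random-walk form $L = I - P$, where $P$ is the transition matrix of the graph, and let $\alpha$ be the PPR teleport probability (a symbol introduced here for illustration). Both operators are then weighted sums over walk lengths $k$, differing only in how the weights decay:

$\text{PPR}_\alpha = \alpha \sum_{k \ge 0} (1-\alpha)^k P^k$

$H(t) = e^{-t(I-P)} = e^{-t} \sum_{k \ge 0} \frac{t^k}{k!} P^k$

PPR weights walk lengths geometrically; the heat kernel weights them with Poisson coefficients centered near $k \approx t$, which is why changing $t$ changes the scale being examined.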

We're not claiming the heat kernel is more predictive of breaches than PPR or BFS reachability. We're claiming it gives us a continuous, multi-scale exposure surface that the next layer of the system — learned edge weights, learned attention, learned temporal dynamics — can build on without throwing away the day-one signal.

Where we get specific (and where we deliberately do not)

We compute $H(t)$ via Chebyshev polynomial filters on the Laplacian, following Hammond, Vandergheynst & Gribonval (2011). This is well-established mathematics from the spectral graph signal processing literature. We are not claiming novelty in the polynomial approximation itself — we are using a known efficient method to apply a known operator to a domain (enterprise identity graphs) where, to our knowledge, it has not previously been productized.
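The sketch below shows the general shape of a Chebyshev approximation of $e^{-tL}$ applied to a vector, in the spirit of the method cited above. It is a simplified illustration, not our implementation: it assumes a small undirected graph with a symmetric normalized Laplacian whose spectrum lies in $[0, 2]$, and the function name, the default order, and the quadrature size are all hypothetical choices.

```python
import numpy as np

def chebyshev_heat_apply(lap, signal, t, order=30, lam_max=2.0, quad_points=200):
    """Approximate exp(-t * lap) @ signal with a truncated Chebyshev series.

    Assumption: the eigenvalues of `lap` are real and lie in [0, lam_max].
    Only matrix-vector products with `lap` are used; no eigendecomposition.
    """
    a = lam_max / 2.0
    # Chebyshev coefficients of g(x) = exp(-t x) on [0, lam_max],
    # computed with Chebyshev-Gauss quadrature.
    theta = np.pi * (np.arange(quad_points) + 0.5) / quad_points
    g = np.exp(-t * a * (np.cos(theta) + 1.0))
    coeffs = np.array(
        [(2.0 / quad_points) * np.sum(g * np.cos(k * theta)) for k in range(order + 1)]
    )

    def shifted(v):
        # Maps the spectrum from [0, lam_max] to [-1, 1].
        return (lap @ v - a * v) / a

    t_prev = np.asarray(signal, dtype=float)   # T_0 applied to the signal
    t_curr = shifted(t_prev)                   # T_1 applied to the signal
    out = 0.5 * coeffs[0] * t_prev + coeffs[1] * t_curr
    for k in range(2, order + 1):
        t_next = 2.0 * shifted(t_curr) - t_prev  # three-term recurrence
        out = out + coeffs[k] * t_next
        t_prev, t_curr = t_curr, t_next
    return out

# Toy check on the path graph 0 - 1 - 2 - 3 against an exact eigendecomposition.
adj = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
d_inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))
lap = np.eye(4) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
x = np.array([1.0, 0.0, 0.0, 0.0])
approx = chebyshev_heat_apply(lap, x, t=1.0)
```

The appeal of this form is that the cost is a fixed number of sparse matrix-vector products per source, independent of any eigendecomposition. The directed random-walk Laplacian discussed next is not covered by this toy check.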

We use the random-walk normalized Laplacian on the directed permission graph. We do this knowing that it sacrifices some of the clean spectral interpretation that holds for symmetric Laplacians. The trade-off is real: lateral movement is directional (low-priv → high-priv), and a fully symmetrized graph would erase that. We chose directionality over spectral elegance. Future versions will revisit this when we have enough labeled traversal data to learn the directional weighting end-to-end.

Edge weights today follow a recency prior: $w_{ij} = \exp(-\alpha \cdot \text{days\_since\_last\_use})$ with $\alpha$ a tenant-tunable hyperparameter (default $0.01$, half-life $\ln 2 / \alpha \approx 69$ days). This is a hand-designed prior, not a learned weight. We've sometimes been asked why we don't differentiate AssumeRole from CanReadFile more aggressively, and the honest answer is that we will, once we have enough triaged-alert data per tenant to learn type-specific weights without overfitting. That's the GNN bridge: a specific roadmap item, not a vague future ambition.
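As a quick worked example of the recency prior, the sketch below computes the weight and the implied half-life. The function names are hypothetical; only the formula and the default $\alpha = 0.01$ come from the text above.

```python
import math

def recency_weight(days_since_last_use: float, alpha: float = 0.01) -> float:
    """Edge weight w = exp(-alpha * days_since_last_use)."""
    return math.exp(-alpha * days_since_last_use)

def half_life_days(alpha: float) -> float:
    """Days after which the weight drops to one half: ln(2) / alpha."""
    return math.log(2.0) / alpha

print(round(half_life_days(0.01), 1))   # about 69.3 days
print(round(recency_weight(30), 3))     # an edge last used 30 days ago
print(round(recency_weight(365), 3))    # an edge last used a year ago
```

A permission used yesterday carries a weight near 1, while one untouched for a year is down-weighted by roughly a factor of 40 at the default setting.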

Where rules show up (because we do use them)

Some rhetoric we've used in the past — "physics, not rules" — was a false dichotomy. The threshold that turns a structural-exposure score into an alert ("flag identities whose 90-day exposure exceeds the 99.7th percentile of their tenant's distribution") is a rule. The dormancy gap that triggers a déjà-vu detection ("60 days since last seen") is a rule. The flood gate ("max 100 first-seen alerts per tenant per hour") is a rule.

The right framing is: physics gives us a continuous score; rules turn that score into alerts. Both layers are tunable per tenant, and both are observable as Prometheus metrics so a CISO can see exactly what their thresholds are catching and missing. If we don't disclose those thresholds in the product, we fail an audit.
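To show how thin the rule layer is, here is a minimal sketch of the percentile threshold described above. The function name and the data layout (a dict of identity name to exposure score) are assumptions for illustration; the 99.7th percentile is the figure quoted in the text.

```python
import numpy as np

def flag_high_exposure(scores: dict[str, float], percentile: float = 99.7) -> list[str]:
    """Return identities whose score exceeds the tenant-wide percentile cut-off."""
    values = np.array(list(scores.values()))
    threshold = np.percentile(values, percentile)
    return sorted(name for name, score in scores.items() if score > threshold)

# Hypothetical tenant with 1,000 identities and evenly spread scores.
tenant_scores = {f"id-{i}": float(i) for i in range(1000)}
flagged = flag_high_exposure(tenant_scores)
print(flagged)
```

Because the cut-off is a single named parameter, it is straightforward to export it as a metric alongside the count of identities it flags.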

What about the freshness gap?

Fair criticism: a daily offline diffusion job misses an attack that grants a permission at 09:00 and exploits it at 11:00. We address this in two layers.

The eigenbasis-derived structures (community decomposition, low-rank approximations) refresh nightly. They have to — a new identity that touches no existing nodes can rewrite local community boundaries, and there's no shortcut for that.

But the diffusion itself does not need to wait for the basis refresh. New permission grants update edge weights in the live graph, and per-source diffusion is recomputed on demand at sub-second latency for any identity an analyst opens. The cached precomputation is for the 99% of nodes nobody is looking at right now; the per-event lookup is for the 1% under active investigation.

Real freshness gap: a brand-new identity created at 09:00 and used at 11:00 won't appear in last night's basis. Its community membership is unknown until tonight's refresh. We mitigate this by treating any identity younger than the basis age as "uncategorized" and routing it through a faster, smaller, neighborhood-only diffusion. The fix is not perfect; the gap is documented in the product UI.
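The routing decision described above reduces to one timestamp comparison. The sketch below is a hypothetical illustration: the function name and the route labels are made up, and it assumes both timestamps are available in the same timezone.

```python
from datetime import datetime

def diffusion_route(identity_created_at: datetime, basis_built_at: datetime) -> str:
    """Pick a diffusion path based on whether the identity predates the basis."""
    if identity_created_at > basis_built_at:
        # Younger than the basis: community membership is unknown,
        # so fall back to the smaller neighborhood-only diffusion.
        return "uncategorized/neighborhood_only"
    return "categorized/full_basis"

basis_built_at = datetime(2026, 4, 12, 2, 0)    # last night's refresh
new_identity = datetime(2026, 4, 12, 9, 0)      # created at 09:00 today
old_identity = datetime(2026, 1, 5, 14, 30)

print(diffusion_route(new_identity, basis_built_at))
print(diffusion_route(old_identity, basis_built_at))
```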

Service accounts and the homophily assumption

Spectral methods assume some degree of community structure on the graph — that "neighbors look like neighbors." Enterprise identity graphs often violate this: a CI/CD service account legitimately spans engineering, ops, and finance. Louvain community detection on raw permission graphs produces unstable clusterings that bounce as service accounts join or leave a permission group.

Our answer is type-stratified community detection. Human identities, service accounts, and machine identities run their community detection independently, and the cross-type edges are added back into the diffusion at a down-weighted rate. This is not magic — it's a principled response to a known structural heterogeneity. A pure GNN with attention over edge type would learn this stratification implicitly; we encode it explicitly because we can't yet train.
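A minimal sketch of the stratification step follows. It only covers the edge bookkeeping: same-type edges are grouped for independent community detection, and cross-type edges are kept aside with a reduced weight for the diffusion. The function name, the tuple layout, and the `cross_type_scale` value of 0.25 are hypothetical, not the product's settings.

```python
from collections import defaultdict

def stratify_edges(edges, node_type, cross_type_scale=0.25):
    """Split weighted edges into per-type groups and down-weighted cross-type edges.

    edges: iterable of (u, v, weight)
    node_type: dict mapping node name to a type label
    """
    within = defaultdict(list)
    cross = []
    for u, v, w in edges:
        if node_type[u] == node_type[v]:
            within[node_type[u]].append((u, v, w))
        else:
            cross.append((u, v, w * cross_type_scale))
    return dict(within), cross

node_type = {
    "alice": "human", "bob": "human",
    "ci-bot": "service", "deploy-bot": "service",
}
edges = [
    ("alice", "bob", 1.0),
    ("alice", "ci-bot", 0.8),
    ("ci-bot", "deploy-bot", 0.5),
]
within, cross = stratify_edges(edges, node_type)
print(within)
print(cross)
```

Each list in `within` would feed a separate community-detection run, while `cross` is added back only when the diffusion graph is assembled.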

What we will earn the right to do

The ML literature has moved a long way past where we sit today. APPNP, GAT, graph diffusion convolutions (GDC), TGN — each of these takes one of the design choices we currently hand-make (edge weights, aggregation, propagation, time) and replaces it with a learned function. We're not pretending we already shipped that. The roadmap, in the order we expect to ship:

Learned edge weights from analyst triage. Closed and dismissed alerts are weak labels. Train a logistic edge-weight model per tenant. Converts our biggest hand-designed knob into an online-learned one. Quarter-scale milestone.
Learned Chebyshev coefficients (ChebNet-style filter). Same Chebyshev machinery we use today, but the coefficients are learned rather than fixed to the heat kernel $e^{-t\lambda}$. Minimal infrastructure change, real ML upgrade.
GAT-style attention over edge type. Where AssumeRole and CanReadFile get learned, type-specific weights from data. This is the layer that closes the most-cited gap from our critics.
Temporal-graph network layer. Once we have six-plus months of per-tenant event history with kill-chain stage labels from analyst feedback, a TGN-style temporal layer becomes trainable per tenant. Until then, windowed snapshots with edge timestamps as features are the right intermediate step.
LLM narrative on the GNN-flagged subgraph. This one is closer than the temporal layer. The GNN picks the subgraph; the LLM writes the analyst-facing narrative ("Identity A reactivated after 14 months of dormancy, then assumed Role B which was granted yesterday, then read a file in the crown-jewel set"). We have the subgraph extraction; the narrative layer is two prompts and a guardrail away.

What we are not going to do

Pretend we are doing GNNs when we are not. The roadmap above is two-to-three quarters of work, and faking it gets us caught by the next reader who has read a graph ML paper since 2017.
Drop the spectral foundation. Cold-start unsupervised scoring is a real product requirement (week-one deployment, no labels). The GNN layer goes on top, not instead of.
Claim the $(1 - 1/e) \approx 63\%$ greedy approximation guarantee for directed-flow edge removal, where the submodularity precondition does not provably hold. (Earlier writing of ours invoked CELF's approximation guarantee. Edge removal in directed diffusion is not generally submodular: locking door A reroutes flow through door B. We use CELF as a fast greedy heuristic and present the empirical results, which is the honest thing to do.)
Claim the heat kernel is a probability measure over attacker behavior. It is a structural proxy. We say so in the product, in the docs, and now here.
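The door A / door B point in the list above can be checked on a four-node toy graph. The sketch below uses plain reachability as a simplified stand-in for diffusion (an assumption made to keep the example small), and all names are hypothetical. Submodularity would require the marginal gain of removing an edge to shrink, or at least not grow, as more edges are removed; here it grows.

```python
from collections import deque

def reachable(edges, source):
    """Set of nodes reachable from `source` along directed edges (BFS)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def protection(edges, removed, source="attacker"):
    """Number of nodes cut off from `source` when `removed` edges are deleted."""
    baseline = len(reachable(edges, source))
    remaining = [e for e in edges if e not in removed]
    return baseline - len(reachable(remaining, source))

edges = [
    ("attacker", "door_a"), ("attacker", "door_b"),
    ("door_a", "vault"), ("door_b", "vault"),
]
lock_a = ("door_a", "vault")
lock_b = ("door_b", "vault")

gain_b_alone = protection(edges, {lock_b}) - protection(edges, set())
gain_b_after_a = protection(edges, {lock_a, lock_b}) - protection(edges, {lock_a})
print(gain_b_alone, gain_b_after_a)
```

Locking door B alone protects nothing because the vault stays reachable through door A, yet locking B after A cuts the vault off. The marginal gain increased, so diminishing returns fails and the greedy guarantee cannot be invoked for this objective.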

Why this matters to a CISO

A vendor who tells you they have AI for security but cannot tell you (a) what their model would do on day one of your deployment, (b) where their training data came from, or (c) which knobs are hand-set versus learned, is selling you a black box. Black boxes get turned off after the third false positive.

We are betting that an explainable, continuously-improving system that starts with classical methods and earns its way to learned ones — with every transition observable and auditable — beats a pre-trained black box that arrives with confidence and no provenance. If you've sat through a procurement cycle where the auditor asked "show me what the model is keying off," you know which one survives that question.

Closing

The field is moving toward learned, multi-scale, feature-rich graph propagation, and that direction is right. Setu's contribution is not to argue otherwise. It's to ship the unsupervised version that works on day one, expose every threshold to the operator, and earn the right to the learned version through accumulated tenant-specific feedback rather than borrowed pretraining.

PageRank-style propagation gave security teams a way to score nodes by structural exposure. Heat-kernel diffusion extends that to multi-scale, frequency-decomposable exposure surfaces, an extension that, as a starting prior, matches the shape of modern attack graphs (sparse, long-range, noisy) better than tabular ML on flat features. GNNs will extend it further, once the data and the analyst feedback needed to train them honestly exist. We're building the data pipeline that makes that transition possible.

That is the case for graph physics in identity security. That is also exactly where it stops.
