AI Security

What If Your Security Tools Already Knew?

Josh Rickard's LLM prompting techniques are brilliant. They're also a symptom of a deeper problem—security tools that can't think for themselves.

Setu Research

February 12, 2026 · 9 min read

The Prompting Tax

Josh Rickard recently published a piece on THOR Collective Dispatch describing how he uses LLMs for security work. It's one of the best practical guides we've seen on the subject, and if you're a security practitioner who hasn't adopted these techniques yet, you should read it immediately.

His core insight is role-stacking: instead of asking an LLM a flat question, you tell it which expert perspectives to adopt simultaneously. "Think as a SOC analyst experienced in alert triage, as a threat hunter familiar with MITRE ATT&CK, and as someone who has dealt with alert fatigue firsthand." The results are dramatically better than a naive prompt.

He's right. Role-stacking works. Being embarrassingly specific about your tech stack works. Iterating on prompts until the output is useful works.

But here's what struck us reading the piece: every technique Josh describes is compensating for context that his security tools should already have.

When he tells the LLM to think like a SOC analyst triaging 50,000 alerts per day across 200 detection rules, he's manually injecting the operational context that his SIEM doesn't surface. When he specifies that 5 rules generate 60% of the alert volume and consume 15% of analyst time, he's providing the quantitative relationships that his tools track in isolation but never connect.

We don't think this is Josh's problem to solve. We think it's ours.

The Context Gap in Security Operations

Let's break down what a security analyst actually needs when an alert fires:

Question	Where the Answer Lives	Is It Accessible?
What happened?	SIEM alert / log event	Yes
Who is this identity?	Identity provider (Okta, Azure AD)	Requires pivot
What else can this identity access?	Cloud IAM, SaaS entitlements	Requires 3–5 tool pivots
How sensitive are those resources?	Data classification, asset inventory	Often unknown
What's the ECI if this is real?	Nowhere—computed in the analyst's head	No
What's the fastest remediation?	Scattered across IAM, PAM, endpoint tools	Requires tribal knowledge

The analyst knows, from experience, that they need all six answers. The SIEM gives them one. The LLM, with enough role-stacking and context-loading, can reason about the others. But it's reasoning from the analyst's manually provided description, not from live data.

This is what we call the context gap: the distance between the alert that fires and the understanding required to act on it. Every minute an analyst spends bridging this gap—pivoting between tools, mentally modeling access paths, estimating severity—is a minute not spent on the actual threat.

Josh's LLM techniques compress this gap. They don't close it.

Role-Stacking Is Graph Traversal in Disguise

Here's an observation that might resonate with anyone who's tried role-stacking for security analysis: the perspectives you're asking the LLM to adopt map directly to nodes and edges in an identity exposure graph.

When you say "think as a SOC analyst," you're asking for event-level interpretation. When you add "as a threat hunter familiar with ATT&CK," you're asking the LLM to traverse from the event to known attack techniques to likely next steps. When you add "as someone who has dealt with alert fatigue," you're asking for a meta-analysis of signal-to-noise ratios across your detection surface.

These aren't arbitrary perspectives. They're different traversal patterns across the same underlying data:

SOC analyst view: Event → Identity → Immediate context (is this normal?)
Threat hunter view: Event → Technique → Adjacent techniques → Campaign pattern
Alert fatigue view: Rule → Historical true/false positive ratio → Tuning recommendation
Exposure view: Identity → All access paths → All reachable resources → ECI score

The first three can be approximated with an LLM and good prompting. The fourth requires a graph.
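To make the exposure view concrete, here is a minimal sketch of that traversal as a breadth-first search. It is an illustration under stated assumptions, not a description of how any product implements it: the adjacency-list graph, the node names, the "res:" prefix convention, and the `reachable_resources` helper are all hypothetical.

```python
from collections import deque

# Hypothetical graph: identities point to groups and roles, which point to resources.
# The "res:" prefix marking resource nodes is an assumption made for this sketch.
GRAPH = {
    "user:alice": ["group:eng"],
    "group:eng": ["role:deployer", "res:wiki"],
    "role:deployer": ["res:prod-db", "role:log-reader"],
    "role:log-reader": ["res:audit-logs"],
    "svc:ci-bot": ["role:log-reader"],
}

def reachable_resources(graph, identity):
    """Return {resource: access path} for every resource reachable from identity."""
    paths = {identity: [identity]}
    queue = deque([identity])
    found = {}
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt in paths:
                # Already visited; breadth-first order means the stored path is a shortest one.
                continue
            paths[nxt] = paths[node] + [nxt]
            if nxt.startswith("res:"):
                found[nxt] = paths[nxt]
            else:
                queue.append(nxt)
    return found

alice_exposure = reachable_resources(GRAPH, "user:alice")
```

Each entry in `alice_exposure` carries the full access path, which is the part a prompt cannot supply: the chain of group and role memberships that actually connects the identity to the resource.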

What "Embarrassingly Specific" Actually Means

Josh's second key technique is being "embarrassingly specific" about your environment—your tools, your tech stack, your constraints. He's absolutely right that this transforms LLM output quality. But consider what you're actually doing: you're manually reconstructing your environment's topology in natural language so the LLM can reason about it.

Consider a prompt like:

"We run CrowdStrike Falcon on all endpoints, Okta for SSO with 340 human users and roughly 2,100 service accounts, AWS across 14 accounts with centralized CloudTrail in us-east-1, and our SIEM is Splunk Enterprise generating 50K alerts/day..."

This is a partial, point-in-time snapshot of your security posture, typed into a chat box. It's already stale by the time you finish writing it. It doesn't include the service account that was created yesterday with admin access to production, or the Okta group membership change that expanded 47 users' access scope last Tuesday.

What if your tools maintained this context as a living, continuously updated graph? What if, when an alert fired, the system could automatically traverse from the alert to every relevant relationship—the identity involved, everything it can access, the sensitivity of those resources, the ECI if compromised, and the specific permission changes that would reduce exposure—without anyone typing a prompt?

From Prompting to Platform: What This Looks Like in Practice

Let's take Josh's alert fatigue example: 50,000 alerts per day, 200 detection rules, 5 rules generating 60% of volume.

The LLM approach (what Josh does today)

Copy alert samples and rule definitions into an LLM
Role-stack: SOC analyst + threat hunter + alert fatigue expert
Ask the LLM to identify rules that are likely generating false positives
Iterate on the prompt with specifics about your environment
Get recommendations for rule tuning
Manually validate recommendations against your SIEM data
Implement changes

This works. It's a significant productivity improvement over doing the same analysis entirely by hand. Josh is right to advocate for it.

The graph-native approach (what Setu does)

The identity exposure graph already knows every identity, every access path, and every resource's sensitivity
When an alert fires, it's automatically correlated against the identity's ECI score
A service account with ECI 12 triggering a low-confidence rule? Auto-suppressed with audit trail
A human identity with ECI 89 triggering the same rule? Escalated immediately with full access path context
Rule tuning recommendations are generated from actual identity-to-resource relationships, not LLM reasoning about described environments

The difference isn't that one approach uses AI and the other doesn't. Both do. The difference is where the context comes from: a human's prompt, or a continuously updated graph of every identity, permission, and resource in your environment.
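The suppress-or-escalate behavior described above can be sketched as a small routing function. This is a sketch only: the thresholds, the `Alert` fields, and the `triage` helper are hypothetical, chosen so that the scores of 12 and 89 from the examples land on opposite sides.

```python
from dataclasses import dataclass

# Hypothetical cutoffs; real values would be tuned per environment.
SUPPRESS_BELOW = 20
ESCALATE_AT_OR_ABOVE = 80

@dataclass
class Alert:
    rule: str
    identity: str
    rule_confidence: str  # "low" or "high"

def triage(alert, eci_scores):
    """Route an alert using the ECI score of the identity involved."""
    # Unknown identities are treated as maximum exposure so they are never auto-suppressed.
    eci = eci_scores.get(alert.identity, 100)
    if eci >= ESCALATE_AT_OR_ABOVE:
        return "escalate"
    if eci < SUPPRESS_BELOW and alert.rule_confidence == "low":
        return "suppress"
    return "queue"

scores = {"svc:backup": 12, "user:cfo": 89}
low_risk = triage(Alert("odd-login-hour", "svc:backup", "low"), scores)
high_risk = triage(Alert("odd-login-hour", "user:cfo", "low"), scores)
```

The same rule produces two different outcomes because the routing decision keys on the identity's score rather than on the rule alone. A production version would also record why each alert was suppressed, which is omitted here.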

The Five Things LLMs Can't Do (That Graphs Can)

We're not anti-LLM. Setu uses language models extensively—for natural language querying of the exposure graph, for generating human-readable remediation playbooks, for explaining complex attack paths to non-technical stakeholders. But there are specific things that no amount of prompt engineering can achieve:

1. Discover unknown identities

LLMs reason about what you tell them. They can't enumerate the 2,100 service accounts in your Okta tenant, discover the 340 OAuth grants between your SaaS applications, or find the API key embedded in a CI/CD pipeline that has production database access. Discovery requires API integration and continuous enumeration, not inference.

2. Compute real-time ECI

"What's the worst case if this identity is compromised?" requires traversing every access path from that identity through role bindings, group memberships, resource policies, and transitive trust relationships. This is a graph computation, not a language task. An LLM can describe what ECI means. It can't compute it from live IAM data.

3. Track permission drift

The difference between Tuesday's access state and Thursday's access state is the difference between "this identity is low risk" and "someone just granted it admin access to production." Detecting drift requires continuous snapshots and delta computation across your entire identity surface. This is infrastructure work, not reasoning work.
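The delta computation itself is simple once the snapshots exist. A minimal sketch, assuming each snapshot is a plain mapping from identity to a set of permission strings (the identities, permission names, and `permission_drift` helper are hypothetical):

```python
def permission_drift(before, after):
    """Compare two snapshots shaped as {identity: set of permissions}."""
    drift = {}
    for identity in before.keys() | after.keys():
        old = before.get(identity, set())
        new = after.get(identity, set())
        granted, revoked = new - old, old - new
        if granted or revoked:
            drift[identity] = {"granted": granted, "revoked": revoked}
    return drift

tuesday = {"svc:ci-bot": {"s3:read"}, "user:bob": {"wiki:edit"}}
thursday = {"svc:ci-bot": {"s3:read", "prod:admin"}, "user:bob": {"wiki:edit"}}

changes = permission_drift(tuesday, thursday)
# changes == {"svc:ci-bot": {"granted": {"prod:admin"}, "revoked": set()}}
```

The set arithmetic is the easy part. The hard part, and the reason this is infrastructure work, is collecting complete and consistent snapshots across every identity source in the first place.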

4. Prioritize by actual exposure

When you have 50,000 alerts, prioritization is everything. LLMs can help you think about prioritization frameworks. But ranking alerts by the actual ECI of the identities involved—using live graph data, not described environments—produces a fundamentally different and more accurate priority stack.

5. Close the loop

The hardest part of security operations isn't identifying the problem. It's remediating it efficiently without breaking production. "Remove this role binding to reduce ECI by 34 points across 12 identities" is a recommendation that comes from graph analysis, not from an LLM that was told about your environment in a prompt.
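One way to picture how such a recommendation could be derived is to score every identity, then re-score with each grant removed and rank the grants by total reduction. The sketch below rests on loud assumptions: the scoring function (summed sensitivity of reachable resources) is a stand-in invented for illustration, not the actual ECI formula, and the role, resource, and helper names are hypothetical.

```python
def score(identity, members, grants, sensitivity, skip=None):
    """Stand-in exposure score: summed sensitivity of resources the identity can reach."""
    reachable = set()
    for role, idents in members.items():
        if identity in idents:
            # Ignore the single (role, resource) grant being evaluated for removal.
            reachable |= {r for r in grants.get(role, set()) if (role, r) != skip}
    return sum(sensitivity[r] for r in reachable)

def rank_removals(members, grants, sensitivity):
    """Rank each (role, resource) grant by total score reduction if it were removed."""
    identities = set().union(*members.values())
    results = []
    for role, resources in grants.items():
        for res in resources:
            deltas = [
                score(i, members, grants, sensitivity)
                - score(i, members, grants, sensitivity, skip=(role, res))
                for i in identities
            ]
            affected = sum(1 for d in deltas if d > 0)
            results.append((sum(deltas), affected, role, res))
    return sorted(results, reverse=True)

members = {"role:admin": {"alice", "bob"}, "role:reader": {"alice", "carol"}}
grants = {"role:admin": {"prod-db", "logs"}, "role:reader": {"logs"}}
sensitivity = {"prod-db": 40, "logs": 5}

best = rank_removals(members, grants, sensitivity)[0]
# best == (80, 2, "role:admin", "prod-db"): 80 points of reduction across 2 identities
```

Note that removing the admin role's access to logs helps only one identity, because the other admin still reaches logs through the reader role. That overlap is invisible without the graph. The sketch also ignores the "without breaking production" half of the problem, which would require usage data on which grants are actually exercised.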

A Better Pairing: LLMs + Live Context

The ideal isn't LLMs versus graphs. It's LLMs with graphs.

Imagine Josh's workflow, but instead of manually context-loading every prompt:

The LLM has live access to the identity exposure graph via tool calls
When analyzing an alert, it can query: "show me the ECI, all access paths, and recent permission changes for this identity"—and get real data back, not its own inference
Role-stacking becomes unnecessary because the system already has the SOC, threat hunter, and exposure perspectives encoded in the graph structure
"Embarrassingly specific" is the default, because the graph IS your environment

This is the direction we're building toward. Not replacing the analyst's judgment—augmenting it with the context that today requires six tool pivots and a well-crafted prompt to approximate.
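As a rough sketch of what "live access via tool calls" could look like, the block below defines one tool in the JSON Schema style that function-calling interfaces commonly expect, plus a handler that answers from data instead of inference. Everything named here is hypothetical: the `get_identity_exposure` tool, the record fields, and the in-memory `EXPOSURE` dictionary standing in for a real graph.

```python
import json

# Hypothetical in-memory stand-in for a live exposure graph.
EXPOSURE = {
    "svc:ci-bot": {
        "eci": 12,
        "access_paths": [["svc:ci-bot", "role:log-reader", "res:audit-logs"]],
        "recent_changes": [],
    },
}

# Tool description the model would be given; the name and fields are assumptions.
TOOL_SPEC = {
    "name": "get_identity_exposure",
    "description": "Return ECI, access paths, and recent permission changes for an identity.",
    "parameters": {
        "type": "object",
        "properties": {"identity": {"type": "string"}},
        "required": ["identity"],
    },
}

def handle_tool_call(name, arguments_json):
    """Dispatch a model-issued tool call and return the result as a JSON string."""
    if name != TOOL_SPEC["name"]:
        return json.dumps({"error": f"unknown tool: {name}"})
    identity = json.loads(arguments_json)["identity"]
    record = EXPOSURE.get(identity)
    if record is None:
        return json.dumps({"error": f"unknown identity: {identity}"})
    return json.dumps(record)

reply = handle_tool_call("get_identity_exposure", '{"identity": "svc:ci-bot"}')
```

The important design choice is the explicit error for an unknown identity: the model gets told that the graph has no record, so it cannot quietly fall back to guessing.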

What Josh Got Right

We want to be clear: Josh Rickard's article is excellent practical advice for the world as it exists today. If your security stack doesn't provide unified identity context—and most don't—then LLMs with role-stacking are the best force multiplier available. His techniques should be standard practice for every security team.

But we also think his article is one of the clearest articulations of why the current security tooling paradigm is broken. The fact that skilled practitioners need to manually reconstruct their environment's topology in an LLM prompt to get useful analysis is a product failure, not a user skill issue.

The question isn't "how do we prompt better?" The question is: why don't our security tools already know what the LLM needs to be told?

Summary

Role-stacking is a brilliant technique for getting more out of LLMs. It's also a map of exactly what security tools should provide natively: multi-perspective, identity-aware, ECI-quantified context for every alert, every identity, every access path.

Josh showed us the prompts. We're building the platform that makes them unnecessary.
