Detection Engineering

Your SIEM sees events; your graph sees campaigns

A three-tier walkthrough of one realistic campaign (SIEM-only, graph-only, graph + GNN) with concrete time-to-detect numbers.

Setu Research
April 12, 2026 · 10 min read

A modern enterprise SIEM ingests anywhere from 100 GB to 50 TB of security telemetry per day. The view that telemetry produces is, structurally, a flat stream of events ordered by time. Each event is rich; the relationships between events are the analyst's responsibility to reconstruct, mostly in their head, mostly under pressure, mostly during the 30 minutes between an alert firing and a containment decision being needed.

A campaign — by which we mean a coordinated set of attacker actions over hours or days, carried out across multiple identities, hosts, and systems — is by definition a structure across many events. The SIEM's primary view does not show campaigns. It shows events that, if the analyst is good and lucky, can be assembled into a campaign retrospectively.

This is the gap the entity graph closes. This post walks through one realistic campaign in detail and shows what each layer (SIEM-only, graph-only, graph + GNN) actually surfaces.

The campaign

A scenario assembled from elements of three publicly disclosed 2024 incidents. Names are generic.

Day 1, 14:47. A helpdesk technician at MidCo receives a phone call from someone claiming to be Sara Linden, a senior systems engineer. The caller passes basic verification ("what's the name of your manager", "what office are you based out of") and requests a password reset. The technician resets Sara's AD password and SMS-verifies a new MFA device.

Day 1, 15:12. "Sara" logs into Okta from an IP geolocated to a residential ISP in a country Sara has never traveled to.

Day 1, 15:14. "Sara" enumerates her group memberships in Okta. She is a member of eng-aws-readonly, eng-vpn-users, and (via nested membership) infra-okta-admins-emergency-access.

Day 1, 15:18. "Sara" assumes the infra-okta-admins-emergency-access role and grants herself a temporary OIDC client.

Day 1, 15:24. The OIDC client is used to federate into Azure AD as a global administrator.

Day 2, 02:14. The Azure AD global admin role is used to enumerate service principals in the tenant. A service principal named legacy-mailbox-export-svc is discovered with Mail.Read.All on the entire tenant.

Day 2, 02:38. The legacy service principal authenticates and begins downloading mail from 14 executive mailboxes over the next 90 minutes.

Day 2, 04:07. Exfiltration completes. No data-loss-prevention alert fires because the egress is via the legacy service principal's whitelisted Microsoft Graph endpoint.

Day 2, 09:00. Sara arrives at work and discovers her actual password no longer works. She calls the helpdesk. A ticket is opened. The investigation begins about 18 hours after the campaign started and nearly five hours after exfiltration completed.

What the SIEM-only SOC sees

The SIEM ingests every event in this timeline. Each event individually is unremarkable.

| Time  | Event in SIEM | Default rule fires? |
|-------|---------------|---------------------|
| 14:47 | AD password reset by helpdesk role | No (legitimate workflow) |
| 15:12 | Okta login from new geography | Maybe (geo-velocity rule); common for mobile users, tuned down in most environments |
| 15:14 | Okta group membership enumeration | No (administrative API call) |
| 15:18 | Okta admin role assumption with emergency-access flag | Maybe (rare event rule); depends on tuning |
| 15:24 | Azure AD global admin authentication via OIDC | No (legitimate federation) |
| 02:14 | Azure AD service principal enumeration | No (admin tool usage) |
| 02:38 | Service principal authentication and Graph API queries | No (legitimate scoped permission) |
| 04:07 | Mail download via Graph API | No (within whitelisted scope) |

In the best-tuned SOCs, the geo-velocity rule and the emergency-access flag rule both fire. These are two events out of a few hundred during the campaign window. Both are routine enough that they are typically auto-closed by tier-1 analysts within minutes. The campaign is not visible in the alert stream.
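A minimal sketch of why per-event rules miss the campaign, assuming a hypothetical rule engine in which every rule inspects one event at a time. The `Event` fields and the two rule names are illustrative assumptions, not any SIEM's actual schema or rule language.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    time: str                      # HH:MM from the campaign timeline
    source: str
    action: str
    new_geo: bool = False          # hypothetical enrichment flag
    emergency_access: bool = False  # hypothetical enrichment flag


# Hypothetical stateless rules: each one sees a single event in isolation.
RULES = {
    "geo_velocity": lambda e: e.new_geo,
    "rare_admin_role": lambda e: e.emergency_access,
}

EVENTS = [
    Event("14:47", "AD", "password_reset"),
    Event("15:12", "Okta", "login", new_geo=True),
    Event("15:14", "Okta", "group_enumeration"),
    Event("15:18", "Okta", "role_assumption", emergency_access=True),
    Event("15:24", "AzureAD", "oidc_admin_auth"),
    Event("02:14", "AzureAD", "service_principal_enumeration"),
    Event("02:38", "Graph", "service_principal_auth"),
    Event("04:07", "Graph", "mail_download"),
]


def evaluate(events):
    """Return (time, rule_name) for every rule that matches a single event."""
    return [(e.time, name) for e in events for name, rule in RULES.items() if rule(e)]


alerts = evaluate(EVENTS)
print(alerts)
```

The output is two unrelated alerts. Nothing in either one says that they share an identity, or that six other unflagged events sit on the same chain.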

A retrospective investigation, after Sara reports her account compromise the next morning, will eventually piece the timeline together by manually correlating across Okta logs, Azure AD logs, and Microsoft 365 audit logs. Time-to-reconstruction for a campaign of this complexity, per industry incident-response data, typically runs 6–18 hours. Total adversary dwell from foothold (14:47) to exfiltration completion (04:07): a little over 13 hours. The campaign is over before reconstruction begins.

What the graph-only SOC sees

A SOC running an identity graph (no learned model, just deterministic graph queries) sees the same events but stores them as edges and attributes on a graph. Two queries that the graph SOC has running continuously surface the campaign while it is happening:

Query 1: privilege elevation paths from low-trust contexts.

    MATCH path = (i:Identity)-[:AUTH]->(s:Session)-[:ASSUMED*1..3]->(r:Role)
    WHERE s.geo_anomaly_score > 0.7
      AND r.tag CONTAINS 'admin'
      AND duration_between(s.start, last(path).timestamp) < 30 minutes
    RETURN path
    ORDER BY r.privilege_level DESC

This query fires at 15:18 when Sara's anomalous-geography session reaches infra-okta-admins-emergency-access. The path returned is three nodes joined by two edges: Sara → anomalous Okta session → emergency-access role. Severity is high because the destination role is admin-tagged.
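The logic of Query 1 can be sketched over a small in-memory graph. This is an illustration under assumptions, not the product's query engine. The dictionary layout, the session ids, the 0.9 anomaly score, the `okta-admin` tag, the privilege level, and the placeholder date are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Session:
    identity: str
    start: datetime
    geo_anomaly_score: float


def elevation_paths(sessions, assumed, roles, min_score=0.7, max_hops=3,
                    window=timedelta(minutes=30)):
    """Find admin-tagged roles reached from anomalous sessions within `window`."""
    hits = []
    for session_id, session in sessions.items():
        if session.geo_anomaly_score <= min_score:
            continue
        stack = [(session_id, [])]  # (current node, roles assumed so far)
        while stack:
            node, trail = stack.pop()
            if len(trail) == max_hops:
                continue  # mirrors the *1..3 bound on ASSUMED edges
            for role, assumed_at in assumed.get(node, []):
                if role in trail:
                    continue  # skip cycles in nested role membership
                new_trail = trail + [role]
                is_admin = "admin" in roles[role]["tag"]
                if is_admin and assumed_at - session.start < window:
                    hits.append((session.identity, session_id, new_trail,
                                 roles[role]["privilege_level"]))
                stack.append((role, new_trail))
    return sorted(hits, key=lambda hit: hit[3], reverse=True)


DAY1 = datetime(2026, 1, 1)  # placeholder date; the scenario only says "Day 1"
ROLE = "infra-okta-admins-emergency-access"
sessions = {
    "okta-sess-anomalous": Session("sara", DAY1.replace(hour=15, minute=12), 0.9),
    "okta-sess-routine": Session("colleague", DAY1.replace(hour=9, minute=0), 0.1),
}
assumed = {
    "okta-sess-anomalous": [(ROLE, DAY1.replace(hour=15, minute=18))],
    "okta-sess-routine": [(ROLE, DAY1.replace(hour=9, minute=5))],
}
roles = {ROLE: {"tag": "okta-admin", "privilege_level": 9}}

hits = elevation_paths(sessions, assumed, roles)
print(hits)
```

Only the anomalous session is returned. The routine session assumes the same role but never passes the geography filter, which is the distinction a per-event rule on the role assumption cannot make.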

Query 2: lateral movement from suspicious sessions.

    MATCH (suspicious:Identity)-[:AUTH]->(:Session)-[:OPS*1..5]->(target:Asset)
    WHERE suspicious.suspicion_score > 0.5
      AND target.tag CONTAINS 'crown-jewel'
    RETURN target, count(*) AS path_count

At 02:38, when the legacy service principal begins downloading executive mailboxes, this query surfaces the path: anomalous-session-Sara → admin-role → Azure AD global admin → service-principal-enumeration → legacy-service-principal → executive-mailboxes. The destination is crown-jewel-tagged. The path is six nodes joined by five edges, traversed in under twelve hours.
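Query 2 amounts to a hop-bounded traversal from the sessions of suspicious identities. A rough Python equivalent follows, assuming adjacency dictionaries for the AUTH and OPS edges. The node names mirror the scenario, while the suspicion scores and the `team-wiki` asset are made up for illustration.

```python
from collections import Counter


def crown_jewel_paths(auth, ops, suspicion, tags, min_suspicion=0.5, max_hops=5):
    """Count OPS paths (1..max_hops edges) from suspicious sessions to crown-jewel assets."""
    counts = Counter()
    for identity, session_ids in auth.items():
        if suspicion.get(identity, 0.0) <= min_suspicion:
            continue
        for session_id in session_ids:
            frontier = [[session_id]]
            for _ in range(max_hops):
                next_frontier = []
                for path in frontier:
                    for nxt in ops.get(path[-1], []):
                        if nxt in path:
                            continue  # do not revisit nodes on the same path
                        if "crown-jewel" in tags.get(nxt, ""):
                            counts[nxt] += 1
                        next_frontier.append(path + [nxt])
                frontier = next_frontier
    return dict(counts)


auth = {"sara": ["okta-sess-anomalous"], "colleague": ["okta-sess-routine"]}
ops = {
    "okta-sess-anomalous": ["emergency-access-role"],
    "emergency-access-role": ["azure-ad-global-admin"],
    "azure-ad-global-admin": ["service-principal-enumeration"],
    "service-principal-enumeration": ["legacy-mailbox-export-svc"],
    "legacy-mailbox-export-svc": ["executive-mailboxes"],
    "okta-sess-routine": ["team-wiki"],
}
suspicion = {"sara": 0.8, "colleague": 0.05}  # illustrative scores
tags = {"executive-mailboxes": "crown-jewel"}

result = crown_jewel_paths(auth, ops, suspicion, tags)
print(result)
```

The mailbox asset sits exactly five OPS edges from the session, so it is reachable only at the upper end of the `*1..5` bound. A tighter hop limit would silently miss it, which is one reason the bound deserves tuning against real path lengths.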

The graph SOC's alert fires at 02:38, a little over eleven hours after Query 1 first flagged the session at 15:18 and 90 minutes before exfiltration completes. A containment decision can be made within minutes: revoke the legacy service principal's permissions, kill the Azure AD admin session, force Sara's account back through identity verification.

This is the graph-only case, with no machine learning involved. The detections are deterministic queries over a graph that captures the relationships the SIEM throws away. Adversary dwell from foothold to detection: about 12 hours. In the SIEM-only case, exfiltration completes about 13 hours after foothold, the investigation does not start until hour 18, and reconstruction takes another 6+ hours. Even the deterministic graph case, with no AI, detects the campaign while it is still running, at least half a day before the SIEM-only investigation has a reconstructed timeline.

What the graph + GNN SOC sees

Add a learned layer on top of the graph and the detection moves earlier.

A GNN trained on the tenant's accumulated analyst feedback learns, over months, that:

  • Helpdesk-mediated password resets followed within minutes by anomalous-geography logins are precursors to escalation 12% of the time (vs. 0.1% baseline for either signal alone).
  • Okta admin role assumptions immediately following a password-reset event are precursors to lateral movement 8% of the time.
  • The specific combination of eng-vpn-users membership plus access to infra-okta-admins-emergency-access via nested membership is structurally unusual relative to the tenant's other engineering identities.
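The first of these rates is a conditional frequency over analyst-labeled sessions. The sketch below is not a GNN. It only shows the counting statistic that such a model would implicitly pick up. The counts are synthetic and chosen to reproduce the 12% and 0.1% figures quoted above.

```python
def precursor_rate(outcomes):
    """Fraction of labeled sessions that analysts marked as leading to escalation."""
    return sum(outcomes) / len(outcomes) if outcomes else 0.0


# Synthetic labels (True = escalation followed); not real tenant data.
reset_then_anomalous_login = [True] * 12 + [False] * 88
either_signal_alone = [True] * 10 + [False] * 9990

rate_combined = precursor_rate(reset_then_anomalous_login)
rate_single = precursor_rate(either_signal_alone)
lift = rate_combined / rate_single

print(f"combined={rate_combined:.1%} single={rate_single:.1%} lift={lift:.0f}x")
```

With these counts, the combined pattern is roughly two orders of magnitude more predictive than either signal alone. That ratio is what justifies scoring the sequence rather than the individual events.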

The GNN's anomaly score for Sara's session at 15:14, when she enumerates her group memberships, is already in the top 0.3% of all sessions for the tenant in the past 30 days. The alert fires at 15:14, two minutes after the anomalous-geography login and four minutes before the admin role assumption. Adversary dwell from foothold to detection: 27 minutes.

Three caveats on this number, in the spirit of not overclaiming:

  • The GNN requires accumulated tenant-specific labels. A tenant in its first month of deployment does not have this signal. The graph-only deterministic queries above remain the day-one fallback.
  • The GNN's 12% / 8% precursor rates are illustrative. Real precursor rates depend on the tenant's specific telemetry mix and triage culture.
  • The GNN is trained on the same graph the deterministic queries run against. It does not replace the graph; it is an additional layer that consumes the same substrate. If the graph is wrong (entity resolution failure, missed edges), the GNN inherits the failure.

Why the graph layer is the load-bearing one

Notice the structure of the argument. The interesting time-to-detect numbers are:

  • SIEM-only: 6–18+ hours of post-incident reconstruction, starting roughly 18 hours after foothold
  • Graph (deterministic): ~12 hours
  • Graph + GNN: ~30 minutes (after the GNN has been trained on tenant-specific feedback)
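The figures in this list follow from the timestamps in the campaign timeline. A short worked calculation, using a placeholder calendar date because the scenario only gives "Day 1" and "Day 2":

```python
from datetime import datetime, timedelta

DAY1 = datetime(2026, 1, 1)  # placeholder date, not part of the scenario


def at(day, hhmm):
    """Convert a 'Day N, HH:MM' timeline entry into a datetime."""
    hours, minutes = map(int, hhmm.split(":"))
    return DAY1 + timedelta(days=day - 1, hours=hours, minutes=minutes)


foothold = at(1, "14:47")  # helpdesk password reset
milestones = {
    "gnn_alert": at(1, "15:14"),
    "graph_query_1": at(1, "15:18"),
    "graph_query_2": at(2, "02:38"),
    "exfiltration_complete": at(2, "04:07"),
    "investigation_start": at(2, "09:00"),
}

elapsed = {name: ts - foothold for name, ts in milestones.items()}
for name, delta in elapsed.items():
    print(f"{name:>22}: {delta}")
```

This gives 27 minutes to the GNN alert, 11 hours 51 minutes to the Query 2 alert, 13 hours 20 minutes to exfiltration completion, and 18 hours 13 minutes to the start of the SIEM-only investigation.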

The biggest jump — from "no detection during the attack" to "detection during the attack" — comes from adding the graph, not from adding the model. The GNN improves time-to-detect from hours to minutes once the graph exists. Without the graph, no model has the structural inputs to make the detection at all.

This is why we lead with the graph in product positioning and lead with the model only after the graph is established. The model is a force multiplier; the graph is the prerequisite. SOCs that skip directly to "give me a model" without first building the graph end up with a model trained on flat events, which is essentially a souped-up rule engine and inherits all the rule-engine limitations described in our companion post on detection engineering.

What this requires, operationally

An identity graph needs three operational properties to serve as the substrate for the queries above:

Entity resolution across heterogeneous identifier spaces. Sara's AD account, her Okta user, her Azure AD identity, and the OIDC client she created all need to resolve to the same logical entity. This is hard. Vendors who claim "graph-based detection" without solving this problem are claiming a graph they don't actually have.
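One common way to model this is a union-find structure over (namespace, identifier) pairs, merged whenever a trusted link between two identifiers is observed. The sketch below assumes that approach. The identifier values and the link sources named in the comments are hypothetical, and deciding which links are trustworthy is the hard part it leaves out.

```python
class EntityResolver:
    """Union-find over (namespace, identifier) pairs."""

    def __init__(self):
        self._parent = {}

    def _find(self, item):
        self._parent.setdefault(item, item)
        while self._parent[item] != item:
            # Path halving keeps lookups close to constant time.
            self._parent[item] = self._parent[self._parent[item]]
            item = self._parent[item]
        return item

    def link(self, a, b):
        """Record evidence that identifiers a and b belong to one logical entity."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            # Deterministic choice of root, independent of link order.
            self._parent[max(root_a, root_b)] = min(root_a, root_b)

    def same_entity(self, a, b):
        return self._find(a) == self._find(b)


# Hypothetical identifiers for the scenario's accounts.
AD = ("ad", "MIDCO\\slinden")
OKTA = ("okta", "00u-sara")
AZURE = ("azuread", "sara.linden@midco.example")
OIDC = ("oidc_client", "tmp-client-1")
OTHER = ("okta", "00u-colleague")

resolver = EntityResolver()
resolver.link(AD, OKTA)     # e.g. directory sync mapping
resolver.link(OKTA, AZURE)  # e.g. federation claim
resolver.link(OKTA, OIDC)   # e.g. client-creation audit event

print(resolver.same_entity(AD, OIDC), resolver.same_entity(AD, OTHER))
```

Once the four identifiers share a root, an edge written against any of them lands on the same logical node. That is what lets a path cross from Okta into Azure AD without a break.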

Temporal correctness over multi-month horizons. The graph needs to know that a role membership existed yesterday but not today. This requires monotonic upsert semantics on entity attributes — first_seen and last_seen that survive distributed out-of-order ingestion, edges that accumulate weight over co-occurrence rather than being overwritten. Most graph databases retain only current state and require add-on work to make temporal queries correct.
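A minimal sketch of monotonic upsert semantics, assuming every observation carries a unique event id and a timestamp. The class and field names are illustrative. Deduplicating with an in-memory set is a simplification that a production store would replace with something bounded.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Edge:
    first_seen: datetime
    last_seen: datetime
    weight: int = 0
    seen_event_ids: set = field(default_factory=set)


class TemporalGraph:
    def __init__(self):
        self.edges = {}

    def upsert(self, src, dst, event_id, ts):
        """Merge one observation; the result is independent of arrival order."""
        edge = self.edges.get((src, dst))
        if edge is None:
            edge = self.edges[(src, dst)] = Edge(first_seen=ts, last_seen=ts)
        if event_id in edge.seen_event_ids:
            return edge  # redelivered event: no double counting
        edge.seen_event_ids.add(event_id)
        edge.first_seen = min(edge.first_seen, ts)  # only ever moves earlier
        edge.last_seen = max(edge.last_seen, ts)    # only ever moves later
        edge.weight += 1                            # accumulates, never overwritten
        return edge


def ingest(events):
    graph = TemporalGraph()
    for src, dst, event_id, ts in events:
        graph.upsert(src, dst, event_id, ts)
    edge = graph.edges[("sara", "eng-vpn-users")]
    return edge.first_seen, edge.last_seen, edge.weight


EVENTS = [
    ("sara", "eng-vpn-users", "evt-1", datetime(2026, 1, 1, 9, 0)),
    ("sara", "eng-vpn-users", "evt-2", datetime(2026, 1, 2, 9, 0)),
    ("sara", "eng-vpn-users", "evt-3", datetime(2026, 1, 3, 9, 0)),
    ("sara", "eng-vpn-users", "evt-2", datetime(2026, 1, 2, 9, 0)),  # redelivery
]

print(ingest(EVENTS) == ingest(list(reversed(EVENTS))))
```

Because `min`, `max`, and deduplicated counting are all order-independent, replaying the same events in any order converges to the same edge state.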

Sub-second per-source diffusion lookups. When an alert fires on identity X at 02:38, the SOC analyst cannot wait minutes for a graph traversal to complete. The product has to precompute multi-source structural exposure offline (refreshed nightly) and recompute per-source diffusion on-demand at sub-second latency for any node under active investigation.
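The post does not name the diffusion algorithm. As a stand-in, the sketch below uses personalized PageRank computed by power iteration from a single source node. The restart probability, the iteration count, and the toy graph are assumptions, and the code makes no claim about latency.

```python
def personalized_pagerank(adjacency, source, alpha=0.15, iterations=50):
    """Diffuse probability mass from `source` along out-edges, restarting at `source`."""
    nodes = set(adjacency) | {n for nbrs in adjacency.values() for n in nbrs}
    score = {node: 0.0 for node in nodes}
    score[source] = 1.0
    for _ in range(iterations):
        nxt = {node: 0.0 for node in nodes}
        nxt[source] += alpha  # restart mass (total mass stays at 1.0)
        for node in nodes:
            mass = (1 - alpha) * score[node]
            out_edges = adjacency.get(node, [])
            if out_edges:
                share = mass / len(out_edges)
                for neighbor in out_edges:
                    nxt[neighbor] += share
            else:
                nxt[source] += mass  # dangling node: return mass to the source
        score = nxt
    return score


# Toy graph shaped like the campaign path, plus one node that only points inward.
adjacency = {
    "okta-sess-anomalous": ["emergency-access-role"],
    "emergency-access-role": ["azure-ad-global-admin"],
    "azure-ad-global-admin": ["legacy-mailbox-export-svc"],
    "legacy-mailbox-export-svc": ["executive-mailboxes"],
    "unrelated-host": ["okta-sess-anomalous"],
}

scores = personalized_pagerank(adjacency, "okta-sess-anomalous")
for node, value in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{node:>28}: {value:.3f}")
```

Scores fall off with distance from the source, and nodes the source cannot reach stay at zero. An analyst-facing view could rank assets by this score to show what a compromised session is structurally closest to.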

These are engineering properties. They are not algorithmic novelties. But they are the work that distinguishes a product that "has a graph" from a product that has a graph SOCs can actually run detections against during incidents.

Closing

The SIEM is not going away. It will continue to be the source of normalized, queryable, retained event data. What it will not continue to be is the primary surface where SOCs reason about campaigns. That surface is migrating to the graph, slowly today, faster as the entity-resolution and temporal-correctness engineering matures.

The campaign described above is composed entirely of routine events, each individually within normal operational tolerances. The pattern is only visible if the events are seen as edges of a graph, not as rows in a stream. Once the events are edges, deterministic queries surface the campaign. Once a learned layer sits on top of those queries, the surfacing happens earlier in the campaign timeline.

Events tell you what happened. Graphs tell you what's happening. The difference is the difference between forensics and detection, and detection is what determines whether the executive mailboxes get exfiltrated.

Setu Security Research