Extracting Evidence with extract()
Sometimes the evidence is not sitting neatly between two predictable pieces of text.
It might be hidden inside a command line, URL, subject line, file path, alert field or custom log message.
This is where extract() becomes useful. Instead of relying on a fixed string layout, defenders can use a regular expression capture group to pull out the value they need.
In this Agent Foskett Academy lesson, you will learn how defenders use the KQL extract() function to find IP addresses, domains, identifiers and other investigation evidence inside Microsoft Defender XDR and Microsoft Sentinel telemetry.
Lesson overview
Learn how extract() helps defenders pull evidence from flexible text patterns using regex capture groups.
Why extract() matters
This is useful when the evidence is present, but the surrounding text may change between events.
Instead of manually reading long strings, defenders can extract the exact indicator they need into a clean column.
Investigation scenario
Some command lines contain IP addresses. Some URLs contain suspicious domains. Some email subjects contain case numbers and tracking identifiers.
The formats are not perfectly consistent, so the analyst uses extract() to pull the useful evidence out with regex capture groups.
Step 1 — Extract an IP address from a command line
- 1
- 2
- 3
- 4
- 5
- 6
- 7
DeviceProcessEvents | where Timestamp > ago(7d) | where ProcessCommandLine matches regex @"\d{1,3}(\.\d{1,3}){3}" | extend ExtractedIP = extract(@"(\d{1,3}(?:\.\d{1,3}){3})", 1, ProcessCommandLine) | project Timestamp, DeviceName, AccountName, FileName, ExtractedIP, ProcessCommandLine | sort by Timestamp desc
Step 2 — Extract a domain from a URL
- 1
- 2
- 3
- 4
- 5
- 6
UrlClickEvents | where Timestamp > ago(30d) | extend ClickedDomain = extract(@"https?://([^/]+)", 1, Url) | project Timestamp, AccountUpn, ClickedDomain, Url, ActionType, ThreatTypes | sort by Timestamp desc
Step 3 — Extract an identifier from an email subject
- 1
- 2
- 3
- 4
- 5
- 6
- 7
- 8
EmailEvents | where Timestamp > ago(30d) | extend TicketNumber = extract(@"Ticket[- ]?(\d+)", 1, Subject) | where isnotempty(TicketNumber) | project Timestamp, SenderFromAddress, RecipientEmailAddress, TicketNumber, Subject, DeliveryAction | sort by Timestamp desc
What extract() does
The capture group is usually wrapped in brackets. That is the part of the pattern you want KQL to return.
Step 4 — Extract a suspicious file extension
- 1
- 2
- 3
- 4
- 5
- 6
- 7
- 8
DeviceFileEvents | where Timestamp > ago(14d) | extend FileExtension = extract(@"\.([A-Za-z0-9]+)$", 1, FileName) | where FileExtension in~ ("exe", "dll", "ps1", "vbs", "js") | project Timestamp, DeviceName, ActionType, FileName, FileExtension, FolderPath | sort by Timestamp desc
Step 5 — Extract values from additional fields
- 1
- 2
- 3
- 4
- 5
- 6
- 7
DeviceEvents | where Timestamp > ago(7d) | where AdditionalFields has "RemoteIP" | extend RemoteIP = extract(@'"RemoteIP"\s*:\s*"([^"]+)"', 1, tostring(AdditionalFields)) | project Timestamp, DeviceName, ActionType, RemoteIP, AdditionalFields | sort by Timestamp desc
Step 6 — Combine extract() with summarize
- 1
- 2
- 3
- 4
- 5
- 6
- 7
UrlClickEvents | where Timestamp > ago(30d) | extend ClickedDomain = extract(@"https?://([^/]+)", 1, Url) | where isnotempty(ClickedDomain) | summarize ClickCount = count(), Users = dcount(AccountUpn) by ClickedDomain | top 25 by ClickCount desc
Investigator notes
Keep the regex as simple as possible. Start with one value, confirm the extracted column is correct, then build the rest of the investigation around it.
What you learned
Continue your investigation
Continue learning with Extracting Evidence with parse, Working with parse_json(), KQL Threat Hunting Guide and Microsoft Security.
Develop IT. Protect IT. GEMXIT PTY LTD | GEMXIT UK LTD