
CE SentinelOne Assistant: New Features


Part 1:


1. DFIR Investigation Tab

The DFIR Investigation tab is the biggest addition to the CE S1 Assistant since launch. It takes a completely different approach to the problem — instead of helping you write queries to find things, it analyses logs you already have.


Here is the workflow it was built around.

  • You get an alert.

  • You open SentinelOne Deep Visibility and run a query across the affected endpoint.

  • You export the results as a JSON file.

  • You upload that file to the DFIR tab.

  • The tool processes it, anonymises all sensitive data, and validates the result.

  • You get a full incident report back — verdict, confidence, attack chain, indicators, immediate actions.


The analysis that used to take an analyst an hour or two of manually correlating events gets done in seconds. And because the data is anonymised before it leaves your browser session, you can run it against real incident data without exposing confidential information.


What the Upload Does

When you upload a Deep Visibility JSON export, the tool runs through four stages. These happen automatically; you do not configure them.

1. Load & Parse: Reads your JSON file. Handles both raw array exports and wrapped exports. Detects multi-endpoint files early.

2. Anonymise: Replaces real usernames, hostnames, internal IPs, and client names with numbered labels (USER-001, ENDPOINT-001, etc.). Catches names anywhere they appear: in file paths, command lines, process names.

3. Validate: Scans every field of every event for any real value that survived anonymisation. If anything is found, the submission is blocked and you are told exactly what needs fixing.

4. Build Brief: Constructs a structured incident brief from the cleaned data (process timeline, DNS activity, network connections, file operations, and SentinelOne behavioural indicators), then sends it for analysis.
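To make the first stage concrete, here is a minimal sketch of Load & Parse. The function name, the wrapped-export shape, and the endpointName key are all assumptions for illustration, not the tool's actual code:

```python
import json

def load_and_parse(raw: str) -> tuple[list[dict], list[str]]:
    """Stage 1: Load & Parse. Accepts both a raw array export and a
    wrapped export (assumed here to look like {"data": [...]})."""
    data = json.loads(raw)
    if isinstance(data, dict):
        data = data.get("data", [])
    warnings = []
    # Detect multi-endpoint files early ("endpointName" is an assumed key).
    endpoints = {e.get("endpointName") for e in data} - {None}
    if len(endpoints) > 1:
        warnings.append(f"export covers {len(endpoints)} endpoints")
    return data, warnings
```

The multi-endpoint check runs before any heavier processing, which is why the tool can flag it "early".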


The Anonymisation System: Why It Matters

This is the part I want to explain in detail because it is the part most people do not expect.

When you are dealing with a real incident, the logs contain real data. Real names in file paths. Real hostnames. Confidential details in process arguments or script variables. If you paste that data in directly, you have just sent potentially sensitive personal and business data.


I built the anonymisation system specifically to prevent that.

What Gets Replaced

  • Usernames — replaced with USER-001, USER-002, etc. (consistent across the entire file)

  • Endpoint hostnames — replaced with ENDPOINT-001, ENDPOINT-002, etc.

  • Internal IP addresses — replaced with INTERNAL-IP-001, INTERNAL-IP-002, etc.

  • Company names — replaced with COMPANY-001 fragments (you specify these if needed)

  • Credentials found in command lines — Bearer tokens, passwords, API keys — replaced with [REDACTED-CRED]

  • Cloud keys — AWS AKIA keys, Azure AccountKey, api_key= values — replaced with [REDACTED-CLOUD-KEY]


The replacement is case-insensitive and not tied to specific paths or fields. If a username appears anywhere (in a file path, in a process name, in a command line argument, in a script parameter), it gets caught and replaced. This works on macOS paths, Windows paths, and Linux paths without any configuration.
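A minimal sketch of how consistent, case-insensitive replacement like this can work. The function names, label logic, and credential patterns below are assumptions for illustration, not the tool's actual implementation:

```python
import re

LABELS: dict[str, str] = {}   # real value (lowercased) -> stable numbered label

def label_for(value: str, prefix: str) -> str:
    """Assign USER-001-style labels, consistent across the whole file."""
    key = value.lower()
    if key not in LABELS:
        count = sum(1 for v in LABELS.values() if v.startswith(prefix)) + 1
        LABELS[key] = f"{prefix}-{count:03d}"
    return LABELS[key]

def anonymise_text(text: str, usernames: list[str]) -> str:
    """Replace usernames anywhere they appear, then redact credentials."""
    for name in usernames:
        # Case-insensitive, matches anywhere: paths, command lines, arguments.
        text = re.sub(re.escape(name), label_for(name, "USER"),
                      text, flags=re.IGNORECASE)
    text = re.sub(r"Bearer\s+\S+", "[REDACTED-CRED]", text)           # bearer tokens
    text = re.sub(r"AKIA[0-9A-Z]{16}", "[REDACTED-CLOUD-KEY]", text)  # AWS access keys
    return text
```

Because labels are keyed on the lowercased value, JSmith in a Windows path and jsmith in a Linux path both map to the same USER-001.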


The Validation Block

After anonymisation, the tool runs a separate validation pass before it will allow the data to be analysed. It scans every field of every event, case-insensitively, for any real value that survived.

Hard Block:  If any real username, hostname, or company name fragment is found anywhere in the cleaned data, the submission is blocked entirely. You see a list of exactly which values were found and in which events.


This is not a warning — it is a hard block. The data does not move until the validation passes. This is deliberate.
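The hard-block behaviour can be sketched as a scan that raises rather than warns. This is only an illustration of the idea; the real tool reports the findings in the UI:

```python
def validate(events: list[dict], real_values: list[str]) -> None:
    """Post-anonymisation check: hard block if any real value survived.
    Raising (instead of returning a warning) means the data cannot move on."""
    leaks = []
    for index, event in enumerate(events):
        for field, value in event.items():
            for real in real_values:
                # Case-insensitive, every field of every event.
                if real.lower() in str(value).lower():
                    leaks.append((index, field, real))
    if leaks:
        raise ValueError(f"submission blocked; leaked values: {leaks}")
```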



What You See in the UI

After upload and anonymisation, the tool shows you a summary panel before it runs the investigation:

  • How many events were in the file

  • What was replaced — a map of real name → anonymised label (visible in your browser only, never stored, gone on refresh)

  • Event type breakdown — how many Process Creation, DNS, Network, File events

  • Any behavioural indicators SentinelOne flagged in the raw data

  • Any warnings (e.g. a hostname that looks like it might be a client machine)


The real values shown in the mapping panel exist only in your browser session. They are never sent anywhere, never logged, and disappear when you refresh the page.

What the Investigation Report Contains

Once the brief is built and validated, investigation starts. The report that comes back is structured into specific sections:

  • Attack Chain — chronological reconstruction of what happened, from first event to last observed

  • Indicators of Compromise — hashes, IPs, domains, file paths, process names extracted directly from the logs

  • MITRE ATT&CK Techniques — technique IDs and names mapped to observed behaviours

  • Additional Logs Needed — if the log window was too short or missing event types, the report flags what you should pull next

  • Verdict — MALICIOUS / SUSPICIOUS / FALSE POSITIVE / INCONCLUSIVE

  • Confidence — percentage confidence in the verdict with reasoning

  • Key Findings — the three to five most important things the analysis found


-------------------------------------------------------------------------------------------------------------


2. Three Investigation Modes


The DFIR tab does not do just one thing. It has three distinct investigation modes because different situations call for different approaches. You select the mode before submitting.


Mode 1 — Full Investigation

Use this when you want the complete picture. You do not know yet what happened, or you want confirmation of what you suspect. You upload the JSON and let the tool do the full sweep.


What it does:

  • Builds a complete incident brief from all events in the file

  • Performs a full DFIR investigation

  • Returns the structured report covering attack chain, IOCs, techniques, verdict, and actions


Best for:

  • New alerts where you do not yet know the scope

  • Confirming or ruling out a suspected compromise

  • Generating a report you can share with the team or document in a ticket



Mode 2 — Targeted (Ask a Question)

Use this when you have a specific question about the incident and you do not need the full report. You upload the same JSON, type your question, and the tool pulls out only the events relevant to that question.

Examples of targeted questions:

  • "Was there any lateral movement in this data?"

  • "What persistence mechanisms were set up on this endpoint?"

  • "Did the suspicious process make any outbound connections?"

  • "Was there any credential access activity?"

  • "Did anything run from a temp directory?"


Instead of analysing the full event timeline, the tool uses a separate extraction function that filters and summarises only the events that are relevant to your specific question. This makes the response faster and more focused.


The answer format is always:

  • YES / NO / CANNOT DETERMINE: a direct answer to your question

  • Evidence: the specific events that support the answer

  • Context: what those events actually mean in terms of attacker behaviour
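One way to picture the extraction step is a relevance filter over the event list. This naive keyword version is an assumption about the approach, not the tool's actual logic:

```python
STOP_WORDS = {"was", "there", "any", "in", "this", "the", "did", "a", "what", "on", "of"}

def extract_relevant(events: list[dict], question: str) -> list[dict]:
    """Keep only events whose fields mention a meaningful term from the question."""
    terms = {w.strip("?.,").lower() for w in question.split()} - STOP_WORDS
    terms = {t for t in terms if len(t) > 2}   # drop residual short words
    relevant = []
    for event in events:
        blob = " ".join(str(v) for v in event.values()).lower()
        if any(t in blob for t in terms):
            relevant.append(event)
    return relevant
```

Only the surviving subset would then be summarised and analysed, which is why targeted mode is faster than a full investigation.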



Mode 3 — IOC Hunt

Use this when you have a list of IOCs — from a threat report, from a feed, from another analyst — and you want to know if any of them appear in your log data. You do not need a question. You just give it the IOCs and it tells you what matched.


Input format — one IOC per line, any mix of:

  • SHA256 / SHA1 / MD5 hashes

  • IP addresses

  • Domains and URLs (defanging is handled automatically: hxxp becomes http, [.] becomes .)

  • File paths

  • Process names

  • Keywords


What you get back for each IOC:

  • FOUND or NOT FOUND

  • How many times it appeared in the log data

  • First seen and last seen timestamps

  • Which processes were associated with it

  • Classification: TRUE POSITIVE / FALSE POSITIVE / SUSPICIOUS


This mode runs a direct pattern scan against the event data.
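That scan can be pictured as refanging each IOC and counting case-insensitive matches across events. A sketch only; the real tool also tracks timestamps and associated processes:

```python
def refang(ioc: str) -> str:
    """Normalise defanged indicators: hxxp -> http, [.] -> . """
    return ioc.replace("hxxp", "http").replace("[.]", ".")

def hunt(events: list[dict], iocs: list[str]) -> dict[str, int]:
    """Count how many events mention each IOC (case-insensitive substring)."""
    counts = {}
    for ioc in iocs:
        needle = refang(ioc).lower()
        counts[ioc] = sum(
            needle in " ".join(str(v) for v in event.values()).lower()
            for event in events
        )
    return counts
```

Note that hxxps also normalises correctly, since replacing hxxp with http turns it into https.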

-------------------------------------------------------------------------------------------------------------


3. Follow-up Chat

After an investigation runs — in any mode — there is a chat panel directly below the results. You can keep asking questions about the same incident without re-uploading anything.

The way this works is important. The context from the investigation — the anonymised brief, the original report, the IOC list if you ran an IOC hunt — is carried forward into the conversation.


What You Can Ask

The follow-up chat is not a general assistant. It is grounded in the incident data from your upload. Things it handles well:

  • Expanding on something in the report — "tell me more about that bypass"

  • Asking about specific techniques — "what does xattr -c actually do and why does it matter?"

  • Asking for a query based on findings — "give me an S1QL query to hunt for this on other endpoints"

  • Asking about a specific IOC — "what is the typical behaviour associated with this process name?"

  • Asking about next steps — "what else should I pull to confirm lateral movement?"

  • Asking for a summary in a specific format — "summarise this for a non-technical stakeholder"



Session Behaviour

  • The conversation context resets if you upload a new file or refresh the page

  • There is no history saved from the chat — if you need to keep something, copy it before navigating away


-------------------------------------------------------------------------------------------------------------


4. Sigma Rule Library


Sigma is the universal detection rule format for the security community. Rules written in Sigma describe attacker behaviour in a way that can theoretically be converted to any SIEM or EDR query language. There are thousands of community-contributed Sigma rules covering almost every known attack technique.


The problem for SentinelOne users is that Sigma rules need to be converted to S1QL before you can use them in Deep Visibility. The field names are different. The operators are different. Sigma uses logsource categories that need to be mapped to SentinelOne event types. Getting it right manually takes time and good knowledge of both formats.

The Sigma Rule Library solves this in two ways.



Part 1 — Pre-Converted Community Rules

The library contains Sigma rules from the upstream community repository that have already been converted to S1QL and verified to work. You browse, filter, find what you need, and copy the query directly.


Filter options:

  • Platform — Windows, Linux, macOS, or all

  • Severity — Critical, High, Medium, Low

  • Status — Verified, Unverified, Failed (so you know which ones have been tested)

  • MITRE ATT&CK tactic and technique

  • Free text search across rule titles and descriptions


Each rule card shows the Sigma rule title, the mapped MITRE technique, the severity level, the platform, and the converted S1QL query. You can expand the card to see the full query and copy it with one click.

The library syncs from the upstream Sigma repository automatically, so it stays current as new community rules are released.

Part 2 — Custom Sigma Converter

This is the part I find most useful day to day.

You paste any Sigma YAML — a rule from GitHub, a rule from a threat report, a rule a colleague sent you, or one you wrote yourself — and the tool converts it to S1QL immediately.


What the converter handles:

  • Field name mapping: Sigma field names such as Image translated to their SentinelOne equivalents

  • Logsource translation — mapped to S1 event types

  • Operator conversion — contains, startswith, endswith, re translated to S1QL equivalents

  • Detection logic — all, any, not conditions preserved in the S1QL output

  • Validation — tells you if any part of the rule cannot be translated and why
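To make the field and operator mapping concrete, here is a toy converter for a single Sigma selection block. The FIELD_MAP and OP_MAP entries (including the TgtProcImagePath and TgtProcCmdLine names) are assumptions for illustration; the tool's real tables are far larger:

```python
# Assumed field/operator tables -- the tool's real mappings are more complete.
FIELD_MAP = {"Image": "TgtProcImagePath", "CommandLine": "TgtProcCmdLine"}
OP_MAP = {"contains": "contains", "startswith": "startswith", "endswith": "endswith"}

def convert_selection(selection: dict) -> str:
    """Convert one Sigma selection block to an S1QL-style clause.
    Sigma keys look like 'Image|endswith'; values may be a string or a list."""
    clauses = []
    for key, value in selection.items():
        field, _, op = key.partition("|")
        if field not in FIELD_MAP:
            raise ValueError(f"cannot translate field: {field}")
        if op and op not in OP_MAP:
            raise ValueError(f"cannot translate operator: {op}")
        s1_op = OP_MAP.get(op, "=")   # bare field means exact match
        values = value if isinstance(value, list) else [value]
        ors = " OR ".join(f'{FIELD_MAP[field]} {s1_op} "{v}"' for v in values)
        clauses.append(f"({ors})" if len(values) > 1 else ors)
    return " AND ".join(clauses)
```

The raise branches mirror the validation behaviour: an untranslatable field or operator is reported instead of silently producing a wrong query.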


Why this matters:  Every time a new threat report drops with a Sigma detection rule attached, you no longer need to manually work out the S1QL translation. Paste it in, get the query, start hunting. The field mapping knowledge that used to live in your head is now handled automatically.


-------------------------------------------------------------------------------------------------------------

5. Query Feedback System


In the original launch article I mentioned a feedback system was coming. It is live now.

The problem it solves is straightforward. The natural language query generator is good, but it is not perfect.

Sometimes a query comes back with a wrong field name. Sometimes the operator is right but the filter logic is off. Sometimes the query works technically but misses what the analyst actually needed.


Before the feedback system, those issues would disappear. I would edit the query manually and move on. Nobody would know the generator got it wrong, and the same mistake would happen again for the next person who asked a similar question.


How It Works — Analyst Side

Every generated query now has a flag button next to it. If the query does not work — or does not do what you expected — you click it.

You are asked two things:

  • What was wrong with the query?

  • What did you actually need?


That feedback, along with the original input and the generated query, gets submitted and shows up in the admin review panel. You do not need to do anything else.


How It Works — Admin Side

The admin panel has a dedicated Feedback section that shows every submission. For each one you can see:

  • The original natural language input the analyst typed

  • The query the tool generated

  • The feedback text explaining what was wrong

  • Status — Pending, Reviewed, or Dismissed

  • Submission timestamp


Why This Matters Long-Term

The quality of the query generator improves over time because the failure signals are visible. Without feedback, you are flying blind — you know the tool is not perfect but you do not know where or why. With the feedback system, you see exactly which inputs produce bad outputs and you can fix the prompt or the field schema specifically for those cases.


----------------------------------------------------Dean----------------------------------------------------


I'm still working on a fully offline, self-hosted version — something you can spin up yourself on your own machine. No cloud, no dependencies. It's not ready yet but I'm heads down on it — watch this space.





© 2023 by Cyberengage. All rights reserved.
