Let’s Go Practical: Working with NetFlow Using nfdump Tools

Enough theory.
Now let’s actually touch NetFlow data.
If you’re doing DFIR, threat hunting, or even basic network investigations, one toolkit you must be comfortable with is the nfdump suite.
This suite gives you three extremely important tools:
nfcapd – the collector
nfpcapd – the pcap-to-NetFlow converter
nfdump – the analysis engine
-----------------------------------------------------------------------------------------------------------
nfcapd: The NetFlow Collector (Where Everything Starts)
nfcapd is a daemon, not a one-time command.
Its job is simple:
listen on a UDP port
receive NetFlow data from trusted exporters (routers, firewalls, switches)
write that data to disk in a compact binary format
It supports:
NetFlow v5, v7, v9
IPFIX
sFlow
So regardless of vendor or flow standard, nfcapd usually has you covered.
How Much Storage Do You Actually Need?
This is one of the first questions everyone asks.
A rough rule of thumb:
~1 MB of NetFlow data for every 2 GB of network traffic
Is this perfect? No.
Is it useful for planning? Yes.
Your actual numbers will depend on:
number of flows
traffic patterns
sampling
exporter behavior
But it’s a good starting point when designing storage.
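To put the rule of thumb to work, here is a back-of-the-envelope sketch in Python (the 1 MB per 2 GB ratio is the approximation above, not a guarantee, and the traffic figures are invented):

```python
def estimate_flow_storage_mb(daily_traffic_gb: float, days: int = 30) -> float:
    """Rough NetFlow storage estimate: ~1 MB of flow data per 2 GB of traffic."""
    mb_per_gb = 1 / 2  # the rule of thumb from above
    return daily_traffic_gb * mb_per_gb * days

# e.g. a link moving 500 GB/day, flows retained for 90 days:
print(f"{estimate_flow_storage_mb(500, days=90):,.0f} MB")  # → 22,500 MB
```

Adjust the ratio once you have a week of real files on disk; sampling and exporter behavior can move it significantly in either direction.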
How nfcapd Stores Data (And Why It Matters)
When nfcapd writes flow data, it uses a very clean naming scheme:
nfcapd.YYYYMMDDhhmm
Example:
nfcapd.201302262305
Why this matters:
files sort naturally by time
no database needed
easy scripting
easy forensic timelines
By default, nfcapd rotates files every 5 minutes.
That means:
288 files per exporter per day
predictable storage growth
easy time slicing during investigations
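Because the timestamp is embedded in the filename, you can recover a file's capture window without ever opening it. A small Python sketch of that parsing (filename format as described above):

```python
from datetime import datetime, timedelta

def capture_window(filename: str, rotation_minutes: int = 5):
    """Parse an nfcapd.YYYYMMDDhhmm filename into its (start, end) window."""
    stamp = filename.rsplit(".", 1)[1]            # "201302262305"
    start = datetime.strptime(stamp, "%Y%m%d%H%M")
    return start, start + timedelta(minutes=rotation_minutes)

start, end = capture_window("nfcapd.201302262305")
print(start, "->", end)  # 2013-02-26 23:05:00 -> 2013-02-26 23:10:00

# 5-minute rotation really does mean 288 files per exporter per day:
assert 24 * 60 // 5 == 288
```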
-----------------------------------------------------------------------------------------------------------
Bonus Feature: Flow Forwarding (-R Option)
One very underrated feature of nfcapd is flow forwarding.
You can collect NetFlow and forward it to another collector at the same time.
Example scenario:
local collection for DFIR
central collection for SOC visibility
Example command:
nfcapd -p 1025 -w -D -R 10.0.0.1/1025 \
-n router,10.0.0.2,/var/local/flows/routerlogs
Command breakdown:
nfcapd - NetFlow capture daemon
-p 1025 - Listen on port 1025 for incoming NetFlow packets
-w - Align file rotation to the next interval (e.g., start at the top of the hour)
-D - Run as a daemon (background process)
-R 10.0.0.1/1025 - Act as a repeater/forwarder: send received flows to IP 10.0.0.1 on port 1025
-n router,10.0.0.2,/var/local/flows/routerlogs - Define an identification string:
router - Identifier name for this source
10.0.0.2 - Expected source IP address
/var/local/flows/routerlogs - Directory where flow files will be stored
In summary:
This command starts a NetFlow collector that listens on port 1025, stores flow data from the router at 10.0.0.2 in /var/local/flows/routerlogs, and simultaneously forwards the flows to another collector at 10.0.0.1 on port 1025. It runs in the background as a daemon.
This is extremely useful in larger environments.
nfpcapd: Turning PCAPs into NetFlow
Now this is where DFIR people should pay attention.
nfpcapd lets you:
take a pcap file and convert it into NetFlow-style records
Why does this matter?
Because parsing large pcaps is:
slow
CPU-heavy
painful at scale
NetFlow-based analysis is orders of magnitude faster.
So the smart workflow is:
Convert pcap → NetFlow
Hunt quickly using NetFlow
Go back to full pcap only where needed
Example:
nfpcapd -r bigFlows.pcap -l /mnt/c/Users/Akash/Downloads/
This step alone can save hours or days in an investigation.
-----------------------------------------------------------------------------------------------------------
nfdump: Where the Real Analysis Happens
Once flows are collected (or converted), this is where we start asking questions.
nfdump is a command-line NetFlow analysis tool.
It:
reads nfcapd binary files
applies filters
summarizes results
responds very fast — even on huge datasets
Important point:
nfdump does not magically find “bad traffic”
Its power comes from:
how you ask questions
how you refine hypotheses
how you chain queries together
This is investigative work, not alert-driven work.
-----------------------------------------------------------------------------------------------------------
Reading NetFlow Data with nfdump
You can read:
a single file
or an entire directory tree
Reading a Single File
nfdump -r /mnt/c/Users/Akash/Downloads/nfcapd.201302262305
This reads:
flows from one exporter
for a specific 5-minute window
Perfect for targeted investigations.
Reading a Directory (Much More Common)
nfdump -R /mnt/c/Users/Akash/Downloads/test/
This tells nfdump:
recursively walk the directory
read all NetFlow files inside

This is how you analyze:
days
weeks
months of traffic
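Under the hood, `-R` simply walks the directory tree, and the timestamped naming scheme means the files sort chronologically by name. When you need custom slicing in a script, you can reproduce that walk yourself (a sketch; the directory layout is whatever your collector writes):

```python
import os

def flow_files(root: str):
    """Recursively collect nfcapd files, sorted chronologically by name."""
    found = []
    for dirpath, _dirs, files in os.walk(root):
        found += [os.path.join(dirpath, f) for f in files
                  if f.startswith("nfcapd.")]
    # timestamped names sort naturally by time
    return sorted(found, key=os.path.basename)
```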
-----------------------------------------------------------------------------------------------------------
Building Real Investigative Queries
Let’s look at a realistic example.
Goal: Find internal systems that accessed internet web servers without using the corporate proxy.
Conditions:
traffic passed through internet-facing router
destination ports 80 or 443
exclude proxy IP 172.0.1.1
specific 24-hour window
show only top 10 systems
Command:
nfdump -R /mnt/c/Users/Akash/Downloads/test/ \
-t '2026/01/12.12:00:00-2026/01/13.12:00:00' \
-c 10 'proto tcp and (dst port 80 or dst port 443) and not src host 172.0.1.1'
This is classic NetFlow hunting:
scoped
fast
hypothesis-driven
From here, you pivot:
which hosts?
how often?
how much data?
where did they connect?
-----------------------------------------------------------------------------------------------------------
1. line Output (Default, Lightweight)
This is the default view and the one you’ll see most often when you’re doing quick scoping.
It shows:
start and end time
source and destination IPs
ports
protocol
bytes and packets
Example:
nfdump -R /mnt/c/Users/Akash/Downloads/test -o line host 172.16.128.169
This is perfect when you’re asking:
“Is this IP even talking on my network?”
Fast. Minimal. No noise.
-----------------------------------------------------------------------------------------------------------
2. long Output (Adds TCP Flags)
The long format builds on line and adds:
TCP flags
Type of Service (ToS)
Example:
nfdump -R /mnt/c/Users/Akash/Downloads/test -o long 'proto tcp and port 445'

Why this matters:
TCP flags tell a story
SYN-only traffic looks very different from established sessions
RST storms, half-open connections, or scanning behavior start to stand out
Important reminder: Each line is unidirectional.
A normal bidirectional conversation:
client → server
server → client
…will always appear as two separate flow records.
This trips people up early on.
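Since each record covers one direction, matching up a conversation means joining a flow with its reverse. A minimal Python sketch of that pairing (the record fields and IPs are illustrative, not nfdump's internal format):

```python
def conversation_key(flow):
    """Direction-agnostic key: both halves of a session map to the same tuple."""
    a = (flow["src"], flow["sport"])
    b = (flow["dst"], flow["dport"])
    return (flow["proto"],) + tuple(sorted([a, b]))

req = {"proto": "tcp", "src": "10.0.0.5", "sport": 49152,
       "dst": "172.16.128.169", "dport": 445}
rsp = {"proto": "tcp", "src": "172.16.128.169", "sport": 445,
       "dst": "10.0.0.5", "dport": 49152}
assert conversation_key(req) == conversation_key(rsp)  # two records, one session
```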
-----------------------------------------------------------------------------------------------------------
3. extended Output (Adds Statistics)
This is where things get interesting.
The extended format adds derived values, calculated at query time:
packets per second
bits per second
bytes per packet
Example:
nfdump -R /mnt/c/Users/Akash/Downloads/test -o extended 'proto tcp and port 445'

These values help you distinguish:
interactive shells (low & slow)
file transfers (fast ramp-up, steady throughput)
dormant C2 channels (tiny but persistent)
None of this data is stored explicitly — it’s derived — but it’s incredibly useful for behavioral analysis.
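The derived values are simple ratios over the flow's duration. A sketch of the arithmetic performed at query time (field names are illustrative):

```python
def derived_stats(bytes_, packets, duration_s):
    """Per-flow rates computed at query time, not stored on disk."""
    dur = max(duration_s, 1e-9)            # guard against zero-duration flows
    return {
        "pps": packets / dur,              # packets per second
        "bps": bytes_ * 8 / dur,           # bits per second
        "bpp": bytes_ / max(packets, 1),   # bytes per packet
    }

# a 10-second flow: 1,500,000 bytes in 1,000 packets
s = derived_stats(1_500_000, 1_000, 10)
print(s)  # {'pps': 100.0, 'bps': 1200000.0, 'bpp': 1500.0}
```

Low bpp with a trickle of pps over hours reads very differently from high bps over seconds, which is exactly the interactive-shell vs file-transfer distinction above.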
-----------------------------------------------------------------------------------------------------------
IPv6 Note (Important but Often Missed)
nfdump fully supports IPv6, but truncates addresses by default for readability.

If you want full IPv6 visibility, use:
line6
long6
extended6
Same formats — just IPv6-aware.
-----------------------------------------------------------------------------------------------------------
Practical Hunt: Finding Patient Zero Using NetFlow
Now let’s do real hunting, not theory.
Goal:
Identify internal hosts communicating with a known C2 endpoint: in this example, 172.16.128.169 over TCP port 8014.
Step 1: First Hits of the Day
Start with a known NetFlow file
Ask:
“Who talked to this IP first today?”
nfdump -R /mnt/c/Users/Akash/Downloads/test -O tstart -c 5 'proto tcp and dst port 8014 and host 172.16.128.169'

We see the first hit of the day.
That's early, but maybe not early enough.
Step 2: Expand the Time Window (Overnight)
If the first hit isn’t at the beginning of the capture window, that’s a signal.
So we expand to an overnight window:
nfdump -R /mnt/c/Users/Akash/Downloads/test -t '2013/02/26.23:00:00-2013/02/26.23:10:00' -O tstart -c 1 'proto tcp and dst port 8014 and host 172.16.128.169'
Step 3: What Else Did Patient Zero Do?
Now we pivot.
Same time window, but focus on the internal host itself:
nfdump -R /mnt/c/Users/Akash/Downloads/test -t '2013/02/26.23:00:00-2013/02/26.23:10:00' -O tstart 'host 172.16.128.169'
This answers:
what happened before C2?
was there a download?
was there lateral movement?
did anything precede the C2 traffic?
Step 4: Infrastructure Expansion Using ASN Analysis
Next, find out which ASN owns the C2 address 172.16.128.169.
Using WHOIS:
whois 172.16.128.169 | grep AS
Step 5: Hunt the “Internet Neighborhood”
If the attacker uses one provider, they may use more infrastructure in the same ASN.
So we ask:
“Who talked to this ASN all month?”
nfdump -q -R /mnt/c/Users/Akash/Downloads/test -o 'fmt:$sa $da' 'dst as 36351' | sort | uniq
What this gives you:
$sa → source IP (internal)
$da → destination IP (external)
Deduplicated list of unique communications
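The `sort | uniq` stage is pure deduplication. If you prefer to post-process the formatted output in a script, the same result looks like this (a sketch; the sample lines are invented):

```python
from collections import defaultdict

# sample '$sa $da' lines as the fmt output above would print them (invented)
lines = [
    "10.0.0.5 203.0.113.10",
    "10.0.0.5 203.0.113.10",      # duplicate flow records collapse to one pair
    "10.0.0.9 203.0.113.44",
]

talkers = defaultdict(set)        # external IP -> set of internal sources
for line in lines:
    sa, da = line.split()
    talkers[da].add(sa)

for dst, srcs in sorted(talkers.items()):
    print(dst, "<-", ", ".join(sorted(srcs)))
```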
Viewing Minimal Samples for Orientation
Sometimes you just want a quick sanity check:
nfdump -R /mnt/c/Users/Akash/Downloads/test -t '2013/02/26.23:00:00-2013/02/26.23:10:00' -O tstart -c 1
Or inspecting a single file:
nfdump -r /mnt/c/Users/Akash/Downloads/test/nfcapd.201302262305 -O tstart -c 5
These commands are underrated — they help you:
Validate time ranges
Confirm exporter behavior
Avoid wrong assumptions early
------------------------------------------------------------------------------------------------------------
Why Aggregation Changes Everything
Because flows are split across files, one real-world connection may appear as many records.
By default, nfdump aggregates using five key values:
Source IP
Destination IP
Protocol
Source Port
Destination Port
Flows sharing these values are merged into a single logical event.
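Conceptually, the default aggregation works like this (a Python sketch with invented records; real nfdump does this directly on its binary files):

```python
from collections import defaultdict

def aggregate(flows):
    """Merge flow records sharing the default 5-tuple key."""
    merged = defaultdict(lambda: {"bytes": 0, "packets": 0, "flows": 0})
    for f in flows:
        key = (f["src"], f["dst"], f["proto"], f["sport"], f["dport"])
        merged[key]["bytes"] += f["bytes"]
        merged[key]["packets"] += f["packets"]
        merged[key]["flows"] += 1
    return merged

# the same session split across two 5-minute files (invented values)
flows = [
    {"src": "10.0.0.5", "dst": "172.16.128.169", "proto": 6,
     "sport": 49152, "dport": 8014, "bytes": 400, "packets": 4},
    {"src": "10.0.0.5", "dst": "172.16.128.169", "proto": 6,
     "sport": 49152, "dport": 8014, "bytes": 600, "packets": 6},
]
result = aggregate(flows)
print(len(result))  # 1 — two records, one logical event
```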
Detecting Port Scanning with Custom Aggregation
Port scanners behave differently:
Source port changes constantly
Target port stays fixed
nfdump -q -R /mnt/c/Users/Akash/Downloads/test -O bytes -A srcip,proto,dstport -o 'fmt: $sa -> $pr $dp $byt $fl'
This answers:
Who is consuming the most bandwidth
Which protocol and port
How many flows were involved
Great for:
Data exfiltration hunting
Rogue services
Abnormal internal behavior
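Dropping the source port from the aggregation key is what makes this work: a scanner's thousands of ephemeral source ports collapse into a handful of records with very high flow counts. A Python sketch of the idea (records are invented):

```python
from collections import Counter

def flows_per_target_port(flows):
    """Count flow records per (src, proto, dstport): a scanner stands out."""
    return Counter((f["src"], f["proto"], f["dport"]) for f in flows)

# a host probing port 445 from 500 different ephemeral source ports (invented)
flows = [{"src": "10.0.0.66", "proto": 6, "dport": 445, "sport": 40000 + i}
         for i in range(500)]
counts = flows_per_target_port(flows)
print(counts.most_common(1))  # [(('10.0.0.66', 6, 445), 500)]
```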
Using “TopN” Statistics for Threat Hunting
Most engineers use TopN for bandwidth.
Investigators use it differently.
Syntax
-s statistic[:p][/orderby]
Example
nfdump -R /mnt/c/Users/Akash/Downloads/test/ -s ip/bytes -s dstport:p/bytes -n 5

Why this matters
Identify staging systems (high outbound bytes)
Detect scanners (high flow counts)
Separate TCP vs UDP behavior with :p
TopN becomes powerful only when driven by intelligence, not curiosity.
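The TopN logic itself is just "sum a metric per key, sort, take N". A simplified Python sketch of the `-s ip/bytes` idea (note: nfdump's `ip` statistic counts both source and destination addresses; this sketch sums source IPs only, over invented records):

```python
from collections import Counter

def top_n_by_bytes(flows, n=5):
    """Sum bytes per source IP and return the heaviest talkers."""
    totals = Counter()
    for f in flows:
        totals[f["src"]] += f["bytes"]
    return totals.most_common(n)

flows = [
    {"src": "10.0.0.5", "bytes": 9_000_000},
    {"src": "10.0.0.9", "bytes": 1_200},
    {"src": "10.0.0.5", "bytes": 4_000_000},
]
print(top_n_by_bytes(flows, n=2))
# [('10.0.0.5', 13000000), ('10.0.0.9', 1200)]
```

A staging host about to exfiltrate looks exactly like the top line here: one internal source dwarfing everything else in outbound bytes.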
------------------------------------------------------------------------------------------------------------
Final Thoughts
nfdump isn’t flashy. It doesn’t decrypt payloads. It doesn’t show malware strings.
But when used correctly, it tells you:
Who talked
For how long
How often
And how much data moved
In real investigations, that context is often enough to confirm compromise, scope incidents, and prioritize response.
----------------------------------------------Dean-------------------------------------------------------

