
Let’s Go Practical: Working with NetFlow Using nfdump Tools


Enough theory.

Now let’s actually touch NetFlow data.

If you’re doing DFIR, threat hunting, or even basic network investigations, one toolkit you must be comfortable with is the nfdump suite.


This suite gives you three extremely important tools:
  • nfcapd – the collector

  • nfpcapd – the pcap-to-NetFlow converter

  • nfdump – the analysis engine


-----------------------------------------------------------------------------------------------------------

nfcapd: The NetFlow Collector (Where Everything Starts)

nfcapd is a daemon, not a one-time command.

Its job is simple:

  • listen on a UDP port

  • receive NetFlow data from trusted exporters (routers, firewalls, switches)

  • write that data to disk in a compact binary format


It supports:
  • NetFlow v5, v7, v9

  • IPFIX

  • sFlow

So regardless of vendor or flow standard, nfcapd usually has you covered.


How Much Storage Do You Actually Need?

This is one of the first questions everyone asks.

A rough rule of thumb:

~1 MB of NetFlow data for every 2 GB of network traffic

Is this perfect? No.

Is it useful for planning? Yes.


Your actual numbers will depend on:

  • number of flows

  • traffic patterns

  • sampling

  • exporter behavior

But it’s a good starting point when designing storage.
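
As a quick sanity check, the rule of thumb can be turned into a back-of-the-envelope sizing script. The traffic volume and retention period below are made-up planning inputs, not figures from this article:

```shell
# Rough sizing sketch based on the ~1 MB of NetFlow per 2 GB of traffic rule.
# daily_traffic_gb and retention_days are hypothetical planning inputs.
daily_traffic_gb=500      # traffic crossing the monitored link per day
retention_days=90         # how long flow files are kept

daily_flow_mb=$(( daily_traffic_gb / 2 ))           # 1 MB per 2 GB
total_flow_mb=$(( daily_flow_mb * retention_days ))

echo "~${daily_flow_mb} MB of flow data per day"
echo "~${total_flow_mb} MB over ${retention_days} days"
```

Swap in your own traffic numbers; the point is that NetFlow retention is cheap enough to keep for months.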


How nfcapd Stores Data (And Why It Matters)

When nfcapd writes flow data, it uses a very clean naming scheme:

nfcapd.YYYYMMDDhhmm

Example:

nfcapd.201302262305

Why this matters:

  • files sort naturally by time

  • no database needed

  • easy scripting

  • easy forensic timelines

By default, nfcapd rotates files every 5 minutes.

That means:

  • 288 files per exporter per day

  • predictable storage growth

  • easy time slicing during investigations
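
Both conventions are easy to verify yourself. This sketch (assuming GNU date, as on most Linux systems) reconstructs the file name for the timestamp used above and checks the rotation math:

```shell
# Reconstruct the nfcapd file name for the article's example timestamp.
# Requires GNU date (-d); BSD/macOS date uses different flags.
ts="2013-02-26 23:05"
fname="nfcapd.$(date -u -d "$ts UTC" +%Y%m%d%H%M)"
echo "$fname"                     # nfcapd.201302262305

# 24 hours of 5-minute rotations = files per exporter per day
files_per_day=$(( 24 * 60 / 5 ))
echo "$files_per_day files per exporter per day"
```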


-----------------------------------------------------------------------------------------------------------

Bonus Feature: Flow Forwarding (-R Option)

One very underrated feature of nfcapd is flow forwarding.

You can collect NetFlow and forward it to another collector at the same time.


Example scenario:

  • local collection for DFIR

  • central collection for SOC visibility

Example command:

nfcapd -p 1025 -w -D -R 10.0.0.1/1025 \
-n router,10.0.0.2,/var/local/flows/routerlogs

Command breakdown:

  • nfcapd - NetFlow capture daemon

  • -p 1025 - Listen on port 1025 for incoming NetFlow packets

  • -w - Sync file rotation to the next 5-minute interval boundary

  • -D - Run as a daemon (background process)

  • -R 10.0.0.1/1025 - Act as a repeater/forwarder: send received flows to IP 10.0.0.1 on port 1025

  • -n router,10.0.0.2,/var/local/flows/routerlogs - Define an identification string:

    • router - Identifier name for this source

    • 10.0.0.2 - Expected source IP address

    • /var/local/flows/routerlogs - Directory where flow files will be stored


In summary:

This command starts a NetFlow collector that listens on port 1025, stores flow data from router at 10.0.0.2 into /var/local/flows/routerlogs, and simultaneously forwards the data to another collector at 10.0.0.1/1025. It runs in the background as a daemon.


This is extremely useful in larger environments.


nfpcapd: Turning PCAPs into NetFlow

Now this is where DFIR people should pay attention.

nfpcapd lets you:

take a pcap file and convert it into NetFlow-style records

Why does this matter?

Because parsing large pcaps is:

  • slow

  • CPU-heavy

  • painful at scale


NetFlow-based analysis is orders of magnitude faster.

So the smart workflow is:
  1. Convert pcap → NetFlow

  2. Hunt quickly using NetFlow

  3. Go back to full pcap only where needed



Example:

nfpcapd -r bigFlows.pcap -l /mnt/c/Users/Akash/Downloads/

This step alone can save hours or days in an investigation.

-----------------------------------------------------------------------------------------------------------

nfdump: Where the Real Analysis Happens

Once flows are collected (or converted), this is where we start asking questions.

nfdump is a command-line NetFlow analysis tool.

It:

  • reads nfcapd binary files

  • applies filters

  • summarizes results

  • responds very fast — even on huge datasets


Important point:

nfdump does not magically find “bad traffic”

Its power comes from:

  • how you ask questions

  • how you refine hypotheses

  • how you chain queries together

This is investigative work, not alert-driven work.


-----------------------------------------------------------------------------------------------------------

Reading NetFlow Data with nfdump

You can read:

  • a single file

  • or an entire directory tree

Reading a Single File

nfdump -r /mnt/c/Users/Akash/Downloads/nfcapd.201302262305

This reads:

  • flows from one exporter

  • for a specific 5-minute window

Perfect for targeted investigations.



Reading a Directory (Much More Common)

nfdump -R /mnt/c/Users/Akash/Downloads/test/

This tells nfdump:

  • recursively walk the directory

  • read all NetFlow files inside

This is how you analyze:

  • days

  • weeks

  • months of traffic



-----------------------------------------------------------------------------------------------------------

Building Real Investigative Queries

Let’s look at a realistic example.

Goal: Find internal systems that accessed internet web servers without using the corporate proxy.

Conditions:

  • traffic passed through internet-facing router

  • destination ports 80 or 443

  • exclude proxy IP 172.0.1.1

  • specific 24-hour window

  • limit output to the first 10 matching flow records


Command:

nfdump -R /mnt/c/Users/Akash/Downloads/test/ \
-t '2026/01/12.12:00:00-2026/01/13.12:00:00' \
-c 10 'proto tcp and (dst port 80 or dst port 443) and not src host 172.0.1.1'

This is classic NetFlow hunting:

  • scoped

  • fast

  • hypothesis-driven

From here, you pivot:
  • which hosts?

  • how often?

  • how much data?

  • where did they connect?



-----------------------------------------------------------------------------------------------------------


1. line Output (Default, Lightweight)

This is the default view and the one you’ll see most often when you’re doing quick scoping.


It shows:

  • start and end time

  • source and destination IPs

  • ports

  • protocol

  • bytes and packets

Example:

nfdump -R /mnt/c/Users/Akash/Downloads/test -o line host 172.16.128.169

This is perfect when you’re asking:

“Is this IP even talking on my network?”

Fast. Minimal. No noise.


-----------------------------------------------------------------------------------------------------------

2. long Output (Adds TCP Flags)

The long format builds on line and adds:

  • TCP flags

  • Type of Service (ToS)

Example:

nfdump -R /mnt/c/Users/Akash/Downloads/test -o long 'proto tcp and port 445'

Why this matters:

  • TCP flags tell a story

  • SYN-only traffic looks very different from established sessions

  • RST storms, half-open connections, or scanning behavior start to stand out


Important reminder: Each line is unidirectional.

A normal bidirectional conversation:

  • client → server

  • server → client

…will always appear as two separate flow records.

This trips people up early on.
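
A toy illustration of the point (the records below are fabricated, not real nfdump output): normalizing the endpoints of each record shows that one TCP conversation always yields exactly two unidirectional lines.

```shell
# Two made-up unidirectional flow records for a single TCP session.
cat > /tmp/flows.txt <<'EOF'
172.16.1.10:51514 -> 10.0.0.5:443 tcp
10.0.0.5:443 -> 172.16.1.10:51514 tcp
EOF

# Sort each record's endpoints so both directions map to the same key,
# then count records per conversation — each session counts 2.
awk '{ key = ($1 < $3) ? $1 "|" $3 : $3 "|" $1; count[key]++ }
     END { for (k in count) print k, count[k] }' /tmp/flows.txt
```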


-----------------------------------------------------------------------------------------------------------

3. extended Output (Adds Statistics)

This is where things get interesting.

The extended format adds derived values, calculated at query time:

  • packets per second

  • bits per second

  • bytes per packet

Example:

nfdump -R /mnt/c/Users/Akash/Downloads/test -o extended 'proto tcp and port 445'

These values help you distinguish:

  • interactive shells (low & slow)

  • file transfers (fast ramp-up, steady throughput)

  • dormant C2 channels (tiny but persistent)


None of this data is stored explicitly — it’s derived — but it’s incredibly useful for behavioral analysis.

-----------------------------------------------------------------------------------------------------------

IPv6 Note (Important but Often Missed)

nfdump fully supports IPv6, but truncates addresses by default for readability.

If you want full IPv6 visibility, use:

  • line6

  • long6

  • extended6

Same formats — just IPv6-aware.


-----------------------------------------------------------------------------------------------------------

Practical Hunt: Finding Patient Zero Using NetFlow

Now let’s do real hunting, not theory.


Goal:
Identify internal hosts communicating with a suspected C2 channel (TCP port 8014).

Step 1: First Hits of the Day

Start with a known NetFlow file

Ask:

“Who talked to this IP first today?”

nfdump -R /mnt/c/Users/Akash/Downloads/test -O tstart -c 5 'proto tcp and dst port 8014 and host 172.16.128.169'

We see the first hit of the day.

That’s early — but maybe not early enough.

Step 2: Expand the Time Window (Overnight)

If the first hit isn’t at the beginning of the capture window, that’s a signal.

So we expand: (overnight window)

nfdump -R /mnt/c/Users/Akash/Downloads/test -t '2013/02/26.23:00:00-2013/02/26.23:10:00' -O tstart -c 1 'proto tcp and dst port 8014 and host 172.16.128.169'

Step 3: What Else Did Patient Zero Do?

Now we pivot.

Same time window, but focus on the internal host itself:

nfdump -R /mnt/c/Users/Akash/Downloads/test -t '2013/02/26.23:00:00-2013/02/26.23:10:00' -O tstart 'host 172.16.128.169'

This answers:

  • what happened before C2?

  • was there a download?

  • was there lateral movement?

  • did anything precede the UDP traffic?



Step 4: Infrastructure Expansion Using ASN Analysis

Take the address of interest:

  • 172.16.128.169

Then look up which ASN announces it using WHOIS:

whois 172.16.128.169 | grep AS

Step 5: Hunt the “Internet Neighborhood”

If the attacker uses one provider, they may use more infrastructure in the same ASN.

So we ask:

“Who talked to this ASN all month?”

nfdump -q -R /mnt/c/Users/Akash/Downloads/test -o 'fmt:$sa $da' 'dst as 36351' | sort | uniq

What this gives you

  • $sa → source IP (internal)

  • $da → destination IP (external)

  • Deduplicated list of unique communications



Viewing Minimal Samples for Orientation

Sometimes you just want a quick sanity check:

nfdump -R /mnt/c/Users/Akash/Downloads/test -t '2013/02/26.23:00:00-2013/02/26.23:10:00' -O tstart -c 1

Or inspecting a single file:

nfdump -r /mnt/c/Users/Akash/Downloads/test/nfcapd.201302262305 -O tstart -c 5

These commands are underrated — they help you:

  • Validate time ranges

  • Confirm exporter behavior

  • Avoid wrong assumptions early


------------------------------------------------------------------------------------------------------------

Why Aggregation Changes Everything

Because flows are split across files, one real-world connection may appear as many records.

By default, nfdump aggregates using five key values:

  • Source IP

  • Destination IP

  • Protocol

  • Source Port

  • Destination Port

Flows sharing these values are merged into a single logical event.
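
A miniature sketch of what that merge does (fabricated records; only bytes are summed here, while real nfdump also merges packets, flags, and timing):

```shell
# Made-up flow records: srcip dstip proto srcport dstport bytes
cat > /tmp/records.txt <<'EOF'
172.16.1.10 10.0.0.5 tcp 51514 443 1200
172.16.1.10 10.0.0.5 tcp 51514 443 800
172.16.1.10 10.0.0.9 tcp 51515 443 500
EOF

# Records sharing the five key values collapse into one logical event,
# with their byte counts summed.
awk '{ key = $1 " " $2 " " $3 " " $4 " " $5; bytes[key] += $6 }
     END { for (k in bytes) print k, bytes[k] }' /tmp/records.txt | sort
```

The first two records merge into one 2000-byte event; the third stays separate because its destination and source port differ.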



Detecting Port Scanning with Custom Aggregation

Port scanners behave differently:

  • Source port changes constantly

  • Target port stays fixed

Aggregating on source IP, protocol, and destination port collapses all those rotating source ports, so a scanner stands out as a single record with an unusually high flow count:

nfdump -q -R /mnt/c/Users/Akash/Downloads/test -O bytes -A srcip,proto,dstport -o 'fmt: $sa -> $pr $dp $byt $fl'

This answers:

  • Who is consuming the most bandwidth

  • Which protocol and port

  • How many flows were involved

Great for:

  • Data exfiltration hunting

  • Rogue services

  • Abnormal internal behavior



Using “TopN” Statistics for Threat Hunting

Most engineers use TopN for bandwidth.

Investigators use it differently.

Syntax

-s statistic[:p][/orderby]

Example

nfdump -R /mnt/c/Users/Akash/Downloads/test/ -s ip/bytes -s dstport:p/bytes -n 5

Why this matters

  • Identify staging systems (high outbound bytes)

  • Detect scanners (high flow counts)

  • Separate TCP vs UDP behavior with :p

TopN becomes powerful only when driven by intelligence, not curiosity.


------------------------------------------------------------------------------------------------------------

Final Thoughts

nfdump isn’t flashy. It doesn’t decrypt payloads. It doesn’t show malware strings.

But when used correctly, it tells you:

  • Who talked

  • For how long

  • How often

  • And how much data moved

In real investigations, that context is often enough to confirm compromise, scope incidents, and prioritize response.

------------------------------------------------------------------------------------------------------------

Dean

 
 
 
