
Let’s Go Practical: Working with NetFlow Using nfdump Tools


Enough theory.

Now let’s actually touch NetFlow data.

If you’re doing DFIR, threat hunting, or even basic network investigations, one toolkit you must be comfortable with is the nfdump suite.


This suite gives you three extremely important tools:
  • nfcapd – the collector

  • nfpcapd – the pcap-to-NetFlow converter

  • nfdump – the analysis engine


-----------------------------------------------------------------------------------------------------------

nfcapd: The NetFlow Collector (Where Everything Starts)

nfcapd is a daemon, not a one-time command.

Its job is simple:

  • listen on a UDP port

  • receive NetFlow data from trusted exporters (routers, firewalls, switches)

  • write that data to disk in a compact binary format


It supports:
  • NetFlow v5, v7, v9

  • IPFIX

  • sFlow

So regardless of vendor or flow standard, nfcapd usually has you covered.


How Much Storage Do You Actually Need?

This is one of the first questions everyone asks.

A rough rule of thumb:

~1 MB of NetFlow data for every 2 GB of network traffic

Is this perfect? No.

Is it useful for planning? Yes.


Your actual numbers will depend on:

  • number of flows

  • traffic patterns

  • sampling

  • exporter behavior

But it’s a good starting point when designing storage.
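
As a quick sanity check, the rule of thumb can be turned into a back-of-the-envelope sizing script. The traffic volume and retention period below are made-up planning inputs, not figures from this article:

```shell
# Rough sizing sketch based on the ~1 MB of NetFlow per 2 GB of traffic rule.
# daily_traffic_gb and retention_days are hypothetical planning inputs.
daily_traffic_gb=500      # traffic crossing the monitored link per day
retention_days=90         # how long flow files are kept

daily_flow_mb=$(( daily_traffic_gb / 2 ))           # 1 MB per 2 GB
total_flow_mb=$(( daily_flow_mb * retention_days ))

echo "~${daily_flow_mb} MB of flow data per day"
echo "~${total_flow_mb} MB over ${retention_days} days"
```

Swap in your own traffic numbers; the point is that NetFlow retention is cheap enough to keep for months.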


How nfcapd Stores Data (And Why It Matters)

When nfcapd writes flow data, it uses a very clean naming scheme:

nfcapd.YYYYMMDDhhmm

Example:

nfcapd.201302262305

Why this matters:

  • files sort naturally by time

  • no database needed

  • easy scripting

  • easy forensic timelines

By default, nfcapd rotates files every 5 minutes.

That means:

  • 288 files per exporter per day

  • predictable storage growth

  • easy time slicing during investigations
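
Both conventions are easy to verify yourself. This sketch (assuming GNU date, as on most Linux systems) reconstructs the file name for the timestamp used above and checks the rotation math:

```shell
# Reconstruct the nfcapd file name for the article's example timestamp.
# Requires GNU date (-d); BSD/macOS date uses different flags.
ts="2013-02-26 23:05"
fname="nfcapd.$(date -u -d "$ts UTC" +%Y%m%d%H%M)"
echo "$fname"                     # nfcapd.201302262305

# 24 hours of 5-minute rotations = files per exporter per day
files_per_day=$(( 24 * 60 / 5 ))
echo "$files_per_day files per exporter per day"
```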


-----------------------------------------------------------------------------------------------------------

Bonus Feature: Flow Forwarding (-R Option)

One very underrated feature of nfcapd is flow forwarding.

You can collect NetFlow and forward it to another collector at the same time.


Example scenario:

  • local collection for DFIR

  • central collection for SOC visibility

Example command:

nfcapd -p 1025 -w -D -R 10.0.0.1/1025 \
-n router,10.0.0.2,/var/local/flows/routerlogs

Command breakdown:

  • nfcapd - NetFlow capture daemon

  • -p 1025 - Listen on port 1025 for incoming NetFlow packets

  • -w - Sync file rotation to the next 5-minute interval boundary

  • -D - Run as a daemon (background process)

  • -R 10.0.0.1/1025 - Act as a repeater/forwarder: send received flows to IP 10.0.0.1 on port 1025

  • -n router,10.0.0.2,/var/local/flows/routerlogs - Define an identification string:

    • router - Identifier name for this source

    • 10.0.0.2 - Expected source IP address

    • /var/local/flows/routerlogs - Directory where flow files will be stored


In summary:

This command starts a NetFlow collector that listens on port 1025, stores flow data from router at 10.0.0.2 into /var/local/flows/routerlogs, and simultaneously forwards the data to another collector at 10.0.0.1/1025. It runs in the background as a daemon.


This is extremely useful in larger environments.


nfpcapd: Turning PCAPs into NetFlow

Now this is where DFIR people should pay attention.

nfpcapd lets you:

take a pcap file and convert it into NetFlow-style records

Why does this matter?

Because parsing large pcaps is:

  • slow

  • CPU-heavy

  • painful at scale


NetFlow-based analysis is orders of magnitude faster.

So the smart workflow is:
  1. Convert pcap → NetFlow

  2. Hunt quickly using NetFlow

  3. Go back to full pcap only where needed



Example:

nfpcapd -r bigFlows.pcap -l /mnt/c/Users/Akash/Downloads/

This step alone can save hours or days in an investigation.

-----------------------------------------------------------------------------------------------------------

nfdump: Where the Real Analysis Happens

Once flows are collected (or converted), this is where we start asking questions.

nfdump is a command-line NetFlow analysis tool.

It:

  • reads nfcapd binary files

  • applies filters

  • summarizes results

  • responds very fast — even on huge datasets


Important point:

nfdump does not magically find “bad traffic”

Its power comes from:

  • how you ask questions

  • how you refine hypotheses

  • how you chain queries together

This is investigative work, not alert-driven work.


-----------------------------------------------------------------------------------------------------------

Reading NetFlow Data with nfdump

You can read:

  • a single file

  • or an entire directory tree

Reading a Single File

nfdump -r /mnt/c/Users/Akash/Downloads/nfcapd.201302262305

This reads:

  • flows from one exporter

  • for a specific 5-minute window

Perfect for targeted investigations.



Reading a Directory (Much More Common)

nfdump -R /mnt/c/Users/Akash/Downloads/test/

This tells nfdump:

  • recursively walk the directory

  • read all NetFlow files inside

This is how you analyze:

  • days

  • weeks

  • months of traffic



-----------------------------------------------------------------------------------------------------------

Building Real Investigative Queries

Let’s look at a realistic example.

Goal: Find internal systems that accessed internet web servers without using the corporate proxy.

Conditions:

  • traffic passed through internet-facing router

  • destination ports 80 or 443

  • exclude proxy IP 172.0.1.1

  • specific 24-hour window

  • limit output to the first 10 matching flow records


Command:

nfdump -R /mnt/c/Users/Akash/Downloads/test/ \
-t '2026/01/12.12:00:00-2026/01/13.12:00:00' \
-c 10 'proto tcp and (dst port 80 or dst port 443) and not src host 172.0.1.1'

This is classic NetFlow hunting:

  • scoped

  • fast

  • hypothesis-driven

From here, you pivot:
  • which hosts?

  • how often?

  • how much data?

  • where did they connect?



-----------------------------------------------------------------------------------------------------------


1. line Output (Default, Lightweight)

This is the default view and the one you’ll see most often when you’re doing quick scoping.


It shows:

  • start and end time

  • source and destination IPs

  • ports

  • protocol

  • bytes and packets

Example:

nfdump -R /mnt/c/Users/Akash/Downloads/test -o line host 172.16.128.169

This is perfect when you’re asking:

“Is this IP even talking on my network?”

Fast. Minimal. No noise.


-----------------------------------------------------------------------------------------------------------

2. long Output (Adds TCP Flags)

The long format builds on line and adds:

  • TCP flags

  • Type of Service (ToS)

Example:

nfdump -R /mnt/c/Users/Akash/Downloads/test -o long 'proto tcp and port 445'

Why this matters:

  • TCP flags tell a story

  • SYN-only traffic looks very different from established sessions

  • RST storms, half-open connections, or scanning behavior start to stand out


Important reminder: Each line is unidirectional.

A normal bidirectional conversation:

  • client → server

  • server → client

…will always appear as two separate flow records.

This trips people up early on.
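
A toy illustration of the point (the records below are fabricated, not real nfdump output): normalizing the endpoints of each record shows that one TCP conversation always yields exactly two unidirectional lines.

```shell
# Two made-up unidirectional flow records for a single TCP session.
cat > /tmp/flows.txt <<'EOF'
172.16.1.10:51514 -> 10.0.0.5:443 tcp
10.0.0.5:443 -> 172.16.1.10:51514 tcp
EOF

# Sort each record's endpoints so both directions map to the same key,
# then count records per conversation — each session counts 2.
awk '{ key = ($1 < $3) ? $1 "|" $3 : $3 "|" $1; count[key]++ }
     END { for (k in count) print k, count[k] }' /tmp/flows.txt
```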


-----------------------------------------------------------------------------------------------------------

3. extended Output (Adds Statistics)

This is where things get interesting.

The extended format adds derived values, calculated at query time:

  • packets per second

  • bits per second

  • bytes per packet

Example:

nfdump -R /mnt/c/Users/Akash/Downloads/test -o extended 'proto tcp and port 445'

These values help you distinguish:

  • interactive shells (low & slow)

  • file transfers (fast ramp-up, steady throughput)

  • dormant C2 channels (tiny but persistent)


None of this data is stored explicitly — it’s derived — but it’s incredibly useful for behavioral analysis.

-----------------------------------------------------------------------------------------------------------

IPv6 Note (Important but Often Missed)

nfdump fully supports IPv6, but truncates addresses by default for readability.

If you want full IPv6 visibility, use:

  • line6

  • long6

  • extended6

Same formats — just IPv6-aware.


-----------------------------------------------------------------------------------------------------------

Practical Hunt: Finding Patient Zero Using NetFlow

Now let’s do real hunting, not theory.


Goal:
Identify internal hosts communicating with a suspected C2 channel (TCP port 8014).

Step 1: First Hits of the Day

Start with a known NetFlow file

Ask:

“Who talked to this IP first today?”

nfdump -R /mnt/c/Users/Akash/Downloads/test -O tstart -c 5 'proto tcp and dst port 8014 and host 172.16.128.169'

We see the first hit of the day.

That’s early — but maybe not early enough.

Step 2: Expand the Time Window (Overnight)

If the first hit isn’t at the beginning of the capture window, that’s a signal.

So we expand: (overnight window)

nfdump -R /mnt/c/Users/Akash/Downloads/test -t '2013/02/26.23:00:00-2013/02/26.23:10:00' -O tstart -c 1 'proto tcp and dst port 8014 and host 172.16.128.169'

Step 3: What Else Did Patient Zero Do?

Now we pivot.

Same time window, but focus on the internal host itself:

nfdump -R /mnt/c/Users/Akash/Downloads/test -t '2013/02/26.23:00:00-2013/02/26.23:10:00' -O tstart 'host 172.16.128.169'

This answers:

  • what happened before C2?

  • was there a download?

  • was there lateral movement?

  • did anything precede the UDP traffic?



Step 4: Infrastructure Expansion Using ASN Analysis

Take the address of interest:

  • 172.16.128.169

Then look up which ASN announces it using WHOIS:

whois 172.16.128.169 | grep AS

Step 5: Hunt the “Internet Neighborhood”

If the attacker uses one provider, they may use more infrastructure in the same ASN.

So we ask:

“Who talked to this ASN all month?”

nfdump -q -R /mnt/c/Users/Akash/Downloads/test -o 'fmt:$sa $da' 'dst as 36351' | sort | uniq

What this gives you

  • $sa → source IP (internal)

  • $da → destination IP (external)

  • Deduplicated list of unique communications



Viewing Minimal Samples for Orientation

Sometimes you just want a quick sanity check:

nfdump -R /mnt/c/Users/Akash/Downloads/test -t '2013/02/26.23:00:00-2013/02/26.23:10:00' -O tstart -c 1

Or inspecting a single file:

nfdump -r /mnt/c/Users/Akash/Downloads/test/nfcapd.201302262305 -O tstart -c 5

These commands are underrated — they help you:

  • Validate time ranges

  • Confirm exporter behavior

  • Avoid wrong assumptions early


------------------------------------------------------------------------------------------------------------

Why Aggregation Changes Everything

Because flows are split across files, one real-world connection may appear as many records.

By default, nfdump aggregates using five key values:

  • Source IP

  • Destination IP

  • Protocol

  • Source Port

  • Destination Port

Flows sharing these values are merged into a single logical event.
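
A miniature sketch of what that merge does (fabricated records; only bytes are summed here, while real nfdump also merges packets, flags, and timing):

```shell
# Made-up flow records: srcip dstip proto srcport dstport bytes
cat > /tmp/records.txt <<'EOF'
172.16.1.10 10.0.0.5 tcp 51514 443 1200
172.16.1.10 10.0.0.5 tcp 51514 443 800
172.16.1.10 10.0.0.9 tcp 51515 443 500
EOF

# Records sharing the five key values collapse into one logical event,
# with their byte counts summed.
awk '{ key = $1 " " $2 " " $3 " " $4 " " $5; bytes[key] += $6 }
     END { for (k in bytes) print k, bytes[k] }' /tmp/records.txt | sort
```

The first two records merge into one 2000-byte event; the third stays separate because its destination and source port differ.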



Detecting Port Scanning with Custom Aggregation

Port scanners behave differently:

  • Source port changes constantly

  • Target port stays fixed

Aggregating on source IP, protocol, and destination port collapses all those rotating source ports, so a scanner stands out as a single record with an unusually high flow count:

nfdump -q -R /mnt/c/Users/Akash/Downloads/test -O bytes -A srcip,proto,dstport -o 'fmt: $sa -> $pr $dp $byt $fl'

This answers:

  • Who is consuming the most bandwidth

  • Which protocol and port

  • How many flows were involved

Great for:

  • Data exfiltration hunting

  • Rogue services

  • Abnormal internal behavior



Using “TopN” Statistics for Threat Hunting

Most engineers use TopN for bandwidth.

Investigators use it differently.

Syntax

-s statistic[:p][/orderby]

Example

nfdump -R /mnt/c/Users/Akash/Downloads/test/ -s ip/bytes -s dstport:p/bytes -n 5

Why this matters

  • Identify staging systems (high outbound bytes)

  • Detect scanners (high flow counts)

  • Separate TCP vs UDP behavior with :p

TopN becomes powerful only when driven by intelligence, not curiosity.


------------------------------------------------------------------------------------------------------------

Final Thoughts

nfdump isn’t flashy. It doesn’t decrypt payloads. It doesn’t show malware strings.

But when used correctly, it tells you:

  • Who talked

  • For how long

  • How often

  • And how much data moved

In real investigations, that context is often enough to confirm compromise, scope incidents, and prioritize response.

------------------------------------------------------------------------------------------------------------

Dean

 
 
 
