Let's Talk About HTTP – The Backbone of the Web (And a Goldmine for DFIR Folks)


---------------------------------------------------------------------------------------------------

Thanks for all the support on the Wireshark article!

https://www.cyberengage.org/post/master-wireshark-tool-like-a-pro-the-ultimate-packet-analysis-guide-for-real-world-analysts

I know there are already tons of articles out there on HTTP—but trust me, this one’s different.

Give it a read, and you’ll see exactly what I mean.

---------------------------------------------------------------------------------------------------

Hey folks

Today, let’s take a walk through a protocol that all of us use literally every day—HTTP. Yup, HyperText Transfer Protocol. Even if you’re not a hardcore networking nerd, if you've ever opened a webpage (which, hello, you're doing now!), you’ve used HTTP.

But if you're into digital forensics, incident response, or just cybersecurity in general, knowing how HTTP works isn't just a bonus—it’s critical. And trust me, there's a lot more to it than just "the thing that gives me web pages."

------------------------------------------------------------------------------------------------------------

First Things First: What Is HTTP?

HTTP/1.x is a plaintext protocol, which means it’s readable. You and I can literally look at a packet of HTTP/1.1 data and figure out what’s going on without needing fancy tools. (HTTP/2 and HTTP/3 use binary framing, so for those you’ll want a dissector like Wireshark.) HTTP is also stateless, meaning each request doesn’t remember the one before it. Every request stands on its own.

This might sound weird at first—like, how does your web browser remember where you left off? That’s where cookies, sessions, and tokens come in (topics for another day 😄).

------------------------------------------------------------------------------------------------------------

Why Should a Forensic Investigator or Incident Responder Care?

I’m glad you asked 😎

Whether you're investigating a rogue employee, a full-blown APT, or just checking someone’s shady web browsing, HTTP is going to show up a lot. In fact, you’ll probably run into HTTP traffic in almost every case.

Now, here’s the twist:

With the rise of full-disk encryption, incognito modes, and BYOD (bring-your-own-device) policies, disk artifacts aren’t always enough. That’s where network data comes in.

If you’ve got packet captures (PCAPs) available, you can:

Reconstruct entire web sessions
Pull down files that were downloaded (think: malware EXEs or phishing pages)
Track API calls to remote services
Monitor machine-to-machine activity (bots, implants, or automated tools)
Detect C2 traffic (command & control)

And that’s not just theory. I’ve worked with many malware analysts who help us dissect C2 channels running over HTTP. Even if the attacker encrypted the payload, the URLs, headers, or timing patterns can still tell you a lot.

------------------------------------------------------------------------------------------------------------

Real-Life Use Case: Web Server Compromise

Let's say a web server gets popped. Sure, you’ll look at logs and disk evidence. But what if the attacker cleared logs or used living-off-the-land techniques?

That’s when HTTP traffic analysis becomes your best friend.

By reviewing actual network traffic, you might catch:

File uploads via POST
Command injections
Suspicious API usage
Attacker beacons to external servers

------------------------------------------------------------------------------------------------------------

HTTP Versions – It’s Not All 1.1!

Okay, here’s a little version history in plain English:

HTTP/1.0 – Old-school. One request per connection.
HTTP/1.1 – Still widely used. Keeps connections alive. This is what you’ll see most in PCAPs.
HTTP/2 – Multiplexed. Multiple requests over one connection. Super common now.
HTTP/3 – The future. Built on QUIC (based on UDP), not TCP. Crazy fast. Still being adopted.

According to W3Techs (at the time of writing), HTTP/2 is used by over 50% of websites, and HTTP/3 is gaining ground (~10%, but growing fast). These numbers move quickly, so check W3Techs for the current figures.

------------------------------------------------------------------------------------------------------------

Dissecting an HTTP Request – Let’s Get Nerdy for a Second

Here’s a simple GET request:

GET /time/1/current?cup2key=9:wz8PuwCb6IQ1sPJTx92bCpndCnsugtTLkdpVppulvZE&cup2hreq=e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 HTTP/1.1\r\n
Host: clients2.google.com\r\n

This line breaks down into:

GET – Request method
/time/1/current?cup2key=9:wz8PuwCb6IQ1sPJTx92bCpndCnsugtTLkdpVppulvZE&cup2hreq=e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 – The request URI (Uniform Resource Identifier): the path /time/1/current, followed by the query string after the ?
HTTP/1.1 – Protocol version
Then you’ve got headers (like Host, User-Agent, Accept, etc.)
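
If you want to do the same breakdown in code, here is a minimal Python sketch using only the standard library. The function name and the shortened request line are my own illustration (the parameter values are made up), not something from a real capture.

```python
from urllib.parse import urlsplit, parse_qs

def parse_request_line(line: str) -> dict:
    """Split an HTTP/1.x request line into method, path, query parameters, and version."""
    method, target, version = line.rstrip("\r\n").split(" ", 2)
    parts = urlsplit(target)  # separates the path from the query string
    return {
        "method": method,
        "path": parts.path,
        "query": parse_qs(parts.query, keep_blank_values=True),
        "version": version,
    }

# Hypothetical request line, shortened for readability
example = "GET /time/1/current?cup2key=9:abc&cup2hreq=e3b0 HTTP/1.1\r\n"
parsed = parse_request_line(example)
print(parsed["method"], parsed["path"], parsed["version"])
print(parsed["query"])
```

Each query parameter comes back as a list because a name can legally appear more than once in a query string.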

Fun fact:

GET and POST are the most common methods. GET is used to fetch data. POST is used to send data (like login credentials, form data, or file uploads).

Here's a quick cheat sheet of other methods:

Method	What It Does
HEAD	Like GET, but fetches only headers (no body)
PUT	Uploads a file or resource
DELETE	Deletes a resource
OPTIONS	Asks what methods the server supports
TRACE	Echoes back the request (used for debugging)
CONNECT	Used to create a tunnel, often for HTTPS

Some of these, like TRACE and CONNECT, are often blocked by firewalls or disabled on servers because of their potential abuse.

------------------------------------------------------------------------------------------------------------

Forensic Tips & Bonus Nuggets

HTTP requests can contain query strings (?name=value&foo=bar), which might hold sensitive search terms, login attempts, or injection payloads.
Headers like User-Agent, Referer, and Cookie can reveal browser behavior, session IDs, and possible spoofing.
When malware uses HTTP as a C2 channel, it often mimics legitimate browser behavior to blend in. Look for anomalies!
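Some HTTP-based malware also abuses API-style endpoints, like /api/upload, /checkin, or /status. On their own these paths show up in legitimate apps too, but repeated requests to them from hosts that have no reason to call them are often a giveaway of a custom C2 protocol.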

One Last Thing...

Not all HTTP traffic is visible today. With HTTPS (the secure version), the request line, headers, and body are all encrypted. But don’t worry: the domain (via SNI in the TLS handshake), certificate details, packet sizes, and timing can still tell you a lot. And if you’re using TLS interception (in legal environments, of course), you get the headers and content back too.

------------------------------------------------------------------------------------------------------------

Next, let’s casually break down something that often looks boring but is super powerful when you’re into digital forensics, incident response, or even threat hunting: HTTP request headers.

What’s the Scene?

Imagine someone visited metadrive.io. When they did that, their browser quietly made an HTTP request to metadrive.io. What’s interesting is how their browser told the website about itself, and that’s where headers come in.

Let’s start with the raw request:

GET / HTTP/1.1\r\n
Host: metadrive.io\r\n
Connection: keep-alive\r\n
Upgrade-Insecure-Requests: 1\r\n
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/136.0.0.0 Safari/537.36\r\n
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7\r\n
Accept-Encoding: gzip, deflate\r\n
Accept-Language: en-US,en;q=0.9\r\n
\r\n
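
To pull headers out of a raw request like this programmatically, something along these lines works as a rough sketch. The helper name is mine, the request is a trimmed copy of the one above, and the code assumes clean CRLF line endings. Repeated header names simply overwrite each other here, which a real parser would need to handle.

```python
def parse_headers(raw: str) -> dict:
    """Map lowercased header names to values from a raw HTTP/1.x message."""
    head = raw.split("\r\n\r\n", 1)[0]   # everything before the blank line
    lines = head.split("\r\n")[1:]       # skip the request line
    headers = {}
    for line in lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so normalise to lowercase
        headers[name.strip().lower()] = value.strip()
    return headers

raw_request = (
    "GET / HTTP/1.1\r\n"
    "Host: metadrive.io\r\n"
    "Connection: keep-alive\r\n"
    "Accept-Encoding: gzip, deflate\r\n"
    "\r\n"
)
headers = parse_headers(raw_request)
print(headers["host"])
```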

------------------------------------------------------------------------------------------------------------

Okay, deep breath!

Host Header – The MVP of HTTP/1.1

Host: metadrive.io\r\n

Why it matters:

In HTTP/1.1, the Host header is required. Without it, the server won’t know which website you want—especially important when one server hosts multiple sites. Think of it as the “to:” address on a letter.

------------------------------------------------------------------------------------------------------------

User-Agent – Browser's ID Card (Well, Sort Of)

User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/136.0.0.0 Safari/537.36\r\n

What it tells us:

This is your browser bragging about who it is. In this case, the browser identifies as Chrome 136 on 64-bit Windows (Windows NT 10.0 covers both Windows 10 and Windows 11, so the string alone can’t tell them apart).

Now here's the kicker:

This value is completely customizable. Anyone can spoof it. You and I can literally install browser extensions like User-Agent Switcher and pretend to be Googlebot, Internet Explorer from 2001, or even a toaster (okay, maybe not—but close!).

------------------------------------------------------------------------------------------------------------

Accept Headers – What the Client Wants

Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7\r\n
Accept-Encoding: gzip, deflate\r\n
Accept-Language: en-US,en;q=0.9\r\n

These are pretty straightforward.

Accept: What content types the browser can handle (HTML, XML, etc.)
Accept-Language: Tells the server the user’s preferred languages. Useful as a hint about the user’s locale, though it’s a browser setting and not proof of location.
Accept-Encoding: Whether the browser can handle compressed responses like gzip.

Also, note the q value—it shows preference. For instance, q=0.9 means “I like XML, but not as much as plain HTML.”
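
Here is a small Python sketch of how the q values turn into a preference order. The function name is my own, and the header is a shortened version of the one above. Anything without an explicit q is treated as q=1.0, which is the default.

```python
def parse_accept(value: str) -> list:
    """Return (media_type, q) pairs sorted from most to least preferred."""
    prefs = []
    for item in value.split(","):
        parts = [p.strip() for p in item.split(";")]
        q = 1.0  # q defaults to 1.0 when it is omitted
        for param in parts[1:]:
            key, _, val = param.partition("=")
            if key.strip() == "q":
                q = float(val)
        prefs.append((parts[0], q))
    # sorted() is stable, so entries with equal q keep their original order
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)

accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
ranked = parse_accept(accept)
for media_type, q in ranked:
    print(media_type, q)
```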

------------------------------------------------------------------------------------------------------------

Cookies – The Trail of Breadcrumbs

(This header isn’t in the example request above, but I’m adding it here so it’s easier to follow.)

Cookie: prov=...; hubspotutk=...; docs_hero=x; hero=none

prov=... – Likely a session or user identification token
hubspotutk=... – A HubSpot tracking cookie used for analytics and form submissions
docs_hero=x – Possibly a custom flag to track a docs page UI state
hero=none – Another UI state flag or feature toggle

Cookies are little pieces of data stored by your browser from websites. They're often used to maintain state—which is important because HTTP itself is stateless. Without cookies, every click would feel like starting from scratch.
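
Splitting a Cookie header into name/value pairs is simple enough to sketch in a few lines of Python. The values below are placeholders I made up (the real ones are elided above), and the helper name is hypothetical.

```python
def parse_cookie_header(value: str) -> dict:
    """Split a Cookie request header into a name -> value mapping."""
    cookies = {}
    for pair in value.split(";"):
        name, sep, val = pair.strip().partition("=")
        if sep:  # ignore fragments that have no "="
            cookies[name] = val
    return cookies

# Placeholder values; the originals are not shown in the article
cookie_header = "prov=abc123; hubspotutk=def456; docs_hero=x; hero=none"
cookies = parse_cookie_header(cookie_header)
print(sorted(cookies))
```

Once the cookies are in a dictionary, correlating the same tracking ID across several captures becomes a plain lookup.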

Types of cookies:

Session Cookies: Gone when the browser closes.
Persistent Cookies: Stick around until they expire (or you delete them).

For us forensic folks, cookies can reveal:

Logins
Tracking IDs
User behavior across sessions

You’d be surprised how much we can correlate just from cookie IDs.

------------------------------------------------------------------------------------------------------------

Authorization – Base64 and Secrets

Authorization: Basic <Base64Encoded(username:password)>

Example:
Authorization: Basic bmV3dXNlcjpzM2NyM3RwYXNz

Here’s where you might find credentials. This is Basic Auth and it’s basically (pun intended) the base64 encoding of username:password.

So bmV3dXNlcjpzM2NyM3RwYXNz decodes to newuser:s3cr3tpass

Modern sites mostly use token-based auth or OAuth, but for internal apps or older services, you still find Basic Auth. When found, it’s gold for an attacker or an investigator.
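
Decoding a Basic Auth header yourself takes only the standard library. This is a minimal sketch (the function name is mine) that assumes the credentials are UTF-8 and splits on the first colon, since a password may itself contain colons.

```python
import base64

def decode_basic_auth(header_value: str) -> tuple:
    """Decode the value of an Authorization: Basic header into (username, password)."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic credential")
    decoded = base64.b64decode(token).decode("utf-8")
    # Split on the first colon only; the password may contain more colons
    username, _, password = decoded.partition(":")
    return username, password

user, password = decode_basic_auth("Basic bmV3dXNlcjpzM2NyM3RwYXNz")
print(user, password)
```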

------------------------------------------------------------------------------------------------------------

X-Forwarded-For – Tracing Real IPs (Kinda)

X-Forwarded-For: <IP>, <IP>

If a request passes through proxies, this header might show the original client IP.

BUT, it’s easily spoofed.

An attacker can just add their own X-Forwarded-For and pretend to come from anywhere (say, an internal IP like 192.168.1.11). Some servers trust this blindly—not good.

That’s why this header is a common target in IP-based bypasses.
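
A quick way to eyeball an X-Forwarded-For chain is to split it and flag the non-public entries. The sketch below uses Python’s ipaddress module; the function name is my own and 8.8.8.8 is just a stand-in for some public address. Note that is_private covers more than the RFC 1918 ranges (loopback and link-local count too), so treat the flag as a hint.

```python
import ipaddress

def parse_xff(value: str) -> list:
    """Return (ip, is_private) pairs in header order; the leftmost entry is the claimed client."""
    hops = []
    for raw in value.split(","):
        ip = ipaddress.ip_address(raw.strip())
        hops.append((str(ip), ip.is_private))
    return hops

# Hypothetical header value: an internal-looking client followed by a public proxy
hops = parse_xff("192.168.1.11, 8.8.8.8")
for ip, private in hops:
    print(ip, "private" if private else "public")
```

An internal address in the leftmost slot of a request that reached an internet-facing server is worth a second look, because that value came from the client side and nothing verifies it.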

------------------------------------------------------------------------------------------------------------

Proxy-Authorization – Auth to Use the Proxy

Proxy-Authorization: Basic bmV3dXNlcjpzM2NyM3RwYXNz

Like Authorization, but used when a client needs to authenticate to a proxy server. Again, base64—same risks apply.

------------------------------------------------------------------------------------------------------------

Referer (Yeah, It’s Misspelled) – Where You Came From

Referer: https://www.cyberengage.org/search?q=forensic

This tells the server which page you clicked from. Handy for:

Analytics (e.g., “what drove traffic here?”)
Security (e.g., detecting CSRF or phishing flows)
Investigation (e.g., mapping user navigation paths)

Here’s the cool part:

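If you’re moving from HTTPS → HTTP, browsers are supposed to suppress this header. Current browsers default to the strict-origin-when-cross-origin Referrer-Policy, which sends nothing on a downgrade and only the origin on cross-site requests. But in practice, older browsers, or sites that set a looser Referrer-Policy, can still leak enough info to tell where you came from.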

------------------------------------------------------------------------------------------------------------

Other Fun Headers

Upgrade-Insecure-Requests: 1 → Tells the server “hey, if you support HTTPS, switch me there.”
Cache-Control: max-age=0 → Basically says: “Don’t hand me a cached copy without checking with the server first; I want it fresh.”

------------------------------------------------------------------------------------------------------------

Dissecting an HTTP Response – Let’s Get Nerdy for a Second

So far, we’ve talked a lot about HTTP requests—what the client sends to the server. But now it’s time to flip the script. Let’s talk about what the server sends back in response.

Let’s Start from the Top — Status Line

Here’s a classic example:

HTTP/1.1 200 OK

This single line tells you three key things:

Protocol Version: HTTP/1.1 — this should match the client’s request version.
Status Code: 200 — tells you if the request went okay or something broke.
Status Text: OK — human-readable, but the client doesn’t really care what this says. It could say "Success", "All Good", or even "Nice Try Buddy" 😄 — as long as the number is 200, the meaning is the same.
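
To see why the status text doesn’t matter to a client, here is a tiny Python sketch (function name is mine) that parses two status lines with different text and compares the codes.

```python
def parse_status_line(line: str) -> tuple:
    """Split an HTTP/1.x status line into (version, status_code, reason_text)."""
    version, code, *reason = line.rstrip("\r\n").split(" ", 2)
    return version, int(code), reason[0] if reason else ""

normal = parse_status_line("HTTP/1.1 200 OK")
cheeky = parse_status_line("HTTP/1.1 200 Nice Try Buddy")
# Same code, different text: a client acts on the number only
print(normal[1] == cheeky[1])
```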

💡 Common Status Codes You Should Know

Let me list a few real-world ones we bump into all the time:

Code	Meaning
100	Continue – Client can keep sending request body
200	OK – Everything’s good
301	Moved Permanently – Resource has a new home
302	Found – Temporary redirect
304	Not Modified – Client’s cached copy is still good
400	Bad Request – Syntax error from client
401	Unauthorized – Need authentication
403	Forbidden – You don’t have permission
404	Not Found – Resource doesn’t exist
407	Proxy Auth Required – You need to auth via proxy
500	Internal Server Error – Oops, something’s broken
503	Service Unavailable – Overload or maintenance
511	Network Auth Required – Seen in public Wi-Fi portals

For threat hunters:

Seeing lots of 4xx responses (400, 403, 404) going to the same IP? That might be scanning/recon.

A sudden switch from 500s to 200s during POST requests? Could be SQL injection, where the server backend choked on bad input before the attacker got it right.
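
As a rough illustration of the first tip, here is a Python sketch that counts 4xx responses per client. The event list, the function name, and the threshold of 3 are all made up for the example; in practice you would feed it pairs extracted from proxy logs or a PCAP and tune the threshold to your environment.

```python
from collections import Counter

def noisy_clients(events, threshold=3):
    """Return {ip: count} for clients with at least `threshold` 4xx responses."""
    errors = Counter(ip for ip, status in events if 400 <= status < 500)
    return {ip: n for ip, n in errors.items() if n >= threshold}

# Hypothetical (client_ip, status_code) pairs
events = [
    ("10.0.0.5", 404), ("10.0.0.5", 404), ("10.0.0.5", 403), ("10.0.0.5", 400),
    ("10.0.0.9", 200), ("10.0.0.9", 404),
]
flagged = noisy_clients(events)
print(flagged)
```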

🔍 Real Response Header Breakdown

Here’s a full set of sample response headers (the status line isn’t shown):

accept-ranges: bytes\r\n
content-disposition: attachment\r\n
content-length: 1963\r\n
content-security-policy: default-src 'none'\r\n
server: Google-Edge-Cache\r\n
x-content-type-options: nosniff\r\n
x-frame-options: SAMEORIGIN\r\n
x-xss-protection: 0\r\n
x-request-id: c1349dbe-bb51-41bc-a142-e4ba95d94a1c\r\n
date: Sat, 24 May 2025 04:26:33 GMT\r\n
age: 38934\r\n
last-modified: Sat, 24 May 2025 04:24:20 GMT\r\n
etag: "45281ea"\r\n
content-type: application/octet-stream\r\n
alt-svc: h3=":443"; ma=2592000, h3-29=":443"; ma=2592000\r\n
cache-control: public,max-age=86400\r\n
coprocessor-response: download-server\r\n
\r\n

Now let’s decode it like detectives 🕵️:

Cache-Control, Expires, and ETag

These tell you how caching should work.

Cache-Control: private – Only the user’s browser should cache it, not shared proxies. If you see Cache-Control: public instead (as in the sample above), the response is cacheable by any cache, both the user’s browser and shared caches.
Expires: <timestamp> – When the cached copy stops being valid. The modern equivalent is Cache-Control: max-age, e.g. max-age=86400 means the response stays fresh and reusable for 1 day (86,400 seconds).
ETag: "<hash>" – Unique fingerprint for the content; helps compare if content changed.

Great for web performance and forensic timeline building.
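
For timeline work, it helps to know how long a cached copy was still considered fresh. A small sketch, using the cache-control and age values from the sample response above (the function name is my own): remaining freshness is simply max-age minus Age.

```python
def remaining_freshness(cache_control: str, age: int):
    """Seconds of freshness left according to max-age, or None if max-age is absent."""
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age":
            return int(value) - age
    return None

# Values taken from the sample response headers
left = remaining_freshness("public,max-age=86400", 38934)
print(left)  # 47466 seconds, a bit over 13 hours
```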

Content-Type and Content-Encoding

Tells you what kind of content and how it’s packed:

Content-Type: text/html; charset=utf-8 – HTML page in UTF-8 encoding.
Content-Type: application/octet-stream – Tells the browser (or any client) that the server is sending raw binary data, as in the sample above.
Content-Encoding: gzip – It’s compressed, so your client needs to decompress.

Content-Length

Size of the body in bytes as it goes over the wire. If Content-Encoding is set, this is the compressed size, not the size after decompressing.

content-length: 1963 – 1963 bytes.

X-Frame-Options: SAMEORIGIN

Mitigates clickjacking by saying: “Only I can frame myself!”

Date

Exact time the response was generated. Useful when reconstructing timelines or tracking malware behavior.

date: Sat, 24 May 2025 04:26:33 GMT

Investigator Tip:

If your endpoint says it made the request at 1:52 PM, but the server’s Date header says it responded at 1:47 PM, one of the two clocks is off (most often the client’s). This clock skew can seriously mess with your timeline, so always cross-check your time sources.
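
Measuring the skew is a one-liner once the Date header is parsed. In this Python sketch the Date value is the one from the sample response, while the endpoint timestamp is invented for the example. It also assumes the endpoint time is already in UTC, which you should confirm before trusting the result.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def clock_skew_seconds(date_header: str, local_time: datetime) -> float:
    """Positive result means the local clock is ahead of the server's Date header."""
    server_time = parsedate_to_datetime(date_header)
    return (local_time - server_time).total_seconds()

# Hypothetical endpoint log time for the same request, in UTC
local = datetime(2025, 5, 24, 4, 31, 33, tzinfo=timezone.utc)
skew = clock_skew_seconds("Sat, 24 May 2025 04:26:33 GMT", local)
print(skew)  # 300.0, i.e. the endpoint clock is 5 minutes ahead
```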

Fun fact: Some malware variants use this Date: header as a seed value for their DGA (Domain Generation Algorithm) — clever, huh?

Connection: keep-alive (if found)

With HTTP/1.1, one of the cool upgrades was making persistent connections the default, so your browser can reuse the same TCP session for multiple requests. This reduces overhead and speeds things up.

Clients often still announce this explicitly (a holdover from HTTP/1.0): Connection: keep-alive
If the server agrees, it responds with: Connection: keep-alive

But if either side wants to close the connection after the current exchange, it sends: Connection: close

Investigator Tip:

If you're monitoring traffic and notice lots of "Connection: close" lines mid-session, it might indicate non-browser activity — like malware making single-use requests.

------------------------------------------------------------------------------------------------------------

What About Redirects?

Redirections are handled via a combination of:

300-series status codes (like 301, 302)
A Location: header that says: "Hey, go here instead!"

These redirects can be abused too. Malware campaigns use redirect chains to mask the origin of malicious content.

Forensics tip: Don’t stop at the first hop!
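
Here is one way to walk a redirect chain offline from responses you have already extracted from a capture. Everything in this Python sketch is hypothetical: the function name, the example.* URLs, and the dictionary layout. It also assumes every Location value is an absolute URL.

```python
def follow_redirects(responses: dict, start: str, max_hops: int = 10) -> list:
    """Walk captured responses (url -> (status, location)) and return the URL chain."""
    chain = [start]
    url = start
    for _ in range(max_hops):
        status, location = responses.get(url, (None, None))
        if status is None or not (300 <= status < 400) or not location:
            break  # no capture for this URL, or not a redirect
        if location in chain:
            break  # stop on redirect loops
        chain.append(location)
        url = location
    return chain

# Hypothetical responses pulled out of a capture
captured = {
    "http://a.example/": (302, "http://b.example/go"),
    "http://b.example/go": (301, "http://c.example/payload"),
    "http://c.example/payload": (200, None),
}
chain = follow_redirects(captured, "http://a.example/")
print(" -> ".join(chain))
```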

------------------------------------------------------------------------------------------------------------

Pro Tip: Watch Out for X- Headers

Both clients and servers can use custom headers that begin with X-. These can carry unique identifiers, debug info, or even tracking tokens. Example:

X-Request-Guid: <GUID>

This might help correlate a single session across multiple logs.

------------------------------------------------------------------------------------------------------------

HTTP Headers in Investigations

Let’s talk real-world usage. How do these headers help during an actual incident?

1. Pastebin & Data Exfil

Attackers often use public paste and file-sharing sites like Pastebin or SendSpace. Some malware is coded to automatically upload exfiltrated data using these services’ APIs.

If an attacker has RDP or VNC access, they might just open a browser and manually do it — but the network traffic (HTTP POST requests, User-Agent headers, and API URIs) will still leave footprints.

2. User-Agent Fingerprinting

If you're in a corporate environment, there’s probably a known set of legitimate User-Agent strings. Anything else? Could be:

Malware
Unauthorized browser
Portable or dev tools

Sometimes, malware adds its own version string in the User-Agent, helping investigators quickly fingerprint infections across the environment.
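
A simple allowlist check is one way to surface the odd ones out. In this Python sketch, the single allowlist pattern, the helper name, and the last two User-Agent strings are invented for illustration. A real baseline would come from your own proxy logs.

```python
import re

# Hypothetical allowlist: the one browser build family this environment expects
ALLOWED_UA_PATTERNS = [
    re.compile(r"^Mozilla/5\.0 \(Windows NT 10\.0; Win64; x64\) .* Chrome/\d+\."),
]

def unexpected_user_agents(user_agents):
    """Return the User-Agent strings that match none of the allowed patterns."""
    return [
        ua for ua in user_agents
        if not any(p.search(ua) for p in ALLOWED_UA_PATTERNS)
    ]

seen = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/136.0.0.0 Safari/537.36",
    "python-requests/2.31.0",
    "UpdaterSvc/1.2",  # made-up string standing in for a malware User-Agent
]
odd = unexpected_user_agents(seen)
print(odd)
```

Keep in mind the earlier point about spoofing: a match only tells you the string looks normal, not that the client really is that browser.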

3. Credential Sniffing in HTTP Basic Auth

We touched on this earlier, but just a reminder — Basic Auth sends credentials like this:

Authorization: Basic bmV3dXNlcjpzM2NyM3RwYXNz

That Base64 string? It’s just user:password. If you’re capturing traffic, you can extract credentials directly.

4. URI Analysis

Every URI tells a story. It could be:

Web searches
Form submissions
API calls
Malware callbacks

Pairing URI analysis with malware analysis gives you powerful insight into what the attacker was trying to do — exfiltrate data, move laterally, connect to command-and-control, or worse.

5. When the Disk Fails, the Network Tells All

Modern attackers are smart:

They use private browsing
They run portable apps from USBs
They clean up after themselves

So maybe there’s no trace left on the disk. But network traffic? That’s harder to erase. If you have PCAPs or proxy logs, you’ve still got a shot.

------------------------------------------------------------------------------------------------------------

Final Thoughts

HTTP headers might seem boring on the surface, but when you dig in — they’re loaded with useful info. From persistent connections to User-Agent strings to caching behavior and time syncing — every bit tells you something.

Hope this post made it easier to see headers not as noise, but as gold dust for a forensic investigator.

-------------------------------------------------------Dean-------------------------------------------

Actively looking for roles in cybersecurity. If you have a reference or a job opportunity, your support would mean the world to me!
