
Let's Talk About HTTP – The Backbone of the Web (And a Goldmine for DFIR Folks)

  • 12 minutes ago
  • 10 min read

---------------------------------------------------------------------------------------------------

Thanks for all the support on the Wireshark article!
I know there are already tons of articles out there on HTTP—but trust me, this one’s different.
Give it a read, and you’ll see exactly what I mean.

---------------------------------------------------------------------------------------------------

Hey folks

Today, let’s take a walk through a protocol that all of us use literally every day—HTTP. Yup, HyperText Transfer Protocol. Even if you’re not a hardcore networking nerd, if you've ever opened a webpage (which, hello, you're doing now!), you’ve used HTTP.


But if you're into digital forensics, incident response, or just cybersecurity in general, knowing how HTTP works isn't just a bonus—it’s critical. And trust me, there's a lot more to it than just "the thing that gives me web pages."


------------------------------------------------------------------------------------------------------------

First Things First: What Is HTTP?

HTTP, at least in its classic 1.x form, is a plaintext protocol, which means it’s readable. You and I can literally look at a captured HTTP/1.1 request and figure out what’s going on without needing fancy tools (HTTP/2 and HTTP/3 use binary framing, so those need a decoder such as Wireshark). It’s also stateless, meaning each request doesn’t remember the one before it. Every request stands on its own.

This might sound weird at first—like, how does your web browser remember where you left off? That’s where cookies, sessions, and tokens come in (topics for another day 😄).

------------------------------------------------------------------------------------------------------------

Why Should a Forensic Investigator or Incident Responder Care?

I’m glad you asked 😎

Whether you're investigating a rogue employee, a full-blown APT, or just checking someone’s shady web browsing, HTTP is going to show up a lot. In fact, you’ll probably run into HTTP traffic in almost every case.

Now, here’s the twist: with the rise of full-disk encryption, incognito modes, and BYOD (bring-your-own-device) policies, disk artifacts aren’t always enough. That’s where network data comes in.


If you’ve got packet captures (PCAPs) available, you can:

  • Reconstruct entire web sessions

  • Pull down files that were downloaded (think: malware EXEs or phishing pages)

  • Track API calls to remote services

  • Monitor machine-to-machine activity (bots, implants, or automated tools)

  • Detect C2 traffic (command & control)


And that’s not just theory. I’ve worked with many malware analysts who help us dissect C2 channels running over HTTP. Even if the attacker encrypted the payload, the URLs, headers, or timing patterns can still tell you a lot.


------------------------------------------------------------------------------------------------------------

Real-Life Use Case: Web Server Compromise

Let's say a web server gets popped. Sure, you’ll look at logs and disk evidence. But what if the attacker cleared logs or used living-off-the-land techniques?


That’s when HTTP traffic analysis becomes your best friend.

By reviewing actual network traffic, you might catch:

  • File uploads via POST

  • Command injections

  • Suspicious API usage

  • Attacker beacons to external servers


------------------------------------------------------------------------------------------------------------

HTTP Versions – It’s Not All 1.1!

Okay, here’s a little version history in plain English:

  • HTTP/1.0 – Old-school. One request per connection.

  • HTTP/1.1 – Still widely used. Keeps connections alive. This is what you’ll see most in PCAPs.

  • HTTP/2 – Multiplexed. Multiple requests over one connection. Super common now.

  • HTTP/3 – The future. Built on QUIC (based on UDP), not TCP. Crazy fast. Still being adopted.


According to W3Techs at the time of writing, HTTP/2 is used by over 50% of websites, and HTTP/3 is slowly gaining ground (~10% but growing fast). These figures shift quickly, so check the current numbers before quoting them.

------------------------------------------------------------------------------------------------------------

Dissecting an HTTP Request – Let’s Get Nerdy for a Second

Here’s a simple GET request:

GET /time/1/current?cup2key=9:wz8PuwCb6IQ1sPJTx92bCpndCnsugtTLkdpVppulvZE&cup2hreq=e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 HTTP/1.1\r\n

Host: clients2.google.com

The request line breaks down into:

  • GET – Request method

  • /time/1/current?cup2key=9:wz8PuwCb6IQ1sPJTx92bCpndCnsugtTLkdpVppulvZE&cup2hreq=e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 – The request URI (Uniform Resource Identifier): the path /time/1/current, followed by the query string (everything after the ?)

  • HTTP/1.1 – Protocol version

  • Then you’ve got headers (like Host, User-Agent, Accept, etc.)
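If you want to do this breakdown programmatically, here is a minimal Python sketch using only the standard library. The helper name parse_request_line is made up for illustration; the request line itself is the one from the example above.

```python
from urllib.parse import urlsplit, parse_qs

# Request line copied from the example above (trailing CRLF removed).
request_line = (
    "GET /time/1/current?cup2key=9:wz8PuwCb6IQ1sPJTx92bCpndCnsugtTLkdpVppulvZE"
    "&cup2hreq=e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
    " HTTP/1.1"
)

def parse_request_line(line):
    """Split a request line into method, path, query parameters and version."""
    method, target, version = line.split(" ", 2)
    parts = urlsplit(target)
    return {
        "method": method,
        "path": parts.path,
        "query": parse_qs(parts.query),  # dict of name -> list of values
        "version": version,
    }

parsed = parse_request_line(request_line)
print(parsed["method"], parsed["path"], parsed["version"])
for name, values in parsed["query"].items():
    print(name, "=", values[0])
```

The same split works on request lines pulled out of a PCAP or a proxy log, which makes it easy to pivot on individual query parameters.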



Fun fact:

GET and POST are the most common methods. GET is used to fetch data. POST is used to send data (like login credentials, form data, or file uploads).


Here's a quick cheat sheet of other methods:

Method     What It Does
HEAD       Like GET, but fetches only headers (no body)
PUT        Uploads a file or resource
DELETE     Deletes a resource
OPTIONS    Asks what methods the server supports
TRACE      Echoes back the request (used for debugging)
CONNECT    Used to create a tunnel, often for HTTPS

Some of these, like TRACE and CONNECT, are often blocked by firewalls or disabled on servers because of their potential abuse.

------------------------------------------------------------------------------------------------------------

Forensic Tips & Bonus Nuggets

  • HTTP requests can contain query strings (?name=value&foo=bar), which might hold sensitive search terms, login attempts, or injection payloads.

  • Headers like User-Agent, Referer, and Cookie can reveal browser behavior, session IDs, and possible spoofing.

  • When malware uses HTTP as a C2 channel, it often mimics legitimate browser behavior to blend in. Look for anomalies!

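  • Some HTTP-based malware also abuses API-style endpoints, like /api/upload, /checkin, or /status. Legitimate apps use paths like these too, but when they show up against an unknown host, with an odd User-Agent, or at clockwork intervals, they can be giveaways of a custom C2 protocol.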



One Last Thing...

Not all HTTP traffic is visible today. With HTTPS (the secure version), the URL path, headers, and body are all encrypted. But don’t worry: the domain (via SNI in the TLS handshake), certificate details, packet sizes, and timing can still tell you a lot, and if you're using TLS interception (in legal environments, of course) you get the headers and content back too.


------------------------------------------------------------------------------------------------------------


Let’s casually break down something that often looks boring but is super powerful when you're into digital forensics, incident response, or even threat hunting: HTTP request headers.

What’s the Scene?

Imagine someone visited metadrive.io. When they did that, their browser quietly made an HTTP request to metadrive.io. What’s interesting is how their browser told the website about itself, and that's where headers come in.


Let’s start with the raw request:

GET / HTTP/1.1\r\n
Host: metadrive.io\r\n
Connection: keep-alive\r\n
Upgrade-Insecure-Requests: 1\r\n
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/136.0.0.0 Safari/537.36\r\n
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7\r\n
Accept-Encoding: gzip, deflate\r\n
Accept-Language: en-US, en; q=0.9\r\n
\r\n
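To work with a request like this in a script, a rough sketch along these lines splits the raw text into the request line and a header dictionary. The function name parse_request is hypothetical, the sample is a shortened copy of the request above, and repeated headers are simply overwritten here, which a real parser should not do.

```python
# Shortened copy of the raw request above; \r\n is the CRLF line ending.
RAW_REQUEST = (
    "GET / HTTP/1.1\r\n"
    "Host: metadrive.io\r\n"
    "Connection: keep-alive\r\n"
    "Upgrade-Insecure-Requests: 1\r\n"
    "Accept-Encoding: gzip, deflate\r\n"
    "\r\n"
)

def parse_request(raw):
    """Return (request_line, headers) from a raw HTTP/1.x request head."""
    # The blank line (CRLF CRLF) separates the head from any body.
    head, _, _body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    request_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so normalise to lowercase.
        headers[name.strip().lower()] = value.strip()
    return request_line, headers

request_line, headers = parse_request(RAW_REQUEST)
print(request_line)
for name, value in headers.items():
    print(f"{name}: {value}")
```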

------------------------------------------------------------------------------------------------------------

Okay, deep breath!

Host Header – The MVP of HTTP/1.1

Host: metadrive.io\r\n

Why it matters:

In HTTP/1.1, the Host header is required. Without it, the server won’t know which website you want—especially important when one server hosts multiple sites. Think of it as the “to:” address on a letter.


------------------------------------------------------------------------------------------------------------

User-Agent – Browser's ID Card (Well, Sort Of)

User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/136.0.0.0 Safari/537.36\r\n

What it tells us:

This is your browser bragging about who it is. In this case, the browser identifies itself as Chrome 136 on 64-bit Windows (the Windows NT 10.0 token is sent by both Windows 10 and Windows 11).


Now here's the kicker:

This value is completely customizable. Anyone can spoof it. You and I can literally install browser extensions like User-Agent Switcher and pretend to be Googlebot, Internet Explorer from 2001, or even a toaster (okay, maybe not—but close!).
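To see how little effort spoofing takes, here is a small sketch using Python's standard urllib. The Googlebot-style string is just an illustrative value, and the request object is only built, never sent.

```python
import urllib.request

# Illustrative spoofed value: the client can put any string it likes here.
spoofed_ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

# Build the request object only; nothing goes over the network in this sketch.
req = urllib.request.Request(
    "http://metadrive.io/",
    headers={"User-Agent": spoofed_ua},
)

# urllib stores header names capitalised as "User-agent".
print(req.get_header("User-agent"))
```

Nothing on the wire distinguishes this from a genuine crawler, which is why a User-Agent should be treated as a claim, not as evidence.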


------------------------------------------------------------------------------------------------------------

Accept Headers – What the Client Wants

Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7\r\n
Accept-Encoding: gzip, deflate\r\n
Accept-Language: en-US, en; q=0.9\r\n

These are pretty straightforward.

  • Accept: What content types the browser can handle (HTML, XML, etc.)

  • Accept-Language: Tells the server the user's preferred languages. Useful for geo-profiling.

  • Accept-Encoding: Whether the browser can handle compressed responses like gzip.


Also, note the q value—it shows preference. For instance, q=0.9 means “I like XML, but not as much as plain HTML.”
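Here is a short sketch of how the q values translate into a preference order. The helper parse_accept is a made-up name, and the parsing is deliberately simplified (it ignores quoted parameters), so treat it as an illustration and not a full implementation of the header grammar.

```python
def parse_accept(value):
    """Return (media_type, q) pairs sorted from most to least preferred."""
    prefs = []
    for item in value.split(","):
        parts = [p.strip() for p in item.split(";")]
        q = 1.0  # q defaults to 1.0 when it is not given
        for param in parts[1:]:
            name, _, val = param.partition("=")
            if name.strip() == "q":
                q = float(val)
        prefs.append((parts[0], q))
    # sorted() is stable, so entries with equal q keep their original order.
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)

accept = (
    "text/html,application/xhtml+xml,application/xml;q=0.9,"
    "image/avif,image/webp,image/apng,*/*;q=0.8,"
    "application/signed-exchange;v=b3;q=0.7"
)

ranked = parse_accept(accept)
for media_type, q in ranked:
    print(f"{q:.1f}  {media_type}")
```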


------------------------------------------------------------------------------------------------------------

Cookies – The Trail of Breadcrumbs

(This header isn’t in our example request, but I’m adding it here so the idea is easier to follow.)

Cookie: prov=...; hubspotutk=...; docs_hero=x; hero=none

  • prov=... – Likely a session or user identification token

  • hubspotutk=... – A HubSpot tracking cookie used for analytics and form submissions

  • docs_hero=x – Possibly a custom flag to track a docs page UI state

  • hero=none – Another UI state flag or feature toggle


Cookies are little pieces of data stored by your browser from websites. They're often used to maintain state—which is important because HTTP itself is stateless. Without cookies, every click would feel like starting from scratch.
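Splitting a Cookie header into name/value pairs is a one-liner with Python's standard http.cookies module. In this sketch the values abc123 and def456 are placeholders I made up, since the real values are elided in the example above.

```python
from http.cookies import SimpleCookie

# Placeholder values: the real prov and hubspotutk values are not shown above.
cookie_header = "prov=abc123; hubspotutk=def456; docs_hero=x; hero=none"

jar = SimpleCookie()
jar.load(cookie_header)

# Each entry is a Morsel object; .value holds the cookie's value.
cookies = {name: morsel.value for name, morsel in jar.items()}
for name, value in cookies.items():
    print(name, "=", value)
```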


Types of cookies:

  • Session Cookies: Gone when the browser closes.

  • Persistent Cookies: Stick around until they expire (or you delete them).


For us forensic folks, cookies can reveal:

  • Logins

  • Tracking IDs

  • User behavior across sessions


You’d be surprised how much we can correlate just from cookie IDs.

------------------------------------------------------------------------------------------------------------

Authorization – Base64 and Secrets

Authorization: Basic <Base64Encoded(username:password)>

Example:
Authorization: Basic bmV3dXNlcjpzM2NyM3RwYXNz

Here’s where you might find credentials. This is Basic Auth and it’s basically (pun intended) the base64 encoding of username:password.


So bmV3dXNlcjpzM2NyM3RwYXNz decodes to newuser:s3cr3tpass
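You can check that decoding yourself with a few lines of Python. The helper name decode_basic_auth is hypothetical; the header value is the one from the example above.

```python
import base64

def decode_basic_auth(header_value):
    """Decode the value of an Authorization: Basic header into (user, password)."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic credential")
    decoded = base64.b64decode(token).decode("utf-8")
    # Only the first colon separates the fields; passwords may contain colons.
    username, _, password = decoded.partition(":")
    return username, password

user, password = decode_basic_auth("Basic bmV3dXNlcjpzM2NyM3RwYXNz")
print(user, password)
```

Base64 is an encoding, not encryption, so anyone who sees the header over plain HTTP effectively sees the password.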

Modern sites mostly use token-based auth or OAuth, but for internal apps or older services, you still find Basic Auth. When found, it’s gold for an attacker or an investigator.


------------------------------------------------------------------------------------------------------------

X-Forwarded-For – Tracing Real IPs (Kinda)

X-Forwarded-For: <IP>, <IP>

If a request passes through proxies, this header might show the original client IP.

BUT, it’s easily spoofed.

An attacker can just add their own X-Forwarded-For and pretend to come from anywhere (say, an internal IP like 192.168.1.11). Some servers trust this blindly—not good.

That’s why this header is a common target in IP-based bypasses.
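When triaging this header in bulk, it helps to split the chain and label each entry. The sketch below uses Python's ipaddress module; the function name parse_xff and the sample header value are made up for illustration.

```python
import ipaddress

def parse_xff(value):
    """Split an X-Forwarded-For value and label each hop."""
    hops = []
    for item in value.split(","):
        text = item.strip()
        try:
            ip = ipaddress.ip_address(text)
        except ValueError:
            # Garbage in this header is itself worth a closer look.
            hops.append((text, "invalid"))
            continue
        hops.append((text, "private" if ip.is_private else "public"))
    return hops

# Hypothetical header value: leftmost entry is the claimed original client.
hops = parse_xff("192.168.1.11, 8.8.8.8, not-an-ip")
for address, label in hops:
    print(label, address)
```

A private address arriving as the leftmost entry on an internet-facing server is a hint that someone may be probing for an IP-based bypass.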

------------------------------------------------------------------------------------------------------------

Proxy-Authorization – Auth to Use the Proxy

Proxy-Authorization: Basic bmV3dXNlcjpzM2NyM3RwYXNz

Like Authorization, but used when a client needs to authenticate to a proxy server. Again, base64—same risks apply.


------------------------------------------------------------------------------------------------------------

Referer (Yeah, It’s Misspelled) – Where You Came From

This tells the server which page you clicked from. Handy for:

  • Analytics (e.g., “what drove traffic here?”)

  • Security (e.g., detecting CSRF or phishing flows)

  • Investigation (e.g., mapping user navigation paths)


Here’s the cool part:

If you’re moving from HTTPS → HTTP, browsers are supposed to suppress or truncate this header, and modern browsers generally do so by default. But a site’s Referrer-Policy can loosen that, so in practice some traffic still leaks enough info to tell where you came from.


------------------------------------------------------------------------------------------------------------

Other Fun Headers

  • Upgrade-Insecure-Requests: 1 → Tells the server “hey, if you support HTTPS, switch me there.”

  • Cache-Control: max-age=0 → Basically says: “Don’t hand me a cached copy without checking with the server first; I want it fresh.”


------------------------------------------------------------------------------------------------------------


Dissecting an HTTP Response – Let’s Get Nerdy for a Second


So far, we’ve talked a lot about HTTP requests: what the client sends to the server. But now it’s time to flip the script. Let’s talk about what the server sends back in response.



Let’s Start from the Top — Status Line

Here’s a classic example:

HTTP/1.1 200 OK

This single line tells you three key things:

  1. Protocol Version: HTTP/1.1 — this should match the client’s request version.

  2. Status Code: 200 — tells you if the request went okay or something broke.

  3. Status Text: OK — human-readable, but the client doesn’t really care what this says. It could say "Success", "All Good", or even "Nice Try Buddy" 😄 — as long as the number is 200, the meaning is the same.
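As a quick illustration of the point about the status text, this sketch parses a status line and classifies the code by its first digit. The helper names are hypothetical.

```python
def parse_status_line(line):
    """Split a status line into (version, code, reason text)."""
    parts = line.strip().split(" ", 2)
    version, code = parts[0], int(parts[1])
    # The reason text is optional and carries no meaning for the client.
    reason = parts[2] if len(parts) > 2 else ""
    return version, code, reason

STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}

def status_class(code):
    """Map a status code to its class using the first digit."""
    return STATUS_CLASSES.get(code // 100, "unknown")

for line in ("HTTP/1.1 200 OK", "HTTP/1.1 200 Nice Try Buddy", "HTTP/1.1 404 Not Found"):
    version, code, reason = parse_status_line(line)
    print(code, status_class(code), repr(reason))
```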


💡 Common Status Codes You Should Know

Let me list a few real-world ones we bump into all the time:

Code   Meaning
100    Continue – Client can keep sending request body
200    OK – Everything’s good
301    Moved Permanently – Resource has a new home
302    Found – Temporary redirect
304    Not Modified – Client’s cached copy is still good
400    Bad Request – Syntax error from client
401    Unauthorized – Need authentication
403    Forbidden – You don’t have permission
404    Not Found – Resource doesn’t exist
407    Proxy Auth Required – You need to auth via proxy
500    Internal Server Error – Oops, something’s broken
503    Service Unavailable – Overload or maintenance
511    Network Auth Required – Seen in public Wi-Fi portals

For threat hunters:

Seeing lots of 4xx responses (especially 404s) going back to the same IP? That might be scanning/recon.

A sudden switch from 500s to 200s during POST requests? Could be SQL injection, where the server backend choked on bad input before the attacker got it right.
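A first-pass check for the scanning pattern could look like the sketch below. The records, the function name noisy_clients, and the threshold of 3 are all made up for illustration; in practice the pairs would come from your PCAP or proxy log parser and the threshold from your own baseline.

```python
from collections import Counter

# Hypothetical (client_ip, status_code) pairs extracted from traffic.
records = [
    ("10.0.0.5", 404), ("10.0.0.5", 404), ("10.0.0.5", 403), ("10.0.0.5", 400),
    ("10.0.0.9", 200), ("10.0.0.9", 404),
    ("10.0.0.7", 500),
]

def noisy_clients(records, threshold=3):
    """Return clients that received at least `threshold` 4xx responses."""
    counts = Counter(ip for ip, status in records if 400 <= status < 500)
    return {ip: n for ip, n in counts.items() if n >= threshold}

suspects = noisy_clients(records)
print(suspects)
```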


🔍 Real Response Header Breakdown

Here’s the header block from a sample response (status line not shown):

accept-ranges: bytes\r\n
content-disposition: attachment\r\n
content-length: 1963\r\n
content-security-policy: default-src 'none'\r\n
server: Google-Edge-Cache\r\n
x-content-type-options: nosniff\r\n
x-frame-options: SAMEORIGIN\r\n
x-xss-protection: 0\r\n
x-request-id: c1349dbe-bb51-41bc-a142-e4ba95d94a1c\r\n
date: Sat, 24 May 2025 04:26:33 GMT\r\n
age: 38934\r\n
last-modified: Sat, 24 May 2025 04:24:20 GMT\r\n
etag: "45281ea"\r\n
content-type: application/octet-stream\r\n
alt-svc: h3=":443"; ma=2592000, h3-29=":443"; ma=2592000\r\n
cache-control: public,max-age=86400\r\n
coprocessor-response: download-server\r\n
\r\n

Now let’s decode it like detectives 🕵️:

  1. Cache-Control, Expires, and ETag

These tell you how caching should work.

  • Cache-Control: private – Only the user’s browser should cache it, not shared proxies. If you see Cache-Control: public instead (as in our sample), the response may be stored by any cache, both the user’s browser and shared caches.

  • Expires: <timestamp> – When the cached copy stops being valid. Alternatively, max-age=86400 in Cache-Control means the response stays fresh and reusable for 1 day.

  • ETag: "<hash>" – An identifier for this version of the content; helps check whether the content changed.


Great for web performance and forensic timeline building.
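For timeline work, you can turn these headers into a concrete "fresh until" time. The sketch below uses the cache-control, age, and date values from the sample response above; the helper name parse_cache_control is hypothetical, and the freshness arithmetic (max-age minus age) is a simplification of what real caches do.

```python
from datetime import timedelta
from email.utils import parsedate_to_datetime

def parse_cache_control(value):
    """Parse a Cache-Control value into a dict of directives."""
    directives = {}
    for item in value.split(","):
        name, _, arg = item.strip().partition("=")
        # Directives without an argument (like "public") are stored as True.
        directives[name.lower()] = arg or True
    return directives

# Values taken from the sample response headers above.
cc = parse_cache_control("public,max-age=86400")
age = 38934
date = parsedate_to_datetime("Sat, 24 May 2025 04:26:33 GMT")

# Simplified: remaining freshness = max-age minus time already spent in caches.
remaining = int(cc["max-age"]) - age
fresh_until = date + timedelta(seconds=remaining)
print("remaining seconds:", remaining)
print("fresh until:", fresh_until.isoformat())
```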

  2. Content-Type and Content-Encoding

Tells you what kind of content it is and how it’s packed:

  • Content-Type: text/html; charset=utf-8 – HTML page in UTF-8 encoding.

    or

  • Content-Type: application/octet-stream – Tells the browser (or any client) that the server is sending raw binary data, as in our sample.

  • Content-Encoding: gzip – It's compressed, so your client needs to decompress.



  3. Content-Length

Size of the message body in bytes as it travels on the wire (so if the body is gzip-compressed, this is the compressed size, not the decompressed one).

  • content-length: 1963 – 1963 bytes.


  4. X-Frame-Options: SAMEORIGIN

Mitigates clickjacking by saying: “Only I can frame myself!”


  5. Date

The time the response was generated, according to the server’s clock. Useful when reconstructing timelines or tracking malware behavior.

date: Sat, 24 May 2025 04:26:33 GMT

Investigator Tip:

If your endpoint says it made the request at 1:52 PM, but the server's timestamp says it responded at 1:47 PM, one of the two clocks is off (clock skew), and it is often the client. This can seriously mess with your timeline, so always cross-check time sources.
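Measuring that skew is straightforward once you parse the Date header. In this sketch the Date value is the one from the sample response, while the client-side timestamp is a made-up value standing in for whatever your endpoint logs recorded.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def clock_skew_seconds(date_header, client_time):
    """Return client time minus server time, in seconds."""
    server_time = parsedate_to_datetime(date_header)
    return (client_time - server_time).total_seconds()

# Hypothetical timestamp from the endpoint's own logs (UTC).
client_time = datetime(2025, 5, 24, 4, 31, 33, tzinfo=timezone.utc)

skew = clock_skew_seconds("Sat, 24 May 2025 04:26:33 GMT", client_time)
print(f"client is {skew:+.0f} seconds relative to the server")
```

A positive result means the client clock is ahead of the server. Network latency adds a little noise, so only differences well beyond a second or two are worth correcting for.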


Fun fact: Some malware variants use this Date: header as a seed value for their DGA (Domain Generation Algorithm) — clever, huh?

  6. Connection: keep-alive (if found)

With HTTP/1.1, persistent connections became the default, so your browser can reuse the same TCP session for multiple requests. This reduces overhead and speeds things up.


  • A client can still say so explicitly (as older HTTP/1.0 clients had to) using: Connection: Keep-Alive

  • If the server agrees, it responds with: Connection: Keep-Alive


But if either side wants to close the connection, it sends: Connection: close

Investigator Tip:

If you're monitoring traffic and notice lots of "Connection: close" lines mid-session, it might indicate non-browser activity — like malware making single-use requests.

------------------------------------------------------------------------------------------------------------

What About Redirects?

Redirections are handled via a combination of:

  • 300-series status codes (like 301, 302)

  • A Location: header that says: "Hey, go here instead!"

These redirects can be abused too. Malware campaigns use redirect chains to mask the origin of malicious content.

Forensics tip: Don’t stop at the first hop!
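Following every hop can be sketched offline like this. The captured dictionary, its example hostnames, and the function name follow_redirects are all hypothetical; in a real case the status codes and Location values would come from your PCAP or proxy logs.

```python
# Hypothetical captured responses: URL -> (status code, Location header or None).
captured = {
    "http://a.example/start": (302, "http://b.example/hop"),
    "http://b.example/hop": (301, "http://c.example/payload"),
    "http://c.example/payload": (200, None),
}

def follow_redirects(start, responses, max_hops=10):
    """Rebuild a redirect chain from already-captured responses."""
    chain = [start]
    url = start
    # max_hops guards against redirect loops.
    for _ in range(max_hops):
        status, location = responses.get(url, (None, None))
        if status is None or not 300 <= status < 400 or not location:
            break
        url = location
        chain.append(url)
    return chain

chain = follow_redirects("http://a.example/start", captured)
print(" -> ".join(chain))
```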

------------------------------------------------------------------------------------------------------------

Pro Tip: Watch Out for X- Headers

Both clients and servers can use custom headers that begin with X-. These can carry unique identifiers, debug info, or even tracking tokens. Example:

X-Request-Guid: <GUID>

This might help correlate a single session across multiple logs.


------------------------------------------------------------------------------------------------------------


HTTP Headers in Investigations

Let’s talk real-world usage. How do these headers help during an actual incident?


1. Pastebin & Data Exfil

Attackers often use public paste sites like Pastebin or SendSpace. Some malware is coded to automatically upload exfiltrated data using these services’ APIs.

If an attacker has RDP or VNC access, they might just open a browser and manually do it — but the network traffic (HTTP POST requests, User-Agent headers, and API URIs) will still leave footprints.


2. User-Agent Fingerprinting

If you're in a corporate environment, there’s probably a known set of legitimate User-Agent strings. Anything else? Could be:

  • Malware

  • Unauthorized browser

  • Portable or dev tools

Sometimes, malware adds its own version string in the User-Agent, helping investigators quickly fingerprint infections across the environment.
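A simple way to surface the outliers is to compare observed User-Agent strings against an allowlist. Everything in this sketch is illustrative: the allowlist, the observed values (including the made-up EvilBot/1.2), and the function name unknown_user_agents.

```python
from collections import Counter

CHROME_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/136.0.0.0 Safari/537.36"
)

# Hypothetical allowlist of User-Agent strings known to be legitimate here.
ALLOWED = {CHROME_UA}

# Hypothetical values observed in proxy logs or PCAPs.
observed = [
    CHROME_UA,
    CHROME_UA,
    "python-requests/2.31.0",
    "EvilBot/1.2",
    "EvilBot/1.2",
]

def unknown_user_agents(observed, allowlist):
    """Count User-Agent strings that are not on the allowlist."""
    return Counter(ua for ua in observed if ua not in allowlist)

unknown = unknown_user_agents(observed, ALLOWED)
for ua, count in unknown.most_common():
    print(count, ua)
```

Exact-match allowlists get noisy as browsers update, so in practice you would likely normalise version numbers before comparing.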


3. Credential Sniffing in HTTP Basic Auth

We touched on this earlier, but just a reminder — Basic Auth sends credentials like this:

Authorization: Basic bmV3dXNlcjpzM2NyM3RwYXNz

That Base64 string? It’s just user:password. If you’re capturing traffic, you can extract credentials directly.

4. URI Analysis

Every URI tells a story. It could be:


  • Web searches

  • Form submissions

  • API calls

  • Malware callbacks


Pairing URI analysis with malware analysis gives you powerful insight into what the attacker was trying to do — exfiltrate data, move laterally, connect to command-and-control, or worse.

5. When the Disk Fails, the Network Tells All


Modern attackers are smart:

  • They use private browsing

  • They run portable apps from USBs

  • They clean up after themselves


So maybe there’s no trace left on the disk. But network traffic? That’s harder to erase. If you have PCAPs or proxy logs, you’ve still got a shot.

------------------------------------------------------------------------------------------------------------


Final Thoughts

HTTP headers might seem boring on the surface, but when you dig in — they’re loaded with useful info. From persistent connections to User-Agent strings to caching behavior and time syncing — every bit tells you something.


Hope this post made it easier to see headers not as noise, but as gold dust for a forensic investigator.

-------------------------------------------------------Dean-------------------------------------------


 
 
 
