Let's Talk About HTTP – The Backbone of the Web (And a Goldmine for DFIR Folks)

---------------------------------------------------------------------------------------------------
Thanks for all the support on the Wireshark article!
I know there are already tons of articles out there on HTTP—but trust me, this one’s different.
Give it a read, and you’ll see exactly what I mean.
---------------------------------------------------------------------------------------------------
Hey folks
Today, let’s take a walk through a protocol that all of us use literally every day—HTTP. Yup, HyperText Transfer Protocol. Even if you’re not a hardcore networking nerd, if you've ever opened a webpage (which, hello, you're doing now!), you’ve used HTTP.
But if you're into digital forensics, incident response, or just cybersecurity in general, knowing how HTTP works isn't just a bonus—it’s critical. And trust me, there's a lot more to it than just "the thing that gives me web pages."
------------------------------------------------------------------------------------------------------------
First Things First: What Is HTTP?
HTTP (at least through version 1.1) is a plaintext protocol, which means it’s readable. You and I can literally look at a packet of HTTP/1.x data and figure out what’s going on without needing fancy tools. (HTTP/2 and HTTP/3 use binary framing, so for those you’ll want a dissector like Wireshark.) It’s also stateless, meaning each request doesn’t remember the one before it. Every request stands on its own.
This might sound weird at first—like, how does your web browser remember where you left off? That’s where cookies, sessions, and tokens come in (topics for another day 😄).
------------------------------------------------------------------------------------------------------------
Why Should a Forensic Investigator or Incident Responder Care?
I’m glad you asked 😎
Whether you're investigating a rogue employee, a full-blown APT, or just checking someone’s shady web browsing, HTTP is going to show up a lot. In fact, you’ll probably run into HTTP traffic in almost every case.
Now, here’s the twist:
with the rise of full-disk encryption, incognito modes, and BYOD (bring-your-own-device) policies, disk artifacts aren’t always enough. That’s where network data comes in.
If you’ve got packet captures (PCAPs) available, you can:
Reconstruct entire web sessions
Pull down files that were downloaded (think: malware EXEs or phishing pages)
Track API calls to remote services
Monitor machine-to-machine activity (bots, implants, or automated tools)
Detect C2 traffic (command & control)
And that’s not just theory. I’ve worked with many malware analysts who help us dissect C2 channels running over HTTP. Even if the attacker encrypted the payload, the URLs, headers, or timing patterns can still tell you a lot.
------------------------------------------------------------------------------------------------------------
Real-Life Use Case: Web Server Compromise
Let's say a web server gets popped. Sure, you’ll look at logs and disk evidence. But what if the attacker cleared logs or used living-off-the-land techniques?
That’s when HTTP traffic analysis becomes your best friend.
By reviewing actual network traffic, you might catch:
File uploads via POST
Command injections
Suspicious API usage
Attacker beacons to external servers
------------------------------------------------------------------------------------------------------------
HTTP Versions – It’s Not All 1.1!
Okay, here’s a little version history in plain English:
HTTP/1.0 – Old-school. One request per connection.
HTTP/1.1 – Still widely used. Keeps connections alive. This is what you’ll see most in PCAPs.
HTTP/2 – Multiplexed. Multiple requests over one connection. Super common now.
HTTP/3 – The future. Built on QUIC (based on UDP), not TCP. Crazy fast. Still being adopted.
According to W3Techs (as of now), HTTP/2 is used by over 50% of websites, and HTTP/3 is slowly gaining ground (~10% but growing fast).
------------------------------------------------------------------------------------------------------------
Dissecting an HTTP Request – Let’s Get Nerdy for a Second
Here’s a simple GET request:
GET /time/1/current?cup2key=9:wz8PuwCb6IQ1sPJTx92bCpndCnsugtTLkdpVppulvZE&cup2hreq=e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 HTTP/1.1\r\n
Host: clients2.google.com
The first line (the request line) breaks down into:
GET – Request method
/time/1/current?cup2key=9:wz8PuwCb6IQ1sPJTx92bCpndCnsugtTLkdpVppulvZE&cup2hreq=e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 – The request URI (Uniform Resource Identifier): the path /time/1/current, followed by the query string (everything after the ?)
HTTP/1.1 – Protocol version
Then you’ve got headers (like Host, User-Agent, Accept, etc.)
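If you'd rather let code do the splitting, here's a minimal sketch using only Python's standard library. The function name parse_request_line is mine, not part of any tool, and it assumes a well-formed HTTP/1.x request line like the one above.

```python
from urllib.parse import urlsplit, parse_qs

def parse_request_line(line: str) -> dict:
    """Split an HTTP/1.x request line into method, target, and version."""
    method, target, version = line.rstrip("\r\n").split(" ", 2)
    parts = urlsplit(target)
    return {
        "method": method,
        "path": parts.path,
        # parse_qs returns a list per key because keys may repeat.
        "query": parse_qs(parts.query, keep_blank_values=True),
        "version": version,
    }

request_line = (
    "GET /time/1/current?cup2key=9:wz8PuwCb6IQ1sPJTx92bCpndCnsugtTLkdpVppulvZE"
    "&cup2hreq=e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
    " HTTP/1.1\r\n"
)
parsed = parse_request_line(request_line)
print(parsed["method"], parsed["path"], parsed["version"])
print(sorted(parsed["query"]))
```

Real traffic is messier (malformed lines, odd encodings), so treat this as a starting point rather than a parser you'd trust on hostile input.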

Fun fact:
GET and POST are the most common methods. GET is used to fetch data. POST is used to send data (like login credentials, form data, or file uploads).
Here's a quick cheat sheet of other methods:
Method | What It Does |
HEAD | Like GET, but fetches only headers (no body) |
PUT | Uploads a file or resource |
DELETE | Deletes a resource |
OPTIONS | Asks what methods the server supports |
TRACE | Echoes back the request (used for debugging) |
CONNECT | Used to create a tunnel, often for HTTPS |
Some of these, like TRACE and CONNECT, are often blocked by firewalls or disabled on servers because of their potential abuse.
------------------------------------------------------------------------------------------------------------
Forensic Tips & Bonus Nuggets
HTTP requests can contain query strings (?name=value&foo=bar), which might hold sensitive search terms, login attempts, or injection payloads.
Headers like User-Agent, Referer, and Cookie can reveal browser behavior, session IDs, and possible spoofing.
When malware uses HTTP as a C2 channel, it often mimics legitimate browser behavior to blend in. Look for anomalies!
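Some HTTP-based malware also abuses API-style endpoints, like /api/upload, /checkin, or /status. On their own these paths are common in legitimate apps too, but when they show up against a host nobody recognizes, on a regular beat, they are often giveaways of a custom C2 protocol.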
One Last Thing...
Not all HTTP traffic is visible today. With HTTPS (the secure version), the request line, headers, and body are all encrypted. But don’t worry: the domain (from the TLS SNI field), certificate details, packet sizes, and timing can still tell you a lot, and if you're using TLS interception (in legal environments, of course), you get the headers and content back too.
------------------------------------------------------------------------------------------------------------
Now let’s casually break down something that often looks boring but is super powerful when you're into digital forensics, incident response, or even threat hunting: HTTP request headers.
What’s the Scene?
Imagine someone visited metadrive.io. When they did that, their browser quietly made an HTTP request to metadrive.io. What’s interesting is how their browser told the website about itself, and that's where headers come in.
Let’s start with the raw request:
GET / HTTP/1.1\r\n
Host: metadrive.io\r\n
Connection: keep-alive\r\n
Upgrade-Insecure-Requests: 1\r\n
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/136.0.0.0 Safari/537.36\r\n
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7\r\n
Accept-Encoding: gzip, deflate\r\n
Accept-Language: en-US,en;q=0.9\r\n
\r\n
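The raw request above is just CRLF-separated text, so pulling the headers into a lookup table takes only a few lines. This is a rough sketch with a helper name I made up (parse_request); it uses a trimmed copy of the request and, as a simplification, keeps only the last value if a header repeats.

```python
def parse_request(raw: bytes):
    """Return (request_line, headers) from a raw HTTP/1.x request head."""
    # The head ends at the first empty line (CRLF CRLF).
    head = raw.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    lines = head.split("\r\n")
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so normalise them for lookups.
        headers[name.strip().lower()] = value.strip()
    return lines[0], headers

raw = (
    b"GET / HTTP/1.1\r\n"
    b"Host: metadrive.io\r\n"
    b"Connection: keep-alive\r\n"
    b"Accept-Encoding: gzip, deflate\r\n"
    b"\r\n"
)
request_line, headers = parse_request(raw)
print(request_line)
print(headers["host"])
```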
------------------------------------------------------------------------------------------------------------
Okay, deep breath!
Host Header – The MVP of HTTP/1.1
Host: metadrive.io\r\n
Why it matters:
In HTTP/1.1, the Host header is required. Without it, the server won’t know which website you want—especially important when one server hosts multiple sites. Think of it as the “to:” address on a letter.
------------------------------------------------------------------------------------------------------------
User-Agent – Browser's ID Card (Well, Sort Of)
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/136.0.0.0 Safari/537.36\r\n
What it tells us:
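This is your browser bragging about who it is. In this case, the browser identifies itself as Chrome 136 on 64-bit Windows. (Careful: "Windows NT 10.0" is reported by both Windows 10 and Windows 11, so you can't tell them apart from this string alone.)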
Now here's the kicker:
This value is completely customizable. Anyone can spoof it. You and I can literally install browser extensions like User-Agent Switcher and pretend to be Googlebot, Internet Explorer from 2001, or even a toaster (okay, maybe not—but close!).
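You don't even need a browser extension. Here's a small sketch showing how trivially a script can claim to be someone else. The URL is a placeholder and nothing is actually sent; the point is only that the header is whatever the client decides to put there.

```python
from urllib.request import Request

# Placeholder URL; this sketch builds the request object but never sends it.
spoofed = Request(
    "http://example.com/",
    headers={"User-Agent": "Googlebot/2.1 (+http://www.google.com/bot.html)"},
)

# urllib stores header names capitalised, hence "User-agent".
print(spoofed.get_header("User-agent"))
```

So treat User-Agent as a claim, not as evidence, and look for other signals that back it up or contradict it.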
------------------------------------------------------------------------------------------------------------
Accept Headers – What the Client Wants
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7\r\n
Accept-Encoding: gzip, deflate\r\n
Accept-Language: en-US,en;q=0.9\r\n
These are pretty straightforward.
Accept: What content types the browser can handle (HTML, XML, etc.)
Accept-Language: Tells the server the user's preferred languages. Useful for geo-profiling.
Accept-Encoding: Whether the browser can handle compressed responses like gzip.
Also, note the q value—it shows preference. For instance, q=0.9 means “I like XML, but not as much as plain HTML.”
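To see the q values in action, here's a quick sketch that ranks the entries of an Accept header by preference. parse_accept is a name I picked for illustration, and the header is a shortened version of the one above; a missing q counts as 1.0.

```python
def parse_accept(value: str):
    """Return (media_type, q) pairs sorted from most to least preferred."""
    prefs = []
    for item in value.split(","):
        media_type, *params = [p.strip() for p in item.split(";")]
        q = 1.0  # a missing q parameter means full preference
        for param in params:
            key, _, val = param.partition("=")
            if key.strip() == "q":
                q = float(val)
        prefs.append((media_type, q))
    # sorted() is stable, so entries with the same q keep their original order.
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)

accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
ranked = parse_accept(accept)
for media_type, q in ranked:
    print(media_type, q)
```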
------------------------------------------------------------------------------------------------------------
Cookies – The Trail of Breadcrumbs
(This header isn’t in our example request, but I’m adding it here so it’s easier for you to follow.)
Cookie: prov=...; hubspotutk=...; docs_hero=x; hero=none
prov=... – Likely a session or user identification token
hubspotutk=... – A HubSpot tracking cookie used for analytics and form submissions
docs_hero=x – Possibly a custom flag to track a docs page UI state
hero=none – Another UI state flag or feature toggle
Cookies are little pieces of data stored by your browser from websites. They're often used to maintain state—which is important because HTTP itself is stateless. Without cookies, every click would feel like starting from scratch.
Types of cookies:
Session Cookies: Gone when the browser closes.
Persistent Cookies: Stick around until they expire (or you delete them).
For us forensic folks, cookies can reveal:
Logins
Tracking IDs
User behavior across sessions
You’d be surprised how much we can correlate just from cookie IDs.
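When you want to correlate on cookie values, the first step is usually splitting the header into name/value pairs. A minimal sketch follows; the helper name is mine, and the prov and hubspotutk values are placeholders because the example above elides the real ones.

```python
def parse_cookie_header(value: str) -> dict:
    """Split a Cookie request header into a name -> value mapping."""
    cookies = {}
    for pair in value.split(";"):
        name, sep, val = pair.strip().partition("=")
        if sep:  # skip fragments that have no "="
            cookies[name] = val
    return cookies

# PLACEHOLDER1/PLACEHOLDER2 stand in for the elided values in the example.
header = "prov=PLACEHOLDER1; hubspotutk=PLACEHOLDER2; docs_hero=x; hero=none"
cookies = parse_cookie_header(header)
print(cookies["docs_hero"], cookies["hero"])
```

From there you can group requests by a tracking ID and see which hosts, times, and URIs share it.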
------------------------------------------------------------------------------------------------------------
Authorization – Base64 and Secrets
Authorization: Basic <Base64Encoded(username:password)>
Example:
Authorization: Basic bmV3dXNlcjpzM2NyM3RwYXNz
Here’s where you might find credentials. This is Basic Auth and it’s basically (pun intended) the base64 encoding of username:password.
So bmV3dXNlcjpzM2NyM3RwYXNz decodes to newuser:s3cr3tpass
Modern sites mostly use token-based auth or OAuth, but for internal apps or older services, you still find Basic Auth. When found, it’s gold for an attacker or an investigator.
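Decoding it yourself takes a few lines of standard-library Python. This is a sketch (decode_basic_auth is just a name I chose) that assumes the credentials are UTF-8, which is common but not guaranteed.

```python
import base64

def decode_basic_auth(header_value: str):
    """Return (username, password) from an 'Authorization: Basic ...' value."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic auth header")
    decoded = base64.b64decode(token).decode("utf-8")
    # Only the first colon separates the fields; passwords may contain colons.
    username, _, password = decoded.partition(":")
    return username, password

creds = decode_basic_auth("Basic bmV3dXNlcjpzM2NyM3RwYXNz")
print(creds)  # ('newuser', 's3cr3tpass')
```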
------------------------------------------------------------------------------------------------------------
X-Forwarded-For – Tracing Real IPs (Kinda)
X-Forwarded-For: <IP>, <IP>
If a request passes through proxies, this header might show the original client IP.
BUT, it’s easily spoofed.
An attacker can just add their own X-Forwarded-For and pretend to come from anywhere (say, an internal IP like 192.168.1.11). Some servers trust this blindly—not good.
That’s why this header is a common target in IP-based bypasses.
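One common way to avoid trusting the header blindly is to read the chain from right to left and stop at the first hop that isn't one of your own proxies. The sketch below illustrates that idea; the function name, the proxy address 10.0.0.5, and the 203.0.113.7 "attacker" address are all made up for the example, and the comparison is a plain string match (no IP normalisation).

```python
import ipaddress

def client_ip_from_xff(xff: str, peer_ip: str, trusted_proxies: set) -> str:
    """Walk the hop chain right to left; return the first hop we do not trust."""
    hops = [h.strip() for h in xff.split(",") if h.strip()] + [peer_ip]
    candidate = peer_ip
    for hop in reversed(hops):
        try:
            ipaddress.ip_address(hop)  # validate only; raises on garbage
        except ValueError:
            break  # stop at the first malformed entry
        candidate = hop
        if hop not in trusted_proxies:
            break
    return candidate

# The attacker injected 192.168.1.11; our proxy (10.0.0.5) appended the real source.
xff = "192.168.1.11, 203.0.113.7"
real_client = client_ip_from_xff(xff, peer_ip="10.0.0.5", trusted_proxies={"10.0.0.5"})
print(real_client)
```

A server that simply takes the leftmost entry would have logged 192.168.1.11 here, which is exactly the spoofing problem described above.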
------------------------------------------------------------------------------------------------------------
Proxy-Authorization – Auth to Use the Proxy
Proxy-Authorization: Basic bmV3dXNlcjpzM2NyM3RwYXNz
Like Authorization, but used when a client needs to authenticate to a proxy server. Again, base64—same risks apply.
------------------------------------------------------------------------------------------------------------
Referer (Yeah, It’s Misspelled) – Where You Came From
This tells the server which page you clicked from. Handy for:
Analytics (e.g., “what drove traffic here?”)
Security (e.g., detecting CSRF or phishing flows)
Investigation (e.g., mapping user navigation paths)
Here’s the cool part:
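if you’re moving from HTTPS → HTTP, browsers are supposed to suppress this header, and modern browsers also trim it down to just the origin on most cross-site requests by default. But in practice, older browsers and sites with a loose Referrer-Policy can still leak enough info to tell where you came from.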
------------------------------------------------------------------------------------------------------------
Other Fun Headers
Upgrade-Insecure-Requests: 1 → Tells the server “hey, if you support HTTPS, switch me there.”
Cache-Control: max-age=0 → Basically says: “Please don’t serve me a cached page; I want it fresh.”
------------------------------------------------------------------------------------------------------------
Dissecting an HTTP Response – Let’s Get Nerdy for a Second
So far, we’ve talked a lot about HTTP requests—what the client sends to the server. But now it’s time to flip the script. Let’s talk about what the server sends back in response.
Let’s Start from the Top — Status Line
Here’s a classic example:
HTTP/1.1 200 OK
This single line tells you three key things:
Protocol Version: HTTP/1.1 - the HTTP version the server is answering with, which normally lines up with what the client asked for.
Status Code: 200 — tells you if the request went okay or something broke.
Status Text: OK — human-readable, but the client doesn’t really care what this says. It could say "Success", "All Good", or even "Nice Try Buddy" 😄 — as long as the number is 200, the meaning is the same.
💡 Common Status Codes You Should Know
Let me list a few real-world ones we bump into all the time:
Code | Meaning |
100 | Continue – Client can keep sending request body |
200 | OK – Everything’s good |
301 | Moved Permanently – Resource has a new home |
302 | Found – Temporary redirect |
304 | Not Modified – Client’s cached copy is still good |
400 | Bad Request – Syntax error from client |
401 | Unauthorized – Need authentication |
403 | Forbidden – You don’t have permission |
404 | Not Found – Resource doesn’t exist |
407 | Proxy Auth Required – You need to auth via proxy |
500 | Internal Server Error – Oops, something’s broken |
503 | Service Unavailable – Overload or maintenance |
511 | Network Auth Required – Seen in public Wi-Fi portals |
For threat hunters:
Seeing lots of 4xx errors (400s, 403s, 404s) from the same IP? That might be scanning/recon.
A sudden switch from 500s to 200s during POST requests? Could be SQL injection, where the server backend choked on bad input before the attacker got it right.
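The first of those checks is easy to automate once you have (client IP, status code) pairs out of your PCAP or proxy logs. Here's a rough sketch; the records, the addresses, and the threshold of 3 are invented for illustration, so tune them to your environment.

```python
from collections import Counter

def noisy_clients(records, threshold=3):
    """Return client IPs with at least `threshold` 4xx responses."""
    errors = Counter(ip for ip, status in records if 400 <= status < 500)
    return {ip: count for ip, count in errors.items() if count >= threshold}

# Hypothetical (client_ip, status_code) pairs, e.g. extracted from proxy logs.
records = [
    ("203.0.113.7", 404), ("203.0.113.7", 403), ("203.0.113.7", 400),
    ("203.0.113.7", 404), ("198.51.100.9", 200), ("198.51.100.9", 404),
]
suspects = noisy_clients(records)
print(suspects)
```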
🔍 Real Response Header Breakdown
Here are the headers from a real sample response (the status line isn’t shown):
accept-ranges: bytes\r\n
content-disposition: attachment\r\n
content-length: 1963\r\n
content-security-policy: default-src 'none'\r\n
server: Google-Edge-Cache\r\n
x-content-type-options: nosniff\r\n
x-frame-options: SAMEORIGIN\r\n
x-xss-protection: 0\r\n
x-request-id: c1349dbe-bb51-41bc-a142-e4ba95d94a1c\r\n
date: Sat, 24 May 2025 04:26:33 GMT\r\n
age: 38934\r\n
last-modified: Sat, 24 May 2025 04:24:20 GMT\r\n
etag: "45281ea"\r\n
content-type: application/octet-stream\r\n
alt-svc: h3=":443"; ma=2592000, h3-29=":443"; ma=2592000\r\n
cache-control: public,max-age=86400\r\n
coprocessor-response: download-server\r\n
\r\n
Now let’s decode it like detectives 🕵️:
Cache-Control, Expires, and ETag
These tell you how caching should work.
Cache-Control: private - Only the user’s browser should cache it, not shared proxies. (If you see Cache-Control: public, as in our sample, the response is cacheable by any cache: both the user’s browser and shared caches.)
Expires: <timestamp> - When the cached copy stops being valid. The newer way to say this is Cache-Control: max-age=86400, which means the response stays fresh and reusable for 86,400 seconds (1 day).
ETag: "<hash>" — Unique fingerprint for the content; helps compare if content changed.
Great for web performance and forensic timeline building.
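For timeline work it helps to turn these headers into numbers. The sketch below parses the sample's cache-control value and compares max-age with the age header (38934 seconds) to see how much freshness was left. The parser name is mine, and it only handles the simple comma-separated form shown here.

```python
def parse_cache_control(value: str) -> dict:
    """Turn 'public,max-age=86400' into {'public': None, 'max-age': '86400'}."""
    directives = {}
    for part in value.split(","):
        name, sep, arg = part.strip().partition("=")
        if name:
            directives[name.lower()] = arg.strip('"') if sep else None
    return directives

directives = parse_cache_control("public,max-age=86400")
max_age = int(directives["max-age"])
age = 38934  # from the sample's "age" header
remaining = max_age - age
print(f"fresh for another {remaining} seconds" if remaining > 0 else "stale")
```

In other words, the copy in the sample had been sitting in a cache for roughly 10.8 hours and had about 13.2 hours of freshness left.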
Content-Type and Content-Encoding
Tells you what kind of content and how it’s packed:
Content-Type: text/html; charset=utf-8 — HTML page in UTF-8 encoding.
or
Content-Type: application/octet-stream - Tells the browser (or any client) that the server is sending raw binary data.
Content-Encoding: gzip — It's compressed, so your client needs to decompress.
Content-Length
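Size of the message body in bytes as it travels on the wire. If Content-Encoding: gzip is in play, this is the compressed size, not the size after decompressing.
content-length: 1963 - 1963 bytes.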
X-Frame-Options: SAMEORIGIN
Mitigates clickjacking by saying: “Only I can frame myself!”
Date
Exact time the response was generated. Useful when reconstructing timelines or tracking malware behavior.
date: Sat, 24 May 2025 04:26:33 GMT
Investigator Tip:
If your endpoint says it made the request at 1:52 PM, but the server's timestamp says it responded at 1:47 PM, one of the two clocks is off. That's clock skew, and in practice it's often the endpoint that has drifted. This can seriously mess with your timeline, so always cross-check your time sources.
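Here's a small sketch of that cross-check. It parses the Date header from the sample response and compares it with an endpoint timestamp. The endpoint time is invented (five minutes ahead, like the 1:52 vs 1:47 example), and network latency is ignored, so read the result as an estimate.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def clock_offset(date_header: str, client_time: datetime) -> float:
    """Seconds by which the client clock is ahead of the server's Date header."""
    server_time = parsedate_to_datetime(date_header)
    return (client_time - server_time).total_seconds()

# Hypothetical endpoint log timestamp, five minutes ahead of the server.
client_time = datetime(2025, 5, 24, 4, 31, 33, tzinfo=timezone.utc)
offset = clock_offset("Sat, 24 May 2025 04:26:33 GMT", client_time)
print(offset)  # 300.0
```

A consistent offset across many requests points to skew; an offset that jumps around points to something else, like buffering or log timestamp quirks.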
Fun fact: Some malware variants use this Date: header as a seed value for their DGA (Domain Generation Algorithm) — clever, huh?
Connection: keep-alive (if found)
With HTTP/1.1, one of the cool upgrades was making persistent connections the default, so your browser can reuse the same TCP session for multiple requests. This reduces overhead and speeds things up.
Older HTTP/1.0 clients had to ask for this explicitly, and plenty of HTTP/1.1 clients still send it anyway: Connection: Keep-Alive
If the server agrees, it responds with: Connection: Keep-Alive
But if either side wants to close the connection: Connection: close
Investigator Tip:
If you're monitoring traffic and notice lots of "Connection: close" lines mid-session, it might indicate non-browser activity — like malware making single-use requests.
------------------------------------------------------------------------------------------------------------
What About Redirects?
Redirections are handled via a combination of:
300-series status codes (like 301, 302)
A Location: header that says: "Hey, go here instead!"
These redirects can be abused too. Malware campaigns use redirect chains to mask the origin of malicious content.
Forensics tip: Don’t stop at the first hop!
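If you've already extracted responses from a capture, you can stitch the hops together offline. The sketch below assumes a dictionary mapping each requested URL to its (status, headers) pair; the URLs are made-up example domains, and it assumes absolute Location values, which isn't always true in real traffic.

```python
def follow_redirects(responses: dict, start_url: str, max_hops: int = 10):
    """Walk Location headers through captured responses and return the chain."""
    chain = [start_url]
    url = start_url
    while len(chain) <= max_hops:
        status, headers = responses.get(url, (None, {}))
        location = headers.get("location")
        if status is None or not 300 <= status < 400 or not location:
            break  # no capture for this URL, or not a redirect
        if location in chain:
            break  # redirect loop
        chain.append(location)
        url = location
    return chain

# Hypothetical captured responses: URL -> (status code, lower-cased headers).
responses = {
    "http://short.example/a": (302, {"location": "http://tracker.example/r?id=1"}),
    "http://tracker.example/r?id=1": (301, {"location": "http://landing.example/payload"}),
    "http://landing.example/payload": (200, {}),
}
chain = follow_redirects(responses, "http://short.example/a")
print(" -> ".join(chain))
```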
------------------------------------------------------------------------------------------------------------
Pro Tip: Watch Out for X- Headers
Both clients and servers can use custom headers that begin with X-. These can carry unique identifiers, debug info, or even tracking tokens. Example:
X-Request-Guid: <GUID>
This might help correlate a single session across multiple logs.
------------------------------------------------------------------------------------------------------------
HTTP Headers in Investigations
Let’s talk real-world usage. How do these headers help during an actual incident?
1. Pastebin & Data Exfil
Attackers often use public paste sites like Pastebin or SendSpace. Some malware is coded to automatically upload exfiltrated data using these services’ APIs.
If an attacker has RDP or VNC access, they might just open a browser and manually do it — but the network traffic (HTTP POST requests, User-Agent headers, and API URIs) will still leave footprints.
2. User-Agent Fingerprinting
If you're in a corporate environment, there’s probably a known set of legitimate User-Agent strings. Anything else? Could be:
Malware
Unauthorized browser
Portable or dev tools
Sometimes, malware adds its own version string in the User-Agent, helping investigators quickly fingerprint infections across the environment.
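A simple way to start is an allowlist comparison. The sketch below counts every User-Agent that isn't on the approved list. All strings here are illustrative (EvilBot/1.0 is invented), and exact matching is a simplification, since real browser strings change with every update.

```python
from collections import Counter

def unexpected_user_agents(observed, allowed):
    """Count User-Agent strings that are not on the approved list."""
    return Counter(ua for ua in observed if ua not in allowed)

# Illustrative allowlist: the corporate browser build from the example request.
allowed = {
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/136.0.0.0 Safari/537.36",
}
observed = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/136.0.0.0 Safari/537.36",
    "python-requests/2.31.0",
    "EvilBot/1.0",  # invented malware-style string
    "python-requests/2.31.0",
]
outliers = unexpected_user_agents(observed, allowed)
print(outliers.most_common())
```

Rare strings are worth a look first, but remember the earlier point: a careful attacker will simply copy the approved string.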
3. Credential Sniffing in HTTP Basic Auth
We touched on this earlier, but just a reminder — Basic Auth sends credentials like this:
Authorization: Basic bmV3dXNlcjpzM2NyM3RwYXNz
That Base64 string? It’s just user:password. If you’re capturing traffic, you can extract credentials directly.
4. URI Analysis
Every URI tells a story. It could be:
Web searches
Form submissions
API calls
Malware callbacks
Pairing URI analysis with malware analysis gives you powerful insight into what the attacker was trying to do — exfiltrate data, move laterally, connect to command-and-control, or worse.
5. When the Disk Fails, the Network Tells All
Modern attackers are smart:
They use private browsing
They run portable apps from USBs
They clean up after themselves
So maybe there’s no trace left on the disk. But network traffic? That’s harder to erase. If you have PCAPs or proxy logs, you’ve still got a shot.
------------------------------------------------------------------------------------------------------------
Final Thoughts
HTTP headers might seem boring on the surface, but when you dig in — they’re loaded with useful info. From persistent connections to User-Agent strings to caching behavior and time syncing — every bit tells you something.
Hope this post made it easier to see headers not as noise, but as gold dust for a forensic investigator.
-------------------------------------------------------Dean-------------------------------------------