Proxies in DFIR – Deep Dive into Squid Log & Cache Forensics with Calamaris and Extraction Techniques

I’m going to walk you through how to analyze proxy logs: what tools you can use, what patterns to look for, and where to dig deeper. Keep in mind that every investigation is different; I’ll show you the process, but the real analysis is something you’ll need to drive based on your case.
Let’s talk about something that’s often sitting quietly in the background of many networks but plays a huge role when an investigation kicks off: Proxies. Whether you’re a forensic analyst, an incident responder, or just someone interested in how network traffic is monitored, proxies are your silent allies.
🧭 First Things First: What Does a Proxy Even Do?
Think of a proxy like a middleman between users and the internet. Every time a user accesses a website, the request goes through the proxy first. This is awesome for:
Monitoring user activity: Who went where, when, and what happened.
Enforcing policies: Blocking sketchy sites or filtering content.
Caching: Saving bandwidth by storing frequently accessed content locally.
And the best part? Proxies keep logs. Gold mines for investigations.
🔍 Why Proxy Logs Are a Big Deal in Forensics
When you're dealing with a potential breach or malware incident, one of the first questions is:
Who visited what site?
Now, imagine going machine-by-machine trying to find that out… 😫 That’s where proxy logs shine:
✅ Speed up investigations
✅ Quickly identify systems reaching out to malicious URLs
✅ Track timelines without touching each device individually
And even better—some proxies cache content. So even if malware was downloaded and deleted from a device, the proxy might still have a copy in its cache. Lifesaver.
🐙 Enter Squid Proxy – A Favorite
Squid is a widely used HTTP proxy server. If you’ve worked in enterprise environments, chances are you’ve run into it.
🧾 Key Squid File Paths:
Config file: /etc/squid/squid.conf
Logs: /var/log/squid/*
Cache: /var/spool/squid/
These are your go-to places when digging into evidence.
-----------------------------------------------------------------------------------------------------------
📈 What You Can Learn from Squid Logs
Squid logs tell you things like:
| Field | Example | What It Means |
| --- | --- | --- |
| UNIX Timestamp | 1608838269.433 | Date/time of the request |
| Response Time | 531 | Time taken to respond (in ms) |
| Client IP | 192.168.10.10 | Who made the request |
| Cache/HTTP Status | TCP_MISS/200 | Was it cached? Was it successful? |
| Reply Size | 17746 | Size of response (in bytes) |
| HTTP Method | GET | Type of request |
| URL | | Site accessed |
| Source Server | DIRECT/192.168.0.0 | Origin server IP |
| MIME Type | text/html | Content type returned |
So from one single log line, you can know who accessed what, when, and how the proxy handled it.
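To make those field positions concrete, here’s a quick awk pull on a made-up native-format line (the IP, URL, and server address below are illustrative, not from a real case):

```shell
# One native-format access.log line, fields as in the table above
line='1608838269.433 531 192.168.10.10 TCP_MISS/200 17746 GET http://example.com/index.html - DIRECT/93.184.216.34 text/html'

# Client IP, cache/HTTP status, method, and URL are fields 3, 4, 6, and 7
summary=$(echo "$line" | awk '{print $3, $4, $6, $7}')
echo "$summary"
# Prints: 192.168.10.10 TCP_MISS/200 GET http://example.com/index.html
```

Swap the echo for cat access.log and the same awk gives you a triage view of the whole file.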
🧠 Bonus Info: Cache Status Codes That Help You Analyze
TCP_HIT: Content served from cache
TCP_MISS: Had to fetch from the internet
TCP_REFRESH_HIT: Cached content was revalidated
TCP_DENIED: Blocked by proxy rules
This gives you an idea of how users interact with sites and how often content is being reused.
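A quick way to see that interaction at a glance is to tally the result codes across the whole log. The sample lines below stand in for a real access.log; field 4 is RESULT_CODE/HTTP_STATUS, so we split on the slash and count:

```shell
# Fabricated sample log for illustration; point the awk at your real access.log
printf '%s\n' \
  '1608838269.433 531 192.168.10.10 TCP_MISS/200 17746 GET http://example.com/a - DIRECT/93.184.216.34 text/html' \
  '1608838270.101 12 192.168.10.11 TCP_HIT/200 9876 GET http://example.com/a - NONE/- text/html' \
  '1608838271.500 8 192.168.10.12 TCP_DENIED/403 310 GET http://badsite.test/ - NONE/- text/html' > sample.log

# Field 4 is RESULT_CODE/HTTP_STATUS; keep only the cache result and count each
code_counts=$(awk '{split($4, a, "/"); print a[1]}' sample.log | sort | uniq -c)
echo "$code_counts"
```

A sudden spike in TCP_DENIED for one client, or a run of TCP_MISS to a single odd domain, is exactly the kind of pattern worth chasing.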
-----------------------------------------------------------------------------------------------------------
⚠️ Default Squid Logs Are Good… But Not Perfect
Here’s the catch: by default, Squid doesn’t log everything you might want during an investigation.
For example:
🚫 No User-Agent
🚫 No Referer
🚫 Query strings (like ?user=admin&pass=1234) are stripped by default
This can hurt if malware uses obfuscated URLs or redirects. But don’t worry—Squid is super customizable.
🔧 How to Improve Squid Logs for Better Visibility
You can change the Squid log format to include things like the User-Agent and Referer.
✅ Example Configuration (Put this in squid.conf):
logformat combined %>a %[ui %[un [%tl] "%rm %ru HTTP/%rv" %>Hs %<st "%{Referer}>h" "%{User-Agent}>h" %Ss:%Sh
access_log /var/log/squid/access.log combined
🧠 What It Logs:
%>a: Client IP
%tl: Local time (human-readable)
%rm %ru: HTTP method and URL
%>Hs: Status code (200, 404, etc.)
%<st: Response size
%{Referer}>h: Page that referred the user
%{User-Agent}>h: Browser or software used
%Ss:%Sh: Cache and hierarchy status
Boom. Now your logs are a forensic analyst’s dream.
🔍 Sample Human-Readable Log Entry
192.168.10.10 - - [30/Apr/2025:00:00:00 +0000] "GET https://www.cyberengage.org/...js HTTP/1.1" 200 38986 "https://www.cyberengage11.org/..." "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:47.0)... Firefox/47.0" TCP_MISS:HIER_DIRECT
From this one line, we can tell:
The user at IP 192.168.10.10 accessed a JavaScript file
The browser was Firefox on Windows
The request wasn't cached (TCP_MISS)
That’s a full story from one log entry.
-----------------------------------------------------------------------------------------------------------
🛑 But Wait—A Word of Caution!
Want to log query strings or detailed headers? You must change your config.
# In /etc/squid/squid.conf
strip_query_terms off
⚠️ Warning: This could capture sensitive data (like usernames/passwords in URLs), so make sure you’re authorized to log this. Respect privacy policies.
-----------------------------------------------------------------------------------------------------------
Alright, let’s get real.
When you're looking at a Squid proxy for investigation, it can look like a mess of logs, cache files, and cryptic timestamps. But trust me, with the right tools and techniques, you'll be digging up web activity and cached secrets like a forensic wizard.
🛠️ Let’s Begin with a Tool – Calamaris
So first up, there's this pretty slick tool called Calamaris – great for getting summaries out of Squid logs. It's not fancy-looking, but it's efficient, and sometimes that's all you need. You can check out the tool here: Calamaris Official Page
To install it inside your WSL (Windows Subsystem for Linux), just run:
sudo apt install calamaris
Boom. Installed.
Now let’s analyze a Squid access log:
cat /mnt/c/Users/Akash\'s/Downloads/proxy/proxy/proxy2/squid/squid/access.log | calamaris -a
And just like that, it spits out a clean summary. Requests, clients, status codes—it’s all there. This makes the initial review of a log super simple.


🔎 BUT... there’s a catch. If your Squid logs use a custom format (which happens often in real environments), Calamaris might fumble. So if your output looks weird or incomplete, don’t panic—we’ll have to get our hands dirty and analyze stuff manually. Let’s keep going.
🕰️ Dealing with Timestamps – From Unix to Human
By default, Squid logs come with UNIX epoch timestamps. Unless you're a robot, they aren't human-friendly. But converting them is easy.
Use this:
date -u -d @1742462110.226
That -u gives you UTC format (ideal for timeline consistency).
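As a quick sanity check, the same conversion can be scripted; GNU date happily accepts the fractional epoch after the @ (the timestamp below is the one from the command above):

```shell
# Convert a Squid epoch timestamp to a human-readable UTC string
ts='1742462110.226'
human=$(date -u -d "@${ts}" +"%F %T")
echo "$human"
# Prints: 2025-03-20 09:15:10
```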
Now you're thinking—"Akash, am I supposed to convert each timestamp manually?"
Heck no. Here's a one-liner that’ll do the job for the entire log file:
sudo cat /mnt/c/Users/Akash\'s/Downloads/proxy/proxy2/squid/squid/access.log | awk '{$1=strftime("%F %T", $1, 1); print $0}' > /mnt/c/Users/Akash\'s/Downloads/readable.txt

This outputs a clean, readable version of your log into readable.txt.

📂 Important Files to Collect in a Squid Forensic Investigation
While you’re getting into the logs, don’t forget to grab these essentials:
/etc/squid/squid.conf – Config file that tells you how the proxy works, where logs are stored, ACLs, cache settings, etc.
/var/log/squid/access.log – The main access log (you’ll be here a lot)
/var/log/squid/referer.log, /useragent.log, /cache.log, /store.log – All useful for understanding context like who clicked what, what browser they used, cache hits/misses, etc. (Note: referer.log and useragent.log only exist if they’ve been enabled in squid.conf.)
🔍 Starting Your Investigation – Log Hunting
Let’s say you’re investigating activity around google.com. Start basic:
grep google.com access.log
Now you can narrow it down further. Want to see only GET or POST requests?
grep "GET.*google.com" access.log
Start building a timeline from here—this is your story-building phase in an incident investigation.
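One useful early question in that story-building phase is "which machines talked to this domain, and how often?" Here’s a sketch using fabricated sample lines (in practice you’d point the grep at your real access.log); field 3 is the client IP:

```shell
# Fabricated access.log entries for illustration only
printf '%s\n' \
  '1608838269.433 531 192.168.10.10 TCP_MISS/200 17746 GET http://www.google.com/search - DIRECT/142.250.0.1 text/html' \
  '1608838270.101 12 192.168.10.10 TCP_HIT/200 9876 GET http://www.google.com/logo.png - NONE/- image/png' \
  '1608838271.500 300 192.168.10.22 TCP_MISS/200 5120 GET http://www.google.com/mail - DIRECT/142.250.0.1 text/html' > access.log

# Count requests per client IP for the domain of interest, busiest first
top_talkers=$(grep 'google.com' access.log | awk '{print $3}' | sort | uniq -c | sort -rn)
echo "$top_talkers"
```

The client at the top of that list is usually where your per-host forensics starts.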
-----------------------------------------------------------------------------------------------------------
💾 Let’s Talk About Cache – One of the Juiciest Parts
Squid caches web objects to speed things up. This means files, URLs, images, even docs might be sitting there waiting to be carved out.
Default cache path:
/var/spool/squid/
Here, cached files are stored in a structured format like:
/var/spool/squid/00/05/000005F4
If you want to inspect these:
grep -railF www.google.com /var/spool/squid/
Flags explained:
-r: Recursively search
-a: Treat files as ASCII
-i: Case-insensitive
-l: Show filenames only
-F: Literal search (no regex overhead)
Then use strings to dig deeper into the cache object:
strings -n 10 /var/spool/squid/00/05/000005F4 | grep ^http | head -n 1
This gives you clean URLs that were cached.
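Here’s that search-then-extract flow end to end on a mocked-up cache tree (the real one lives under /var/spool/squid/; the object content here is simplified text rather than a real binary cache object):

```shell
# Build a tiny fake cache tree so the flow is reproducible
mkdir -p cachedemo/00/05
printf 'HTTP/1.1 200 OK\nhttp://www.google.com/index.html\nContent-Type: text/html\n' > cachedemo/00/05/000005F4

# Step 1: which cached objects mention the host? (-F = literal match)
hit=$(grep -railF www.google.com cachedemo/)
echo "$hit"

# Step 2: pull the first URL-looking string out of the matching object
url=$(strings -n 10 cachedemo/00/05/000005F4 | grep '^http' | head -n 1)
echo "$url"
```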
-----------------------------------------------------------------------------------------------------------
📤 Extracting Actual Files from Cache
Let’s say you found a cached .doc file and want to pull it out. Here's how:
Find it:
grep -rail file.doc ./
Example output: 00/0e/00000E20
Examine it:
strings -n 10 00/0e/00000E20
Check for headers like:
Content-Type:
Cache-Control:
Expires:
This tells you what’s inside the file and why it was cached.
Carve the file:
Use a hex editor like ghex to open the file and locate the 0x0D0A0D0A byte pattern (that’s the HTTP header/body separator). Delete all the bytes before this pattern and save the result to a new file.
Identify the file type:
file carved_output
If it says something like “Microsoft Word Document,” you’ve got your artifact extracted. Mission success! 💥
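For simple text-bodied objects, that carve step can also be scripted: split the object at the first 0x0D0A0D0A (the blank CRLF line) and keep what follows. The cached_object below is a stand-in file I fabricate for the demo; real cache objects also carry Squid swap metadata and binary bodies, so for anything non-trivial stay with the hex editor or a proper carver.

```shell
# Fake cached object: HTTP headers, the 0x0D0A0D0A separator, then the body
printf 'HTTP/1.1 200 OK\r\nContent-Type: application/msword\r\n\r\nDOCBODYBYTESHERE' > cached_object

# Treat CRLFCRLF as the record separator (a regex RS needs gawk or mawk);
# every record after the first is body, so print those with no separator
awk 'BEGIN { RS = "\r\n\r\n"; ORS = "" } NR > 1' cached_object > carved_output
cat carved_output
```

Run file on carved_output afterwards, exactly as in the manual workflow, to confirm what you extracted.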
-----------------------------------------------------------------------------------------------------------
🔗 Extra Resources You’ll Love
Want to keep up with new tools for analyzing Squid? One gem worth exploring:
👉 SquidView Tool – Neat for interactive visual log analysis.
-----------------------------------------------------------------------------------------------------------
🧠 Final Thought
Log and cache analysis in Squid isn't just about reading boring log lines. It's storytelling through network artifacts. From timestamps to URLs, from GETs to cached DOC files—every bit tells you something.
The trick is not just knowing what to look for—but knowing how to get it out.
If you're starting your journey with Squid forensics, this is your friendly roadmap. And hey, the more you do it, the more patterns you start seeing. It becomes second nature.
---------------------------------------------Dean----------------------------------------------------------