
Proxies in DFIR – Deep Dive into Squid Log & Cache Forensics with Calamaris and Extraction Techniques


I’m going to walk you through how to analyze proxy logs: what tools you can use, what patterns to look for, and where to dig deeper. Keep in mind that every investigation is different, so while I’ll show you the process, the real analysis is something you will need to drive based on your case.

Let’s talk about something that’s often sitting quietly in the background of many networks but plays a huge role when an investigation kicks off: Proxies. Whether you’re a forensic analyst, an incident responder, or just someone interested in how network traffic is monitored, proxies are your silent allies.



🧭 First Things First: What Does a Proxy Even Do?

Think of a proxy like a middleman between users and the internet. Every time a user accesses a website, the request goes through the proxy first. This is awesome for:


  • Monitoring user activity: Who went where, when, and what happened.

  • Enforcing policies: Blocking sketchy sites or filtering content.

  • Caching: Saving bandwidth by storing frequently accessed content locally.


And the best part? Proxies keep logs. Gold mines for investigations.


🔍 Why Proxy Logs Are a Big Deal in Forensics

When you're dealing with a potential breach or malware incident, one of the first questions is:


Who visited what site?

Now, imagine going machine-by-machine trying to find that out… 😫 That’s where proxy logs shine:


  • Speed up investigations

  • Quickly identify systems reaching out to malicious URLs

  • Track timelines without touching each device individually


And even better—some proxies cache content. So even if malware was downloaded and deleted from a device, the proxy might still have a copy in its cache. Lifesaver.

🐙 Enter Squid Proxy – A Favorite

Squid is a widely used HTTP proxy server. If you’ve worked in enterprise environments, chances are you’ve run into it.


🧾 Key Squid File Paths:

  • Config file: /etc/squid/squid.conf

  • Logs: /var/log/squid/*

  • Cache: /var/spool/squid/


These are your go-to places when digging into evidence.


-----------------------------------------------------------------------------------------------------------


📈 What You Can Learn from Squid Logs

Squid logs tell you things like:

| Field | Example | What It Means |
|---|---|---|
| UNIX Timestamp | 1608838269.433 | Date/time of the request |
| Response Time | 531 | Time taken to respond (in ms) |
| Client IP | 192.168.10.10 | Who made the request |
| Cache/HTTP Status | TCP_MISS/200 | Was it cached? Was it successful? |
| Reply Size | 17746 | Size of response (in bytes) |
| HTTP Method | GET | Type of request |
| URL | | Site accessed |
| Source Server | DIRECT/192.168.0.0 | Origin server IP |
| MIME Type | text/html | Content type returned |

So from one single log line, you can know who accessed what, when, and how the proxy handled it.
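To see those fields in action, here's a minimal sketch (the log line and origin IP below are fabricated for illustration) that pulls the timestamp, client IP, cache/status, and URL out of a native-format line with awk:

```shell
# A fabricated line in Squid's native access.log layout (fields match the table above)
line='1608838269.433    531 192.168.10.10 TCP_MISS/200 17746 GET http://example.com/ - DIRECT/93.184.216.34 text/html'

# Fields are whitespace-separated: 1=timestamp, 3=client IP, 4=cache/status, 7=URL
echo "$line" | awk '{print $1, $3, $4, $7}'
```

Run the same awk over a real access.log to get a compact who/what/when view.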



🧠 Bonus Info: Cache Status Codes That Help You Analyze

  • TCP_HIT: Content served from cache

  • TCP_MISS: Had to fetch from the internet

  • TCP_REFRESH_HIT: Cached content was revalidated

  • TCP_DENIED: Blocked by proxy rules


This gives you an idea of how users interact with sites and how often content is being reused.
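A quick way to see that interaction pattern across a whole log is to tally the cache result codes. Here's a sketch using fabricated sample lines; on a live system, point the awk at /var/log/squid/access.log instead:

```shell
# Three fabricated native-format lines standing in for a real access.log
printf '%s\n' \
  '1608838269.433 531 192.168.10.10 TCP_MISS/200 17746 GET http://a.example/ - DIRECT/203.0.113.7 text/html' \
  '1608838270.100 12 192.168.10.11 TCP_HIT/200 500 GET http://a.example/ - NONE/- text/html' \
  '1608838271.900 8 192.168.10.12 TCP_HIT/200 500 GET http://a.example/ - NONE/- text/html' > sample_access.log

# Field 4 is CACHE_CODE/http-status; split on "/" and count just the cache codes
awk '{split($4, s, "/"); print s[1]}' sample_access.log | sort | uniq -c | sort -rn
```

A sudden spike in TCP_DENIED or an unusual hit/miss ratio for one client is worth a second look.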


-----------------------------------------------------------------------------------------------------------


⚠️ Default Squid Logs Are Good… But Not Perfect

Here’s the catch: by default, Squid doesn’t log everything you might want during an investigation.

For example:

🚫 No User-Agent

🚫 No Referer

🚫 Query strings (like ?user=admin&pass=1234) are stripped by default


This can hurt if malware uses obfuscated URLs or redirects. But don’t worry—Squid is super customizable.

🔧 How to Improve Squid Logs for Better Visibility

You can change the Squid log format to include things like the User-Agent and Referer.


✅ Example Configuration (Put this in squid.conf):

logformat combined %>a %[ui %[un [%tl] "%rm %ru HTTP/%rv" %>Hs %<st "%{Referer}>h" "%{User-Agent}>h" %Ss:%Sh
access_log /var/log/squid/access.log combined

🧠 What It Logs:

  • %>a: Client IP

  • %tl: Local time (human-readable)

  • %rm %ru: HTTP method and URL

  • %>Hs: Status code (200, 404, etc.)

  • %<st: Response size

  • %{Referer}>h: Page that referred the user

  • %{User-Agent}>h: Browser or software used

  • %Ss:%Sh: Cache and hierarchy status


Boom. Now your logs are a forensic analyst’s dream.

🔍 Sample Human-Readable Log Entry

192.168.10.10 - - [30/Apr/2025:00:00:00 +0000] "GET https://www.cyberengage.org/...js HTTP/1.1" 200 38986 "https://www.cyberengage11.org/..." "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:47.0)... Firefox/47.0" TCP_MISS:HIER_DIRECT

From this one line, we can tell:

  • The user at IP 192.168.10.10 accessed a JavaScript file

  • The browser was Firefox on Windows

  • The request wasn't cached (TCP_MISS)


That’s a full story from one log entry.
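Once the combined format is in place, the quoted fields make quick pivots easy. A sketch (sample line fabricated for illustration) that extracts the User-Agent, which lands in field 6 when you split on double quotes:

```shell
# Fabricated combined-format line; the quoted fields are request, referer, user agent
line='192.168.10.10 - - [30/Apr/2025:00:00:00 +0000] "GET https://www.example.com/a.js HTTP/1.1" 200 38986 "https://www.example.com/" "Mozilla/5.0 (Windows NT 10.0; rv:47.0) Firefox/47.0" TCP_MISS:HIER_DIRECT'

# With the double quote as field separator, field 6 is the User-Agent string
echo "$line" | awk -F'"' '{print $6}'
```

Run that over the whole log and pipe it through `sort | uniq -c | sort -rn` to surface odd or scripted user agents.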

-----------------------------------------------------------------------------------------------------------


🛑 But Wait—A Word of Caution!

Want to log query strings or detailed headers? You must change your config.

# In /etc/squid/squid.conf
strip_query_terms off
⚠️ Warning: This could capture sensitive data (like usernames/passwords in URLs), so make sure you’re authorized to log this. Respect privacy policies.

-----------------------------------------------------------------------------------------------------------


Alright, let’s get real.

When you're looking at a Squid proxy for investigation, it can look like a mess of logs, cache files, and cryptic timestamps. But trust me, with the right tools and techniques, you'll be digging up web activity and cached secrets like a forensic wizard.

🛠️ Let’s Begin with a Tool – Calamaris

So first up, there's this pretty slick tool called Calamaris – great for getting summaries out of Squid logs. It's not fancy-looking, but it's efficient, and sometimes that's all you need. You can check out the tool here: Calamaris Official Page


To install it inside your WSL (Windows Subsystem for Linux), just run:

sudo apt install calamaris

Boom. Installed.

Now let’s analyze a Squid access log:

cat /mnt/c/Users/Akash\'s/Downloads/proxy/proxy/proxy2/squid/squid/access.log | calamaris -a

And just like that, it spits out a clean summary. Requests, clients, status codes—it’s all there. This makes the initial review of a log super simple.




🔎 BUT... there’s a catch. If your Squid logs use a custom format (which happens often in real environments), Calamaris might fumble. So if your output looks weird or incomplete, don’t panic—we’ll have to get our hands dirty and analyze stuff manually. Let’s keep going.


🕰️ Dealing with Timestamps – From Unix to Human

By default, Squid logs come with UNIX epoch timestamps. Unless you're a robot, they aren't human-friendly. But converting them is easy.


Use this:

date -u -d @1742462110.226

That -u gives you UTC format (ideal for timeline consistency).


Now you're thinking—"Akash, am I supposed to convert each timestamp manually?"

Heck no. Here's a one-liner that’ll do the job for the entire log file:

sudo cat /mnt/c/Users/Akash\'s/Downloads/proxy/proxy2/squid/squid/access.log | awk '{$1=strftime("%F %T", $1, 1); print $0}' > /mnt/c/Users/Akash\'s/Downloads/readable.txt

This outputs a clean, readable version of your log into readable.txt.
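With readable timestamps in hand, isolating a window of interest is a plain string comparison on the date and time columns, since ISO-style timestamps sort lexically. A sketch with fabricated converted lines:

```shell
# Fabricated lines in the converted "YYYY-MM-DD HH:MM:SS ..." layout
printf '%s\n' \
  '2025-03-20 09:15:10 10.0.0.5 TCP_MISS/200 GET http://a.example/' \
  '2025-03-20 10:42:01 10.0.0.5 TCP_MISS/200 GET http://b.example/' \
  '2025-03-20 11:05:33 10.0.0.9 TCP_HIT/200 GET http://c.example/' > readable_sample.txt

# ISO timestamps compare correctly as strings, so >= and < act as time bounds
awk '$1 == "2025-03-20" && $2 >= "10:00:00" && $2 < "11:00:00"' readable_sample.txt
```

Only the 10:42:01 event survives the filter; swap in your own window and run it over readable.txt.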




📂 Important Files to Collect in a Squid Forensic Investigation

While you’re getting into the logs, don’t forget to grab these essentials:


  • /etc/squid/squid.conf – Config file that tells you how the proxy works, where logs are stored, ACLs, cache settings, etc.

  • /var/log/squid/access.log – The main access log (you’ll be here a lot)

  • /var/log/squid/referer.log, /useragent.log, /cache.log, /store.log – All useful for understanding context like who clicked what, what browser they used, cache hits/misses, etc.


🔍 Starting Your Investigation – Log Hunting

Let’s say you’re investigating activity around google.com. Start basic:

grep google.com access.log

Now you can narrow it down further. Want to see only GET or POST requests?

grep "GET.*google.com" access.log

Start building a timeline from here; this is your story-building phase in an incident investigation.
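From those matches, one useful pivot is counting which internal clients generated the hits. A sketch with a fabricated mini-log standing in for access.log:

```shell
# Fabricated native-format lines; on a real case, grep access.log directly
printf '%s\n' \
  '1742462110.226 40 10.0.0.5 TCP_MISS/200 900 GET http://google.com/ - DIRECT/142.250.0.1 text/html' \
  '1742462111.010 35 10.0.0.5 TCP_MISS/200 880 GET http://google.com/search - DIRECT/142.250.0.1 text/html' \
  '1742462112.500 20 10.0.0.9 TCP_MISS/200 700 GET http://example.org/ - DIRECT/93.184.216.34 text/html' > hunt_sample.log

# Who talked to the domain of interest, and how often? (field 3 = client IP)
grep 'google.com' hunt_sample.log | awk '{print $3}' | sort | uniq -c | sort -rn
```

The top of that list tells you which machines to image or triage first.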

-----------------------------------------------------------------------------------------------------------


💾 Let’s Talk About Cache – One of the Juiciest Parts

Squid caches web objects to speed things up. This means files, URLs, images, even docs might be sitting there waiting to be carved out.


Default cache path:

/var/spool/squid/

Here, cached files are stored in a structured format like:

/var/spool/squid/00/05/000005F4

If you want to inspect these:

grep -railF www.google.com /var/spool/squid/

Flags explained:

  • -r: Recursively search

  • -a: Treat files as ASCII

  • -i: Case-insensitive

  • -l: Show filenames only

  • -F: Literal search (no regex overhead)


Then use strings to dig deeper into the cache object:

strings -n 10 /var/spool/squid/00/05/000005F4 | grep ^http | head -n 1

This gives you the clean URL under which the object was cached.
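To triage many objects at once, loop the same idea over the cache tree. The sketch below builds a mock cache object so it runs anywhere (on a live box, point the loop at /var/spool/squid/ instead), and it uses grep -ao in place of strings, which behaves similarly for this job and is present even when binutils isn't installed:

```shell
# Build a tiny mock cache object so the loop below is runnable anywhere
mkdir -p mockcache/00/05
printf 'SWAPMETA-junk\nhttp://www.google.com/evil.js\nmore-bytes' > mockcache/00/05/000005F4

# Map every cache object to the first URL embedded in it
for f in mockcache/00/05/*; do
  url=$(grep -aoE 'https?://[^[:space:]"]*' "$f" | head -n 1)
  printf '%s  %s\n' "$f" "$url"
done
```

The resulting object-to-URL index makes it much faster to decide which cache files deserve a full carve.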

-----------------------------------------------------------------------------------------------------------


📤 Extracting Actual Files from Cache

Let’s say you found a cached .doc file and want to pull it out. Here's how:


  1. Find it:

grep -rail file.doc ./

Example output: 00/0e/00000E20

  2. Examine it:

strings -n 10 00/0e/00000E20

Check for headers like:

  • Content-Type:

  • Cache-Control:

  • Expires:


This tells you what’s inside the file and why it was cached.

  3. Carve the file:

Use a hex editor like ghex to open the file and locate the 0x0D0A0D0A byte pattern (that’s the HTTP header/body separator). Delete all the bytes before this pattern and save the result to a new file.


  4. Identify the file type:

file carved_output

If it says something like “Microsoft Word Document,” you’ve got your artifact extracted. Mission success! 💥
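If a hex editor isn't handy, the carve step can also be scripted: the header block ends at the first blank line (a lone carriage return), and GNU sed can strip everything up to and including it. A sketch with a fabricated mock object so it runs as-is; on a real case, run the same sed over the cache file you located:

```shell
# Mock cached object: HTTP headers, the \r\n\r\n separator, then body bytes
printf 'HTTP/1.1 200 OK\r\nContent-Type: application/msword\r\n\r\nDOC-body-bytes' > mock_object

# GNU sed: delete line 1 through the first line that is just a carriage return,
# leaving only the body, then inspect what came out
sed -e '1,/^\r$/d' mock_object > carved_output
cat carved_output
```

On the real object, finish by running `file carved_output` exactly as in step 4.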

-----------------------------------------------------------------------------------------------------------


🔗 Extra Resources You’ll Love

Want to keep up with new tools for analyzing Squid? A gem worth exploring:

👉 SquidView Tool – Neat for interactive visual log analysis.


-----------------------------------------------------------------------------------------------------------


🧠 Final Thought

Log and cache analysis in Squid isn't just about reading boring log lines. It's storytelling through network artifacts. From timestamps to URLs, from GETs to cached DOC files—every bit tells you something.


The trick is not just knowing what to look for—but knowing how to get it out.

If you're starting your journey with Squid forensics, this is your friendly roadmap. And hey, the more you do it, the more patterns you start seeing. It becomes second nature.


---------------------------------------------Dean----------------------------------------------------------

 
 
 