
Proxies in DFIR – Deep Dive into Squid Log & Cache Forensics with Calamaris and Extraction Techniques


I’m going to walk you through how to analyze proxy logs: what tools you can use, what patterns to look for, and where to dig deeper. Keep in mind that every investigation is different, so while I’ll show you the process, the real analysis is something you will need to drive based on your case.

Let’s talk about something that’s often sitting quietly in the background of many networks but plays a huge role when an investigation kicks off: Proxies. Whether you’re a forensic analyst, an incident responder, or just someone interested in how network traffic is monitored, proxies are your silent allies.



🧭 First Things First: What Does a Proxy Even Do?

Think of a proxy like a middleman between users and the internet. Every time a user accesses a website, the request goes through the proxy first. This is awesome for:


  • Monitoring user activity: Who went where, when, and what happened.

  • Enforcing policies: Blocking sketchy sites or filtering content.

  • Caching: Saving bandwidth by storing frequently accessed content locally.


And the best part? Proxies keep logs. Gold mines for investigations.


🔍 Why Proxy Logs Are a Big Deal in Forensics

When you're dealing with a potential breach or malware incident, one of the first questions is:


Who visited what site?

Now, imagine going machine-by-machine trying to find that out… 😫 That’s where proxy logs shine:


  • Speed up investigations

  • Quickly identify systems reaching out to malicious URLs

  • Track timelines without touching each device individually


And even better—some proxies cache content. So even if malware was downloaded and deleted from a device, the proxy might still have a copy in its cache. Lifesaver.

🐙 Enter Squid Proxy – A Favorite

Squid is a widely used HTTP proxy server. If you’ve worked in enterprise environments, chances are you’ve run into it.


🧾 Key Squid File Paths:

  • Config file: /etc/squid/squid.conf

  • Logs: /var/log/squid/*

  • Cache: /var/spool/squid/


These are your go-to places when digging into evidence.


-----------------------------------------------------------------------------------------------------------


📈 What You Can Learn from Squid Logs

Squid logs tell you things like:

| Field | Example | What It Means |
|---|---|---|
| UNIX Timestamp | 1608838269.433 | Date/time of the request |
| Response Time | 531 | Time taken to respond (in ms) |
| Client IP | 192.168.10.10 | Who made the request |
| Cache/HTTP Status | TCP_MISS/200 | Was it cached? Was it successful? |
| Reply Size | 17746 | Size of response (in bytes) |
| HTTP Method | GET | Type of request |
| URL | | Site accessed |
| Source Server | DIRECT/192.168.0.0 | Origin server IP |
| MIME Type | text/html | Content type returned |

So from one single log line, you can know who accessed what, when, and how the proxy handled it.
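To see those fields in action, here's a minimal sketch (the log line and origin IP below are fabricated for illustration) that pulls the timestamp, client IP, cache/status, and URL out of a native-format line with awk:

```shell
# A fabricated line in Squid's native access.log layout (fields match the table above)
line='1608838269.433    531 192.168.10.10 TCP_MISS/200 17746 GET http://example.com/ - DIRECT/93.184.216.34 text/html'

# Fields are whitespace-separated: 1=timestamp, 3=client IP, 4=cache/status, 7=URL
echo "$line" | awk '{print $1, $3, $4, $7}'
```

Run the same awk over a real access.log to get a compact who/what/when view.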



🧠 Bonus Info: Cache Status Codes That Help You Analyze

  • TCP_HIT: Content served from cache

  • TCP_MISS: Had to fetch from the internet

  • TCP_REFRESH_HIT: Cached content was revalidated

  • TCP_DENIED: Blocked by proxy rules


This gives you an idea of how users interact with sites and how often content is being reused.
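A quick way to see that interaction pattern across a whole log is to tally the cache result codes. Here's a sketch using fabricated sample lines; on a live system, point the awk at /var/log/squid/access.log instead:

```shell
# Three fabricated native-format lines standing in for a real access.log
printf '%s\n' \
  '1608838269.433 531 192.168.10.10 TCP_MISS/200 17746 GET http://a.example/ - DIRECT/203.0.113.7 text/html' \
  '1608838270.100 12 192.168.10.11 TCP_HIT/200 500 GET http://a.example/ - NONE/- text/html' \
  '1608838271.900 8 192.168.10.12 TCP_HIT/200 500 GET http://a.example/ - NONE/- text/html' > sample_access.log

# Field 4 is CACHE_CODE/http-status; split on "/" and count just the cache codes
awk '{split($4, s, "/"); print s[1]}' sample_access.log | sort | uniq -c | sort -rn
```

A sudden spike in TCP_DENIED or an unusual hit/miss ratio for one client is worth a second look.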


-----------------------------------------------------------------------------------------------------------


⚠️ Default Squid Logs Are Good… But Not Perfect

Here’s the catch: by default, Squid doesn’t log everything you might want during an investigation.

For example:

🚫 No User-Agent

🚫 No Referer

🚫 Query strings (like ?user=admin&pass=1234) are stripped by default


This can hurt if malware uses obfuscated URLs or redirects. But don’t worry—Squid is super customizable.

🔧 How to Improve Squid Logs for Better Visibility

You can change the Squid log format to include things like the User-Agent and Referer.


✅ Example Configuration (Put this in squid.conf):

logformat combined %>a %[ui %[un [%tl] "%rm %ru HTTP/%rv" %>Hs %<st "%{Referer}>h" "%{User-Agent}>h" %Ss:%Sh
access_log /var/log/squid/access.log combined

🧠 What It Logs:

  • %>a: Client IP

  • %tl: Local time (human-readable)

  • %rm %ru: HTTP method and URL

  • %>Hs: Status code (200, 404, etc.)

  • %<st: Response size

  • %{Referer}>h: Page that referred the user

  • %{User-Agent}>h: Browser or software used

  • %Ss:%Sh: Cache and hierarchy status


Boom. Now your logs are a forensic analyst’s dream.

🔍 Sample Human-Readable Log Entry

192.168.10.10 - - [30/Apr/2025:00:00:00 +0000] "GET https://www.cyberengage.org/...js HTTP/1.1" 200 38986 "https://www.cyberengage11.org/..." "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:47.0)... Firefox/47.0" TCP_MISS:HIER_DIRECT

From this one line, we can tell:

  • The user at IP 192.168.10.10 accessed a JavaScript file

  • The browser was Firefox on Windows

  • The request wasn't cached (TCP_MISS)


That’s a full story from one log entry.
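Once the combined format is in place, the quoted fields make quick pivots easy. A sketch (sample line fabricated for illustration) that extracts the User-Agent, which lands in field 6 when you split on double quotes:

```shell
# Fabricated combined-format line; the quoted fields are request, referer, user agent
line='192.168.10.10 - - [30/Apr/2025:00:00:00 +0000] "GET https://www.example.com/a.js HTTP/1.1" 200 38986 "https://www.example.com/" "Mozilla/5.0 (Windows NT 10.0; rv:47.0) Firefox/47.0" TCP_MISS:HIER_DIRECT'

# With the double quote as field separator, field 6 is the User-Agent string
echo "$line" | awk -F'"' '{print $6}'
```

Run that over the whole log and pipe it through `sort | uniq -c | sort -rn` to surface odd or scripted user agents.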

-----------------------------------------------------------------------------------------------------------


🛑 But Wait—A Word of Caution!

Want to log query strings or detailed headers? You must change your config.

# In /etc/squid/squid.conf
strip_query_terms off
⚠️ Warning: This could capture sensitive data (like usernames/passwords in URLs), so make sure you’re authorized to log this. Respect privacy policies.

-----------------------------------------------------------------------------------------------------------


Alright, let’s get real.

When you're looking at a Squid proxy for investigation, it can look like a mess of logs, cache files, and cryptic timestamps. But trust me, with the right tools and techniques, you'll be digging up web activity and cached secrets like a forensic wizard.

🛠️ Let’s Begin with a Tool – Calamaris

So first up, there's this pretty slick tool called Calamaris – great for getting summaries out of Squid logs. It's not fancy-looking, but it's efficient, and sometimes that's all you need. You can check out the tool here: Calamaris Official Page


To install it inside your WSL (Windows Subsystem for Linux), just run:

sudo apt install calamaris

Boom. Installed.

Now let’s analyze a Squid access log:

cat /mnt/c/Users/Akash\'s/Downloads/proxy/proxy/proxy2/squid/squid/access.log | calamaris -a

And just like that, it spits out a clean summary. Requests, clients, status codes—it’s all there. This makes the initial review of a log super simple.




🔎 BUT... there’s a catch. If your Squid logs use a custom format (which happens often in real environments), Calamaris might fumble. So if your output looks weird or incomplete, don’t panic—we’ll have to get our hands dirty and analyze stuff manually. Let’s keep going.


🕰️ Dealing with Timestamps – From Unix to Human

By default, Squid logs come with UNIX epoch timestamps. Unless you're a robot, they aren't human-friendly. But converting them is easy.


Use this:

date -u -d @1742462110.226

That -u gives you UTC format (ideal for timeline consistency).


Now you're thinking—"Akash, am I supposed to convert each timestamp manually?"

Heck no. Here's a one-liner that’ll do the job for the entire log file:

sudo cat /mnt/c/Users/Akash\'s/Downloads/proxy/proxy2/squid/squid/access.log | awk '{$1=strftime("%F %T", $1, 1); print $0}' > /mnt/c/Users/Akash\'s/Downloads/readable.txt

This outputs a clean, readable version of your log into readable.txt.
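With readable timestamps in hand, isolating a window of interest is a plain string comparison on the date and time columns, since ISO-style timestamps sort lexically. A sketch with fabricated converted lines:

```shell
# Fabricated lines in the converted "YYYY-MM-DD HH:MM:SS ..." layout
printf '%s\n' \
  '2025-03-20 09:15:10 10.0.0.5 TCP_MISS/200 GET http://a.example/' \
  '2025-03-20 10:42:01 10.0.0.5 TCP_MISS/200 GET http://b.example/' \
  '2025-03-20 11:05:33 10.0.0.9 TCP_HIT/200 GET http://c.example/' > readable_sample.txt

# ISO timestamps compare correctly as strings, so >= and < act as time bounds
awk '$1 == "2025-03-20" && $2 >= "10:00:00" && $2 < "11:00:00"' readable_sample.txt
```

Only the 10:42:01 event survives the filter; swap in your own window and run it over readable.txt.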




📂 Important Files to Collect in a Squid Forensic Investigation

While you’re getting into the logs, don’t forget to grab these essentials:


  • /etc/squid/squid.conf – Config file that tells you how the proxy works, where logs are stored, ACLs, cache settings, etc.

  • /var/log/squid/access.log – The main access log (you’ll be here a lot)

  • /var/log/squid/referer.log, /useragent.log, /cache.log, /store.log – All useful for understanding context like who clicked what, what browser they used, cache hits/misses, etc.


🔍 Starting Your Investigation – Log Hunting

Let’s say you’re investigating activity around google.com. Start basic:

grep google.com access.log

Now you can narrow it down further. Want to see only GET or POST requests?

grep "GET.*google.com" access.log

Start building a timeline from here; this is your story-building phase in an incident investigation.
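From those matches, one useful pivot is counting which internal clients generated the hits. A sketch with a fabricated mini-log standing in for access.log:

```shell
# Fabricated native-format lines; on a real case, grep access.log directly
printf '%s\n' \
  '1742462110.226 40 10.0.0.5 TCP_MISS/200 900 GET http://google.com/ - DIRECT/142.250.0.1 text/html' \
  '1742462111.010 35 10.0.0.5 TCP_MISS/200 880 GET http://google.com/search - DIRECT/142.250.0.1 text/html' \
  '1742462112.500 20 10.0.0.9 TCP_MISS/200 700 GET http://example.org/ - DIRECT/93.184.216.34 text/html' > hunt_sample.log

# Who talked to the domain of interest, and how often? (field 3 = client IP)
grep 'google.com' hunt_sample.log | awk '{print $3}' | sort | uniq -c | sort -rn
```

The top of that list tells you which machines to image or triage first.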

-----------------------------------------------------------------------------------------------------------


💾 Let’s Talk About Cache – One of the Juiciest Parts

Squid caches web objects to speed things up. This means files, URLs, images, even docs might be sitting there waiting to be carved out.


Default cache path:

/var/spool/squid/

Here, cached files are stored in a structured format like:

/var/spool/squid/00/05/000005F4

If you want to inspect these:

grep -railF www.google.com /var/spool/squid/

Flags explained:

  • -r: Recursively search

  • -a: Treat files as ASCII

  • -i: Case-insensitive

  • -l: Show filenames only

  • -F: Literal search (no regex overhead)


Then use strings to dig deeper into the cache object:

strings -n 10 /var/spool/squid/00/05/000005F4 | grep ^http | head -n 1

This gives you the clean URL under which the object was cached.
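To triage many objects at once, loop the same idea over the cache tree. The sketch below builds a mock cache object so it runs anywhere (on a live box, point the loop at /var/spool/squid/ instead), and it uses grep -ao in place of strings, which behaves similarly for this job and is present even when binutils isn't installed:

```shell
# Build a tiny mock cache object so the loop below is runnable anywhere
mkdir -p mockcache/00/05
printf 'SWAPMETA-junk\nhttp://www.google.com/evil.js\nmore-bytes' > mockcache/00/05/000005F4

# Map every cache object to the first URL embedded in it
for f in mockcache/00/05/*; do
  url=$(grep -aoE 'https?://[^[:space:]"]*' "$f" | head -n 1)
  printf '%s  %s\n' "$f" "$url"
done
```

The resulting object-to-URL index makes it much faster to decide which cache files deserve a full carve.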

-----------------------------------------------------------------------------------------------------------


📤 Extracting Actual Files from Cache

Let’s say you found a cached .doc file and want to pull it out. Here's how:


  1. Find it:

grep -rail file.doc ./

Example output: 00/0e/00000E20

  2. Examine it:

strings -n 10 00/0e/00000E20

Check for headers like:

  • Content-Type:

  • Cache-Control:

  • Expires:


This tells you what’s inside the file and why it was cached.

  3. Carve the file:

Use a hex editor like ghex to open the file and locate the 0x0D0A0D0A byte pattern (that’s the HTTP header/body separator). Delete all the bytes before this pattern and save the result to a new file.


  4. Identify the file type:

file carved_output

If it says something like “Microsoft Word Document,” you’ve got your artifact extracted. Mission success! 💥
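If a hex editor isn't handy, the carve step can also be scripted: the header block ends at the first blank line (a lone carriage return), and GNU sed can strip everything up to and including it. A sketch with a fabricated mock object so it runs as-is; on a real case, run the same sed over the cache file you located:

```shell
# Mock cached object: HTTP headers, the \r\n\r\n separator, then body bytes
printf 'HTTP/1.1 200 OK\r\nContent-Type: application/msword\r\n\r\nDOC-body-bytes' > mock_object

# GNU sed: delete line 1 through the first line that is just a carriage return,
# leaving only the body, then inspect what came out
sed -e '1,/^\r$/d' mock_object > carved_output
cat carved_output
```

On the real object, finish by running `file carved_output` exactly as in step 4.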

-----------------------------------------------------------------------------------------------------------


🔗 Extra Resources You’ll Love

Want to keep up with new tools for analyzing Squid? A gem worth exploring:

👉 SquidView Tool – Neat for interactive visual log analysis.


-----------------------------------------------------------------------------------------------------------


🧠 Final Thought

Log and cache analysis in Squid isn't just about reading boring log lines. It's storytelling through network artifacts. From timestamps to URLs, from GETs to cached DOC files—every bit tells you something.


The trick is not just knowing what to look for—but knowing how to get it out.

If you're starting your journey with Squid forensics, this is your friendly roadmap. And hey, the more you do it, the more patterns you start seeing. It becomes second nature.


---------------------------------------------Dean----------------------------------------------------------

 
 
 