Why Learning a Forensic Artifact Matters

diyinfosec
Feb 3, 2022 · 8 min read


A 5-step process for learning an artifact, and how I used it to learn a thing or two about NTFS.

As security analysts, I am sure some of us have asked this question: what's the best way to improve my host forensics skills? Should I learn more about tools, or should I learn more about the artifacts themselves?

Making an argument for learning tools is straightforward. A security analyst's job is built around tools. We spend our time using them, learning their APIs and command-line switches, chaining them together, raising bugs, and suggesting improvements.

Learning artifacts deeply, on the other hand, isn't easy to justify. In fact, making counterarguments is easier. First, there are too many artifacts to learn; it's not worth your time. Even if you learn them, how much will it help in your daily job? For example, to determine if a user logged on to a Windows host, you only need to check for event ID 4624 entries in the event log. You don't need to know that event log entries are stored as binary-encoded XML inside 64 KB chunks (ElfChnk).

So it's pretty obvious, right? Just learn the tools and leave figuring out the artifacts to the tool writers. In my experience, however, learning a few artifacts deeply has created new learning paths and given me a much better understanding of the tools.

How can I make these claims? To answer that, I'll share my experience of understanding an artifact. Instead of writing it as a story, I thought it might be easier to consume as "the 5-step process I used to learn an artifact". In each of these steps I have added a My Experience section describing what I actually did and the outcomes.

The steps themselves:

Step 1 — Pick an Artifact
Step 2 — Learn to use the existing tools
Step 3 — Parse the Artifact byte-by-byte
Step 4 — Create your own tool
Step 5 — Going beyond the Artifact

Step 1 — Pick an Artifact

To clarify the definition: an artifact is a binary data structure that contains something of forensic value (activity + timestamps). A few examples of Windows forensic artifacts are Prefetch, LNK files, $MFT, $UsnJrnl, and $LogFile.

Picking an artifact should be fairly straightforward. It can be something that interests you, something that came across in a recent incident, or an artifact that shows up frequently.

My Experience (1):

Picking an artifact just happened for me. I had goofed up a forensics interview where they asked quite a few questions about file systems, particularly NTFS. So I thought I'd start by understanding NTFS. The heart of NTFS is a file called the $MFT (Master File Table), so that became the artifact of my choice.

Back to Index

Step 2 — Learn to use the existing tools

What is the easiest way to make sense of an artifact? Use existing tools! Find the most common ones, install them, and try them out. What information do they provide? What terms do they use? Get a feel for the artifact. You will learn how to think about it and come away with a mental model of it.

My Experience (2):

For studying the MFT, I used a visual tool called Active Disk Editor. It let me do things like open up a drive, select any file I wanted, and see what it looked like inside the MFT.

[Image: Active Disk Editor showing the MFT record of a file]

After spending some time with Active Disk Editor (and the linux-ntfs documentation), I had my own mental model for the MFT. It went something like this: the MFT is like a warehouse. Inside the warehouse, every file and directory is stored in an equal-sized cardboard box (an MFT record). Inside every box are smaller boxes called "attributes", each containing specific information about the file. Open one attribute box and you will find timestamps; open another and you will find the file's contents; and so on.
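To make the box-in-a-box model concrete, here is a minimal Python sketch that walks the attribute "boxes" inside one raw MFT record. The offsets come from the linux-ntfs documentation; the function name and `ATTR_NAMES` table are mine, and `record` is assumed to hold the 1,024 raw bytes of a single record:

```python
import struct

# Common attribute types (the "smaller boxes") and their NTFS names
ATTR_NAMES = {0x10: "$STANDARD_INFORMATION", 0x30: "$FILE_NAME",
              0x80: "$DATA", 0x90: "$INDEX_ROOT", 0xA0: "$INDEX_ALLOCATION"}

def list_attributes(record: bytes):
    """Walk the attribute 'boxes' inside one raw MFT record."""
    # Offset 0x14 of the record header holds the offset of the first attribute.
    offset = struct.unpack_from("<H", record, 0x14)[0]
    while offset + 8 <= len(record):
        attr_type, attr_len = struct.unpack_from("<II", record, offset)
        if attr_type == 0xFFFFFFFF or attr_len == 0:  # end-of-attributes marker
            break
        yield ATTR_NAMES.get(attr_type, hex(attr_type))
        offset += attr_len

# A typical file record yields: $STANDARD_INFORMATION, $FILE_NAME, $DATA
```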

This made it easy for me to get started and ask more questions (the answers don’t matter, but I have given them for completeness):

  • If the boxes are of equal size, how do we differentiate between a file and a directory? Using flags in the MFT record header.
  • If a file’s contents are bigger than the box, what happens? Non-resident attributes and data runs (see the sketch after this list).
  • If a file is deleted, is the box destroyed? No, the MFT record gets reused.
  • Does a file/directory always have attributes (smaller boxes)? Yes, at a minimum you will see SI, FN, and DATA (for files) or INDX* (for directories) attributes.
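Those data runs are a compact encoding worth seeing in code. Here is a minimal Python sketch of a run-list decoder, based on the layout in the linux-ntfs documentation (each run is a header byte giving the field sizes, a cluster count, then a signed offset relative to the previous run); the function name is mine, and `buf` is assumed to hold the raw run list from a non-resident attribute:

```python
def decode_data_runs(buf: bytes) -> list[tuple[int, int]]:
    """Decode a run list into (starting cluster, cluster count) pairs."""
    runs, pos, lcn = [], 0, 0
    while pos < len(buf) and buf[pos] != 0:   # a 0x00 byte terminates the list
        header = buf[pos]
        len_size, off_size = header & 0x0F, header >> 4
        pos += 1
        count = int.from_bytes(buf[pos:pos + len_size], "little")
        pos += len_size
        # Offsets are signed and relative to the previous run's start cluster.
        # (Sparse runs, where off_size is 0, are not handled specially here.)
        lcn += int.from_bytes(buf[pos:pos + off_size], "little", signed=True)
        pos += off_size
        runs.append((lcn, count))
    return runs

# Header 0x21 = 1-byte count, 2-byte offset:
# 0x18 clusters starting at cluster 0x5634.
print(decode_data_runs(bytes([0x21, 0x18, 0x34, 0x56, 0x00])))  # [(22068, 24)]
```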

There are just too many questions to add here. But overall, I was able to experiment and learn. I created different types of files (small, large, encrypted, compressed, and so on) and observed how the MFT record looked, what attributes were created, and what was inside those attributes. I was learning a lot and it was exciting.

Side Note: Mental models are a personal choice; pick whatever works best for you. Sometimes tools also come with their own mental models. For example, TSK breaks a file system down into 5 different layers (instead of cardboard boxes :)).

Back to Index

Step 3 — Parse the Artifact byte-by-byte

Now that you understand the artifact better, try to interpret it yourself. Write your own template to parse it. You will usually do this with a hex editor or a tool like Kaitai Struct. Wait! Why write a parser when there are dozens of them already out there? Because it is a great way to understand the quirks of working with hex. More importantly, you will be surprised by the number of "Unknown" fields; this is especially true for proprietary formats. Maybe those bytes were there all along, waiting to be interpreted by you. You never know!

My Experience (3):

I chose to write a template using 010 Editor because I had seen it in some of Didier Stevens's blog posts. To acquire the $MFT I used the icat tool from TSK: icat \\.\c: 0-128-6 > mft.bin (MFT entry 0, attribute type 128, i.e. the $DATA attribute).

[Image: My attempt to parse the $MFT. I started by parsing a single record, and even then only the record header :)]

This step probably took me the longest. First, I didn't know how to use a hex editor. Second, little-endian took some practice to get used to (because you have to read the bytes backward). Lastly, things that I thought were simple weren't. Unknown fields apart, even known fields were challenging. For example, NTFS stores dates as absolute time: a count of 100-nanosecond intervals since 01-Jan-1601 (UTC). The date 02-Feb-2022 00:34:11 would be represented as 0x01D817CC98B20380. To understand this I had to learn about absolute time, epochs, and precision. (I also made an 8-minute video explaining this.)
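You can verify that conversion with a few lines of Python. This is a minimal sketch (the epoch and 100-ns unit are documented; the function name is mine):

```python
from datetime import datetime, timedelta, timezone

def filetime_to_datetime(filetime: int) -> datetime:
    """Convert a FILETIME (100-ns ticks since 01-Jan-1601 UTC) to a datetime."""
    epoch = datetime(1601, 1, 1, tzinfo=timezone.utc)
    return epoch + timedelta(microseconds=filetime // 10)

print(filetime_to_datetime(0x01D817CC98B20380))  # 2022-02-02 00:34:11+00:00
```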

Finally, I did write my own template. I relied heavily on Active Disk Editor and the linux-ntfs documentation. I also found an MFT template from 2018 written by Eric Zimmerman. I didn't want to copy from it, so I used it only as inspiration :)

With the template done, I was able to ask more fundamental questions:

  • How many attributes can a file have? The attribute ID is a 16-bit field, so in theory a file can have 65,536 attributes.
  • Can you create a file with 65,536 attributes? No. FN and DATA are the only attributes that can occur multiple times in a file, and there are soft limits that allow you to create only 1,024 FN and 6,750 DATA attributes.
  • Can the slack space in the MFT be used to hide data? It looks like it can; check out FragFS.

Overall, I felt my understanding of file systems getting better. It was also surprising how limited the resources were for learning to write 010 Editor templates, so I made a YouTube tutorial to help people get started. It has 5 videos, each around 5-7 minutes.

Back to Index

Step 4 — Create your own tool

Now that you have understood the artifact, think about its usefulness. What do you find interesting? What do you think would be most relevant to extract/aggregate? An easy way to try it out is to write your own tool.

“Analysts don’t need to code!” Yes, I hear you. But I am not talking about balancing a binary tree or solving a LeetCode Medium in 20 minutes. It’s often enough to open a file, read some data, parse it, and print it. A language like Python is easy to learn, and you could be writing simple programs within one or two weeks. Socratica and Corey Schafer have some pretty good videos on YouTube to help you learn Python.

My Experience (4):

I was personally interested in converting MFT records to JSON. That way they can easily be queried with jq or loaded into ELK for analysis. I was not new to programming and was able to pick up Python by trying out simple programs. Once I had the basic skills (reading binary data, working with lists and dictionaries), I was able to write an MFT parser in Python, eventually named mft2json. I also collected about 25 use-cases for analysis and documented them here.
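To give a flavor of what such a tool boils down to, here is a minimal sketch (not mft2json itself) that reads the first record from the mft.bin extracted earlier and emits the documented record-header fields as JSON. The 1,024-byte record size is the common default and would be read from the $Boot record in real code:

```python
import json
import struct

RECORD_SIZE = 1024  # common default; a real tool reads this from $Boot

def parse_record_header(record: bytes) -> dict:
    """Parse the documented header fields of one NTFS FILE record."""
    if record[:4] != b"FILE":
        raise ValueError("bad signature: not a FILE record")
    seq_no, link_count, first_attr, flags = struct.unpack_from("<4H", record, 0x10)
    used_size, alloc_size = struct.unpack_from("<2I", record, 0x18)
    return {
        "sequence_number": seq_no,
        "hard_link_count": link_count,
        "first_attribute_offset": first_attr,
        "in_use": bool(flags & 0x01),
        "is_directory": bool(flags & 0x02),
        "used_size": used_size,
        "allocated_size": alloc_size,
    }

# Read the first record from the extracted $MFT and print it as JSON.
with open("mft.bin", "rb") as f:
    print(json.dumps(parse_record_header(f.read(RECORD_SIZE)), indent=2))
```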

Back to Index

Step 5 — Going beyond the Artifact

Artifacts do not exist in isolation; they exist in a particular context. There is some feature of the operating system that creates or uses each artifact. At this stage you have picked up some skills: you can use tools, you know how to parse and template hex, you have some programming ability, and most importantly, you are a lot wiser about how forensic tools work. Use those skills to move beyond the artifact. Study the ecosystem in which it exists. This is largely a self-charted path: your interest determines which way you go and how much ground you cover.

My Experience (5):

At this stage, I was comfortable with file systems in general and NTFS in particular. A file system typically offers features like compression, encryption, sparse files, quotas, and a lot more.

I picked features at random and studied how they were implemented. Some, like hard links and sparse files, were straightforward to understand. When I started to learn about file encryption, though, it opened up a scary new world.

However, this time I had the necessary basics to figure things out. I was able to:

  • Study EFS (Encrypting File System) manually and eventually write a set of scripts to make this faster.
  • Figure out the format of a couple of previously undocumented artifacts (EFS0.LOG and $EFS with DPAPI-NG).
  • Find a small bug in how TSK processed encrypted attributes.
  • Scan kernel memory and find an object of interest.
  • Write it all up as a paper.

Back to Index

Conclusion

Tools and technologies are shared often. What is shared less often is how an individual got better at something. What struggles do we go through? How does the process evolve? This post is one such attempt. The key takeaway here is the process; my own work is just a means of highlighting it.

In conclusion, I would say: as an analyst, learn to use your tools, and learn to use them well. But also make an attempt to go beyond the tools. Pick one or two artifacts, learn them, ask deeper questions, learn more, rinse, repeat. If nothing else, the fulfillment of exploring something deeply is worth the effort :)

Thanks for reading!

EDIT: On the topic of writing parsers, I have found the work of msuhanov, ericzimmerman, and kacos2000 to be quite useful.
