This is a short post covering the topic of manual data carving on Linux systems. Following a brief explanation of the topic itself, I focus on using a command-line tool called xxd to manually specify the start and end offsets of the content I wish to ‘carve’ out of the target data stream.
Data carving is sometimes referred to simply as ‘carving’ or, depending on the content being extracted, ‘file carving’. This is a very interesting topic that every forensic examiner should understand, and it was in fact one of the first disciplines I learned when I began my journey into the field of digital forensics.
In its most basic form, the first step in the file carving process is to search through a stream of data for a file signature, often referred to as a ‘header’ or ‘magic value’. The next step is to determine, or assume, the logical end point of a suspected file, usually by looking for an accompanying ‘footer’ value. For example, when you view the raw hexadecimal content of a typical JPEG image file, you will often find that the start value (file header) is 0xFFD8FFE0 and the end value (file footer) is 0xFFD9. The final step is to extract the raw data between the header and footer values and hopefully ‘carve’ a tangible file out of the data stream. Obviously it will not always be that easy, as you will have to take other variables into account, such as file fragmentation, file format differences, allocation status and signature manipulation (anti-forensics).
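To make these values concrete, here is a quick illustration using a throwaway byte sequence built with printf (the octal escapes \377\330\377\340 and \377\331 correspond to the hex bytes 0xFFD8FFE0 and 0xFFD9) rather than a real image:

```shell
# Build a tiny JPEG-like byte sequence: header (FFD8FFE0), filler, footer (FFD9).
printf '\377\330\377\340TESTDATA\377\331' > sample.bin

# A plain hexdump shows the header at the very start and the footer at the very end.
xxd -p sample.bin
# ffd8ffe05445535444415441ffd9
```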
On Linux systems, you can run a command called file to identify the type of a given file, based on its signature (magic test). This command works by reading information from a special file on your system and cross-referencing the file signature against it. The standard file command should be installed by default on most Linux distributions and identifies file types using the data contained in the following file:
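While the exact path of that ‘magic’ file varies between distributions (a plain-text source or a compiled database, depending on the system), the mechanism itself can be sketched by pointing file at a tiny custom magic file with the -m option:

```shell
# A one-rule magic file: a big-endian short 0xffd8 at offset 0 means JPEG.
printf '0\tbeshort\t0xffd8\tJPEG image data (custom rule)\n' > mini.magic

# A four-byte file beginning with the JPEG header (octal for 0xFF 0xD8 0xFF 0xE0).
printf '\377\330\377\340' > t.jpg

# Cross-reference the file's signature against our custom magic rule.
file -m mini.magic t.jpg
# t.jpg: JPEG image data (custom rule)
```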
In Figure 1 below, I demonstrate usage of the file command against a generic ‘.jpg’ image file:
As we can see, the file is identified by the command as being “JPEG image data”. Using grep to search for this string in the aforementioned ‘magic’ file, we can see that file associates the hexadecimal value ‘0xffd8’ with JPEG image data. Finally, I confirm this by using the xxd hex dump tool to view the first line of the raw JPEG image data in hexadecimal format, which matches the ‘magic’ file value ‘0xffd8’. I explain more about the xxd tool and demonstrate how it can be used for manual file carving below (see Manual File Carving section).
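If you want to reproduce this outside the figure, a minimal stand-in file behaves the same way (the name demo.jpg is just a placeholder; the octal escapes write the bytes 0xFF 0xD8 0xFF 0xE0):

```shell
# Four bytes of JPEG header, written with octal escapes.
printf '\377\330\377\340' > demo.jpg

# file identifies the type from the signature alone; it should report JPEG image data.
file demo.jpg

# The first line of the hexdump starts with the ffd8 'magic' value.
xxd demo.jpg | head -n 1
```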
File Carving Tools
File carving can prove to be an invaluable technique in both digital forensics and data recovery. The process does not rely on metadata; instead, it operates at the data unit (block) layer (and/or the file system layer) to identify file contents to extract. As a result, file carving can be used to recover data from storage devices with corrupted metadata, which can prove very useful to data recovery specialists. On the other hand, forensic examiners can employ file carving methods to recover data of evidentiary value, such as deleted files.
There are a multitude of open-source data carving tools available to specialists, and I highly recommend consulting the forensic wiki page for a broad list of popular carving tools. That being said, there are two in particular that I use regularly: scalpel and foremost. These two carving tools are very powerful, easily accessible (open-source) and run on the command line. There are a plethora of tutorials and guides available online covering both the installation and extensive usage of these tools, so I shall not cover them in-depth here.
It is also worth noting that most mainstream commercial forensic software, such as FTK, EnCase and X-Ways, has its own built-in file carving and file signature analysis capabilities.
Manual File Carving
Although I would always recommend utilising specialised carving tools as mentioned in the previous section, it is possible to perform the file carving process manually. To achieve this, I will be using the xxd hex dump tool on Linux, which is provided as part of the ‘vim-common’ (or equivalent) package across many standard Linux distributions.
The manual carving demonstration using xxd will be performed against a test image containing non-fragmented graphics files. The test image is called ‘L0_Graphic.dd’ and was downloaded from the CFReDS (Computer Forensic Reference Data Sets) Project website here.
The methodology I shall be using to manually carve data from the test image is as follows:
- Identify a graphics file type (e.g. JPEG, PNG)
- Locate the associated header/footer in test image
- Note the start and end offsets of the file content
- Determine the length of the file contents
- Use xxd to extract the file
Because we know the test image contains graphics files, I will search for JPEG files using their header/footer values as discussed in the Data Carving section above. In addition, JPEG and other image file types are very commonly searched for in forensic examinations that involve file carving methods. First, I will look for the hexadecimal value 0xFFD8FFE0, indicating the start of a JPEG file, as well as the value 0xFFD9, indicating the end of a JPEG file:
First, I used xxd in upper-case mode (-u) to dump the raw hexadecimal contents of the test image to STDOUT. I then filtered the resulting output using an awk pattern range so that only the contents between the JPEG header value and its accompanying footer value are shown. The syntax for this particular awk command is fairly simple and you can modify it as you please with other hexadecimal values in the format ‘XXXX XXXX’:
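The figure shows the exact invocation against the test image; as a rough sketch (using a synthetic stand-in file, and assuming the signatures happen to fall on xxd's two-byte grouping boundaries within a single output line), it would look something like this:

```shell
# Synthetic image: 4 bytes of padding, JPEG header, payload, footer, trailing bytes
# (octal escapes write the bytes 0xFFD8FFE0 and 0xFFD9).
printf 'AAAA\377\330\377\340PAYLOAD!\377\331ZZ' > img.bin

# Upper-case hexdump filtered with an awk pattern range: print every line
# from the first match of 'FFD8 FFE0' up to the first match of 'FFD9'.
xxd -u img.bin | awk '/FFD8 FFE0/,/FFD9/'
```

Note that this simple range pattern misses signatures that straddle a line break or sit on an odd byte boundary, which is one of the reasons dedicated carving tools are preferable in practice.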
From this output, we can also calculate the length of the file in bytes by using the offsets of the header and footer values, which are shown to be:
Starting Offset: 0x1bc8c00
Ending Offset: 0x1bd792b
Converting these values into decimal using the Linux command line will give the length of the file as follows (note that we add 1 to the final value because both offsets are inclusive, so a simple subtraction falls one byte short):
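Shell arithmetic understands the 0x prefix, so one simple way to do the conversion is:

```shell
# End offset minus start offset, plus 1 because both offsets are inclusive.
echo $((0x1bd792b - 0x1bc8c00 + 1))
# 60716
```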
Now that we know the length of the JPEG file is 60716 bytes, we can use the xxd tool to extract the file from the test image, as seen in Figure 4 below:
xxd -p -s <START_OFFSET> -l <LENGTH> FILE > OUTPUT
This command specifies that we want the output to be in a continuous plain hexdump style (-p) and to start from the specified offset (-s) of the JPEG file we found in Figure 2. Then the length of the file (-l) found in Figure 3 is specified as the last option. Finally, the content of STDOUT is redirected to a temporary output file with the (>) operator.
xxd -r -p <OUTPUT> <CARVED_FILE>
The final xxd command uses a reverse operation (-r) to convert the hexdump-style output into binary format, while (-p) tells it to read a plain hexdump without offset information or column layout. The file produced as a result of this command should be a tangible file in the format you initially identified. In Figure 4, I ran the file command (see Figure 1) against the carved file to ensure the signature was correct.
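The whole extract-and-reverse sequence can be exercised end-to-end on a small synthetic image (all file names here are placeholders; the octal escapes write the JPEG header and footer bytes):

```shell
# Synthetic 'disk image': 8 bytes of junk, a 13-byte JPEG-like file, more junk.
printf 'JUNKJUNK\377\330\377\340PAYLOAD\377\331JUNK' > image.dd

# Carve 13 bytes starting at offset 8 (where the FFD8FFE0 header begins).
xxd -p -s 8 -l 13 image.dd > carved.hex

# Reverse the plain hexdump back into binary.
xxd -r -p carved.hex carved.jpg

# Confirm the carved file's signature, as in Figure 4.
file carved.jpg
```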
During my research into manual data carving on Linux, I created a custom Bash script, aptly named O-Carve (Offset-Carve), to perform the manual carving process described in the previous section. It is pretty simple; you run the command against a file you want to carve data from, specify the start and end offsets and the script will extract the contents using xxd. You can find the script on my GitLab page here.
Here is a quick demonstration of O-Carve on the command-line, being used against the same test image from the previous section:
Because O-Carve works using the raw hexadecimal offset values to extract data, it could theoretically be used to carve data from any file you run it against, so long as the offset values are correct.
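A hypothetical reimplementation of that idea fits in a few lines of shell (this is a sketch of the approach, not the actual O-Carve script):

```shell
# Carve the inclusive byte range [start, end] out of a file using xxd
# (hypothetical function in the spirit of O-Carve).
offset_carve() {
  local input=$1 start=$2 end=$3 output=$4
  # The length includes both the start and end bytes.
  local length=$(( end - start + 1 ))
  # Dump the chosen range as plain hex, then reverse it back into binary.
  xxd -p -s "$start" -l "$length" "$input" | xxd -r -p > "$output"
}

# Example: carve the 8-byte JPEG-like region (offsets 2-9) from a synthetic file.
printf 'AB\377\330\377\340XY\377\331CD' > t.bin
offset_carve t.bin 2 9 carved.out
```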
The goal of this post was to demonstrate how manual file carving (header/footer) could be achieved on Linux systems with relatively standard command-line tools. I fully recognise that there are a few other variables to consider such as fragmentation, file format differences, allocation status, anti-forensics, etc. The examples shown in this post were simplistic for the purposes of demonstration and not reflective of a real-world scenario.
I also do not recommend manually carving files in a professional environment, especially when tools like scalpel are much more efficient. However, it is useful to understand the carving process from a low-level perspective and how it can be performed manually should the need ever arise.
Thank you for reading and I hope this post has taught you something new!
While doing extra research for the content covered in this post, I found these online resources to be particularly insightful. I have also added a brief description of each one for convenience:
CFReDS – A list of data sets useful for forensic tool testing.
ForensicWiki – The official forensic wiki page on file carving.
SANS – An excellent paper from SANS about data carving concepts.
SANS – Presentation from SANS about advanced file carving.
Signatures – A comprehensive list of known file signatures by Gary Kessler.
TrID – A more robust file identification tool, an alternative to the standard file command.