Parsing PE File Headers with C++

# Context

In this lab I'm writing a simple Portable Executable (PE) file header parser for 32-bit binaries in C++. The lab was inspired by techniques such as reflective DLL injection and process hollowing, which both deal with various parts of PE files.
The purpose of this lab is two-fold:
- Get a bit more comfortable with C++
- Get a better understanding of PE file headers
This lab is going to be light on text as most of the relevant info is shown in the code section, but I will touch on the piece that confused me the most in this endeavour - parsing the DLL imports.
Below is a graphic showing the end result - a program that parses a 32-bit cmd.exe executable and prints various pieces of information from the PE headers as well as the DLL imports.
- The code is not able to parse 64-bit executables correctly. This will not be fixed.
- The code was not meant to be clean and well organised - that was not the goal of this lab.
- The parser is not full-blown - it only goes through the main headers and DLL imports, so no exports, relocations or resources will be touched.
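
Since the parser only understands the 32-bit layout, it can help to check up front which layout a file uses. Below is a minimal sketch, assuming the whole file has already been read into a byte vector; `PeKind` and `classifyPe` are hypothetical names that are not part of the lab code. It relies on the `Magic` field at the start of the optional header (0x10B for PE32, 0x20B for PE32+).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class PeKind { NotPe, Pe32, Pe32Plus };

// Hypothetical helper, not part of the lab code: classifies a raw file image
// by the Magic field at the start of the optional header.
inline PeKind classifyPe(const std::vector<unsigned char>& file) {
    // Little-endian readers over the raw bytes.
    auto rd16 = [&file](std::size_t o) {
        return static_cast<std::uint32_t>(file[o]) |
               (static_cast<std::uint32_t>(file[o + 1]) << 8);
    };
    auto rd32 = [&rd16](std::size_t o) { return rd16(o) | (rd16(o + 2) << 16); };

    if (file.size() < 0x40 || rd16(0) != 0x5A4D) return PeKind::NotPe;  // "MZ"
    const std::size_t nt = rd32(0x3C);                                   // e_lfanew
    if (nt > file.size() || file.size() - nt < 26) return PeKind::NotPe;
    if (rd32(nt) != 0x00004550) return PeKind::NotPe;                    // "PE\0\0"

    // Skip the 4-byte signature and the 20-byte IMAGE_FILE_HEADER.
    const std::uint32_t magic = rd16(nt + 4 + 20);
    if (magic == 0x10B) return PeKind::Pe32;      // IMAGE_NT_OPTIONAL_HDR32_MAGIC
    if (magic == 0x20B) return PeKind::Pe32Plus;  // IMAGE_NT_OPTIONAL_HDR64_MAGIC
    return PeKind::NotPe;
}
```

A parser like the one in this lab could call such a check right after reading the file and bail out early on anything other than `PeKind::Pe32`.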

# The Big Hurdle

For most of this lab, header parsing was going smoothly, until it was time to parse the DLL imports. The bit below is the final solution that worked for parsing out the DLL names and their functions:
Parsing out imported DLLs and their functions requires a good number of offset calculations that initially may seem confusing and this is the bit I will try to put down in words in these notes.
So how do we go about extracting the DLL names the binary imports and the function names that DLL exports?

# Definitions

First off, we need to define some terms:
- Section - a named region of the PE file, described by an entry in the section headers. For example, .text is where the assembly code is stored, .data contains global and static local variables, etc.
- File item - part of a PE file, for example a code section .text
- Relative Virtual Address (RVA) - address of some file item in memory minus the base address of the image.
- Virtual Address (VA) - virtual memory address of some file item in memory without the image base address subtracted. For example, if we have a VA 0x01004000 and we know that the image base address is 0x01000000, the RVA is 0x01004000 - 0x01000000 = 0x00004000.
- Data Directories - part of the Optional Header that contains RVAs to various tables - exports, resources and, most importantly for this lab, the DLL imports table. Each entry also contains the size of its table.
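
The VA/RVA relationship from the definitions can be written down as two one-liners. This is only an illustrative sketch; `vaToRva` and `rvaToVa` are hypothetical helper names, not Windows APIs.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: an RVA is the VA with the image base subtracted.
constexpr std::uint32_t vaToRva(std::uint32_t va, std::uint32_t imageBase) {
    return va - imageBase;
}

// The inverse: where an RVA ends up once the image is loaded at imageBase.
constexpr std::uint32_t rvaToVa(std::uint32_t rva, std::uint32_t imageBase) {
    return imageBase + rva;
}

// The example from the definitions, checked at compile time.
static_assert(vaToRva(0x01004000u, 0x01000000u) == 0x00004000u,
              "VA 0x01004000 with image base 0x01000000 is RVA 0x4000");
```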

# Calculating Offsets

If we look at the notepad.exe binary using CFF Explorer (or any other similar program) and inspect the Data Directories under the Optional Header, we can see that the Import Table is located at RVA 0x0000A0A0, which according to CFF Explorer happens to live in the .text section:
Indeed, if we look at the Section Headers and note the values Virtual Size and Virtual Address for the .text section:
and check if the Import Directory RVA of 0x0000A0A0 falls into the range of .text section with this conditional statement in python:
```python
0x000a0a0 > 0x00001000 and 0x000a0a0 < 0x00001000 + 0x0000a6fc
```
...we can confirm it definitely does fall into the .text section's range:
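
The same range check can be sketched in C++. `SectionRange` and `containsRva` are hypothetical names standing in for the `VirtualAddress` and `VirtualSize` fields of a real section header. One assumption differs slightly from the Python one-liner: the lower bound is inclusive here, since an RVA equal to the section's Virtual Address is the first byte of that section.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the two section header fields used in the check.
struct SectionRange {
    std::uint32_t virtualAddress;  // RVA of the first byte of the section
    std::uint32_t virtualSize;     // size of the section once loaded
};

// True if rva lies in [virtualAddress, virtualAddress + virtualSize).
constexpr bool containsRva(const SectionRange& s, std::uint32_t rva) {
    return rva >= s.virtualAddress && rva - s.virtualAddress < s.virtualSize;
}
```

With the .text values shown above, `containsRva(SectionRange{0x1000, 0xA6FC}, 0xA0A0)` evaluates to true.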

## PIMAGE_IMPORT_DESCRIPTOR

In order to read out DLL names that this binary imports, we first need to populate a data structure called PIMAGE_IMPORT_DESCRIPTOR with relevant data from the binary, but how do we find it?
We need to translate the Import Directory RVA to the file offset - a place in the binary file where the DLL import information is stored. The way this can be achieved is by using the following formula:
$\text{offset} = \text{imageBase} + \text{text.RawOffset} + (\text{importDirectory.RVA} - \text{text.VA})$
where imageBase is the start address of where the binary image is loaded, text.RawOffset is the Raw Address value from the .text section, text.VA is Virtual Address value from the .text section and importDirectory.RVA is the Import Directory RVA value from Data Directories in Optional Header.
If you think about what was discussed so far and the above formula for a moment, you will realise that:
- imageBase in our case is 0 since the file is not loaded to memory and we are inspecting it on the disk
- The import table is located in the .text section of the binary. Since the binary is not loaded into memory, we need to know the file offset of the .text section in relation to the imageBase
- imageBase + text.RawOffset gives us the file offset to the .text section - we need it, because remember - the import table is inside the .text section
- Since importDirectory.RVA, as mentioned earlier, lives in the .text section, importDirectory.RVA - text.VA gives us the offset of the import table relative to the start of the .text section
- We take the value of importDirectory.RVA - text.VA and add it to the text.RawOffset and we get the offset of the import table in the raw .text data.
Below is some simple PowerShell that does the calculations for us and gives the file offset we can later use for filling in the PIMAGE_IMPORT_DESCRIPTOR structure:
```powershell
PS C:\Users\mantvydas> $fileBase = 0x0
PS C:\Users\mantvydas> $textRawOffset = 0x00000400
PS C:\Users\mantvydas> $importDirectoryRVA = 0x0000A0A0
PS C:\Users\mantvydas> $textVA = 0x00001000
PS C:\Users\mantvydas>
PS C:\Users\mantvydas> # this points to the start of the .text section
PS C:\Users\mantvydas> $rawOffsetToTextSection = $fileBase + $textRawOffset
PS C:\Users\mantvydas> $importDescriptor = $rawOffsetToTextSection + ($importDirectoryRVA - $textVA)
PS C:\Users\mantvydas> [System.Convert]::ToString($importDescriptor, 16)

# this is the file offset we are looking for for PIMAGE_IMPORT_DESCRIPTOR
94a0
```
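
The same RVA to file offset translation can be sketched in C++ so that it works for any section, not just .text. This is a minimal illustration: `Section`, `kInvalidOffset` and `rvaToFileOffset` are hypothetical names, and `Section` mirrors only the three section header fields the formula needs.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical subset of a section header, using fixed-width types.
struct Section {
    std::uint32_t virtualAddress;    // RVA of the section when loaded
    std::uint32_t virtualSize;       // size of the section when loaded
    std::uint32_t pointerToRawData;  // file offset of the section's raw data
};

// Sentinel chosen for this sketch to signal "RVA is not inside any section".
constexpr std::uint32_t kInvalidOffset = 0xFFFFFFFFu;

// Finds the section containing rva and applies:
//   offset = section.RawOffset + (rva - section.VA)
inline std::uint32_t rvaToFileOffset(const std::vector<Section>& sections,
                                     std::uint32_t rva) {
    for (const Section& s : sections) {
        if (rva >= s.virtualAddress && rva - s.virtualAddress < s.virtualSize) {
            return s.pointerToRawData + (rva - s.virtualAddress);
        }
    }
    return kInvalidOffset;
}
```

Feeding it the .text values used in these notes (Virtual Address 0x1000, Virtual Size 0xA6FC, Raw Address 0x400) and the import directory RVA 0xA0A0 gives 0x94A0, matching the PowerShell result.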
If we check the file offset 0x94a0 in a hex editor, we can see we are getting close to a list of imported DLL names - note that a little further down, around offset 0x95cc, we can see VERSION.dll starting to show - that is a good start:
Now more importantly, note the value highlighted at offset 0x000094ac - 7C A2 00 00 (which reads as 0x0000A27C due to little-endianness) - this is important. If we consider the layout of the PIMAGE_IMPORT_DESCRIPTOR structure, we can see that the fourth member of the structure (each member is a DWORD, so 4 bytes in size) is DWORD Name, so it sits 12 bytes (0xC) past the start at 0x94a0. This implies that 0x000094ac contains something that should be useful for us to get our first imported DLL's name:
Indeed, if we check the Import Directory of notepad.exe in CFF Explorer, we see that the 0xA27C is another RVA to the DLL name, which happens to be ADVAPI32.dll - and we will manually verify this in a moment:
If we look closer at the ADVAPI32.dll import details and compare it with the hex dump of the binary at 0x94A0, we can see that the 0000a27c is surrounded by the same info we saw in CFF Explorer for the ADVAPI32.dll:
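
To make the offsets concrete, here is a small sketch of the descriptor layout and of reading a little-endian DWORD from raw bytes. `ImportDescriptor` is a hypothetical mirror of the Windows structure using fixed-width types (in the real header the first member is a union of Characteristics and OriginalFirstThunk), and `readLe32` is a helper invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical mirror of the import descriptor layout: five DWORDs.
struct ImportDescriptor {
    std::uint32_t OriginalFirstThunk;  // RVA of the import lookup table
    std::uint32_t TimeDateStamp;
    std::uint32_t ForwarderChain;
    std::uint32_t Name;                // RVA of the DLL name string
    std::uint32_t FirstThunk;          // RVA of the import address table
};

static_assert(sizeof(ImportDescriptor) == 20, "five 4-byte members");
static_assert(offsetof(ImportDescriptor, Name) == 12, "Name is the fourth DWORD");

// Reads a 4-byte little-endian value from raw file bytes.
inline std::uint32_t readLe32(const unsigned char* p) {
    return static_cast<std::uint32_t>(p[0]) |
           (static_cast<std::uint32_t>(p[1]) << 8) |
           (static_cast<std::uint32_t>(p[2]) << 16) |
           (static_cast<std::uint32_t>(p[3]) << 24);
}
```

The first descriptor starts at file offset 0x94a0, so its Name field sits at 0x94a0 + 12 = 0x94ac, and the bytes 7C A2 00 00 found there decode to 0x0000A27C.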

## First DLL Name

Let's see if we can translate this Name RVA 0xA27c to the file offset using the technique we used earlier and finally get the first imported DLL name.
This time the formula we need to use is:
$\text{offset} = \text{imageBase} + \text{text.RawOffset} + (\text{nameRVA} - \text{text.VA})$
where nameRVA is Name RVA value for ADVAPI32.dll from the Import Directory and text.VA is the Virtual Address of the .text section.
Again, some PowerShell to do the RVA to file offset calculation for us:
```powershell
# first dll name
$nameRVA = 0x0000A27C
$firstDLLname = $rawOffsetToTextSection + ($nameRVA - $textVA)
[System.Convert]::ToString($firstDLLname, 16)

967c
```
If we check offset 0x967c in our hex editor - success, we found our first DLL name:
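
Once the file offset is known, the DLL name is just a null-terminated ASCII string at that position. A minimal sketch of reading it out of a file buffer, with a bounds check so a corrupt offset cannot run past the end of the data; `readCString` is a hypothetical helper name:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: collects bytes from offset up to the first 0 byte
// (or the end of the buffer, whichever comes first).
inline std::string readCString(const std::vector<unsigned char>& file,
                               std::size_t offset) {
    std::string out;
    while (offset < file.size() && file[offset] != 0) {
        out.push_back(static_cast<char>(file[offset]));
        ++offset;
    }
    return out;
}
```

Called on the notepad.exe bytes with offset 0x967c, a helper like this would be expected to return the ADVAPI32.dll string seen in the hex editor.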

## DLL Imported Functions

Now in order to get a list of imported functions from the given DLL, we need to use a structure called PIMAGE_THUNK_DATA32 which looks like this:
In order to utilise the above structure, again, we need to translate an RVA, this time the OriginalFirstThunk member of the PIMAGE_IMPORT_DESCRIPTOR structure, which in our case was pointing to 0x0000A28C:
If we use the same formula for translating RVAs to file offsets as previously and use the below PowerShell to calculate the file offset, we get:
```powershell
# first thunk
$firstThunk = $rawOffsetToTextSection + (0x0000A28C - $textVA)
[System.Convert]::ToString($firstThunk, 16)

968c
```
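Around that offset (0x968c and the DWORD after it at 0x968c + 4), we see a couple more values that look like RVAs - 0x0000a690 and 0x0000a6a2. Note that PIMAGE_THUNK_DATA32 wraps a union (ForwarderString, Function, Ordinal and AddressOfData all share the same 4 bytes), so each DWORD here is a separate thunk entry, one per imported function, rather than a second member of the same structure: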
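
Walking the thunk array can be sketched as follows. The array is a sequence of 4-byte entries terminated by a zero entry; when the top bit of an entry is set the function is imported by ordinal, otherwise the entry is an RVA pointing at the function's hint and name. `ThunkEntry` and `decodeThunks` are hypothetical names for this sketch, and `kOrdinalFlag32` has the same value as the Windows constant IMAGE_ORDINAL_FLAG32.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint32_t kOrdinalFlag32 = 0x80000000u;

// Hypothetical decoded form of one 32-bit thunk entry.
struct ThunkEntry {
    bool byOrdinal;       // true if the import is by ordinal
    std::uint32_t value;  // ordinal number, or RVA of the hint/name entry
};

// Decodes entries until the terminating zero or maxCount, whichever is first.
inline std::vector<ThunkEntry> decodeThunks(const std::uint32_t* thunks,
                                            std::size_t maxCount) {
    std::vector<ThunkEntry> out;
    for (std::size_t i = 0; i < maxCount && thunks[i] != 0; ++i) {
        if (thunks[i] & kOrdinalFlag32) {
            out.push_back({true, thunks[i] & 0xFFFFu});  // low 16 bits = ordinal
        } else {
            out.push_back({false, thunks[i]});           // RVA to hint + name
        }
    }
    return out;
}
```

For a table holding 0x0000A690, 0x0000A6A2 and a terminating 0, this yields two by-name entries whose values are the RVAs to translate next.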
If we do a final RVA to file offset conversion for the second (we could do the same for 0x0000a690) RVA 0x0000a6a2:
```powershell
$firstFunction = $rawOffsetToTextSection + (0x0000A6A2 - $textVA)
[System.Convert]::ToString($firstFunction, 16)

9aa2
```
Finally, at the file offset 0x9aa2, we get to see the second (because we chose the RVA a6a2 rather than a690) imported function of the DLL ADVAPI32. Note that the function name actually starts 2 bytes further into the file, so the file offset 9aa2 becomes 9aa2 + 2 = 9aa4. The reason is that the RVA points to an IMAGE_IMPORT_BY_NAME structure, which begins with a 2-byte Hint (a WORD) followed by the null-terminated function name:
Cross checking the above findings with CFF Explorer's Imported DLLs parser, we can see that our calculations were correct - note the OFTs column and the values a6a2 and a690 we referred to earlier:
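
The 2-byte skip before the function name is explained by the layout of the entry a by-name thunk points to: in the PE format it is an IMAGE_IMPORT_BY_NAME structure, a 2-byte Hint followed by the null-terminated name. A minimal sketch of decoding such an entry from a file buffer; `ImportByName` and `readImportByName` are hypothetical names, and the sample bytes in the usage note are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical decoded form of one hint/name entry.
struct ImportByName {
    std::uint16_t hint;  // index hint into the exporting DLL's name table
    std::string name;    // imported function name
};

// Reads the little-endian 2-byte hint, then the string that follows it.
inline ImportByName readImportByName(const std::vector<unsigned char>& file,
                                     std::size_t offset) {
    ImportByName entry{0, ""};
    if (offset > file.size() || file.size() - offset < 2) return entry;
    entry.hint = static_cast<std::uint16_t>(file[offset] | (file[offset + 1] << 8));
    for (std::size_t i = offset + 2; i < file.size() && file[i] != 0; ++i) {
        entry.name.push_back(static_cast<char>(file[i]));
    }
    return entry;
}
```

For example, the bytes 34 12 followed by the ASCII string "RegCloseKey" and a 0 byte would decode to hint 0x1234 and the name RegCloseKey.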

# Code

The below code shows how to loop through the file in its entirety to parse all the DLLs and all of their imported functions.
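```cpp
#include <windows.h>
#include <cstdio>

int main(int argc, char* argv[]) {
    if (argc < 2) {
        printf("Usage: %s <path to a 32-bit PE file>\n", argv[0]);
        return 1;
    }
    const char* fileName = argv[1];
    HANDLE file = NULL;
    DWORD fileSize = 0;
    DWORD bytesRead = 0;
    BYTE* fileData = NULL;
    PIMAGE_DOS_HEADER dosHeader = NULL;
    PIMAGE_NT_HEADERS32 imageNTHeaders = NULL;
    PIMAGE_SECTION_HEADER sectionHeader = NULL;
    PIMAGE_SECTION_HEADER importSection = NULL;
    PIMAGE_IMPORT_DESCRIPTOR importDescriptor = NULL;
    PIMAGE_THUNK_DATA32 thunkData = NULL;
    DWORD thunk = 0;
    BYTE* rawOffset = NULL;

    // open file (read access is enough for parsing)
    file = CreateFileA(fileName, GENERIC_READ, FILE_SHARE_READ, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (file == INVALID_HANDLE_VALUE) {
        printf("Could not read file\n");
        return 1;
    }

    // allocate heap
    fileSize = GetFileSize(file, NULL);
    fileData = (BYTE*)HeapAlloc(GetProcessHeap(), 0, fileSize);

    // read file bytes to memory
    if (fileData == NULL || !ReadFile(file, fileData, fileSize, &bytesRead, NULL) || bytesRead != fileSize) {
        printf("Could not read file\n");
        CloseHandle(file);
        return 1;
    }
    CloseHandle(file);

    // IMAGE_DOS_HEADER
    dosHeader = (PIMAGE_DOS_HEADER)fileData;
    printf("******* DOS HEADER *******\n");
    printf("\t0x%x\t\tBytes on last page of file\n", dosHeader->e_cblp);
    // ... the other DOS, NT, file and optional header fields are printed the same way

    // IMAGE_NT_HEADERS (32-bit layout only)
    imageNTHeaders = (PIMAGE_NT_HEADERS32)(fileData + dosHeader->e_lfanew);

    // DATA_DIRECTORIES
    printf("\n******* DATA DIRECTORIES *******\n");

    // get offset to first section header
    BYTE* sectionLocation = (BYTE*)imageNTHeaders + sizeof(DWORD) + sizeof(IMAGE_FILE_HEADER) + imageNTHeaders->FileHeader.SizeOfOptionalHeader;
    DWORD sectionSize = (DWORD)sizeof(IMAGE_SECTION_HEADER);

    // get offset to the import directory RVA
    DWORD importDirectoryRVA = imageNTHeaders->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_IMPORT].VirtualAddress;
    printf("\tImport Directory RVA: 0x%lx\n", importDirectoryRVA);

    // print section data
    printf("\n******* SECTION HEADERS *******\n");
    for (int i = 0; i < imageNTHeaders->FileHeader.NumberOfSections; i++) {
        sectionHeader = (PIMAGE_SECTION_HEADER)sectionLocation;
        // section names are 8 bytes and not necessarily null-terminated
        printf("\t%.8s\tVA: 0x%lx\tRaw: 0x%lx\n", (const char*)sectionHeader->Name, sectionHeader->VirtualAddress, sectionHeader->PointerToRawData);

        // save section that contains import directory table
        if (importDirectoryRVA >= sectionHeader->VirtualAddress && importDirectoryRVA < sectionHeader->VirtualAddress + sectionHeader->Misc.VirtualSize) {
            importSection = sectionHeader;
        }
        sectionLocation += sectionSize;
    }
    if (importSection == NULL) {
        printf("Could not find the section containing the import directory\n");
        return 1;
    }

    // get file offset to import table
    rawOffset = fileData + importSection->PointerToRawData;

    // get pointer to import descriptor's file offset. Note that the formula for calculating file offset is: imageBaseAddress + pointerToRawDataOfTheSectionContainingRVAofInterest + (RVAofInterest - SectionContainingRVAofInterest.VirtualAddress)
    importDescriptor = (PIMAGE_IMPORT_DESCRIPTOR)(rawOffset + (importDirectoryRVA - importSection->VirtualAddress));

    printf("\n******* DLL IMPORTS *******\n");
    for (; importDescriptor->Name != 0; importDescriptor++) {
        // imported dll modules
        printf("\t%s\n", (const char*)(rawOffset + (importDescriptor->Name - importSection->VirtualAddress)));
        thunk = importDescriptor->OriginalFirstThunk == 0 ? importDescriptor->FirstThunk : importDescriptor->OriginalFirstThunk;
        thunkData = (PIMAGE_THUNK_DATA32)(rawOffset + (thunk - importSection->VirtualAddress));

        // dll exported functions
        for (; thunkData->u1.AddressOfData != 0; thunkData++) {
            // the top bit of the thunk marks a function imported via its ordinal number
            if (IMAGE_SNAP_BY_ORDINAL32(thunkData->u1.Ordinal)) {
                // show lower bits of the value to get the ordinal
                printf("\t\tOrdinal: %lx\n", IMAGE_ORDINAL32(thunkData->u1.Ordinal));
            } else {
                // AddressOfData is an RVA to IMAGE_IMPORT_BY_NAME: a 2-byte Hint followed by the name
                PIMAGE_IMPORT_BY_NAME importByName = (PIMAGE_IMPORT_BY_NAME)(rawOffset + (thunkData->u1.AddressOfData - importSection->VirtualAddress));
                printf("\t\t%s\n", (const char*)importByName->Name);
            }
        }
    }

    HeapFree(GetProcessHeap(), 0, fileData);
    return 0;
}
```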

# References

- PE Format - Win32 apps (Microsoft Docs)
- http://win32assembly.programminghorizon.com/pe-tut6.html