The Joy of Low-Level Programming: Writing a Windows Game in x64 Assembly
Saturday, 28 December 2024
In a world where high-level languages dominate, writing code in Assembly might seem like a lost art. There’s little need for most developers to learn Assembly, except in specialized fields like security and reverse engineering, operating system development, or compiler engineering. However, for those who find joy in exploring the raw, low-level mechanics of programming, Assembly offers a unique and rewarding challenge—provided they bring enough patience, of course.
After many years away from Assembly, I recently found time to revisit the language—specifically Assembly for x86-64 CPUs. To challenge myself and have some fun, I decided to create a simple guessing game for Windows entirely in x64 Assembly. This project was less about necessity and more about indulging in a nostalgic dive into the roots of computing. I’m not an expert in this area, and my code may not compete with that of an expert or with the highly optimized code generated by modern compilers.
Introduction
Although 64-bit Assembly is built on the foundation of its 32-bit predecessor, it introduces new principles and conventions that make it both familiar and distinct. The expanded register set, the shift to a 64-bit address space, and changes in calling conventions are just a few examples of how x64 Assembly pushes the boundaries of what its 32-bit counterpart could achieve.
This post is not an Assembly tutorial, nor will I dive deeply into the language. Therefore, any prior familiarity with Assembly will be highly beneficial. If you're serious about learning x64 Assembly, I’ll include a few valuable resources at the end of this post.
The Development Environment
To turn the Assembly source files into an executable, we need an assembler that creates object files from source code files, and a linker to generate the final executable from the intermediate object files. For this project, we are going to use NASM and the Microsoft linker. The only external dependency is kernel32.lib. To set up the Microsoft linker and kernel32.lib, you need to install the Visual Studio C++ Build Tools and the Windows SDK using the Visual Studio installer.
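As a quick way to check the toolchain, here is a minimal sketch of a program that does nothing but exit, with the build commands in the comments. The file name minimal.asm and the entry symbol main are assumptions for illustration, not names taken from the game.

```nasm
; minimal.asm -- hypothetical file name, not part of the game itself.
; Build from an "x64 Native Tools Command Prompt for VS", so that link.exe
; and kernel32.lib are found:
;   nasm -f win64 minimal.asm -o minimal.obj
;   link minimal.obj kernel32.lib /subsystem:console /entry:main /out:minimal.exe

default rel                 ; use RIP-relative addressing by default
extern ExitProcess          ; imported from kernel32.lib

section .text
global main

main:
    sub     rsp, 40         ; 32 bytes of shadow space + 8 bytes to realign rsp
    xor     ecx, ecx        ; first argument (uExitCode) = 0
    call    ExitProcess     ; does not return
```

If both commands finish without errors and minimal.exe exits silently, the assembler, the linker, and the import library are all wired up correctly.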
x64 Assembly Important Concepts
Calling Convention
The main difference between x86-64 and x86 Assembly lies in what is known as the calling convention. Simply put, a calling convention defines how functions interact at the machine-code level. A calling convention specifies:
- how the caller and callee must use CPU registers and the stack to send and receive data,
- which registers are caller-saved and which registers are callee-saved, and
- which function is responsible for cleanup.
These rules are crucial to ensure your code can interoperate with existing libraries once everything is linked into a binary. While 32-bit x86 code has to deal with several different calling conventions (cdecl, stdcall, fastcall, and others), 64-bit Windows standardizes on essentially one, the Microsoft x64 calling convention. Check the resources at the end to learn more.
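To make the three points above concrete, here is a small sketch under the Microsoft x64 convention: the first four integer arguments travel in rcx, rdx, r8, and r9, the result comes back in rax, rbx is callee-saved, and the caller allocates and releases the 32 bytes of shadow space. The functions sum4 and caller are hypothetical and exist only for this illustration.

```nasm
section .text

; Hypothetical helper: rax = a + b + c + d (64-bit integers).
; Arguments arrive in rcx, rdx, r8, r9; the result is returned in rax.
sum4:
    lea     rax, [rcx + rdx]
    add     rax, r8
    add     rax, r9
    ret

; Hypothetical caller that keeps a value alive across the call.
caller:
    push    rbx             ; rbx is callee-saved: preserve our own caller's copy
    sub     rsp, 32         ; shadow space; the push above already realigned rsp
    mov     rbx, 100        ; safe across the call, sum4 must not change rbx
    mov     ecx, 1          ; 1st argument
    mov     edx, 2          ; 2nd argument
    mov     r8d, 3          ; 3rd argument
    mov     r9d, 4          ; 4th argument
    call    sum4            ; rax = 10; rcx, rdx, r8-r11 may be clobbered
    add     rax, rbx        ; rax = 110
    add     rsp, 32         ; the caller releases what it allocated
    pop     rbx
    ret
```

Values that must survive a call belong in callee-saved registers or on the stack. Anything left in rcx, rdx, or r8 through r11 should be treated as lost after the call returns.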
The Stack
When writing code in x64 Assembly, there are a few important considerations when working with the stack.
First, Windows x64 requires the stack pointer (rsp) to be aligned to a 16-byte boundary at the point of a function call. This means the caller is responsible for making the stack aligned to a 16-byte boundary before making a function call. The callee, on the other hand, assumes the stack is misaligned by 8 bytes (due to the call instruction automatically pushing the return address) and must explicitly realign the stack to a 16-byte boundary if it uses local variables or makes further function calls.
Function Prologue and Epilogue
In x86 Assembly, you often see a typical pattern at the beginning and end of a function to adjust the stack frame:
MyFunction:
    push ebp            ; Save old base pointer
    mov  ebp, esp       ; Set up frame pointer
    ; Function body
    mov  esp, ebp       ; Restore stack pointer
    pop  ebp            ; Restore base pointer
    ret
While this approach is still valid, x64 Assembly typically takes a different approach to enhance performance. The base pointer register (rbp) is no longer required as a frame pointer. Instead, the exact size of the stack frame is calculated up front, and the stack pointer is moved down by that amount at the function entry point (remember that the stack grows from high memory addresses to low ones) to make room for everything the function needs. Before the function returns, the stack pointer is moved back up to its original position. In between, the stack pointer should stay fixed, which means avoiding push and pop inside the function body.
MyFunction:
    sub rsp, <required_size>    ; Allocate local variables,
                                ; shadow space, ...
    ; Function body
    add rsp, <required_size>    ; Restore stack pointer
    ret
What does all this mean for our program?
Stack alignment is crucial when calling external (Windows API) functions. If a pure Assembly function calls only other pure Assembly functions that make no alignment assumptions, a misaligned stack usually goes unnoticed. But when there is a call to a Windows API function, or to any external C or C++ function built for Windows x64, the stack pointer register (rsp) must be 16-byte aligned, that is, divisible by 16, at the point of the call. Otherwise, the program is liable to crash with an access violation exception, typically because the callee uses aligned SSE instructions on stack data. Bear in mind that every call instruction implicitly pushes the 8-byte return address onto the stack, and you need to account for it when aligning rsp.
On Windows x64, the function prologue comes down to a precise calculation of the stack frame size based on what the function needs (local variables, stack parameters, shadow space, etc.). To keep the stack 16-byte aligned, the following rule can be used:
[ Required stack space + 8 bytes (return address) + padding ] must be divisible by 16
Using the rule above, you can calculate how much the padding should be. You need to make sure that rsp is divisible by 16 after the function prologue.
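As a worked instance of the rule, consider a hypothetical function (named example here purely for illustration) that needs 16 bytes of local variables and only calls functions taking at most four arguments. The arithmetic is spelled out in the comments, and the constant FRAME_SIZE is likewise an assumed name.

```nasm
section .text

; Hypothetical function with 16 bytes of locals that calls functions
; taking at most four arguments.
;   shadow space for callees : 32
;   local variables          : 16
;   required stack space     : 48
;   48 + 8 (return address)  = 56   -> not divisible by 16
;   padding                  :  8   -> 48 + 8 + 8 = 64
;   amount to subtract       : 48 + 8 (padding) = 56
FRAME_SIZE equ 56

example:
    sub     rsp, FRAME_SIZE         ; rsp is now 16-byte aligned
    mov     qword [rsp + 32], 0     ; first local, just above the shadow space
    mov     qword [rsp + 40], 0     ; second local
    ; ... calls to other functions go here ...
    add     rsp, FRAME_SIZE         ; restore rsp before returning
    ret
```

The 8 bytes at [rsp + 48] are pure padding and are never used. They exist only so that rsp lands on a 16-byte boundary.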
It is still possible to save the base pointer register (rbp) by pushing it onto the stack in the function prologue, but this is mainly done for debugging purposes, to ease stack frame navigation. In this case, you also have to take the 8 bytes of the pushed rbp into account when aligning the stack pointer.
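A minimal sketch of such a frame-pointer prologue follows, for a hypothetical function with_frame that needs 32 bytes of shadow space and 16 bytes of locals. Because the pushed rbp already cancels out the 8 bytes of the return address, the amount subtracted afterwards has to be a multiple of 16.

```nasm
section .text

with_frame:
    push    rbp                     ; entry: rsp = 16n + 8, after the push: 16n
    mov     rbp, rsp                ; rbp anchors the frame for debuggers
    sub     rsp, 48                 ; 32 shadow + 16 locals, a multiple of 16
    mov     qword [rbp - 8], 0      ; locals can be addressed relative to rbp
    ; ... function body ...
    add     rsp, 48                 ; release the frame
    pop     rbp                     ; restore the caller's base pointer
    ret
```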
In addition to 16-byte alignment, be careful to build the correct stack layout (shadow space, room for stack-allocated parameters, and so on) when calling an external function.
And bear in mind that the stack grows downward: the stack pointer decreases whenever something is pushed onto the stack.
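To illustrate the stack layout for an external call, here is a sketch that prints a line with WriteFile, which takes five arguments, so the fifth has to be placed on the stack directly above the shadow space. The message text and the labels msg and msg_len are made up for this example. GetStdHandle, WriteFile, and ExitProcess are the real kernel32 functions.

```nasm
default rel
extern GetStdHandle
extern WriteFile
extern ExitProcess

STD_OUTPUT_HANDLE equ -11

section .data
msg     db "Guess the number!", 13, 10      ; hypothetical message
msg_len equ $ - msg

section .text
global main

; Frame layout after the prologue:
;   [rsp +  0 .. 31]  shadow space for the callee
;   [rsp + 32]        5th argument (lpOverlapped)
;   [rsp + 40]        local: number of bytes written
;   [rsp + 48]        padding for 16-byte alignment
main:
    sub     rsp, 56                 ; 56 + 8 (return address) = 64
    mov     ecx, STD_OUTPUT_HANDLE
    call    GetStdHandle            ; rax = console output handle

    mov     rcx, rax                ; 1st: hFile
    lea     rdx, [msg]              ; 2nd: lpBuffer
    mov     r8d, msg_len            ; 3rd: nNumberOfBytesToWrite
    lea     r9, [rsp + 40]          ; 4th: lpNumberOfBytesWritten
    mov     qword [rsp + 32], 0     ; 5th: lpOverlapped = NULL
    call    WriteFile

    xor     ecx, ecx                ; exit code 0
    call    ExitProcess             ; does not return
```

Note that the fifth argument is written with mov rather than push, so rsp stays fixed between the prologue and the calls.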
What is the game about?
The game is a simple guessing game where the computer chooses a random number between 1 and 100,000, and you, as the player, are given 20 tries to guess the number. After each guess, the computer will tell you whether your guess was higher or lower than the chosen number.
Technically, for a range of 100,000 numbers you would need a maximum of 17 guesses if you followed a binary search approach. However, I decided to round it up to 20.
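The figure of 17 follows from a short calculation, assuming every answer tells the player "higher", "lower", or "correct". Under that assumption, k guesses of a binary search can cover at most 2^k - 1 candidate numbers, so we need the smallest k for which that bound reaches 100,000:

$$
2^{16} - 1 = 65{,}535 \;<\; 100{,}000 \;\le\; 2^{17} - 1 = 131{,}071
\quad\Longrightarrow\quad k = 17
$$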
In the next part, I am going to talk about the program itself and dig into the code.