Asmtut 1: true – magnushoff.com

This was originally posted on Google+, which has now been shut down. It was helpfully converted to Markdown by Robert Jacobson after which I adjusted it for reposting here. I have dated it at its original posting date, but it was posted here on 2019-09-07.

In an effort to educate the dunces Knut and Jon who apparently never programmed in assembly, I am going to post a step by step instruction on making a snake game in x86-64/amd64 assembly on a modern operating system. Here. As if this were some kind of a blog. I am learning x86-64 assembly as I go, which makes this even more of a blog-like experience.

The modern operating system I chose is OS X 10.6. The same assembly might work directly in BSD and should work with only minor changes in Linux. The build rules will require additional changes :)

The goal is to make an interactive realtime snake game in the terminal. Graphics was a lot easier in the old days when you would just write directly to the video memory, but that is unavailable now, and I would rather make this contemporary than make it more graphical but limited to DOSBox.

For programming assembly we need an assembler. GCC has got one, but it uses AT&T syntax for no good reason, which is harder to read and write than Intel syntax. Therefore, we choose NASM, which implements Intel syntax. Install the nasm binary somewhere in your PATH. You might already have a /usr/bin/nasm from Xcode which is too old to support x86-64, so be sure not to confuse that one with the one you just installed.
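One way to guard against picking up the wrong nasm is to ask the one found in PATH whether it knows the output format we need. This is only a sketch: nasm_supports is a hypothetical helper name, its optional second argument is an assumption added to make it easy to point at a specific binary, and it relies on nasm -hf listing the supported formats.

```shell
# Succeed if the given nasm lists the requested output format.
# Returns 2 if the binary cannot be found at all.
nasm_supports() {
    format="$1"
    nasm_bin="${2:-nasm}"   # hypothetical override; defaults to the nasm in PATH
    command -v "$nasm_bin" >/dev/null 2>&1 || return 2
    # Assumption: `nasm -hf` prints one supported format name per line.
    "$nasm_bin" -hf 2>&1 | grep -qw -- "$format"
}

if nasm_supports macho64; then
    echo "$(command -v nasm) supports macho64"
else
    echo "nasm in PATH is missing or too old for macho64"
fi
```

If the second message shows up even though you installed a new NASM, the old one is probably earlier in your PATH.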

Step 1: Assemble, link and execute

In this installment, we are implementing the true command line utility. It executes and returns 0, indicating success. An equivalent C program is int main() { return 0; }. Simple, but a good place to start to check that all the tools are working and that everything is in place.
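To see the behaviour we are about to reproduce, you can ask the shell what status the existing true reports. A minimal sketch; show_status is a hypothetical helper name, not a standard tool:

```shell
# Run a command and print the exit status the shell observed.
show_status() {
    status=0
    "$@" || status=$?
    echo "$1 exited with status $status"
}

show_status true    # prints: true exited with status 0
show_status false   # prints a non-zero status instead
```

Our assembly version should end up producing the same line as the first call when run as show_status ./true.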

In reality, unlike the overly abstract world of C, the entry to a process does not act exactly as a function call, and the exit is not like a function return. C programs usually get linked with startup code, a small trampoline that calls main and hands its return value to exit, to make all of this more convenient. We will go straight for calling the exit function with 0 as the argument.

Before we get to that, though, we need to supply an entry point to our program and export this symbol to the linker. We do the exporting first, by declaring global main at the top of our new file true.asm. Go ahead, it is safe. Now, immediately below it we write main: on a line by itself. main: is a label, and referring to this label anywhere will give us its address. This is what the linker needs. true.asm should now look like this:

global main

main:

This is actually all we need to assemble and link.

Assembling (.asm -> .o): nasm -f macho64 -o true.o true.asm

-f macho64 tells NASM to produce an object file in the 64-bit Mach-O format. On Linux, for example, this should be elf64 instead. nasm -hf gives you a list of the formats NASM supports.

Linking (.o -> executable): ld -macosx_version_min 10.6 -o true -e main true.o

-e main tells the linker that main is the label of our entry point, and the linker can find it because we have global main in our .asm file.

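It should now be possible to execute ./true, and it will probably cause the nondescript error message "Bus error: 10" to appear. That is to be expected at this point: main is not followed by any instructions, so the processor presumably runs straight into whatever bytes happen to come after the entry point.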

Step 2: exit(0)

We are not going to call the exit function in the C runtime library, but rather the exit system call via the OS's syscall functionality. There is some information on this in /usr/include/sys/syscall.h, and in it we can see that SYS_exit has identification number 1. Nice.

Thanks to http://thexploit.com/secdev/mac-os-x-64-bit-assembly-system-calls/ I also found out that since exit is classified as a Unix call, it gets to have an identifier of [whatever's in syscall.h] + 0x02000000. That is, 0x02000001. Great. We now know how to identify the system call exit.
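The arithmetic is easy to double-check in the shell. A small sketch, assuming the 0x02000000 offset described above; unix_syscall_id is a hypothetical helper name and takes the decimal number from syscall.h:

```shell
# Add the Unix-class offset to a syscall number from syscall.h
# and print the result as the 32-bit hex value that goes into rax.
unix_syscall_id() {
    printf '0x%08x\n' $(( 0x02000000 + $1 ))
}

unix_syscall_id 1   # SYS_exit, prints 0x02000001
```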

We also know that exit takes an argument, the value to return.

According to the ABI we should put the syscall number in the register rax and the first argument in the register rdi. Think of registers as (global-ish) variables that you don't get to name. So, in pseudocode, we want something like:

rax := 0x02000001; // Put the ID for SYS_exit into rax
rdi := 0;          // Put the desired exit status value into rdi
performTheSyscall;

This is quite easy to express in assembly:

mov rax, 0x02000001
mov rdi, 0
syscall

Now, true.asm should look something like this:

global main

main:
    mov rax, 0x02000001     ; SYS_exit (1) + Unix class offset 0x02000000
    mov rdi, 0              ; Exit success = 0
    syscall                 ; Invoke the kernel

syscall is actually a dedicated machine instruction, standard in the x86-64 instruction set, that exists to make calls into the operating system more snappy than the older software interrupt mechanism.

Now you should be able to assemble, link and run this proper implementation and it should act exactly like the true built-in in bash.

Exercise for the reader: Modify this to implement false ;)

OBTW TIL: Google+ doesn't offer formatting for code.

Next lesson: Asmtut 2: Hello world!

Magnus Hovland Hoff, 2012