Nowadays, assembly language is rarely used for programming. Production code in assembly can be found only in the most demanding embedded tasks. Even in modern firmware (coreboot, EDK2), most of the code is written in C. Honestly, that’s quite understandable: assembly code is neither easy to read nor to write, and some claim that knowing it is no longer necessary. In my opinion, that’s far from the truth.
Not only is it heavily used in reverse engineering, which is especially important in software/firmware security, but its design also affects, to some extent, code in virtually any programming language. Let’s take an example. Consider these two lines of C code, which obtain a buffer of SIZE bytes on the stack and on the heap: `char buffer[SIZE];` and `char *buffer = malloc(SIZE);`.
Even C’s syntax (considered low-level nowadays) creates the impression that they are more or less the same, except for the type (an array versus a char*) and the lifetime: the local buffer is freed automatically at the end of the current block. However, someone with some knowledge of internals knows that the first line has virtually no overhead (all local variables can be allocated and freed at once, which takes about two instructions), while the second one may involve kernel activity to map the needed space, has to update heap structures, etc. Moreover, malloc() often happens to introduce memory leaks; on the other hand, the first method may cause a stack overflow in some cases.
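The difference can be sketched in C as follows. This is a minimal illustration, not the original listing: the size BUF_SIZE and both function names are hypothetical. Each function copies a string into a buffer and returns its length; only the way the buffer is obtained and released differs.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define BUF_SIZE 64 /* hypothetical size, chosen only for illustration */

/* Local array: the compiler reserves room in the stack frame (typically by
   adjusting RSP once for all locals) and it is released on return. */
size_t copy_to_stack_buffer(const char *text)
{
    assert(text != NULL);
    char buffer[BUF_SIZE];
    strncpy(buffer, text, BUF_SIZE - 1);
    buffer[BUF_SIZE - 1] = '\0';
    return strlen(buffer);
}

/* Heap buffer: malloc() has to update the allocator's bookkeeping and may
   ask the kernel for more memory; the block lives until free() is called. */
size_t copy_to_heap_buffer(const char *text)
{
    assert(text != NULL);
    char *buffer = malloc(BUF_SIZE);
    if (buffer == NULL)
        return 0;
    strncpy(buffer, text, BUF_SIZE - 1);
    buffer[BUF_SIZE - 1] = '\0';
    size_t length = strlen(buffer);
    free(buffer); /* omitting this line would leak the block */
    return length;
}
```

Compiled with optimizations, the first function usually needs nothing more than an RSP adjustment for its buffer, while the second calls into the allocator twice.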
This is just one of many examples where low-level details (assembly, network stack, devices, etc.) determine how higher-level code executes. That’s why every senior software developer, and of course every security specialist, should know them. Moreover, this knowledge is essential to anyone interested in reverse engineering.
This article is meant to present the most important ideas behind x86 assembly, just to show how it works and what its limitations are. If you want to code in assembly or read disassembly, I recommend looking at the [x86 instruction set](https://c9x.me/x86/) reference and tutorials like this one. If you are interested in advanced optimization, you’ll have to dive into CPU-model-specific documentation.
Basics of x86 CPU
We have the CPU (the computation unit) and RAM for temporary data storage. Programs are usually separated from all other devices, but they use the OS interface, which reuses the concepts we use for those two. Generally speaking, everything that happens in a computer is a series of passing data between components and transforming it in between.
For example, displaying a web page can be thought of as follows:
- we are given an address (URL) in memory,
- transform it into a request conforming to the standard,
- pass the request to the networking device,
- accept the response,
- transform the text into an image,
- pass the image to the graphics card.
The application’s job is just to transform the data and tell the OS where the product is, what it is, and where to pass it. RAM is the medium for that exchange as well as storage for intermediate products. RAM also stores the instructions describing how to perform those transformations. Assembly language is a textual representation of the codes understood by the CPU.
We can think of RAM as a function: every byte in memory has its ordinal number (address), which we use to read or change its value. Bytes are usually accessed in groups of 4 (32-bit) or 8 (64-bit).
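To make the “RAM as a function” idea concrete, here is a minimal C sketch that models memory as a byte array indexed by address. The tiny `ram` array and the helper names are hypothetical. The byte order used when grouping four bytes follows x86, which is little-endian: the byte at the lowest address is the least significant one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of RAM: the index is the address (the byte's ordinal
   number) and the element is the byte stored there. */
static uint8_t ram[16];

/* Read a 32-bit value starting at addr, least significant byte first. */
uint32_t read_u32(size_t addr)
{
    assert(addr + 4 <= sizeof ram);
    return (uint32_t)ram[addr]
         | (uint32_t)ram[addr + 1] << 8
         | (uint32_t)ram[addr + 2] << 16
         | (uint32_t)ram[addr + 3] << 24;
}

/* Write a 32-bit value the same way, lowest byte at the lowest address. */
void write_u32(size_t addr, uint32_t value)
{
    assert(addr + 4 <= sizeof ram);
    for (int i = 0; i < 4; i++)
        ram[addr + i] = (uint8_t)(value >> (8 * i));
}
```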
An x86 instruction generally cannot use two memory locations at the same time (string instructions such as MOVS are a notable exception). That’s why the CPU has its own memory, called registers. In the modern x86 architecture, there are 16 64-bit general-purpose registers: RAX, RBX, RCX, RDX, RDI (destination index), RSI (source index), RBP (base pointer), RSP (stack pointer), and R8-R15. Despite the meaningful names of some of them, only RSP has kept its special meaning. This naming was important when it was usual to code directly in assembly. The “R” prefix denotes the 64-bit width: during 40 years of x86 evolution, registers have grown from 16-bit (AX, BX, etc.) through 32-bit (EAX, EBX, etc.) to 64-bit. Among the special registers, there is RIP (instruction pointer), which stores the pointer (ordinal number in memory) of the next instruction to run, and EFLAGS, which records special situations (like a result becoming zero or an overflow while adding). Those special registers are not accessed directly like general-purpose ones; they are updated implicitly by other instructions.
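The relation between RAX, EAX, AX and AL can be modeled in C with plain integer masking. This is a sketch with hypothetical helper names, assuming 64-bit mode: there, writing a 32-bit register zeroes the upper half of the 64-bit register, while writing its 16-bit part leaves the remaining bits untouched.

```c
#include <assert.h>
#include <stdint.h>

/* The narrower names are views of the low bits of the same register. */
uint32_t eax_of(uint64_t rax) { return (uint32_t)rax; }
uint16_t ax_of(uint64_t rax)  { return (uint16_t)rax; }
uint8_t  al_of(uint64_t rax)  { return (uint8_t)rax; }

/* In 64-bit mode, writing EAX zero-extends the value into all of RAX... */
uint64_t write_eax(uint64_t rax, uint32_t value)
{
    (void)rax; /* the old upper half is discarded */
    return value;
}

/* ...while writing AX keeps bits 16..63 of RAX as they were. */
uint64_t write_ax(uint64_t rax, uint16_t value)
{
    return (rax & ~(uint64_t)0xffff) | value;
}
```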
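To move data between registers and memory, we use the MOV instruction; for example, `mov $0xff, %rax` loads the constant value 0xff into the RAX register. This form of assembly language is called AT&T syntax and is mainly widespread in the GNU world. The other popular one is Intel syntax. The most important differences are the argument order and the lack of sigils (`%` before registers, `$` before constants): in Intel syntax, the same instruction is written `mov rax, 0xff`.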
In this document, I use AT&T syntax. MOV also has other variants, for example moving data between a register and a memory location given by a label or by an address held in a register.
Note that in machine code, labels are just constants; they exist for the programmer’s convenience. Remembering addresses for every little thing would be hard, but that’s not the only reason. In a modern OS, controlled code must not access any address without the OS’s permission, and labels mark places that are allocated when the program is loaded. Such a place can also be given an initial value.
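For loading pointers, there is another instruction, LEA (Load Effective Address), which computes the address of its memory operand and stores that address in a register instead of reading memory.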
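The address computed by LEA (and used by MOV with a memory operand) has the general AT&T form `disp(base,index,scale)`. A hypothetical C helper performing the same arithmetic might look like this. For instance, `mov 16(%rbx,%rcx,8), %rax` would read memory at that address, whereas `lea 16(%rbx,%rcx,8), %rax` would only store the address itself in RAX.

```c
#include <assert.h>
#include <stdint.h>

/* AT&T memory operand disp(base, index, scale) refers to the address
   base + index * scale + disp, where scale is 1, 2, 4 or 8.
   LEA stores this number in a register without touching memory. */
uint64_t effective_address(int64_t disp, uint64_t base, uint64_t index_reg,
                           unsigned scale)
{
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    /* unsigned arithmetic wraps modulo 2^64, like the CPU's address math */
    return base + index_reg * scale + (uint64_t)disp;
}
```

Because LEA does only this arithmetic, compilers also use it as a cheap way to compute expressions such as `rbx + rcx * 8 + 16`.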
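There are many instructions for data transformations. Among the most popular are the arithmetic and logical ones, such as ADD, SUB, AND, OR and XOR. There are many more of them, but most common ones follow the same pattern: they take a source and a destination operand and store the result in the destination, e.g. `add %rbx, %rax` sets RAX to RAX + RBX. That’s why it’s quite easy to trace value changes automatically. Some instructions have implicit operands, but those are just exceptions that have to be handled separately. For example, MUL with a single operand multiplies RAX by that operand and stores the result in RDX:RAX.
RDX:RAX means a 128-bit value with the higher 64 bits in RDX and the lower ones in RAX. Such a solution lets us never lose data due to overflow. On the other hand, it causes a pitfall of the DIV instruction, which divides RDX:RAX by its operand: if the operand is not big enough for the quotient to fit in 64 bits, or the operand is 0, a CPU exception is raised (mentioned later).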
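The two-register result of MUL and the DIV pitfall can be expressed in portable C. In this sketch (function names hypothetical), `mul64` splits the operands into 32-bit halves to build the 128-bit product as a high and a low 64-bit word, like RDX:RAX. `div_would_fault` states when unsigned DIV raises the divide error: the divisor is 0, or the high word is not smaller than the divisor, so the quotient cannot fit in RAX.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Full 64 x 64 -> 128-bit unsigned product returned as hi:lo,
   the way MUL returns it in RDX:RAX. */
void mul64(uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo)
{
    const uint64_t mask = 0xffffffffu;
    uint64_t a_lo = a & mask, a_hi = a >> 32;
    uint64_t b_lo = b & mask, b_hi = b >> 32;

    uint64_t p0 = a_lo * b_lo; /* lands in bits 0..63   */
    uint64_t p1 = a_lo * b_hi; /* lands in bits 32..95  */
    uint64_t p2 = a_hi * b_lo; /* lands in bits 32..95  */
    uint64_t p3 = a_hi * b_hi; /* lands in bits 64..127 */

    /* Everything that falls into bits 32..63, plus its carry. */
    uint64_t middle = (p0 >> 32) + (p1 & mask) + (p2 & mask);

    *lo = (middle << 32) | (p0 & mask);
    *hi = p3 + (p1 >> 32) + (p2 >> 32) + (middle >> 32);
}

/* Unsigned DIV divides hi:lo by divisor; the quotient fits in 64 bits
   exactly when hi < divisor, otherwise (or for 0) the CPU faults. */
bool div_would_fault(uint64_t hi, uint64_t divisor)
{
    return divisor == 0 || hi >= divisor;
}
```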
There are also special instructions and registers for floating-point arithmetic and for vector (SIMD) operations, as well as some reserved only for OS/firmware code, among others to communicate with other devices and to configure protection mechanisms.
Of course, we can execute instructions in a different order than they are stored by using jumps.
Most jumps are relative to the current RIP value. Thanks to this, the OS can load our program at any point in memory. Similarly, a compiled function can be placed at any point of the program binary.
When such a jump is disassembled, the disassembly shows the whole target address, but when we look at the machine code, the jump takes only two bytes, so it can’t be an absolute address. 0xEB encodes a short relative jump, and the next byte is an 8-bit signed offset in two’s complement, where the highest bit means -0x80 instead of +0x80. So 0xfa = -0x80 + 0x7a = -0x06, which is the combined length of the jump and the instruction before it (the offset is counted from the end of the jump instruction).
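Decoding the short jump by hand can be written as a small C sketch (function names hypothetical). It interprets the offset byte as two’s complement, where the top bit is worth -0x80, so 0xfa decodes to -0x80 + 0x7a = -6, and then adds that offset to the address right after the 2-byte jump instruction.

```c
#include <assert.h>
#include <stdint.h>

/* 8-bit two's complement: the top bit is worth -0x80, the remaining seven
   bits keep their usual positive value. */
int rel8_value(uint8_t byte)
{
    return -0x80 * (byte >> 7) + (byte & 0x7f);
}

/* A short JMP (0xEB, offset) is 2 bytes long and its offset is counted from
   the address of the next instruction, i.e. jmp_address + 2. */
uint64_t short_jmp_target(uint64_t jmp_address, uint8_t offset)
{
    return jmp_address + 2 + (int64_t)rel8_value(offset);
}
```

For a jump located at 0x1004 with offset 0xfa, the target is 0x1006 - 6 = 0x1000, i.e. the start of a 4-byte instruction placed right before the jump.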
We can also make conditional jumps. The EFLAGS register is used for that. For example, if we execute `sub $5, %rax`, then apart from subtracting 5 from RAX, a specific bit of EFLAGS (ZF, the zero flag) will be set to 1 if RAX becomes 0 (i.e. it was 5 in the first place), and another one (SF, the sign flag) if it becomes negative. There are conditional jump instructions (such as JE, JNE or JL) that jump or not according to EFLAGS. There is also the CMP instruction, which sets EFLAGS exactly like SUB but doesn’t store the result. Similarly, TEST performs a bitwise AND without storing the result (usually used for bit fields).
Note that it’s totally valid to put many conditional jumps one after another, because they don’t affect EFLAGS.
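How SUB/CMP set the flags and how a few conditional jumps read them can be modeled in C. This is a simplified sketch: only four EFLAGS bits are modeled and the `*_taken` helper names are hypothetical, but the conditions themselves (JE uses ZF, JB uses CF, JL is taken when SF differs from OF) are the ones the CPU uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A few EFLAGS bits as set by a 64-bit SUB or CMP computing a - b. */
struct flags {
    bool zf; /* zero flag: the result is 0 */
    bool sf; /* sign flag: top bit of the result */
    bool cf; /* carry flag: unsigned borrow, i.e. a < b */
    bool of; /* overflow flag: the signed result did not fit */
};

struct flags sub_flags(uint64_t a, uint64_t b)
{
    uint64_t r = a - b;
    struct flags f;
    f.zf = (r == 0);
    f.sf = (r >> 63) & 1;
    f.cf = (a < b);
    /* signed overflow: operands had different signs and the result's
       sign differs from the sign of a */
    f.of = (((a ^ b) & (a ^ r)) >> 63) & 1;
    return f;
}

bool je_taken(struct flags f) { return f.zf; }         /* equal */
bool jb_taken(struct flags f) { return f.cf; }         /* unsigned below */
bool jl_taken(struct flags f) { return f.sf != f.of; } /* signed less */
```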
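For very temporary storage of values, there is a special memory range that implements a stack structure. The RSP register points at the last pushed value. There are two special instructions for it: PUSH, which decrements RSP by 8 and stores a value at (%rsp), and POP, which loads the value from (%rsp) and increments RSP by 8.
Popping doesn’t erase anything from memory; the value simply stays below RSP. Of course, these instructions are smaller and typically faster than the equivalent add/sub + mov combination. The stack grows backwards (towards lower addresses), and there is no standard way to determine its boundaries.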
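PUSH and POP behave like the following C sketch of a tiny stack (the 8-slot array and the names are hypothetical). As on x86, `push` first decrements RSP by 8 and then stores the value, and `pop` reads the value and then increments RSP; the popped slot keeps its old contents.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy stack: 8 slots of 8 bytes; rsp is a byte offset. */
#define STACK_SLOTS 8
static uint64_t stack_mem[STACK_SLOTS];
static uint64_t rsp = STACK_SLOTS * 8; /* empty stack: just past the end */

/* push: move RSP down by 8, then store at (%rsp). */
void push(uint64_t value)
{
    assert(rsp >= 8 && "stack overflow");
    rsp -= 8;
    stack_mem[rsp / 8] = value;
}

/* pop: load from (%rsp), then move RSP up by 8. The slot is not cleared. */
uint64_t pop(void)
{
    assert(rsp < STACK_SLOTS * 8 && "stack underflow");
    uint64_t value = stack_mem[rsp / 8];
    rsp += 8;
    return value;
}
```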
As RSP is a general-purpose register, there’s nothing wrong with using it in normal operations. In fact, that’s how local variables in C are compiled (unless they are kept in registers): the function subtracts their total size from RSP and accesses them relative to it.
BTW, a stack overflow can be exploited as an attack that makes the stack overlap with other variables. Originally it could overwrite code too, but modern OSs prevent writing to the code section and executing the data section. Note that on 32-bit OSs stack cells are only 4 bytes long.
For calling functions, there are two other instructions: CALL and RET. CALL works just like JMP, but first pushes the return address (the RIP value pointing at the instruction following CALL). At the end of the function, we put RET, which simply pops that value back into RIP, so that execution continues after the last CALL. Of course, if you don’t restore RSP to its initial value, RET will pop whatever is at (%rsp) anyway, so in most cases it would cause a crash.
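CALL and RET can be modeled the same way. In this sketch the `struct cpu`, the 16-slot stack and the `cpu_*` helper names are hypothetical; the caller passes the address of the instruction following CALL, which is what the real CPU pushes.

```c
#include <assert.h>
#include <stdint.h>

#define CPU_STACK_SLOTS 16

/* Hypothetical toy CPU: RIP, RSP (a byte offset into stack) and the stack. */
struct cpu {
    uint64_t rip;
    uint64_t rsp;
    uint64_t stack[CPU_STACK_SLOTS];
};

void cpu_push(struct cpu *c, uint64_t value)
{
    assert(c->rsp >= 8);
    c->rsp -= 8;
    c->stack[c->rsp / 8] = value;
}

uint64_t cpu_pop(struct cpu *c)
{
    assert(c->rsp < CPU_STACK_SLOTS * 8);
    uint64_t value = c->stack[c->rsp / 8];
    c->rsp += 8;
    return value;
}

/* CALL: push the return address (the instruction after CALL), then jump. */
void cpu_call(struct cpu *c, uint64_t return_address, uint64_t target)
{
    cpu_push(c, return_address);
    c->rip = target;
}

/* RET: pop whatever (%rsp) points at into RIP; nothing checks the value. */
void cpu_ret(struct cpu *c)
{
    c->rip = cpu_pop(c);
}
```

If the called function leaves an extra value on the stack, `cpu_ret` jumps to that value instead of the return address, which is exactly the crash scenario mentioned above.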
The stack is also used to pass function parameters. In the 32-bit architecture, all of them are put on the stack (the first argument is pushed last). Depending on the convention, either the caller or the callee was responsible for freeing the parameters. That’s why there is a variant of RET with a parameter indicating how many bytes are added to ESP after popping the return address into EIP.
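A 32-bit convention in which the callee frees the parameters (stdcall on Windows is one example) can be sketched with 4-byte stack slots. All names here are hypothetical. The caller pushes the arguments last-to-first, CALL pushes the return address, the callee finds the first argument at 4(%esp), and `ret $8` returns while discarding two 4-byte arguments.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy 32-bit stack: 4-byte slots, esp is a byte offset. */
#define SLOTS32 16
static uint32_t stack32[SLOTS32];
static uint32_t esp = SLOTS32 * 4;
static uint32_t eip;

static void push32(uint32_t v)
{
    assert(esp >= 4);
    esp -= 4;
    stack32[esp / 4] = v;
}

static uint32_t pop32(void)
{
    uint32_t v = stack32[esp / 4];
    esp += 4;
    return v;
}

/* Caller side: push the arguments last-to-first, then the return address
   that CALL would push (the jump itself is not modeled). */
void caller_push_two_args(uint32_t first, uint32_t second, uint32_t ret_addr)
{
    push32(second);
    push32(first);
    push32(ret_addr);
}

/* Inside the callee, right after CALL: (%esp) is the return address,
   4(%esp) the first argument, 8(%esp) the second, and so on. */
uint32_t arg32(unsigned n)
{
    return stack32[(esp + 4 + 4 * n) / 4];
}

/* RET imm16: pop the return address into EIP, then add imm16 to ESP,
   which discards the arguments (callee cleanup). */
void ret_n(uint16_t bytes)
{
    eip = pop32();
    esp += bytes;
}
```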
The original purpose of EBP was to store the ESP value before allocating local variables, so that you could allocate variables in the middle of a function without caring how much had already been allocated, because you would address them relative to EBP; hence the name Base Pointer. At the end, you would just reset ESP to EBP. There were (and still are) instructions for that: ENTER and LEAVE. However, as those are clearly features oriented towards hand-written code, using a frame pointer is no longer the convention in the 64-bit architecture, but it may still be found. For instance, gcc without optimization still sets up an RBP-based frame.
The reason I grep out lines starting with a dot is that they are directives, which carry information for the assembler rather than actual instructions. The ‘l’ and ‘q’ suffixes at the end of instructions mark the operand size (32-bit and 64-bit respectively). They are required only when a constant is written to memory, because there’s no other way to deduce the size.
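A frame-pointer prologue and LEAVE can be simulated as well. The sketch below assumes the classic sequence `push %rbp; mov %rsp, %rbp; sub $N, %rsp` (typical of unoptimized code) and models LEAVE as `mov %rbp, %rsp; pop %rbp`; the array and helper names are hypothetical. Locals are addressed relative to RBP, so they stay reachable even after RSP moves again.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy stack with RBP and RSP as byte offsets into it. */
#define FRAME_SLOTS 32
static uint64_t frame_stack[FRAME_SLOTS];
static uint64_t reg_rsp = FRAME_SLOTS * 8;
static uint64_t reg_rbp = 0;

/* push %rbp; mov %rsp, %rbp; sub $locals_bytes, %rsp */
void frame_prologue(uint64_t locals_bytes)
{
    assert(reg_rsp >= 8 + locals_bytes);
    reg_rsp -= 8;
    frame_stack[reg_rsp / 8] = reg_rbp; /* push %rbp */
    reg_rbp = reg_rsp;                  /* mov %rsp, %rbp */
    reg_rsp -= locals_bytes;            /* sub $locals_bytes, %rsp */
}

/* Local variable n (8 bytes each) lives at -8*(n+1)(%rbp),
   no matter how RSP changes later in the function. */
uint64_t *local_slot(unsigned n)
{
    return &frame_stack[(reg_rbp - 8 * (n + 1)) / 8];
}

/* LEAVE: mov %rbp, %rsp; pop %rbp */
void frame_leave(void)
{
    reg_rsp = reg_rbp;
    reg_rbp = frame_stack[reg_rsp / 8];
    reg_rsp += 8;
}
```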
In the 64-bit architecture, the convention changed a little: on Linux and other System V systems, the first six integer or pointer parameters (except structures bigger than 16 bytes) are passed through registers: RDI, RSI, RDX, RCX, R8 and R9. The return value is put into the RAX register (unless it’s too big), but since main returns an int, only the EAX register is actually used.
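The register assignment for integer and pointer arguments under the System V AMD64 ABI can be captured in a small lookup table. The helper name is hypothetical, and the sketch ignores floating-point arguments (passed in XMM registers) and structures.

```c
#include <assert.h>
#include <string.h>

/* System V AMD64 ABI: registers for integer/pointer arguments, in order. */
static const char *const sysv_int_arg_regs[] = {
    "rdi", "rsi", "rdx", "rcx", "r8", "r9"
};

/* Where the n-th (0-based) integer or pointer argument is passed;
   the seventh and later ones go on the stack. */
const char *sysv_int_arg_location(unsigned n)
{
    if (n < sizeof sysv_int_arg_regs / sizeof sysv_int_arg_regs[0])
        return sysv_int_arg_regs[n];
    return "stack";
}
```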
RBX, RBP, and R12-R15 are callee-saved, which means that every function must ensure their values are the same after it returns.
Interrupts and syscalls
Interrupts are a very similar thing: they are also a kind of function implemented by the OS or boot firmware, but they are usually invoked by hardware. When the CPU gets an interrupt, normal execution is stopped and resumed once the interrupt has been handled. The CPU itself can also generate interrupts; that’s how CPU exceptions work (they are raised when an illegal operation is attempted, such as an invalid opcode or a division by zero). There is also the INT instruction to generate an interrupt from software.
For a long time, this feature was used to provide runtime services in BIOS and DOS. In 32-bit *nix OSs (Linux, *BSD) it’s still used: `int $0x80`. EAX (or part of it) typically specifies the system function, and on Linux the other registers contain the parameters. In the 64-bit architecture, there is the SYSCALL instruction instead, which works very similarly.
However, system calls are usually made through C wrapper functions, so the most likely place to find them is shared libraries.
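From C, the usual way to make a system call is through such a libc wrapper. The sketch below assumes Linux with glibc: getpid() is the ordinary wrapper, while syscall() lets you pass the system call number (SYS_getpid) explicitly; on x86-64 the library puts that number in RAX and executes SYSCALL. The function name pid_via_syscall is hypothetical.

```c
#define _GNU_SOURCE /* makes glibc's <unistd.h> declare syscall() */
#include <assert.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Same request as getpid(), but with the system call number spelled out. */
long pid_via_syscall(void)
{
    return syscall(SYS_getpid);
}
```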
This brief explanation is probably not enough to code in assembly language, but it will let you understand most of the disassembly of userspace programs. As modern programs make heavy use of shared libraries, library calls are what you will see most of the time. The good thing is that unless you deal with OS/firmware code, you don’t need to care about multitasking, caching, etc. You will probably face strange constructs, such as call (%rip), that don’t make any functional sense but turn out to help the CPU execute code faster. Another piece of good news is that a userspace program is written as though it were the only process running on the machine, which simplifies it a lot.
Anyway, this should give you a good start in understanding most assembly code (assuming you use an instruction reference). If you deal with obfuscated code, you will probably need some help from dedicated software like IDA.
We are happy to help if some of the presented information is unclear or if you are interested in more detail. Please let us know.