Assembly

Assembly (also ASM) is, for any given hardware computing platform (ISA, basically a CPU architecture), the lowest level programming language that expresses a linear (typically unstructured) sequence of instructions -- it maps (mostly) 1:1 to machine code (the actual binary CPU instructions) and basically only differs from the actual machine code by utilizing a more human readable form (it mostly just gives human friendly nicknames to combinations of 1s and 0s). Assembly is converted into machine code by a program called an assembler. Assembly is similar to bytecode, but bytecode is meant to be interpreted or used as an intermediate representation in compilers while assembly represents actual native code run by hardware. In ancient times when there were no higher level languages (like C or Fortran) assembly was used to write computer programs -- nowadays most programmers no longer write in assembly (the majority of zoomer "coders" probably never even touch anything close to it) because it's hard (takes a long time) and not portable; however, programs written in assembly can be extremely fast as the programmer has absolute control over every single instruction.
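To make the "nicknames for 1s and 0s" idea concrete, here is a minimal sketch of a toy "assembler" in C. The four mnemonic/opcode pairs are real single-byte x86 instructions; the struct, table and function names (Encoding, encodings, assembleOne) are made up for this illustration, and a real assembler of course also handles operands, labels, multi-byte encodings and so on.

```c
#include <stddef.h>
#include <string.h>

/* One mnemonic and the machine code byte it stands for. */
struct Encoding
{
  const char *mnemonic;
  unsigned char opcode;
};

/* A few x86 instructions that are encoded as a single byte. */
const struct Encoding encodings[] =
{
  { "nop",  0x90 }, /* no operation */
  { "ret",  0xC3 }, /* near return from a procedure */
  { "hlt",  0xF4 }, /* halt the CPU */
  { "int3", 0xCC }  /* breakpoint trap */
};

/* Toy "assembler": returns the opcode byte for given mnemonic,
   or -1 if the mnemonic is not in the table. */
int assembleOne(const char *mnemonic)
{
  size_t i;

  for (i = 0; i < sizeof(encodings) / sizeof(encodings[0]); ++i)
    if (strcmp(encodings[i].mnemonic, mnemonic) == 0)
      return encodings[i].opcode;

  return -1;
}
```

For example assembleOne("nop") gives 0x90, which is exactly the byte the CPU sees in memory.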

Assembly is NOT a single language, it differs for every architecture, i.e. every family of CPUs potentially has a different architecture, understands a different machine code and hence has a different assembly; therefore assembly is not portable (i.e. the program won't generally work on a different type of CPU or under a different OS)! For this reason (and also for the fact that "assembly is hard") you shouldn't write your programs directly in assembly but rather in a slightly higher level language such as C (which can be compiled to practically any CPU's assembly). However you should know at least the very basics of programming in assembly as a good programmer will come in contact with it sometimes, for example during hardcore optimization (many languages offer an option to embed inline assembly in specific places), debugging, reverse engineering, when writing a C compiler for a completely new platform or even when designing one's own new platform.
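As a small sketch of the inline assembly mentioned above, the following C function adds two integers with a single x86 instruction. It assumes a GCC compatible compiler (GCC, Clang) and uses its extended inline assembly syntax; the function name addInts is just an illustration. Note how the non-portability shows immediately: on any other architecture the assembly is useless, so the code has to fall back to plain C.

```c
/* Adds two ints, using x86 inline assembly where available. */
int addInts(int a, int b)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
  /* AT&T syntax: addl source, destination; computes a = a + b.
     "+r" (a): a is read and written and lives in a register,
     "r" (b):  b is an input living in a register,
     "cc":     the instruction changes the CPU flags. */
  __asm__ ("addl %1, %0" : "+r" (a) : "r" (b) : "cc");
  return a;
#else
  /* portable fallback for other compilers and CPUs */
  return a + b;
#endif
}
```

In practice a compiler will generate the very same instruction for a + b on its own, so this only pays off for things the higher level language cannot express.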

The most common assembly languages you'll encounter nowadays are x86 (used by most desktop CPUs) and ARM (used by most mobile CPUs) -- both are used by proprietary hardware and though an assembly language itself cannot (as of yet) be copyrighted, the associated architectures may be "protected" (restricted) e.g. by patents. RISC-V on the other hand is an "open" alternative, though not yet as widespread. Other assembly languages include e.g. AVR (8bit CPUs used e.g. by some Arduinos) and PowerPC.

To be precise, a typical assembly language is actually more than a set of nicknames for machine code instructions, it may offer helpers such as macros (something akin to the C preprocessor), pseudoinstructions (commands that look like instructions but actually translate to e.g. multiple instructions), comments, directives, named labels for jumps (as writing literal jump addresses would be extremely tedious) etc.

Assembly is extremely low level, so you get no handholding and not much programming "safety" (apart from e.g. CPU operation modes), you have to do everything yourself -- you'll be dealing with things such as function call conventions, interrupts, syscalls and their conventions, memory segments, endianness, raw addresses/goto jumps, call frames etc.
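Endianness from the list above is one of the things that is easy to show from C. The following sketch (function names are made up for this example) checks in which order the machine stores the bytes of a 32bit number, and shows how to read a little endian value from raw bytes in a way that works the same on any machine.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 on a little endian machine (least significant byte
   stored first), 0 otherwise. */
int isLittleEndian(void)
{
  uint32_t value = 1;
  unsigned char bytes[4];

  memcpy(bytes, &value, sizeof(value)); /* look at the raw memory */

  return bytes[0] == 1;
}

/* Builds a 32bit value from 4 bytes stored in little endian order,
   independently of the endianness of the machine running this. */
uint32_t readLittleEndian32(const unsigned char *b)
{
  return
    ((uint32_t) b[0]) |
    ((uint32_t) b[1] << 8) |
    ((uint32_t) b[2] << 16) |
    ((uint32_t) b[3] << 24);
}
```

In assembly nothing like readLittleEndian32 exists unless you write it, the CPU simply loads bytes in its native order.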

Typical Assembly Language

Assembly languages are usually unstructured, i.e. there are no control structures such as if or while statements: these have to be manually implemented using labels and jump (goto) instructions. There may exist macros that mimic control structures. The typical look of an assembly program is however still a single column of instructions with arguments, one per line.
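C still has labels and goto, so the manual implementation of control structures can be sketched without leaving C. Below, the same loop (summing numbers 1 to n) is written once with while and once in the unstructured style that assembly forces on you; the function and label names are only illustrative.

```c
/* Structured version: sums the numbers from 1 to n. */
int sumStructured(int n)
{
  int sum = 0;
  int i = 1;

  while (i <= n)
  {
    sum += i;
    i++;
  }

  return sum;
}

/* The same loop using only labels and jumps. */
int sumUnstructured(int n)
{
  int sum = 0;
  int i = 1;

loopStart:
  if (i > n)
    goto loopEnd;   /* compare + conditional jump out of the loop */

  sum += i;
  i++;
  goto loopStart;   /* unconditional jump back to the condition */

loopEnd:
  return sum;
}
```

Both functions return the same result, e.g. 10 for n = 4; a compiler basically performs this kind of rewrite when translating a while loop to assembly.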

The working of the language reflects the actual hardware architecture -- most architectures are based on registers so usually there is a small number (something like 16) of registers which may be called something like R0 to R15, or A, B, C etc. Sometimes registers may even be subdivided (e.g. in x86 there is an eax 32bit register and its lower half can be used as the ax 16bit register). These registers are the fastest available memory (faster than the main RAM memory) and are used to perform calculations. Some registers are general purpose and some are special: typically there will be e.g. the FLAGS register which holds various 1bit results of performed operations (e.g. overflow, zero result etc.). Some instructions may only work with some registers (e.g. there may be a kind of "pointer" register used to hold addresses, along with instructions that work with this register, which is meant to implement arrays). Values can be moved between registers and the main memory.
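Register subdivision and flags can be imitated in C with a bit of masking. This is only a sketch: the function names and the Flags struct are invented here, the real x86 FLAGS register holds more bits than the two shown. The subdivision itself (ax is the lower 16 bits of eax, al and ah are the lower and upper byte of ax) is how x86 really does it.

```c
#include <stdint.h>

/* Lower 16 bits of eax are accessible as ax. */
uint16_t getAX(uint32_t eax)
{
  return (uint16_t) (eax & 0xFFFF);
}

/* Lower byte of ax is al. */
uint8_t getAL(uint32_t eax)
{
  return (uint8_t) (eax & 0xFF);
}

/* Upper byte of ax is ah. */
uint8_t getAH(uint32_t eax)
{
  return (uint8_t) ((eax >> 8) & 0xFF);
}

/* Simplified flags: just two of the usual 1bit results. */
struct Flags
{
  int zero;  /* result was zero */
  int carry; /* result didn't fit in 8 bits */
};

/* 8bit addition that also sets the flags, like a CPU would. */
uint8_t add8(uint8_t a, uint8_t b, struct Flags *flags)
{
  unsigned int wide = (unsigned int) a + (unsigned int) b;
  uint8_t result = (uint8_t) (wide & 0xFF);

  flags->zero = (result == 0);
  flags->carry = (wide > 0xFF);

  return result;
}
```

E.g. for eax = 0x12345678 the ax part is 0x5678, and adding 200 + 100 in 8 bits gives 44 with the carry flag set.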

Instructions are typically written as three-letter abbreviations and follow some unwritten naming conventions so that different assembly languages at least look similar. Common instructions found in most assembly languages are for example:

  • MOV (move): move a number between registers and/or memory.
  • JMP (jump): unconditional jump to another instruction (usually given by a label).
  • BEQ (branch if equal): jump if result of previous comparison was equality.
  • ADD (add): add two numbers.
  • NOP (no operation): do nothing (used e.g. for delays).
  • CMP (compare): compare two numbers and set relevant flags (typically for a subsequent conditional jump).
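To see how these instructions play together, here is a sketch of an interpreter for a tiny made-up machine written in C. Everything about it is an assumption made for the example: the 4 registers, the instruction format, the HLT instruction added to stop the program, and the fact that MOV only loads constants. It does no checking of register numbers or jump targets. The sample program counts in register 0 up to 5 using a CMP/BEQ/JMP loop.

```c
enum Opcode { OP_MOV, OP_ADD, OP_CMP, OP_BEQ, OP_JMP, OP_NOP, OP_HLT };

struct Instruction
{
  enum Opcode op;
  int a; /* register number or jump target */
  int b; /* register number or constant */
};

/* Runs the program for at most maxSteps instructions and returns
   the final value of register 0. */
int run(const struct Instruction *program, int maxSteps)
{
  int reg[4] = {0, 0, 0, 0};
  int equalFlag = 0; /* set by CMP, read by BEQ */
  int pc = 0;        /* program counter: index of next instruction */

  while (maxSteps-- > 0)
  {
    struct Instruction i = program[pc];
    pc++;

    switch (i.op)
    {
      case OP_MOV: reg[i.a] = i.b; break;                    /* load constant */
      case OP_ADD: reg[i.a] += reg[i.b]; break;              /* add registers */
      case OP_CMP: equalFlag = (reg[i.a] == reg[i.b]); break;
      case OP_BEQ: if (equalFlag) pc = i.a; break;           /* conditional jump */
      case OP_JMP: pc = i.a; break;                          /* unconditional jump */
      case OP_NOP: break;
      case OP_HLT: return reg[0];
    }
  }

  return reg[0];
}

const struct Instruction countToFive[] =
{
  { OP_MOV, 0, 0 }, /* 0: r0 = 0 (counter) */
  { OP_MOV, 1, 1 }, /* 1: r1 = 1 (step) */
  { OP_MOV, 2, 5 }, /* 2: r2 = 5 (limit) */
  { OP_CMP, 0, 2 }, /* 3: compare r0 with r2 */
  { OP_BEQ, 7, 0 }, /* 4: if equal, jump to 7 */
  { OP_ADD, 0, 1 }, /* 5: r0 = r0 + r1 */
  { OP_JMP, 3, 0 }, /* 6: jump back to 3 */
  { OP_HLT, 0, 0 }  /* 7: stop */
};
```

Calling run(countToFive, 1000) returns 5. The comments next to the program are pretty much what the same program would look like written in an assembly language.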

Assembly languages may offer simple helpers such as macros.

Example

TODO: some C code and how it translates to different assembly langs

#include <stdio.h>

/* Returns the next digit character for '0' to '8', otherwise '?'. */
char incrementDigit(char d)
{
  return
    d >= '0' && d < '9' ?
    d + 1 :
    '?';
}

int main(void)
{
  int c = getchar(); /* getchar returns int */
  putchar(incrementDigit(c));
  return 0;
}
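Until actual assembly listings are added, here is a rough sketch of what a compiler has to do with incrementDigit, still written in C but restricted to comparisons, jumps and simple assignments, so that every line corresponds to roughly one instruction. The function name, the labels and the instruction names in the comments are only illustrative and don't belong to any particular assembly language; real compiler output will differ between architectures and optimization levels.

```c
/* incrementDigit rewritten without the ternary operator, using
   only compare + jump steps like an assembly program would. */
char incrementDigitLowered(char d)
{
  char result;

  if (d < '0')
    goto notDigit;  /* compare d with '0', branch if less */

  if (d >= '9')
    goto notDigit;  /* compare d with '9', branch if greater or equal */

  result = d + 1;   /* add 1 */
  goto done;        /* unconditional jump over the other branch */

notDigit:
  result = '?';     /* move constant into the result */

done:
  return result;
}
```

It behaves the same as the original function, e.g. '3' gives '4' while '9' and 'a' give '?'.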