This commit is contained in:
Miloslav Ciz 2024-04-19 21:07:22 +02:00
parent 8eee949aae
commit 66aa9c99f4
14 changed files with 1838 additions and 1814 deletions

View file

@ -1,18 +1,16 @@
# Assembly
Assembly (also ASM) is, for any given hardware computing platform ([ISA](isa.md), basically a [CPU](cpu.md) architecture), the lowest level [programming language](programming_language.md) that expresses typically a linear, unstructured sequence of CPU instructions -- it maps (mostly) 1:1 to [machine code](machine_code.md) (the actual [binary](binary.md) CPU instructions) and basically only differs from the actual machine code by utilizing a more human readable form (it gives human friendly nicknames, or mnemonics, to different combinations of 1s and 0s). Assembly is converted by [assembler](assembler.md) into the the machine code, something akin a computer equivalent of the "[DNA](dna.md)", the lowest level instructions for the computer. Assembly is similar to [bytecode](bytecode.md), but bytecode is meant to be [interpreted](interpreter.md) or used as an intermediate representation in [compilers](compiler.md) while assembly represents actual native code run by hardware. In ancient times when there were no higher level languages (like [C](c.md) or [Fortran](fortran.md)) assembly was used to write computer programs -- nowadays most programmers no longer write in assembly (majority of zoomer "[coders](coding.md)" probably never even touch anything close to it) because it's hard (takes a long time) and not [portable](portability.md), however programs written in assembly are known to be extremely fast as the programmer has absolute control over every single instruction (of course that is not to say you can't fuck up and write a slow program in assembly).
Assembly (also ASM) is, for any given hardware computing platform ([ISA](isa.md), basically a [CPU](cpu.md) architecture), the lowest level [programming language](programming_language.md) that expresses typically a linear, unstructured (i.e. without nesting blocks of code) sequence of CPU instructions -- it maps (mostly) 1:1 to [machine code](machine_code.md) (the actual [binary](binary.md) CPU instructions) and basically only differs from the actual machine code by utilizing a more human readable form (it gives human friendly nicknames, or mnemonics, to different combinations of 1s and 0s). Assembly is converted by [assembler](assembler.md) into the the machine code, something akin a computer equivalent of the "[DNA](dna.md)", the lowest level instructions for the computer. Assembly is similar to [bytecode](bytecode.md), but bytecode is meant to be [interpreted](interpreter.md) or used as an intermediate representation in [compilers](compiler.md) while assembly represents actual native code run by hardware. In ancient times when there were no higher level languages (like [C](c.md) or [Fortran](fortran.md)) assembly was used to write computer programs -- nowadays most programmers no longer write in assembly (majority of [zoomer](zoomer.md) "[coders](coding.md)" probably never even touch anything close to it) because it's hard (takes a long time) and not [portable](portability.md), however programs written in assembly are known to be extremely fast as the programmer has absolute control over every single instruction (of course that is not to say you can't fuck up and write a slow program in assembly).
**Assembly is NOT a single language**, it differs for every architecture, i.e. every model of CPU has potentially different architecture, understands a different machine code and hence has a different assembly (though there are some standardized families of assembly like x86 that work on wide range of CPUs); therefore **assembly is not [portable](portability.md)** (i.e. the program won't generally work on a different type of CPU or under a different [OS](os.md))! And even the same kind of assembly language may have several different [syntax](syntax.md) formats which may differ in comment style, order of writing arguments and even instruction abbreviations (e.g. x86 can be written in [Intel](intel.md) and [AT&T](at_and_t.md) syntax). For the reason of non-portability (and also for the fact that "assembly is hard") you shouldn't write your programs directly in assembly but rather in a bit higher level language such as [C](c.md) (which can be compiled to any CPU's assembly). However you should know at least the very basics of programming in assembly as a good programmer will come in contact with it sometimes, for example during hardcore [optimization](optimization.md) (many languages offer an option to embed inline assembly in specific places), debugging, reverse engineering, when writing a C compiler for a completely new platform or even when designing one's own new platform. **You should write at least one program in assembly** -- it gives you a great insight into how a computer actually works and you'll get a better idea of how your high level programs translate to machine code (which may help you write better [optimized](optimization.md) code) and WHY your high level language looks the way it does.
**Assembly is NOT a single language**, it differs for every architecture, i.e. every model of CPU has potentially different architecture, understands a different machine code and hence has a different assembly (though there are some standardized families of assembly like x86 that work on wide range of CPUs); therefore **assembly is not [portable](portability.md)** (i.e. the program won't generally work on a different type of CPU or under a different [OS](os.md))! And even the same kind of assembly language may have several different [syntax](syntax.md) formats that also create basically slightly different languages which differ e.g. in comment style, order of writing arguments and even instruction abbreviations (e.g. x86 can be written in [Intel](intel.md) or [AT&T](at_and_t.md) syntax). For the reason of non-portability (and also for the fact that "assembly is hard") you mostly shouldn't write your programs directly in assembly but rather in a bit higher level language such as [C](c.md) (which can be compiled to any CPU's assembly). However you should know at least the very basics of programming in assembly as a good programmer will come in contact with it sometimes, for example during hardcore [optimization](optimization.md) (many languages offer an option to embed inline assembly in specific places), debugging, reverse engineering, when writing a C compiler for a completely new platform or even when designing one's own new platform (you'll probably want to make your compiler generate native assembly, so you have to understand it). **You should write at least one program in assembly** -- it gives you a great insight into how a computer actually works and you'll get a better idea of how your high level programs translate to machine code (which may help you write better [optimized](optimization.md) code) and WHY your high level language looks the way it does.
**OK, but why doesn't anyone make a portable assembly?** Well, people do, they just usually call it a [bytecode](bytecode.md) -- take a look at that. [C](c.md) is portable and low level, so it is often called a "portable assembly", though it still IS significantly higher in abstraction and won't usually give you the real assembly vibes. [Forth](forth.md) may also be seen as close to such concept. ACTUALLY [Dusk OS](duskos.md) has something yet closer, called [Harmonized Assembly Layer](hal.md) (see https://git.sr.ht/~vdupras/duskos/tree/master/fs/doc/hal.txt). [Web assembly](web_assembly.md) would also probably fit the definition.
The most common assembly languages you'll encounter nowadays are **[x86](x86.md)** (used by most desktop [CPUs](cpu.md)) and **[ARM](arm.md)** (used by most mobile CPUs) -- both are used by [proprietary](proprietary.md) hardware and though an assembly language itself cannot (as of yet) be [copyrighted](copyright.md), the associated architectures may be "protected" (restricted) e.g. by [patents](patent.md) (see also [IP cores](ip_core.md)). **[RISC-V](risc_v.md)** on the other hand is an "[open](open.md)" alternative, though not yet so wide spread. Other assembly languages include e.g. [AVR](avr.md) (8bit CPUs used e.g. by some [Arduinos](arduino.md)) and [PowerPC](ppc.md).
To be precise, a typical assembly language is actually more than a set of nicknames for machine code instructions, it may offer helpers such as [macros](macro.md) (something akin the C preprocessor), pseudoinstructions (commands that look like instructions but actually translate to e.g. multiple instructions), [comments](comment.md), directives, named labels for jumps (as writing literal jump addresses would be extremely tedious) etc.
To be precise, a typical assembly language is actually more than a set of nicknames for machine code instructions, it may offer helpers such as [macros](macro.md) (something akin the C preprocessor), pseudoinstructions (commands that look like instructions but actually translate to e.g. multiple instructions), [comments](comment.md), directives, automatic inference of opcode from operands, named labels for jumps (as writing literal jump addresses would be extremely tedious) etc. I.e. it is still much easier to write in assembly than to write pure machine code even if you knew all opcodes from memory. For the same reason remember that just replacing assembly mnemonics with binary machine code instructions is not yet enough to make an executable program! More things have to be done such as [linking](linking.md) [libraries](library.md) and converting the result to some [executable format](executable_format.md) such as [elf](elf.md) which contains things like header with metainformation about the program etc.
Assembly is extremely low level, so you get no handholding or much programming "safety" (apart from e.g. CPU operation modes), you have to do everything yourself -- you'll be dealing with things such as function [call conventions](call_convention.md), [interrupts](interrupt.md), [syscalls](syscall.md) and their conventions, counting CPU cycles of individual instructions, looking up exact hexadecimal memory addresses, defining memory segments, dealing with [endianness](endianness.md), raw [goto](goto.md) jumps, [call frames](call_frame.md) etc.
Note that just replacing assembly mnemonics with binary machine code instructions is not yet enough to make an executable program! More things have to be done such as [linking](linking.md) [libraries](library.md) and converting the result to some [executable format](executable_format.md) such as [elf](elf.md) which contains things like header with metainformation about the program etc.
**How will programming in assembly differ from your mainstream high-level programming?** Quite a lot, assembly is extremely low level, so you get no handholding or much programming "safety" (apart from e.g. CPU operation modes), you have to do everything yourself -- you'll be dealing with things such as function [call conventions](call_convention.md), [interrupts](interrupt.md), [syscalls](syscall.md) and their conventions, counting CPU cycles of individual instructions, looking up exact hexadecimal memory addresses, opcodes, defining memory segments, dealing with [endianness](endianness.md), raw [goto](goto.md) jumps, [call frames](call_frame.md) etc. You have no branching (if-then-else), loops or functions, you make these yourself with gotos. You can't write expressions like `(a + 3 * b) / 10`, no, you have to write every step of how to evaluate this expression using registers, i.e. something like: load *a* to register *A*, load *b* to register *B*, multiply *B* by 3, add register *B* to *A*, divide *A* by 10. You don't have any [data types](data_type.md), you have to know yourself that your variables really represent signed values so when you're dividing, you have to use signed divide instruction instead of unsigned divide -- if you mess this up, no one will tell you, your program simply won't work. And so on.
## Typical Assembly Language
@ -40,13 +38,15 @@ Instructions are typically written as three-letter abbreviations and follow some
## How To
*For specific assembly language how tos see their own articles: [x86](x86.md), [Arm](arm.md) etc.*
On [Unices](unix.md) the [objdump](objdump.md) utility from GNU binutils can be used to **disassemble** compiled programs, i.e view the instructions of the program in assembly (other tools like ndisasm can also be used). Use it e.g. as:
```
objdump -d my_compiled_program
```
Let's now write a simple Unix program in [x86](x86.md) assembly (AT&T syntax). Write the following source code into a file named e.g. `program.s`:
Let's now write a simple Unix program in 64bit [x86](x86.md) assembly -- we'll be using AT&T syntax that's used by [GNU](gnu.md). Write the following source code into a file named e.g. `program.s`:
```
.global _start # include the symbol in object file
@ -132,7 +132,7 @@ We will now compile it to different assembly languages (you can do this e.g. wit
{ Also not sure the comments are 100% correct, let me know if not. ~drummyfish }
The [x86](x86.md) assembly may look like this:
The [x86](x86.md) assembly may look like this (to understand the weird juggling of values between registers see [calling conventions](calling_convention.md)):
```
incrementDigit: