less_retarded_wiki/cpu.md
2024-02-02 08:28:14 +01:00

CPU

WORK IN PROGRESS

Central processing unit (CPU, often just processor) is the main, most central part of a computer, the one that performs the computation by following the instructions of the main program; CPU can be seen as the computer's brain. It stands at the center of the computer design -- other parts, such as the main memory, hard disk and input/output devices like keyboard and monitor are present to serve the CPU, CPU is at the top and issues commands to everyone else. A CPU is normally composed of an ALU (arithmetic logic unit, the circuit performing calculations), a CU (control unit, the circuit that directs the CPU's operation), a relatively small amount of memory (e.g. its registers and cache, the main RAM memory is NOT part of a CPU!) and possibly some other parts. A specific model of CPU is characterized by its instruction set (ISA, e.g. x86 or Arm, which we mostly divide into CISC and RISC), which determines the machine code it will understand, then its transistor count (nowadays billions), operation frequency or clock rate (the number of clock cycles per second, which largely determines how many instructions per second it can execute, nowadays typically billions; the frequency can also be increased with overclocking), number of cores (determining how many programs it can run in parallel) and also other parameters and "features" such as the amount of cache memory, possible operation modes etc. We also often associate the CPU with some number of bits (called e.g. word size) that's often connected to the data bus width and the CPU's native integer size, i.e. for example a 16 bit CPU will likely have 16 bit integer registers, it will see the memory as a sequence of 16 bit words etc. (note the CPU can still do higher bit operations but they'll typically have to be emulated so they'll be slower, will take more instructions etc.) -- nowadays most mainstream CPUs are 64 bit (to allow ungodly amounts of RAM), but 32 or even 16 and 8 bits are usually enough for good programs.
A CPU in the form of a single small integrated circuit is called a microprocessor. A CPU is not to be confused with an MCU (microcontroller unit), a small computer on a single chip which contains a CPU along with memory, peripherals and other parts.

CPU is meant for general purpose computations, i.e. it can execute anything reasonably fast, but for some tasks, e.g. processing HD video, it won't come near optimal speed, which is why other specialized processing units such as GPUs (graphics processing units) and sound cards exist. As a general algorithm executing unit, a CPU is made for executing linear programs, i.e. series of instructions that go one after another; even though CPUs nowadays typically have multiple cores thanks to which they can run several linear programs in parallel, their level of parallelism is still low, not nearly as great as that of a GPU for example. However CPUs are good enough for most things and they are extremely fast nowadays, so a suckless/LRS program will likely choose to rely only on the CPU, knowing a CPU will be present in any computer, which keeps the program portable.

Designs of CPUs differ, some may aim for very high performance while other ones may prefer low power consumption or low transistor count -- remember, a more complex CPU will require more transistors and will be more expensive! Of course it will also be harder to design, debug etc., so it may be better to keep it simple when designing a CPU. For this reason many CPUs, e.g. those in embedded microcontrollers, intentionally lack cache, microcode, multiple cores or even a complex instruction pipeline.

WATCH OUT: modern mainstream CPUs (i.e. basically the desktop ones, soon probably mobile ones too) are shit, they are hugely consumerist, bloated (they literally include shit like GPUs and whole operating systems, e.g. Intel's ME runs Minix) and have built-in antifeatures such as backdoors (post 2010 basically all Intel and AMD CPUs, see Intel Management Engine and AMD PSP) that can't be disabled and that allow remote infiltration of your computer by the CPU manufacturer (on hardware level, no matter what operating system you run). You are much better off using a simple CPU if you can (older, embedded etc.).

Details

TODO: diagrams, modes, transistor count history ...

Let's take a look at how a typical CPU works. Remember that anything may differ between CPUs, you can think of doing things differently and many real world CPUs do. Also we may simplify some things here, real world CPUs are complicated as hell.

What does a CPU really do? Basically it just reads instructions from the memory (depending on specific computer architecture this may be RAM or ROM) and does what they say -- these instructions are super simple, often things like "add two numbers", "write a number to memory" and so on. The instructions themselves are just binary data in memory and their format depends on each CPU, or its instruction set (basically a very low level language it understands) -- each CPU, or rather a CPU family, may generally have a different instruction set, so a program in one instruction set can't be executed by a CPU that doesn't understand this instruction set. The whole binary program for the CPU is called machine code and machine code corresponds to assembly language (basically a textual representation of the machine code, for better readability by humans) of the CPU (or better said its instruction set). So a CPU can be seen as a hardware interpreter of specific machine code, machine code depends on the instruction set and a programmer can create machine code by writing a program in assembly language (which is different for each instruction set) and then using an assembler to translate the program to machine code. Nowadays mostly two instruction sets are used: x86 and Arm, but there are also other ones, AND it's still not so simple because each instruction set gets some kind of updates and/or has some extensions that may or may not be supported by a specific CPU, so it's a bit messy. For example IA-32 and x86_64 are two different versions of the x86 ISA, one 32 bit and one 64 bit.

The CPU has some internal state (we can see it as a state machine), i.e. it has a few internal variables, called registers; these are NOT variables in RAM but rather in the CPU itself, there are only a few of them (let's say 32) but they are extremely fast. What exactly these registers are, what they are called, how many bits they can hold and what their purpose is depends again on the instruction set architecture. However there are usually a few special registers, notably the program counter which holds the address of the currently executed instruction. After executing an instruction the program counter is incremented (by the size of that instruction) so that in the next step the next instruction will be executed, AND we can also modify the program counter (sometimes directly, sometimes by specialized instructions) to jump between instructions to implement branching, loops, function calls etc.

So at the beginning (when powered on) the CPU is set to some initial state, most notably it sets its program counter to some initial value (depending on each CPU, it may be e.g. 0) so that it points to the first instruction of the program. Then it repeatedly performs the so called fetch, decode, execute cycle, i.e. it reads the instruction, decodes what it means and does what it says. In simpler CPUs this functionality is hard wired, however more complex CPUs are programmed in so called microcode, code at an even lower level than machine code which defines how each machine code instruction is carried out -- microcode is something like "firmware for the CPU" (or a "CPU shader"?), it basically allows later updates and reprogramming of how the CPU internally works. However this is pretty overcomplicated and you shouldn't make CPUs like this.

A CPU works in clock cycles, i.e. it is a sequential circuit which has a so called clock input; on this input the voltage periodically switches between high and low (1 and 0) and each period (typically each rising edge, i.e. switch from low to high) makes the CPU perform another step of operation. How fast the clock changes is determined by the clock frequency (nowadays usually around 3 GHz) -- the higher the frequency, the faster the CPU will compute, but the more it will also heat up (so we can't just set it arbitrarily high, but we can overclock it a bit if we cool it well). WATCH OUT: one clock cycle doesn't necessarily equal one executed instruction, i.e. a frequency of 1 Hz doesn't have to mean the CPU will execute 1 instruction per second because executing an instruction may take several cycles (how many depends on each instruction and also other factors). The number saying how many cycles an instruction takes (usually stated as an average over many executed instructions) is called CPI (cycles per instruction) -- CPUs try to aim for CPI 1, i.e. they try to execute 1 instruction per cycle, but they can't always do it.

One way to try to achieve CPI 1 is by optimizing the fetch, decode, execute cycle in hardware so that it's performed as fast as possible. This is typically done by utilizing an instruction pipeline -- a pipeline has several stages that work in parallel so that when one instruction is entering e.g. the decode stage, another one is already entering the fetch stage (and the previous instruction is in the execute stage), i.e. we don't have to wait for an instruction to be fully processed before starting to process the next one. This is practically the same principle as that of manufacturing lines in factories; if you have a long car manufacturing pipeline, you can make a factory produce let's say one car each hour, though it is impossible to make a single car from scratch in one hour (or imagine e.g. a university producing new PhDs each year despite no one being able to actually earn a PhD in a year). This is also why branching (jumps between instructions) is considered bad for program performance -- a jump to a different instruction makes the CPU throw away the instructions it has already started processing, because those will not be executed (though CPUs again try to deal with this with so called branch prediction, which can't be right 100% of the time). Some CPUs even have multiple pipelines, allowing for execution of multiple instructions at the same time -- however this can only be done sometimes (the latter instruction must be independent of the former, also the other pipelines may be simpler and able to only handle simple instructions).

In order for a CPU to be useful it has to be able to perform some input/output, i.e. it has to be able to retrieve data from the outside and present what it has computed. Notable ways of performing I/O are:

  • Through memory (so called memory-mapped I/O): here some parts of memory serve to pass data to the CPU and to retrieve computed results back. For example a keyboard may be mapped to memory so that when certain keys are pressed, the memory bits are set to 1 -- this way a CPU can simply read from memory and know if a key is pressed. Similarly a display may be mapped to memory so that when a CPU writes a value to this address, a pixel appears on the display. Note that this doesn't always have to PHYSICALLY pass through memory, there may be a special circuit that translates e.g. memory access in some address range to signals to hardware etc., but the CPU is using the same instructions it would use for interacting with memory.
  • Through GPIO pins: many CPUs, especially those in microcontrollers, have pins that are reserved for general purpose input/output, i.e. we can electronically communicate through them with whatever device we physically connect to those pins. A CPU can set and read voltage to/from those pins e.g. with some special instructions. This may be convenient if we just want to e.g. light up some LED without having to somehow hook it to the main memory.
  • Interrupts: a CPU can be informed about an external event with an interrupt (see further on).

CPUs often also have a cache memory that speeds up communication with the main memory (RAM, ROM, ...), though simpler CPUs can of course do without a cache. Mainstream CPUs even have several levels of cache, called L1, L2 etc. Caches are basically transparent to the programmer, they don't have to deal with them, it's just something that makes memory access faster, however a programmer knowing how a cache works can write code that is friendlier to the cache and utilizes it better.

Mainstream consoomer CPUs nowadays have multiple cores so that each core can basically run a separate computation in parallel. The separate cores can be seen kind of like duplicate copies of a single core CPU with some connections between them (details again depend on each model), for example cores may share the cache memory, they will be able to communicate with each other etc. Of course this doesn't just magically make the whole CPU faster, it only allows running multiple computations at once, and programs have to be written so as to make use of this -- typical use cases are e.g. multitasking operating systems which can run different programs (or rather processes) on each core (note that multitasking can be done even with a single core by rapidly switching between the processes, but that's slower), or multithreading programming languages which may run each thread on a separate core.

Interrupts are an important concept for the CPU and for low level programming, they play a role e.g. in saving power -- high level programmers often don't know what interrupts are, to them interrupts can be likened to "event callbacks". An interrupt happens on some kind of event, for example when a key is pressed, when a timer ticks, when an error occurs etc. (An interrupt can also be raised by the CPU itself, this is how operating system syscalls are often implemented.) What kinds of interrupts there are depends on each CPU architecture (consult your datasheet) and one can usually configure which interrupts to enable and which "callbacks" to use for them -- this is often done through a so called vector table, a special area in memory that records addresses ("vectors") of routines (functions/subprograms) to be called on specified interrupts. When an interrupt happens, the current program execution is paused and the CPU automatically jumps to the subroutine for handling the interrupt -- after returning from the subroutine the main program execution continues. Interrupts are contrasted with polling, i.e. manually checking some state and handling things as part of the main program, e.g. executing an infinite loop in which we repeatedly check keyboard state until some key is pressed. However polling is inefficient, it wastes power by constantly performing computation just to wait -- interrupts on the other hand are a hard wired functionality that just performs a task when something happens, without the overhead of polling. Furthermore interrupts can make programming easier (you save many condition checks and memory reads) and mainly interrupts allow the CPU to go into sleep mode and so save a lot of power. When a CPU doesn't have any computation to do, it can stop itself and go into a waiting state, not executing any instructions -- however interrupts still work and when something happens, the CPU jumps back in to work.
This is typically what the sleep/wait function in your programming language does -- it puts the CPU to sleep (on a multitasking operating system it rather suspends your program and the OS lets the CPU idle if nothing else needs to run) and sets a timer interrupt to wake up after the given amount of time. As a programmer you should know that you should call this sleep/wait function in your main program loop to relieve the CPU -- if you don't, you will notice the CPU utilization (the fraction of time it is performing computations) will go to 100%, it will heat up and your computer will start spinning the fans and be noisy because you don't let it rest.

There are often several modes of operation in a CPU, which are typically meant for operating systems -- there will usually be some kind of privileged mode in which the CPU can do whatever it wants (this is the mode for the OS kernel) and a restricted mode in which there are restrictions, e.g. on which areas of memory can be accessed or which instructions can be used (this will be used for user programs). Thanks to this a user program won't be able to crash the operating system, it will at worst crash itself. See also real mode and protected mode.

A CPU may also have some integrated coprocessors, though sometimes coprocessors are really separate chips. Coprocessors that may be inside the CPU include e.g. the FPU (floating point unit) or an encryption coprocessor. Again, this will make the CPU a lot more expensive.

TODOOOOOOO: ALU, virtual memory, IP cores, architectures (register, ...), ...

Notable CPUs

UNDER CONSTRUCTION

Here are listed some notable CPUs (or sometimes CPU families or cores).

{ I'm not so great with HW, suggest me improvements for this section please, thanks <3 ~drummyfish }

TODO: add more, mark CPUs with ME, add features like MMX, FPU, ...

| CPU | year | bits (data / address) | ISA | ~transistors | transistor size | frequency (Hz) | pins | cores | other | notes |
|---|---|---|---|---|---|---|---|---|---|---|
| Intel 4004 | 1971 | 4 / 12 | own | 2.3 K | 10 um | 750 K | 16 | 1 | | 1st commercial microproc. |
| Intel 8008 | 1972 | 8 / 14 | own | 3.5 K | 10 um | 800 K | 18 | 1 | | |
| Intel 8080 | 1974 | 8 / 16 | own | 6 K | 6 um | 3 M | 40 | 1 | | |
| AMD Am9080 | 1975 | 8 / 16 | own | 6 K | 6 um | 4 M | 40 | 1 | | reverse-eng. clone of i8080 |
| MOS Technology 6502 | 1975 | 8 / 16 | own | 3.5 K | 8 um | 3 M | 40 | 1 | | popular, cheap, Atari 2600, C64, ... |
| Zilog Z80 | 1976 | 8 / 16 | own | 8.5 K | 4 um | 10 M | 40 | 1 | | popular |
| Intel 8086 | 1978 | 16 / 20 | x86 (x86-16) | 29 K | 3 um | 10 M | 40 | 1 | | started x86 ISA |
| Motorola 68000 | 1979 | 32 / 24 | own (CISC) | 68 K | | | 64 | 1 | | popular, e.g. Amiga, Mega Drive, ... |
| Intel (80)286 | 1982 | 16 / 24 | x86 (x86-16) | 130 K | 1.5 um | 25 M | 68 | 1 | | |
| Intel (80)386 | 1985 | 32 | x86 (IA-32) | 275 K | 1 um | 40 M | 132 | 1 | | |
| Intel (80)486 | 1989 | 32 | x86 (IA-32) | 1.6 M | 600 nm | 100 M | 196 | 1 | 16 K cache | |
| AMD Am386 | 1991 | 32 | x86 (IA-32) | 275 K | 800 nm | 40 M | 132 | 1 | | clone of i386, lawsuit |
| Intel Pentium P5 | 1993 | 32 | x86 (IA-32) | 3 M | 800 nm | 60 M | 273 | 1 | 16 K cache | starts Pentium line with many to follow |
| AMD K5 | 1996 | 32 | x86 (IA-32) | 4.3 M | 500 nm | 133 M | 296 | 1 | 24 K cache | 1st in-house AMD CPU, compet. of Pentium |
| Intel Pentium II | 1997 | 32 | x86 (IA-32) | 7 M | 180 nm | 450 M | 240 | 1 | 512 K L2 cache, MMX | |
| ARM7TDMI | 1994 | 32 | ARM | | | 100 M | | 1 | | ARM core, e.g. GBA, PS2, Nokia 6110 ... |
| AMD Athlon 1000 Thunderbird | 2000 | 32 | x86 (IA-32) | 37 M | 180 nm | 1 G | 453 | 1 | ~300 K cache | 1st 1GHz+ CPU |
| RAD750 | 2001 | 32 | PowerPC | 10 M | 150 nm | 200 M | 360 | 1 | 64 K cache | radiation hard., space (Curiosity, ...) |
| AMD Opteron | 2003 | 64 | x86 (x86-64) | 105 M | 130 nm | 1.6 G | 940 | 1 | ~1 M cache | 1st 64 bit x86 CPU |
| Intel Pentium D 820 | 2005 | 64 | x86 (x86-64) | 230 M | 90 nm | 2.8 G | 775 | 2 | ~2 M cache | 1st desktop multi core CPU |
| Intel Core i5-2500K | 2011 | 64 | x86 (x86-64) | 1 B | 32 nm | 3.3 G | | 4 | ~6 M cache | |
| PicoRV32 | 2015? | 32 | RISC-V (RV32IMC) | | | ~700 M | | | | simple, free hardware RISC-V core |
| Apple A9 | 2015 | 64 | ARM (ARMv8) | 2 B | 14 nm | 1.8 G | | 2 | ~7 M cache | iPhones |
| AMD Ryzen Threadrip. PRO 5995WX | 2022 | 64 | x86 (x86-64) | 33 B | 7 nm | 4.5 G | 4094 | 64 | ~300 M cache, ME | high end bloat |