# CPU

WORK IN PROGRESS

Central processing unit (CPU, often just *processor*) is the main part of a [computer](computer.md), the one that performs the computation by following the instructions of the main program; CPU can be seen as the computer's brain. It stands at the center of the computer design -- other parts, such as the main [memory](ram.md), [hard disk](hdd.md) and [input/output](io.md) devices like keyboard and monitor, are present to serve the CPU; CPU is at the top and issues commands to everyone else. A CPU is normally composed of [ALU](alu.md) (arithmetic logic unit, the circuit performing calculations), CU ([control unit](control_unit.md), the circuit that directs the CPU's operation), a relatively small amount of memory (e.g. its registers, temporary buffers and [cache](cache.md); the main [RAM](ram.md) memory is NOT part of a CPU!) and possibly some other parts.

A specific model of CPU is characterized by its [instruction set](isa.md) (ISA, e.g. [x86](x86.md) or [Arm](arm.md), which we mostly divide into [CISC](cisc.md) and [RISC](risc.md)), which determines the [machine code](machine_code.md) it will understand, then its [transistor](transistor.md) count (nowadays billions), operation [frequency](frequency.md) or **clock rate** (number of clock cycles per second, nowadays typically billions, which limits how many instructions per second the CPU can execute; the frequency can also be increased with [overclocking](overclocking.md)), number of cores (determining how many programs it can run in parallel) and also other parameters and "features" such as amount of cache memory, possible operation modes etc.

We also often associate the CPU with some **number of [bits](bit.md)** (called e.g. *[word](word.md) size*) that's often connected to the data [bus](bus.md) width and the CPU's native integer size, i.e. for example a 16 bit CPU will likely have 16 bit integer registers, it will see the memory as a sequence of 16 bit words etc. (note the CPU can still do higher bit operations but they'll typically have to be emulated so they'll be slower, will take more instructions etc.) -- nowadays most mainstream CPUs are 64 bit (to allow ungodly amounts of RAM), but 32 or even 16 and 8 bits is usually enough for [good programs](lrs.md). CPU in form of a single small integrated circuit is called *microprocessor*. CPU is not to be confused with [MCU](mcu.md) (microcontroller), a small single chip computer which is composed of a CPU and other parts.
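The note above about emulating higher bit operations can be illustrated with a minimal sketch. It assumes an imaginary 16 bit CPU whose only native addition is `add16`; the function names and the word size are made up for this example and don't describe any real instruction set. A 32 bit addition then takes two native additions plus carry handling, which is why it is slower.

```python
WORD_BITS = 16                     # assumed native word size of the imaginary CPU
WORD_MASK = (1 << WORD_BITS) - 1   # 0xffff

def add16(a, b, carry_in=0):
    """One native 16 bit addition, returns (16 bit result, carry out)."""
    total = a + b + carry_in
    return total & WORD_MASK, total >> WORD_BITS

def add32(a, b):
    """32 bit addition emulated with two 16 bit additions (wraps at 32 bits)."""
    a_lo, a_hi = a & WORD_MASK, (a >> WORD_BITS) & WORD_MASK
    b_lo, b_hi = b & WORD_MASK, (b >> WORD_BITS) & WORD_MASK
    lo, carry = add16(a_lo, b_lo)        # add the low words first
    hi, _ = add16(a_hi, b_hi, carry)     # then the high words plus the carry
    return (hi << WORD_BITS) | lo

print(hex(add32(0x0001ffff, 0x00000001)))   # prints 0x20000
```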

CPU is meant for **general purpose computations**, i.e. it can execute anything reasonably fast but for some tasks, e.g. processing HD video, won't reach near optimum speed, which is why other specialized processing units such as [GPU](gpu.md)s (graphics processing unit) and sound cards exist. As a general [algorithm](algorithm.md) executing unit CPU is made for executing **linear** programs, i.e. a series of instructions that go one after another; even though CPUs nowadays typically have multiple cores thanks to which they can run several linear programs in parallel, their level of parallelism is still low, not nearly as great as that of a GPU for example. However CPUs are [good enough](good_enough.md) for most things and they are extremely fast nowadays, so a [suckless](suckless.md)/[LRS](lrs.md) program will likely choose to only rely on CPU, knowing CPU will be present in any computer and so that our program will be [portable](portability.md).

Designs of CPUs differ, some may aim for very high performance while other ones may prefer low power consumption or low transistor count -- remember, a more complex CPU will require more [transistors](transistor.md) and will be more expensive! Of course it will also be harder to design, debug etc., so it may be better to [keep it simple](kiss.md) when designing a CPU. For this reason many CPUs, e.g. those in [embedded](embedded.md) [microcontrollers](mcu.md), intentionally lack cache, [microcode](microcode.md), multiple cores or even a complex instruction pipeline.

**WATCH OUT**: [modern](modern.md) mainstream CPUs (i.e. basically the desktop ones, soon probably mobile ones too) are [shit](shit.md), they are hugely [consumerist](consumerism.md), [bloated](bloat.md) (they literally include shit like [GPU](gpu.md)s and whole [operating systems](os.md), e.g. Intel's [ME](me.md) runs [Minix](minix.md)) and have built-in antifeatures such as [backdoor](backdoor.md)s (post 2010 basically all Intel and AMD CPUs, see Intel [Management Engine](me.md) and AMD [PSP](psp.md)) that can't be disabled and that allow remote infiltration of your computer by the CPU manufacturer (on hardware level, no matter what operating system you run). You are much better off using a simple CPU if you can ([older](old.md), [embedded](embedded.md) etc.).

## Details

TODO: diagrams, modes, transistor count history ...

Let's take a look at how a typical CPU works. Remember that anything may differ between CPUs, you can think of doing things differently and many real world CPUs do. Also we may simplify some things here, real world CPUs are complicated as hell.

**What does a CPU really do?** Basically it just reads instructions from the memory (depending on specific computer architecture this may be [RAM](ram.md) or [ROM](rom.md)) and does what they say -- these instructions are super simple, often things like "add two numbers", "write a number to memory" and so on. The instructions themselves are just [binary](binary.md) data in memory and their format depends on each CPU, or its **[instruction set](isa.md)** (basically a very low level language it understands) -- each CPU, or rather a CPU family, may generally have a different instruction set, so a program in one instruction set can't be executed by a CPU that doesn't understand this instruction set. The whole binary program for the CPU is called **[machine code](machine_code.md)** and machine code corresponds to **[assembly](assembly.md) language** (basically a textual representation of the machine code, for better readability by humans) of the CPU (or better said its instruction set). So a CPU can be seen as a hardware [interpreter](interpreter.md) of specific machine code, machine code depends on the instruction set and programmer can create machine code by writing a program in assembly language (which is different for each instruction set) and then using an assembler to translate the program to machine code. Nowadays mostly two instruction sets are used: [x86](x86.md) and [Arm](arm.md), but there are also other ones, AND it's still not so simple because each instruction set gets some kind of updates and/or has some extensions that may or may not be supported by a specific CPU, so it's a bit messy. For example [IA-32](ia_32.md) and [x86_64](x86_64.md) are two different versions of the x86 ISA, one 32 bit and one 64 bit.
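To make the relationship between assembly and machine code more concrete, here is a tiny assembler sketch. The instruction set is completely made up for this example (four mnemonics, each instruction being two bytes: opcode and operand); real instruction sets and assemblers are much more involved.

```python
# Made-up instruction set: every instruction is 2 bytes, opcode then operand.
OPCODES = {"LOAD": 0x01, "ADD": 0x02, "STORE": 0x03, "HALT": 0xff}

def assemble(source):
    """Translate assembly text to machine code (a bytes object)."""
    machine_code = bytearray()
    for line in source.splitlines():
        line = line.split(";")[0].strip()    # drop comments and whitespace
        if not line:
            continue
        parts = line.split()
        operand = int(parts[1], 0) if len(parts) > 1 else 0
        machine_code += bytes([OPCODES[parts[0].upper()], operand & 0xff])
    return bytes(machine_code)

program = """
LOAD 5      ; put 5 in the accumulator
ADD 0x03    ; add 3 to it
STORE 16    ; write the result to address 16
HALT
"""

print(assemble(program).hex())   # prints 010502030310ff00
```

The same text fed to an assembler for a different instruction set would produce different bytes (or an error), which is the reason programs aren't portable between CPU families at this level.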

The CPU has some internal state (we can see it as a [state machine](finite_state_machine.md)), i.e. it has a few internal variables, called **[registers](register.md)**; these are NOT variables in RAM but rather in the CPU itself, there are only a few of them (there may be let's say 32) but they are extremely fast. What exactly these registers are, what they are called, how many [bits](bit.md) they can hold and what their purpose is depends again on the instruction set architecture. However there are usually a few special registers, notably the **program counter** which holds the address of the currently executed instruction. After executing an instruction the program counter is incremented so that in the next step the next instruction will be executed, AND we can also modify the program counter (sometimes directly, sometimes by specialized instructions) to jump between instructions to implement branching, loops, function calls etc.

So at the beginning (when powered on) the CPU is set to some initial state, most notably it sets its program counter to some initial value (depending on each CPU, it may be e.g. 0) so that it points to the first instruction of the program. Then it performs the so called **fetch, decode, execute** cycle, i.e. it reads the instruction, decodes what it means and does what it says. In simpler CPUs this functionality is hard wired, however more complex CPUs (especially [CISC](cisc.md)) are programmed in so called [microcode](microcode.md), code at a yet lower level than machine code in which the machine code execution itself is programmed -- microcode is something like "[firmware](firmware.md) for the CPU" (or a "CPU [shader](shader.md)"?), it basically allows later updates and reprogramming of how the CPU internally works. However this is pretty [overcomplicated](bloat.md) and you shouldn't make CPUs like this.
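The fetch, decode, execute cycle together with the program counter can be sketched as a small software model. Everything here is an assumption made for illustration: a made-up CPU with a single accumulator register, five invented opcodes and two byte instructions. A real CPU does this in hardware, not in a loop like this one.

```python
LOAD, ADD, STORE, JNZ, HALT = 0x01, 0x02, 0x03, 0x04, 0xff   # made-up opcodes

def run(memory, max_steps=1000):
    """Run the made-up CPU on memory (a bytearray), return its final state."""
    pc = 0       # program counter, initial value 0 after "power on"
    acc = 0      # accumulator, the only general purpose register here
    steps = 0    # number of executed instructions
    while steps < max_steps:
        opcode, operand = memory[pc], memory[pc + 1]   # fetch
        pc += 2                                        # point to next instruction
        steps += 1
        if opcode == LOAD:                             # decode and execute
            acc = operand
        elif opcode == ADD:
            acc = (acc + operand) & 0xff               # 8 bit wrap around
        elif opcode == STORE:
            memory[operand] = acc
        elif opcode == JNZ:                            # jump if acc is not zero
            if acc != 0:
                pc = operand                           # jumping = changing pc
        elif opcode == HALT:
            break
        else:
            raise ValueError("unknown opcode " + hex(opcode))
    return {"pc": pc, "acc": acc, "steps": steps}

# Program: count down from 3, storing the counter to address 16 each time.
memory = bytearray(32)
memory[0:10] = bytes([LOAD, 3, STORE, 16, ADD, 0xff, JNZ, 2, HALT, 0])
state = run(memory)
print(state, memory[16])
```

Note how the loop in the program exists only thanks to `JNZ` overwriting the program counter; adding `0xff` serves as subtracting 1 because the accumulator wraps around at 8 bits.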

A CPU works in **clock cycles**, i.e. it is a sequential circuit which has a so called *clock* input; on this input voltage periodically switches between high and low (1 and 0) and each change makes the CPU perform another operation cycle. How fast the clock changes is determined by the clock **frequency** (nowadays usually around 3 GHz) -- the higher the frequency, the faster the CPU will compute, but the more it will also heat up (so we can't just set it arbitrarily high, but we can [overclock](overclocking.md) it a bit if we are cooling it down). WATCH OUT: **one clock cycle doesn't necessarily equal one executed instruction**, i.e. frequency of 1 Hz doesn't have to mean the CPU will execute 1 instruction per second because executing an instruction may take several cycles (how many depends on each instruction and also other factors). The number saying how many cycles an instruction takes (usually given as an average) is called CPI (cycles per instruction) -- CPUs try to aim for CPI 1, i.e. they try to execute 1 instruction per cycle, but they can't always do it.
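The relationship between frequency, CPI and run time is simple arithmetic: time equals instruction count times CPI divided by frequency. The following sketch just evaluates this formula; the instruction count and the CPI of 1.5 are example numbers chosen for illustration, not measurements of any real CPU.

```python
def execution_time(instruction_count, average_cpi, frequency_hz):
    """Seconds needed: total clock cycles divided by cycles per second."""
    return instruction_count * average_cpi / frequency_hz

def instructions_per_second(average_cpi, frequency_hz):
    """How many instructions per second the CPU gets through on average."""
    return frequency_hz / average_cpi

FREQUENCY = 3_000_000_000   # 3 GHz

print(execution_time(6_000_000_000, 1.0, FREQUENCY))   # ideal CPI of 1
print(execution_time(6_000_000_000, 1.5, FREQUENCY))   # assumed average CPI 1.5
print(instructions_per_second(1.5, FREQUENCY))
```

So the same program on the same 3 GHz clock takes 2 seconds with CPI 1 but 3 seconds with CPI 1.5, i.e. frequency alone doesn't tell us how fast a CPU is.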

One way to try to achieve CPI 1 is by optimizing the *fetch, decode, execute* cycle in hardware so that it's performed as fast as possible. This is typically done by utilizing an instruction **[pipeline](pipeline.md)** -- a pipeline has several stages that work in parallel so that when one instruction is entering e.g. the *decode* stage, another one is already entering the *fetch* stage (and the previous instruction is in *execute* stage), i.e. we don't have to wait for an instruction to be fully processed before starting to process the next one. This is practically the same principle as that of manufacturing lines in factories; if you have a long car manufacturing pipeline, you can make a factory produce let's say one car each hour, though it is impossible to make a single car from scratch in one hour (or imagine e.g. a university producing new PhDs each year despite no one being able to actually earn a PhD in a year). This is also why branching (jumps between instructions) is considered bad for program performance -- a jump to a different instruction forces the CPU to throw away the instructions it has already started processing because they will not be executed (though CPUs again try to deal with this with so called *branch prediction*, but it can't work 100%). Some CPUs even have multiple pipelines, allowing for execution of multiple instructions at the same time -- however this can only be done sometimes (the latter instruction must be independent of the former, also the other pipelines may be simpler and able to only handle simple instructions).
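How much a pipeline helps, and how much jumps hurt it, can be estimated with a very simplified model. It assumes every stage takes exactly one cycle and that each mispredicted jump throws away `stages - 1` cycles of work; both are assumptions for the sake of the example, real pipelines have many more complications (stalls, caches, ...).

```python
def cycles_without_pipeline(instruction_count, stages):
    """Each instruction goes through all stages before the next one starts."""
    return instruction_count * stages

def cycles_with_pipeline(instruction_count, stages, flushes=0):
    """The first instruction needs `stages` cycles, each following one finishes
    a cycle later; every flush (mispredicted jump) wastes stages - 1 cycles."""
    if instruction_count == 0:
        return 0
    return stages + (instruction_count - 1) + flushes * (stages - 1)

N, STAGES = 1000, 3   # example numbers: fetch, decode, execute

print(cycles_without_pipeline(N, STAGES))            # prints 3000
print(cycles_with_pipeline(N, STAGES))               # prints 1002
print(cycles_with_pipeline(N, STAGES, flushes=100))  # prints 1202
```

In this model the pipelined CPU gets to CPI of about 1.002 with no flushes, while 100 mispredicted jumps push it to about 1.2.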

In order for a CPU to be useful it has to be able to perform some **[input/output](io.md)**, i.e. it has to be able to retrieve data from the outside and present what it has computed. Notable ways of performing I/O are:

- Through **memory**: here some parts of memory serve to pass data to the CPU and to retrieve computed results back. For example a keyboard may be mapped to memory so that when certain keys are pressed, the corresponding memory bits are set to 1 -- this way a CPU can simply read from memory and know if a key is pressed. Similarly a display may be mapped to memory so that when a CPU writes a value to a certain address, a pixel appears on the display. Note that this doesn't always have to PHYSICALLY pass through memory, there may be a special circuit that translates e.g. memory access in some address range to signals to hardware etc., but the CPU is using the same instructions it would use for interacting with memory.
- Through **[GPIO](gpio.md) pins**: CPUs typically have pins that are reserved for general purpose input/output, i.e. we can electronically communicate through them with whatever device we physically connect to those pins. A CPU can set and read voltage to/from those pins e.g. with some special instructions. This may be convenient if we just want to e.g. light up some LED without having to somehow hook it to the main memory.
- **[Interrupts](interrupt.md)**: a CPU can be informed about an external event with an interrupt (see further on).
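The first option in the list, I/O through memory, can be sketched as follows. The memory map here (RAM at addresses 0 to 255, a keyboard at address 256 and a display at address 257) is invented for the example, as is the `Bus` class that plays the role of the circuit translating addresses to devices.

```python
RAM_SIZE = 0x100        # made-up memory map: 0x000 to 0x0ff is ordinary RAM
KEYBOARD_ADDR = 0x100   # reading here returns the currently pressed key
DISPLAY_ADDR = 0x101    # writing here sends a character to the display

class Bus:
    def __init__(self):
        self.ram = bytearray(RAM_SIZE)
        self.pressed_key = 0    # set by the (simulated) keyboard hardware
        self.display = []       # characters the (simulated) display has shown

    def read(self, address):
        if address < RAM_SIZE:
            return self.ram[address]
        if address == KEYBOARD_ADDR:
            return self.pressed_key
        raise ValueError("nothing readable at address " + hex(address))

    def write(self, address, value):
        if address < RAM_SIZE:
            self.ram[address] = value & 0xff
        elif address == DISPLAY_ADDR:
            self.display.append(chr(value))
        else:
            raise ValueError("nothing writable at address " + hex(address))

bus = Bus()
bus.pressed_key = ord("A")       # someone presses a key
key = bus.read(KEYBOARD_ADDR)    # the CPU reads it like any memory cell
bus.write(0x10, key)             # stores it to RAM
bus.write(DISPLAY_ADDR, key)     # and echoes it to the display
print(bus.display)               # prints ['A']
```

From the point of view of the program there is no difference between the three accesses, they are all just reads and writes to addresses.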

CPUs often also have a **[cache](cache.md)** memory that speeds up communication with the main memory (RAM, ROM, ...), though simpler CPUs may live even without cache of course. Mainstream CPUs even have several levels of cache, called L1, L2 etc. Caches are basically transparent for the programmer, they don't have to deal with them, it's just something that makes memory access faster, however a programmer knowing how a cache works can write code so as to be friendlier to the cache and utilize it better.
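What "being friendly to the cache" means can be shown with a toy simulation. It models a direct mapped cache with 8 lines of 16 bytes each (sizes picked just for this example, real caches are much bigger and usually more sophisticated) and counts hits for two ways of walking through the same 16 times 16 byte array.

```python
def count_hits(addresses, line_size=16, line_count=8):
    """Count cache hits of a simple direct mapped cache for given accesses."""
    lines = [None] * line_count       # which memory block each line holds
    hits = 0
    for address in addresses:
        block = address // line_size  # memory is split to line sized blocks
        index = block % line_count    # each block can live in only one line
        if lines[index] == block:
            hits += 1
        else:
            lines[index] = block      # miss: load the block from main memory
    return hits

SIZE = 16   # 16 x 16 array of bytes stored row after row
row_by_row = [row * SIZE + col for row in range(SIZE) for col in range(SIZE)]
col_by_col = [row * SIZE + col for col in range(SIZE) for row in range(SIZE)]

print(count_hits(row_by_row))   # prints 240
print(count_hits(col_by_col))   # prints 0
```

Both loops touch exactly the same 256 bytes, but going row by row reuses each loaded line 15 times, while going column by column keeps evicting lines before they can be reused.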

Mainstream consoomer CPUs nowadays have multiple **[cores](core.md)** so that each core can basically run a separate computation in parallel. The separate cores can be seen kind of like duplicate copies of the single core CPU with some connections between them (details again depend on each model), for example cores may share the cache memory, they will be able to communicate with each other etc. Of course this doesn't just magically make the whole CPU faster, it can now only run multiple computations at once, but someone has to make programs so as to make use of this -- typical use cases are e.g. [multitasking](multitasking.md) operating systems which can run different programs (or rather processes) on each core (note that multitasking can be done even with a single core by rapidly switching between the processes, but that's slower), or [multithreading](multithreading.md) programming languages which may run each thread on a separate core.

**[Interrupts](interrupt.md)** are an important concept for the CPU and for low level programming, they play a role e.g. in saving power -- high level programmers often don't know what interrupts are, for them interrupts can be likened to "event [callbacks](callback.md)". An interrupt happens on some kind of event, for example when a key is pressed, when a timer ticks, when an error occurs etc. (An interrupt can also be raised by the CPU itself, this is how operating system [syscalls](syscall.md) are often implemented). What kinds of interrupts there are depends on each CPU architecture (consult your datasheet) and one can usually configure which interrupts to enable and which "callbacks" to use for them -- this is often done through a so called **[vector](vector.md) table**, a special area in memory that records addresses ("vectors") of routines (functions/subprograms) to be called on specified interrupts. When an interrupt happens, the current program execution is paused and the CPU automatically jumps to the subroutine for handling the interrupt -- after returning from the subroutine the main program execution continues. Interrupts are contrasted with **[polling](polling.md)**, i.e. manually checking some state and handling things as part of the main program, e.g. executing an infinite loop in which we repeatedly check keyboard state until some key is pressed. However polling is inefficient, it wastes power by constantly performing computation just to wait -- interrupts on the other hand are a hard wired functionality that just performs a task when the event happens, without any overhead of polling. Furthermore interrupts can make programming easier (you save many condition checks and memory reads) and mainly **interrupts allow CPU to go into sleep mode** and so save a lot of power.

When a CPU doesn't have any computation to do, it can stop itself and go into waiting state, not executing any instructions -- however interrupts still work and when something happens, the CPU jumps back in to work. This is typically what the `sleep`/`wait` function in your programming language does -- it puts the CPU to sleep and sets a timer interrupt to wake up after given amount of time. As a programmer you should know that you should call this sleep/wait function in your main program loop to relieve the CPU -- if you don't, you will notice the **CPU utilization** (amount of time it is performing computations) will go to 100%, it will heat up and your computer will start spinning the fans and be noisy because you don't let it rest.
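The vector table mechanism can be modeled in a few lines. This is only a software imitation under made-up assumptions: interrupt numbers `KEY_IRQ` and `TIMER_IRQ` are invented, the "main program" is a list of strings and interrupts are checked between its instructions. In real hardware the jump to the handler is done by the CPU circuitry itself.

```python
KEY_IRQ, TIMER_IRQ = 0, 1    # made-up interrupt numbers

def run(main_program, vector_table, events):
    """Execute main_program; events maps a step number to the interrupt that
    arrives just before that step. Returns a log of everything executed."""
    log = []
    for step, instruction in enumerate(main_program):
        irq = events.get(step)
        if irq is not None:
            handler = vector_table[irq]   # look up the routine for this interrupt
            handler(log)                  # main program is paused, handler runs
        log.append(instruction)           # then main program continues
    return log

def on_key(log):
    log.append("key handler")

def on_timer(log):
    log.append("timer handler")

vector_table = {KEY_IRQ: on_key, TIMER_IRQ: on_timer}

log = run(["instr 0", "instr 1", "instr 2"], vector_table,
          {1: KEY_IRQ, 2: TIMER_IRQ})
print(log)
```

The main program contains no checks for keys or timers at all, which is the difference from polling: the handlers get called only when their event actually comes.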

Frequently there are several **modes** of operation in a CPU which are typically meant for operating systems -- there will usually be some kind of privileged mode in which the CPU can do whatever it wants (this is the mode for the OS kernel) and a restricted mode in which there are restrictions, e.g. on which areas of memory can be accessed or which instructions can be used (this will be used for user programs). Thanks to this a user program won't be able to crash the operating system, it will at worst crash itself. Most notably x86 CPUs have the *real mode* (addresses correspond to real, physical addresses) and *protected mode* (memory is [virtualized](virtual_memory.md), protected, addresses don't generally correspond to physical addresses).
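The idea of privileged and restricted mode can be sketched like this. The model is purely hypothetical: a `Cpu` class with a single `privileged` flag and one memory range that user programs may write to. It doesn't correspond to how x86 or any other real CPU implements protection, it only shows the principle of checking each access against the current mode.

```python
class ProtectionFault(Exception):
    """Raised when restricted code touches memory it is not allowed to."""

class Cpu:
    def __init__(self):
        self.privileged = True             # assume we start in privileged mode
        self.memory = bytearray(64)
        self.user_range = range(32, 64)    # made-up area for user programs

    def write(self, address, value):
        if not self.privileged and address not in self.user_range:
            raise ProtectionFault("write to address " + str(address))
        self.memory[address] = value & 0xff

cpu = Cpu()
cpu.write(0, 1)            # kernel (privileged mode) can write anywhere
cpu.privileged = False     # switch to restricted mode for a user program
cpu.write(40, 2)           # fine, inside the user area

try:
    cpu.write(0, 99)       # user program tries to overwrite kernel memory
    outcome = "written"
except ProtectionFault:
    outcome = "fault"      # here the OS would typically kill the program

print(outcome, cpu.memory[0])   # prints fault 1
```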

A CPU may also have integrated some **[coprocessors](coprocessor.md)**, though sometimes coprocessors are really a separate chip. Coprocessors that may be inside the CPU include e.g. the FPU ([floating point](float.md) unit) or encryption coprocessor. Again, this will make the CPU a lot more expensive.

TODOOOOOOO: ALU, virtual memory, IP cores, architectures (register, ...), ...

## Notable CPUs

UNDER CONSTRUCTION

Here are listed some notable CPUs (or sometimes CPU families or cores).

{ I'm not so great with HW, suggest me improvements for this section please, thanks <3 ~drummyfish }

{ WTF, allthetropes has quite a big list of famous CPUs, isn't it a site about movies? https://allthetropes.org/wiki/Central_Processing_Unit. ~drummyfish }

TODO: add more, mark CPUs with ME, add features like MMX, FPU, ...

| CPU                           |year |bits (/a)| ISA            |~tr. c.|tr. size | freq.  | pins |cores| other               | notes                                   |
| ----------------------------- | --- | ------- | -------------- | ----- | ------- | ------ | ---- | --- | ------------------- | --------------------------------------- |
| Intel 4004                    |1971 | 4 / 12  | own            | 2.3 K | 10 um   | 750 K  | 16   | 1   |                     | 1st commercial microproc.               |
| Intel 8008                    |1972 | 8 / 14  | own            | 3.5 K | 10 um   | 800 K  | 18   | 1   |                     |                                         |
| Intel 8080                    |1974 | 8 / 16  | own            | 6 K   | 6 um    | 3 M    | 40   | 1   |                     |                                         | 
| AMD Am9080                    |1975 | 8 / 16  | own            | 6 K   | 6 um    | 4 M    | 40   | 1   |                     | reverse-eng. clone of i8080             |
| MOS Technology 6502           |1975 | 8 / 16  | own            | 3.5 K | 8 um    | 3 M    | 40   | 1   |                     | popular, cheap, Atari 2600, C64, ...    |
| Zilog Z80                     |1976 | 8 / 16  | own            | 8.5 K | 4 um    | 10 M   | 40   | 1   |                     | popular                                 |
| Intel 8086                    |1978 | 16 / 20 | x86 (x86-16)   | 29 K  | 3 um    | 10 M   | 40   | 1   |                     | started x86 ISA                         |
| Motorola 68000                |1979 | 32 / 24 | own (CISC)     | 68 K  |         |        | 64   | 1   |                     | popular, e.g. Amiga, Mega Drive, ...    |
| Intel (80)286                 |1982 | 16 / 24 | x86 (x86-16)   | 130 K | 1.5 um  | 25 M   | 68   | 1   |                     |                                         |
| Intel (80)386                 |1985 | 32      | x86 (IA-32)    | 275 K | 1 um    | 40 M   | 132  | 1   |                     |                                         |
| Intel (80)486                 |1989 | 32      | x86 (IA-32)    | 1.6 M | 600 nm  | 100 M  | 196  | 1   | 16 K cache, FPU     | 1st intel with cache and FPU            |
| AMD Am386                     |1991 | 32      | x86 (IA-32)    | 275 K | 800 nm  | 40 M   | 132  | 1   |                     | clone of i386, lawsuit                  |
| Intel Pentium P5              |1993 | 32      | x86 (IA-32)    | 3 M   | 800 nm  | 60 M   | 273  | 1   | 16 K cache          | starts Pentium line with many to follow |
| AMD K5                        |1996 | 32      | x86 (IA-32)    | 4.3 M | 500 nm  | 133 M  | 296  | 1   | 24 K cache          |1st in-house AMD CPU, compet. of Pentium |
| Intel Pentium II              |1997 | 32      | x86 (IA-32)    | 7 M   | 180 nm  | 450 M  | 240  | 1   | 512 K L2 cache, MMX |                                         |
| ARM7TDMI                      |1994 | 32      | ARM            |       |         | 100 M  |      | 1   |                     | ARM core, e.g. GBA, PS2, Nokia 6110 ... |
| AMD Athlon 1000 Thunderbird   |2000 | 32      | x86 (IA-32)    | 37 M  | 180 nm  | 1 G    | 453  | 1   | ~300 K cache        | 1st 1GHz+ CPU                           |
| RAD750                        |2001 | 32      | PowerPC        | 10 M  | 150 nm  | 200 M  | 360  | 1   | 64 K cache          | radiation hard., space (Curiosity, ...) |
| AMD Opteron                   |2003 | 64      | x86 (x86-64)   | 105 M | 130 nm  | 1.6 G  | 940  | 1   | ~1 M cache          | 1st 64 bit x86 CPU                      |
| Intel Pentium D 820           |2005 | 64      | x86 (x86-64)   | 230 M | 90 nm   | 2.8 G  | 775  | 2   | ~2 M cache          | 1st desktop multi core CPU              |
| Intel Core i5-2500K           |2011 | 64      | x86 (x86-64)   | 1 B   | 32 nm   | 3.3 G  |      | 4   | ~6 M cache, ME      |                                         |
| PicoRV32                      |2015?| 32      |RISC-V (RV32IMC)|       |         | ~700 M |      |     |                     | simple, free hardware RISC-V core       |
| Apple A9                      |2015 | 64      | ARM (ARMv8)    | 2 B   | 14 nm   | 1.8 G  |      | 2   | ~7 M cache          | iPhones                                 |
|AMD Ryzen Threadrip. PRO 5995WX|2022 | 64      | x86 (x86-64)   | 33 B  | 7 nm    | 4.5 G  | 4094 | 64  | ~300 M cache, PSP   | high end bloat                          |
| [Talos ES](talos_es.md)       |2023 | 8       | own (RISC)     |       |         |        |      |     |                     | simple but usable DIY free hardware CPU |

## See Also

- [GPU](gpu.md)
- [MCU](mcu.md)
- [WPU](wpu.md) (weird processing unit)