# Cache In the world of [hardware](hw.md) a *cache* is a very small but fast [computer](computer.md) [memory](memory.md) that helps make communication between computer components much more efficient (typically by making it much faster or taking less bandwidth) by remembering recent requests and answers so that they don't have to be expensively repeated; this principle is also extended to the world of [software](sw.md) programming where *cache* means a [file](file.md) (or several files) that remembers some recently retrieved data (such as websites) so that they can repeatedly be accessed faster. Here we will primarily focus on the original concept of hardware cache -- software caches basically just generalize them. The concept of cache memory is extremely important and one of the very basics for designing and [optimizing](optimization.md) [hardware](hardware.md) and [software](software.md) (as cache may be implemented both in hardware and software). A cache may also help prevent expensively recomputing results of [function](function.md)s in the same way, by remembering the recent results of the function (we may see this as a more abstract CPU-function communication). Though caches find wide use almost everywhere, without further specifying the context or type of cache the word *cache* most often refers to the [CPU](cpu.md) cache -- cache memory found in a CPU (nowadays in all PC CPUs, however still NOT in all [embedded](embedded.md) CPUs), which is typically further subdivided into multiple levels (L1, L2 etc.) -- here we will be using the term cache the same way, but keep in mind the principles apply everywhere and caches really are used in many places. Cache is not to be confused with a [buffer](buffer.md) (which also helps optimize communication but rather by means of creating bigger chunks to be transferred at once). **Basic principle**: cache can be seen as a [black box](black_box.md), "man in the middle" component that's placed in the line of communication between a CPU and main memory (RAM). (Physically it is nowadays part of the CPU itself, but we may imagine it as a separate component just sitting "on the wire" between CPU and RAM.) When reading from memory, we have a pretty simple situation -- once CPU requests something from the memory, the request first goes to the cache; if the cache has the result stored, it just quickly returns it -- we call this a **cache hit** (this is good, we saved time!). A **cache miss** happens when the cache doesn't have the result stored -- in such case the cache has to expensively forward the request to the memory and retrieve the data; usually the cache retrieves a whole smaller block of memory because it can be expected the CPU will access something in nearby memory in the near future (see the principle of locality below). When writing data to memory the situation is a bit more complex as the cache may choose different [strategies](strategy.md) of behavior: for simplicity it may just write the data through every time, but a more efficient (and also more complicated) approach is to just store the data for itself and write it to the main memory only when necessary (e.g. when it needs to load a different block of memory). Here we get into things such as cache coherence etc., which may cause pretty nasty [bug](bug.md)s and headaches. Programmers often try to [optimize](optimization.md) their programs by making them "cache friendly", i.e. they try to minimize long jumps in memory which causes a lot of cache misses and slows down the program. A typical example is e.g. storing image data in the order by which it will be written to the screen. A cache is related and/or exploits some observations and concepts related to computers such as: - **principle of locality**: Computers (/CPUs) tend to more often than not access data that are close to each other in memory, i.e. a CPU doesn't typically make [random](randomness.md) jumps in memory but rather e.g. reads a sequence of bytes one after another from an [array](array.md) or [struct](struct.md). For this reason when a CPU pulls something out of memory, there is a high probability of accessing an address that is nearby to this memory next time -- a cache helps us get ready for this by prefetching this nearby data and having it ready for very fast access. - **[memory](memory.md) hierarchy**: Mostly because of the principle of locality computer memory is divided into different levels, a chain of memories that get progressively further away from the CPU, increasing their size (decreasing price for capacity) as they get further away but also decreasing their speed. Here a cache can be seen as the closest memory to the CPU (except for the [registers](register.md)), i.e. being the smallest, most expensive but also fastest memory. By extension we can see that RAM can in many cases be seen as a "cache" for the hard drive, hard drive can be seen as "cache" for the network (after all web browsers ARE caching websites into files on the disk) etc. - **[dynamic programming](dynamic_programming.md)**: Dynamic programming is a programming technique revolving around remembering already calculated results so that we don't have to compute them again in the future -- this is basically what caches do, they remember results we obtained in relatively expensive way so that next time we can get them cheaper. - ... ``` _____ _____ ______ __________ | | _______ | | | | | | | CPU | <---> | cache | <---> | RAM | <---> | disk | <---> | Internet | |_____| """"""" |_____| |______| |__________| small big huge gigantic fast slowish super slow extremely slow ``` *Cache resides very close to the CPU within the memory hierarchy.* TODO: code